Introduction to Programming and Data Science

Overview

Spring 2020: Tu-Th 3:30 PM to 4:45 PM

 

Undergrad TECH-UB-0023 concentration: Computing and Data Science

Undergrad tracks:


Based on the outline of the course taught by Prof. Panos Ipeirotis


This course is the recommended first course for undergrads who 1) want to work in the rapidly growing fields of data science and data analytics, or 2) want to acquire the technical and data analysis skills that are now needed in other disciplines, such as finance and marketing. The course provides an introduction to programming (using Python) and covers the collection, storage, organization, management, and analysis of data, both structured (record-based) and unstructured (such as text).


Topics

Projects follow-up course

There is a follow-up course titled “Projects in Programming for Data Science”. It revisits the topics taught here in more depth and adds new ones, such as web crawling, text analysis, regular expressions, background processing, visualizations, and network analysis. Those interested in deepening and broadening their programming experience are highly encouraged to take the follow-up course.

Prerequisites

None

Important Information

Since this is a hands-on course, you must bring a laptop with sufficient battery charge to every class. Make sure you can connect to the NYU Wi-Fi.

Course Objectives

At a very high level, the course will teach you Python and SQL, plus a few Unix tools that are useful for everyday data handling and processing. At the completion of this course, you should be able to:

Things that we will not use or cover

Grading

Late Assignment Submission Policy

Late submissions (even by 1 minute) will get a zero score because the answers will be posted immediately after the due date and time. No extensions will be granted except for medical or family emergencies. If you have any religious or personal conflicts, please submit the assignments beforehand since the related material will be covered well in advance of the due dates.

Textbooks

We will mainly rely on notes distributed as IPython notebooks. There is no required textbook for the course, but the following books are useful references for some of the material that we will cover in class.

Course policies

Unless otherwise noted, we follow the default Stern Policies. Due to the hands-on nature of the course, absences are strongly discouraged. Classes are not videotaped.

Frequently Asked Questions

Tentative Timeline

Module  Topic
1       Introduction and setup: Setting up an EC2 instance
          1. Elastic IP, shutdown & restart
          2. Sync notebooks / Upgrade libraries
          3. Download and upload data
        Running the first Python program
        Expressions
2       Primitive Data Types: Variables and Numerics
3       Strings and String formatting: Special characters, indexing, slicing, basic functions
4       Strings and String formatting (cont)
5       Booleans and “if-then-else” control flow statements
6       Lists
7       Sets and dictionaries
8       Control Flow statements: while loops, for loops
9       Control Flow statements: while loops, for loops
10      Interacting with Files
11      Functions
12      Interacting with Files
13      Functions
14      Entity-Relationship model: Entities, keys, attributes, relations, ER examples
15      Entity-Relationship model: ER diagrams to SQL Tables
16      SQL 1: Select statements
17      SQL 2: LIKE, IS NULL, and Inner Join queries I
18      SQL 3: Inner Join II and Outer Join
19      SQL 4: Aggregation / GROUP BY queries
20      SQL 5: Subqueries / Python and SQL
21      Database integrative class practice
22      Intro to Numpy
23      Intro to Numpy
24      Intro to Pandas and Plotting
25      Intro to Pandas and Plotting
26      Intro to Pandas and Plotting
27      Intro to Pandas and Plotting
28      Final review
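
To give a rough flavor of where the timeline leads, the sketch below combines string indexing and slicing (modules 3 and 4) with a GROUP BY query issued from Python (modules 19 and 20). It is only an illustration under stated assumptions: the sample records, the `scores` table, and the use of Python's built-in `sqlite3` module with an in-memory database are hypothetical choices made here, not necessarily the data or database system used in class.

```python
import sqlite3

# Hypothetical sample data: (student, module, score) records.
records = [
    ("ana", "python", 90),
    ("ben", "python", 70),
    ("ana", "sql", 80),
]

# Strings: indexing, slicing, and formatting.
name = records[0][0]
initial = name[0].upper()        # first character, uppercased
label = f"{initial}{name[1:]}"   # initial plus the rest of the name

# SQL from Python: load the records into an in-memory SQLite database
# (an assumption for this sketch) and aggregate with GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, module TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", records)
rows = conn.execute(
    "SELECT student, AVG(score) FROM scores GROUP BY student ORDER BY student"
).fetchall()
conn.close()

# Dictionaries: map each student to their average score.
averages = dict(rows)
print(label, averages)
```

The same aggregation could also be written with a plain Python loop over a dictionary; pushing it into SQL keeps the grouping logic in one declarative statement.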