Link to Content:

CS109 Data Science - Github

Created/Published/Taught by:

Harvard SEAS

Joe Blitzstein

Hanspeter Pfister

Verena Kaynig-Fittkau

Content Found Via:

KDNuggets

Free? Yes

Tags: computer science / data exploration / data management / data visualization / descriptive statistics / mapreduce / python / statistics

Difficulty Rating:

From Harvard:

“Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries.

We will be using Python for all programming assignments and projects.

Expected Learning Outcomes

After successful completion of this course, you will be able to:

- Use Python and other tools to scrape, clean, and process data
- Use data management techniques to store data locally and in cloud infrastructures
- Use statistical methods and visualization to quickly explore data
- Apply statistics and computational analysis to make predictions based on data
- Apply basic computer science concepts such as modularity, abstraction, and encapsulation to data analysis problems
- Implement data-intensive computations on cluster and cloud infrastructures using MapReduce
- Effectively communicate the outcome of data analysis using descriptive statistics and visualizations”

Past versions of the course materials are available for self-paced learning: CS109 GitHub

Recommended Prerequisites: Basic programming knowledge and basic statistics knowledge. Both undergraduate and graduate students are welcome to take the course.

Go to Content: Harvard CS109 Data Science

## By rgap October 21, 2015 - 9:28 pm

What I find really cool about this course are the IPython notebooks, especially the ones from the lab sessions. The 2015 version is a lot better than the previous ones. I totally recommend it.

Intermediate - comfortable with data science / has relevant prerequisites / familiar with this topic

Intermediate - had relevant prerequisites / was familiar with this topic

(4) A Lot


## By dataWatcher November 20, 2015 - 8:07 am

I really enjoyed this course. All the lab material is provided on GitHub, so you can easily practice and play with it. The lectures are helpful, and they also point you to additional sources outside the course to study from.

Intermediate - comfortable with data science / has relevant prerequisites / familiar with this topic

Intermediate - had relevant prerequisites / was familiar with this topic

(4) A Lot
