Link to Content:
CS109 Data Science - GitHub
Tags: computer science / data exploration / data management / data visualization / descriptive statistics / mapreduce / python / statistics
“Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries.
We will be using Python for all programming assignments and projects.
Expected Learning Outcomes
After successful completion of this course, you will be able to:
- Use Python and other tools to scrape, clean, and process data
- Use data management techniques to store data locally and in cloud infrastructures
- Use statistical methods and visualization to quickly explore data
- Apply statistics and computational analysis to make predictions based on data
- Apply basic computer science concepts such as modularity, abstraction, and encapsulation to data analysis problems
- Implement data-intensive computations on cluster and cloud infrastructures using MapReduce
- Effectively communicate the outcome of data analysis using descriptive statistics and visualizations”
Past versions of the course materials are available for self-paced learning: CS109 GitHub
Recommended Prerequisites: Basic knowledge of programming and statistics. Both undergraduate and graduate students are welcome to take the course.
Go to Content: Harvard CS109 Data Science