Link to Content:

O'Reilly Media

Created/Published/Taught by:

Cathy O'Neil

Rachel Schutt

Content Found Via:

Amazon

Free? No

Cost Range:

$10.20 - $44.99

Tags: causality / data engineering / data journalism / data science (overview) / data visualization / decision trees / exploratory data analysis / financial modeling / fraud detection / hadoop / k nearest neighbors (k-NN) / k-means / kaggle / linear regression / logistic regression / mapreduce / naive bayes / recommendation engines / social networks / statistical inference / timestamps

Difficulty Rating:

from Amazon:

“Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.

In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

Topics include:

Statistical inference, exploratory data analysis, and the data science process

Algorithms

Spam filters, Naive Bayes, and data wrangling

Logistic regression

Financial modeling

Recommendation engines and causality

Data visualization

Social networks and data journalism

Data engineering, MapReduce, Pregel, and Hadoop

Doing Data Science is a collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.”

Recommended Prerequisites: From the Book:

"We assume prerequisites of linear algebra, some probability and statistics, and some experience coding in any language. Even so, we will try to make the book as self-contained as possible, keeping in mind that it’s up to you to do supplemental reading if you’re missing some of that background. We’ll try to point out places throughout the book where supplemental reading might help you gain a deeper understanding."

Go to Content: Doing Data Science: Straight Talk from the Frontline

## By Renee September 26, 2015 - 3:05 am

You could describe Doing Data Science as kind of a “roadmap” to data science. There is some math and some code, but the book offers much more breadth than depth. It includes a lot of “tips”, “things to think about”, and “lessons learned” that I feel give the reader a great sense of the pitfalls you might come across when doing real-world analysis and how to avoid the common ones, but only a few step-by-step how-to’s and code examples (in R or Python).

I can imagine that some readers wouldn’t like that the book is “all over the place”: it gives little detail on some topics and a lot of detail all at once on others, and it is too technical and math-y on some topics but written in very “layman’s terms” on others. However, I liked that about the writing. It really touches on everything, and it gives you enough direction to know where to go next to learn more. It feels like you’re meeting a bunch of people who have had a variety of experiences in the industry, and you’re all trying to give each other a feel for what you do: being technical enough to be impressive but clear enough to be accessible, and explaining how you learned your particular subset of skills and where someone can get more info.

These comments are excerpts from a blog post I wrote about the book here: http://www.becomingadatascientist.com/2014/06/13/doing-data-science-review/

(4) A Lot

Advanced Beginner - was new to this topic / had some prerequisites

Advanced Beginner - familiar with data science / new to this topic / some prerequisites