Tag: mapreduce

Spark and Python for Big Data with PySpark

From Udemy: “Learn how to use Spark with Python, including Spark Streaming, Machine Learning, Spark 2.0 DataFrames and more! What Will I Learn? Use Python and Spark together to analyze Big Data; Learn how to use the new Spark 2.0 DataFrame Syntax; Work on Consulting Projects that mimic real world situations; Classify Customer Churn with Logistic Regression; Use Spark with Random Forests for Classification; Learn how to use Spark’s Gradient Boosted …”

Data Science from Scratch: First Principles with Python

From Amazon: “Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get …”

Introduction to Big Data

This course is 1 of 6 in the Coursera Big Data Specialization. From Coursera: “Interested in increasing your knowledge of the Big Data landscape? This course is for those new to data science and interested in understanding why the Big Data Era has come to be. It is for those who want to become conversant with the terminology and the core concepts behind big data problems, applications, and systems. It is …”

DataQuest – Working with Large Datasets

This is step 9 of 11 in the DataQuest Data Scientist Path. “Learn topics like Map-Reduce and Spark to process large datasets.” The curriculum for this step includes: Introduction to Spark; Spark Installation and Jupyter Notebook Integration; Transformations and Actions; Spark DataFrames; Spark SQL.

Big Data: Principles and best practices of scalable realtime data systems

Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data.

Data Science from Scratch: First Principles with Python

In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch.

Hadoop: The Definitive Guide

With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop.

Data Skeptic

Data Skeptic is a podcast explaining concepts from data science with interviews featuring practitioners and experts on interesting topics related to data, all through the eye of scientific skepticism.

Harvard CS109 Data Science

This course introduces data wrangling, cleaning, and sampling; data management; exploratory data analysis; prediction based on statistical methods; and communication of results through visualization, stories, and interpretable summaries.

The Data Incubator Bootcamp/Fellowship

The Data Incubator is an intensive 7-week fellowship that prepares the best scientists and engineers with advanced degrees to work as data scientists and quants.