Tag: Spark

Cloudera Engineering Blog

“Best practices, how-tos, use cases, and internals from Cloudera Engineering and the community.” The blog takes a deep dive into the functionality of Cloudera products. It covers a variety of topics in big data management and analysis, as well as integrations with technologies such as Apache Spark, Hadoop, and Kafka.

Advanced Analytics with Spark: Patterns for Learning from Data at Scale

From Amazon: “In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to …”

Learning Spark: Lightning-Fast Big Data Analysis

From Amazon: “Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and …”

Machine Learning With Big Data

This is course 4 of 6 in the Coursera Big Data Specialization. From Coursera: “Want to make sense of the volumes of data you have collected? Need to incorporate data-driven decisions into your process? This course provides an overview of machine learning techniques to explore, analyze, and leverage data. You will be introduced to tools and algorithms you can use to create machine learning models that learn from data, and to …”

Big Data Integration and Processing

This is course 3 of 6 in the Coursera Big Data Specialization. From Coursera: “At the end of this course, you will be able to: retrieve data from example database and big data management systems; describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications; identify when a big data problem needs data integration; execute simple big data integration and …”

DataQuest – Working with Large Datasets

This is step 9 of 11 in the DataQuest Data Scientist Path. “Learn topics like Map-Reduce and Spark to process large datasets.” The curriculum for this step includes: Introduction to Spark; Spark Installation and Jupyter Notebook Integration; Transformations and Actions; Spark DataFrames; and Spark SQL.

Hadoop: The Definitive Guide

With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop.

The Data Incubator Bootcamp/Fellowship

The Data Incubator is an intensive 7-week fellowship that prepares the best scientists and engineers with advanced degrees to work as data scientists and quants.