Cloudera Engineering Blog
Blogs / Blogs, Podcasts, Etc.
“Best practices, how-tos, use cases, and internals from Cloudera Engineering and the community.” The blog is a deep dive into the functionality of Cloudera products, covering a variety of topics in big data management and analysis, as well as integrations with technologies such as Apache Spark, Hadoop, and Kafka.
Advanced Analytics with Spark: Patterns for Learning from Data at Scale
Books / physical books or multiple formats
From Amazon: “In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to …”
Learning Spark: Lightning-Fast Big Data Analysis
Books / physical books or multiple formats
From Amazon: “Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and …”
Machine Learning With Big Data
Courses / Self-paced online course
This is course 4 of 6 in the Coursera Big Data Specialization. From Coursera: “Want to make sense of the volumes of data you have collected? Need to incorporate data-driven decisions into your process? This course provides an overview of machine learning techniques to explore, analyze, and leverage data. You will be introduced to tools and algorithms you can use to create machine learning models that learn from data, and to …”
Big Data Integration and Processing
Courses / Self-paced online course
This is course 3 of 6 in the Coursera Big Data Specialization. From Coursera: “At the end of this course, you will be able to:
- Retrieve data from example database and big data management systems
- Describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications.
- Identify when a big data problem needs data integration
- Execute simple big data integration and …”
DataQuest – Working with Large Datasets
Courses / Interactive tutorial style course
This is step 9 of 11 in the DataQuest Data Scientist Path. “Learn topics like Map-Reduce and Spark to process large datasets.” The curriculum for this step includes:
- Introduction to Spark
- Spark Installation and Jupyter Notebook Integration
- Transformations and Actions
- Spark DataFrames
- Spark SQL