Python for Data Science and Machine Learning Bootcamp
Courses / Self-paced online course
From Udemy: “Learn how to use NumPy, Pandas, Seaborn, Matplotlib, Plotly, Scikit-Learn, Machine Learning, Tensorflow, and more!
What Will I Learn?
Use Python for Data Science and Machine Learning
Use Spark for Big Data Analysis
Implement Machine Learning Algorithms
Learn to use NumPy for Numerical Data
Learn to use Pandas for Data Analysis
Learn to use Matplotlib for Python Plotting
Learn to use Seaborn for statistical plots
Use Plotly for interactive …
Scala and Spark for Big Data and Machine Learning
Courses / Self-paced online course
From Udemy: “Learn the latest Big Data technology – Spark and Scala, including Spark 2.0 DataFrames!
What Will I Learn?
Use Scala for Programming
Use Spark 2.0 DataFrames to read and manipulate data
Use Spark to Process Large Datasets
Understand how to use Spark on AWS and DataBricks”
Spark and Python for Big Data with PySpark
Courses / Self-paced online course
From Udemy: “Learn how to use Spark with Python, including Spark Streaming, Machine Learning, Spark 2.0 DataFrames and more!
What Will I Learn?
Use Python and Spark together to analyze Big Data
Learn how to use the new Spark 2.0 DataFrame Syntax
Work on Consulting Projects that mimic real world situations!
Classify Customer Churn with Logistic Regression
Use Spark with Random Forests for Classification
Learn how to use Spark’s Gradient Boosted …
Data Science and Machine Learning with Python – Hands On!
Courses / Self-paced online course
From Udemy: “Become a data scientist in the tech industry! Comprehensive data mining and machine learning course with Python & Spark.
What Will I Learn?
Develop using iPython notebooks
Understand statistical measures such as standard deviation
Visualize data distributions, probability mass functions, and probability density functions
Visualize data with matplotlib
Use covariance and correlation metrics
Apply conditional probability for finding correlated features
Use Bayes’ Theorem to identify false positives
Make predictions …
RStudio Cheat Sheets
Cheat Sheets / References, Learning Guides, Etc.
“These cheat sheets below make it easy to learn about and use some of our favorite packages.”
Data Import Cheat Sheet: “reminds you how to read in flat files…, work with the results as tibbles, and reshape messy data with tidyr. Use tidyr to reshape your tables into tidy data, the data format that works the most seamlessly with R and the tidyverse.”
Data Transformation Cheat Sheet: “dplyr provides a grammar …
PySpark Cheat Sheet: Spark in Python
Cheat Sheets / References, Learning Guides, Etc.
“This PySpark cheat sheet covers the basics, from initializing Spark and loading your data, to retrieving RDD information, sorting, filtering and sampling your data. But that’s not all. You’ll also see that topics such as repartitioning, iterating, merging, saving your data and stopping the SparkContext are included in the cheat sheet.”
Cloudera Engineering Blog
Blogs / Blogs, Podcasts, Etc.
“Best practices, how-tos, use cases, and internals from Cloudera Engineering and the community.” The blog is a deep dive into the functionality of Cloudera products. It covers a variety of topics in big data management and analysis, as well as integrations with technologies such as Apache Spark, Hadoop, and Kafka.
Advanced Analytics with Spark: Patterns for Learning from Data at Scale
Books / physical books or multiple formats
From Amazon: “In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to …
Learning Spark: Lightning-Fast Big Data Analysis
Books / physical books or multiple formats
From Amazon: “Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and …
Machine Learning With Big Data
Courses / Self-paced online course
This is course 4 of 6 in the Coursera Big Data Specialization. From Coursera: “Want to make sense of the volumes of data you have collected? Need to incorporate data-driven decisions into your process? This course provides an overview of machine learning techniques to explore, analyze, and leverage data. You will be introduced to tools and algorithms you can use to create machine learning models that learn from data, and to …