Tag: Spark

iPython Notebooks


From GitHub: “This repo contains various iPython notebooks I’ve created to experiment with libraries and work through exercises, and explore subjects that I find interesting.” The notebooks include: popular Python data science libraries (NumPy, SciPy, Matplotlib, Pandas, Statsmodels, Scikit-learn, Seaborn, NetworkX, PyMC, NLTK, DEAP, Gensim); Machine Learning Exercises; Tensorflow Deep Learning Exercises; Spark Big Data Labs; Miscellaneous.

Data Science and Engineering with Spark


From edX: This course “will teach students how to perform data science and data engineering at scale using Spark, a cluster computing system well-suited for large-scale machine learning tasks. It will also present an integrated view of data processing by highlighting the various components of data analysis pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. Students will gain hands-on experience building and debugging Spark applications. Internal …

Python for Data Science and Machine Learning Bootcamp


From Udemy: “Learn how to use NumPy, Pandas, Seaborn, Matplotlib, Plotly, Scikit-Learn, Machine Learning, Tensorflow, and more! What Will I Learn? Use Python for Data Science and Machine Learning; Use Spark for Big Data Analysis; Implement Machine Learning Algorithms; Learn to use NumPy for Numerical Data; Learn to use Pandas for Data Analysis; Learn to use Matplotlib for Python Plotting; Learn to use Seaborn for statistical plots; Use Plotly for interactive …

Scala and Spark for Big Data and Machine Learning


From Udemy: “Learn the latest Big Data technology – Spark and Scala, including Spark 2.0 DataFrames! What Will I Learn? Use Scala for Programming; Use Spark 2.0 DataFrames to read and manipulate data; Use Spark to Process Large Datasets; Understand how to use Spark on AWS and DataBricks”

Spark and Python for Big Data with PySpark


From Udemy: “Learn how to use Spark with Python, including Spark Streaming, Machine Learning, Spark 2.0 DataFrames and more! What Will I Learn? Use Python and Spark together to analyze Big Data; Learn how to use the new Spark 2.0 DataFrame Syntax; Work on Consulting Projects that mimic real world situations! Classify Customer Churn with Logistic Regression; Use Spark with Random Forests for Classification; Learn how to use Spark’s Gradient Boosted …

Data Science and Machine Learning with Python – Hands On!


From Udemy: “Become a data scientist in the tech industry! Comprehensive data mining and machine learning course with Python & Spark. What Will I Learn? Develop using iPython notebooks; Understand statistical measures such as standard deviation; Visualize data distributions, probability mass functions, and probability density functions; Visualize data with matplotlib; Use covariance and correlation metrics; Apply conditional probability for finding correlated features; Use Bayes’ Theorem to identify false positives; Make predictions …
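
The Bayes' Theorem item in the outline above can be illustrated with a small worked example. This is a minimal sketch and is not taken from the course: the function name `posterior_positive` and the three input rates (1% prevalence, 95% sensitivity, 5% false positive rate) are hypothetical values chosen only for illustration. It shows why most positive results can be false positives when the condition being tested for is rare.

```python
def posterior_positive(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem.

    prior               -- P(condition), the base rate
    sensitivity         -- P(positive | condition)
    false_positive_rate -- P(positive | no condition)
    """
    # Total probability of a positive result (true positives + false positives).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Hypothetical inputs, chosen only for illustration.
prior = 0.01
sensitivity = 0.95
false_positive_rate = 0.05

posterior = posterior_positive(prior, sensitivity, false_positive_rate)
false_positive_share = 1 - posterior

print(f"P(condition | positive) = {posterior:.3f}")
print(f"Share of positives that are false = {false_positive_share:.3f}")
```

With these assumed rates, only about 16% of positive results correspond to a real case, so roughly 84% of positives are false positives.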

RStudio Cheat Sheets


“These cheat sheets below make it easy to learn about and use some of our favorite packages.” Data Import Cheat Sheet: “reminds you how to read in flat files…, work with the results as tibbles, and reshape messy data with tidyr. Use tidyr to reshape your tables into tidy data, the data format that works the most seamlessly with R and the tidyverse.” Data Transformation Cheat Sheet: “dplyr provides a grammar …

PySpark Cheat Sheet: Spark in Python


“This PySpark cheat sheet covers the basics, from initializing Spark and loading your data, to retrieving RDD information, sorting, filtering and sampling your data. But that’s not all. You’ll also see that topics such as repartitioning, iterating, merging, saving your data and stopping the SparkContext are included in the cheat sheet.”

Cloudera Engineering Blog


The blog's tagline is “Best practices, how-tos, use cases, and internals from Cloudera Engineering and the community.” It is a deep dive into the functionality of Cloudera products, covering a variety of topics in big data management and analysis, as well as integrations with technologies such as Apache Spark, Hadoop, and Kafka.

Advanced Analytics with Spark: Patterns for Learning from Data at Scale


From Amazon: “In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to …