Data Science and Engineering with Spark




Link to Content:
Data Science and Engineering with Spark

Created/Published/Taught by:
edX
University of California Berkeley
Databricks
Jon Bates
Ameet Talwalkar
Anthony D. Joseph

Content Found Via:
Data Science Renee

Free? No

Cost Range:
$49.00 - $297.00

From edX:

This course “will teach students how to perform data science and data engineering at scale using Spark, a cluster computing system well-suited for large-scale machine learning tasks. It will also present an integrated view of data processing by highlighting the various components of data analysis pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. Students will gain hands-on experience building and debugging Spark applications. Internal details of Spark and distributed machine learning algorithms will be covered, which will provide students with intuition about working with big data and developing code for a distributed environment.

What You’ll Learn:

  • How to use Spark and its libraries to solve big data problems
  • How to approach large-scale data science and engineering problems
  • Spark’s API, architecture, and many internal details
  • The trade-offs between communication and computation in a distributed environment
  • Use cases for Spark”

Recommended Prerequisites: "This XSeries requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), but previous experience with Spark or distributed computing is NOT required. Familiarity with basic machine learning concepts and exposure to algorithms, probability, linear algebra and calculus are prerequisites for two of the courses in this series."
