

Machine Learning with Apache Spark 3.0 using Scala

Machine Learning with Apache Spark 3.0 using Scala with Examples and 4 Projects

     
₹569

This Course Includes

  • Udemy
  • 4 (65 reviews)
  • 8h 20m
  • English
  • Online - Self Paced
  • Professional certificate

About Machine Learning with Apache Spark 3.0 using Scala

Do you want to master Machine Learning at scale using one of the most powerful Big Data frameworks in the world? This course will teach you Machine Learning with Apache Spark 3.0 and Scala, step by step, through real-world projects and hands-on coding examples.

Apache Spark is an industry-standard framework for processing and analyzing large datasets. Its MLlib (Machine Learning Library) provides scalable implementations of machine learning algorithms, making it possible to train, evaluate, and deploy models on massive amounts of data efficiently. Working in Scala, the native language of Spark, you’ll learn how to build and optimize end-to-end machine learning pipelines.

This course is designed for beginner to intermediate learners who want practical experience applying machine learning techniques in Spark. You’ll start with Big Data and Spark basics, then move on to core machine learning concepts, and finally apply them to real-world datasets through hands-on projects like rain prediction, ad click prediction, iris flower classification, and customer segmentation.

By the end of this course, you will have the skills and confidence to build scalable machine learning models using Spark 3.0 and Scala, skills that are in high demand in industries such as finance, e-commerce, telecom, and technology.

What You Will Learn

Introduction to Machine Learning & Spark MLlib

Basics of machine learning, types (supervised, unsupervised, classification, regression, clustering).

What is Spark ML? How Spark MLlib simplifies building ML models at scale.

Apache Spark Basics (Optional Section)

Get familiar with Spark fundamentals: RDD, DataFrames, and Datasets.

Set up a Spark environment using Databricks.

Learn notebook basics, cluster provisioning, and working with Scala.

Data Handling & Preparation

Work with different data sources: CSV, JSON, LIBSVM, Images, Avro, and Parquet.

Understand the Machine Learning data pipeline in Spark.

Practice feature extraction, transformation, and selection techniques.

Feature Engineering in Spark ML

Learn popular feature extractors like TF-IDF, Word2Vec, CountVectorizer, FeatureHasher.

Apply transformers such as Tokenizer, StopWordsRemover, n-gram, PCA, StringIndexer, OneHotEncoder.

Use feature selectors like RFormula and ChiSqSelector.

Build and connect them into end-to-end ML pipelines.
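As a rough illustration of what connecting these pieces can look like, here is a minimal sketch of a Spark ML pipeline in Scala. It is not taken from the course materials. The object name `PipelineSketch`, the column names, the tiny sample data, the local master setting, and the choice of `HashingTF` plus `LogisticRegression` are all assumptions made for the example, and it presumes the `spark-sql` and `spark-mllib` libraries are on the classpath.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, StopWordsRemover, Tokenizer}
import org.apache.spark.sql.SparkSession

object PipelineSketch {

  // Chain text transformers and a classifier into one Pipeline.
  // Column names ("text", "words", ...) are illustrative assumptions.
  def buildPipeline(): Pipeline = {
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val remover = new StopWordsRemover().setInputCol("words").setOutputCol("filtered")
    val hashingTF = new HashingTF()
      .setInputCol("filtered")
      .setOutputCol("features")
      .setNumFeatures(1000)
    // LogisticRegression reads "features" and "label" columns by default.
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
    new Pipeline().setStages(Array(tokenizer, remover, hashingTF, lr))
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; a notebook environment such as
    // Databricks normally provides a ready-made `spark` session instead.
    val spark = SparkSession.builder()
      .appName("pipeline-sketch")
      .master("local[*]")
      .getOrCreate()

    // Made-up training rows: (id, text, label).
    val training = spark.createDataFrame(Seq(
      (0L, "spark makes big data simple", 1.0),
      (1L, "the weather is nice today", 0.0),
      (2L, "mllib pipelines run on spark", 1.0),
      (3L, "i had pasta for lunch", 0.0)
    )).toDF("id", "text", "label")

    // Fitting runs every stage in order and returns a reusable PipelineModel.
    val model = buildPipeline().fit(training)

    // Made-up unlabeled rows to score with the fitted model.
    val unseen = spark.createDataFrame(Seq(
      (4L, "spark mllib"),
      (5L, "lunch today")
    )).toDF("id", "text")

    model.transform(unseen).select("id", "text", "prediction").show(false)

    spark.stop()
  }
}
```

The point of the `Pipeline` abstraction is that the same fitted object applies tokenizing, stop-word removal, hashing, and prediction to new data in one `transform` call, so preprocessing cannot drift between training and scoring.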

Machine Learning Models with Spark

Classification Models: Decision Trees, Logistic Regression, Naive Bayes (Iris Prediction), Random Forest, Gradient-Boosted Trees, Linear SVM, One-vs-Rest.

Regression Models: Linear Regression, Decision Tree Regression, Random Forest Regression, Gradient-Boosted Tree Regression, Predict Ads Clicks project.

Clustering: KMeans (Customer Segmentation Project).

Hands-On Projects

Rain Prediction in Australia (complete ML pipeline).

Iris Flower Classification using Naive Bayes.

Customer Segmentation using KMeans.

Ad Click Prediction using Linear Regression.

Multiple other classification and regression use cases with step-by-step Scala implementations.

Spark MLlib in Practice

Understand how to train, evaluate, and optimize ML models at scale.

Explore key concepts like shuffling, correlation, pipeline components, and evaluation metrics.

What You Will Learn?

  • Understand the fundamentals of Machine Learning and its types (supervised, unsupervised, classification, regression, clustering).
  • Learn the basics of Apache Spark 3.0 and how it supports large-scale data processing.
  • Work hands-on with Spark RDDs, DataFrames, and Datasets using Scala.
  • Explore Spark MLlib, the machine learning library in Spark, and how it enables scalable ML solutions.
  • Build end-to-end Machine Learning pipelines using Spark, from data ingestion to model evaluation.
  • Gain practical experience with real-world projects such as rain prediction in Australia, Iris flower classification, ad click prediction, and mall customer segmentation.
  • Learn how to work with different data sources like CSV, JSON, Parquet, Avro, LIBSVM, and images.
  • Master feature engineering techniques such as TF-IDF, Word2Vec, CountVectorizer, PCA, n-grams, StringIndexer, OneHotEncoder, VectorAssembler, and more.
  • Implement various classification models including Decision Trees, Logistic Regression, Naive Bayes, Random Forests, Gradient-Boosted Trees, and Linear SVM.
  • Apply different regression models such as Linear Regression, Decision Trees, Random Forests, and Gradient-Boosted Trees.
  • Work with clustering algorithms like KMeans for customer segmentation.
  • Understand the concepts behind machine learning pipelines and how to use Spark’s pipeline API effectively.
  • Get tips, tricks, and best practices for writing efficient and production-ready ML models in Spark using Scala.