
Apache Spark for Java Developers
Get processing Big Data using RDDs, DataFrames, SparkSQL and Machine Learning - and real time streaming with Kafka!

This Course Includes
Udemy
4.8 (3.6K reviews)
21h 43m
English
Online - Self Paced
Professional certificate
About Apache Spark for Java Developers
Get started with the amazing Apache Spark parallel computing framework. This course is designed especially for Java developers. If you're new to data science and want to find out how massive datasets are processed in parallel, then the Java API for Spark is a great way to get started, fast.

All of the fundamentals you need to understand the main operations you can perform in Spark Core, SparkSQL and DataFrames are covered in detail, with easy-to-follow examples. You'll be able to follow along with all of the examples and run them on your own local development computer.

Included with the course is a module covering SparkML, an exciting addition to Spark that allows you to apply Machine Learning models to your Big Data. No mathematical experience is necessary.

Finally, there's a full 3-hour module covering Spark Streaming, where you will get hands-on experience of integrating Spark with Apache Kafka to handle real-time big data streams. We use both the DStream and the Structured Streaming APIs.

Optionally, if you have an AWS account, you'll see how to deploy your work to a live EMR (Elastic MapReduce) hardware cluster. If you're not familiar with AWS you can skip this video, but it's still worthwhile to watch it even without following along with the coding.

You'll be going deep into the internals of Spark and you'll find out how it optimizes your execution plans. We'll be comparing the performance of RDDs vs SparkSQL, and you'll learn about the major performance pitfalls; avoiding them could save a lot of money on live projects. Throughout the course, you'll be getting some great practice with Java lambdas, a great way to learn functional-style Java if you're new to it.
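
For readers new to functional-style Java, here is a minimal sketch of the lambda-based style the description refers to. It is not taken from the course and does not use Spark: it relies only on the standard java.util.stream API so it can run without any dependencies. The class name WordCount, the helper countWords and the sample sentences are all hypothetical, chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative only: plain Java streams, not the Spark API.
class WordCount {

    // Hypothetical helper: splits each line into lowercase words
    // and counts how often each word occurs across all lines.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // one line becomes many words
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                // drop empty tokens produced by leading whitespace
                .filter(word -> !word.isEmpty())
                // group identical words and count them; TreeMap keeps keys sorted
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample input
        List<String> lines = List.of(
                "spark makes big data simple",
                "big data needs big clusters");
        countWords(lines).forEach((word, count) -> System.out.println(word + " = " + count));
    }
}
```

A Spark job written against the Java RDD API typically has a similar shape, with operations such as flatMap, mapToPair and reduceByKey taking lambdas in the same way, but the work is distributed across a cluster rather than run in a single JVM.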
What You Will Learn
- Use functional-style Java to define complex data processing jobs.
- Learn the differences between the RDD and DataFrame APIs.
- Use an SQL-style syntax to produce reports against Big Data sets.
- Use Machine Learning algorithms with Big Data and SparkML.
- Connect Spark to Apache Kafka to process streams of Big Data.
- See how Structured Streaming can be used to build pipelines with Kafka.