When you enroll through our links, we may earn a small commission—at no extra cost to you. This helps keep our platform free and inspires us to add more value.

Apache Spark 3 for Data Engineering & Analytics with Python

Learn how to use Python and PySpark 3.0.1 for Data Engineering / Analytics (Databricks) - Beginner to Ninja

₹589

This Course Includes

  • Udemy
  • 723 reviews
  • 8h 39m
  • English
  • Online - Self Paced
  • Professional Certificate

About Apache Spark 3 for Data Engineering & Analytics with Python

The key objectives of this course are as follows:

Learn the Spark Architecture

Learn Spark Execution Concepts

Learn Spark Transformations and Actions using the Structured API

Learn Spark Transformations and Actions using the RDD (Resilient Distributed Datasets) API

Learn how to set up your own local PySpark Environment

Learn how to interpret the Spark Web UI

Learn how to interpret the DAG (Directed Acyclic Graph) of a Spark Execution
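
To give a sense of how little the local setup involves, here is a minimal sketch of creating a local PySpark session (assuming PySpark has been installed with pip; the version pin below simply mirrors the course's 3.0.1):

```python
# Install first, e.g.: pip install pyspark==3.0.1
from pyspark.sql import SparkSession

# "local[*]" runs Spark on this machine using all available cores.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("LocalPySparkDemo")
         .getOrCreate())

print(spark.version)                 # confirm the installed Spark version
print(spark.sparkContext.uiWebUrl)   # Spark Web UI, usually http://localhost:4040

spark.stop()
```

The `uiWebUrl` line is also the quickest way to find the Spark Web UI mentioned above.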

Learn the RDD (Resilient Distributed Datasets) API (Crash Course)

RDD Transformations

RDD Actions
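
As a flavour of the crash course, a minimal RDD sketch (the numbers and lambdas are illustrative, not the course's own exercises):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("RDDCrashCourse").getOrCreate()
sc = spark.sparkContext

numbers = sc.parallelize([1, 2, 3, 4, 5, 6])

# Transformations are lazy; nothing executes yet.
squares = numbers.map(lambda n: n * n)
evens = squares.filter(lambda n: n % 2 == 0)

# Actions trigger execution and return results to the driver.
print(evens.collect())   # [4, 16, 36]
print(squares.count())   # 6

spark.stop()
```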

Learn the Spark DataFrame API (Structured APIs)

Create Schemas and Assign DataTypes

Read and Write Data using the DataFrame Reader and Writer

Read Semi-Structured Data such as JSON

Create New Data Columns in the DataFrame using Expressions

Filter the DataFrame using the "Filter" and "Where" Transformations

Ensure that the DataFrame has unique rows

Detect and Drop Duplicates

Augment the DataFrame by Adding New Rows

Combine 2 or More DataFrames

Order the DataFrame by Specific Columns

Rename and Drop Columns from the DataFrame

Clean the DataFrame by Detecting and Removing Missing or Bad Data

Create User-Defined Spark Functions

Read and Write to/from Parquet File

Partition the DataFrame and Write to Parquet File

Aggregate the DataFrame using Spark SQL functions (count, countDistinct, max, min, sum, sumDistinct, avg)

Perform Aggregations with Grouping
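
To give a feel for these DataFrame objectives in one place, here is a sketch that strings several of them together; the file path, column names, and the tax expression are placeholders, not the course's dataset:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.master("local[*]").appName("DataFrameDemo").getOrCreate()

# Explicit schema with assigned data types (instead of inference).
schema = StructType([
    StructField("order_id", StringType(), True),
    StructField("region", StringType(), True),
    StructField("amount", DoubleType(), True),
])

# "sales.csv" is a placeholder path for illustration.
df = spark.read.csv("sales.csv", header=True, schema=schema)

# New column from an expression, a where filter, de-duplication, a rename.
df = (df.withColumn("amount_with_tax", F.expr("amount * 1.2"))
        .where(F.col("amount") > 0)
        .dropDuplicates(["order_id"])
        .withColumnRenamed("region", "sales_region"))

# A user-defined function for logic the built-in functions don't cover.
label = F.udf(lambda a: "big" if a >= 1000 else "small", StringType())
df = df.withColumn("size_label", label(F.col("amount")))

# Aggregation with grouping, then a partitioned Parquet write.
summary = df.groupBy("sales_region").agg(
    F.count("order_id").alias("orders"),
    F.sum("amount").alias("total_sales"),
)
summary.show()
df.write.mode("overwrite").partitionBy("sales_region").parquet("out/sales_parquet")
```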

Learn Spark SQL and Databricks

Create a Databricks Account

Create a Databricks Cluster

Create Databricks SQL and Python Notebooks

Learn Databricks shortcuts

Create Databases and Tables using Spark SQL

Use DML, DQL, and DDL with Spark SQL

Use Spark SQL Functions

Learn the differences between Managed and Unmanaged Tables

Read CSV Files from the Databricks File System

Learn to write Complex SQL

Create Visualisations with Databricks

Create a Databricks Dashboard
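
As a taste of the Spark SQL material, the sketch below runs DDL, DML, and DQL through spark.sql(); the demo database and table are invented for illustration, and on Databricks you would run the same statements in a SQL notebook cell:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("SparkSQLDemo").getOrCreate()

# DDL: a database and a managed table (Spark owns the data and its location;
# an unmanaged/external table would add an explicit LOCATION clause).
spark.sql("CREATE DATABASE IF NOT EXISTS demo")
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.sales (
        order_id STRING,
        region   STRING,
        amount   DOUBLE
    ) USING PARQUET
""")

# DML: insert a couple of rows.
spark.sql("INSERT INTO demo.sales VALUES ('1001', 'EMEA', 250.0), ('1002', 'APAC', 900.0)")

# DQL: query with Spark SQL functions, grouping, and ordering.
spark.sql("""
    SELECT region, COUNT(*) AS orders, SUM(amount) AS total_sales
    FROM demo.sales
    GROUP BY region
    ORDER BY total_sales DESC
""").show()
```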

The Python Spark projects that we are going to do together:

Sales Data

Create a Spark Session

Read a CSV file into a Spark DataFrame

Learn to Infer a Schema

Select data from the Spark DataFrame

Produce analytics that show the top sales orders per Region and Country
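
A minimal sketch of how this project's steps might look in code; "sales_data.csv" and the column names are placeholders for the course's dataset, and the ranking window is one common way to get the top order per group:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.master("local[*]").appName("SalesData").getOrCreate()

# Infer the schema from the file (convenient, though slower and less strict
# than declaring one explicitly).
sales = spark.read.csv("sales_data.csv", header=True, inferSchema=True)
sales.printSchema()

# Top sales order per Region and Country, via a ranking window.
w = Window.partitionBy("Region", "Country").orderBy(F.desc("OrderValue"))
top_orders = (sales.withColumn("rn", F.row_number().over(w))
                   .where(F.col("rn") == 1)
                   .select("Region", "Country", "OrderID", "OrderValue"))
top_orders.show()
```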

Convert Fahrenheit to Degrees Centigrade

Create a Spark Session

Read and parallelize data into an RDD using the SparkContext

Create a Function to Convert Fahrenheit to Degrees Centigrade

Use the Map Function to convert data contained within an RDD

Filter temperatures greater than or equal to 13 degrees Celsius
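
A compact sketch of this exercise, with sample readings invented for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("TempConversion").getOrCreate()
sc = spark.sparkContext

def fahrenheit_to_celsius(f):
    return (f - 32) * 5.0 / 9.0

# Parallelize sample readings into an RDD.
temps_f = sc.parallelize([32, 55, 59.9, 68, 77, 212])

temps_c = temps_f.map(fahrenheit_to_celsius)   # convert every element
warm = temps_c.filter(lambda c: c >= 13)       # keep readings >= 13 °C

print(warm.collect())   # [15.5, 20.0, 25.0, 100.0]
spark.stop()
```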

XYZ Research

Create a set of RDDs that hold Research Data

Use the union transformation to combine RDDs

Learn to use the subtract transformation to remove values from an RDD

Use the RDD API to answer the following questions:

How many research projects were initiated in the first three years?

How many projects were completed in the first year?

How many projects were completed in the first two years?
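
A sketch of the union/subtract pattern this project relies on; the project codes are invented here, as the course supplies its own research data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("XYZResearch").getOrCreate()
sc = spark.sparkContext

# Hypothetical project codes per year.
initiated_y1 = sc.parallelize(["P1", "P2", "P3"])
initiated_y2 = sc.parallelize(["P4", "P5"])
initiated_y3 = sc.parallelize(["P6"])
completed_y1 = sc.parallelize(["P1"])
completed_y2 = sc.parallelize(["P2", "P4"])

# union combines RDDs; distinct guards against duplicated codes.
first_three_years = initiated_y1.union(initiated_y2).union(initiated_y3).distinct()
print(first_three_years.count())      # initiated in years 1-3 -> 6

print(completed_y1.count())           # completed in year 1 -> 1

first_two_completed = completed_y1.union(completed_y2)
print(first_two_completed.count())    # completed in years 1-2 -> 3

# subtract removes the completed projects, leaving those still open.
still_open = first_three_years.subtract(first_two_completed)
print(sorted(still_open.collect()))   # ['P3', 'P5', 'P6']
spark.stop()
```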

Sales Analytics

Create the Sales Analytics DataFrame from a set of CSV Files

Prepare the DataFrame by applying a Structure

Remove bad records from the DataFrame (Cleaning)

Generate New Columns from the DataFrame

Write a Partitioned DataFrame to a Parquet Directory

Answer the following questions and create visualizations using Seaborn and Matplotlib:

What was the best month in sales?

What city sold the most products?

What time should the business display advertisements to maximize the likelihood of customers buying products?

What products are often sold together in the state "NY"?
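
To preview the shape of this project, here is a condensed sketch covering the cleaning, the new columns, the partitioned Parquet write, and the first question; the paths and column names are placeholders for the course's files:

```python
from pyspark.sql import SparkSession, functions as F
import matplotlib.pyplot as plt
import seaborn as sns

spark = SparkSession.builder.master("local[*]").appName("SalesAnalytics").getOrCreate()

# Read a whole directory of CSV files at once.
sales = spark.read.csv("sales_csvs/", header=True, inferSchema=True)

# Clean: drop rows with missing fields or a non-numeric price.
sales = sales.dropna().where(F.col("Price").cast("double").isNotNull())

# Derive new columns, then persist as Parquet partitioned by month.
sales = (sales.withColumn("OrderDate", F.to_timestamp("OrderDate"))
              .withColumn("Month", F.month("OrderDate"))
              .withColumn("Revenue", F.col("Quantity") * F.col("Price")))
sales.write.mode("overwrite").partitionBy("Month").parquet("out/sales")

# "What was the best month in sales?" -> aggregate, then plot with Seaborn.
monthly = (sales.groupBy("Month").agg(F.sum("Revenue").alias("TotalRevenue"))
                .orderBy("Month").toPandas())
sns.barplot(data=monthly, x="Month", y="TotalRevenue")
plt.title("Revenue by Month")
plt.show()
```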

Technology Spec

1. Python
2. Jupyter Notebook
3. Jupyter Lab
4. PySpark (Spark with Python)
5. Pandas
6. Matplotlib
7. Seaborn
8. Databricks
9. SQL

What You Will Learn?

  • Learn the Spark Architecture
  • Learn Spark Execution Concepts
  • Learn Spark Transformations and Actions using the Structured API
  • Learn Spark Transformations and Actions using the RDD (Resilient Distributed Datasets) API
  • Learn how to set up your own local PySpark Environment
  • Learn how to interpret the Spark Web UI
  • Learn how to interpret the DAG (Directed Acyclic Graph) of a Spark Execution
  • Learn the RDD (Resilient Distributed Datasets) API (Crash Course)
  • Learn the Spark DataFrame API (Structured APIs)
  • Learn Spark SQL
  • Learn Spark on Databricks
  • Learn to Visualize Data (Graphs and Dashboards) on Databricks