When you enroll through our links, we may earn a small commission—at no extra cost to you. This helps keep our platform free and inspires us to add more value.

Udemy logo

Sqoop, Hive and Impala for Data Analysts (Formerly CCA 159)

Hands on Sqoop, Hive and Impala for Data Analysts

     
  • 4.5
  •  |
  • Reviews ( 714 )
₹519

This Course Includes

  • iconudemy
  • icon4.5 (714 reviews )
  • icon20h 31m
  • iconenglish
  • iconOnline - Self Paced
  • iconprofessional certificate
  • iconUdemy

About Sqoop, Hive and Impala for Data Analysts (Formerly CCA 159)

As part of

Sqoop, Hive, and Impala for Data Analysts (Formerly CCA 159)

, you will learn key skills such as Sqoop, Hive, and Impala. This comprehensive course covers all aspects of the certification with real-world examples and data sets.

Overview of Big Data ecosystem

Overview Of Distributions and Management Tools

Properties and Properties Files - General Guidelines

Hadoop Distributed File System

YARN and Map Reduce2

Submitting Map ReduceJob

Determining Number of Mappers and Reducers

Understanding YARN and Map Reduce Configuration Properties

Review and Override Job Properties

Reviewing Map Reduce Job Logs

Map Reduce Job Counters

Overview of Hive

Databases and Query Engines

Overview of Data Ingestion in Big Data

Data Processing using Spark

HDFS Commands to manage files

Introduction to HDFS for Certification Exams

Overview of HDFS and PropertiesFiles

Overview of Hadoop CLI

Listing Files in HDFS

User Spaces or Home Directories in HDFS

Creating Directories in HDFS

Copying Files and Directories into HDFS

File and Directory Permissions Overview

Getting Files and Directories from HDFS

Previewing Text Files in HDFS

Copying or Moving Files and Directories within HDFS

Understanding Size of File System and Files

Overview of Block Size and ReplicationFactor

Getting File Metadata using hdfs fsck

Resources and Exercises

Getting Started with Hive

Overview of Hive Language Manual

Launching and using Hive CLI

Overview of Hive Properties

Hive CLI History and hiverc

Running HDFS Commands in Hive CLI

Understanding Warehouse Directory

Creating and Using Hive Databases

Creating and Describing Hive Tables

Retrieve Matadata of Tables using DESCRIBE

Role of Hive Metastore Database

Overview of beeline

Running Hive Commands and Queries using beeline

Creating Tables in Hive using Hive QL

Creating Tables in Hive - orders

Overview of Basic Data Types in Hive

Adding Comments to Columns and Tables

Loading Data into Hive Tables from Local File System

Loading Data into Hive Tables from HDFS

Loading Data - Overwrite vs Append

Creating External tables in Hive

Specifying Location for Hive Tables

Difference between Managed Table and External Table

Default Delimiters in Hive Tables using Text File

Overview of File Formats in Hive

Differences between Hive and RDBMS

Truncate and Drop tables in Hive

Resources and Exercises

Loading/Inserting data into Hive tables using Hive QL

Introduction to Partitioning and Bucketing

Creating Tables using Orc Format - order_items

Inserting Data into Tables using Stage Tables

Load vs. Insert in Hive

Creating Partitioned Tables in Hive

Adding Partitions to Tables in Hive

Loading into Partitions in Hive Tables

Inserting Data Into Partitions in Hive Tables

Insert Using Dynamic Partition Mode

Creating Bucketed Tables in Hive

Inserting Data into Bucketed Tables

Bucketing with Sorting

Overview of ACID Transactions

Create Tables for Transactions

Inserting Individual Records into Hive Tables

Update and Delete Data in Hive Tables

Overview of functions in Hive

Overview of Functions

Validating Functions

String Manipulation - Case Conversion and Length

String Manipulation - substr and split

String Manipulation - Trimming and Padding Functions

String Manipulation - Reverse and Concatenating Multiple Strings

Date Manipulation - Current Date and Timestamp

Date Manipulation - Date Arithmetic

Date Manipulation - trunc

Date Manipulation - Using date format

Date Manipulation - Extract Functions

Date Manipulation - Dealing with Unix Timestamp

Overview of Numeric Functions

Data Type Conversion Using Cast

Handling Null Values

Query Example - Get Word Count

Writing Basic Queries in Hive

Overview of SQL or Hive QL

Execution Life Cycle of Hive Query

Reviewing Logs of Hive Queries

Projecting Data using Select and Overview of From

Derive Conditional Values using CASE and WHEN

Projecting Distinct Values

Filtering Data using Where Clause

Boolean Operations in Where Clause

Boolean OR vs IN Operator

Filtering Data using LIKE Operator

Performing Basic Aggregations using Aggregate Functions

Performing Aggregations using GROUP BY

Filtering Aggregated Data Using HAVING

Global Sorting using ORDER BY

Overview of DISTRIBUTE BY

Sorting Data within Groups using SORT BY

Using CLUSTERED BY

Joining Data Sets and Set Operations in Hive

Overview of Nested Sub Queries

Nested Sub Queries - Using IN Operator

Nested Sub Queries - Using EXISTS Operator

Overview of Joins in Hive

Performing Inner Joins using Hive

Performing Outer Joins using Hive

Performing Full Outer Joins using Hive

Map Side Join and Reduce Side Join in Hive

Joining in Hive using Legacy Syntax

Cross Joins in Hive

Overview of Set Operations in Hive

Perform Set Union between two Hive Query Results

Set Operations - Intersect and Minus Not Supported

Windowing or Analytics Functions in Hive

Prepare HR Database in Hive with Employees Table

Overview of Analytics or Windowing Functions in Hive

Performing Aggregations using Hive Queries

Create Tables to Get Daily Revenue using CTAS in Hive

Getting Lead and Lag using Windowing Functions in Hive

Getting First and Last Values using Windowing Functions in Hive

Applying Rank using Windowing Functions in Hive

Applying Dense Rank using Windowing Functions in Hive

Applying Row Number using Windowing Functions in Hive

Difference Between rank, dense_rank, and row_number in Hive

Understanding the order of execution of Hive Queries

Overview of Nested Sub Queries in Hive

Filtering Data on Top of Window Functions in Hive

Getting Top 5 Products by Revenue for Each Day using Windowing Functions in Hive - Recap

Running Queries using Impala

Introduction to Impala

Role of Impala Daemons

Impala State Store and Catalog Server

Overview of Impala Shell

Relationship between Hive and Impala

Overview of Creating Databases and Tables using Impala

Loading and Inserting Data into Tables using Impala

Running Queries using Impala Shell

Reviewing Logs of Impala Queries

Synching Hive and Impala - Using Invalidate Metadata

Running Scripts using Impala Shell

Assignment - Using NYSE Data

Assignment - Solution

Getting Started with Sqoop

Introduction to Sqoop

Validate Source Database - MySQL

Review JDBC Jar to Connect to MySQL

Getting Help using Sqoop CLI

Overview of Sqoop User Guide

Validate Sqoop and MySQL Integration using Sqoop List Databases

Listing Tables in Database using Sqoop

Run Queries in MySQL using Sqoop Eval

Understanding Logs in Sqoop

Redirecting Sqoop Job Logs into Log Files

Importing data from MySQL to HDFS using Sqoop Import

Overview of Sqoop Import Command

Import Orders using target-dir

Import Order Items using warehouse-dir

Managing HDFS Directories

Sqoop Import Execution Flow

Reviewing Logs of Sqoop Import

Sqoop Import Specifying Number of Mappers

Review the Output Files generated by Sqoop Import

Sqoop Import Supported File Formats

Validating avro files using Avro Tools

Sqoop Import Using Compression

Apache Sqoop - Importing Data into HDFS - Customizing

Introduction to customizing Sqoop Import

Sqoop Import by Specifying Columns

Sqoop import Using Boundary Query

Sqoop import while filtering Unnecessary Data

Sqoop Import Using Split By to distribute import using non default column

Getting Query Results using Sqoop eval

Dealing with tables with Composite Keys while using Sqoop Import

Dealing with tables with Non Numeric Key Fields while using Sqoop Import

Dealing with tables with No Key Fields while using Sqoop Import

Using autoreset-to-one-mapper to use only one mapper while importing data using Sqoop from tables with no key fields

Default Delimiters used by Sqoop Import for Text File Format

Specifying Delimiters for Sqoop Import using Text File Format

Dealing with Null Values using Sqoop Import

Import Mulitple Tables from source database using Sqoop Import

Importing data from MySQL to Hive Tables using Sqoop Import

Quick Overview of Hive

Create Hive Database for Sqoop Import

Create Empty Hive Table for Sqoop Import

Import Data into Hive Table from source database table using Sqoop Import

Managing Hive Tables while importing data using Sqoop Import using Overwrite

Managing Hive Tables while importing data using Sqoop Import - Errors Out If Table Already Exists

Understanding Execution Flow of Sqoop Import into Hive tables

Review Files generated by Sqoop Import in Hive Tables

Sqoop Delimiters vs Hive Delimiters

Different File Formats supported by Sqoop Import while importing into Hive Tables

Sqoop Import all Tables into Hive from source database

Exporting Data from HDFS/Hive to MySQL using Sqoop Export

Introduction to Sqoop Export

Prepare Data for Sqoop Export

Create Table in MySQL for Sqoop Export

Perform Simple Sqoop Export from HDFS to MySQL table

Understanding Execution Flow of Sqoop Export

Specifying Number of Mappers for Sqoop Export

Troubleshooting the Issues related to Sqoop Export

Merging or Upserting Data using Sqoop Export - Overview

Quick Overview of MySQL - Upsert using Sqoop Export

Update Data using Update Key using Sqoop Export

Merging Data using allowInsert in Sqoop Export

Specifying Columns using Sqoop Export

Specifying Delimiters using Sqoop Export

Using Stage Table for Sqoop Export

Submitting Sqoop Jobs and Incremental Sqoop Imports

Introduction to Sqoop Jobs

Adding Password File for Sqoop Jobs

Creating Sqoop Job

Run Sqoop Job

Overview of Incremental Loads using Sqoop

Incremental Sqoop Import - Using Where

Incremental Sqoop Import - Using Append Mode

Incremental Sqoop Import - Create Table

Incremental Sqoop Import - Create Sqoop Job

Incremental Sqoop Import - Execute Job

Incremental Sqoop Import - Add Additional Data

Incremental Sqoop Import - Rerun Job

Incremental Sqoop Import - Using Last Modified

Here are the objectives for this course.

Provide Structure to the Data Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.

Create tables using a variety of data types, delimiters, and file formats

Create new tables using existing tables to define the schema

Improve query performance by creating partitioned tables in the metastore

Alter tables to modify the existing schema

Create views in order to simplify queries Data Analysis Use Query Language (QL) statements in Hive and Impala to analyze data on the cluster.

Prepare reports using SELECT commands including unions and subqueries

Calculate aggregate statistics, such as sums and averages, during a query

Create queries against multiple data sources by using join commands

Transform the output format of queries by using built-in functions

Perform queries across a group of rows using windowing functions Exercises will be provided to have enough practice to get better at Sqoop as well as writing queries using Hive and Impala.

All the demos are given on our state-of-the-art Big Data cluster. If you do not have multi-node cluster, you can sign up for our labs and practice on our multi-node cluster. You will be able to practice Sqoop and Hive on the cluster.

What You Will Learn?

  • Overview of Big Data ecosystem such as Hadoop HDFS, YARN, Map Reduce, Sqoop, Hive, etc .
  • Overview of HDFS Commands such as put or copyFromLocal, get or copyToLocal, cat, etc along with concepts such as block size, replication factor, etc .
  • Managing Tables in Hive Metastore using DDL Commands .
  • Load or Insert data into Hive Metastore Tables using commands such as LOAD and INSERT .
  • Overview of Functions in Hive to manipulate strings, dates, etc .
  • Writing Basic Hive QL Queries using WHERE, JOIN, GROUP BY, etc .
  • Analytical or Windowing Functions in Hive .
  • Overview of Impala and understanding similarities and differences between Hive and Impala .
  • Getting Started with Sqoop by reviewing official documentation and also exploring commands such as Sqoop eval .
  • Importing Data from RDBMS tables into HDFS using Sqoop Import .
  • Importing Data from RDBMS tables into Hive tables using Sqoop Import .
  • Exporting Data from Hive or HDFS to RDBMS tables using Sqoop Export .
  • Incremental Imports using Sqoop Import into HDFS or Hive Tables Show moreShow less.