

Practical Guide to setup Hadoop and Spark Cluster using CDH

Step by step instructions to setup Hadoop and Spark Cluster using Cloudera Distribution of Hadoop (Formerly CCA 131)

     
  • 4.2 | Reviews (535)
₹519

This Course Includes

  • Udemy
  • 4.2 (535 reviews)
  • 20h 56m
  • English
  • Online - Self Paced
  • Professional certificate

About Practical Guide to setup Hadoop and Spark Cluster using CDH

Cloudera is one of the leading vendors of Hadoop and Spark distributions. As part of this Practical Guide, you will learn the step-by-step process of setting up a Hadoop and Spark cluster using CDH.

Install - Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.

Set up a local CDH repository

Perform OS-level configuration for Hadoop installation

Install Cloudera Manager server and agents

Install CDH using Cloudera Manager

Add a new node to an existing cluster

Add a service using Cloudera Manager
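
A local CDH repository is typically consumed through a yum `.repo` file placed on each host. The sketch below renders such a file in Python. It assumes RHEL/CentOS-style hosts, and the repository id, display name, and URL are hypothetical placeholders, not values from the course.

```python
import configparser
import io


def render_repo_file(repo_id: str, name: str, baseurl: str, gpgcheck: bool = False) -> str:
    """Render one yum .repo stanza as text (illustrative helper, not a Cloudera tool)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser[repo_id] = {
        "name": name,
        "baseurl": baseurl,
        "enabled": "1",
        "gpgcheck": "1" if gpgcheck else "0",
    }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# Hypothetical internal web server that hosts the mirrored packages.
print(render_repo_file(
    "cloudera-manager",
    "Cloudera Manager",
    "http://repo.example.internal/cm/",
))
```

The resulting text would normally be saved under `/etc/yum.repos.d/` on every node before installing the Cloudera Manager server and agents.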

Configure - Perform basic and advanced configuration needed to effectively administer a Hadoop cluster

Configure a service using Cloudera Manager

Create an HDFS user's home directory

Configure NameNode HA

Configure ResourceManager HA

Configure a proxy for HiveServer2/Impala
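
Creating an HDFS user's home directory usually comes down to two `hdfs dfs` commands run as the HDFS superuser. The sketch below only builds the command lines. The user name is hypothetical, and it assumes the superuser account is called `hdfs` and that `sudo` is available on the host.

```python
import shlex
from typing import List


def home_directory_commands(user: str, group: str = "", superuser: str = "hdfs") -> List[List[str]]:
    """Build the commands that create /user/<user> and hand ownership to that user."""
    group = group or user  # assumption: primary group is named after the user
    home = f"/user/{user}"
    prefix = ["sudo", "-u", superuser]
    return [
        prefix + ["hdfs", "dfs", "-mkdir", "-p", home],
        prefix + ["hdfs", "dfs", "-chown", f"{user}:{group}", home],
    ]


for command in home_directory_commands("alice"):
    print(shlex.join(command))
```

Keeping the commands as argument lists makes them easy to pass to `subprocess.run` on a gateway node once they have been reviewed.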

Manage - Maintain and modify the cluster to support day-to-day operations in the enterprise

Rebalance the cluster

Set up alerting for excessive disk fill

Define and install a rack topology script

Install a new type of I/O compression library in the cluster

Revise YARN resource assignment based on user feedback

Commission/decommission a node
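
A rack topology script, as listed above, receives one or more host names or IP addresses as arguments and prints one rack path per argument. Here is a minimal sketch of such a script in Python. The subnet-to-rack mapping is a made-up example, and unknown hosts fall back to `/default-rack`.

```python
import sys
from typing import Dict, List

# Hypothetical mapping from IP prefix to rack path; replace with the real layout.
RACKS: Dict[str, str] = {
    "10.0.1.": "/dc1/rack1",
    "10.0.2.": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts: List[str], racks: Dict[str, str] = RACKS) -> List[str]:
    """Return one rack path per host, in the same order as the input."""
    resolved = []
    for host in hosts:
        rack = next(
            (path for prefix, path in racks.items() if host.startswith(prefix)),
            DEFAULT_RACK,
        )
        resolved.append(rack)
    return resolved


if __name__ == "__main__":
    # Hadoop passes the hosts to resolve as command-line arguments.
    print("\n".join(resolve_racks(sys.argv[1:])))
```

In plain Hadoop the script is referenced through the `net.topology.script.file.name` property. It must be executable on the host that runs it.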

Secure - Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices

Configure HDFS ACLs

Install and configure Sentry

Configure Hue user authorization and authentication

Enable/configure log and query redaction

Create encrypted zones in HDFS
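
HDFS ACLs are set with `hdfs dfs -setfacl` and read back with `hdfs dfs -getfacl`. The sketch below assembles an ACL spec and the matching command line without running anything. The user, group, and path are hypothetical, and it assumes ACLs are enabled on the NameNode (`dfs.namenode.acls.enabled`).

```python
import re
from typing import List


def acl_entry(kind: str, name: str, perms: str, default: bool = False) -> str:
    """Build one ACL entry such as 'user:alice:r-x'."""
    if kind not in ("user", "group", "mask", "other"):
        raise ValueError(f"unsupported ACL entry type: {kind}")
    if not re.fullmatch(r"[r-][w-][x-]", perms):
        raise ValueError(f"permissions must look like 'r-x', got: {perms}")
    entry = f"{kind}:{name}:{perms}"
    return f"default:{entry}" if default else entry


def setfacl_command(path: str, entries: List[str], recursive: bool = False) -> List[str]:
    """Build an 'hdfs dfs -setfacl -m' command that modifies the given entries."""
    command = ["hdfs", "dfs", "-setfacl"]
    if recursive:
        command.append("-R")
    return command + ["-m", ",".join(entries), path]


entries = [acl_entry("user", "alice", "r-x"), acl_entry("group", "analysts", "r--")]
print(" ".join(setfacl_command("/data/sales", entries)))
```

Validating the permission string before building the command catches typos such as `rx` early, which is easier than decoding an error from the cluster.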

Test - Benchmark the cluster operational metrics, test system configuration for operation and efficiency

Execute file system commands via HTTPFS

Efficiently copy data within a cluster/between clusters

Create/restore a snapshot of an HDFS directory

Get/set ACLs for a file or directory structure

Benchmark the cluster (I/O, CPU, network)
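
File system commands over HttpFS go through the WebHDFS REST path `/webhdfs/v1`. The sketch below builds such request URLs in Python. The host name and user are hypothetical, port 14000 is assumed as the HttpFS default, and `user.name` assumes simple (non-Kerberos) authentication.

```python
from urllib.parse import quote, urlencode


def httpfs_url(host: str, path: str, op: str, user: str, port: int = 14000, **params: str) -> str:
    """Build a WebHDFS-style URL served by an HttpFS gateway."""
    query = urlencode({"op": op, "user.name": user, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# LISTSTATUS is issued as an HTTP GET; MKDIRS would be an HTTP PUT.
print(httpfs_url("gateway.example.internal", "/user/alice", "LISTSTATUS", "alice"))
```

The printed URL can be tried with any HTTP client, for example `curl`, against a running HttpFS role.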

Troubleshoot - Demonstrate ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios

Resolve errors/warnings in Cloudera Manager

Resolve performance problems/errors in cluster operation

Determine reason for application failure

Configure the Fair Scheduler to resolve application delays
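
Fair Scheduler queues are described in an allocation file in which each queue carries a weight that determines its share of the cluster. The sketch below generates a minimal allocation document in Python. The queue names and weights are invented, and on a Cloudera Manager cluster the same settings would more likely be edited through its resource pool configuration pages than by hand.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_allocations(weights: Dict[str, float]) -> str:
    """Return a minimal Fair Scheduler allocation document with one queue per entry."""
    root = ET.Element("allocations")
    for name, weight in weights.items():
        queue = ET.SubElement(root, "queue", name=name)
        ET.SubElement(queue, "weight").text = str(weight)
    return ET.tostring(root, encoding="unicode")


# Hypothetical pools: ETL jobs get three times the share of ad hoc queries.
print(build_allocations({"etl": 3.0, "adhoc": 1.0}))
```

Raising the weight of a starved queue is one common way to shorten application delays, since running applications in that queue then receive a larger fair share.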

Our Approach

You will start by creating a Cloudera QuickStart VM (provided your laptop has 16 GB of RAM and a quad-core CPU). This will help you get comfortable with Cloudera Manager.

You will be able to sign up for GCP and claim credits of up to $300 while the offer lasts. Credits are valid for up to a year.

You will then get a brief overview of GCP and provision 7 to 8 virtual machines using templates. You will also attach an external hard drive to configure for HDFS later.

Once servers are provisioned, you will go ahead and set up Ansible for Server Automation.

You will set up a local repository for Cloudera Manager and Cloudera Distribution of Hadoop using packages.

You will then set up Cloudera Manager with a custom database, and then Cloudera Distribution of Hadoop using the wizard that comes as part of Cloudera Manager.

As part of setting up Cloudera Distribution of Hadoop, you will set up HDFS, learn HDFS commands, set up YARN, configure HDFS and YARN high availability, learn about schedulers, set up Spark, transition to parcels, and set up Hive, Impala, HBase, and Kafka, among others.
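
Server automation with Ansible starts from an inventory that groups the provisioned hosts. The sketch below renders an INI-style inventory in Python. The group names and host names are hypothetical and only illustrate how a 7 to 8 node cluster might be split into roles.

```python
from typing import Dict, List


def render_inventory(groups: Dict[str, List[str]]) -> str:
    """Render an INI-style Ansible inventory with one section per host group."""
    lines: List[str] = []
    for group, hosts in groups.items():
        lines.append(f"[{group}]")
        lines.extend(hosts)
        lines.append("")  # blank line between groups
    return "\n".join(lines)


# Hypothetical host names for the provisioned virtual machines.
print(render_inventory({
    "masters": ["bigdata01.example.internal", "bigdata02.example.internal"],
    "workers": ["bigdata03.example.internal", "bigdata04.example.internal"],
}))
```

Grouping hosts this way lets prerequisite tasks target all nodes while role-specific tasks target only masters or workers.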

What You Will Learn

  • Learn Hadoop and Spark administration using CDH.
  • Provision a cluster on GCP (Google Cloud Platform) to set up a Hadoop and Spark cluster using CDH.
  • Set up Ansible for server automation to handle the prerequisites for a Hadoop and Spark cluster using CDH.
  • Set up an 8-node cluster from scratch using CDH.
  • Understand the architecture of HDFS, YARN, Spark, Hive, Hue, and many more.