Open Source Hadoop Administration (MR-1CN-OSHDADMIN)

Overview: This open source course provides participants with a comprehensive understanding of the steps necessary to install, configure, operate, and maintain Hadoop. The course begins with an overview of the Big Data landscape and then moves into a system administrator's working view of running Hadoop.

Who should attend this class:
This course is intended for system administrators, DevOps engineers, and software developers responsible for managing and maintaining Hadoop clusters.

Prerequisites: None

Course Objectives
Upon successful completion of this course, participants should be able to:
Describe the fundamental concepts of using Big Data
Identify where Hadoop fits into a Big Data strategy
Plan a Hadoop cluster
Describe HDFS features
Get data into HDFS
Work with MapReduce
Install and configure Hadoop
Perform cluster maintenance

Duration: 4 days

Course Outline
The content of this course is designed to support the course objectives. 
1. Hadoop Introduction
1.1. A Brief History of Hadoop
1.2. Core Hadoop Components
1.3. Fundamental Concepts

2. Planning Your Hadoop Cluster
2.1. General Planning Considerations
2.2. Choosing Hardware
2.3. Network Considerations
2.4. Configuring Nodes
2.5. Planning for Cluster Management

3. HDFS
3.1. HDFS Features
3.2. Writing and Reading Files
3.3. NameNode Considerations
3.4. HDFS Security
3.5. NameNode Web UI
3.6. Hadoop File Shell

4. Getting Data into HDFS
4.1. Pulling Data from External Sources with Flume
4.2. Importing Data from Relational Databases with Sqoop
4.3. REST Interfaces
4.4. Best Practices

5. MapReduce
5.1. MapReduce Overview
5.2. Features of MapReduce
5.3. Architectural Overview
5.4. YARN MapReduce Version 2
5.5. Failure Recovery
5.6. The JobTracker Web UI

6. Hadoop Installation & Initial Configuration
6.1. Configuration & Deployment Types
6.2. Installing Hadoop
6.3. Specifying the Hadoop Configuration
6.4. Initial HDFS & MapReduce Configuration
6.5. Log Files

7. Installing/Configuring Hive, Impala, and Pig
7.1. Hive
7.2. Impala
7.3. Pig

8. Hadoop Clients
8.1. What is a Hadoop Client?
8.2. Installing and Configuring Hadoop Clients
8.3. Installing and Configuring Hue
8.4. Hue Authentication and Configuration

9. Advanced Cluster Configuration
9.1. Advanced Configuration Parameters
9.2. Configuring Hadoop Ports
9.3. Explicitly Including and Excluding Hosts
9.4. Configuring HDFS for Rack Awareness & HDFS High Availability

10. Hadoop Security
10.1. Why Hadoop Security Is Important
10.2. Hadoop's Security System Concepts
10.3. What Kerberos Is and How It Works
10.4. Securing a Hadoop Cluster with Kerberos

11. Managing and Scheduling Jobs
11.1. Managing Running Jobs
11.2. Scheduling Hadoop Jobs
11.3. Configuring the FairScheduler

12. Cluster Maintenance
12.1. Checking HDFS Status
12.2. Copying Data Between Clusters
12.3. Adding/Removing Cluster Nodes
12.4. Rebalancing the Cluster
12.5. NameNode Metadata Backup
12.6. Cluster Upgrades

13. Cluster Monitoring and Troubleshooting
13.1. General System Monitoring
13.2. Managing Hadoop's Log Files
13.3. Monitoring the Clusters
13.4. Common Troubleshooting Issues