ETL Offload in Hadoop for Data Warehouse Optimization (ETLOFFLOAD)

This onDemand course teaches students how to optimize a data warehouse by offloading data and processing to Hadoop. The course is designed for ETL developers who want to move into the world of Big Data.
 
Objectives
After successfully completing this course, participants should be able to:
  • Review the data warehouse optimization process
  • Identify and offload unused data to Hadoop
  • Identify and offload complex ETL processing to Hadoop
  • Overcome Hadoop-related challenges
 
Target Audience
Experienced PowerCenter and new ETL/Hadoop developers
 
Prerequisites
  • Introduction to Informatica Developer Tool
  • Introduction to Hadoop
Agenda
1. Data Warehouse Optimization Process
  • Identify challenges with traditional data warehouses
  • Review requirements for an optimal data warehouse
  • Optimize the data warehouse with Hadoop
2. ETL Offloading to Hadoop
  • Identify unused data and complex ETL processes to offload to Hadoop
  • Use Informatica tools to perform the offload
  • Overcome common challenges with running mappings on Hadoop
3. Performance Tuning
  • Perform tuning at the following levels:
      • Mapping level tuning
      • Data Integration Server level tuning
      • Hadoop cluster level tuning
4. Case Study: Selective Partition Rebuild in Hive