Big Data for Developers (BDD) (10.2)

This course applies to users of Big Data Management (BDM) version 10.2.1 and later. Learn to accelerate Big Data integration through mass ingestion, transformations, and processing of complex files. Optimize Big Data system performance through monitoring, troubleshooting, and best practices.
After successfully completing this course, students should be able to:
  • Mass ingest data to Hive and HDFS
  • Integrate with relational databases using SQOOP
  • Perform transformations across various engines
  • Perform initial and incremental loads
  • Process complex files
  • Monitor logs and troubleshoot
  • Tune the performance of Spark and Blaze jobs
Target Audience
Informatica Developer Tool for Big Data Developers (Instructor-Led or onDemand)

Module 0: Big Data Integration
  • Course Introduction
  • Course Agenda
  • Accessing the lab environment
  • Related Courses
Module 1: Big Data Management Basics
  • BDM concepts
  • BDM – features and benefits
  • BDM architecture
  • BDM Developer tasks
Module 2: Ingestion and Extraction
  • Application Services of BDM 10.2.1
  • Metadata Access Service
  • Mass Ingestion Service
  • Hadoop file systems
  • Mass Ingestion architecture
  • Mass Ingestion process
  • Mass Ingestion tool user interface
  • Mass ingestion to HDFS
  • Mass ingestion to Hive
  • Integrating with relational databases using SQOOP
  • SQOOP architecture
  • SQOOP optimizations
  • SQOOP for Teradata
Module 3: Big Data Engine Strategy
  • BDM engine strategy
  • Hive
  • Hive Engine architecture
  • MapReduce
  • Tez
  • Spark architecture
  • Spark support in 10.2.1
  • Blaze architecture
  • Steps to choosing the right engine
  • Transformations in Hadoop
  • Expression Transformation
  • Filter Transformation
  • Lookup Transformation
  • Python Transformation
  • Router Transformation
  • Union Transformation
  • Update Strategy Transformation
Module 4: Big Data Development Process
  • Initial load
  • Incremental loads
  • Dynamic mapping ingestion
  • Stateful computing and windowing
  • Data science integration using Python
Module 5: Complex File Processing
  • Big Data file formats
  • Complex file data objects
  • Processing complex file types
  • Arrays
  • Structs
  • Maps
  • Intelligent structure discovery
Module 6: Monitoring and Troubleshooting
  • Spark Monitoring
  • Blaze Monitoring
  • Viewing logs
  • Troubleshooting
Module 7: Performance Tuning and Best Practices
  • Differentiate native vs. Hadoop modes of execution
  • Tuning performance of Spark jobs
  • Tuning performance of Blaze jobs
  • Best practices for different engines