Big Data (TDAI006)

Learn how big data is driving organizational change, and get to know the essential analytical tools and techniques behind it. Understand what big data is and how it will affect your business, and explore the tools and systems used by big data scientists and engineers.

Chapter 1. Defining Big Data

In-Class Discussion
Gartner's Definition of Big Data
More Definitions of Big Data
Transforming Data into Business Information
Challenges Posed by Big Data
Processing Big Data
Apache Hadoop
The Cloud and Big Data
The CAP Theorem
Summary

Chapter 2. Hadoop Overview

The Client-Server Processing Pattern
Apache Hadoop
Apache Hadoop Logo
Typical Hadoop Applications
Hadoop Clusters
Hadoop Distributions
Hadoop's Main Components
HDFS
HDFS Blocks
YARN
Hadoop-based Systems for Data Analysis
MapReduce
Similarity with SQL Aggregation Operations
Distributed Computing Economics
Discussion: Divide and Conquer
Apache Pig
Pig Latin
Running Pig
Pig Latin Script Example
What is Hive?
Hive's Value Proposition
Who uses Hive?
What Hive Does Not Have
HiveQL
Working with Hive Tables
What is HBase?
HBase vs RDBMS
Interfacing with HBase
HBase Table Design Digest
A Cell's Value Versioning
Creating and Populating a Table in HBase Shell
Getting a Cell's Value
Counting Rows in an HBase Table
Summary

Chapter 3. Big Data Analytics in the Cloud

Data is King
Big Data Stores in the Cloud
Example: AWS Simple Storage Service (S3)
MapReduce (and Hadoop) in the Cloud
Information and Data Security
Data-at-rest Security Examples
Example of Object Encryption in S3
One S3 Use Case: Backup and Archiving
Data Analytics Services in the Cloud
Analytics Services with AWS
AWS EMR: Software Configuration Screen
AWS EMR: Hardware Configuration Screen
Big Data Analytics Solutions from Google Cloud
Google Data Processing and Analytics Pipelines
Google BigQuery
Machine Learning
Microsoft Azure ML Studio
Machine Learning Pipeline
Summary

Chapter 4. Techniques for Making Big Data Small

What is Data Science?
Data Science, Machine Learning, AI?
Making Big Data Small
Descriptive Statistics
Correlation
Reducing the Number of Data Attributes
Lasso Regularization
Sampling Examples
Data Compression
Summary

Chapter 5. Introduction to Apache Spark

What is Apache Spark?
Where to Get Spark?
The Spark Platform
Spark Logo
Common Spark Use Cases
Running Spark on a Cluster
The Driver Process
Spark Shell
Interfaces with Data Storage Systems
Limitations of Hadoop's MapReduce
Spark vs MapReduce
The Resilient Distributed Dataset (RDD)
Spark Streaming (Micro-batching)
Spark SQL
Example of Spark SQL
Spark Machine Learning Library
Example: Using Random Forests with Spark MLlib
The Output (the “Confusion” matrix)
Dumping the Trained Model
Clustering
Finding Centroids Example
Using the KMeans Module with Spark MLlib
Printing the Centroids
GraphX

Schedule

* The price shown does not include VAT, which will be applied on the invoice.

2 days

EUR 1.400,00