Overview

The Big Data Hadoop Developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark, together with Data Science.

Learning Outcomes

  • Understand the different components of the Hadoop ecosystem, such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
  • Gain knowledge of the Hadoop Distributed File System (HDFS) and YARN, including their architecture, and learn how to work with them for storage and resource management
  • Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
  • Get an overview of Sqoop and Flume and describe how to ingest data using them
  • Understand different file formats, Avro schemas, using Avro with Hive and Sqoop, and schema evolution
  • Get to know HBase, its architecture and data storage, and learn how to work with it; also understand the differences between HBase and an RDBMS
  • Gain a working knowledge of Pig and its components
  • Do functional programming in Spark (see the first sketch after this list)
  • Understand Resilient Distributed Datasets (RDDs) in detail
  • Implement and build Spark applications
  • Understand the common use cases of Spark and the various interactive algorithms
  • Learn Spark SQL: creating, transforming, and querying DataFrames (see the second sketch after this list)
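
The functional-programming and RDD bullets above can be previewed with a minimal PySpark word-count sketch. The application name, HDFS path, and data are hypothetical placeholders, not part of the course material.

    from pyspark.sql import SparkSession

    # Build a SparkSession; the application name is an arbitrary placeholder.
    spark = SparkSession.builder.appName("rdd-wordcount-sketch").getOrCreate()

    # Load text from a hypothetical HDFS path into an RDD.
    lines = spark.sparkContext.textFile("hdfs:///data/sample.txt")

    # Chain functional transformations: split lines into words, pair each
    # word with 1, then sum the counts per word. Transformations are lazy.
    counts = (lines
              .flatMap(lambda line: line.split())
              .map(lambda word: (word, 1))
              .reduceByKey(lambda a, b: a + b))

    # take() is an action; it triggers the actual distributed computation.
    print(counts.take(10))
    spark.stop()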

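Similarly, the Spark SQL bullet can be previewed with a minimal sketch that builds a DataFrame from in-memory rows, transforms it with the DataFrame API, and queries it with SQL. The column names and values are illustrative only.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dataframe-sketch").getOrCreate()

    # Create a small DataFrame from in-memory rows (illustrative data).
    df = spark.createDataFrame(
        [("alice", "NY", 34), ("bob", "SF", 29), ("carol", "NY", 41)],
        ["name", "city", "age"],
    )

    # Transform and query with the DataFrame API ...
    df.filter(F.col("age") > 30).groupBy("city").count().show()

    # ... or register a temporary view and use plain SQL.
    df.createOrReplaceTempView("people")
    spark.sql("SELECT city, AVG(age) AS avg_age FROM people GROUP BY city").show()

    spark.stop()
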
Duration: 4-day workshop + post-workshop support

Modules

  • Introduction
  • Introduction to Big data and Hadoop Ecosystem
  • HDFS and YARN
  • MapReduce and Sqoop
  • Basics of Hive and Impala
  • Types of Data Formats
  • Advanced Hive Concept and Data File Partitioning
  • Apache Flume and HBase
  • Pig/Tableau & QlikView
  • Basics of Apache Spark
  • RDDs in Spark
  • Implementation of Spark Applications
  • Spark Parallel Processing
  • Spark RDD Optimization Techniques
  • Spark Algorithm

Deliverables

  • 2 days of instructor-led classroom training delivered by a certified senior-level trainer
  • Course materials (soft copy) and practice exercises
  • Big Data course completion certification