Big Data Training

Syllabus

Introduction

  • Introduction to Big data
  • Hadoop ecosystem
  • Introduction to Sqoop, Pig, Hive and Impala
  • Features of each ecosystem tool
  • Data extraction, storage, analysis and mining using Sqoop, Pig, Hive and Impala
  • How to load data from relational databases and other sources
  • How Pig, Hive and Impala improve productivity for typical analysis tasks
  • How to determine which tool is the best choice for a given task
  • How R can be used for statistical analytics

HDFS

  • Introduction to HDFS
  • Differences between Hadoop 1.0 and 2.0
  • NameNode
  • DataNode
  • Standby NameNode
  • JobTracker
  • TaskTracker
  • Storage mechanism
  • Efficiency of HDFS

Course Duration: 3 Days

Sqoop

  • Sqoop introduction
  • Why Sqoop
  • Sqoop Basic architecture
  • Supported Databases
  • Connecting to RDBMS
  • Basic Syntax
  • Import Process
  • Export Process
  • Hands-on exercises

Hive

  • Hive Introduction
  • What Hive is not
  • RDBMS vs. Hive
  • Accessing Hive
  • Data units, data types
  • Hive Metastore
  • Hive file formats
  • Creating and using a database in Hive
  • Internal table creation and loading in Hive
  • External table creation and loading in Hive
  • Partitioned table creation and loading in Hive
  • Complex data types in Hive
  • Altering tables and databases
  • Queries and subqueries
  • Joins
  • Built-in functions
  • Grouping and aggregation
  • Storing query results
  • Text processing (n-grams, bigrams, histograms)
  • UDFs in Hive

Apache Pig

  • Pig introduction and overview
  • Interacting with Pig
  • Execution modes
  • Pig Latin grammar
  • Complex data types in Pig
  • Data loading
  • STORE and DUMP in Pig
  • Filtering data
  • Grouping data
  • Flattening data
  • Built-in functions
  • Nested FOREACH
  • Joins and COGROUP in Pig
  • Parameters in Pig
  • Macros in Pig
  • UDFs in Pig
  • Sampling
  • ILLUSTRATE

Frequently Asked Questions

What is Hadoop?

Hadoop is an open-source big data framework written in Java to handle the explosion of data. It is not a database: its storage layer, HDFS, is a distributed file system that stores everything in the form of files. In most systems the data is moved to wherever the processing logic runs, but in Hadoop the logic is sent to the nodes where the data is already present, because transferring such large amounts of data is expensive and inefficient.

Why Hadoop?

In the world we live in, the amount of data generated per day is measured in petabytes. We cannot simply discard this data, because important business decisions have to be based on data collected over a period of time. To handle such huge amounts of data we have an open-source technology called Hadoop, and to process that data we have Hadoop ecosystem tools such as Pig, Hive, Sqoop, Impala and many more.

What do we do at DreamsPlus for Hadoop?

DreamsPlus, the best institute in Chennai for Hadoop, explains the need for Hadoop, the types of analysis that can be done with it, and how to work with it. Candidates are given real-time datasets on which they gain real hands-on experience with Hadoop tools. Concepts are explained clearly, and practical classes are conducted in parallel so that candidates get a clear understanding of each concept.

Who is Hadoop suitable for?

Hadoop is suitable for professionals and candidates who have basic knowledge of SQL and the UNIX environment. Candidates who are new to SQL or UNIX will be given an introduction and taught the basic commands and functions.

Whom do we train?

Professionals as well as students with basic database knowledge.

Job opportunities in Hadoop

Any leading company that aims to increase its profits through data-driven decisions has openings for people skilled in big data technologies. As a big data technology, Hadoop therefore has a very bright future, and we make sure that you become an expert in it.


DreamsPlus. Copyright © 2017. All rights reserved.

Designed & marketed by OneDot