Total Hours: 30

Big Data - Hadoop Introduction

  • Understand what Big Data is
  • Analyze the limitations of existing data analytics architectures and their solutions
  • Understand what Hadoop is and its features
  • Explore the Hadoop Ecosystem
  • Understand Hadoop 2.x core components
  • Perform Read and Write in Hadoop
  • Understand Rack Awareness concept
  • Analyze Hadoop 2.x Cluster Architecture – Federation
  • Analyze Hadoop 2.x Cluster Architecture – High Availability
  • Run Hadoop in different cluster modes
  • Implement basic Hadoop commands on Terminal
  • Prepare Hadoop 2.x configuration files and analyze their parameters
  • Implement Password-less SSH on Hadoop cluster
  • Analyze dump of a Map Reduce program
  • Implement different data loading techniques
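The Read/Write and Rack Awareness topics above can be sketched as a toy model: by default, HDFS places the first replica on the writer's node, the second on a node in a different rack, and the third on another node in that second rack. A minimal Python simulation (node and rack names here are made up for illustration; this is not Hadoop code):

```python
# Toy illustration of HDFS's default rack-aware replica placement:
# replica 1 on the writer's node, replica 2 on a different rack,
# replica 3 on another node in that same second rack.

def place_replicas(writer, nodes):
    """nodes: list of (host, rack) tuples; writer: (host, rack)."""
    first = writer
    # Second replica: any node on a different rack than the writer
    second = next(n for n in nodes if n[1] != writer[1])
    # Third replica: a different node on the same rack as the second
    third = next(n for n in nodes if n[1] == second[1] and n != second)
    return [first, second, third]

cluster = [("n1", "rack1"), ("n2", "rack1"),
           ("n3", "rack2"), ("n4", "rack2")]
replicas = place_replicas(("n1", "rack1"), cluster)
print(replicas)  # [('n1', 'rack1'), ('n3', 'rack2'), ('n4', 'rack2')]
```

Note how a single rack failure can never destroy all three replicas, while two of the three writes stay within one rack to save cross-rack bandwidth.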

Map Reduce

  • Analyze different use-cases where Map Reduce is used
  • Difference between Traditional way and Map Reduce way
  • Learn about Hadoop 2.x Map Reduce Architecture and components
  • Understand execution flow of YARN Map Reduce Application
  • Run a basic Map Reduce Program
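The execution flow above can be sketched in plain Python as the classic word count, mirroring the three phases a Map Reduce job goes through (map, shuffle/sort, reduce). This only simulates the semantics in-memory; it is not an actual Hadoop job:

```python
# Conceptual word count mirroring the MapReduce phases.
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle/sort: group all emitted values by key, as the framework does
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big hadoop", "hadoop big"]
result = reduce_phase(shuffle(map_phase(lines)))
print(result)  # {'big': 3, 'data': 1, 'hadoop': 2}
```

In a real cluster the mapper and reducer run as separate tasks on different nodes, and the shuffle moves data across the network between them.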

PIG

  • Need for PIG
  • Why choose PIG when Map Reduce is available
  • Where not to use PIG
  • What is PIG
  • Use cases where PIG is used
  • PIG – Basic Program Structure
  • PIG Execution
  • PIG Latin Program
  • PIG – Data Model
  • PIG Latin Operators
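The PIG data model (tuples grouped into bags) and operators such as GROUP and FOREACH ... GENERATE can be sketched in plain Python. The statements in the comment and the relation/field names are made up for illustration; this simulates what Pig computes, not how it runs:

```python
# Rough Python analogue of the Pig Latin statements
#   grouped = GROUP records BY dept;
#   result  = FOREACH grouped GENERATE group, AVG(records.salary);
from collections import defaultdict

records = [("alice", "eng", 100), ("bob", "eng", 80), ("carol", "hr", 60)]

# GROUP ... BY: build a bag of tuples per key
bags = defaultdict(list)
for name, dept, salary in records:
    bags[dept].append((name, dept, salary))

# FOREACH ... GENERATE group, AVG(...): one output tuple per group
result = {dept: sum(t[2] for t in bag) / len(bag)
          for dept, bag in bags.items()}
print(result)  # {'eng': 90.0, 'hr': 60.0}
```

Pig compiles such scripts down to the same Map Reduce phases shown earlier, with GROUP becoming the shuffle step.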

HIVE

  • Understand what Hive is and its use cases
  • Understand Hive Architecture and Hive Components
  • Analyze limitations of Hive
  • Implement Primitive and Complex types in Hive
  • Understand Hive Data Model
  • Perform basic Hive operations
  • Execute Hive scripts and Hive UDFs
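A basic Hive operation such as a grouped count reduces to a simple aggregation over table rows. The query and the table/column names in the comment are hypothetical; the Python below only simulates what the query computes on an in-memory table:

```python
# What a simple HiveQL query computes, simulated in memory:
#   SELECT dept, COUNT(*) FROM emp GROUP BY dept;
from collections import Counter

emp = [
    {"name": "alice", "dept": "eng"},
    {"name": "bob",   "dept": "eng"},
    {"name": "carol", "dept": "hr"},
]

# GROUP BY dept with COUNT(*) is a per-key tally
counts = Counter(row["dept"] for row in emp)
print(dict(counts))  # {'eng': 2, 'hr': 1}
```

Hive translates such queries into Map Reduce (or Tez/Spark) jobs behind the scenes, which is why it suits batch analytics rather than low-latency lookups.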

Advanced HIVE and HBASE

  • Implement Joins in Hive
  • Implement Dynamic Partitioning
  • Analyze Custom Map/Reduce Scripts
  • Create Hive UDF
  • Understand NoSQL Databases and HBASE
  • Analyze difference between HBASE and RDBMS
  • Understand HBASE Components and Storage Architecture
  • Analyze HBASE Read and Write
  • Perform HBASE Cluster Deployment
  • Understand HBASE Attributes
  • Understand Data Model and Physical Storage in HBASE
  • Execute basic commands on HBASE shell
  • Analyze Data Loading Techniques in HBASE
  • Implement HBASE API
  • Understand Zookeeper Data Model and its Services
  • Analyze relationship between HBASE and Zookeeper
  • Perform Advanced HBASE Actions
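The HBASE data model topics above can be summarized as a sorted, multi-dimensional map: row key → column family → qualifier → timestamped versions of a cell. A toy Python model (table, family, and qualifier names are made up; real HBase persists cells in HFiles and serves them through region servers):

```python
# Toy model of HBASE's logical data layout:
#   row key -> column family -> qualifier -> {timestamp: value}

table = {}

def put(row, family, qualifier, value, ts):
    # Writes never overwrite in place; each put adds a new version
    table.setdefault(row, {}).setdefault(family, {}) \
         .setdefault(qualifier, {})[ts] = value

def get(row, family, qualifier):
    # Like HBase, a plain get returns the newest version of the cell
    versions = table[row][family][qualifier]
    return versions[max(versions)]

put("user1", "info", "city", "Pune", ts=1)
put("user1", "info", "city", "Mumbai", ts=2)
print(get("user1", "info", "city"))  # Mumbai
```

This versioned-cell model is also why HBase differs from an RDBMS: there is no fixed schema beyond column families, and reads address cells by key rather than by SQL predicates.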


Implement Flume and Sqoop

Oozie

  • Understand Oozie
  • Schedule Job in Oozie
  • Implement Oozie Workflow
  • Implement Oozie Coordinator
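At its core, an Oozie workflow expresses a set of actions and the order they must run in. A minimal Python sketch of that idea, with hypothetical action names (real workflows are defined in a workflow.xml, not in Python):

```python
# Sketch of what an Oozie workflow expresses: actions executed in
# dependency order. Action names below are invented for illustration.

# action -> list of actions it depends on
workflow = {
    "ingest":  [],
    "clean":   ["ingest"],
    "analyze": ["clean"],
    "report":  ["analyze"],
}

def run_order(wf):
    # Simple topological sort: run an action once all its parents have run
    done, order = set(), []
    while len(done) < len(wf):
        for action, deps in wf.items():
            if action not in done and all(d in done for d in deps):
                done.add(action)
                order.append(action)
    return order

print(run_order(workflow))  # ['ingest', 'clean', 'analyze', 'report']
```

An Oozie Coordinator adds the scheduling dimension on top of this: it triggers such a workflow on a time interval or when input data becomes available.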

Hardware and Software Requirement

Pre-requisites to install Apache Hadoop 2.0:
  • VMWare Player
  • Minimum 4 GB RAM
  • Dual Core Processor or above
Follow the link below to install and configure a Hadoop 2.0 cluster on VMWare Player in Pseudo-Distributed Mode:
Follow the link below to install and configure Apache PIG:
Follow the link below to install and configure Apache HIVE:
Follow the link below to install and configure HBASE:
Follow the link below to install and configure SQOOP: