Itelligence banner


BigData and Hadoop

Get a Callback!




 1 on 1 sessions


Free cloud access


24*7 call support

Course Features


    VM for practice.


   Free cloud storage


    CCDH preparation




    Video Lectures


    Complete modules


    Multiple batches

Our Courses



Hadoop Fundamentals

  • What is Bigdata?
  • Evolution of Bigdata
  • Types of Data and their Significance
  • Need for Bigdata Analytics
  • Why Bigdata with Hadoop?
  • History of Hadoop
  • Why Hadoop is in demand in market nowadays?
  • Limitations of SQL based Tools
  • Hadoop Nodes
  • Hadoop Rack
  • Hadoop Cluster
  • Architecture of Hadoop
  • Characteristics of Namenode
  • Workaround with Datanodes
  • Significance of JobTracker and Tasktrackers
  • Hase co-ordination with JobTracker
  • Secondary Namenode usage and Workaround
  • Hadoop Releases and their Significance
  • Introduction to Hadoop Release-1
  • Hadoop Daemons in Hadoop Release-1
  • Introduction to Hadoop Release-2
  • Hadoop Daemons in Hadoop Release-2
  • Hadoop Cluster Demo
  • Hadoop 2.x Cluster Architecture
  • A Typical Production Hadoop Cluster
  • Hadoop Cluster Modes
  • Hadoop 2.x Configuration Files
  • Single node cluster and Multi node cluster setup
  • Hadoop installation
  • Introduction to Hadoop FS and Processing Environment’s UIs
  • How to read and write files
  • Basic Unix commands for Hadoop
  • Hadoop  FS shell
  • Hadoop releases practical
  • Hadoop daemons practical
  • Common Hadoop Shell Commands
  • An Overview of Hadoop Administration
  • How Hadoop is getting two categories Projects
  • New projects on Hadoop
  • Hadoop Storage – HDFS (Hadoop Distributed file system)
  • Hadoop Processing Framework (Map Reduce / YARN)
  • Alternates of Map Reduce
  • Why NOSQL is in much demand instead of SQL
  • Distributed warehouse for HDFS
  • YARN Architecture
  • Significance of Scalability of Operation
  • Use cases where not to use Hadoop
  • Use cases where Hadoop Is used
  • Facebook,Twitter,Snapdeal, Flipkart


Hadoop Java API

  • Hadoop Classes
  • What is MapReduceBase?
  • Mapper Class and its Methods
  • What is Partitioner and types
  • MapReduce Use Cases
  • Traditional way VS MapReduce way
  • Significance of MapReduce
  • Hadoop 2. X MapReduce Architecture
  • Hadoop 2. MapReduce Program
  • Understanding Input Splits
  • Relationship between Input Splits and HDFS Blocks
  • MapReduce: Combiner & Partitioner
  • Hadoop specific Data types
  • Working on Unstructured Data Analytics
  • What is an Iterator and its usage techniques
  • Types of Mappers and Reducers
  • What is Output collector and its Significance
  • Workaround with Joining of datasets
  • Complications with MapReduce
  • Mapreduce Anatomy
  • Anagram example,Teragen Example,Terasort Example
  • WordCount Example
  • Working with multiple mappers
  • Working with weather data on multiple Data nodes in a Fully distributed Architecture
  • Use Cases where MapReduce anatomy fails
  • Advanced MapReduce
  • Counters
  • Distributed Cache
  • MRunit
  • Joins in MapReduce
  • Reduce Side Join
  • Replicated Join
  • Composite Join
  • Cartesian Product
  • Custom Input Format
  • Sequence Input Format
  • XML File Parsing using MapReduce
  • Interview questions based on JAVA MapReduce


Working with Pig Latin - (Fundamentals)

  • Introduction to Pig Latin
  • History and Evolution of Pig Latin
  • Why Pig is used only with Bigdata
  • MapReduce VS Pig
  • Pig Architecture and Overview of Compiler and Execution Engine
  • Programming Structure in Pig
  • Pig Running Modes
  • Pig Components
  • Pig Execution
  • Pig Release and Significance of Bugfixes
  • Pig Specific Datatypes
  • Complex Datatypes
  • Bags, Tuples, Fields
  • Pig Specific Methods
  • Comparison between Yahoo Pig & Facebook Hive
  • Shell and Utility Commands
  • Working with Grunt Shell
  • Grunt commands: 17 in number
  • Pig Latin: Relational Operators
  • Pig Latin: File Loaders
  • Pig Latin: Group Operator
  • Cogroup Operator
  • Joins and Cogroup
  • Union
  • Understanding Diagnostic Operators
  • Specialized Joins in Pig
  • Built in Functions
  • Eval Function
  • Load and Store Functions
  • Math Function
  • String Function
  • Date Function
  • Pig UDF
  • Piggybank
  • Parameter Substitution
  • Pig Streaming
  • Pig Use Cases: Aviation and Healthcare
  • Pig Data Input Techniques for flatfiles
  • Flatfiles: Comma separated, Tab delimited, and fixed width
  • Working with Schemaless Approach
  • How to attach Schema to a file/table in Pig
  • Schema referencing for similar Tables and Files
  • Working with Delimiters




Working with Pig Latin - II (Advanced)

  • Working with Binary Storage and Text Loader
  • Bigdata Operations and Read write Analogy
  • Filtering Datasets
  • Filtering rows with specific condition
  • Filtering rows with multiple conditions
  • Filtering rows with String Based Conditions
  • Sorting DataSets
  • Sorting rows with Specific column or columns
  • Multi level Sort
  • Analogy of a Sort Operation
  • Grouping Datasets and Co-grouping data
  • Joining DataSets
  • Types of Joins supported by Pig Latin
  • Aggregate Operations like average, sum, min, max, count
  • Flatten Operator
  • Creating a UDF (USER DEFINED FUNCTION) using java
  • Calling UDF from a Pig Script
  • Data validation Scripts



Working with Hive

  • Overview of Hive
  • Background of Hive
  • Hive VS Pig
  • Installation and Configuration
  • Interacting HDFS using HIVE
  • Map Reduce Programs through HIVE
  • Hive Architecture and Components
  • Hive Commands
  • Loading, Filtering, Grouping
  • What is Meta Storage and Meta Store
  • Derby Database
  • HQL
  • DDL, DML, and other Sub Languages of Hive
  • Data types in Hive
  • Partitions and Buckets
  • Hive Tables: Managed and External
  • Importing Data
  • Querying Data
  • Managing Outputs
  • Hive Scipts
  • Hive UDF
  • Hive Operators
  • Hive Joins, Unions, and Groups
  • Sample Programs in Hive
  • Alter and Delete in Hive
  • Partition in Hive
  • Indexing
  • Industry Specific Configuration of Hive Parameters
  • Authentication & Authorization
  • Statistics with Hive
  • Archiving in Hive
  • Hands-on exercise



     Advanced Hive

  • Understanding Hive Releases
  • Hive and OLTP
  • OLAP in Hive
  • Hive QL: Joining Tables
  • Dynamic Partitioning & Bucketing
  • Serialization and Deserialization
  • Custom Map/Reduce Scripts
  • Hive Indexes and Views
  • Hive Query Optimizers
  • Hive Architecture
  • Understanding Thrift Server
  • User Defined Functions
  • Hue Interface for Hive
  • Analyzing Data with Hive Script
  • Difference between Hive and Impala
  • UDFs in Hive
  • Complex Use cases in Hive




HBase & Zookeeper

  • Introduction to HBase
  • HBase VS RDBMS
  • HBase Components
  • Hbase Architecture
  • HBase Shell
  • HBase Client API
  • Data Loading Techniques
  • Run Modes & Configuration
  • HBase Cluster Deployment
  • Regionservers and their implementation
  • Client API's and their features
  • How messaging system works
  • Columns and column families
  • Configuring hbase-site.xml
  • Available Client
  • Loading Hbase with semi-structured data
  • Internal data storage in hbase
  • Timestamps
  • Creating table with column families
  • MapReduce Integration.
  • HBase: Advanced Usage, Schema Design
  • Load data from pig to hbase
  • Zookeeper Data Model
  • Zookeeper Service
  • Challenges faced in Distributed Applications
  • Coordination
  • Znode
  • Client API Functions
  • Bulk Loading
  • Receiving and Inserting Data
  • Filters in HBase
  • Sqoop architecture
  • Data Import and export in SQOOP
  • Deploying quorum and configuration throughout the Cluster



Hadoop on Amazon Cloud

  • Introduction to Cloud Infrastructure
  • Amazon SaaS, Paas and IaaS
  • Creating EC2 Instance for Processing
  • Creating S3 Buckets
  • Deploying Data on to the Cloud
  • Choosing size of our instance
  • Configuration of EMR Instance
  • Creating a virtual cluster on Amazon
  • Deploying project and getting stats




  • An Overview of Sqoop
  • Sqoop Real-life Connect
  • Sqoop and its Uses
  • Advantages of Sqoop
  • Sqoop Processing
  • Sqoop Execution Process
  • Importing Data Using Sqoop
  • Sqoop Import Process
  • How to Import data to Hive and HBase?
  • How to Export Data from Hadoop using Sqoop?
  • Sqoop Alternative
  • Sqoop Connector




  • Introduction to Flume
  • Introduction to Apache Flume
  • Flume Model
  • Flume Goals
  • Scalability in Flume
  • Flume Data Integration
  • Flume Installation on Single Node and Multinode Cluster
  • Flume Architecture and various Components
  • Data Sources: Types and Variants
  • Data Target: Types and Variants
  • Deploying an agent onto a single node cluster
  • Problems associated with Flume
  • Interview questions based on Flume

   Oozie and Hue

  • Introduction to Apache Oozie
  • Oozie: Components
  • Oozie: Workflow
  • Scheduling with Oozie
  • Hands-on Training on Oozie Workflow
  • Oozie Coordinator
  • Oozie Commands
  • Oozie Web Console
  • Oozie for MapReduce
  • Hive in Oozie
  • An Overview of Hue
  • Hue in Real-time Scenarios
  • Use Cases in Hue


  • Understanding MongoDB
  • NoSQL Databases
  • JSON and BSON
  • Vertical and Horizontal Scaling
  • Data Types
  • MongoDB Tools
  • Collection and Database
  • Schema Design and Modeling
  • CRUD Operations in MongoDB
  • Indexing and Aggregation
  • Replication and Sharding
  • MongoDB Cluster Operations

   Spark & Scala

  • What is Apache Spark?
  • Spark Ecosystem
  • Components of Spark
  • History of Spark
  • Spark Versions and Releases
  • What is Scala?
  • Significance of Scala
  • Scala and Spark Interdependency
  • Spark Shell and Operations
  • RDD and Spark
  • MapReduce and RDD
  • Spark and Hadoop Integration
  • HDFS and Yarn

 Yarn Architecture

  • Introduction to YARN and MR2 daemons
  • Active and Standby Namenodes
  • Resource Manager and Application Master
  • Node Manager
  • Container Objects and Container
  • Namenode Federation
  • Cloudera Manager and Impala
  • Load balancing in cluster with namenode federation
  • Architectural differences between Hadoop 1.0 and 2.0

Basics of JAVA for Hadoop

  • The Java Virtual Machine
  • Variables
  • Data types
  • Constructs: Conditional and Looping
  • Types: Wrapper classes
  • Object-Oriented JAVA
  • Fields and Methods
  • Constructors
  • Overloading methods
  • Garbage collection
  • Nested classes
  • Overriding methods
  • Polymorphism
  • Making methods and classes final
  • Abstract classes and methods
  • Interfaces
  • Threads
  • Classes
  • The I/O Package
  • JAVA Security


"What our customers say"


Vignesh Raj


I am a QA professional looking to get trained in a niche technology and decided to pursue big data technology for it. I approached ETLHive to undergo training. The demo session with Vaibhav Bajaj gave me an insight into what to expect from the course and what exactly Big Data is. I enrolled for the course as the syllabus was structured such a way that it lead to an industry recognized certification (HDPCD).

Post training I have now realized the scope of big data and hadoop is an essential starting point. It has given me the knowledge and skill-set to proceed with enhancing my grip on big data technologies.
I would definitely recommend ETLHive trainings for every individual or corporate who want to receive professional and practical training close to real time environment.



Q: What if I do not know java and linux?

Ans: We will provide you complete support for java and linux.


Q: Would this course help me for CloudEra Certification?

Ans: We have designed our syllabus in such a way that it fits the suitability for CloudEra certified developer certification.


Q: Would this course help me for HortonWorks Certification?

Ans: We have designed our syllabus in such a way that it fits the suitability for HortonWorks certified developer certification.


Q: What if I miss a lecture?

Ans: We have multiple batches running simultaneously, so if u misses a lecture u can attend it in next lecture.


Q: Do you have multiple centers?

Ans: Yes, we have multiple centers.


Q: Would you provide Job assistance?

Ans: Your profile would be evaluated by experts, your resume would be rated and you would start getting calls after completion of your module.

companies that hire


  1. Freshers who have completed their Graduation
  2. Database Developers working with Oracle,Sybase DB2 etc
  3. Mainframe developers working with IBM mainframe or AS400
  4. Application Developers working with JAVA,C# etc
  5. ETL developers (informatica, datastage, cognos, etc)
  6. Testing professionals working with Manual or Automation testing
  7. SAP professionals






Big Data Hadoop Training in Pune

          Big Data Hadoop Training in Pune is available with Itelligence in different training formats. In our first format we provide hadoop training in classroom. In second format we offer hadoop online training, with the help of webinars, with high definition video and audio capability servers.  We have many training institutes distributed through the globe. In India we are offering big data and hadoop training in various areas of pune such as Pimple Saudagar,Kharadi,Vimannagar,Hadapsar,Nal Stop.
    We have designed this course in such a way that after completion you will have good  knowledge and skills to become hadoop developer. We have included deep concepts from Pig Latin, Hive, HBase and Zookeeper in our syllabus for hadoop training. Hadoop on amazon cloud is included in curse’s syllabus to clear deep concepts about cloud computing. Your skills will differentiate you from others. This course is designed for the professionals ambitious to make career in Bigdata. Freshers who have completed their Graduation, Database Developers working with Oracle, Sybase DB2 etc, Mainframe developers working with IBM mainframe or AS400, Application Developers working with JAVA, C# etc, ETL developers (informatica, datastage, cognos, etc), Testing professionals working with Manual or Automation testing, SAP professionals can join this course.
    We have 3+ years of experience in big data and hadoop training in pune. Likewise we have trained over 300 students in last 6 months worldwide and helped them to place in organizations having very high reputation. Our Indian branch for big data and hadoop training in pune is located centrally so you can reach to our institute from any corner of pune. But if you are not ready to travel or you are living outside of pune then hadoop online training through webinars is available for you. Online module is ready with high definition video and audio capabilities.
ETLHIVE is a Trademark of Itelligence Infotech Pvt Ltd.
21th JAN FREE DEMO|(P.SAUDAGAR) Hadoop-11:00 AM|Selenium-10:30AM|Informatica-03:30PM|AngularJs-04.30 PM |Java-04.00PM (NALSTOP)-Informatica-12.30PM | AngularJs-12.30PM|Data Science-11:00AM|(KHARADI)-Hadoop-02.00 PM |Informatica-04:00 PM | AngularJs 11:00 AM