
Course Contents:
1. Overview of Big Data and Hadoop
Introduction to big data, limitations of existing solutions, Hadoop architecture, HDFS design, HA NameNode, MapReduce design, Hadoop daemons.
2. Hadoop Installation and MapReduce
Hadoop server roles and their usage, Hadoop installation and initial configuration, deploying Hadoop on a single node, deploying a multi-node Hadoop cluster, installing Hadoop clients, working with HDFS, understanding MapReduce.
3. Hadoop Administration
File system image and edit log, checkpoint procedure, NameNode failure and recovery procedure, safe mode, metadata and data backup, adding or removing nodes, schedulers and enabling schedulers.
4. Hadoop Cluster Planning
Planning the Hadoop cluster, cluster sizing, hardware, network and software considerations, popular Hadoop distributions, balanced and unbalanced clusters, cluster commands.
5. Important Hadoop Components
Hive, Hive vs. traditional databases, Hive data types, prerequisites for Hive, installing Hive, installing Hive from a tarball, configuring Hive, hive-site.xml, hive-default.xml.template, log files, Hive configuration variables, Hive configuration variables used to interact with Hadoop.
Pig, prerequisites for Pig, installing Pig, useful commands for Pig, configuring Pig.
6. Hadoop Ecosystem
Ganglia, Nagios, Sqoop, and other miscellaneous components such as Oozie, Avro, and Mahout; Hadoop security using Kerberos.