Big Data Training Topics


Technology has been developing at an exponential pace, and this demands that the IT industry, and in turn the knowledge of its employees, keep up with the advancement to meet new challenges. In this context, Hadoop plays a leading role. Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is designed to tolerate a high degree of failure and can run on clusters with a large number of nodes handling terabytes of data. Its distributed file system provides high data-transfer rates and allows operations to continue when a node fails, which reduces the risk of overall system failure even when a significant number of nodes are affected.

Credo Systemz is the best Big Data training provider in Chennai. With all of the above advancements in mind, one thing is clear: professionals need to know the ins and outs of this rapidly growing technology, and we are here to serve that demand. As one of the leading providers of Big Data training in Chennai, we have designed the syllabus to meet all of these requirements and to suit the needs of both beginners and advanced-level professionals.

CREDO Systemz, the best Hadoop training institute in Chennai, provides the best Big Data and Hadoop training in Chennai, delivered by experienced professionals with top placements and a 100% certification track record. Big data is data that is too large to process using traditional methods. It is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead “massively parallel software running on tens, hundreds, or even thousands of servers”.

Big Data Syllabus:


Section1: INTRODUCTION TO BIG DATA-HADOOP
  • Overview of Hadoop Ecosystem
  • Role of Hadoop in Big Data – Overview of Other Big Data Systems
  • Who is using Hadoop
  • Hadoop Integrations into Existing Software Products
  • Current Scenario in Hadoop Ecosystem
  • Installation
  • Configuration
  • Use Cases of Hadoop (Healthcare, Retail, Telecom)
Section2: HDFS
  • Concepts
  • Architecture
  • Data Flow (File Read, File Write)
  • Fault Tolerance
  • Shell Commands (see the sketch after this list)
  • Data Flow Archives
  • Coherency – Data Integrity
  • Role of Secondary NameNode
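For the Shell Commands topic above, here is a minimal sketch of a few everyday HDFS shell operations driven from Python. It assumes a working Hadoop installation with the hdfs binary on the PATH; the paths and file names are placeholders, not values from the course.

```python
# A minimal sketch of common HDFS shell commands, run from Python via subprocess.
# Assumes the `hdfs` binary is on PATH; all paths below are placeholders.
import subprocess

def hdfs(*args):
    """Run a single `hdfs dfs` command and raise if it fails."""
    subprocess.run(["hdfs", "dfs", *args], check=True)

hdfs("-mkdir", "-p", "/user/hadoop/demo")            # create a directory (with parents)
hdfs("-put", "localfile.txt", "/user/hadoop/demo")   # copy a local file into HDFS
hdfs("-ls", "/user/hadoop/demo")                     # list the directory
hdfs("-cat", "/user/hadoop/demo/localfile.txt")      # stream the file back to stdout
hdfs("-rm", "-r", "/user/hadoop/demo")               # clean up recursively
```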
Section3: MAPREDUCE
  • Theory
  • Data Flow (Map – Shuffle – Reduce)
  • MapRed vs MapReduce APIs (old vs new Java APIs)
  • Programming [Mapper, Reducer, Combiner, Partitioner]
  • Writables
  • InputFormat
  • OutputFormat
  • Streaming API using Python (see the sketch after this list)
  • Inherent Failure Handling using Speculative Execution
  • Magic of Shuffle Phase
  • FileFormats
  • Sequence Files
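For the Streaming API topic above, here is a minimal word-count mapper and reducer sketch for Hadoop Streaming in Python. The file names are illustrative, and the exact path of the streaming jar varies by installation.

```python
# mapper.py – minimal Hadoop Streaming mapper: emit "word<TAB>1" for every word on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py – minimal Hadoop Streaming reducer: the framework sorts by key before
# the reduce phase, so counts for the same word arrive adjacent and can be summed in one pass.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

A job like this is typically launched with the streaming jar, for example: hadoop jar hadoop-streaming.jar -input /in -output /out -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py (the jar location depends on the installation).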
Section4: HBASE
  • Introduction to NoSQL
  • CAP Theorem
  • Classification of NoSQL
  • HBase and RDBMS
  • HBase and HDFS
  • Architecture (Read Path, Write Path, Compactions, Splits)
  • Installation
  • Configuration
  • Role of Zookeeper
  • HBase Shell
  • Introduction to Filters
  • Row Key Design
  • What’s New in HBase
  • Hands-On (see the sketch after this list)
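To give a feel for the hands-on portion above, here is a minimal HBase read/write sketch from Python. It assumes the happybase client and a running HBase Thrift server on localhost; the table, column family, and row keys are placeholders, not values from the course.

```python
# A minimal HBase put/get/scan sketch using the happybase client (pip install happybase).
# Requires the HBase Thrift server to be running; table and column names are placeholders.
import happybase

connection = happybase.Connection("localhost")   # connect to the Thrift gateway

table = connection.table("user_profiles")        # assumes this table already exists
table.put(b"user#1001", {                        # row key design: entity#id
    b"info:name": b"Asha",
    b"info:city": b"Chennai",
})

row = table.row(b"user#1001")                    # single-row read
print(row[b"info:name"])

# A simple prefix scan, similar to what the HBase shell's `scan` command does.
for key, data in table.scan(row_prefix=b"user#"):
    print(key, data)

connection.close()
```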
Section5: HIVE
  • Architecture
  • Installation
  • Configuration
  • Hive vs RDBMS
  • Tables
  • DDL
  • DML
  • UDF
  • Partitioning
  • Bucketing
  • Hive functions
  • Date functions
  • String functions
  • Cast Function
  • Metastore
  • Joins
  • Real-time HQL examples will be shared along with a database migration project (see the sketch after this list)
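As a taste of the hands-on HQL above, here is a minimal sketch that runs HiveQL from Python. It assumes the PyHive client and a HiveServer2 instance on localhost:10000; the table, columns, and partition values are placeholders, not the course's migration project.

```python
# A minimal HiveQL sketch using PyHive (pip install "pyhive[hive]").
# Assumes HiveServer2 is reachable on localhost:10000; all names are placeholders.
from pyhive import hive

conn = hive.Connection(host="localhost", port=10000, username="hive")
cur = conn.cursor()

# DDL: a partitioned table, with one partition per sale_date.
cur.execute("""
    CREATE TABLE IF NOT EXISTS sales (
        order_id INT,
        amount   DOUBLE,
        city     STRING
    )
    PARTITIONED BY (sale_date STRING)
    STORED AS ORC
""")

# DML + built-in functions: aggregate per city within a single partition.
cur.execute("""
    SELECT city, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM sales
    WHERE sale_date = '2024-01-01'
    GROUP BY city
""")
for city, orders, revenue in cur.fetchall():
    print(city, orders, revenue)

conn.close()
```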
Section6: PIG
  • Architecture
  • Installation
  • Hive vs Pig
  • Pig Latin Syntax (see the sketch after this list)
  • Data Types
  • Functions (Eval, Load/Store, String, DateTime)
  • Joins
  • UDFs – Performance
  • Troubleshooting
  • Commonly Used Functions
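To illustrate the Pig Latin syntax mentioned above, here is a minimal word-count script run in local mode from Python. It assumes the pig command is on the PATH and that an input.txt file exists in the working directory; both are placeholders.

```python
# A minimal Pig Latin word-count sketch, written out and run in local mode via subprocess.
# Assumes the `pig` command is on PATH; input/output paths are placeholders.
import subprocess
import textwrap

PIG_SCRIPT = textwrap.dedent("""
    lines  = LOAD 'input.txt' AS (line:chararray);
    words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    grpd   = GROUP words BY word;
    counts = FOREACH grpd GENERATE group AS word, COUNT(words) AS freq;
    STORE counts INTO 'wordcount_out';
""")

with open("wordcount.pig", "w") as f:
    f.write(PIG_SCRIPT)

# -x local runs Pig against the local filesystem instead of HDFS.
subprocess.run(["pig", "-x", "local", "wordcount.pig"], check=True)
```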
Section7: SQOOP
  • Architecture, Installation, Commands (Import, Hive Import, Eval, HBase Import, Import All Tables, Export); see the sketch after this list
  • Connectors to Existing DBs and DW
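For the import command above, here is a minimal sketch of a Sqoop import launched from Python. The JDBC URL, credentials, table name, and target directory are placeholders, not values from the course.

```python
# A minimal Sqoop import sketch: pull one RDBMS table into HDFS.
# All connection details below are placeholders; Sqoop must be on PATH.
import subprocess

subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost:3306/retail_db",  # source database (hypothetical)
    "--username", "retail_user",
    "--password", "retail_pass",
    "--table", "orders",                                # table to import
    "--target-dir", "/user/hadoop/orders",              # HDFS destination directory
    "--num-mappers", "1",                               # single mapper, so no --split-by needed
], check=True)
```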
Section8: KAFKA
  • Kafka introduction
  • Data streaming Introduction
  • Producers, Consumers, Topics (see the sketch after this list)
  • Brokers
  • Partitions
  • Unix Streaming via Kafka
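To illustrate the producer/consumer/topic model above, here is a minimal sketch assuming the kafka-python package and a broker on localhost:9092; the topic name is a placeholder.

```python
# A minimal producer/consumer sketch using kafka-python (pip install kafka-python).
# Assumes a broker on localhost:9092; the "events" topic is a placeholder.
from kafka import KafkaProducer, KafkaConsumer

# Producer: publish a few messages to the topic.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(3):
    producer.send("events", value=f"event-{i}".encode("utf-8"))
producer.flush()                      # block until the messages are written

# Consumer: read the messages back from the beginning of the topic.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",     # start from the oldest available offset
    consumer_timeout_ms=5000,         # stop iterating if idle for 5 seconds
)
for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)
```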
Section9: OOZIE
  • Architecture
  • Installation
  • Workflow
  • Coordinator
  • Action (Mapreduce, Hive, Pig, Sqoop)
  • Introduction to Bundle
  • Mail Notifications
Section10: HADOOP 2.0 AND SPARK
  • Limitations in Hadoop 1.0
  • HDFS Federation
  • High Availability in HDFS
  • HDFS Snapshots
  • Other Improvements in HDFS2
  • Introduction to YARN aka MR2
  • Limitations in MR1
  • Architecture of YARN
  • MapReduce Job Flow in YARN
  • Introduction to Stinger Initiative and Tez
  • Backward Compatibility for Hadoop 1.x
  • Spark Fundamentals
  • RDDs
  • Sample Scala Program (see the sketch after this list)
  • Spark Streaming
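The course walks through a sample Scala program; as a rough illustration of the same RDD pattern, here is a minimal PySpark word-count sketch. It assumes a local Spark installation and an input.txt file, both placeholders.

```python
# A minimal PySpark RDD word-count sketch, run in local mode.
# Assumes pyspark is installed and input.txt exists; both are placeholders.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("rdd-wordcount").setMaster("local[*]")
sc = SparkContext(conf=conf)

counts = (
    sc.textFile("input.txt")                  # one RDD element per line
      .flatMap(lambda line: line.split())     # split lines into words
      .map(lambda word: (word, 1))            # pair each word with a count of 1
      .reduceByKey(lambda a, b: a + b)        # sum the counts per word
)

for word, count in counts.take(10):           # bring a small sample back to the driver
    print(word, count)

sc.stop()
```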

Big Data Training Course Content – FREE PDF DOWNLOAD
Book your FREE DEMO session with our Hadoop Trainer – Contact Us