Big Data Training Topics


Technology has developed at an exponential rate. This drives growth in the IT industry and, in turn, requires employees to keep their knowledge current with each advance. In this context, Hadoop plays a leading role. Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is designed to tolerate a high degree of failure, and it can run on systems that involve a large number of nodes and terabytes of data.

Hadoop's distributed file system provides high data transfer rates between nodes and allows the system to continue operating when a node fails. This approach reduces the risk of system failure even if a significant number of nodes are affected. Big data is too large to process using traditional methods. It is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead “massively parallel software running on tens, hundreds, or even thousands of servers”.

Credo Systemz offers the best big data training in Chennai. With all the above advancements in mind, we believe professionals should know the ins and outs of this rapidly growing technology, and we are here to serve that need. We have designed the course syllabus to meet the requirements of both beginners and advanced-level professionals.

Credo Systemz also provides big data and Hadoop online training in Chennai, delivered by experienced professionals, with top placements and a 100% certification track record.

Big Data Syllabus:


Section1: INTRODUCTION TO BIG DATA-HADOOP
  • Overview of Hadoop Ecosystem
  • Role of Hadoop in Big Data
  • Overview of Big Data Systems
  • Who is using Hadoop
  • Hadoop Integrations into Existing Software Products
  • Current Scenario in Hadoop Ecosystem
  • Installation
  • Configuration
  • Use Cases of Hadoop (Healthcare, Retail, Telecom)
Section2: HDFS
  • Concepts
  • Architecture
  • Data Flow (File Read, File Write)
  • Fault Tolerance
  • Shell Commands
  • Data Flow Archives
  • Coherency
  • Data Integrity
  • Role of Secondary NameNode
Section3: MAPREDUCE
  • Theory
  • Data Flow (Map – Shuffle – Reduce)
  • MapRed vs MapReduce APIs
  • Programming [Mapper, Reducer, Combiner, Partitioner]
  • Writables
  • InputFormat
  • OutputFormat
  • Streaming API using Python
  • Inherent Failure Handling using Speculative Execution
  • Magic of Shuffle Phase
  • File Formats
  • Sequence Files
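
As a rough illustration of the Map, Shuffle, Reduce data flow listed above, here is a minimal word-count sketch in plain Python. It does not use Hadoop or the Streaming API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for this example, and the whole job runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, the step that sits
    between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (single process, illustration only)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    # Keys are sorted before reducing, so the result is in key order.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]
    print(word_count(sample))  # {'big': 2, 'cluster': 1, 'data': 2, 'node': 1}
```

In a real Hadoop job the framework performs the shuffle itself and distributes mappers and reducers across nodes; this sketch only shows how the three stages hand data to one another.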
Section4: HBASE
  • Introduction to NoSQL
  • CAP Theorem
  • Classification of NoSQL
  • HBase and RDBMS
  • HBase and HDFS
  • Architecture (Read Path, Write Path, Compactions, Splits)
  • Installation
  • Configuration
  • Role of Zookeeper
  • HBase Shell
  • Introduction to Filters
  • Row Key Design
  • What’s New in HBase
  • Hands On
Section5: HIVE
  • Architecture
  • Installation
  • Configuration
  • Hive vs RDBMS
  • Tables
  • DDL
  • DML
  • UDF
  • Partitioning
  • Bucketing
  • Hive functions
  • Date functions
  • String functions
  • Cast function
  • Metastore
  • Joins
  • Real-time HQL examples will be shared along with a database migration project
Section6: PIG
  • Architecture
  • Installation
  • Hive vs Pig
  • Pig Latin Syntax
  • Data Types
  • Functions (Eval, Load/Store, String, DateTime)
  • Joins
  • UDFs
  • Performance
  • Troubleshooting
  • Commonly Used Functions
Section7: SQOOP
  • Architecture
  • Installation
  • Commands (Hive Import, Eval, HBase Import, Import All Tables, Export)
  • Connectors to Existing DBs and DW
Section8: KAFKA
  • Kafka Introduction
  • Data Streaming Introduction
  • Producers, Consumers, and Topics
  • Brokers
  • Partitions
  • Unix Streaming via Kafka
Section9: OOZIE
  • Architecture
  • Installation
  • Workflow
  • Coordinator
  • Actions (MapReduce, Hive, Pig, Sqoop)
  • Introduction to Bundle
  • Mail Notifications
Section10: HADOOP 2.0 AND SPARK
  • Limitations in Hadoop 1.0
  • HDFS Federation
  • High Availability in HDFS
  • HDFS Snapshots
  • Other Improvements in HDFS2
  • Introduction to YARN aka MR2
  • Limitations in MR1
  • Architecture of YARN
  • MapReduce Job Flow in YARN
  • Introduction to Stinger Initiative and Tez
  • Backward Compatibility for Hadoop 1.X
  • Spark Fundamentals
  • RDD
  • Sample Scala Program
  • Spark Streaming
