Blog

Apache Hadoop Oozie Tutorial

Introduction:
Oozie is mainly used to manage Hadoop jobs, combining multiple jobs in a particular order to accomplish a larger task. It is an open source framework for chaining multiple Hadoop jobs together. Oozie supports MapReduce, Hive, and HDFS jobs. An Oozie job workflow is based on a Directed Acyclic […]
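To make the idea of running jobs "in a particular order" concrete, here is a minimal Python sketch of dependency ordering over a directed acyclic graph. It is not Oozie's API (Oozie workflows are defined in XML); the job names and the `run_order` helper are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "import_logs": set(),
    "clean_logs": {"import_logs"},
    "hive_summary": {"clean_logs"},
    "export_report": {"hive_summary"},
}


def run_order(dag):
    """Return one valid execution order in which every job follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(workflow)
print(order)
```

If the graph contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is why a workflow of this kind has to be acyclic.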

Apache Spark Tutorial

What is Apache Spark?

Spark is also an open source framework, mainly used for data analytics. Spark often runs faster than Hadoop MapReduce and can run on top of Hadoop. Spark does not have its own file system, so it integrates with an external one. A notable feature of Spark is that it does not require YARN to function.

Spark does not have […]

Apache Hadoop Flume Tutorial

What is Apache Flume?
Apache Flume is a tool for moving data from one place to another. Flume is a distributed system that transports data reliably, and it is an important part of the Hadoop ecosystem. In Apache Flume, each unit of data is treated as an event. It collects log data from various web servers to […]

Apache Kafka Architecture and Components

What is Apache Kafka?

Kafka is designed for distributed systems. It is mainly used to transfer data between systems such as Hadoop using a messaging system. A messaging system transfers data from one application to another so that the applications do not need to deal with how the data is transferred; it is based on message queuing. There are two types of messaging system.

1.Point to […]
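As a rough illustration of message queuing, the sketch below passes messages from one component to another through a queue, so neither side deals with how the data travels. It uses Python's in-process `queue.Queue` as a stand-in and is not the Kafka client API; the `producer`, `consumer`, and `transfer` names and the message values are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the message stream


def producer(q, messages):
    """Put each message on the queue, then signal completion."""
    for message in messages:
        q.put(message)
    q.put(SENTINEL)


def consumer(q, received):
    """Take messages off the queue until the sentinel arrives."""
    while True:
        message = q.get()
        if message is SENTINEL:
            break
        received.append(message)


def transfer(messages):
    """Run one producer and one consumer that communicate only through the queue."""
    q = queue.Queue()
    received = []
    sender = threading.Thread(target=producer, args=(q, messages))
    receiver = threading.Thread(target=consumer, args=(q, received))
    receiver.start()
    sender.start()
    sender.join()
    receiver.join()
    return received


result = transfer(["order-1", "order-2", "order-3"])
print(result)
```

The producer and consumer never call each other directly; the queue is the only thing they share, which is the decoupling a messaging system provides.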

Apache Hadoop Hive Tutorial

What is Hive?

Hive is a data warehousing tool used to process data stored in Hadoop and HDFS. Hive is similar to SQL because it analyzes and processes data with a query language, known as HiveQL. Hive runs on top of Hadoop and MapReduce. The main functions of Hive are data summarization, querying, and analysis.
Recommended Reading […]

Roles And Responsibilities of Hadoop Administrator

Who is a Hadoop Administrator?

A Hadoop Administrator maintains the Hadoop cluster and manages all of its resources. The administrator's job is not related to Hadoop application development. The main job of the administrator is installing the Hadoop cluster to meet the company's needs. If an error occurs in the Hadoop cluster, the administrator is responsible for resolving it.

Latest […]

Difference Between NoSQL Cassandra and Apache Hadoop

What is Cassandra?

Cassandra is a NoSQL database that handles large amounts of data across multiple servers. It is an open source database that serves data to online transactional applications and business intelligence. Cassandra was created at Facebook and is designed around peer-to-peer nodes. It partitions the data across the Hadoop […]

CAP Theorem in Hadoop

What is CAP Theorem?
The CAP theorem applies to distributed data systems (collections of interconnected nodes). It is also known as Brewer’s theorem and describes consistency in distributed systems. It covers the following three properties of distributed systems.
C – Consistency
A – Availability
P – Partition Tolerance

Consistency:
When you read data, you get the same data no matter how many times you read it, and the server sends […]

How to Install Hadoop on Ubuntu

What is Hadoop?

Hadoop is an open source, Java-based framework. It is used to store large amounts of data and has several components for accessing that data. Java is essential for a Hadoop installation because Hadoop is a Java-based framework. Here we discuss how to install Hadoop on the Ubuntu operating system.

Hadoop has the following three […]

Hadoop Ecosystem Tutorial

Meaning of Hadoop Ecosystem:

The Hadoop ecosystem is neither a service nor a programming language; it is a platform used to process large amounts of Hadoop data. The Hadoop ecosystem uses HDFS and MapReduce for storing and processing large amounts of data, and Hive for querying the data. The Hadoop ecosystem consists of the following […]