Blog

Top Ten Differences Between Apache HBase and Hive

 

S.No | Apache Hive | Apache HBase
1 | Hive is a data warehousing tool used to process data stored in Hadoop and HDFS. It is similar to SQL because it analyzes and processes data with a query language. | Apache HBase is an open source framework and a NoSQL database.
2 | Hive runs on top of Hadoop and uses MapReduce. | HBase runs on top of HDFS.
3 | Main […]

Apache Hadoop Oozie Tutorial

Introduction:
Oozie is mainly used to manage Hadoop jobs, combining multiple jobs in a particular order to accomplish a larger task. It is an open source framework for running multiple Hadoop jobs together. Oozie supports MapReduce, Hive, and HDFS jobs, among others. In Oozie, a job workflow is based on a Directed Acyclic […]
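
The idea of running jobs in a particular order can be sketched in a few lines of Python. This is only an illustration of ordering a directed acyclic graph of dependencies, not Oozie's own workflow format or API; the job names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "load_to_hdfs": set(),
    "mapreduce_clean": {"load_to_hdfs"},
    "hive_summary": {"mapreduce_clean"},
}

def execution_order(dag):
    """Return a run order in which every job comes after its dependencies.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    which is why the workflow must be acyclic.
    """
    return list(TopologicalSorter(dag).static_order())

order = execution_order(workflow)
print(order)
```

Because each job here depends on the one before it, there is only one valid order: load the data, clean it, then summarise it.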

Apache Spark Tutorial

What is Apache Spark?

Spark is also an open source framework, mainly used for data analytics. Spark can run much faster than Hadoop MapReduce and is designed to work on top of Hadoop. Spark does not have its own file system, so it integrates with an external one such as HDFS. A notable feature of Spark is that it does not require YARN to run.

Spark does not have […]

Apache Hadoop Flume Tutorial

What is Apache Flume?
Apache Flume is a tool for moving data from one place to another. Flume is a distributed system that transports data reliably, and it is an important part of the Hadoop ecosystem. In Apache Flume, each unit of data is treated as one event. It collects log data from various web servers to […]
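
To make the "one unit of data is one event" idea concrete, here is a minimal Python sketch. It does not use Flume itself: the `Event` class, the `collect` and `deliver` helpers, the host names, and the in-memory destination are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """One unit of data in transit: a byte payload plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)

def collect(log_lines, host):
    """Wrap each log line from one (hypothetical) web server as an Event."""
    return [Event(body=line.encode("utf-8"), headers={"host": host})
            for line in log_lines]

def deliver(events, destination):
    """Move events to the destination, here just an in-memory list."""
    destination.extend(events)
    return len(events)

store = []  # stand-in for the real destination
moved = deliver(collect(["GET /index.html 200"], "web01"), store)
moved += deliver(collect(["GET /about.html 404", "POST /login 302"], "web02"), store)
print(moved, "events delivered")
```

Each log line travels as its own event, and the headers record which server it came from.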

Apache Kafka Architecture and Components

What is Apache Kafka?

Kafka is designed for distributed systems. It is mainly used to transfer data to and from Hadoop through a messaging system. A messaging system transfers data from one application to another, so the applications do not need to be concerned with how the data is transferred; it is based on message queuing. There are two types of messaging systems.

1. Point to Point

2. Publish […]
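
The difference between the two messaging styles can be sketched with plain in-memory Python objects. This is not Kafka's client API, and it assumes the second, truncated item refers to the publish-subscribe model; the class names and the sample message are hypothetical.

```python
from collections import deque

class PointToPointQueue:
    """Each message is consumed by exactly one receiver."""
    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Taking the message removes it, so no other consumer can read it.
        return self._messages.popleft() if self._messages else None

class PublishSubscribeTopic:
    """Every subscriber receives its own copy of each published message."""
    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        self._inboxes[name] = []

    def publish(self, message):
        for inbox in self._inboxes.values():
            inbox.append(message)

    def inbox(self, name):
        return list(self._inboxes[name])

queue = PointToPointQueue()
queue.send("order-1")
first = queue.receive()   # the first consumer gets the message
second = queue.receive()  # nothing is left for a second consumer

topic = PublishSubscribeTopic()
topic.subscribe("billing")
topic.subscribe("shipping")
topic.publish("order-1")  # both subscribers receive it
```

In the queue, a message disappears once it is read; in the topic, every subscriber sees the same message independently.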

Apache Hadoop Hive Tutorial

What is Hive?
Hive is a data warehousing tool used to process data stored in Hadoop and HDFS. It is similar to SQL because it analyzes and processes data with a query language, known as HiveQL. Hive runs on top of Hadoop and uses MapReduce. Its main functions are data summarization, querying, and analysis.

Features of Hive:

Hive processed […]

Roles And Responsibilities of Hadoop Administrator

Who is a Hadoop Administrator?
A Hadoop administrator maintains the Hadoop cluster and manages all of its resources. The administrator's job is not related to Hadoop application development. The main job of the administrator is installing the Hadoop cluster according to the company's needs. If any error occurs in the Hadoop cluster, the administrator is fully responsible for resolving it.

Following roles and responsibilities […]

Difference Between NoSQL Cassandra and Apache Hadoop

What is Cassandra?
Cassandra is a NoSQL database that handles large amounts of data across multiple servers. It is an open source database that serves data to online transactional applications and business intelligence workloads. Cassandra was created at Facebook and is designed around peer-to-peer nodes. It partitions the data across Hadoop cluster […]

CAP Theorem in Hadoop

What is CAP Theorem?
The CAP theorem applies to distributed systems (collections of interconnected nodes). Also known as Brewer’s theorem, it is used to reason about consistency in distributed systems. It involves the following three technical terms.
C – Consistency
A – Availability
P – Partition Tolerance

Consistency:
When you read data, you get the same data no matter how many times you read it, and the server sends […]

How to Install Hadoop on Ubuntu

What is Hadoop?

Hadoop is an open source, Java-based framework. It is used to store large amounts of data and has several components for accessing that data. Java is the most important prerequisite for installing Hadoop because Hadoop is a Java-based framework. Here we discuss how to install Hadoop on the Ubuntu operating system.

Hadoop has the following three […]