Monday, 14 November 2016

What is Hadoop?

Hadoop is an open source technology or framework which is written in java by Apache software foundation. This framework is used to write software applications which requires to process vast amount of data (typically terabytes of data). This framework functions in parallel on large clusters and each cluster may have thousands of nodes. Hadoop processes the data very reliably and in a fault tolerant manner using simple programming models.  Hadoop is designed to scale up from a single server to thousands of the machines, each offering us local computation and storage.

There are two core concepts in Hadoop i.e., HDFS (Hadoop distributed file system) and Map-Reduce. Hadoop distributed is provided as file system which is capable of storing huge amount of data. The Map-Reduce technology was introduced for processing of such a huge data. So Hadoop is a combination of HDFS and Map-Reduce. HDFS can also be defined as a specially designed file system for storing huge data sets with cluster of commodity hardware and with streaming access patterns.
As Java uses the slogan “Write once run anywhere”, which means program written in Java can be executed on any platform provided there is a Java environment on that platform. HDFS also uses a slogan “Streaming access patterns” which means write once, read any number of times and don’t try to change the contents of file, once you are keeping data in HDFS.

This Technology of Hadoop was introduced by Doug Cutting. A Hadoop doesn’t have any expanding version like oops .The charming elephant we see is basically named after Doug Cutting’s son toy elephant.

 
                                             Hadoop Logo


Hadoop operates on massive datasets by horizontally scaling (aka scaling out), the processing across very large numbers of servers through an approach called MapReduce. Vertical scaling (aka scaling up), i.e., running on the most powerful single server available, is both very expensive and limiting. There is no single server available today or in the foreseeable future that has the necessary power to process so much data in a timely manner.
Map-Reduce is a frame work for processing such a vast amount of data by assigning data to number of different processors which works in  parallel and gives the result in a timely manner.

No comments:

Post a Comment