Hadoop is an open-source framework written in Java under the Apache Software Foundation. It is used to write software applications that need to process vast amounts of data (typically terabytes). The framework runs in parallel on large clusters, each of which may have thousands of nodes, and it processes data reliably and in a fault-tolerant manner using simple programming models. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
There are two core concepts in Hadoop: HDFS (the Hadoop Distributed File System) and MapReduce. HDFS is a distributed file system capable of storing huge amounts of data, and MapReduce is the technology introduced for processing such huge data. So Hadoop is a combination of HDFS and MapReduce. HDFS can also be defined as a file system specially designed for storing huge data sets on clusters of commodity hardware, with streaming access patterns.
Java uses the slogan "write once, run anywhere," which means a program written in Java can be executed on any platform that provides a Java environment. HDFS, in a similar spirit, follows "streaming access patterns," which means: write once, read any number of times, and don't try to change the contents of a file once you have stored it in HDFS.
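To make the write-once, read-many idea concrete, here is a minimal sketch using Hadoop's Java FileSystem API. The NameNode address and the file path are placeholders chosen for illustration, not values from any particular cluster:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteOnceReadMany {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // "fs.defaultFS" points at the NameNode; this address is a placeholder.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/demo/sample.txt"); // hypothetical path

            // Write once: create the file, write its contents, and close it.
            try (FSDataOutputStream out = fs.create(file)) {
                out.write("hello, HDFS\n".getBytes(StandardCharsets.UTF_8));
            }

            // Read any number of times; the file is not modified in place.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                System.out.println(in.readLine());
            }
        }
    }

The point of the sketch is the access pattern: one create-and-close to write, then any number of opens to read back.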
The Hadoop technology was introduced by Doug Cutting. Unlike acronyms such as OOPS, "Hadoop" has no expanded form; the charming elephant we see in the logo is named after Doug Cutting's son's toy elephant.
[Image: Hadoop logo]
Hadoop operates on massive datasets by horizontally scaling (aka scaling out) the processing across very large numbers of servers, through an approach called MapReduce. Vertical scaling (aka scaling up), i.e., running on the most powerful single server available, is both very expensive and limiting: no single server available today or in the foreseeable future has the power to process so much data in a timely manner.
MapReduce is a framework for processing such vast amounts of data by assigning the work to a number of different processors that run in parallel and deliver the result in a timely manner.
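The classic illustration of this model is word counting: mappers run in parallel over splits of the input and emit (word, 1) pairs, and reducers receive all the counts for a given word and sum them. Below is a sketch based on the standard WordCount example from the Hadoop documentation; the input and output paths are passed on the command line:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: each mapper gets a slice of the input and
        // emits a (word, 1) pair for every token it sees.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable one = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reduce phase: all counts for the same word arrive together
        // and are summed into the final total.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Notice that neither the mapper nor the reducer knows how many machines are involved: the framework splits the input, schedules the tasks across the cluster, and re-runs any task that fails, which is exactly the simple programming model described above.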