MapReduce
is a processing technique and programming model for handling the huge volumes of data stored in the Hadoop Distributed File System (HDFS). The MapReduce algorithm contains two important tasks, namely Map and Reduce. The Map task takes a data set and converts it into another data set, where individual elements are broken down into key-value pairs. The Reduce task then takes the output from the maps as its input and combines those key-value pairs to generate the final output. The number of map tasks is equal to the number of input splits.
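As an illustration, the classic word-count job shows this split between Map and Reduce. The sketch below is a minimal example using the standard org.apache.hadoop.mapreduce API; the class name WordCount and the whitespace tokenizing are illustrative choices, not taken from this post.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map: for each input line, emit a (word, 1) pair per word.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);   // key = word, value = 1
            }
        }
    }

    // Reduce: sum all the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);     // key = word, value = total count
        }
    }
}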
There are basically four input formats for a file:
1. TextInputFormat
2. KeyValueTextInputFormat
3. SequenceFileInputFormat
4. SequenceFileAsTextInputFormat
TextInputFormat is the default format; the other three must be explicitly specified in the driver code so that the correct record reader is used (see the driver sketch below). If the file format is TextInputFormat, the record reader reads one line at a time from its corresponding input split and converts it into a key-value pair of (byte offset of the line, entire line). If the file format is KeyValueTextInputFormat, the record reader splits each line into key and value at the tab character. For example, the line "apple<TAB>red" at offset 0 becomes the pair (0, "apple<TAB>red") under TextInputFormat, but ("apple", "red") under KeyValueTextInputFormat.
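Here is a minimal driver sketch showing where the input format is declared; the class name WordCountDriver and the command-line path handling are illustrative assumptions. It reuses the WordCount mapper and reducer from the earlier sketch, which expect TextInputFormat's (byte offset, line) records.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);

        // TextInputFormat is the default, so this line is optional here.
        // For any other format, e.g. KeyValueTextInputFormat, this call is
        // required, and the mapper's input key type must match it
        // (Text instead of LongWritable for KeyValueTextInputFormat).
        job.setInputFormatClass(TextInputFormat.class);

        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}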
Architecture of MapReduce