Tuesday 15 November 2016

MAP REDUCE

MapReduce is a technique for processing the large volumes of data stored in the Hadoop Distributed File System (HDFS). The MapReduce algorithm contains two important tasks, namely Map and Reduce. The Map task takes a data set and converts it into another data set in which individual elements are split into key-value pairs. The Reduce task then comes into the picture: it takes the output of the maps as its input and combines it to generate the final output. The number of map tasks is equal to the number of input splits.
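To make the Map and Reduce steps concrete, here is a minimal word-count sketch in plain Python. It does not use Hadoop at all; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration, and everything runs in one process instead of across a cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Map: turn one input line into (word, 1) key-value pairs."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group the values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # Each line stands in for one record handed to a map task.
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["hadoop stores data", "hadoop processes data"])
print(counts)
```

In a real job the map calls would run in parallel, one map task per input split, and the grouping step would be done by the framework rather than by your own code.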
There are basically four file input formats:
1              TextInputFormat
2              KeyValueTextInputFormat
3              SequenceFileInputFormat
4              SequenceFileAsTextInputFormat

TextInputFormat is the default format; the other three must be explicitly specified in the driver code so that the record reader knows how to interpret the file. If the file format is TextInputFormat, the record reader reads one line at a time from its corresponding input split and converts it into a key-value pair in which the key is the byte offset of the line and the value is the entire line. If the file format is KeyValueTextInputFormat, the record reader splits each line at the first tab character: the text before the tab becomes the key and the rest of the line becomes the value.
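The difference between the two record readers is easier to see on a small sample. The sketch below imitates their behavior in plain Python; it is not the Hadoop API, and the function names (text_input_records, key_value_records) and the sample data are assumptions made for this illustration.

```python
def text_input_records(data: bytes):
    """Imitates TextInputFormat: key = byte offset of the line, value = the line."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # the next line starts after this line and its newline

def key_value_records(data: bytes, separator="\t"):
    """Imitates KeyValueTextInputFormat: split each line at the first separator."""
    for raw in data.splitlines():
        line = raw.decode("utf-8")
        # With no separator in the line, the whole line is the key and the value is empty.
        key, _, value = line.partition(separator)
        yield key, value

sample = b"id1\talice\nid2\tbob\n"
text_records = list(text_input_records(sample))
kv_records = list(key_value_records(sample))
print(text_records)
print(kv_records)
```

For the sample above, the first reader yields the offsets 0 and 10 as keys, while the second yields id1 and id2 as keys with alice and bob as values.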

 


Architecture of Map Reduce
