MapReduce

Simplified Data Processing on Large Clusters

 

By Iuliia Proskurnia

Problem?

Data

Logging

Problem?

Computation

Huge Graph

Just because you can...

Solution

 

Simple and powerful interface

 

Enables automatic parallelization and distribution of large-scale computations

 

Based on the map and reduce primitives from Lisp
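
As a rough illustration of the functional primitives the model borrows from, the sketch below uses Python's built-in `map` and `functools.reduce` rather than Lisp. The sample list `numbers` is a made-up input, not something taken from the slides.

```python
from functools import reduce

# Hypothetical sample data, chosen only for illustration.
numbers = [1, 2, 3, 4]

# map: apply a function to every element independently.
# Each call is isolated, so the work could be spread across machines.
squares = list(map(lambda x: x * x, numbers))

# reduce: fold the mapped values into a single result.
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```

Because the mapped function has no shared state, the map step is the part that parallelizes naturally; the reduce step only needs the mapped values.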

Simple Computation

Many

Hide unnecessary complexity

How it actually works

Input -> Output
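
One way to read the input-to-output flow is through the types of the two user-supplied functions: map takes a pair (k1, v1) and produces a list of (k2, v2) pairs, and reduce takes (k2, list of v2) and produces a list of v2. The sketch below is a single-process approximation under that assumption. The driver `run_mapreduce`, the `sales_map` and `sales_reduce` functions, and the sample records are all hypothetical names invented for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")

# map:    (k1, v1)       -> list of (k2, v2)
# reduce: (k2, list(v2)) -> list of v2
MapFn = Callable[[K1, V1], Iterable[Tuple[K2, V2]]]
ReduceFn = Callable[[K2, List[V2]], Iterable[V2]]


def run_mapreduce(
    inputs: Iterable[Tuple[K1, V1]], map_fn: MapFn, reduce_fn: ReduceFn
) -> Dict[K2, List[V2]]:
    """Hypothetical single-process driver: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)  # group intermediate values by key
    return {k2: list(reduce_fn(k2, vals)) for k2, vals in sorted(groups.items())}


def sales_map(record_id: int, record: str):
    city, amount = record.split(",")
    yield city, int(amount)


def sales_reduce(city: str, amounts: List[int]):
    yield sum(amounts)


# Made-up input records of the form (record id, "city,amount").
records = [(0, "Kyiv,10"), (1, "Lviv,5"), (2, "Kyiv,7")]
totals = run_mapreduce(records, sales_map, sales_reduce)
print(totals)  # {'Kyiv': [17], 'Lviv': [5]}
```

The user writes only the two small functions; grouping by key sits in the driver, which is the part a real system would distribute.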

Map function

Reduce function

Example


    map(String key, String value):
        // key: document name
        // value: document contents
        for each word w in value:
            EmitIntermediate(w, "1");

    reduce(String key, Iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
            result += ParseInt(v);
        Emit(AsString(result));
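
The pseudocode above can be tried out locally with a small Python sketch. It keeps the same shape (string counts, a map function per document, a reduce function per word) but runs in one process and uses sorting plus `itertools.groupby` to stand in for the grouping step. The function names `map_fn`, `reduce_fn`, and `word_count`, and the two sample documents, are assumptions made for this example.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(doc_name, contents):
    # key: document name, value: document contents
    for word in contents.split():
        yield word, "1"


def reduce_fn(word, counts):
    # key: a word, values: an iterable of counts (as strings)
    result = 0
    for c in counts:
        result += int(c)
    return str(result)


def word_count(documents):
    # Map phase: run map_fn over every (name, contents) pair.
    intermediate = []
    for name, contents in documents.items():
        intermediate.extend(map_fn(name, contents))
    # Grouping: sort by key so equal words become adjacent.
    intermediate.sort(key=itemgetter(0))
    # Reduce phase: one reduce_fn call per distinct word.
    return {
        word: reduce_fn(word, (count for _, count in group))
        for word, group in groupby(intermediate, key=itemgetter(0))
    }


# Made-up documents for illustration.
docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
counts = word_count(docs)
print(counts["the"])  # 3
```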
					

Other Examples

 

  • Distributed Grep
  • Count of URI Access Frequency
  • Reverse Web-Link Graph
  • Inverted Index
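
Each of the examples listed above comes down to a different pair of map and reduce functions. The sketch below expresses all four against one hypothetical in-memory driver, `run_mapreduce`. The function names, the search pattern, and the sample inputs are assumptions for illustration only.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical single-process driver: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)
    return {k2: reduce_fn(k2, values) for k2, values in groups.items()}


# Distributed grep: map emits a line if it matches; reduce is the identity.
PATTERN = "error"  # hypothetical search pattern

def grep_map(location, line):
    if PATTERN in line:
        yield line, location

def grep_reduce(line, locations):
    return locations


# Count of URI access frequency: map emits (uri, 1); reduce sums the counts.
def uri_map(request_id, uri):
    yield uri, 1

def uri_reduce(uri, counts):
    return sum(counts)


# Reverse web-link graph: map emits (target, source) for every link.
def links_map(source, targets):
    for target in targets:
        yield target, source

def links_reduce(target, sources):
    return sorted(sources)


# Inverted index: map emits (word, document id); reduce sorts the ids.
def index_map(doc_id, contents):
    for word in set(contents.split()):
        yield word, doc_id

def index_reduce(word, doc_ids):
    return sorted(doc_ids)


# Made-up inputs for illustration.
log_lines = [("log:1", "ok start"), ("log:2", "error disk full"), ("log:3", "ok stop")]
requests = [(0, "/home"), (1, "/about"), (2, "/home")]
pages = [("a.html", ["b.html", "c.html"]), ("b.html", ["c.html"])]
docs = [("d1", "map reduce"), ("d2", "reduce only")]

matches = run_mapreduce(log_lines, grep_map, grep_reduce)
hits = run_mapreduce(requests, uri_map, uri_reduce)
backlinks = run_mapreduce(pages, links_map, links_reduce)
index = run_mapreduce(docs, index_map, index_reduce)

print(hits["/home"])        # 2
print(backlinks["c.html"])  # ['a.html', 'b.html']
print(index["reduce"])      # ['d1', 'd2']
```

The driver is identical in all four cases; only the two user functions change.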

MapReduce Instances Over Time

Implementation Details

Input -> Output
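
The slide gives no detail here, so the following is only a sketch of one common reading of the implementation: the input is divided into M splits for the map tasks, and intermediate keys are assigned to R reduce tasks with a partitioning function of the form hash(key) mod R. The values of `M` and `R`, the helper names `split_input`, `partition`, and `word_map`, the use of `zlib.crc32` as the hash, and the sample lines are all assumptions made for this example.

```python
import zlib
from collections import defaultdict

M, R = 2, 3  # hypothetical numbers of map and reduce tasks


def split_input(records, num_splits):
    """Divide the input into at most num_splits contiguous pieces."""
    size = -(-len(records) // num_splits)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def partition(key, num_reducers):
    """Assign an intermediate key to a reduce task: hash(key) mod R."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def word_map(line):
    for word in line.split():
        yield word, 1


# Made-up input lines for illustration.
lines = ["a b a", "c b", "a c c"]

# Map phase: each map task writes its output into one of R partitions.
partitions = [defaultdict(list) for _ in range(R)]
for split in split_input(lines, M):
    for line in split:
        for word, count in word_map(line):
            partitions[partition(word, R)][word].append(count)

# Reduce phase: each reduce task processes exactly one partition.
output = {}
for bucket in partitions:
    for word, counts in bucket.items():
        output[word] = sum(counts)

print(output == {"a": 3, "b": 2, "c": 3})  # True
```

Because the partition depends only on the key, every occurrence of a word reaches the same reduce task no matter which map task produced it.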

Conclusions

MapReduce at Google

  • Easy to use
  • Scalability
  • Problems are expressible through MapReduce

 

Lessons Learned

  • Restricting programming model could make life easier
  • Network Bandwidth is a scarce resource
  • Redundant execution could help with failures

By Iuliia Proskurnia
blog.proskurnia.in.ua:)