Pregel is a system for processing large-scale graphs. The main contributions of the paper are the following: 1) fault-tolerant distributed programming framework for graph algorithms execution; 2)API with direct message passing among vertices.
Pregel is a synchronized computation process on vertices. Upon inserting a graph as an input, the graph is divided into partitions (using consistent hashing), which include a set of vertices and their outgoing edges. One of the machine is coordinator. The workers then undergo a series of iterations, called supersteps. In every superstep, all vertices in each worker execute the same user-defined function which can (a) receive messages sent during the previous superstep, (b) modify the state of the vertex and its outgoing edges (vertices and edges are kept on the machines) and (c) send messages to be delivered during the next superstep. At the end of each superstep a global synchronization point occurs. Vertices can become inactive and the sequence of iterations terminates when all vertices are inactive and there are no messages in transit. During computation, the master also sends ping messages to check for workers failures. The network is used only for sending messages and therefore it significantly reduces the communication overhead, becoming more efficient.
The presentation can be found here:
- Transparency in vertex-to-machine assigning
Not good/Open questions/Future improvements:
- How to detect a master failure?
- Why not to compare with MapReduce comparison in evaluation part.
- Failure Detection needed to be improved to confined recovery.
- Improve automatisation for defining user-defined functions
- hash(VertexID) mod Partition – Why not to use smarter vertex assigning (to reduce then network usage)?