Feb 282013

PNUTS is a parallel and geographically distributed database system for serving web application with per-record consistency guarantees. Main concerns in the system are availability and low latency, this way consistency is tuned according to the fault-resilience and geo-replication is supported for latency reduction during the multi continent reads.


PNUTS is focused on serving web application, rather that complex queries, this was a simple relational model is exposed to the users. To stabilize robustness of the systems, different levels of redundancy are used that exhaust consistency against high availability during failures. Pub/Sub paradigm (Yahoo! Message Broker) is used for the asynchronous operations.

Data is organized into tables of attributed records, where schemas are quire flexible and can be changed without halting any operations. Moreover, the query language is quite simple and limited. It is limited to the selection and simple projection from a single table. However, this provides more flexible access compared to the hashes and ordered data, as they claimed.

System Architecture

Data table are horizontal and partitioned into groups across different servers. To determine the location of the tablet record and storage unit, routers are used, which contains only of cached copy of the mapping, while tablet controller owns that mapping.

Replication is based on the asynchronous messaging and ensures low latency. Data updates are considered committed only if it is published to the Yahoo! Message broker (YMB). It is guaranteed that after the commitment other YMB, updates will be asynchronously propagated to the appropriate regions. However, this guarantees only partial ordering. Messages published to one YMB are guaranteed to be delivered in the arrival order, while messaged arrived to the different YMB instances can be delivered in any order. For this reason, timeline consistency is used for such commits. This consistency supported by the presence of per record master whose order is preserved to be the order of delivery on every replica. Based on the replication, faults recovery are maintained. Copying is used for the recovery of lost tablets. Also checkpoint mechanism is issued after the copy request to ensure applicability of the current updates.

The most interesting thing is a multiget processing, that is based on the scatter-gather engine and is a component of the router. Here, router splits the request into parts and then gather the results on arrival.

Strong points:

  • Asynchronous updates = low latency
  • Record level consistency
  • Flexible schemas
  • Multiget, which retrieved multiple records in parallel
  • Scalable
  • Various application

Weak points:

  • No complex queries support, joins, group by
  • No always the most current version of the data returned on the request
  • No referential integrity
  • No serializable transactions
  • Partial ordering of the transactions
  • Concurrent updates conflicts resolution is not highlighted
  • Per-record master is a mystery 🙂
  • Slow multiget parallel requests
  • Failure detection is preserved…