Nov 08 2012

In the field of distributed data storage it is almost impossible to come up with a universal system that will satisfy all needs. That's why various distributed storage systems have recently appeared, each addressing different needs and using a different approach.

DynamoDB uses a key-value interface and replicates only within a region. I haven't measured the latency range myself, but according to its website latency is in the single-digit millisecond range, which is at least 10 times more than I want to reach in the thesis system.

Megastore doesn't reach great performance because it is built on Bigtable (with a high communication cost); however, it is scalable and consistent. Synchronization for wide-area replication is done with Paxos. Because scalability, consistency and fault tolerance are prioritized, latency is sacrificed and is within 100-400 milliseconds for reads and writes.

Scatter is a DHT-based key-value store that layers transactions on top of consistent replication (it uses a low-level interface). Even though it provides high availability and scales well, operation latencies are still in the millisecond range.

VoltDB is an in-memory DB that supports master-slave replication over a wide area.

Cassandra is a column-based store developed and used by Facebook, with reads within milliseconds.

Spanner supports a semi-relational data model and provides high performance, a high-level interface, general-purpose transactions and external consistency (using GPS and atomic clocks with a new concept of time leases: TrueTime). Spanner also integrates concurrency control with replication. The main contribution of the paper is that the system solves the problem of wide-area replication and that it implements globally synchronized timestamps (supporting strong consistency and linearizability for writes and snapshot isolation for reads).

Good: TrueTime. Interleaving data. Atomic schema change. Snapshot reads in the past.

Weak: Possible clock uncertainty. Paxos groups are not reconfigurable. Read-only transactions use a trivial solution for executing reads (when a read spans several Paxos groups, Spanner does not communicate among these groups and simply applies the latest timestamp to the read).

Typical reads take around 10 ms and writes average 100 ms.
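To make the TrueTime idea and the "clock uncertainty" weakness a bit more concrete, here is a minimal toy sketch of an interval clock with a commit wait. It is not Spanner's real API: the class `TrueTimeSketch`, the injected `clock_ms` and `sleep_ms` callables, and the fixed uncertainty bound `epsilon_ms` are all assumptions made up for illustration. The only idea taken from the Spanner paper is that the clock returns an interval rather than a point, and that a writer waits until its commit timestamp is definitely in the past.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: int  # lower bound on the true time, in ms
    latest: int    # upper bound on the true time, in ms


class TrueTimeSketch:
    """Toy stand-in for a TrueTime-like clock (hypothetical, not Spanner's API)."""

    def __init__(self, clock_ms, epsilon_ms):
        self._clock_ms = clock_ms      # callable returning local time in ms
        self._epsilon_ms = epsilon_ms  # assumed bound on clock uncertainty

    def now(self):
        t = self._clock_ms()
        return TTInterval(t - self._epsilon_ms, t + self._epsilon_ms)

    def after(self, ts):
        # True only when ts has definitely passed on every clock.
        return self.now().earliest > ts


def commit(tt, sleep_ms):
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    ts = tt.now().latest
    while not tt.after(ts):
        sleep_ms(1)  # commit wait: lasts roughly 2 * epsilon
    return ts
```

The wait is roughly twice the uncertainty bound, so in this sketch a larger clock uncertainty translates directly into slower writes.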

Which characteristics can be sacrificed in order to reach specific goals? The answer is: the system should be adapted as much as possible to the needs. It is another matter when you are actually chasing low latencies… Most probably only a rare DB will fit your requirements…

If it is not at least 90% well suited, let the fun part start -> do it yourself 🙂 Like me:))))
