Feb 18, 2013
 

Any suggestions on the classification and on additional distributed data stores to review are more than welcome 🙂

Recently there has been increasing interest in NoSQL data stores to meet the intense demands of applications. Representative work includes Bigtable, Cassandra and Yahoo PNUTS. In these systems, scalability is achieved by sacrificing some properties, e.g. transaction support. In addition, most prevailing data storage systems use asynchronous replication schemes with a weaker consistency model, e.g. Cassandra, HBase, CouchDB and Dynamo use an eventual consistency model. Conventional database systems provide mature and sophisticated data management features, but have difficulties in serving large-scale interactive applications. Open source database systems such as MySQL do not scale up to the required levels, while expensive commercial database systems like Oracle significantly increase the total cost of ownership in large deployments. Moreover, neither of them offers a fault-tolerant synchronous replication mechanism, which is a key piece for building robust applications.
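To make the asynchronous replication trade-off concrete, here is a minimal sketch. It is purely illustrative: the class and method names (`Replica`, `AsyncPrimary`, `flush`) are hypothetical and do not come from any of the systems mentioned above. The primary acknowledges a write before the replica has applied it, so a read from the replica can briefly return stale data and only converges later.

```python
from collections import deque


class Replica:
    """A follower that applies writes whenever they are delivered."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class AsyncPrimary:
    """Hypothetical primary that acknowledges writes before replicating them."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica
        self.pending = deque()  # writes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))  # replication happens later
        return "ack"  # the client is told the write succeeded right away

    def flush(self):
        """Deliver all queued writes; stands in for background replication."""
        while self.pending:
            self.replica.apply(*self.pending.popleft())


replica = Replica()
primary = AsyncPrimary(replica)

primary.write("x", 1)
stale = replica.read("x")      # the replica has not seen the write yet
primary.flush()
converged = replica.read("x")  # the replica has caught up
```

A synchronous scheme would instead hold back the acknowledgement until the replicas (or a majority of them) have applied the write, trading latency for consistency.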

Cassandra

  Review follows in the next posts… Some info here

Yahoo PNUTS

  Review follows in the next posts…

Combining the merits of both scalable data stores and databases, Genium Data Store (GDS) provides ACID guarantees with high scalability, fault tolerance, consistency and availability. However, GDS does not take wide-area network semantics into account, as the applications that will use GDS do not require wide-area replication.

To guarantee consistency, a few systems use Paxos to achieve synchronous replication, e.g. SCALARIS, Keyspace and Megastore.
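As a rough illustration of the idea, below is a minimal single-decree Paxos sketch. It is an assumption-laden toy, not how SCALARIS, Keyspace or Megastore implement it: everything runs sequentially in one process, there is no message loss or failure handling, and the names `Acceptor` and `propose` are hypothetical. It only shows the core rule that a value is chosen once a majority of acceptors has accepted it, and that later proposals must adopt that value.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0          # highest proposal number promised so far
        self.accepted = (0, None)  # (number, value) of the last accepted proposal

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return self.accepted
        return None  # rejected: a higher or equal number was already promised

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Try to get a value chosen with proposal number n (a positive integer).

    Returns the chosen value, or None if no majority was reached.
    """
    majority = len(acceptors) // 2 + 1

    # Phase 1: collect promises from the acceptors.
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None

    # If any acceptor already accepted a value, adopt the one with the
    # highest proposal number instead of our own value.
    prev_n, prev_value = max(promises, key=lambda p: p[0])
    if prev_n > 0:
        value = prev_value

    # Phase 2: ask the acceptors to accept the value.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "a")     # "a" is chosen
second = propose(acceptors, 2, "b")    # must adopt the already chosen "a"
outdated = propose(acceptors, 1, "c")  # rejected: number 1 is too old
```

Because a write only counts as successful after a majority has accepted it, a client never gets an acknowledgement for a value that a later proposer could overwrite, which is the property synchronous replication relies on.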

Megastore

  Review follows in the next posts…

SCALARIS

  Review follows in the next posts…

Keyspace

  Review follows in the next posts…

In the chase for low latency, MySQL Cluster is the one that can meet our requirements, however …. (should be something) 🙂

MySQL Cluster

  Review follows in the next posts…

Redis, ElasticSearch, Spanner, BlinkDB, God ….

 

I was also thinking about the following classification:

  • Wide-area deployment: systems that try to solve wide-area synchronization.
  • Short-area deployment: the opposite of the above, i.e. systems that do not target wide-area synchronization.
  • Chase for latency.

The main reason for this classification is that my project is not concerned with wide-area deployment.

  2 Responses to “Thesis Report: Background Draft, Distributed Data Store”

  1. Just a thought: Dynamo (and Bigtable) inspired Cassandra, so it may be worthwhile checking it out: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html glhf