Apache Cassandra

Cassandra - A Decentralized Structured Storage System

Cassandra

Cassandra is a distributed storage system for managing very large amount of structured data spread out across many servers with high availability and no single point of failure. The system doesn’t support a ful relational data model.

Related Works:

  • Farsite is the system without any centralized server.
  • GFS is another DFS built for hosting the state of Google’s internal applications. But it uses single master server for metadata. but somehow it’s fault tolerant by means of Chubby abstraction.
  • Dynamo allows read and write operations and resolves update conflicts. Dynamo uses Gossip based membership algorithm. It is actually structures overlay with at most one hop request routing. It detects updates conflicts using vector clock scheme.

Data Model:

DHT in Cassandra is a distributed multi dimensional map indexed by a key. The row key in a table is a string with no size restrictions. Every operation under a single row key is atomic per replica. Columns a re grouped into column families: simple and super column families. Columns could be sorted by time or by name.

API in Casandra consist of three simple methods: insert, get, delete.

Architecture:

Distributed system techniques used in the system:

  • Partitioning. It’s done by means of consistent hashing (order preserving hash function). As far as I understood in Cassandra, to deal with non-uniform data and load distribution and heterogeneity of nodes performance, developers applies nodes moving in the ring, according to their load.
  • Replication. Each data replicated at N hosts, where N is a replication factor per instance. Coordinator is in charge of replication. Replication policies: Rack Unaware, Rack Aware, Datacenter Aware. For the last two policies Cassandra uses Zookeeper.
  • Membership is based on Scuttlebutt, a very efficient anti-entropy Gossip based mechanism.
  • Failure handling. Cassandra is using a modified version of Ф Accrual Failure Detector with Exponential Distribution for inter-arrival time for other nodes gossip messages. Ф is a value which represent a suspicion level for each of monitored nodes. The more Ф the less likelihood % of mistake we will do in the future about its failing.
  • Bootstrapping. For the fault tolerance, the mapping is persisted to disk locally and also in Zookeeper. Then the token is gossiped around the cluster. When a node needs to join a cluster, it reads its configuration file with a few contact points (seeds) within the cluster.
  • Scaling. Each new node is assigned with a token to alleviate a heavily loaded nodes.
  • Read/write requests. Write is a write to into a commit log. Writing to in-memory is performed only after successful write to commit log. When in-memory is filled it dumps itself to disk. Read operation queries the in-memory data structure before looking into the file in disk. Lookup a key could be done through many data files. But here developers use bloom filter to summarize the keys in the file.

Advantages.

  • Linear scalability and fault-tolerance. No single point of failure. Map/reduce possible with Apache Hadoop.

Disadvantages.

  • Writes are faster than reads. Limitation for column and row size in the system. Cassandra is a bad fit for a large objects.

Volunteer Computing Extension.

  • The main feature of the system is to perform writes faster that reads. It could be a first barrier for VC application, as the main idea of storage systems in general to perform reads more that writes. From the other side, as the system is highly scalable, VC availability feature could be solved through replication techniques. Finally, fault-tolerance could be handled quite good even with gossip algorithm application. But, from the other point of view, Cassandra has limitation on the object size, hence, big object couldn’t be stored in the system. That’s why all files should be partitioned. Overall, due to good replication policy and fault-tolerance techniques the system could be quite good platform for future VC application.

Cassandra-by-Example

Cassandra:Replication and Consistency

Cassandra Open Source

Cassandra Distributed Database

Decentralized Storage System

decentralized_storage_systems/cassandra.txt · Last modified: 2012/04/24 16:19 by julia
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 3.0 Unported
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki