Mar 042013
 

It is quite a preliminary version of the problem description, i.e. motivation.

Again, any comments are more than welcome 🙂

Problem description

There are many existing distributed systems (DS) which are focused on optimization of the various systems properties, e.g. availability, robustness, consistency. Designing of a distributed data storage and data processing system for real time stock exchange environment is quite challenging and should meet strict SLA requirements. Current general purpose solutions are eager to sacrifice some properties in order to achieve great improvements in the other ones. Moreover, none of them leverages a uniform reliable total order multicast properties [] to supply fault-tolerant and ACID properties for the data operations. (Here a few paragraphs with some basic classification of DSS and their solution focus).

However, despite algorithmic advancements in total order broadcast and the developments of distributed database replication techniques based on it, limited research on applying these algorithms for large-scale data storage and data processing systems exists. (Here are a few sentences about total order algorithms and its application). Limited application in the real-time large-scale systems might be due to the previous scalability issues of the messaging systems, which was limited to the messaging bus capacity.

We are proposing a system, based on the NASDAQ OMX low latency uniform reliable totally ordered message bus, which is highly scalable, as the capacity of the message bus exceeds 2 million messages per second, available, and consistent. This messaging abstraction interprets unordered incoming stream of data into an ordered sequence of operation which are backed up by rewinders and therefore message gap-filling mechanism automatically supported and served by them. An ordered stream of data is published on the, so called, “message stream” and is seen by everyone on the stream. Based on this message bus, optimistic delivery can be assumed. In other words, an early indication of the estimated uniform total order is preserved and it is guaranteed to commit eventually all messages in the same order to all subscribed servers.

The main focus of this work is the leverage of reliable total order multicast protocol for building real time, fault-tolerant, ACID and low-latency distributed data store system. The major difficulty is to be able to guarantee fault-tolerance, availability for the system and ACID properties for the data operations. Moreover, supporting system in real time is challenging and maintaining distributed read queries and concurrent updates is no straightforward endeavor. To reach the performance goals, the following approach is applied:

  • Scalability: Adding extra instances on the stream is very easy. Therefore, the only thing that is required is to declare schemas and tables that are served by the data store.
  • Availability: Ability to serve request at any given time is provided for both simple operations and queries. First, capacity of the message bus can handle simple operations without extra tweaks. Second, read queries responses are sent directly to the requester and are served by the fastest data replica.
  • Consistency: As the underlying message passing abstraction produces a uniform reliable totally ordered stream of requests, each instance sees exactly the same sequence of messages. This gives a consistent view by any instance at any request time. Similarly for concurrent updates, totally ordered timestamps per update are used, hence timestamp concurrency control [] is deployed.
  • Fault-Resilience: As absolutely equal stream of requests are received by any of the replica, this way, failure of any instance during simple operations is not important. Failure of the data store during the query serving is handled by the simple snapshot indication message on the message stream. This way the query can be requested again from the fracture place.
  • Read Query Support: In order to increase the availability level, limitation on the query response is set. If the extension of the response is required, the query should be submitted again.

 

Mar 022013
 

I think it is kind of time to start working on the report draft 🙂

Here is first version of an abstract for my project report. Any commects are more that welcome!

Abstract

In recent years the need for distributed, fault-tolerant, ACID and low latency data storage and data processing systems has led the way for new systems in the area of distributed systems. The growth of unbounded streams of data and the need to process them with low latency are some of the reasons for such interest in this area. At the same time, it was discussed that a total order algorithms is a fundamental building block in construction of a distributed fault-tolerant applications.

In this work, we are leveraging NASDAQ OMX low-latency uniform reliable totally ordered message bus with a capacity of 2 million messages per second. The ACID properties of the data operations are easily implemented using the messaging bus as it forwards all transactions in reliable total order fashion. Moreover, relying on the reliable totally ordered messaging, active replication support for fault handling and load balancing is integrated. Consequently, the prototype was developed using requirements from a production environment to demonstrate its feasibility.

Experimental results show that around 250 000 operations per second can be served with 100 microseconds latency. Queries response capacity is 100 Mbps. It was concluded that uniform totally ordered sequenced input data can be used in real time for large-scale distributed data storage and processing systems to provide availability, consistency and high performance.

Feb 182013
 

Any suggestion on classification and extra Distributed Data Stores to review are more than welcome 🙂

Recently there has been increasing interest in NOSQL data storage to meet the highly intense demand of the applications. Representative work includes Bigtable, Cassandra and Yahoo PNUTS. In these systems, scalability is achieved by sacrificing some properties, e.g. transactions support. On the other side, most prevailing data storage systems use asynchronous replication schemes with a  weaker consistency model, e.g., Cassandra, HBase, CouchDB and Dynamo use an  eventual consistency model. Conventional database systems provide mature and sophisticated data management features, ut have difficulties is serving large-scale interactive applications. Open source database systems such as MySQL do not scale up to required levels, while expensive commercial database systems like Oracle significantly increase the total cost of ownership in large deployments. Moreover, neither of them offer fault-tolerant synchronous replication mechanism which is the key piece to build robust applications.

Cassandra

  Review follows in the next posts… Some info here

Yahoo PNUTS

  Review follows in the next posts…

Combining the merit from both scalable data stores and databases, Genium Data Store (GDS) provides ACID guarantees with high scalability, fault-tolerance, consistency and availability. However in case of GDS, wide-area network semantic is not taken into account, as the range of applications, that will use GDS, do not require wide-area replication.

To guarantee consistency a few systems use Paxos to achieve synchronous replication, e.g. SCALARIS, Keyspace, Megastore.

Megastore

  Review follows in the next posts…

SCALARIS

  Review follows in the next posts…

Keyspace

  Review follows in the next posts…

In a chase for latency, MySQL Cluster is the one that can meet our requirements, however …. (should be something) 🙂

MySQL Cluster

  Review follows in the next posts…

Redis, ElasticSearch, Spanner, BlinkDB, God ….

 

Also I was thinking on the following classification:

  • Wide-Area Deployment. Those which are trying to solve wide-range synchronization
  • Short-Area Deployment. This is opposite to the above one.
  • Chase for latency

The main reason for this classification is that my project is not concerned about wide-area deployment.