Mar 042013

It is quite a preliminary version of the problem description, i.e. motivation.

Again, any comments are more than welcome 🙂

Problem description

There are many existing distributed systems (DS) which are focused on optimization of the various systems properties, e.g. availability, robustness, consistency. Designing of a distributed data storage and data processing system for real time stock exchange environment is quite challenging and should meet strict SLA requirements. Current general purpose solutions are eager to sacrifice some properties in order to achieve great improvements in the other ones. Moreover, none of them leverages a uniform reliable total order multicast properties [] to supply fault-tolerant and ACID properties for the data operations. (Here a few paragraphs with some basic classification of DSS and their solution focus).

However, despite algorithmic advancements in total order broadcast and the developments of distributed database replication techniques based on it, limited research on applying these algorithms for large-scale data storage and data processing systems exists. (Here are a few sentences about total order algorithms and its application). Limited application in the real-time large-scale systems might be due to the previous scalability issues of the messaging systems, which was limited to the messaging bus capacity.

We are proposing a system, based on the NASDAQ OMX low latency uniform reliable totally ordered message bus, which is highly scalable, as the capacity of the message bus exceeds 2 million messages per second, available, and consistent. This messaging abstraction interprets unordered incoming stream of data into an ordered sequence of operation which are backed up by rewinders and therefore message gap-filling mechanism automatically supported and served by them. An ordered stream of data is published on the, so called, “message stream” and is seen by everyone on the stream. Based on this message bus, optimistic delivery can be assumed. In other words, an early indication of the estimated uniform total order is preserved and it is guaranteed to commit eventually all messages in the same order to all subscribed servers.

The main focus of this work is the leverage of reliable total order multicast protocol for building real time, fault-tolerant, ACID and low-latency distributed data store system. The major difficulty is to be able to guarantee fault-tolerance, availability for the system and ACID properties for the data operations. Moreover, supporting system in real time is challenging and maintaining distributed read queries and concurrent updates is no straightforward endeavor. To reach the performance goals, the following approach is applied:

  • Scalability: Adding extra instances on the stream is very easy. Therefore, the only thing that is required is to declare schemas and tables that are served by the data store.
  • Availability: Ability to serve request at any given time is provided for both simple operations and queries. First, capacity of the message bus can handle simple operations without extra tweaks. Second, read queries responses are sent directly to the requester and are served by the fastest data replica.
  • Consistency: As the underlying message passing abstraction produces a uniform reliable totally ordered stream of requests, each instance sees exactly the same sequence of messages. This gives a consistent view by any instance at any request time. Similarly for concurrent updates, totally ordered timestamps per update are used, hence timestamp concurrency control [] is deployed.
  • Fault-Resilience: As absolutely equal stream of requests are received by any of the replica, this way, failure of any instance during simple operations is not important. Failure of the data store during the query serving is handled by the simple snapshot indication message on the message stream. This way the query can be requested again from the fracture place.
  • Read Query Support: In order to increase the availability level, limitation on the query response is set. If the extension of the response is required, the query should be submitted again.