Jan 15, 2013

My third day at my thesis host company – NASDAQ OMX, Stockholm.

Big eyes.

Happy face.

Hands ready to code.

This is the spirit!

I feel like I will have 5 crazy months of work, with an amazing experience and hands/head ready to handle any problem.

Finally, getting down to the thing I’m working on and the thesis topic:

GENIUM DataStore: Distributed Data Store and Data Processing System.

The system has to meet very strict requirements for extremely low latency, high availability and scalability. When I talk about the low latency requirement, I mean a latency of 100 microseconds per operation and 250000 transactions per second, which for now sounds quite insane :). Two main paradigms will be supported: storing and processing data. The storage part will be a kind of integration layer: through its API, information can be stored in the DBs already used internally, both relational and key-value. The processing part is similar to a stream processing system and will be integrated with the storage part.
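Just to get a feel for those two numbers, here is a back-of-the-envelope sketch. It assumes (my assumption, only for illustration) that 100 microseconds is the end-to-end latency of a single operation and that 250000 operations have to be sustained every second. The function names are made up for this sketch and have nothing to do with the real system.

```python
# Back-of-the-envelope check of the latency and throughput targets.
# Assumptions (for illustration only): 100 us is the end-to-end latency of one
# operation, and 250000 operations must be sustained every second.

LATENCY_US = 100               # target latency per operation, in microseconds
TARGET_OPS_PER_SEC = 250_000   # target throughput
US_PER_SEC = 1_000_000


def serial_ops_per_sec(latency_us: int) -> int:
    """Operations per second if they are handled strictly one after another."""
    return US_PER_SEC // latency_us


def ops_in_flight(ops_per_sec: int, latency_us: int) -> float:
    """Little's law: average operations in progress = arrival rate * latency."""
    return ops_per_sec * latency_us / US_PER_SEC


if __name__ == "__main__":
    print(serial_ops_per_sec(LATENCY_US))                 # 10000
    print(ops_in_flight(TARGET_OPS_PER_SEC, LATENCY_US))  # 25.0
```

Under these assumptions, a strictly serial path tops out at about 10000 operations per second, so reaching the throughput target means roughly 25 operations in progress at any moment (for example through pipelining, batching or partitioning), or a per-operation latency far below 100 microseconds.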

Building a system with high requirements on consistency (C) and fault tolerance (FT) is supposed to be a pain in the rear part of the body, but using the internally developed libraries for C and FT support will make my life much easier. However, the performance goals can be a real challenge.

The main goal is not to build a system with a variety of features that is an “almost stable” version, but one with fewer features that is a stable release. Another side of the work here is that whatever you build has to be well thought out, because the data is critical and mistakes carry great responsibility. Hooorayyy…

Thesis structure. Preliminary thoughts.

I have also been thinking about the structure of the final document already, and it is actually quite hard to come up with one at this point. I’m sure my vision will still change, but here it is:

  • Abstract (Cap. Obvious :D)
  • Introduction
      • Structure of the Document
  • Background and Related Work
      • NASDAQ INET (just a few words to give at least a rough picture of the base system for GENIUM DS)
      • Distributed Data Storage Systems
      • Distributed Data Stream Processing Systems
  • GENIUM DataStore
      • NASDAQ OMX Domain (what NASDAQ is, what data is used, volume of the data to be processed)
      • Architecture (and reasoning for such an architecture)
      • Fault Resilience
      • No idea what else… but there should definitely be something more
  • Implementation Details (I’m sure this part will be necessary, as coding will be my main occupation these days :))
      • Failure Scenarios
  • Experimental Results
      • Set Up
      • Scalability and Performance
  • Discussion
      • Main findings
      • Low Latency
      • Positioning??? (can’t find a more suitable word… will think about it… one day :))
      • Comparison to other existing systems (if possible)
      • Future Work
  • Conclusions
  • References

Let’s see how it looks in 1 month :)

Finally, I hope not to become the turtle below, and to behave :)

Julia is not a turtle 🙂