Dec 20, 2016

Today I got lucky twice and had two papers accepted to one of the best conferences in my field: WWW’17.

Here I would like to talk about THE one that I worked on the hardest during the last 6 months and that magically got accepted on the first try.

In the paper we present a detailed analysis of what distinguishes a successful online petition from a failed one. We study the effects of social media and front-page promotion on a petition’s performance, and which models are best suited to describe how signature counts evolve over time.

Multidimensional time series have been the subject of intense research over the last decades. However, applying classical time-series techniques to online content is challenging, as web data tends to have data quality issues and is often incomplete, noisy, or poorly aligned. In this paper, we tackle the problem of predicting the evolution of a time series of user activity on the web in a manner that is both accurate and interpretable, using related time series to produce a more accurate prediction. We test our methods in the context of predicting signatures for online petitions, using data from thousands of petitions posted on The Petition Site, one of the largest platforms of its kind. We observe that the success of these petitions is driven by a number of external factors, including their promotion through social media channels and on the front page of the petitions platform. The interplay between these elements remains largely unexplored. The model we propose incorporates seasonality, aging effects, self-excitation, external shocks, and continuous effects. We were also careful to ensure that all model parameters have simple interpretations. We show through an extensive empirical evaluation that our model is significantly better at predicting the outcome of a petition than state-of-the-art techniques.
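To give a feeling for what "seasonality, aging effects and self-excitation" can mean in practice, here is a minimal sketch in Python. It is not the model from the paper: the functional forms (a sine wave for the daily cycle, exponential decay for aging, an exponentially decaying boost per past signature) and every parameter name and default value are assumptions picked purely for illustration. External shocks and continuous effects are left out.

```python
import math


def intensity(t, past_events, base=5.0, season_amp=0.5, aging_rate=0.1,
              excite=0.2, excite_decay=1.0):
    """Hypothetical signature intensity (signatures per hour) at time t.

    t is measured in hours since the petition was posted; past_events is
    a list of times (in hours) at which earlier signatures arrived.
    All parameters are illustrative assumptions, not fitted values.
    """
    # Seasonality: a 24-hour cycle that scales the base rate up and down.
    seasonal = 1.0 + season_amp * math.sin(2.0 * math.pi * t / 24.0)
    # Aging: interest in the petition fades exponentially with its age.
    aging = math.exp(-aging_rate * t)
    # Self-excitation: every earlier signature adds a short-lived boost.
    self_excitation = sum(
        excite * math.exp(-excite_decay * (t - s))
        for s in past_events
        if s < t
    )
    return base * seasonal * aging + self_excitation
```

With these assumptions, each parameter has a direct reading: `aging_rate` says how fast a petition goes stale, `excite` says how many follow-up signatures one signature triggers, and so on. That is the kind of interpretability the abstract refers to.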

In short, there are a few cool findings that are worth checking out:

  1. Social media (middle) seems to have a more prolonged impact on signature counts than self-excitation or the front-page effect. Moreover, prediction models are usually better at capturing signature decay than signature rise (left).
    [Figure: Influence function]
  2. Different kinds of petitions (successful, failed, front-page promoted) show different signature gain evolution: (1) failed petitions exhibit strong daily fluctuations while having the lowest intensity (1st, 2nd column); (2) front-page promoted petitions reach their peak signature counts during the first hours of the promotion (around day 2-3), which is similar to the peak counts of the failed petitions (3rd column); (3) most successful petitions do not acquire all their signatures during the first few days only, but also have the longest aging of popularity (4th column).
  3. Front-page promotion seems to have a strong effect on the speed at which signatures are acquired. However, we show that being already successful is not sufficient to be promoted, so the statement “already successful petitions are promoted on the front page” does not hold.
  4. People tweet about the petitions they sign, and their followers reciprocate by supporting those petitions. At the median, it takes about 15 minutes to tweet about a signature. In 26% of the cases with single human Twitter accounts, petitions are signed only after the user tweets about them, and about 30% of these signatures happen after a retweet.
  5. Both petition signatures and tweets exhibit a strong circadian pattern.
  6. It is hard to distinguish between potentially successful and failed petitions from the first 24 hours alone, since about 60% of the successful petitions have similar counterparts among the failed ones.
  7. We have collected data about multiple petitions: their metadata, signatures, tweets and front-page rankings. The data is very rich and there is still a lot to discover in it 🙂

Overall, it was a great experience to do research with such an awesome team! The camera-ready version will be attached soon 🙂

Jul 06, 2013


I am a master of science… for the second time, and now officially have three master's diplomas (Ukrainian, Spanish and Swedish)… A bit too much, but I will manage :)

Getting down to business, the abstract for the thesis is the following:

In recent years, the need for distributed data storage has led to the design of new systems for large-scale environments. The growth of unbounded streams of data and the necessity to store and analyze them in real time, reliably, scalably and fast are the reasons such systems have appeared in the financial sector, and at the stock exchange Nasdaq OMX in particular.

Furthermore, an internally designed totally ordered reliable message bus is used in Nasdaq OMX for almost all internal subsystems. Reliable totally ordered multicast has been studied extensively in academia, both theoretically and practically, and has been proven to serve as a fundamental building block for distributed fault-tolerant applications.

In this work, we leverage the Nasdaq OMX low-latency reliable totally ordered message bus, with a capacity of at least 2 million messages per second, to build a high-performance distributed data store. Consistency of data operations can be easily achieved by using the message bus, as it forwards all messages in reliable total order. Moreover, relying on the reliable totally ordered messaging, active in-memory replication support for fault tolerance and load balancing is integrated. The prototype was developed against production environment requirements to demonstrate its feasibility.
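The reason total order makes consistency "easy" can be shown with a toy example. The sketch below is my own illustration in Python, not code from the thesis or from the Nasdaq OMX bus: `Replica`, `deliver` and `sequence` are hypothetical names, and the "bus" is reduced to a function that stamps operations with consecutive sequence numbers. Each replica applies operations strictly in sequence order, so all replicas end up in the same state even if the network hands them messages in a different order.

```python
class Replica:
    """Hypothetical in-memory replica that applies operations in total order."""

    def __init__(self):
        self.store = {}      # the replicated key-value data
        self.next_seq = 0    # next sequence number this replica may apply
        self.pending = {}    # messages that arrived ahead of their turn

    def deliver(self, seq, op):
        """Accept a sequenced message; apply everything that is now in order."""
        self.pending[seq] = op
        while self.next_seq in self.pending:
            kind, key, value = self.pending.pop(self.next_seq)
            if kind == "put":
                self.store[key] = value
            elif kind == "delete":
                self.store.pop(key, None)
            self.next_seq += 1


def sequence(ops):
    """Stand-in for the message bus: stamp operations with 0, 1, 2, ..."""
    return list(enumerate(ops))
```

For example, if one replica receives the sequenced messages in order and another receives them reversed, both still finish with identical `store` contents, because neither applies message `n` before messages `0..n-1`. Active replication in this style needs no extra coordination between replicas beyond the ordering itself.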

Experimental results show great scalability and performance, serving around 400,000 insert operations per second over 6 data nodes with 100 microseconds latency. Latency for single-record read operations is bounded to under half a millisecond, while data ranges are retrieved with sub-100 Mbps capacity from one node. Moreover, performance improvements with a greater number of data store nodes are shown for both writes and reads. It is concluded that uniform totally ordered sequenced input data can be used in real time for large-scale distributed data storage to maintain strong consistency, fault tolerance and high performance.

The report is here. And the presentation can be found below: