Long-term availability prediction for groups of volunteer resources

Review

Problem and Proposal

Volunteer computing (VC) relies on idle resources on the Internet and on intranets, so it is worth exploring how widely it can be applied. The authors of the paper focus mainly on predicting node availability. They evaluate their prediction methods on real availability traces gathered from hundreds of thousands of hosts in the SETI@home VC project.

One application of node availability prediction is deploying a service over a set of hosts. The service is available as long as the set of hosts is collectively available. In such systems, the goal is to keep the service available for an arbitrarily long period of time with minimum resources. Another application is data storage, where a data object is stored over distributed hosts and must be available for clients to retrieve at any time. In such a system the data should be replicated and stored on M hosts, at least one of which must be available at any given moment.
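The replication requirement above can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes that hosts fail independently and that each host has a known availability probability, which real volunteer hosts may well violate. The function names `collective_availability` and `min_replicas` are made up for this illustration.

```python
from math import prod


def collective_availability(probs):
    """Probability that at least one host is up.

    Assumption (not from the paper): hosts are independent and
    probs[i] is the probability that host i is available.
    """
    return 1.0 - prod(1.0 - p for p in probs)


def min_replicas(p, target):
    """Smallest M such that 1 - (1 - p)**M >= target.

    Assumes M identical, independent hosts with availability p.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    m = 1
    while 1.0 - (1.0 - p) ** m < target:
        m += 1
    return m


if __name__ == "__main__":
    # Two hosts, each up half of the time.
    print(collective_availability([0.5, 0.5]))
    # Replicas needed for 99% availability when each host is up half of the time.
    print(min_replicas(0.5, 0.99))
```

Under these assumptions, less available hosts drive M up quickly, which is why predicting which hosts will be up can reduce the redundancy needed.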

Most previous research on availability focused on short-term prediction. The prediction horizon was usually no greater than 4 hours, which is too short for the forecast to be useful in a real volunteer computing system. However, long-term prediction will be less accurate than short-term prediction, since the latter relies on more recent data.

Contributions of the paper:

  • Analyzed a large set of trace data extracted from a real system (SETI@home) to identify types of hosts.
  • Proposed a prediction tool.
  • Presented a method that applies the bit vector-based availability predictor to deploying a service. This method provides high availability with low redundancy.

Related work:

Previous work focused on predicting CPU availability. Some other work analyzed availability based on host load information.

Douceur studied the distribution of host availability.

Trace analysis:

The authors collected traces from SETI@home and studied them in terms of:

  • Availability in terms of interval lengths. They found a mean availability time of around 33 h, which made it clear that continuous availability cannot be achieved by relying on a single host.
  • Volatility of the resources in terms of the fraction of time the CPU was available. They found that residential machines are less available but contribute more.
  • Contribution to the system. They found that 90% of host availability gives only 30% of the available CPU.
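The first two metrics in the list can be sketched in a few lines. This is an illustration only, and it assumes a trace is a list of 0/1 samples taken at a fixed interval (for example one per hour); the actual SETI@home trace format is not described in this review, and the function names are made up here.

```python
from itertools import groupby


def availability_intervals(trace):
    """Lengths, in samples, of each maximal run of 1s (host available)."""
    return [sum(1 for _ in run) for bit, run in groupby(trace) if bit == 1]


def mean_interval_length(trace):
    """Mean length of the availability intervals, or 0.0 if there are none."""
    intervals = availability_intervals(trace)
    return sum(intervals) / len(intervals) if intervals else 0.0


def availability_fraction(trace):
    """Fraction of samples in which the host was available."""
    return sum(trace) / len(trace) if trace else 0.0


if __name__ == "__main__":
    trace = [1, 1, 0, 1, 0, 0, 1, 1, 1]  # hypothetical samples
    print(availability_intervals(trace))
    print(mean_interval_length(trace))
    print(availability_fraction(trace))
```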

After the trace analysis, the next step became clearer: find availability patterns of the nodes. Patterns in behavior were found, ranging from random nodes to nodes that are almost always available. The authors took a 168-bit availability vector for each node (one bit per hour of the week, 24 × 7) and processed it. Clustering turned out to be an effective technique for long-term availability prediction. Open questions remain about predicting seasonal availability and permanent failures.
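To make the 168-bit vector idea concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes an hourly 0/1 trace that starts at hour 0 of a week, folds it into a weekly vector by majority vote, and assigns the vector to the nearest of some given cluster centers by Hamming distance. The threshold, the distance measure and all function names are assumptions made for this illustration.

```python
HOURS_PER_WEEK = 24 * 7  # 168 bits, one per hour of the week


def weekly_bit_vector(hourly_trace, threshold=0.5):
    """Fold an hourly 0/1 trace into a 168-bit weekly vector.

    Bit h is 1 if the host was up in at least `threshold` of the
    observed weeks at hour-of-week h (assumed rule, not from the paper).
    """
    up = [0] * HOURS_PER_WEEK
    seen = [0] * HOURS_PER_WEEK
    for i, bit in enumerate(hourly_trace):
        slot = i % HOURS_PER_WEEK
        up[slot] += bit
        seen[slot] += 1
    return [
        1 if seen[h] and up[h] / seen[h] >= threshold else 0
        for h in range(HOURS_PER_WEEK)
    ]


def hamming(a, b):
    """Number of positions in which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_cluster(vector, centers):
    """Index of the cluster center closest to `vector` in Hamming distance."""
    return min(range(len(centers)), key=lambda k: hamming(vector, centers[k]))


if __name__ == "__main__":
    # Hypothetical host: up 09:00-17:00 on the first five days of the week.
    week = [1 if d < 5 and 9 <= h < 17 else 0 for d in range(7) for h in range(24)]
    vector = weekly_bit_vector(week * 3)  # three identical weeks of trace
    centers = [[0] * HOURS_PER_WEEK, [1] * HOURS_PER_WEEK]  # "mostly off", "always on"
    print(sum(vector), nearest_cluster(vector, centers))
```

A predictor built this way would answer "will this host be up at hour h of next week?" by reading bit h of the vector of the host or of its cluster.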

decentralized_storage_systems/avapredfor_vc.txt · Last modified: 2012/04/23 01:06 by julia
 