Survey on Decentralized Storage Systems used by Volunteer Computing

The full paper can be found here.

Abstract

Volunteer Computing (VC) has recently become popular as a source of resources for large-scale computational problems in scientific research. Centralized control is the main bottleneck of VC systems, since it introduces asymmetry into the system. Several systems, e.g. SETI@home, already benefit from VC in terms of CPU sharing. The other side of the problem is storage sharing, where volunteer nodes dedicate their free storage space to the system.

Introducing a decentralized storage system suited to VC is appropriate mainly to reduce the overhead of the centralized part of VC and to provide a highly available, eventually consistent storage system at minimal cost.

The focus of this paper is to thoroughly survey currently existing decentralized storage systems, evaluate them according to their suitability for VC systems, and propose a state-of-the-art solution to this issue.

Introduction

The basic goals of a peer-to-peer (P2P) system are decentralization, reduced cost and fault tolerance. At the same time, P2P systems provide inherent scalability and availability of resources.

The main design issues of a P2P file system are the following:

  • Symmetry. Roles among the peers should be equally distributed.
  • Decentralization. P2P systems are decentralized by nature; hence they can support distributed storage, processing, information sharing, etc.
  • Robustness. The system should be resilient to the removal and failure of nodes at any moment.
  • Fast Resource Location. An efficient mechanism for locating resources is essential.
  • Load Balancing. The system should distribute resources optimally based on each node's capability and availability.
  • Churn Protection. The system should withstand denial-of-service attacks.
  • Anonymity, Security. These two properties are needed to ensure resistance to censorship and protection against attacks.
  • Scalability. Supporting millions of users is essential for decentralized storage systems.

How to achieve all of these goals remains an open question: current solutions support only some of the properties above, sacrificing the others.

The most popular techniques used by large-scale systems to achieve these properties are the following:

  • Consistent Hashing. In consistent hashing, the output range of a hash function is treated as a fixed circular space or “ring”. Each node is assigned a random value within this space. Each data item identified by a key is assigned to a node by hashing the item's key to yield its position on the ring, and then walking the ring clockwise to the first node whose position is larger than the item's position.
  • Active or passive replication. In active replication, each client request is processed by all servers. In passive replication, only one server (called the primary) processes client requests.
  • Gossip-based protocol for failure handling. Information is spread in a gossip (epidemic, or “virus”) fashion: each node repeatedly forwards what it knows to randomly chosen destinations.
  • Logging of read/write operations. The main function of logging is to record all changes made during the object's lifetime, i.e. the reads and writes performed by all parties.
  • Ring locality for load balancing. This technique is applied to deal with non-uniform data and load distribution and with heterogeneous node performance.
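The consistent-hashing lookup described in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any surveyed system: node positions are derived by hashing node names (instead of drawing random values) so the example is reproducible, MD5 and a 32-bit ring are arbitrary choices, and `ring_position` and `ConsistentHashRing` are hypothetical names.

```python
import bisect
import hashlib


def ring_position(key, ring_bits=32):
    """Hash a string onto the circular space [0, 2**ring_bits)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ring_bits)


class ConsistentHashRing:
    """Hypothetical toy ring: hashed node names stand in for random positions."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions of the nodes
        self._nodes = []      # node name stored at the same index as its position
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        pos = ring_position(node)
        idx = bisect.bisect_left(self._positions, pos)
        self._positions.insert(idx, pos)
        self._nodes.insert(idx, node)

    def remove_node(self, node):
        idx = self._nodes.index(node)
        del self._positions[idx]
        del self._nodes[idx]

    def node_for(self, key):
        """Walk clockwise to the first node whose position is larger than the key's."""
        if not self._nodes:
            raise ValueError("the ring has no nodes")
        idx = bisect.bisect_right(self._positions, ring_position(key))
        # Past the largest node position, wrap around to the start of the ring.
        return self._nodes[idx % len(self._nodes)]


if __name__ == "__main__":
    keys = [f"file-{i}" for i in range(1000)]
    ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")
    after = {k: ring.node_for(k) for k in keys}
    moved = [k for k in keys if before[k] != after[k]]
    print(f"{len(moved)} of {len(keys)} keys moved; all to node-d:",
          all(after[k] == "node-d" for k in moved))
```

Adding `node-d` only reassigns the keys that fall between its predecessor and itself on the ring; every other key keeps its node. This locality is what makes consistent hashing attractive when volunteer nodes join and leave frequently.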

Most of these techniques are found in Cassandra and Dynamo, and they are partly covered by other systems such as Ivy, Squirrel, Pastis, PAST, Riak, Voldemort, OceanStore and Farsite.
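The gossip-based failure handling listed among the techniques above can be illustrated with a gossip-style heartbeat failure detector. The sketch below is a textbook variant written for illustration only, not the protocol actually used by Cassandra, Dynamo or any other surveyed system. The class `GossipNode`, the function `simulate` and the parameters `fanout` and `fail_after` are hypothetical, and the network is assumed to run in synchronous rounds in which messages sent to crashed nodes are simply lost.

```python
import random

# Hypothetical sketch: names and parameters are illustrative, not from any surveyed system.


class GossipNode:
    """One member of a gossip-style heartbeat failure detector."""

    def __init__(self, name, peers, fail_after):
        self.name = name
        self.peers = list(peers)      # other members this node may gossip with
        self.fail_after = fail_after  # rounds without progress before suspicion
        # member name -> (highest heartbeat counter seen, local round it last grew)
        self.table = {name: (0, 0)}

    def tick(self, now):
        # A live node increments its own heartbeat once per round.
        counter, _ = self.table[self.name]
        self.table[self.name] = (counter + 1, now)

    def merge(self, remote_table, now):
        # Keep the larger counter; refresh the timestamp only when it grows.
        for member, (counter, _) in remote_table.items():
            known = self.table.get(member)
            if known is None or counter > known[0]:
                self.table[member] = (counter, now)

    def suspected(self, now):
        # Members whose heartbeat has not advanced for more than fail_after rounds.
        return {m for m, (_, seen) in self.table.items()
                if m != self.name and now - seen > self.fail_after}


def simulate(names, rounds, crashed=(), crash_round=0, fanout=2, fail_after=10, seed=0):
    """Push gossip in synchronous rounds; nodes in `crashed` stop at crash_round."""
    rng = random.Random(seed)
    nodes = {n: GossipNode(n, [p for p in names if p != n], fail_after) for n in names}
    for now in range(1, rounds + 1):
        alive = {n for n in names if not (n in crashed and now >= crash_round)}
        for name in names:
            if name not in alive:
                continue  # a crashed node neither ticks nor gossips
            node = nodes[name]
            node.tick(now)
            # Push the whole membership table to `fanout` randomly chosen peers.
            for target in rng.sample(node.peers, fanout):
                if target in alive:  # messages to crashed nodes are lost
                    nodes[target].merge(node.table, now)
    return nodes


if __name__ == "__main__":
    names = [f"n{i}" for i in range(6)]
    nodes = simulate(names, rounds=60, crashed={"n0"}, crash_round=20)
    for n in names[1:]:
        print(n, "suspects", sorted(nodes[n].suspected(60)))
```

Each live node contacts only `fanout` random peers per round, yet heartbeats typically reach every member within a few rounds. Once `n0` stops, its counter freezes and each survivor independently suspects it; `fail_after` trades detection speed against false suspicions of slow but live volunteers.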

So, what is volunteer computing (VC)?

VC uses idle resources available on the Internet and on intranets for computational and storage purposes, and it is worth exploring its many possible applications. One of the differences between VC and P2P systems is node behavior: analysis of real traces from the SETI@home project showed that clients contribute deliberately. SETI@home follows the typical model of a volunteer computing project: after users register to participate, an agent is installed on their machines, and all registered participants contribute their CPU time to an important computational problem, e.g. in biology or chemistry.

However, current VC systems are built on a client-server architecture, in which a central server assigns jobs to the machines contributed by volunteers. This centralized task distributor is an obvious bottleneck. Harvard's TONIC project proposed an improvement that reduces the impact of this bottleneck by splitting the central server into a central storage system and a lookup service. Nevertheless, a single point of failure remains.

Moreover, TONIC cannot contribute or share storage. For this reason, the P2P-Tuple solution would be more appropriate today.

Until now, VC has been popular mainly for CPU sharing. As the volume of data and knowledge grows rapidly, new approaches to storage are critically needed. One solution could be scalable decentralized storage systems used in volunteer computing.

The best way to demonstrate this is to evaluate scalable distributed systems experimentally on a distributed-systems testbed (e.g. PlanetLab). The main goals of the survey are to provide a new metric with which decentralized storage systems can be evaluated for VC use, and to evaluate how decentralized storage systems can be used in VC. A further extension of the work will propose and evaluate an ideal system that fits the storage needs of volunteer computing.

The rest of the paper is organized as follows. Section 2 introduces the basics of decentralized storage systems and volunteer computing. Section 3 reviews existing storage systems with respect to VC. Section 4 presents the results of storage-system experiments on a testbed (i.e. PlanetLab). Section 5 proposes and designs the most suitable decentralized storage system for VC. Finally, the survey ends with conclusions and a list of references.

Decentralized Storage Systems

Background

Existing Systems

In the quest to define the state-of-the-art decentralized storage system for volunteer computing, the following well-known distributed storage systems were thoroughly surveyed: FarSite, IVY, Overnet, PAST, PASTIS, Voldemort, OceanStore, Glacier, Total Recall, Cassandra, Riak, Dynamo, and Attic. The systems are listed in order from least useful to most useful. Two further systems, TFS and Squirrel, are not decentralized storage systems; however, their implementations and characteristics make them equally important for the survey. The survey was based on an analysis of the characteristics, implementation and architecture of each storage system. Taking into consideration their potential use in volunteer computing systems, each system was also evaluated in terms of its advantages, disadvantages and VC compatibility.

The Appendix contains an evaluation table that classifies each of the storage systems above according to its read/write access, replication management, symmetry, fault handling and security characteristics. Section IV further describes why these characteristics are necessary, how they could be implemented, and the challenges of implementing them in volunteer computing systems.

Reviews

Volunteer Computing

Background

Reviews

State-of-the-Art System Used by Volunteer Computing

Conclusions

Implementing a decentralized storage system (DSS) in the volunteer computing field may open the door to numerous research ideas in both the VC and DSS fields. Only a few of them were discussed in this paper, since the survey concentrated on the potential challenges of existing DSSs. The major characteristics of a state-of-the-art DSS for VC are read/write access, robustness, security and scalability. In addition to the survey evaluation of existing DSSs, the use of incentives in both VC and DSS was discussed.

decentralized_storage_systems.txt · Last modified: 2013/05/30 13:11 by julia
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 3.0 Unported