Introduction

WHY?

In current scientific volunteer computing infrastructures, such as BOINC and XtremWeb, data is distributed centrally from a project's coordinating nodes or servers. In BOINC, this is achieved through a set of HTTP mirrors, each providing clients with full copies of the input files. Similarly, in XtremWeb, clients are given the URIs of input files. These centralized systems require projects not only to have the network capacity to provide data to all volunteers, but also to keep data readily available and persistent on their servers at all times to fulfill client requests. Further, the network throughput required to serve so many client machines can prove an impediment to projects wishing to explore new data-intensive application scenarios that are currently prohibitive in terms of their large data transfer needs. A viable alternative to such centralized systems is to employ peer-to-peer (P2P) techniques for data distribution.

GOALS?

  • Offload central network load. P2P data storage techniques can be used to introduce a new kind of data distribution system for volunteer computing projects, one that takes advantage of volunteer-side network capabilities.
  • Scalability needs to take into account not only network bandwidth, but also the potential sizes of data relative to the data and job throughput of a particular volunteer computing (VC) project, and their distribution over time.
  • Security in these systems goes beyond the traditional notion of simply ensuring authentication or file integrity. Because volunteer networks are volatile and insecure, a product of their open participation policies, there can be reason to enforce limitations on which nodes are allowed to distribute and cache data. By opening the data distribution channels to public participation, security becomes a larger concern for projects that previously relied on centrally managed servers. Security in volunteer computing systems can be roughly split into user security and data security. It is important to support both data integrity and reliability, whilst also providing safeguards that limit a peer node's exposure to malicious attacks.
    • User Security: any P2P data distribution scheme must allow users to opt out if they do not wish to share their bandwidth or storage capacity.
    • Data Security: identify security schemes and policies, and determine how to apply them to volunteer networks when selecting peers and distributing data-sets to them (a minimal integrity-check sketch follows this list).
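
As a minimal illustration of the data-security goal, the following sketch (in Python, with hypothetical names, and with a plain SHA-256 digest standing in for whatever scheme a real project would adopt) shows a volunteer node refusing to cache a file whose digest does not match a project-published manifest:

  import hashlib

  def accept_for_caching(file_id: str, data: bytes, manifest: dict) -> bool:
      """A volunteer node only caches and serves data whose digest matches
      the manifest published by the project's central server, so corrupted
      or malicious copies are dropped before they can spread."""
      expected = manifest.get(file_id)
      return expected is not None and hashlib.sha256(data).hexdigest() == expected

  # Example: the project publishes a digest; a tampered copy is rejected.
  good = b"genuine workunit input"
  manifest = {"wu-42.in": hashlib.sha256(good).hexdigest()}
  assert accept_for_caching("wu-42.in", good, manifest)
  assert not accept_for_caching("wu-42.in", b"tampered bytes", manifest)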

HOW?

There are many ways this could be implemented, ranging from a BitTorrent-style network, where data is centrally tracked and all participants share relatively equal loads, to KaZaa-like super-peer networks, where select nodes are assigned greater responsibility in the network.

However, applying a traditional P2P network infrastructure to scientific computing, and to volunteer computing in particular, can be highly problematic. In such environments, the critical concerns are less about technical feasibility than about the policies and safeguards needed to protect scientific data and to limit resource consumption on users' computers.

A tailor-made solution that takes into account the requirements of scientific communities, as opposed to a generic overarching P2P architecture, would have the advantage of facilitating different network topologies and data distribution algorithms, whilst retaining the safety of each participant's computer. Further, each scientific application has different network and data needs, and customized solutions would allow tailoring the network towards individual requirements, although with the disadvantage of increased development effort, complexity, and code maintenance.

One example is ADICS, a customizable and brokered Peer-to-Peer Architecture for Data-Intensive Cycle Sharing that allows fine-grained provisioning of resources and the application of project-based roles to network participants. Specifically, ADICS provides a brokered P2P system that offloads central network needs while limiting client exposure to foreign hosts. The brokered network overlay introduced in ADICS acts as a buffer, in the form of a select group of trusted data-sharing nodes, between the scheduler and the clients.
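
To make the brokered flow concrete, here is a minimal sketch (again in Python, with entirely hypothetical names that do not reflect ADICS's actual interfaces) of a client obtaining an input file through a broker that only ever exposes vetted data-sharing nodes:

  import random
  from dataclasses import dataclass, field

  @dataclass
  class DataNode:
      """A project-approved node trusted to cache and serve input files."""
      node_id: str
      cached: dict = field(default_factory=dict)  # file_id -> payload bytes

  @dataclass
  class Broker:
      """Buffer between the project scheduler and volunteer clients: clients
      never contact arbitrary peers; the broker picks a trusted node."""
      trusted_nodes: list

      def locate(self, file_id: str) -> DataNode:
          holders = [n for n in self.trusted_nodes if file_id in n.cached]
          if not holders:
              raise KeyError(f"no trusted node currently caches {file_id}")
          return random.choice(holders)  # naive load spreading across holders

  def client_fetch(broker: Broker, file_id: str) -> bytes:
      node = broker.locate(file_id)  # the client only learns of vetted hosts
      return node.cached[file_id]    # stands in for the actual data transfer

  # Example: the scheduler seeds one trusted node, then a client fetches.
  node = DataNode("dn-01", {"wu-42.in": b"workunit input bytes"})
  broker = Broker([node])
  assert client_fetch(broker, "wu-42.in") == b"workunit input bytes"

Combined with the integrity check sketched under the goals above, a client accepts data only from broker-selected nodes, and only when digests match.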
