Characteristics

The ideal distributed storage system should be designed with volunteer computing members in mind as end users. Given the right incentives for both the volunteer computing members and the storage nodes, the following assumptions can safely be made:

  • Storage nodes in this system are relatively more trustworthy than regular P2P storage nodes.
  • Not only are the storage nodes trustworthy, but they are also committed. We therefore expect a lower churn rate among the nodes, and the system should not need to account for the worst-case scenario of nodes dropping out, as mentioned in [glacier].

By evaluating the successful deployments of other P2P storage systems in our survey, we defined the following characteristics for an ideal distributed storage system: read/write access, replication and fault tolerance, and symmetry and availability. Incentives are also discussed briefly, since they strongly influence the behaviour of any distributed system.

Read and Write Access: The need for read and write access in a distributed storage system depends entirely on how frequently files are updated. Implementing file updates and keeping all replicas consistent across the distributed system creates the issues mentioned in [pastis]. An alternative, mentioned in [anyone has ideas for this?], is to treat an updated file as entirely different from its older versions and add it to the system as a new file. This method is more appropriate for distributed systems whose nodes do not update data frequently. A distributed storage system can also be implemented as archival storage for data computed by the volunteer computing system; storage systems designed as archives are mentioned in [glacier, past, anything else]. File updates from the volunteer computing nodes, such as re-submissions or corrections of computational data, should be rare. We expect more uploads of new computational data than corrections of earlier work. However, we should not assume that volunteer computers will only upload computational data. Therefore the ratio of read access to write access in our system should be about 7:3. This type of R/W access is very common in other distributed storage systems, such as [farsite].
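The idea of storing an updated file as a new file rather than modifying it in place can be illustrated with a small sketch. This is not taken from any of the surveyed systems; the class name VersionedStore and its methods are hypothetical, and a single in-memory dictionary stands in for the distributed store. Because existing objects are never overwritten, replicas of an object cannot diverge, which sidesteps the consistency problem described above at the cost of extra storage.

```python
import hashlib


class VersionedStore:
    """Hypothetical write-once store: an update adds a new object, never overwrites."""

    def __init__(self):
        self._objects = {}   # content hash -> bytes (immutable once written)
        self._versions = {}  # logical file name -> list of content hashes, oldest first

    def put(self, name, data):
        """Store data as a new version of `name` and return its content hash."""
        key = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(key, data)
        self._versions.setdefault(name, []).append(key)
        return key

    def get(self, name, version=-1):
        """Return the requested version of `name` (the latest by default)."""
        return self._objects[self._versions[name][version]]

    def history(self, name):
        """Return the content hashes of all versions of `name`, oldest first."""
        return list(self._versions[name])
```

A corrected result is then simply a second put under the same name; older versions stay readable, which also suits the archival use described above.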

Fault Tolerance & Replication Techniques: Given the assumptions above, extremely high fault tolerance is not a necessity in the design of a distributed storage system for volunteer computing. Tolerance of extremely high churn is unnecessary, but a moderate level of smart replication and fault tolerance remains important. The replication technique should have low memory and resource usage and should implement locality-based load balancing. Erasure coding and Byzantine fault-tolerant (BFT) algorithms provide a certain amount of fault tolerance, load balancing and lower memory usage, and they have been implemented in [farsite, glacier, oceanstore, voldemort]. Replication techniques based on popularity, access rate and file type are used in [squirrel, overnet, totalrecall]. The ideal distributed storage system should use a combination of BFT, erasure coding, and popularity- and locality-based replication techniques.
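To make the storage saving of erasure coding concrete, the following sketch uses the simplest possible code: k data fragments plus a single XOR parity fragment. This is an illustrative assumption, not the scheme used by the cited systems, which typically rely on stronger codes that survive several simultaneous losses. The function names encode and recover are hypothetical.

```python
def encode(data: bytes, k: int):
    """Split data into k equal-length fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = bytes(frag_len)  # all-zero fragment to start the XOR
    for f in frags:
        parity = bytes(a ^ b for a, b in zip(parity, f))
    return frags + [parity]


def recover(frags, original_len):
    """Rebuild the data when at most one fragment (marked None) is missing."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one lost fragment")
    frags = list(frags)
    if missing:
        frag_len = len(next(f for f in frags if f is not None))
        rebuilt = bytes(frag_len)
        # XOR of all surviving fragments equals the missing one.
        for f in frags:
            if f is not None:
                rebuilt = bytes(a ^ b for a, b in zip(rebuilt, f))
        frags[missing[0]] = rebuilt
    # Drop the parity fragment and the zero padding.
    return b"".join(frags[:-1])[:original_len]
```

With k = 4 the fragments occupy 1.25 times the original size and survive the loss of any one node, whereas one full extra replica gives the same single-failure tolerance at 2 times the size. Under the low-churn assumption made earlier, such a modest level of redundancy may be adequate.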

Availability and Symmetry: We define data availability as data localization, that is, the ease of locating and acquiring data when necessary. Data availability and the symmetry of the system therefore go hand in hand. In a distributed storage system, symmetry is not necessary, as shown in systems such as [farsite, total recall, riak, oceanstore]. Availability is achieved through replication techniques based on geographic locality, such as those mentioned in [ivy]. The majority of the volunteers are located in the USA, Germany, the UK and Canada, so geographic-locality-aware traffic balancing is necessary. The use of BitTorrent-like swarming techniques for data serving and load management was also mentioned in [attic].
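Geographic-locality-aware traffic balancing can be sketched as a replica selection rule. The version below is a minimal illustration under stated assumptions: each replica is tagged with a country code, and a hypothetical load counter stands in for whatever load metric a real system would track. The names Replica and pick_replica are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    node_id: str
    country: str  # e.g. "US", "DE", "UK", "CA"
    load: int     # requests currently being served (hypothetical metric)


def pick_replica(replicas, client_country):
    """Prefer replicas in the client's country; among candidates pick the least loaded."""
    if not replicas:
        raise ValueError("no replica holds this file")
    local = [r for r in replicas if r.country == client_country]
    candidates = local or replicas  # fall back to all replicas if none is local
    return min(candidates, key=lambda r: r.load)
```

A client in Germany would be served by the least loaded German replica if one exists, and by the least loaded replica anywhere otherwise. A real deployment would likely use network distance rather than country labels.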

Incentives: Using incentives in the distributed storage system, as is done for the volunteers in the volunteer computing system, is an effective way to increase and maintain the number of storage nodes. Incentives based on a credit system, in which individuals are granted credits and certificates for their contribution of computing power, have proven very effective in systems like SETI@home. These types of incentives encourage users to compete with each other on credits, and thus to contribute more to the computing system. Similarly, credits and certificates based on the amount of storage shared and the time a node is available in the system would contribute towards a highly available distributed storage system. A government tax break for both volunteer computing members and storage node members could also be considered. There are over 1 million volunteer computing nodes, and they are relatively concentrated geographically: four countries lead the world in their number of volunteers. Storage nodes will therefore probably be located in these countries as well. Collaboration with these countries’ governments would greatly increase the number of storage nodes in those geographic locations, thus contributing towards locality-based load management.

decentralized_storage_systems/soacharacteristics.txt · Last modified: 2012/04/25 01:36 by julia
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 3.0 Unported