Scalable Storage for Data Intensive Computing

Review

Problem and Proposal

Cloud computing mainly needs a storage system that is scalable, elastic and fault-tolerant. Current cloud storage systems are mainly P2P-based. The authors developed a new file system, RFS, which organizes its metadata servers in a one-hop DHT, so the design has no single point of failure.

Major contributions of the authors:

  • A metadata storage architecture that provides fault tolerance, improved throughput and increased scalability
  • Studying the impact of the proposed design through analysis and simulation
  • Implementing and deploying RFS

Related Work:

Traditional Distributed File Systems

  • NFS
  • AFS

P2P-based Storage Systems

  • OceanStore: The goals of the system are to run on an untrusted infrastructure and to use aggressive, promiscuous caching. The system assumes that the infrastructure is untrusted, and data can be cached anywhere, anytime in order to provide faster access. OceanStore employs a Byzantine-fault-tolerant commit protocol to provide strong consistency across replicas. It stores each version of a data object in a permanent, read-only form. Objects in the system are identified by a globally unique identifier (GUID), which is a secure hash (SHA-1) of the owner's key and a human-readable name. OceanStore uses Tapestry, a scalable overlay network built on TCP/IP.
  • PAST is a large-scale, decentralized, persistent P2P storage system with high availability and scalability. Storage imbalances can arise between nodes in the system. PAST uses two schemes to deal with these imbalances: replica diversion and file diversion.
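The OceanStore identifier scheme described above can be illustrated with a minimal sketch. The function name `object_guid` and the plain concatenation of key and name are assumptions made for illustration; the review does not specify the exact encoding OceanStore feeds into SHA-1.

```python
import hashlib


def object_guid(owner_key: bytes, name: str) -> str:
    """Return a 160-bit identifier as 40 hex characters.

    Assumption: the owner's key and the readable name are simply
    concatenated before hashing. The real encoding may differ.
    """
    digest = hashlib.sha1()
    digest.update(owner_key)
    digest.update(name.encode("utf-8"))
    return digest.hexdigest()
```

Because the identifier depends only on the owner's key and the name, any node can recompute it without consulting a central directory, and two owners using the same name still obtain different identifiers.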

Cloud Storage Systems

  • Google File System
  • Dynamo: The system takes churn into account. Data is partitioned and replicated using consistent hashing. Consistency is provided through versioning. In addition, consistency among replicas during failures is maintained by a quorum-like replica synchronization protocol. Dynamo uses a gossip-based distributed failure detection and membership protocol.
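As a rough illustration of the Dynamo notes above, the following sketch places nodes on a hash ring and replicates each key on the next few nodes clockwise. The names `HashRing`, `preference_list` and `quorums_overlap` are hypothetical, SHA-1 is an arbitrary choice of hash, and the sketch leaves out virtual nodes, versioning and gossip. It is not Dynamo's actual implementation.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on the ring (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Consistent-hash ring with one point per node (no virtual nodes)."""

    def __init__(self, nodes):
        self._points = sorted((ring_position(n), n) for n in nodes)

    def preference_list(self, key: str, n_replicas: int):
        """Return the first n_replicas nodes clockwise from the key."""
        positions = [p for p, _ in self._points]
        start = bisect.bisect_right(positions, ring_position(key))
        count = min(n_replicas, len(self._points))
        total = len(self._points)
        return [self._points[(start + i) % total][1] for i in range(count)]


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum of size r meets every write quorum of size w."""
    return r + w > n
```

For example, `HashRing(["a", "b", "c", "d"]).preference_list("some-key", 3)` returns three distinct nodes, and `quorums_overlap(3, 2, 2)` holds because any two replicas read must include at least one of the two replicas written.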

RFS Design

The architecture consists of metaservers, chunkservers and clients. The metaservers are joined in a one-hop DHT. Chunkservers are grouped into multiple cells, and each cell communicates with one metaserver. Such a system can tolerate the failure of multiple metaservers and can handle a large number of files.
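The one-hop lookup among metaservers can be sketched as follows. In a one-hop DHT every member knows the full membership, so resolving a file path to its metaserver is a local computation followed by a single network hop. The class `MetaserverRing`, its methods, and the use of SHA-1 over the file path are assumptions for illustration only. The sketch also assumes that a failed metaserver's keys pass to its successor on the ring, and it does not model metadata replication or chunkserver cells.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a metaserver name or file path to a point on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class MetaserverRing:
    """Full membership table, as every node in a one-hop DHT would hold."""

    def __init__(self, metaservers):
        self._alive = set(metaservers)
        self._rebuild()

    def _rebuild(self):
        self._points = sorted((_position(m), m) for m in self._alive)

    def lookup(self, path: str) -> str:
        """Return the metaserver responsible for a path, without routing hops."""
        if not self._points:
            raise LookupError("no metaserver is alive")
        positions = [p for p, _ in self._points]
        index = bisect.bisect_right(positions, _position(path)) % len(self._points)
        return self._points[index][1]

    def fail(self, metaserver: str) -> None:
        """Drop a failed metaserver; its keys fall to the next one on the ring."""
        self._alive.discard(metaserver)
        self._rebuild()
```

After a call to `fail`, only the paths that belonged to the failed metaserver change owner. All other paths keep resolving to the same metaserver, which is one way a design of this kind can keep serving requests while several metaservers are down.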

decentralized_storage_systems/scalable_storage_for_data-intensive_computing.txt · Last modified: 2012/04/23 00:51 by julia
 