Yahoo! Cloud Serving Benchmark (YCSB)

The goal of the YCSB project is to develop a framework and common set of workloads for evaluating the performance of different “key-value” and “cloud” serving stores. The project comprises two things:

  • The YCSB Client, an extensible workload generator
  • The Core workloads, a set of workload scenarios to be executed by the generator

For benchmarking we used Workload A - Update Heavy, a 50/50 mix of reads and updates. One example of such a workload is a session store recording recent actions in a user session.
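Each core workload is a plain Java properties file. As a sketch, Workload A's definition (the workloads/workloada file shipped with YCSB) looks essentially like this; recordcount and operationcount were varied per run in the tests below, which can also be done on the command line with -p recordcount=...:

```properties
# Workload A: Update heavy (50/50 read/update), zipfian request distribution
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian
```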

sudo apt-get install openjdk-6-jdk
wget https://github.com/downloads/brianfrankcooper/YCSB/ycsb-0.1.4.tar.gz
tar xfvz ycsb-0.1.4.tar.gz 
cd ycsb-0.1.4

FoundationDB

Installation

Follow the foundationdb.com guide.

  • 3 m1.micro amazon instances.
  • 100 000 operations and records to load the DB in most of the workloads
  • Scan operation is not supported

Workload A

  • load (3 instances, 10 000 operations and records)
[OVERALL], RunTime(ms), 109338.0
[OVERALL], Throughput(ops/sec), 91.45951087453584
[INSERT], Operations, 10000
[INSERT], AverageLatency(us), 10888.5414
[INSERT], MinLatency(us), 4208
[INSERT], MaxLatency(us), 88057
[INSERT], 95thPercentileLatency(ms), 15
[INSERT], 99thPercentileLatency(ms), 19
  • load (1 instance, 10 000 operations and records)
[OVERALL], RunTime(ms), 168662.0
[OVERALL], Throughput(ops/sec), 59.290177989114326
[INSERT], Operations, 10000
[INSERT], AverageLatency(us), 16775.8892
[INSERT], MinLatency(us), 7817
[INSERT], MaxLatency(us), 657408
[INSERT], 95thPercentileLatency(ms), 23
[INSERT], 99thPercentileLatency(ms), 29
  • run (10 000 operations and records)
[OVERALL], RunTime(ms), 58885.0
[OVERALL], Throughput(ops/sec), 169.82253545045427
[UPDATE], Operations, 5047
[UPDATE], AverageLatency(us), 8313.717455914404
[UPDATE], MinLatency(us), 3405
[UPDATE], MaxLatency(us), 34135
[UPDATE], 95thPercentileLatency(ms), 12
[UPDATE], 99thPercentileLatency(ms), 15
[READ], Operations, 4953
[READ], AverageLatency(us), 3317.584292348072
[READ], MinLatency(us), 2691
[READ], MaxLatency(us), 302373
[READ], 95thPercentileLatency(ms), 4
[READ], 99thPercentileLatency(ms), 6
  • run (1 instance, 10 000 operations and records)



Workload B

  • load (100 000 operations and records)
[OVERALL], RunTime(ms), 476019.0
[OVERALL], Throughput(ops/sec), 92.45009127786916
[INSERT], Operations, 44009
[INSERT], AverageLatency(us), 10796.638732986434
[INSERT], MinLatency(us), 1161
[INSERT], MaxLatency(us), 372030
[INSERT], 95thPercentileLatency(ms), 14
[INSERT], 99thPercentileLatency(ms), 19
  • run (100 000 operations and records)
[OVERALL], RunTime(ms), 334835.0
[OVERALL], Throughput(ops/sec), 298.65456120178595
[UPDATE], Operations, 5014
[UPDATE], AverageLatency(us), 6385.420422816115
[UPDATE], MinLatency(us), 3508
[UPDATE], MaxLatency(us), 88374
[UPDATE], 95thPercentileLatency(ms), 10
[UPDATE], 99thPercentileLatency(ms), 14
[READ], Operations, 94986
[READ], AverageLatency(us), 3174.2867685764218
[READ], MinLatency(us), 2660
[READ], MaxLatency(us), 593336
[READ], 95thPercentileLatency(ms), 3
[READ], 99thPercentileLatency(ms), 7

Workload C

  • load (100 000 operations and records)
[OVERALL], RunTime(ms), 63236.0
[OVERALL], Throughput(ops/sec), 66.86064899740654
[INSERT], Operations, 4229
[INSERT], AverageLatency(us), 14851.65098131946
[INSERT], MinLatency(us), 1983
[INSERT], MaxLatency(us), 154020
[INSERT], 95thPercentileLatency(ms), 22
[INSERT], 99thPercentileLatency(ms), 34
  • run (100 000 operations and records)
[OVERALL], RunTime(ms), 330623.0
[OVERALL], Throughput(ops/sec), 302.4592965401681
[READ], Operations, 100000
[READ], AverageLatency(us), 3295.67228
[READ], MinLatency(us), 2645
[READ], MaxLatency(us), 440142
[READ], 95thPercentileLatency(ms), 4
[READ], 99thPercentileLatency(ms), 7

Workload D

  • load (100 000 operations and records)
[OVERALL], RunTime(ms), 877528.0
[OVERALL], Throughput(ops/sec), 91.74066240621381
[INSERT], Operations, 80506
[INSERT], AverageLatency(us), 10884.805082850968
[INSERT], MinLatency(us), 1170
[INSERT], MaxLatency(us), 240559
[INSERT], 95thPercentileLatency(ms), 15
[INSERT], 99thPercentileLatency(ms), 20
  • run (100 000 operations and records)
[OVERALL], RunTime(ms), 373844.0
[OVERALL], Throughput(ops/sec), 267.49125303602574
[INSERT], Operations, 5065
[INSERT], AverageLatency(us), 9196.684698914116
[INSERT], MinLatency(us), 4214
[INSERT], MaxLatency(us), 322476
[INSERT], 95thPercentileLatency(ms), 14
[INSERT], 99thPercentileLatency(ms), 18
[READ], Operations, 94935
[READ], AverageLatency(us), 3434.842945173013
[READ], MinLatency(us), 2711
[READ], MaxLatency(us), 504955
[READ], 95thPercentileLatency(ms), 4
[READ], 99thPercentileLatency(ms), 8

Workload F

  • load (10 000 operations and records)
[OVERALL], RunTime(ms), 113812.0
[OVERALL], Throughput(ops/sec), 87.86419709696693
[INSERT], Operations, 10000
[INSERT], AverageLatency(us), 11335.0265
[INSERT], MinLatency(us), 4507
[INSERT], MaxLatency(us), 308013
[INSERT], 95thPercentileLatency(ms), 16
[INSERT], 99thPercentileLatency(ms), 20
  • run (10 000 operations and records)
[READ], Operations, 10000
[READ], AverageLatency(us), 3661.6484
[READ], MinLatency(us), 2690
[READ], MaxLatency(us), 478005
[READ], 95thPercentileLatency(ms), 5
[READ], 99thPercentileLatency(ms), 12
[READ-MODIFY-WRITE], Operations, 4958
[READ-MODIFY-WRITE], AverageLatency(us), 10300.612343686971
[READ-MODIFY-WRITE], MinLatency(us), 6425
[READ-MODIFY-WRITE], MaxLatency(us), 249055
[READ-MODIFY-WRITE], 95thPercentileLatency(ms), 16
[READ-MODIFY-WRITE], 99thPercentileLatency(ms), 24
[OVERALL], RunTime(ms), 70356.0
[OVERALL], Throughput(ops/sec), 142.1342884757519
[UPDATE], Operations, 4958
[UPDATE], AverageLatency(us), 6691.750302541347
[UPDATE], MinLatency(us), 3473
[UPDATE], MaxLatency(us), 93724
[UPDATE], 95thPercentileLatency(ms), 10
[UPDATE], 99thPercentileLatency(ms), 15
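The listings above follow the YCSB client's standard output format. When many runs are saved to files, the [OVERALL] lines can be pulled out with a short awk filter; a sketch (the here-doc stands in for a saved output file — in practice, redirect the client's stdout into the filter):

```shell
#!/bin/sh
# Print only the [OVERALL] metric name and value from YCSB client output.
# Fields are separated by ", ", so $2 is the metric and $3 the value.
awk -F', ' '/^\[OVERALL\]/ {print $2, $3}' <<'EOF'
[OVERALL], RunTime(ms), 58885.0
[OVERALL], Throughput(ops/sec), 169.82253545045427
[UPDATE], Operations, 5047
EOF
```

This prints `RunTime(ms) 58885.0` and `Throughput(ops/sec) 169.82253545045427`, dropping the per-operation lines.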

Redis

Redis is a versatile key/value store hosted primarily in memory. It is designed as an in-memory database and is thus supposed to keep the entire key space in memory during operation; when the process terminates, all data held only in memory is lost. Such volatility alone is not suitable for production setups, so to achieve persistence and fault tolerance Redis periodically snapshots the dataset to disk.

Installation

wget http://download.redis.io/redis-stable.tar.gz
tar xvzf redis-stable.tar.gz
cd redis-stable
apt-get install build-essential 
make

Evaluation

Workload A

Execute the load phase

./bin/ycsb load redis -P workloads/workloada -P test.properties
where test.properties contains: redis.host=10.32.22.55

[OVERALL], RunTime(ms), 2525.0
[OVERALL], Throughput(ops/sec), 396.03960396039605

Execute the transaction phase

./bin/ycsb run redis -P workloads/workloada -P test.properties
where test.properties contains: redis.host=10.32.22.55

[OVERALL], RunTime(ms), 1579.0
[OVERALL], Throughput(ops/sec), 633.3122229259025
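The reported throughput is simply the operation count divided by the runtime. Since 396.04 ops/sec over 2.525 s works out to exactly 1 000 operations, the load phase here evidently inserted 1 000 records (an inference — the record count is not shown above). The arithmetic can be checked directly:

```shell
# Throughput(ops/sec) = operations / (RunTime(ms) / 1000).
# The 1000-operation count is inferred from the figures above, not stated there.
awk 'BEGIN { printf "%.2f\n", 1000 / (2525 / 1000) }'   # prints 396.04
```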

Voldemort

Voldemort is a distributed key-value storage system. It is used at LinkedIn for certain high-scalability storage problems where simple functional partitioning is not sufficient. Voldemort combines in memory caching with the storage system so that a separate caching tier is not required.

  • Data is automatically replicated over multiple servers.
  • Data is automatically partitioned so each server contains only a subset of the total data
  • Server failure is handled transparently
  • Pluggable serialization is supported to allow rich keys and values including lists and tuples with named fields, as well as to integrate with common serialization frameworks like Protocol Buffers, Thrift, Avro and Java Serialization
  • Data items are versioned to maximize data integrity in failure scenarios without compromising availability of the system
  • Each node is independent of other nodes with no central point of failure or coordination
  • Good single node performance: you can expect 10-20k operations per second depending on the machines, the network, the disk system, and the data replication factor
  • Support for pluggable data placement strategies to support things like distribution across data centers that are geographically far apart.

Installation

wget https://github.com/downloads/voldemort/voldemort/voldemort-0.90.1.tar.gz
tar xvzf voldemort-0.90.1.tar.gz
apt-get -y install openjdk-6-jdk
./bin/voldemort-server.sh config/single_node_cluster/

Evaluation

Workload A

Execute the load phase

Execute the transaction phase

MongoDB

MongoDB is an open-source, document-oriented database in the NoSQL family. Instead of storing data in tables as a classical relational database does, MongoDB stores structured data as JSON-like documents with dynamic schemas, making the integration of data in certain types of applications easier and faster.

Installation

curl -O http://downloads.mongodb.org/linux/mongodb-linux-x86_64-2.0.6.tgz
tar -xzf mongodb-linux-x86_64-2.0.6.tgz
cd mongodb-linux-x86_64-2.0.6/bin
mkdir -p data/db
./mongod --dbpath ./data/db/

Evaluation

Workload A

Execute the load phase

./bin/ycsb load mongodb -P workloads/workloada -P test.properties
where test.properties: 
mongodb.url=mongodb://10.240.241.77:27017
mongodb.database=ycsb
mongodb.writeConcern=normal

[OVERALL], RunTime(ms), 2358.0
[OVERALL], Throughput(ops/sec), 424.08821034775235

Execute the transaction phase

./bin/ycsb run mongodb -P workloads/workloada -P test.properties
where test.properties: 
mongodb.url=mongodb://10.240.241.77:27017
mongodb.database=ycsb
mongodb.writeConcern=normal
[OVERALL], RunTime(ms), 2016.0
[OVERALL], Throughput(ops/sec), 496.031746031746

Cassandra

Cassandra is a distributed storage system for managing very large amounts of structured data spread across many servers, with high availability and no single point of failure. The system doesn’t support a full relational data model.

Architecture:

Distributed system techniques used in the system:

  • Partitioning. This is done by means of consistent hashing with an order-preserving hash function. To deal with non-uniform data and load distribution and with heterogeneity of node performance, Cassandra moves lightly loaded nodes on the ring to relieve heavily loaded ones.
  • Replication. Each data item is replicated at N hosts, where N is a per-instance replication factor. A coordinator node is in charge of replication. The replication policies are Rack Unaware, Rack Aware, and Datacenter Aware; for the last two, Cassandra uses Zookeeper.
  • Membership is based on Scuttlebutt, a very efficient anti-entropy gossip-based mechanism.
  • Failure handling. Cassandra uses a modified version of the Φ Accrual Failure Detector, assuming an exponential distribution for the inter-arrival times of gossip messages from other nodes. Φ represents a suspicion level for each monitored node: the higher Φ is, the lower the probability of mistakenly concluding in the future that the node has failed.
  • Bootstrapping. For fault tolerance, the token mapping is persisted to disk locally and also in Zookeeper, and the token is then gossiped around the cluster. When a node needs to join a cluster, it reads its configuration file, which lists a few contact points (seeds) within the cluster.
  • Scaling. Each new node is assigned a token so that it alleviates heavily loaded nodes.
  • Read/write requests. A write goes first into a commit log; the in-memory structure is updated only after the commit-log write succeeds. When the in-memory structure fills up, it dumps itself to disk. A read queries the in-memory data structure before looking into the files on disk. A key lookup may span many data files, so a Bloom filter summarizing the keys in each file is used to avoid unnecessary disk reads.

Installation

echo "deb http://www.apache.org/dist/cassandra/debian 10x main" >> /etc/apt/sources.list
echo "deb-src http://www.apache.org/dist/cassandra/debian 10x main" >> /etc/apt/sources.list
gpg --keyserver pgp.mit.edu --recv-keys F758CE318D77295D
gpg --export --armor F758CE318D77295D | sudo apt-key add -
gpg --keyserver pgp.mit.edu --recv-keys 2B5C1B00
gpg --export --armor 2B5C1B00 | sudo apt-key add -
sudo apt-get update
sudo apt-get install cassandra

Modify /etc/cassandra/cassandra.yaml:

listen_address: 0.0.0.0
rpc_address: 0.0.0.0

Execute:

cassandra-cli -h localhost -p 9160
create keyspace usertable;
use usertable;
create column family data;
^C

Evaluation

Workload A

Execute the load phase

./bin/ycsb load cassandra-10 -P workloads/workloada -P test.properties
where test.properties: 
hosts=10.239.11.143
cassandra.connectionretries=1
cassandra.operationretries=1
[OVERALL], RunTime(ms), 2582.0
[OVERALL], Throughput(ops/sec), 387.29666924864443

Execute the transaction phase

./bin/ycsb run cassandra-10 -P workloads/workloada -P test.properties
where test.properties: 
hosts=10.239.11.143
cassandra.connectionretries=1
cassandra.operationretries=1
[OVERALL], RunTime(ms), 1611.0
[OVERALL], Throughput(ops/sec), 620.7324643078833
nosql_db.txt · Last modified: 2013/03/21 02:16 by julia
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 3.0 Unported