Benchmark Cassandra & CockroachDB with YCSB and HAProxy
The group I was in at Microsoft Research Asia worked on databases, distributed databases in particular. Our focus was GraphView, a middleware within Microsoft's CosmosDB that accepts graph operations and translates them into T-SQL executed on SQL Server or Azure SQL Database. To provide Transaction-as-a-Service over globally distributed, persistent state, our team was formulating a new concurrency control protocol, based on TicToc, a protocol from MIT, layered on top of several KV backends: Redis at the memory level and Cassandra (originally developed at Facebook) at the disk level.
Hence, a thorough study of Cassandra's performance on a cluster was needed. CockroachDB, the flagship product of a startup founded by some ex-Googlers, is a distributed SQL database built on a transactional and strongly-consistent key-value store, which makes it a natural system to benchmark against our own datastore. So I will also put down the guide to benchmarking CockroachDB on our own cluster with YCSB.
- Environment:
- 10 Ubuntu server nodes with 1 Ubuntu client node
- Cassandra 3.11.2 binaries
- CockroachDB 2.0.2 binaries
- Cassandra Benchmark with Cassandra-stress tool
- Install Java8 via:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
- Download Cassandra via:
wget http://www-us.apache.org/dist/cassandra/3.11.2/apache-cassandra-3.11.2-bin.tar.gz
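- Extract the archive and change into the Cassandra directory (the directory name assumes the default from the tarball):
tar xzf apache-cassandra-3.11.2-bin.tar.gz
cd apache-cassandra-3.11.2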
- Make the following changes to conf/cassandra.yaml to set up the cluster (an illustrative snippet follows this list)
- seeds: Internal IP address of each seed node. DO NOT MAKE ALL NODES SEED NODES. It is a best practice to have more than one seed node per datacenter. In our 10-node cluster, there is one seed node - 10.6.0.4
- listen_address: The IP address or hostname that Cassandra binds to for connecting to other Cassandra nodes, i.e. the private IP address of the current node
- [Optional] rpc_address: Private IP address of the current node.
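For reference, on a non-seed node such as 10.6.0.5 the relevant lines in conf/cassandra.yaml would look roughly like this (illustrative values based on the addresses above; the cluster_name is my own placeholder and just has to match on every node):
cluster_name: 'Benchmark Cluster'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.6.0.4"
listen_address: 10.6.0.5
rpc_address: 10.6.0.5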
- Start, Stop and Monitor Cassandra cluster
./bin/cassandra &        # start Cassandra in the background
pkill -9 java            # stop Cassandra (kills the JVM)
./bin/nodetool status    # monitor cluster status
- Run Cassandra-stress tool
Populate the cluster with 1 million records (default) and 500 threads
./tools/bin/cassandra-stress write -node 10.6.0.4,10.6.0.5,10.6.0.6,10.6.0.12,10.6.0.13,10.6.0.14,10.6.0.15,10.6.0.16,10.6.0.17,10.6.0.18 -rate threads=500
Test the cluster with 1 million read requests and 500 threads
./tools/bin/cassandra-stress read -node 10.6.0.4,10.6.0.5,10.6.0.6,10.6.0.12,10.6.0.13,10.6.0.14,10.6.0.15,10.6.0.16,10.6.0.17,10.6.0.18 -rate threads=500
Test the cluster with a mixed workload (R/W: 50%:50%) and 500 threads
./tools/bin/cassandra-stress mixed ratio\(write=1,read=1\) -node 10.6.0.4,10.6.0.5,10.6.0.6,10.6.0.12,10.6.0.13,10.6.0.14,10.6.0.15,10.6.0.16,10.6.0.17,10.6.0.18 -rate threads=500
- Cockroach Benchmark via YCSB
- Install Cockroach cluster
Download Cockroach binaries
wget -qO- https://binaries.cockroachdb.com/cockroach-v2.0.2.linux-amd64.tgz | tar xvz
Expand the limit on the number of file descriptors a process may have
ulimit -HSn 51200
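Note that ulimit only affects the current shell. If you want the higher limit to survive a re-login, one option is to raise the nofile limit in /etc/security/limits.conf (the value below simply mirrors the ulimit above; adjust as needed):
* soft nofile 51200
* hard nofile 51200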
Launch the root node
nohup cockroach start --cache=0.4 --max-sql-memory=0.4 --insecure --host=10.6.0.4 &
Launch other nodes to join this cluster
nohup cockroach start --cache=0.4 --max-sql-memory=0.4 --insecure --host=10.6.0.5 --join=10.6.0.4:26257 &
Verify the cluster launched successfully by opening the admin UI from the client node (HTTP, since the cluster runs in insecure mode):
http://10.6.0.4:8080
Kill Cockroach process on current node
pkill -9 cockroach
- Create YCSB database and usertable
Open SQL shell from any node within our cluster
./cockroach sql --insecure --host=10.6.0.4
Create database "ycsb" if not exists
create database ycsb;
Create single table "usertable" if not exists
CREATE TABLE usertable ( YCSB_KEY VARCHAR(255) PRIMARY KEY, FIELD0 TEXT, FIELD1 TEXT, FIELD2 TEXT, FIELD3 TEXT, FIELD4 TEXT, FIELD5 TEXT, FIELD6 TEXT, FIELD7 TEXT, FIELD8 TEXT, FIELD9 TEXT );
- Launch HAProxy as load balancer
Synchronize clocks on every node first, since CockroachDB nodes fail easily if their clocks drift out of sync
sudo apt-get install ntp
sudo service ntp stop
Edit /etc/ntp.conf by commenting out any lines that start with "server" or "pool", then add:
server ntp.int.scaleway.com iburst
server ntp.ubuntu.com iburst
Start NTP again and verify that the time source has changed
sudo service ntp start
sudo ntpq -p
Output should be something like this
     remote           refid      st t  when  poll reach   delay   offset  jitter
==============================================================================
 10.1.31.39      .INIT.          16 u     -  1024     0    0.000    0.000   0.000
*alphyn.canonica 192.53.103.108   2 u   832  1024   377  237.330   -0.197   0.648
Generate the HAProxy configuration file from the root (primary) node (a rough sketch of what it contains is shown after this step list)
./cockroach gen haproxy --insecure --host=10.6.0.4 --port=26257
Copy it to the client node, install HAProxy, and run it with this .cfg file
scp haproxy.cfg [email protected]:~/
sudo apt-get install haproxy
ulimit -HSn 51200
haproxy -f haproxy.cfg
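For reference, the generated haproxy.cfg is simply a TCP proxy that round-robins SQL connections across all nodes; it looks roughly like the sketch below (paraphrased from memory, not the exact generator output):
global
  maxconn 4096

defaults
    mode                tcp
    timeout connect     10s
    timeout client      1m
    timeout server      1m
    option              clitcpka

listen psql
    bind :26257
    mode tcp
    balance roundrobin
    server cockroach1 10.6.0.4:26257
    server cockroach2 10.6.0.5:26257
    # ... one server line per CockroachDB node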
- Run YCSB script
Download YCSB binaries to client node
https://github.com/brianfrankcooper/YCSB/releases
Load data from workloads
python bin/ycsb load jdbc -P workloads/workloadc -p db.driver=org.postgresql.Driver -p db.url=jdbc:postgresql://10.6.0.9:26257/ycsb -p db.user=root -p db.passwd=root -p recordcount=200000 -threads 32 -p operationcount=500000 -p db.batchsize=100 -s
Run workloads
python bin/ycsb run jdbc -P workloads/workloadRO -p db.driver=org.postgresql.Driver -p db.url=jdbc:postgresql://10.6.0.9:26257/ycsb -p db.user=root -p db.passwd=root -p recordcount=200000 -threads 1024 -p operationcount=500000 -p db.batchsize=100 -s
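Note that workloadRO is not one of the stock YCSB workload files; it is our own read-only workload. A read-only workload file along these lines would do the job (the property names are standard CoreWorkload settings; the counts here are overridden by the -p flags on the command line anyway):
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=200000
operationcount=500000
readallfields=true
readproportion=1.0
updateproportion=0.0
scanproportion=0.0
insertproportion=0.0
requestdistribution=zipfian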
- Since I have run these benchmarks a couple of times, to simplify the process I put a start.sh file in each cockroach folder (a minimal sketch of such a script is shown below). On our own servers, all binaries are already downloaded and the cluster & HAProxy are configured, so you only need to change to the cockroach directory and run start.sh (start-haproxy.sh on the client node).
- For read-only & write-only workloads, change directory to ycsb-ro/ycsb-wo on the Ubuntu client node and run start.sh
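The start.sh in each cockroach folder mentioned above is nothing more than the ulimit bump plus the start command from earlier; on a non-root node it could be as simple as the following (a minimal sketch, not the exact script; the IPs are just this cluster's examples):
#!/usr/bin/env bash
# Raise the file-descriptor limit for this shell, then start CockroachDB
# and join it to the root node.
ulimit -HSn 51200
nohup ./cockroach start --cache=0.4 --max-sql-memory=0.4 --insecure \
    --host=10.6.0.5 --join=10.6.0.4:26257 &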