To answer your original question: YCSB is a standard benchmarking tool for databases that provides various types of read/write workloads.

https://github.com/brianfrankcooper/YCSB/tree/master/accumulo1.7

On 8/29/18 8:04 AM, guy sharon wrote:
Hi,

Continuing my performance benchmarks, I'm still trying to figure out whether the results I'm getting are reasonable and why throwing more hardware at the problem doesn't help. What I'm doing is a full table scan on a table with 6M entries. This is Accumulo 1.7.4 with ZooKeeper 3.4.12 and Hadoop 2.8.4. The table is populated by org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter, modified to write 6M entries instead of 50k. Reads are performed by:

bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData -i muchos -z localhost:2181 -u root -t hellotable -p secret

Here are the results I got:

1. 5-tserver cluster as configured by Muchos (https://github.com/apache/fluo-muchos), running on m5d.large AWS machines (2 vCPU, 8 GB RAM) with CentOS 7. Master is on a separate server. Scan took 12 seconds.
2. As above, except with m5d.xlarge (4 vCPU, 16 GB RAM). Same results.
3. Splitting the table into 4 tablets causes the runtime to increase to 16 seconds.
4. 7-tserver cluster running m5d.xlarge servers. 12 seconds.
5. Single-node cluster on m5d.12xlarge (48 cores, 192 GB RAM), running Amazon Linux. Configuration as provided by Uno (https://github.com/apache/fluo-uno). Total time was 26 seconds.
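
To put the timings above in comparable terms, they can be converted to entries per second. The following is only a minimal sketch of that arithmetic: the class and method names are hypothetical, the entry count and times are the ones listed above, and run 2 is taken as 12 seconds on the assumption that "same results" means the same time as run 1. At 12 seconds, 6M entries works out to about 500,000 entries per second.

```java
public class ScanThroughput {

    // Entries scanned per second, given a total entry count and wall-clock time.
    public static double entriesPerSecond(long entries, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return entries / seconds;
    }

    public static void main(String[] args) {
        long entries = 6_000_000L; // table size reported above

        // Short labels for the five runs listed above (hypothetical wording).
        String[] runs = {
            "5 tservers, m5d.large",
            "5 tservers, m5d.xlarge",
            "table split into 4 tablets",
            "7 tservers, m5d.xlarge",
            "single node, m5d.12xlarge"
        };
        // Reported scan times in seconds; run 2 assumed equal to run 1.
        double[] seconds = {12, 12, 16, 12, 26};

        for (int i = 0; i < runs.length; i++) {
            System.out.printf("%-28s %4.0f s  %,12.0f entries/s%n",
                    runs[i], seconds[i], entriesPerSecond(entries, seconds[i]));
        }
    }
}
```

This measures only end-to-end client time, so it cannot tell whether the time goes to the tablet servers, the network, or the single-threaded client reading the results.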

Offhand I would say this is very slow. I'm guessing I'm making some sort of newbie (possibly configuration) mistake but I can't figure out what it is. Can anyone point me to something that might help me find out what it is?

Thanks,
Guy.

