Hi,

Continuing my performance benchmarks, I'm still trying to figure out whether
the results I'm getting are reasonable and why throwing more hardware at the
problem doesn't help. What I'm doing is a full table scan on a table with
6M entries. This is Accumulo 1.7.4 with ZooKeeper 3.4.12 and Hadoop 2.8.4.
The table is populated by
org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter,
modified to write 6M entries instead of 50k. Reads are performed by:

    bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData -i muchos -z localhost:2181 -u root -t hellotable -p secret

Here are the results I got:

1. 5-tserver cluster as configured by Muchos (
https://github.com/apache/fluo-muchos), running on m5d.large AWS machines
(2 vCPUs, 8GB RAM) running CentOS 7. Master is on a separate server. Scan
took 12 seconds.
2. As above, except with m5d.xlarge (4 vCPUs, 16GB RAM). Same result: 12
seconds.
3. Splitting the table into 4 tablets causes the runtime to increase to 16
seconds.
4. 7-tserver cluster running m5d.xlarge servers. 12 seconds.
5. Single-node cluster on m5d.12xlarge (48 cores, 192GB RAM), running
Amazon Linux. Configuration as provided by Uno (
https://github.com/apache/fluo-uno). Total time was 26 seconds.
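
To put these timings in perspective, here is a small sketch (plain Java, no
Accumulo dependency) that converts each run into entries per second. The
class and method names are made up for illustration, and it assumes the 6M
entry count is exact and that run 2 also took 12 seconds.

```java
public class ScanThroughput {

    // Entries scanned per second, rounded to the nearest whole entry.
    public static long entriesPerSecond(long entries, double seconds) {
        return Math.round(entries / seconds);
    }

    public static void main(String[] args) {
        long entries = 6_000_000L; // table size reported above

        // Labels are shorthand for runs 1-5; times are the reported scan times.
        String[] runs = {
            "1. 5 tservers, m5d.large",
            "2. 5 tservers, m5d.xlarge",
            "3. table split into 4 tablets",
            "4. 7 tservers, m5d.xlarge",
            "5. single node, m5d.12xlarge"
        };
        double[] seconds = {12, 12, 16, 12, 26};

        for (int i = 0; i < runs.length; i++) {
            System.out.printf("%-32s %4.0f s  %d entries/s%n",
                    runs[i], seconds[i], entriesPerSecond(entries, seconds[i]));
        }
    }
}
```

By this arithmetic the 12-second runs come out to 500,000 entries per second
and the single-node run to roughly 230,000.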

Offhand I would say this is very slow. I'm guessing I'm making some sort of
newbie (possibly configuration) mistake, but I can't figure out what it is.
Can anyone point me to something that might help me track it down?

Thanks,
Guy.
