Guy,

To clarify:
[1] If you have four tablets, it's reasonable to suspect that the RPC time to access those servers may increase a bit if you access them sequentially versus in parallel.

On Wed, Aug 29, 2018 at 8:16 AM Marc <phroc...@apache.org> wrote:
>
> Guy,
> The ReadData example appears to use a sequential scanner. Can you
> change that to a batch scanner and see if there is improvement [1]?
> Also, while you are there, can you remove the log statement or set your
> log level so that the trace message isn't printed?
>
> In this case we are reading the entirety of that data. If you were to
> perform a query, you would likely prefer to do it at the data instead
> of bringing all data back to the client.
>
> What are your expectations, since it appears very slow? Do you want
> faster client-side access to the data? Certainly improvements could be
> made -- of that I have no doubt -- but the time to bring 6M entries to
> the client is a cost you will incur if you use the ReadData example.
>
> [1] If you have four tablets, it's reasonable to suspect that the RPC
> time to access those servers may increase a bit.
>
> On Wed, Aug 29, 2018 at 8:05 AM guy sharon <guy.sharon.1...@gmail.com> wrote:
> >
> > hi,
> >
> > Continuing my performance benchmarks, I'm still trying to figure out if the
> > results I'm getting are reasonable and why throwing more hardware at the
> > problem doesn't help. What I'm doing is a full table scan on a table with
> > 6M entries. This is Accumulo 1.7.4 with Zookeeper 3.4.12 and Hadoop 2.8.4.
> > The table is populated by
> > org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter
> > modified to write 6M entries instead of 50k. Reads are performed by
> > "bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData -i
> > muchos -z localhost:2181 -u root -t hellotable -p secret". Here are the
> > results I got:
> >
> > 1. 5-tserver cluster as configured by Muchos
> > (https://github.com/apache/fluo-muchos), running on m5d.large AWS machines
> > (2vCPU, 8GB RAM) running CentOS 7. Master is on a separate server. Scan
> > took 12 seconds.
> > 2. As above, except with m5d.xlarge (4vCPU, 16GB RAM). Same results.
> > 3. Splitting the table into 4 tablets causes the runtime to increase to 16
> > seconds.
> > 4. 7-tserver cluster running m5d.xlarge servers. 12 seconds.
> > 5. Single-node cluster on m5d.12xlarge (48 cores, 192GB RAM), running
> > Amazon Linux. Configuration as provided by Uno
> > (https://github.com/apache/fluo-uno). Total time was 26 seconds.
> >
> > Offhand I would say this is very slow. I'm guessing I'm making some sort of
> > newbie (possibly configuration) mistake, but I can't figure out what it is.
> > Can anyone point me to something that might help me find out what it is?
> >
> > thanks,
> > Guy.
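To illustrate footnote [1], here is a toy, stdlib-only Java model (not Accumulo code). Each hypothetical tablet read is modeled as one RPC with an assumed fixed latency of 100 ms. Reading four tablets one after another pays roughly the sum of the latencies, while issuing the reads from a thread pool pays roughly the largest single latency, which is the idea behind a batch scanner's query threads. The class name, method names, and latency value are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TabletScanSketch {
    // Hypothetical per-tablet RPC latency in milliseconds (an assumption, not a measurement).
    static final long RPC_LATENCY_MS = 100;

    // Simulates one RPC that returns every entry held by a single tablet.
    static List<String> fetchTablet(int tablet, int entriesPerTablet) throws InterruptedException {
        Thread.sleep(RPC_LATENCY_MS);
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < entriesPerTablet; i++) {
            entries.add("tablet" + tablet + "_row" + i);
        }
        return entries;
    }

    // Sequential access: total time is roughly the sum of all tablet latencies.
    static List<String> scanSequential(int tablets, int entriesPerTablet) throws InterruptedException {
        List<String> all = new ArrayList<>();
        for (int t = 0; t < tablets; t++) {
            all.addAll(fetchTablet(t, entriesPerTablet));
        }
        return all;
    }

    // Parallel access: total time is roughly the slowest single tablet latency
    // (when threads >= tablets). Results are reassembled in tablet order here;
    // a real batch scanner makes no ordering guarantee.
    static List<String> scanParallel(int tablets, int entriesPerTablet, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int t = 0; t < tablets; t++) {
                final int tablet = t;
                Callable<List<String>> task = () -> fetchTablet(tablet, entriesPerTablet);
                futures.add(pool.submit(task));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                all.addAll(f.get()); // waits for that tablet's simulated RPC
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        List<String> seq = scanSequential(4, 1000);
        long seqMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        List<String> par = scanParallel(4, 1000, 4);
        long parMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("sequential: " + seq.size() + " entries in ~" + seqMs + " ms");
        System.out.println("parallel:   " + par.size() + " entries in ~" + parMs + " ms");
    }
}
```

Against a real cluster, the analogous change in ReadData would be to replace the sequential Scanner with something like connector.createBatchScanner("hellotable", Authorizations.EMPTY, 4), call setRanges(Collections.singleton(new Range())) to cover the whole table, and close() the scanner when done. Treat the thread count of 4 as a starting guess, not a tuned value.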