Guy,
  To clarify:

[1] If you have four tablets, it's reasonable to suspect that the total RPC
time to reach those servers will increase somewhat if you access them
sequentially rather than in parallel.
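To make footnote [1] concrete, here is a minimal Java sketch. It does not use Accumulo at all. It models four tablets as in-memory lists, with a fixed, assumed 50 ms delay standing in for each per-tablet RPC. `TabletFetchSketch`, `fetchTablet` and `SIMULATED_RPC_MILLIS` are hypothetical names. Sequential access pays the delay once per tablet, while the thread-pool version overlaps the delays. This loosely mirrors the difference between a `Scanner` from `Connector.createScanner` and a `BatchScanner` from `Connector.createBatchScanner` in the 1.7 client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model, no Accumulo classes: a "tablet" is a list of entries and
// fetchTablet() stands in for one RPC to the tablet server hosting it.
public class TabletFetchSketch {
    // Assumed per-RPC latency, chosen for illustration only (not measured).
    static final long SIMULATED_RPC_MILLIS = 50;

    // Build n fake tablets with perTablet entries each.
    static List<List<String>> makeTablets(int n, int perTablet) {
        List<List<String>> tablets = new ArrayList<>();
        for (int t = 0; t < n; t++) {
            List<String> entries = new ArrayList<>();
            for (int i = 0; i < perTablet; i++) {
                entries.add("tablet" + t + "_row" + i);
            }
            tablets.add(entries);
        }
        return tablets;
    }

    // Stand-in for one RPC: pay the latency, then return a copy of the entries.
    static List<String> fetchTablet(List<String> tablet) throws InterruptedException {
        Thread.sleep(SIMULATED_RPC_MILLIS);
        return new ArrayList<>(tablet);
    }

    // Sequential access: one tablet after another, so the latencies add up.
    static List<String> scanSequential(List<List<String>> tablets) throws InterruptedException {
        List<String> out = new ArrayList<>();
        for (List<String> tablet : tablets) {
            out.addAll(fetchTablet(tablet));
        }
        return out;
    }

    // Parallel access: one task per tablet, so with enough threads the wait is
    // roughly that of the slowest single RPC. Results are joined in submission
    // order here; Accumulo's BatchScanner makes no such ordering promise.
    static List<String> scanParallel(List<List<String>> tablets, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> tablet : tablets) {
                futures.add(pool.submit((Callable<List<String>>) () -> fetchTablet(tablet)));
            }
            List<String> out = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                out.addAll(f.get());
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> tablets = makeTablets(4, 1000);

        long t0 = System.nanoTime();
        int sequential = scanSequential(tablets).size();
        long t1 = System.nanoTime();
        int parallel = scanParallel(tablets, 4).size();
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d entries in ~%d ms%n", sequential, (t1 - t0) / 1_000_000);
        System.out.printf("parallel:   %d entries in ~%d ms%n", parallel, (t2 - t1) / 1_000_000);
    }
}
```

Running `main` should typically show the parallel scan taking noticeably less wall-clock time for the same entries. Real tablet servers add network and server-side costs that this toy model ignores.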
On Wed, Aug 29, 2018 at 8:16 AM Marc <phroc...@apache.org> wrote:
>
> Guy,
>   The ReadData example appears to use a sequential scanner. Can you
> change that to a batch scanner and see if there is improvement [1]?
> Also, while you are there can you remove the log statement or set your
> log level so that the trace message isn't printed?
>
> In this case we are reading the entirety of the data. If you were to
> perform a query, you would likely prefer to run it at the data (on the
> tablet servers) instead of bringing all of the data back to the client.
>
> What are your expectations, since it appears very slow to you? Do you
> want faster client-side access to the data? Certainly improvements could
> be made (of that I have no doubt), but the time to bring 6M entries to
> the client is a cost you will incur if you use the ReadData example.
>
> [1] If you have four tablets it's reasonable to suspect that the RPC
> time to access those servers may increase a bit.
>
> On Wed, Aug 29, 2018 at 8:05 AM guy sharon <guy.sharon.1...@gmail.com> wrote:
> >
> > hi,
> >
> > Continuing my performance benchmarks, I'm still trying to figure out if the 
> > results I'm getting are reasonable and why throwing more hardware at the 
> > problem doesn't help. What I'm doing is a full table scan on a table with 
> > 6M entries. This is Accumulo 1.7.4 with ZooKeeper 3.4.12 and Hadoop 2.8.4. 
> > The table is populated by 
> > org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter 
> > modified to write 6M entries instead of 50k. Reads are performed by 
> > "bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData -i 
> > muchos -z localhost:2181 -u root -t hellotable -p secret". Here are the 
> > results I got:
> >
> > 1. 5 tserver cluster as configured by Muchos 
> > (https://github.com/apache/fluo-muchos), running on m5d.large AWS machines 
> > (2vCPU, 8GB RAM) running CentOS 7. Master is on a separate server. Scan 
> > took 12 seconds.
> > 2. As above, but with m5d.xlarge (4vCPU, 16GB RAM). Same result (12 seconds).
> > 3. Splitting the table into 4 tablets causes the runtime to increase to 16 
> > seconds.
> > 4. 7 tserver cluster running m5d.xlarge servers. 12 seconds.
> > 5. Single node cluster on m5d.12xlarge (48 cores, 192GB RAM), running 
> > Amazon Linux. Configuration as provided by Uno 
> > (https://github.com/apache/fluo-uno). Total time was 26 seconds.
> >
> > Offhand I would say this is very slow. I'm guessing I'm making some sort of 
> > newbie (possibly configuration) mistake but I can't figure out what it is. 
> > Can anyone point me to something that might help me find out what it is?
> >
> > thanks,
> > Guy.
> >
> >

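As a rough sanity check, the timings in Guy's message can be turned into client-side throughput. The sketch below only does arithmetic on the reported numbers (6M entries; 12, 16 and 26 seconds). The class name, method name and run labels are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: converts the reported full-scan times into entries/second.
public class ScanThroughput {
    // Table size reported in the thread (InsertWithBatchWriter modified to 6M entries).
    static final long ENTRIES = 6_000_000L;

    // Entries per second for a full scan of ENTRIES that took the given time.
    static double entriesPerSecond(double seconds) {
        return ENTRIES / seconds;
    }

    public static void main(String[] args) {
        // Reported scan times, keyed by the numbered items in the original message.
        Map<String, Double> runs = new LinkedHashMap<>();
        runs.put("1. 5 tservers, m5d.large", 12.0);
        runs.put("2. 5 tservers, m5d.xlarge", 12.0);
        runs.put("3. split into 4 tablets", 16.0);
        runs.put("4. 7 tservers, m5d.xlarge", 12.0);
        runs.put("5. single node, m5d.12xlarge", 26.0);

        for (Map.Entry<String, Double> run : runs.entrySet()) {
            System.out.printf("%-28s %5.1f s  ~%,.0f entries/s%n",
                    run.getKey(), run.getValue(), entriesPerSecond(run.getValue()));
        }
    }
}
```

The unsplit multi-node runs (items 1, 2 and 4) all come out at about 500,000 entries per second, whether the cluster has 5 or 7 tablet servers. That is at least consistent with Marc's point that the single sequential scanner in ReadData, rather than the cluster size, bounds this benchmark.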