Couple of things to look at/try:

1 - Is the data spread out amongst all the tablets and tservers when you have multiple tservers?
2 - How much of the data is in memory on the tablet server, and how much is on disk? You can try flushing the table before running your scan.
3 - You could also launch a compaction before running your scan to minimize the number of rfiles per tablet.
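A simplified model of why the number of sources per tablet matters: a tablet scan has to merge every sorted source for that tablet (the in-memory map plus each rfile on disk) into one sorted stream. The sketch below is not Accumulo's code. The `TabletMergeModel` class and the sample keys are hypothetical, and it assumes each source is already sorted while ignoring the versioning, delete handling and file seeks a real tablet server also performs. It is a plain k-way merge with a min-heap, so each returned entry costs O(log k) heap work, where k is the number of sources. A full compaction rewrites a tablet's files into a single rfile, which shrinks k.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical, simplified model (not Accumulo's implementation) of a tablet scan:
 * k already-sorted sources are merged into one sorted stream with a min-heap.
 */
public class TabletMergeModel {

    /** One sorted source (stand-in for an rfile or the in-memory map) and its current head key. */
    private static final class Source {
        final Iterator<String> it;
        String head;

        Source(Iterator<String> it) {
            this.it = it;
            this.head = it.next(); // only constructed for non-empty sources
        }

        /** Moves to the next key; returns false when the source is exhausted. */
        boolean advance() {
            if (it.hasNext()) {
                head = it.next();
                return true;
            }
            return false;
        }
    }

    /** k-way merge of already-sorted key lists, ordered by each source's head key. */
    static List<String> merge(List<List<String>> sortedSources) {
        PriorityQueue<Source> heap = new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<String> keys : sortedSources) {
            if (!keys.isEmpty()) {
                heap.add(new Source(keys.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source s = heap.poll(); // O(log k) per emitted entry, k = number of sources
            out.add(s.head);
            if (s.advance()) {
                heap.add(s); // re-insert with its new head key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical tablet: two rfiles on disk plus unflushed entries in memory.
        List<List<String>> sources = Arrays.asList(
                Arrays.asList("row_0", "row_3", "row_6"), // rfile A
                Arrays.asList("row_1", "row_4"),          // rfile B
                Arrays.asList("row_2", "row_5"));         // in-memory map
        System.out.println(merge(sources));
    }
}
```

After a full compaction the same tablet would be one source, and the merge degenerates to a straight read of a single sorted file.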
Mike

On Wed, Aug 29, 2018 at 3:12 PM guy sharon <guy.sharon.1...@gmail.com> wrote:

> hi Mike,
>
> As per Mike Miller's suggestion, I started using
> org.apache.accumulo.examples.simple.helloworld.ReadData from Accumulo with
> debugging turned off and a BatchScanner with 10 threads. I redid all the
> measurements, and although this was 20% faster than using the shell, there
> was no difference once I started playing with the hardware configurations.
>
> Guy.
>
> On Wed, Aug 29, 2018 at 10:06 PM Michael Wall <mjw...@gmail.com> wrote:
>
>> Guy,
>>
>> Can you go into specifics about how you are measuring this? Are you
>> still using
>> bin/accumulo shell -u root -p secret -e "scan -t hellotable -np" | wc -l
>> as you mentioned earlier in the thread? As Mike Miller suggested,
>> serializing that back to the display and then counting 6M entries is
>> going to take some time. Try using a BatchScanner directly.
>>
>> Mike
>>
>> On Wed, Aug 29, 2018 at 2:56 PM guy sharon <guy.sharon.1...@gmail.com> wrote:
>>
>>> Yes, I tried the high-performance configuration, which translates to a 4G
>>> heap size, but that didn't affect performance. Neither did setting
>>> table.scan.max.memory to 4096k (default is 512k). Even if I accept that the
>>> read performance here is reasonable, I don't understand why none of the
>>> hardware configuration changes (except going to 48 cores, which made things
>>> worse) made any difference.
>>>
>>> On Wed, Aug 29, 2018 at 8:33 PM Mike Walch <mwa...@apache.org> wrote:
>>>
>>>> Muchos does not automatically change its Accumulo configuration to take
>>>> advantage of better hardware. However, it does have a performance profile
>>>> setting in its configuration (see link below) where you can select a
>>>> profile (or create your own) based on the hardware you are using.
>>>>
>>>> https://github.com/apache/fluo-muchos/blob/master/conf/muchos.props.example#L94
>>>>
>>>> On Wed, Aug 29, 2018 at 11:35 AM Josh Elser <els...@apache.org> wrote:
>>>>
>>>>> Does Muchos actually change the Accumulo configuration when you are
>>>>> changing the underlying hardware?
>>>>>
>>>>> On 8/29/18 8:04 AM, guy sharon wrote:
>>>>> > hi,
>>>>> >
>>>>> > Continuing my performance benchmarks, I'm still trying to figure out if
>>>>> > the results I'm getting are reasonable and why throwing more hardware at
>>>>> > the problem doesn't help. What I'm doing is a full table scan on a table
>>>>> > with 6M entries. This is Accumulo 1.7.4 with ZooKeeper 3.4.12 and Hadoop
>>>>> > 2.8.4. The table is populated by
>>>>> > org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter,
>>>>> > modified to write 6M entries instead of 50k. Reads are performed by
>>>>> > bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData -i muchos -z localhost:2181 -u root -t hellotable -p secret
>>>>> > Here are the results I got:
>>>>> >
>>>>> > 1. 5-tserver cluster as configured by Muchos
>>>>> >    (https://github.com/apache/fluo-muchos), running on m5d.large AWS
>>>>> >    machines (2 vCPUs, 8GB RAM) running CentOS 7. Master is on a separate
>>>>> >    server. Scan took 12 seconds.
>>>>> > 2. As above, except with m5d.xlarge (4 vCPUs, 16GB RAM). Same results.
>>>>> > 3. Splitting the table into 4 tablets causes the runtime to increase to
>>>>> >    16 seconds.
>>>>> > 4. 7-tserver cluster running m5d.xlarge servers. 12 seconds.
>>>>> > 5. Single-node cluster on m5d.12xlarge (48 cores, 192GB RAM), running
>>>>> >    Amazon Linux. Configuration as provided by Uno
>>>>> >    (https://github.com/apache/fluo-uno). Total time was 26 seconds.
>>>>> >
>>>>> > Offhand I would say this is very slow.
>>>>> > I'm guessing I'm making some sort of newbie (possibly configuration)
>>>>> > mistake but I can't figure out what it is. Can anyone point me to
>>>>> > something that might help me find out what it is?
>>>>> >
>>>>> > thanks,
>>>>> > Guy.
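The reported timings convert directly into scan throughput. This Java sketch only does that arithmetic, using the 6M-entry count and the wall-clock times from the results list; the `ScanThroughput` class and the run labels are illustrative shorthand, not names from the thread, and nothing here talks to Accumulo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Back-of-the-envelope scan throughput for the runs reported in the thread. */
public class ScanThroughput {

    /** Table size used in every run. */
    static final long ENTRIES = 6_000_000L;

    /** Entries scanned per second, rounded down to a whole number. */
    static long entriesPerSecond(long entries, long seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return entries / seconds;
    }

    public static void main(String[] args) {
        // Reported wall-clock times in seconds; labels are illustrative shorthand.
        Map<String, Long> runs = new LinkedHashMap<>();
        runs.put("5 x m5d.large", 12L);
        runs.put("5 x m5d.xlarge", 12L);
        runs.put("split into 4 tablets", 16L);
        runs.put("7 x m5d.xlarge", 12L);
        runs.put("1 x m5d.12xlarge (Uno)", 26L);

        for (Map.Entry<String, Long> run : runs.entrySet()) {
            System.out.printf("%-24s %3d s  %,d entries/s%n",
                    run.getKey(), run.getValue(), entriesPerSecond(ENTRIES, run.getValue()));
        }
    }
}
```

12 seconds works out to 500,000 entries/s, 16 seconds to 375,000 entries/s, and 26 seconds to roughly 230,769 entries/s. The 5- and 7-tserver runs land on the same figure, which is the flat scaling Guy describes.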