hi Mike,

As per Mike Miller's suggestion, I started using
org.apache.accumulo.examples.simple.helloworld.ReadData from Accumulo with
debugging turned off and a BatchScanner with 10 threads. I redid all the
measurements, and although this was 20% faster than using the shell, there
was still no difference once I started varying the hardware configurations.
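In case it helps anyone reproduce this, a minimal sketch of the kind of BatchScanner-based count I'm timing (assuming the Accumulo 1.7 client API and the instance name, ZooKeeper host, and credentials mentioned in this thread) would be:

```java
import java.util.Collections;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class CountEntries {
  public static void main(String[] args) throws Exception {
    // Instance name, ZooKeeper host, and credentials assumed from this
    // thread; adjust for your cluster.
    Connector conn = new ZooKeeperInstance("muchos", "localhost:2181")
        .getConnector("root", new PasswordToken("secret"));

    // 10 query threads, one full-table range: the same setup as the
    // ReadData-based measurement above.
    BatchScanner bs =
        conn.createBatchScanner("hellotable", Authorizations.EMPTY, 10);
    try {
      bs.setRanges(Collections.singletonList(new Range()));
      long count = 0;
      long start = System.currentTimeMillis();
      for (Entry<Key, Value> unused : bs) {
        count++;
      }
      long elapsed = System.currentTimeMillis() - start;
      System.out.println(count + " entries in " + elapsed + " ms");
    } finally {
      bs.close();
    }
  }
}
```

Timing only the iteration loop (after the connector is built) keeps the ZooKeeper connection setup out of the scan measurement.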

Guy.

On Wed, Aug 29, 2018 at 10:06 PM Michael Wall <mjw...@gmail.com> wrote:

> Guy,
>
> Can you go into specifics about how you are measuring this?  Are you still
> using "bin/accumulo shell -u root -p secret -e "scan -t hellotable -np" |
> wc -l" as you mentioned earlier in the thread?  As Mike Miller suggested,
> serializing that back to the display and then counting 6M entries is going
> to take some time.  Try using a Batch Scanner directly.
>
> Mike
>
> On Wed, Aug 29, 2018 at 2:56 PM guy sharon <guy.sharon.1...@gmail.com>
> wrote:
>
>> Yes, I tried the high performance configuration, which translates to a 4G
>> heap size, but that didn't affect performance. Neither did setting
>> table.scan.max.memory to 4096k (the default is 512k). Even if I accept that
>> the read performance here is reasonable, I don't understand why none of the
>> hardware configuration changes (except going to 48 cores, which made things
>> worse) made any difference.
>>
>> On Wed, Aug 29, 2018 at 8:33 PM Mike Walch <mwa...@apache.org> wrote:
>>
>>> Muchos does not automatically change its Accumulo configuration to take
>>> advantage of better hardware. However, it does have a performance profile
>>> setting in its configuration (see link below) where you can select a
>>> profile (or create your own) based on the hardware you are using.
>>>
>>>
>>> https://github.com/apache/fluo-muchos/blob/master/conf/muchos.props.example#L94
>>>
>>> On Wed, Aug 29, 2018 at 11:35 AM Josh Elser <els...@apache.org> wrote:
>>>
>>>> Does Muchos actually change the Accumulo configuration when you are
>>>> changing the underlying hardware?
>>>>
>>>> On 8/29/18 8:04 AM, guy sharon wrote:
>>>> > hi,
>>>> >
>>>> > Continuing my performance benchmarks, I'm still trying to figure out
>>>> > if the results I'm getting are reasonable and why throwing more
>>>> > hardware at the problem doesn't help. What I'm doing is a full table
>>>> > scan on a table with 6M entries. This is Accumulo 1.7.4 with
>>>> > Zookeeper 3.4.12 and Hadoop 2.8.4. The table is populated by
>>>> > org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter
>>>> > modified to write 6M entries instead of 50k. Reads are performed by
>>>> > "bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData
>>>> > -i muchos -z localhost:2181 -u root -t hellotable -p secret". Here
>>>> > are the results I got:
>>>> >
>>>> > 1. 5 tserver cluster as configured by Muchos
>>>> > (https://github.com/apache/fluo-muchos), running on m5d.large AWS
>>>> > machines (2vCPU, 8GB RAM) running CentOS 7. Master is on a separate
>>>> > server. Scan took 12 seconds.
>>>> > 2. As above except with m5d.xlarge (4vCPU, 16GB RAM). Same results.
>>>> > 3. Splitting the table to 4 tablets causes the runtime to increase
>>>> > to 16 seconds.
>>>> > 4. 7 tserver cluster running m5d.xlarge servers. 12 seconds.
>>>> > 5. Single node cluster on m5d.12xlarge (48 cores, 192GB RAM), running
>>>> > Amazon Linux. Configuration as provided by Uno
>>>> > (https://github.com/apache/fluo-uno). Total time was 26 seconds.
>>>> >
>>>> > Offhand I would say this is very slow. I'm guessing I'm making some
>>>> > sort of newbie (possibly configuration) mistake but I can't figure
>>>> > out what it is. Can anyone point me to something that might help me
>>>> > find out what it is?
>>>> >
>>>> > thanks,
>>>> > Guy.
>>>> >
>>>> >
>>>>
>>>