I think the 450 ranges returned a total of about 7.5M entries, but the ranges were in fact quite small relative to the size of the table.
-----Original Message-----
From: Josh Elser [mailto:[email protected]]
Sent: Monday, September 12, 2016 2:43 PM
To: [email protected]
Subject: Re: Accumulo Seek performance

What does a "large scan" mean here, Dan?

Sven's original problem statement was running many small/pointed Ranges
(e.g. point lookups). My observation was that BatchScanners were slower
than running each in a Scanner when using multiple BS's concurrently.

Dan Blum wrote:
> I tested a large scan on a 1.6.2 cluster with 11 tablet servers - using
> Scanners was much slower than using a BatchScanner with 11 threads, by about
> a 5:1 ratio. There were 450 ranges.
>
> -----Original Message-----
> From: Josh Elser [mailto:[email protected]]
> Sent: Monday, September 12, 2016 1:42 PM
> To: [email protected]
> Subject: Re: Accumulo Seek performance
>
> I had increased the readahead thread pool to 32 (from 16). I had also
> increased the minimum thread pool size from 20 to 40. I had 10 tablets
> with the data block cache turned on (probably only 256M, though).
>
> Each tablet had a single file (manually compacted). I did not observe
> cache rates.
>
> I've been working through this with Keith on IRC this morning too. Found
> that a single BatchScanner (one partition) is faster than the Scanner.
> With two partitions, things started to slow down.
>
> Two interesting points to still pursue, IMO:
>
> 1. I saw that the tserver-side logging for MultiScanSess was nearly
> identical to the BatchScanner timings.
> 2. The minimum server threads did not seem to be taking effect. Despite
> having the value set to 64, I only saw a few ClientPool threads in a
> jstack after running the test.
>
> Adam Fuchs wrote:
>> Sorry, Monday morning poor reading skills, I guess. :)
>>
>> So, 3000 ranges in 40 seconds with the BatchScanner. In my past
>> experience HDFS seeks tend to take something like 10-100ms, and I would
>> expect that time to dominate here. With 60 client threads your
>> bottleneck should be the readahead pool, which I believe defaults to 16
>> threads. If you get perfect index caching then you should be seeing
>> something like 3000/16*50ms = 9,375ms. That's in the right ballpark, but
>> it assumes no data cache hits. Do you have any idea of how many files
>> you had per tablet after the ingest? Do you know what your cache hit
>> rate was?
>>
>> Adam
>>
>> On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser <[email protected]> wrote:
>>
>> 5 iterations; I figured that would be apparent from the log messages. :)
>>
>> The code is already posted in my original message.
>>
>> Adam Fuchs wrote:
>>
>> Josh,
>>
>> Two questions:
>>
>> 1. How many iterations did you do? I would like to see an absolute
>> number of lookups per second to compare against other observations.
>>
>> 2. Can you post your code somewhere so I can run it?
>>
>> Thanks,
>> Adam
>>
>> On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <[email protected]> wrote:
>>
>> Sven, et al:
>>
>> So, it would appear that I have been able to reproduce this one
>> (better late than never, I guess...). tl;dr: Serially using Scanners
>> to do point lookups instead of a BatchScanner is ~20x faster. This
>> sounds like a pretty serious performance issue to me.
>>
>> Here's a general outline for what I did.
>>
>> * Accumulo 1.8.0
>> * Created a table with 1M rows, each row with 10 columns, using YCSB
>>   (workloada)
>> * Split the table into 9 tablets
>> * Computed the set of all rows in the table
>>
>> For a number of iterations:
>> * Shuffle this set of rows
>> * Choose the first N rows
>> * Construct an equivalent set of Ranges from the set of rows,
>>   choosing a random column (0-9)
>> * Partition the N rows into X collections
>> * Submit X tasks to query one partition of the N rows (to a thread
>>   pool with X fixed threads)
>>
>> I have two implementations of these tasks: one where all ranges in
>> a partition are executed via one BatchScanner, and a second where each
>> range is executed serially using a Scanner. The numbers speak for
>> themselves.
>>
>> ** BatchScanners **
>> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
>> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges calculated: 3000 ranges found
>> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries executed in 40178 ms
>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries executed in 42296 ms
>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries executed in 46094 ms
>> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries executed in 47704 ms
>> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries executed in 49221 ms
>>
>> ** Scanners **
>> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
>> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges calculated: 3000 ranges found
>> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2833 ms
>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2536 ms
>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2150 ms
>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2061 ms
>> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2140 ms
>>
>> Query code is available at
>> https://github.com/joshelser/accumulo-range-binning
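
Josh's actual harness lives in the repository linked above. For readers
skimming the thread, a minimal sketch of the two task flavors he
describes might look like the following; the class and method names and
the connection setup are assumptions for illustration, not taken from
his code:

import java.util.List;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class SeekComparisonSketch {

  // Flavor 1: all ranges in a partition go through one BatchScanner.
  static long countWithBatchScanner(Connector conn, String table,
      List<Range> partition) throws Exception {
    BatchScanner bs = conn.createBatchScanner(table, Authorizations.EMPTY, 1);
    long entries = 0;
    try {
      bs.setRanges(partition);
      for (Entry<Key,Value> e : bs) {
        entries++; // drain the iterator; the lookups are the cost
      }
    } finally {
      bs.close();
    }
    return entries;
  }

  // Flavor 2: each Range is re-seeked serially with one plain Scanner.
  static long countWithScanner(Connector conn, String table,
      List<Range> partition) throws Exception {
    Scanner scanner = conn.createScanner(table, Authorizations.EMPTY);
    long entries = 0;
    try {
      for (Range r : partition) {
        scanner.setRange(r);
        for (Entry<Key,Value> e : scanner) {
          entries++;
        }
      }
    } finally {
      scanner.close();
    }
    return entries;
  }
}

Each of the X partitions would be submitted as one such task to a fixed
thread pool (e.g. java.util.concurrent.Executors.newFixedThreadPool(6)).
The server-side knobs discussed earlier in the thread are the
tserver.readahead.concurrent.max (default 16) and
tserver.server.threads.minimum (default 20) properties.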
>>
>> Sven Hodapp wrote:
>>
>> Hi Keith,
>>
>> I've tried it with 1, 2, or 10 threads. Unfortunately, there were
>> no significant differences.
>> Maybe it's a problem with the table structure? For example, it
>> may happen that one row ID (e.g. a sentence) has several
>> thousand column families. Can this affect seek performance?
>>
>> So for my initial example there are about 3000 row IDs to seek,
>> which will return about 500k entries. If I filter for specific
>> column families (e.g. a document without annotations) it will
>> return about 5k entries, but the seek time is only halved.
>> Are there too many column families to seek quickly?
>>
>> Thanks!
>>
>> Regards,
>> Sven
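
One thing worth checking against Sven's schema, offered here as general
Accumulo behavior rather than something established in this thread:
fetchColumnFamily() filters entries on the server side, but unless the
families of interest are isolated in their own locality group the
tablet server may still have to read or seek past the other families'
data, which would be consistent with filtering cutting the entry count
100x while only halving the seek time. A sketch, assuming a
hypothetical table "sentences" with a hot "document" family:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class LocalityGroupSketch {

  // Restrict a scan to one family. Without locality groups this filters
  // the results but does not avoid reading past the other families.
  static void scanOneFamily(Connector conn) throws Exception {
    Scanner scanner = conn.createScanner("sentences", Authorizations.EMPTY);
    scanner.fetchColumnFamily(new Text("document"));
    for (Entry<Key,Value> entry : scanner) {
      // process entries from the "document" family only
    }
    scanner.close();
  }

  // Place the hot family in its own locality group, then compact so
  // existing files are rewritten with the new grouping.
  static void isolateFamily(Connector conn) throws Exception {
    Map<String,Set<Text>> groups = new HashMap<String,Set<Text>>();
    groups.put("docs", Collections.singleton(new Text("document")));
    conn.tableOperations().setLocalityGroups("sentences", groups);
    conn.tableOperations().compact("sentences", null, null, true, false);
  }
}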
