I tested a large scan on a 1.6.2 cluster with 11 tablet servers - using Scanners was much slower than using a BatchScanner with 11 threads, by about a 5:1 ratio. There were 450 ranges.
-----Original Message-----
From: Josh Elser [mailto:[email protected]]
Sent: Monday, September 12, 2016 1:42 PM
To: [email protected]
Subject: Re: Accumulo Seek performance

I had increased the readahead thread pool to 32 (from 16). I had also increased
the minimum thread pool size from 20 to 40. I had 10 tablets with the data block
cache turned on (probably only 256M tho). Each tablet had a single file
(manually compacted). Did not observe cache rates.

I've been working through this with Keith on IRC this morning too. Found that a
single BatchScanner (one partition) is faster than the Scanner. Two partitions
and things started to slow down.

Two interesting points still to pursue, IMO:

1. I saw that the tserver-side logging for MultiScanSess was nearly identical
   to the BatchScanner timings.
2. The minimum server threads did not seem to be taking effect. Despite having
   the value set to 64, I only saw a few ClientPool threads in a jstack after
   running the test.

Adam Fuchs wrote:
> Sorry, Monday morning poor reading skills, I guess. :)
>
> So, 3000 ranges in 40 seconds with the BatchScanner. In my past experience
> HDFS seeks tend to take something like 10-100ms, and I would expect that time
> to dominate here. With 60 client threads your bottleneck should be the
> readahead pool, which I believe defaults to 16 threads. If you get perfect
> index caching then you should be seeing something like 3000/16*50ms =
> 9,375ms. That's in the right ballpark, but it assumes no data cache hits. Do
> you have any idea of how many files you had per tablet after the ingest? Do
> you know what your cache hit rate was?
>
> Adam
>
> On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser <[email protected]> wrote:
>
> > 5 iterations, figured that would be apparent from the log messages :)
> >
> > The code is already posted in my original message.
> >
> > Adam Fuchs wrote:
> >
> > > Josh,
> > >
> > > Two questions:
> > >
> > > 1. How many iterations did you do?
> > > I would like to see an absolute number of lookups per second to compare
> > > against other observations.
> > >
> > > 2. Can you post your code somewhere so I can run it?
> > >
> > > Thanks,
> > > Adam
> > >
> > > On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <[email protected]> wrote:
> > >
> > > > Sven, et al:
> > > >
> > > > So, it would appear that I have been able to reproduce this one (better
> > > > late than never, I guess...). tl;dr Serially using Scanners to do point
> > > > lookups instead of a BatchScanner is ~20x faster. This sounds like a
> > > > pretty serious performance issue to me.
> > > >
> > > > Here's a general outline for what I did.
> > > >
> > > > * Accumulo 1.8.0
> > > > * Created a table with 1M rows, each row with 10 columns using YCSB
> > > >   (workloada)
> > > > * Split the table into 9 tablets
> > > > * Computed the set of all rows in the table
> > > >
> > > > For a number of iterations:
> > > > * Shuffle this set of rows
> > > > * Choose the first N rows
> > > > * Construct an equivalent set of Ranges from the set of Rows, choosing
> > > >   a random column (0-9)
> > > > * Partition the N rows into X collections
> > > > * Submit X tasks to query one partition of the N rows (to a thread pool
> > > >   with X fixed threads)
> > > >
> > > > I have two implementations of these tasks: one where all ranges in a
> > > > partition are executed via one BatchScanner, and a second where each
> > > > range is executed serially using a Scanner. The numbers speak for
> > > > themselves.
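[Editor's note: the outline above (shuffle, take the first N rows, partition into X collections, submit X tasks to a fixed pool) can be sketched in plain Java. The Accumulo lookups themselves are stubbed out here, so this only illustrates the harness shape; class and variable names are mine, not from the linked repo.]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java sketch of the benchmark harness described above. The actual
// Accumulo lookups (one BatchScanner per partition vs. one Scanner per range)
// are stubbed out with a comment, since this only shows the harness.
public class RangeBinningHarness {

    // Round-robin the shuffled rows into X partitions, one per worker thread.
    static <T> List<List<T>> partition(List<T> rows, int parts) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            out.add(new ArrayList<>());
        }
        for (int i = 0; i < rows.size(); i++) {
            out.get(i % parts).add(rows.get(i));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        final int n = 3000, x = 6;

        // Stand-in for "the set of all rows in the table".
        List<Integer> allRows = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            allRows.add(i);
        }

        ExecutorService pool = Executors.newFixedThreadPool(x);
        for (int iter = 0; iter < 5; iter++) {
            Collections.shuffle(allRows);
            List<Integer> chosen = allRows.subList(0, n);

            long start = System.currentTimeMillis();
            List<Future<?>> futures = new ArrayList<>();
            for (List<Integer> part : partition(chosen, x)) {
                futures.add(pool.submit(() -> {
                    // Real test: build a Range per row in `part`, then either
                    // call batchScanner.setRanges(ranges) once for the whole
                    // partition, or scanner.setRange(r) serially per range.
                }));
            }
            for (Future<?> f : futures) {
                f.get();
            }
            System.out.println("Queries executed in "
                + (System.currentTimeMillis() - start) + " ms");
        }
        pool.shutdown();
    }
}
```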
> > > > ** BatchScanners **
> > > > 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
> > > > 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges calculated: 3000 ranges found
> > > > 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
> > > > 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries executed in 40178 ms
> > > > 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
> > > > 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries executed in 42296 ms
> > > > 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
> > > > 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries executed in 46094 ms
> > > > 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
> > > > 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries executed in 47704 ms
> > > > 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
> > > > 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries executed in 49221 ms
> > > >
> > > > ** Scanners **
> > > > 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
> > > > 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges calculated: 3000 ranges found
> > > > 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
> > > > 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2833 ms
> > > > 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
> > > > 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2536 ms
> > > > 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
> > > > 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2150 ms
> > > > 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
> > > > 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2061 ms
> > > > 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
> > > > 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2140 ms
> > > >
> > > > Query code is available at
> > > > https://github.com/joshelser/accumulo-range-binning
> > > >
> > > > Sven Hodapp wrote:
> > > >
> > > > > Hi Keith,
> > > > >
> > > > > I've tried it with 1, 2, or 10 threads. Unfortunately there were no
> > > > > notable differences. Maybe it's a problem with the table structure?
> > > > > For example, it may happen that one row id (e.g. a sentence) has
> > > > > several thousand column families. Can this affect the seek
> > > > > performance?
> > > > >
> > > > > So for my initial example it has about 3000 row ids to seek, which
> > > > > will return about 500k entries. If I filter for specific column
> > > > > families (e.g. a document without annotations) it will return about
> > > > > 5k entries, but the seek time will only be halved. Are there too many
> > > > > column families to seek quickly?
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Regards,
> > > > > Sven
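[Editor's note: a toy cost model helps explain why filtering column families can cut the returned entries 100x while only halving Sven's wall-clock time: each range pays a fixed per-seek cost regardless of how many entries survive the filter. The constants below are invented for illustration and are not measured Accumulo internals.]

```java
// Toy cost model (illustrative only, not Accumulo internals):
//   total time = ranges * per-seek cost + entries * per-entry cost
// Filtering column families shrinks `entries` but not `ranges`, so the
// per-seek term puts a floor under the total.
public class SeekCostModel {

    static double totalMs(int ranges, long entries, double seekMs, double perEntryMs) {
        return ranges * seekMs + entries * perEntryMs;
    }

    public static void main(String[] args) {
        double seekMs = 1.0;       // hypothetical fixed cost per seek
        double perEntryMs = 0.006; // hypothetical cost per returned entry

        // Sven's numbers: 3000 seeks; ~500k entries unfiltered vs ~5k filtered.
        double unfiltered = totalMs(3000, 500_000, seekMs, perEntryMs); // ~6000 ms
        double filtered = totalMs(3000, 5_000, seekMs, perEntryMs);     // ~3030 ms

        System.out.printf("unfiltered=%.0f ms, filtered=%.0f ms%n", unfiltered, filtered);
        // 100x fewer entries returned, but only ~2x faster: the 3000 seeks dominate.
    }
}
```

Under this model, returning fewer entries can never help more than eliminating seeks does, which is consistent with the seek time "only" halving.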
