I had increased the readahead thread pool to 32 (from 16). I had also increased the minimum thread pool size from 20 to 40. I had 10 tablets with the data block cache turned on (probably only 256M, though).

Each tablet had a single file (manually compacted). I did not capture cache hit rates.
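
For reference, a minimal sketch of how those knobs can be set through the Java client API (assuming Accumulo 1.8 property names; the instance details and the YCSB table name "usertable" are illustrative):

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class TuneTserver {
  public static void main(String[] args) throws Exception {
    // Illustrative instance name, ZooKeeper quorum, and credentials
    Instance inst = new ZooKeeperInstance("accumulo", "zk1:2181");
    Connector conn = inst.getConnector("root", new PasswordToken("secret"));

    // Readahead pool: 16 -> 32
    conn.instanceOperations().setProperty("tserver.readahead.concurrent.max", "32");
    // Minimum thrift server threads: 20 -> 40
    conn.instanceOperations().setProperty("tserver.server.threads.minimum", "40");
    // Turn the data block cache on for the test table
    conn.tableOperations().setProperty("usertable", "table.cache.block.enable", "true");
  }
}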

I've been working through this with Keith on IRC this morning, too. We found that a single BatchScanner (one partition) is faster than the Scanner; with two partitions, things started to slow down.

Two interesting points still to pursue, IMO:

1. I saw that the tserver-side logging for MultiScanSess was nearly identical to the BatchScanner timings.
2. The minimum server threads did not seem to be taking effect. Despite having the value set to 64, I only saw a few ClientPool threads in a jstack after running the test.

Adam Fuchs wrote:
Sorry, Monday morning poor reading skills, I guess. :)

So, 3000 ranges in 40 seconds with the BatchScanner. In my past
experience HDFS seeks tend to take something like 10-100ms, and I would
expect that time to dominate here. With 60 client threads your
bottleneck should be the readahead pool, which I believe defaults to 16
threads. If you get perfect index caching then you should be seeing
something like 3000/16*50ms = 9,375ms. That's in the right ballpark, but
it assumes no data cache hits. Do you have any idea of how many files
you had per tablet after the ingest? Do you know what your cache hit
rate was?
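
As a quick sanity check, that back-of-envelope estimate as a runnable snippet (numbers taken from this thread; purely illustrative):

public class SeekEstimate {
  public static void main(String[] args) {
    // 3000 ranges funneled through a 16-thread readahead pool, each paying
    // roughly one 50ms HDFS seek, assuming perfect index caching and no
    // data cache hits.
    int ranges = 3000;
    int readaheadThreads = 16; // default tserver.readahead.concurrent.max
    double seekMs = 50.0;      // mid-range HDFS seek cost
    double estimateMs = (double) ranges / readaheadThreads * seekMs;
    System.out.printf("expected time: %.0f ms%n", estimateMs); // prints 9375 ms
  }
}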

Adam


On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser <[email protected]> wrote:

    5 iterations; I figured that would be apparent from the log messages :)

    The code is already posted in my original message.

    Adam Fuchs wrote:

        Josh,

        Two questions:

        1. How many iterations did you do? I would like to see an absolute
        number of lookups per second to compare against other observations.

        2. Can you post your code somewhere so I can run it?

        Thanks,
        Adam


        On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <[email protected]> wrote:

             Sven, et al:

             So, it would appear that I have been able to reproduce this one
             (better late than never, I guess...). tl;dr: serially using
             Scanners to do point lookups instead of a BatchScanner is ~20x
             faster. This sounds like a pretty serious performance issue to me.

             Here's a general outline for what I did.

             * Accumulo 1.8.0
             * Created a table with 1M rows, each row with 10 columns, using
               YCSB (workloada)
             * Split the table into 9 tablets
             * Computed the set of all rows in the table

             For a number of iterations:
             * Shuffle this set of rows
             * Choose the first N rows
             * Construct an equivalent set of Ranges from the set of rows,
               choosing a random column (0-9)
             * Partition the N rows into X collections
             * Submit X tasks to query one partition of the N rows (to a
               thread pool with X fixed threads); a rough sketch of these
               steps follows
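
             A rough sketch of those steps (assuming the rows were collected
             into a List<Text>; the "ycsb"/"field0".."field9" column names
             are assumptions, not from this thread):

             import java.util.ArrayList;
             import java.util.Collections;
             import java.util.List;
             import java.util.Random;
             import org.apache.accumulo.core.data.Range;
             import org.apache.hadoop.io.Text;

             class RangePartitioner {
               // Shuffle all rows, keep the first n, build one exact Range
               // per row with a random column, then round-robin the Ranges
               // into numPartitions collections.
               static List<List<Range>> partition(List<Text> rows, int n, int numPartitions) {
                 Collections.shuffle(rows);
                 Random rand = new Random();
                 List<List<Range>> parts = new ArrayList<>();
                 for (int i = 0; i < numPartitions; i++) {
                   parts.add(new ArrayList<Range>());
                 }
                 for (int i = 0; i < n; i++) {
                   Text cq = new Text("field" + rand.nextInt(10));
                   Range r = Range.exact(rows.get(i), new Text("ycsb"), cq);
                   parts.get(i % numPartitions).add(r);
                 }
                 return parts;
               }
             }

             Each partition is then submitted to a fixed pool, e.g.
             Executors.newFixedThreadPool(X), one task per partition.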

             I have two implementations of these tasks. In one, all ranges in
             a partition are executed via one BatchScanner. In the second,
             each range is executed serially using a Scanner. The numbers
             speak for themselves.
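
             A minimal sketch of the two flavors (assuming a Connector named
             conn and the YCSB table "usertable"; the full harness is in the
             repo linked below):

             import java.util.List;
             import java.util.Map.Entry;
             import org.apache.accumulo.core.client.BatchScanner;
             import org.apache.accumulo.core.client.Connector;
             import org.apache.accumulo.core.client.Scanner;
             import org.apache.accumulo.core.data.Key;
             import org.apache.accumulo.core.data.Range;
             import org.apache.accumulo.core.data.Value;
             import org.apache.accumulo.core.security.Authorizations;

             class LookupTasks {
               // Flavor 1: every range in the partition goes through a
               // single BatchScanner.
               static long batchScan(Connector conn, List<Range> partition) throws Exception {
                 BatchScanner bs = conn.createBatchScanner("usertable", Authorizations.EMPTY, 1);
                 try {
                   bs.setRanges(partition);
                   long count = 0;
                   for (Entry<Key,Value> entry : bs) {
                     count++;
                   }
                   return count;
                 } finally {
                   bs.close();
                 }
               }

               // Flavor 2: each range is looked up serially with a plain Scanner.
               static long serialScan(Connector conn, List<Range> partition) throws Exception {
                 long count = 0;
                 for (Range range : partition) {
                   Scanner s = conn.createScanner("usertable", Authorizations.EMPTY);
                   s.setRange(range);
                   for (Entry<Key,Value> entry : s) {
                     count++;
                   }
                 }
                 return count;
               }
             }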

             ** BatchScanners **
             2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
             2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges calculated: 3000 ranges found
             2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
             2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries executed in 40178 ms
             2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
             2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries executed in 42296 ms
             2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
             2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries executed in 46094 ms
             2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
             2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries executed in 47704 ms
             2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
             2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries executed in 49221 ms

             ** Scanners **
             2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
             2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges calculated: 3000 ranges found
             2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
             2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2833 ms
             2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
             2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2536 ms
             2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
             2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2150 ms
             2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
             2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2061 ms
             2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
             2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2140 ms

             Query code is available at
             https://github.com/joshelser/accumulo-range-binning


             Sven Hodapp wrote:

                 Hi Keith,

                  I've tried it with 1, 2, or 10 threads. Unfortunately, there
                  were no significant differences.
                  Maybe it's a problem with the table structure? For example,
                  one row id (e.g. a sentence) may have several thousand
                  column families. Can this affect seek performance?

                  My initial example seeks about 3000 row ids, which return
                  about 500k entries. If I filter for specific column families
                  (e.g. a document without annotations), only about 5k entries
                  are returned, but the seek time is merely halved. Are there
                  too many column families to seek quickly?
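
                  For reference, that kind of column-family filtering maps to
                  fetchColumnFamily on the Scanner; a minimal sketch (table,
                  row, and family names purely illustrative):

                  import java.util.Map.Entry;
                  import org.apache.accumulo.core.client.Connector;
                  import org.apache.accumulo.core.client.Scanner;
                  import org.apache.accumulo.core.data.Key;
                  import org.apache.accumulo.core.data.Range;
                  import org.apache.accumulo.core.data.Value;
                  import org.apache.accumulo.core.security.Authorizations;
                  import org.apache.hadoop.io.Text;

                  class FamilyFilter {
                    static long countFamily(Connector conn, String row) throws Exception {
                      Scanner s = conn.createScanner("sentences", Authorizations.EMPTY);
                      s.setRange(Range.exact(row));              // one row id
                      s.fetchColumnFamily(new Text("document")); // server-side family filter
                      long count = 0;
                      for (Entry<Key,Value> entry : s) {
                        count++;
                      }
                      return count;
                    }
                  }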

                 Thanks!

                 Regards,
                 Sven


