Hi Adam,

Thank you for the list!
In my case there is only one client, which runs the single query I'm tracing, so I hope the thread pools have free threads.

I'd like to look at the code that runs in the scope of "tablet read ahead". Do you know where I should look? I tried to search by keywords on GitHub, but it couldn't find the string "tablet read ahead" that is given to the tracer.

Cheers,
Maxim

On Tue, Jan 15, 2019 at 9:01 PM Adam Fuchs <[email protected]> wrote:

> Hi Maxim,
>
> What you're seeing is an artifact of the threading model that Accumulo
> uses. When you launch a query, Accumulo tablet servers coordinate RPCs
> via Thrift in one thread pool (which grows unbounded) and queue up scans
> (rfile lookups, decryption/decompression, iterators, etc.) in another
> thread pool known as the readahead pool (which has a fixed number of
> threads). You're seeing everything that happens in that readahead thread
> in one big chunk. You may need to look a bit deeper into
> profiling/sampling tablet server CPU to get insights into how to improve
> your query performance. If you want to speed up queries in general, you
> might try (in no particular order):
>
> 1. Increase parallelism by bumping up the readahead threads
> (tserver.readahead.concurrent.max). This will still be bounded by the
> number of parallel scans the clients are driving.
> 2. Increase the parallelism driven by clients by querying more, smaller
> ranges, or by splitting tablets.
> 3. Increase scan batch sizes if the readahead thread or Thrift
> coordination overhead is high.
> 4. Optimize custom iterators if they are a CPU bottleneck.
> 5. Increase cache sizes or otherwise modify queries to improve cache hit
> rates.
> 6. Change compression settings if compression is a CPU bottleneck. Try
> snappy instead of gz.
>
> Cheers,
> Adam
>
> On Tue, Jan 15, 2019, 10:45 AM Maxim Kolchin <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm trying to trace some scans with Zipkin and see that quite often the
>> span called "tablet read ahead" takes 10x or 100x more time than other
>> similar spans.
>>
>> Why might this happen? What can be done to reduce the time? I found a
>> similar discussion on the list, but it doesn't have an answer. It would
>> be great to have a how-to article listing some steps that could be taken.
>>
>> Attaching a screenshot of one of the traces having this issue.
>>
>> Maxim Kolchin
>>
>> E-mail: [email protected]
>> Tel.: +7 (911) 199-55-73
>> Homepage: http://kolchinmax.ru
>>
>>> Below you can find a good example of what I'm struggling to understand
>>> right now. It's a trace for a simple scan over some columns with a
>>> BatchScanner using 75 threads. The scan takes 877 milliseconds and the
>>> main contributor is the entry "tablet read ahead 1", which starts at
>>> 248 ms. These are the questions that I cannot answer with this trace:
>>>
>>> 1. Why does this heavy operation start only after 248 ms? Summing up
>>> the delays before this operation gives a number that is not even close
>>> to 248 ms.
>>> 2. What does "tablet read ahead 1" mean? In general, how do I map the
>>> entries of a trace to their meaning? Is there a guide about this?
>>> 3. Why does "tablet read ahead 1" take 600 ms? It's clearly not the sum
>>> of the entries under it, but that's the important part.
>>> 4. I may be naive, but... how much data has been read by this scan? How
>>> many entries? That's very important for understanding what's going on.
>>>
>>> Thanks for the help,
>>>
>>> Mario
>>>
>>> 877+   0  Dice@h01 counts
>>>   2+   7  tserver@h12 startScan
>>>   6+  10  tserver@h15 startScan
>>>   5+  11  tserver@h15 metadata tablets read ahead 4
>>> 843+  34  Dice@h01 batch scanner 74- 1
>>> 620+ 230  tserver@h09 startMultiScan
>>> 600+ 248  tserver@h09 tablet read ahead 1
>>>  22+ 299  tserver@h09 newDFSInputStream
>>>  22+ 299  tserver@h09 getBlockLocations
>>>   2+ 310  tserver@h09 ClientNamenodeProtocol#getBlockLocations
>>>   1+ 321  tserver@h09 getFileInfo
>>>   1+ 321  tserver@h09 ClientNamenodeProtocol#getFileInfo
>>>   2+ 322  tserver@h09 DFSInputStream#byteArrayRead
>>>   1+ 324  tserver@h09 DFSInputStream#byteArrayRead
>>>   2+ 831  tserver@h09 DFSInputStream#byteArrayRead
>>>   2+ 834  tserver@h09 DFSInputStream#byteArrayRead
>>>   1+ 835  tserver@h09 BlockReaderLocal#fillBuffer(1091850413)
>>>   1+ 874  tserver@h09 closeMultiScan
>>>
>>> --
>>> Mario Pastorelli | TERALYTICS
>>> software engineer
>>>
>>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>>> phone: +41794381682
>>> email: [email protected]
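As a concrete illustration of suggestions 1, 5 and 6 from Adam's list above, here is a minimal sketch of how those properties could be set through the Accumulo Java client API. It assumes an Accumulo 1.x Connector named conn and a table named mytable (both illustrative); the property names are the ones Adam mentions, and the values are examples only.

import org.apache.accumulo.core.client.Connector;

public class ReadAheadTuningSketch {

  static void applyTuning(Connector conn) throws Exception {
    // Suggestion 1: more readahead threads per tablet server (example value).
    conn.instanceOperations().setProperty("tserver.readahead.concurrent.max", "32");

    // Suggestion 5: enable block and index caching on the table to improve cache hit rates.
    conn.tableOperations().setProperty("mytable", "table.cache.block.enable", "true");
    conn.tableOperations().setProperty("mytable", "table.cache.index.enable", "true");

    // Suggestion 6: use snappy instead of gz; only newly written rfiles pick this up.
    conn.tableOperations().setProperty("mytable", "table.file.compress.type", "snappy");
  }
}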

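Suggestion 2 (more, smaller ranges and more tablets) is driven from the client side. The sketch below, under the same illustrative conn and mytable assumptions and with arbitrary split points and thread count, shows one way it could look.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class ParallelScanSketch {

  static void scanInParallel(Connector conn) throws Exception {
    // Pre-split the table so more tablets (and more readahead tasks) can work in parallel.
    SortedSet<Text> splits = new TreeSet<>();
    for (char c = 'b'; c <= 'y'; c++) {
      splits.add(new Text(String.valueOf(c)));
    }
    conn.tableOperations().addSplits("mytable", splits);

    // Drive parallelism from the client: many small ranges on a BatchScanner.
    List<Range> ranges = new ArrayList<>();
    for (char c = 'a'; c <= 'z'; c++) {
      ranges.add(Range.prefix(String.valueOf(c)));
    }

    BatchScanner bs = conn.createBatchScanner("mytable", Authorizations.EMPTY, 16);
    try {
      bs.setRanges(ranges);
      long entries = 0;
      for (Map.Entry<Key,Value> entry : bs) {
        entries++; // real code would consume the entry here
      }
      System.out.println("read " + entries + " entries");
    } finally {
      bs.close();
    }
  }
}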