Hi Adam,

Thank you for the list!
In my case there is only one client, which runs the single query I'm tracing, so I hope the thread pools have free threads.

I'd like to look at the code that runs in the scope of "tablet read ahead". Do you know where I should look? I tried to search by keywords on GitHub, but it couldn't find the string "tablet read ahead" that is given to the tracer.

Cheers,
Maxim

On Tue, Jan 15, 2019 at 9:01 PM Adam Fuchs <[email protected]> wrote:

> Hi Maxim,
>
> What you're seeing is an artifact of the threading model that Accumulo
> uses. When you launch a query, Accumulo tablet servers coordinate RPCs
> via Thrift in one thread pool (which grows unbounded) and queue up scans
> (rfile lookups, decryption/decompression, iterators, etc.) in another
> thread pool known as the readahead pool (which has a fixed number of
> threads). You're seeing everything that happens in that readahead thread
> in one big chunk. You may need to look a bit deeper into
> profiling/sampling tablet server CPU to get insights into how to improve
> your query performance. If you want to speed up queries in general, you
> might try (in no particular order):
>
> 1. Increase parallelism by bumping up the readahead threads
> (tserver.readahead.concurrent.max). This will still be bounded by the
> number of parallel scans the clients are driving.
> 2. Increase the parallelism driven by clients by querying more, smaller
> ranges, or by splitting tablets.
> 3. Increase scan batch sizes if the readahead thread or Thrift
> coordination overhead is high.
> 4. Optimize custom iterators if they are a CPU bottleneck.
> 5. Increase cache sizes or otherwise modify queries to improve cache hit
> rates.
> 6. Change compression settings if compression is a CPU bottleneck. Try
> snappy instead of gz.
>
> Cheers,
> Adam
>
> On Tue, Jan 15, 2019, 10:45 AM Maxim Kolchin <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm trying to trace some scans with Zipkin and see that quite often the
>> span called "tablet read ahead" takes 10x or 100x more time than other
>> similar spans.
>>
>> Why might this happen? What can be done to reduce the time? I found a
>> similar discussion on the list, but it doesn't have an answer. It would
>> be great to have a how-to article listing some steps that could be taken.
>>
>> Attaching a screenshot of one of the traces having this issue.
>>
>> Maxim Kolchin
>>
>> E-mail: [email protected]
>> Tel.: +7 (911) 199-55-73
>> Homepage: http://kolchinmax.ru
>>
>>> Below you can find a good example of what I'm struggling to understand
>>> right now. It's a trace for a simple scan over some columns with a
>>> BatchScanner using 75 threads. The scan takes 877 milliseconds and the
>>> main contributor is the entry "tablet read ahead 1", which starts at
>>> 248 ms. These are the questions that I cannot answer with this trace:
>>>
>>> 1. Why does this heavy operation start only after 248 ms? Summing up
>>> the delays before this operation gives a number that is not even close
>>> to 248 ms.
>>> 2. What does "tablet read ahead 1" mean? In general, how do I map the
>>> entries of a trace to their meaning? Is there a guide about this?
>>> 3. Why does "tablet read ahead 1" take 600 ms? It's clearly not the sum
>>> of the entries under it, but that's the important part.
>>> 4. I may be naive, but... how much data has been read by this scan? How
>>> many entries? That's very important for understanding what's going on.
>>>
>>> Thanks for the help,
>>>
>>> Mario
>>>
>>> 877+   0  Dice@h01 counts
>>>   2+   7  tserver@h12 startScan
>>>   6+  10  tserver@h15 startScan
>>>   5+  11  tserver@h15 metadata tablets read ahead 4
>>> 843+  34  Dice@h01 batch scanner 74- 1
>>> 620+ 230  tserver@h09 startMultiScan
>>> 600+ 248  tserver@h09 tablet read ahead 1
>>>  22+ 299  tserver@h09 newDFSInputStream
>>>  22+ 299  tserver@h09 getBlockLocations
>>>   2+ 310  tserver@h09 ClientNamenodeProtocol#getBlockLocations
>>>   1+ 321  tserver@h09 getFileInfo
>>>   1+ 321  tserver@h09 ClientNamenodeProtocol#getFileInfo
>>>   2+ 322  tserver@h09 DFSInputStream#byteArrayRead
>>>   1+ 324  tserver@h09 DFSInputStream#byteArrayRead
>>>   2+ 831  tserver@h09 DFSInputStream#byteArrayRead
>>>   2+ 834  tserver@h09 DFSInputStream#byteArrayRead
>>>   1+ 835  tserver@h09 BlockReaderLocal#fillBuffer(1091850413)
>>>   1+ 874  tserver@h09 closeMultiScan
>>>
>>> --
>>> Mario Pastorelli | TERALYTICS
>>> software engineer
>>>
>>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>>> phone: +41794381682
>>> email: [email protected]
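As a concrete illustration of suggestions 1, 5 and 6 from Adam's list above, here is a minimal sketch of how those properties could be set through the Accumulo Java client API. It assumes an Accumulo 1.x Connector named conn and a table named mytable (both illustrative); the property names are the ones Adam mentions, and the values are examples only.

import org.apache.accumulo.core.client.Connector;

public class ReadAheadTuningSketch {

  static void applyTuning(Connector conn) throws Exception {
    // Suggestion 1: more readahead threads per tablet server (example value).
    conn.instanceOperations().setProperty("tserver.readahead.concurrent.max", "32");

    // Suggestion 5: enable block and index caching on the table to improve cache hit rates.
    conn.tableOperations().setProperty("mytable", "table.cache.block.enable", "true");
    conn.tableOperations().setProperty("mytable", "table.cache.index.enable", "true");

    // Suggestion 6: use snappy instead of gz; only newly written rfiles pick this up.
    conn.tableOperations().setProperty("mytable", "table.file.compress.type", "snappy");
  }
}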

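Suggestion 2 (more, smaller ranges and more tablets) is driven from the client side. The sketch below, under the same illustrative conn and mytable assumptions and with arbitrary split points and thread count, shows one way it could look.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class ParallelScanSketch {

  static void scanInParallel(Connector conn) throws Exception {
    // Pre-split the table so more tablets (and more readahead tasks) can work in parallel.
    SortedSet<Text> splits = new TreeSet<>();
    for (char c = 'b'; c <= 'y'; c++) {
      splits.add(new Text(String.valueOf(c)));
    }
    conn.tableOperations().addSplits("mytable", splits);

    // Drive parallelism from the client: many small ranges on a BatchScanner.
    List<Range> ranges = new ArrayList<>();
    for (char c = 'a'; c <= 'z'; c++) {
      ranges.add(Range.prefix(String.valueOf(c)));
    }

    BatchScanner bs = conn.createBatchScanner("mytable", Authorizations.EMPTY, 16);
    try {
      bs.setRanges(ranges);
      long entries = 0;
      for (Map.Entry<Key,Value> entry : bs) {
        entries++; // real code would consume the entry here
      }
      System.out.println("read " + entries + " entries");
    } finally {
      bs.close();
    }
  }
}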