Maxim, Below are some pointers into the code you asked about. Let me know if you have any questions about the code.
The following tablet server code puts a task on a thread pool for a batch scan. https://github.com/apache/accumulo/blob/rel/1.9.2/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java#L776 This thread pool is created here. https://github.com/apache/accumulo/blob/rel/1.9.2/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java#L227 The function that creates the thread pool calls the following code to wrap the thread pool https://github.com/apache/accumulo/blob/rel/1.9.2/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java#L111 Keith On Wed, Jan 16, 2019 at 4:09 AM Maxim Kolchin <[email protected]> wrote: > > Hi Adam, > > Thank you for the list! > > In my case, there is only one client which runs a single query which I'm > tracing, so I hope the thread pools have free threads. > > I'd like to look at the code which runs in the scope of the "table read > ahead". Do you know where I should look at? I tried to search by keywords on > GitHub, but it wasn't able to find a string "table read ahead" which is given > to the tracer. > > Cheers, > Maxim > > On Tue, Jan 15, 2019 at 9:01 PM Adam Fuchs <[email protected]> wrote: >> >> Hi Maxim, >> >> What you're seeing is an artifact of the threading model that Accumulo uses. >> When you launch a query, Accumulo tablet servers will coordinate RPCs via >> Thrift in one thread pool (which grows unbounded) and queue up scans (rfile >> lookups, decryption/decompression, iterators, etc.) in another threadpool >> known as the readahead pool (which has a fixed number of threads). You're >> seeing everything that happens in that readahead thread in one big chunk. >> You may need to look a bit deeper into profiling/sampling tablet server CPU >> to get insights into how to improve your query performance. If you want to >> speed up queries in general you might try (in no particular order): >> 1. Increase parallelism by bumping up the readahead threads >> (tserver.readahead.concurrent.max). This will still be bounded by the number >> of parallel scans clients are driving. >> 2. Increase parallelism driven by clients by querying more, smaller ranges, >> or by splitting tablets. >> 3. Increase scan batch sizes if the readahead thread or thrift coordination >> overhead is high. >> 4. Optimize custom iterators if that is a CPU bottleneck. >> 5. Increase cache sizes or otherwise modify queries to improve cache hit >> rates. >> 6. Change compression settings if that is a CPU bottleneck. Try snappy >> instead of gz. >> >> Cheers, >> Adam >> >> On Tue, Jan 15, 2019, 10:45 AM Maxim Kolchin <[email protected] wrote: >>> >>> Hi all, >>> >>> I try to trace some scans with Zipkin and see that quite often the trace >>> called "tablet read ahead" takes 10x or 100x more time than the other >>> similar traces. >>> >>> Why it may happen? What could be done to reduce the time? I found a similar >>> discussion on the list, but it doesn't have an answer. I'd be great to have >>> a how-to article listing some steps which could be done. >>> >>> Attaching a screenshot of one of the traces having this issue. >>> >>> Maxim Kolchin >>> >>> E-mail: [email protected] >>> Tel.: +7 (911) 199-55-73 >>> Homepage: http://kolchinmax.ru >>> >>>> Below you can find a good example of what I'm struggling to understand >>>> right now. It's a trace for a simple scan over some columns with a >>>> BatchScanner using 75 threads. The scan takes 877 milliseconds and the main >>>> contributor is the entry "tablet read ahead 1", which starts at 248 ms. >>>> These are the questions that I cannot answer with this trace: >>>> >>>> 1. why this heavy operation starts after 248ms? By summing up the delay >>>> before this operation you get a number which is not even close to 248ms. >>>> 2. what does "tablet read ahead 1" means? In general, how to map the >>>> entries of a trace to their meaning? Is there a guide about this? >>>> 3. why "tablet read ahead 1" takes 600ms? It's clearly not the sum of >>>> the entries under this one but that's the important part. >>>> 4. I may be naive but...how much data have been read by this scan? How >>>> many entries? That's very important to understand what's going on. >>>> >>>> Thanks for the help, >>>> >>>> Mario >>>> >>>> 877+ 0 Dice@h01 counts >>>> 2+ 7 tserver@h12 startScan >>>> 6+ 10 tserver@h15 startScan >>>> 5+ 11 tserver@h15 metadata tablets read ahead 4 >>>> 843+ 34 Dice@h01 batch scanner 74- 1 >>>> 620+ 230 tserver@h09 startMultiScan >>>> 600+ 248 tserver@h09 tablet read ahead 1 >>>> 22+ 299 tserver@h09 newDFSInputStream >>>> 22+ 299 tserver@h09 getBlockLocations >>>> 2+ 310 tserver@h09 ClientNamenodeProtocol#getBlockLocations >>>> 1+ 321 tserver@h09 getFileInfo >>>> 1+ 321 tserver@h09 ClientNamenodeProtocol#getFileInfo >>>> 2+ 322 tserver@h09 DFSInputStream#byteArrayRead >>>> 1+ 324 tserver@h09 DFSInputStream#byteArrayRead >>>> 2+ 831 tserver@h09 DFSInputStream#byteArrayRead >>>> 2+ 834 tserver@h09 DFSInputStream#byteArrayRead >>>> 1+ 835 tserver@h09 BlockReaderLocal#fillBuffer(1091850413) >>>> 1+ 874 tserver@h09 closeMultiScan >>>> -- >>>> Mario Pastorelli | TERALYTICS >>>> >>>> *software engineer* >>>> >>>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland >>>> phone: +41794381682 >>>> email: [email protected] >>>> www.teralytics.net >>>> >>>> Company registration number: CH-020.3.037.709-7 | Trade register Canton >>>> Zurich >>>> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann >>>> de Vries >>>> >>>> This e-mail message contains confidential information which is for the sole >>>> attention and use of the intended recipient. Please notify us at once if >>>> you think that it may not be intended for you and delete it immediately.
