Re: Suggestion needed for UNION ALL performance in Apache drill

Paul Rogers Wed, 22 Apr 2020 22:14:13 -0700

Hi Sreeparna,

The short answer is it *should* work: a UNION ALL is simply an append. (Be sure 
you are not using a plain UNION as that needs to do more work to remove 
duplicates.)
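
To make the difference concrete, here is a minimal sketch of the two operators. It assumes SQLite (through Python's standard library) as a stand-in engine rather than Drill, and the tables `a` and `b` are made up for illustration. The semantics are standard SQL, so the row counts should carry over, but this says nothing about Drill's timings.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; this is not Drill.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION ALL appends the second result to the first and keeps duplicates.
union_all_rows = conn.execute(
    "SELECT id FROM a UNION ALL SELECT id FROM b"
).fetchall()

# Plain UNION must also remove duplicates, which is extra work for the engine.
union_rows = conn.execute(
    "SELECT id FROM a UNION SELECT id FROM b"
).fetchall()

print(len(union_all_rows))  # 4 rows: the value 2 appears twice
print(len(union_rows))      # 3 rows: the duplicate 2 is removed
```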


Since you are seeing unexpected behavior, we may have some kind of issue to 
investigate and perhaps fix. That is always hard to do over e-mail, but let's 
see what we can do.


The first question is to understand the full query: are you doing more than a 
simple scan of two files and a UNION ALL? Are there sorts or joins involved?

The best place to start to investigate performance issues is the query profile, 
which it looks like you are doing. What is the time for the scans if you run 
each of the two scans separately? You said that they take 8 and 1 seconds. Is 
that for the whole query or just the scan operators?

Then, when you run the UNION ALL, again looking at the scan operators, is there 
any difference in run times? If the scans take longer, that is one thing to 
investigate. If the scans take the same amount of time, what other operator(s) 
are taking the rest of the time? Your note suggests that it is the scan taking 
the time. But there should be two scan operators, one for each file. How is 
the time divided between them?
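
One rough way to get the whole-query numbers side by side is sketched below. It assumes Drill's REST endpoint (`/query.json`) is reachable on the default web port 8047; the file paths and query text are hypothetical placeholders, since I have not seen the real queries. It measures wall-clock time for each query only, so the per-operator breakdown still has to come from the query profile.

```python
import json
import time
import urllib.request


def drill_rest_runner(base_url="http://localhost:8047"):
    """Return a function that submits SQL to Drill's REST API (/query.json).

    The base URL is an assumption; adjust it to point at your Drillbit.
    """
    def run(sql):
        body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
        req = urllib.request.Request(
            base_url + "/query.json",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return run


def time_queries(queries, run):
    """Run each named query once and return {name: elapsed_seconds}."""
    timings = {}
    for name, sql in queries.items():
        start = time.perf_counter()
        run(sql)
        timings[name] = time.perf_counter() - start
    return timings


# Hypothetical queries: replace the paths with the real Parquet locations.
FIRST = "SELECT * FROM dfs.`/data/first.parquet`"
SECOND = "SELECT * FROM dfs.`/data/second.parquet`"
QUERIES = {
    "first": FIRST,
    "second": SECOND,
    "union_all": FIRST + " UNION ALL " + SECOND,
}
```

Calling `time_queries(QUERIES, drill_rest_runner())` against a running Drillbit would return the three elapsed times. If the combined time is far above the sum of the two parts, the profile of the combined query is the place to look next.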


How large are the data files? What storage system are they on? How many 
Drillbits are you running, and with how much memory?


Thanks,
- Paul

 

On Wednesday, April 22, 2020, 11:32:24 AM PDT, sreeparna bhabani 
<[email protected]> wrote:

Hi Team,

I am reaching out about a specific problem with UNION ALL. We have one
UNION ALL statement that combines 2 queries. The individual queries take
8 secs and 1 sec respectively, but the UNION ALL takes 30 secs.
PARQUET_SCAN_ROW_GROUP takes the most time. The Apache Drill version is 1.17.

Please suggest how to improve the performance of this UNION ALL. We are
using Parquet files.

Thanks,
Sreeparna Bhabani
