Also, a couple of other things to consider (they may not directly fix your
current issue, but they are good practices).

1) In the S3 bucket, how many Parquet files are present that you are trying to
query with Drill? (Others have asked for the Drill plans as well; it will also
be good to see how many minor fragments the Drill cluster is using to read the
S3 data.)
2) A view defined as Select * from xxxxx is typically a bad idea when dealing
with BI tools. Drill is based on schema discovery, and by doing this you force
Drill to discover the schema every time a tool requests metadata; you also make
query planning much harder and more expensive.
3) For Parquet data, consider updating the metadata cache. It helps improve
query planning speed, metadata operations, etc. It is also a good check to see
whether the data is healthy.

    You may consider looking at some pointers here:
https://community.mapr.com/community/exchange/blog/2017/01/25/drill-best-practices-for-bi-and-analytical-tools


--Andries




On 1/28/18, 6:18 PM, "Kunal Khatua" <kkha...@mapr.com> wrote:

    Hi Francis
    
    
    
    Looks like the logs didn’t get attached.
    
    
    
    As for the physical plan, ignore the “memory = 0.0”, because it is a
physical plan, not the profile of the actually executed query.
    
    
    
    To debug where the memory is being consumed, look at the query’s profile
page.
    
    
    
    This will tell you which is the longest running major fragment:
    
    
http://<hostname>:8047/profiles/<queryId>#fragment-overview
    
    You’re looking for the Max Peak Memory column to see which ones are holding 
the most memory.
    
    
    
    
http://<hostname>:8047/profiles/<queryId>#operator-overview
    
    And this will show you the memory-consumption statistics for the
individual operators.
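Both pages are backed by the query's profile, which can also be fetched as
JSON from /profiles/<queryId>.json and inspected offline. As a sketch, here
is how one might scan such a profile for the peak memory per major fragment.
The field names (fragmentProfile, minorFragmentProfile, maxMemoryUsed,
majorFragmentId) match profiles I have seen, but verify them against the JSON
your Drill version produces:

```python
# Find the peak memory per major fragment in a Drill query profile
# (fetched as JSON from /profiles/<queryId>.json). Field names are
# assumptions based on sample profiles; check them for your version.
def peak_memory_by_major_fragment(profile: dict) -> dict:
    peaks = {}
    for major in profile.get("fragmentProfile", []):
        major_id = major["majorFragmentId"]
        for minor in major.get("minorFragmentProfile", []):
            mem = minor.get("maxMemoryUsed", 0)
            # Keep the largest value seen across minor fragments.
            peaks[major_id] = max(peaks.get(major_id, 0), mem)
    return peaks

# A toy profile standing in for a real downloaded one.
sample = {
    "fragmentProfile": [
        {"majorFragmentId": 0,
         "minorFragmentProfile": [{"maxMemoryUsed": 1048576},
                                  {"maxMemoryUsed": 2097152}]},
        {"majorFragmentId": 1,
         "minorFragmentProfile": [{"maxMemoryUsed": 8388608}]},
    ]
}
print(peak_memory_by_major_fragment(sample))
```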
    
    
    
    Running such a query through the WebUI is generally not recommended,
because (by design) the WebUI creates a ResultSetListener in the Drillbit
that holds all the records before sending anything back
(https://issues.apache.org/jira/browse/DRILL-6050).
    
    
    
    However, for Tableau, the DrillClient should have been consuming the
records, so I’m wondering why the Drillbit would run out of memory.
    
    
    
    Could you share the following for both scenarios (the Web UI and when 
running via Tableau)?
    
    
    
      1.  The profiles
    
      2.  The Drillbit logs (drillbit.out, drillbit.log, etc)
    
    
    
    Thanks
    
    Kunal
    
    
    
    From: Francis McGregor-Macdonald [mailto:fran...@mc-mac.com]
    
    Sent: Sunday, January 28, 2018 2:49 PM
    
    To: user@drill.apache.org
    
    Subject: Re: Fwd: Creating a Tableau extracts with Drill 1.12 uses 
unlimited memory
    
    
    
    Hi all,
    
    
    
    A physical plan is attached ... all memory appears to be 0.0, which seems odd?
    
    
    
    Thanks
    
    
    
    On Sun, Jan 28, 2018 at 10:37 PM, Francis McGregor-Macdonald 
<fran...@mc-mac.com<mailto:fran...@mc-mac.com>> wrote:
    
    And with logs as attachments.
    
    
    
    On Sun, Jan 28, 2018 at 9:40 PM, Francis McGregor-Macdonald 
<fran...@mc-mac.com<mailto:fran...@mc-mac.com>> wrote:
    
    Thanks Paul and Kunal,
    
    I think I have the right information now. With Paul's changes (and fixing
up a zoo.cfg error) it isn't crashing, but rather failing. Logs attached; it is
still blowing past the memory limits. It does the same thing when re-running
the query from the web console, so presumably it's not actually Tableau-related,
despite my first generating it that way.
    
    
    
    Thanks.
    
    
    
    On Sat, Jan 27, 2018 at 1:15 PM, Francis McGregor-Macdonald 
<fran...@mc-mac.com<mailto:fran...@mc-mac.com>> wrote:
    
    Thanks Paul,
    
    
    
    I will update with your suggested memory allocations also and retry.
    
    
    
    ZooKeeper crashed too, which might explain more? I have attached the
ZooKeeper logs as well.
    
    
    
    Thanks
    
    
    
    On Sat, Jan 27, 2018 at 6:45 AM, Paul Rogers 
<par0...@yahoo.com<mailto:par0...@yahoo.com>> wrote:
    
    Hi Francis,
    
    
    
    Thanks much for the log. The log shows a query running, then immediately
shows entries that occur when Drill starts up. I'm guessing that Drill
literally crashed at this point? This is more severe than the usual error in
which a query exhausts memory.
    
    
    
    Some general observations. The Drill memory is 60 GB, but system memory is
61 GB. Perhaps try dropping total Drill memory somewhat to give the OS and
other tasks more headroom. For a SELECT * query, Drill needs far less memory
than what you have, so maybe try giving Drill 48 GB total.
    
    
    
    Then, Drill needs direct memory much more than heap. So, maybe give Drill 
39 GB direct, 8 GB heap and 1 GB (the default) for code cache. These settings 
are in drill-env.sh.
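The split Paul suggests can be sanity-checked with simple arithmetic. The
numbers below come straight from his message; DRILL_HEAP and
DRILL_MAX_DIRECT_MEMORY are the drill-env.sh settings they would map to
(verify the variable names against your drill-env.sh):

```python
# Sanity check of the suggested memory split from the message above.
# These would be set in drill-env.sh (e.g. DRILL_HEAP="8G",
# DRILL_MAX_DIRECT_MEMORY="39G"); the arithmetic just mirrors
# Paul's numbers, in GB.
system_memory_gb = 61
total_drill_gb   = 48   # leave headroom for the OS and other tasks
direct_gb        = 39   # the bulk: Drill's value vectors live in direct memory
heap_gb          = 8
code_cache_gb    = 1    # the default JVM code cache

assert direct_gb + heap_gb + code_cache_gb == total_drill_gb
print(f"OS headroom: {system_memory_gb - total_drill_gb} GB")
```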
    
    
    
    Kunal, you have more experience with these issues. Can you make additional 
suggestions by looking at the log?
    
    
    
    Thanks,
    
    
    
    - Paul
    
    
    
    
    
    On Thursday, January 25, 2018, 10:20:29 PM PST, Francis McGregor-Macdonald 
<fran...@mc-mac.com<mailto:fran...@mc-mac.com>> wrote:
    
    
    
    
    
    Hi all,
    
    
    
    I am guessing that each of your EMR nodes is quite large?
    
    The EMR nodes are r4.2xlarge ('vcpu': 8, 'memory': 61)
    
    
    
    Property "planner.width.max_per_node" is set to = 6
    
    
    
    What is the system memory and what are the allocations for heap and direct?
    
    System Memory: 61GB (EMR nodes above)
    
    drill_mem_heap: 12G
    
    drill_mem_max: 48G
    
    
    
    The view is simple: SELECT * FROM s3://myparquet.parquet (14GB)
    
    
    
    planner.memory.max_query_memory_per_node = 10479720202
    
    
    
    Drillbit.log attached (I think I have the correct selection included).
    
    
    
    Thanks
    
    
    
    On Fri, Jan 26, 2018 at 2:41 PM, Kunal Khatua 
<kkha...@mapr.com<mailto:kkha...@mapr.com>> wrote:
    
    What is the system memory and what are the allocations for heap and direct?
The memory crash might be occurring due to insufficient heap. The limits
parameter applies to direct memory, not heap.
    
    
    
    Can you share details in the logs from the crash?
    
    
    
    -----Original Message-----
    
    From: Timothy Farkas [mailto:tfar...@mapr.com<mailto:tfar...@mapr.com>]
    
    Sent: Thursday, January 25, 2018 2:58 PM
    
    To: user@drill.apache.org<mailto:user@drill.apache.org>
    
    Subject: Re: Creating a Tableau extracts with Drill 1.12 uses unlimited 
memory
    
    
    
    Hi Francis,
    
    
    
    I am guessing that each of your EMR nodes is quite large (32 or 64 vCPUs).
On large machines Drill's planner over-parallelizes and over-allocates memory.
There is a property, "planner.width.max_per_node", which limits the number of
operators that can execute simultaneously on a Drillbit for a query. If you
configure the width per node to something like 5 or 10 (you may have to play
around with it), things should start working.
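A rough sketch of the interaction Tim describes: the width per node bounds a
query's parallelism on each Drillbit, and the per-node query memory budget is
divided across that parallelism. The 70%-of-cores default and the division
formula below are approximations of behavior described in the Drill docs as I
recall them, not the planner's exact algorithm:

```python
# Approximate intuition for planner.width.max_per_node, not Drill's
# exact planner logic. The 0.7 factor mirrors the documented default
# of planning roughly 70% of a node's cores per query.
import math

def effective_width(cores: int, width_max_per_node: int) -> int:
    # width.max_per_node caps the planner's default parallelism.
    return min(width_max_per_node, max(1, math.ceil(cores * 0.7)))

def memory_per_buffering_op(max_query_mem: int, width: int,
                            n_buffering_ops: int) -> int:
    # The per-node query budget is split across the parallel copies
    # of each buffering operator (sorts, hash aggregates, etc.).
    return max_query_mem // (width * n_buffering_ops)

# r4.2xlarge from this thread: 8 vcpus, width.max_per_node = 6,
# max_query_memory_per_node = 10479720202 bytes.
width = effective_width(cores=8, width_max_per_node=6)
per_op = memory_per_buffering_op(10_479_720_202, width, n_buffering_ops=1)
print(width, per_op)
```

Lowering the width therefore raises the memory available to each operator
instance, which is why it can stop a query from blowing past its budget.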
    
    
    
    Thanks,
    
    Tim
    
    
    
    ________________________________
    
    From: Francis McGregor-Macdonald 
<fran...@mc-mac.com<mailto:fran...@mc-mac.com>>
    
    Sent: Thursday, January 25, 2018 1:58:22 PM
    
    To: user@drill.apache.org<mailto:user@drill.apache.org>
    
    Subject: Creating a Tableau extracts with Drill 1.12 uses unlimited memory
    
    
    
    When creating a Tableau (10.3, 10.5 desktop) extract from a Drill (1.12 on
EMR) cluster, memory appears not to adhere to the limit set by
planner.memory.max_query_memory_per_node.
    
    
    
    The extract query consumes all memory and then crashes Drill.
    
    
    
    Running the same query as a CREATE TABLE, memory behaves as expected.
    
    
    
    The query complexity is trivial: SELECT * from a view over only a single
Parquet file, with no calculated fields.
    
    
    
    Has anyone else observed this behavior?
    
    