Paul Rogers created DRILL-5370:
----------------------------------

             Summary: Drillbit dies for 5 MB SELECT statement
                 Key: DRILL-5370
                 URL: https://issues.apache.org/jira/browse/DRILL-5370
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers


Some community users use Drill with BI tools that generate queries. One such 
tool generates queries that map Drill data into a "cube" format for a 
cube-based visualization engine. Such tools tend to create very large, very 
complex queries.

To replicate an issue reported by one such user, I wrote a simple program that 
generates deeply nested queries of the form:

SELECT a98 AS a99 FROM (SELECT a97 AS a98 FROM (… SELECT a1 FROM myTable …))

The test used 200 columns, each with a name 500 characters long. (Drill has a 
hard limit of 1024 characters for a symbol name.)

The setup was an embedded Drillbit using the new "cluster fixture" test 
framework. The test ran multiple iterations, each wrapping the prior SELECT in 
a new one as shown above. The result was a series of queries, each about 
100 KB larger than the one before.
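
For illustration, here is a minimal sketch of how such a query generator might 
look. It is not the program used in the test: the class name, the method 
names, and the column-naming scheme (a<level>_<column>_ padded with 'x') are 
all assumptions made for this example. The per-level growth it produces 
depends on how names and aliases are built, so it will not necessarily match 
the figure quoted above.

```java
/** Hypothetical generator for deeply nested SELECT statements (illustration only). */
class NestedQueryGenerator {

    /** Assumed naming scheme: "a<level>_<column>_" padded with 'x' up to nameLength. */
    static String columnName(int level, int column, int nameLength) {
        StringBuilder sb = new StringBuilder();
        sb.append('a').append(level).append('_').append(column).append('_');
        while (sb.length() < nameLength) {
            sb.append('x');
        }
        return sb.toString();
    }

    /** Innermost query: selects the level-1 columns directly from the table. */
    static String baseQuery(String table, int columns, int nameLength) {
        StringBuilder sb = new StringBuilder("SELECT ");
        for (int c = 0; c < columns; c++) {
            if (c > 0) {
                sb.append(", ");
            }
            sb.append(columnName(1, c, nameLength));
        }
        return sb.append(" FROM ").append(table).toString();
    }

    /** Wraps the prior query, renaming each column from (level - 1) to level. */
    static String wrap(String inner, int level, int columns, int nameLength) {
        StringBuilder sb = new StringBuilder("SELECT ");
        for (int c = 0; c < columns; c++) {
            if (c > 0) {
                sb.append(", ");
            }
            sb.append(columnName(level - 1, c, nameLength))
              .append(" AS ")
              .append(columnName(level, c, nameLength));
        }
        return sb.append(" FROM (").append(inner).append(')').toString();
    }

    /** Builds a query nested to the given depth (depth 1 is the base query). */
    static String nestedQuery(String table, int depth, int columns, int nameLength) {
        String query = baseQuery(table, columns, nameLength);
        for (int level = 2; level <= depth; level++) {
            query = wrap(query, level, columns, nameLength);
        }
        return query;
    }

    public static void main(String[] args) {
        // Each iteration wraps the prior SELECT in a new one and reports its size.
        String query = baseQuery("myTable", 200, 500);
        for (int level = 2; level <= 5; level++) {
            query = wrap(query, level, 200, 500);
            System.out.println("depth " + level + ": " + query.length() + " chars");
        }
    }
}
```

Each generated statement could then be submitted to the Drillbit in turn, 
with the loop bound raised until the query text reaches the size of interest.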

Drill handled SELECT statements up to 5 MB in size, after which the Drillbit 
ran out of heap memory, suffered a fatal exception and exited.

One question is why a 5 MB query exhausted multiple GB of heap during query 
parsing and planning.

But, more importantly, Drill should have some way to protect itself from such 
failures. In a production cluster, heap exhaustion will bring down all 
in-flight queries and require a manual restart of the Drillbit.

So, Drill should enforce some limit on the amount of heap memory used by a 
query during the parsing and planning process.

The community user hit a failure at around 1 MB, but their query very likely 
had a much more complex structure than the simple nested SELECT used in my 
test.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
