Hi Boaz,

It looks like I already had those set to "false", so it didn't change much.
Thanks,
Yun

-----Original Message-----
From: Boaz Ben-Zvi [mailto:bben-...@mapr.com]
Sent: Thursday, November 2, 2017 6:14 PM
To: user@drill.apache.org
Subject: Re: Drill Capacity

Hi Yun,

Can you try using the "managed" version of the external sort? Either change this option to false:

0: jdbc:drill:zk=local> select * from sys.options where name like '%man%';
+----------------------------+----------+-------------------+--------------+----------+----------+-------------+-----------+------------+
|            name            |   kind   |  accessibleScopes |  optionScope |  status  | num_val  | string_val  | bool_val  | float_val  |
+----------------------------+----------+-------------------+--------------+----------+----------+-------------+-----------+------------+
| exec.sort.disable_managed  | BOOLEAN  | ALL               | BOOT         | DEFAULT  | null     | null        | false     | null       |
+----------------------------+----------+-------------------+--------------+----------+----------+-------------+-----------+------------+

Or override it to 'false' in the configuration:

0: jdbc:drill:zk=local> select * from sys.boot where name like '%managed%';
+-----------------------------------------------+----------+-------------------+--------------+---------+----------+-------------+-----------+------------+
|                     name                      |   kind   |  accessibleScopes |  optionScope |  status | num_val  | string_val  | bool_val  | float_val  |
+-----------------------------------------------+----------+-------------------+--------------+---------+----------+-------------+-----------+------------+
| drill.exec.options.exec.sort.disable_managed  | BOOLEAN  | BOOT              | BOOT         | BOOT    | null     | null        | false     | null       |
+-----------------------------------------------+----------+-------------------+--------------+---------+----------+-------------+-----------+------------+

i.e., in the drill-override.conf file:

sort: {
  external: {
    disable_managed: false
  }
}

Please let us know if this change helped,

-- Boaz

On 11/2/17, 1:12 PM, "Yun Liu" <y....@castsoftware.com> wrote:

Please help me as to
what further information I could provide to get this going.

I am also experiencing a separate issue:

RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
Unable to allocate sv2 for 8501 records, and not enough batchGroups to spill.
batchGroups.size 1
spilledBatchGroups.size 0
allocated memory 42768000
allocator limit 41943040

Current settings are:
planner.memory.max_query_memory_per_node = 10GB
Heap: 12G
Direct memory: 32G
Perm: 1024M

What is the issue here?

Thanks,
Yun

-----Original Message-----
From: Yun Liu [mailto:y....@castsoftware.com]
Sent: Thursday, November 2, 2017 3:52 PM
To: user@drill.apache.org
Subject: RE: Drill Capacity

Yes, I increased planner.memory.max_query_memory_per_node to 10GB, heap to 12G, direct memory to 16G, and perm to 1024M.

The file didn't have any schema changes; a file with the same format but less data works perfectly. I am unable to tell if there is corruption.

Yun

-----Original Message-----
From: Andries Engelbrecht [mailto:aengelbre...@mapr.com]
Sent: Thursday, November 2, 2017 3:35 PM
To: user@drill.apache.org
Subject: Re: Drill Capacity

What memory setting did you increase? Have you tried 6 or 8GB?

How much memory is allocated to Drill heap and direct memory for the embedded Drillbit?

Also, did you check that the larger document doesn't have any schema changes or corruption?

--Andries

On 11/2/17, 12:31 PM, "Yun Liu" <y....@castsoftware.com> wrote:

Hi Kunal and Andries,

Thanks for your reply. We need JSON in this case because Drill only supports up to 65,536 columns in a CSV file. I also tried increasing the memory size to 4GB, but I am still experiencing the same issues. Drill is installed in embedded mode.

Thanks,
Yun

-----Original Message-----
From: Kunal Khatua [mailto:kkha...@mapr.com]
Sent: Thursday, November 02, 2017 2:01 PM
To: user@drill.apache.org
Subject: RE: Drill Capacity

Hi Yun,

Andries' solution should address your problem.
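As a side note on the numbers in the RESOURCE ERROR above: the failure is simply that the sort tried to hold more memory than its allocator limit. A quick sanity check (the interpretation that the 10GB query budget gets divided into much smaller per-operator shares is an assumption here, not something stated in the thread):

```python
# Figures copied from the RESOURCE ERROR message above.
allocated = 42_768_000   # bytes the sort attempted to hold
limit     = 41_943_040   # allocator limit for this sort operator

# The limit is exactly 40 MiB, far below the 10 GB
# planner.memory.max_query_memory_per_node setting, which suggests
# the query-level budget was split into much smaller per-operator
# shares before reaching this sort (an assumption, not confirmed
# in the thread).
print(limit / (1024 * 1024))   # MiB available to the sort
print(allocated - limit)       # bytes over the limit
```

So the sort was roughly 825 KB over a 40 MiB ceiling, regardless of how generous the query-level setting was.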
However, do understand that, unlike CSV files, a JSON file cannot be processed in parallel, because there is no clear record delimiter (CSV data usually has a new-line character to indicate the end of a record). So the larger a file gets, the more work a single minor fragment has to do in processing it, including maintaining the internal data structures that represent the complex JSON document.

The preferable way would be to create more JSON files, so that the files can be processed in parallel.

Hope that helps.

~ Kunal

-----Original Message-----
From: Andries Engelbrecht [mailto:aengelbre...@mapr.com]
Sent: Thursday, November 02, 2017 10:26 AM
To: user@drill.apache.org
Subject: Re: Drill Capacity

How much memory is allocated to the Drill environment? Embedded or in a cluster?

I don't think there is a particular limit, but a single JSON file will be read by a single minor fragment; in general it is better to match the number/size of files to the Drill environment.

In the short term, try bumping up planner.memory.max_query_memory_per_node in the options and see if that works for you.

--Andries

On 11/2/17, 7:46 AM, "Yun Liu" <y....@castsoftware.com> wrote:

Hi,

I've been using Apache Drill actively and am wondering what the capacity of Drill is. I have a JSON file which is 390MB, and it keeps throwing a DATA_READ ERROR. I have another JSON file with the exact same format but only 150MB, and it processes fine. When I did a select on the large JSON file, it returned successfully for some of the fields. None of the documented errors really apply to me, so I am trying to understand what size of JSON file Drill supports, or whether there is something else I missed.

Thanks,

Yun Liu
Solutions Delivery Consultant
321 West 44th St | Suite 501 | New York, NY 10036
+1 212.871.8355 office | +1 646.752.4933 mobile

CAST, Leader in Software Analysis and Measurement
Achieve Insight. Deliver Excellence.
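Kunal's suggestion above (split one big JSON file into several smaller ones so Drill can scan them in parallel) can be sketched as follows. This assumes the input is newline-delimited JSON (one record per line); the function name, output layout, and chunk size are illustrative, not part of any Drill tooling:

```python
import os

def split_ndjson(src_path, out_dir, records_per_file=100_000):
    """Split a newline-delimited JSON file into smaller chunk files.

    Assumes one JSON record per line (NDJSON). A file containing a
    single top-level JSON array would need real parsing instead of
    line splitting.
    """
    os.makedirs(out_dir, exist_ok=True)
    out = None
    count = 0
    part = 0
    with open(src_path, "r", encoding="utf-8") as src:
        for line in src:
            # Start a new chunk file every records_per_file records.
            if count % records_per_file == 0:
                if out:
                    out.close()
                part += 1
                out = open(
                    os.path.join(out_dir, f"part_{part:04d}.json"),
                    "w", encoding="utf-8",
                )
            out.write(line)
            count += 1
    if out:
        out.close()
    return part  # number of chunk files written
```

Pointing Drill at the resulting directory instead of the single large file lets each chunk be read by its own minor fragment, which is the parallelism Kunal describes.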
Join the discussion: http://blog.castsoftware.com/
LinkedIn<http://www.linkedin.com/companies/162909> | Twitter<http://twitter.com/onquality> | Facebook<http://www.facebook.com/pages/CAST/105668942817177>