Hi Kunal and Andries,

Thanks for your reply. We need JSON in this case because Drill only supports up
to 65536 columns in a CSV file. I also tried increasing the memory size to 4GB
but I am still experiencing the same issue. Drill is installed in Embedded Mode.

Thanks,
Yun

-----Original Message-----
From: Kunal Khatua [mailto:kkha...@mapr.com] 
Sent: Thursday, November 2, 2017 2:01 PM
To: user@drill.apache.org
Subject: RE: Drill Capacity

Hi Yun

Andries' solution should address your problem. However, do understand that, 
unlike CSV files, a JSON file cannot be processed in parallel, because there is 
no clear record delimiter (CSV data usually has a new-line character to 
indicate the end of a record). So, the larger a file gets, the more work a 
single minor fragment has to do in processing it, including maintaining 
internal data-structures to represent the complex JSON document. 

The preferable way would be to create more JSON files so that the files can be 
processed in parallel. 
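
As a rough illustration of splitting one large file into several smaller ones, here is a minimal Python sketch. It assumes the source file is a single top-level JSON array of records that fits in memory, which may not match your data; the function names, the output file naming, and the chunk size are all hypothetical. Each output file holds one JSON object per line.

```python
import json
from pathlib import Path


def split_records(records, out_dir, records_per_file=10000, prefix="part"):
    """Write a list of records into several files, one JSON object per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(records), records_per_file):
        chunk = records[start:start + records_per_file]
        # Hypothetical naming scheme: part-00000.json, part-00001.json, ...
        path = out_dir / f"{prefix}-{len(paths):05d}.json"
        with path.open("w", encoding="utf-8") as fh:
            for record in chunk:
                fh.write(json.dumps(record))
                fh.write("\n")
        paths.append(path)
    return paths


def split_json_array_file(src, out_dir, records_per_file=10000):
    """Load a file holding one top-level JSON array and split it."""
    # Assumption: the whole array fits in memory; a streaming parser
    # would be needed for files that do not.
    with open(src, encoding="utf-8") as fh:
        records = json.load(fh)
    return split_records(records, out_dir, records_per_file)
```

Pointing the query at the output directory instead of the single file should then give Drill several files to read; how many fragments actually run in parallel still depends on the environment.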

Hope that helps.

~ Kunal

-----Original Message-----
From: Andries Engelbrecht [mailto:aengelbre...@mapr.com] 
Sent: Thursday, November 02, 2017 10:26 AM
To: user@drill.apache.org
Subject: Re: Drill Capacity

How much memory is allocated to the Drill environment?
Embedded or in a cluster?

I don’t think there is a particular limit, but a single JSON file will be read 
by a single minor fragment. In general it is better to match the number/size of 
files to the Drill environment.

In the short term try to bump up planner.memory.max_query_memory_per_node in 
the options and see if that works for you.
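
One way to change the option is an ALTER SYSTEM statement, which can be run from the shell or submitted through Drill's REST endpoint. The Python sketch below only builds the HTTP request; the helper name, the localhost:8047 address, and the 4GB value (expressed in bytes) are assumptions for illustration, not recommended settings.

```python
import json
import urllib.request


def build_set_option_request(option, value, base_url="http://localhost:8047"):
    """Build a POST request that submits an ALTER SYSTEM statement to Drill."""
    sql = f"ALTER SYSTEM SET `{option}` = {value}"
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed value: 4GB in bytes.
req = build_set_option_request(
    "planner.memory.max_query_memory_per_node", 4 * 1024**3
)
# To send it against a running Drill instance:
# urllib.request.urlopen(req)
```

The same statement can be typed directly at the Drill prompt if the REST interface is not convenient.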

--Andries



On 11/2/17, 7:46 AM, "Yun Liu" <y....@castsoftware.com> wrote:

    Hi,
    
    I've been using Apache Drill actively and am just wondering what the
    capacity of Drill is. I have a JSON file which is 390MB and it keeps
    throwing a DATA_READ ERROR. I have another JSON file with exactly the same
    format but only 150MB and it's processing fine. When I did a *select* on
    the large JSON file, it returned successfully for some of the fields. None
    of these errors really apply to me. So I am trying to understand what size
    of JSON file Drill supports, or whether there's something else I missed.
    
    Thanks,
    
    Yun Liu
    Solutions Delivery Consultant
    321 West 44th St | Suite 501 | New York, NY 10036
    +1 212.871.8355 office | +1 646.752.4933 mobile
    
    CAST, Leader in Software Analysis and Measurement
    Achieve Insight. Deliver Excellence.
    Join the discussion http://blog.castsoftware.com/
    LinkedIn<http://www.linkedin.com/companies/162909> | 
Twitter<http://twitter.com/onquality> | 
Facebook<http://www.facebook.com/pages/CAST/105668942817177>
    
    
