Hi Boaz,

It looks like I already had those set to "false", so it didn't change much.
Thanks,
Yun

-----Original Message-----
From: Boaz Ben-Zvi [mailto:bben-...@mapr.com]
Sent: Thursday, November 2, 2017 6:14 PM
To: user@drill.apache.org
Subject: Re: Drill Capacity

Hi Yun,

Can you try using the "managed" version of the external sort? Either change this option to false:

0: jdbc:drill:zk=local> select * from sys.options where name like '%man%';
+----------------------------+----------+-------------------+--------------+----------+----------+-------------+-----------+------------+
|            name            |   kind   |  accessibleScopes |  optionScope |  status  | num_val  | string_val  | bool_val  | float_val  |
+----------------------------+----------+-------------------+--------------+----------+----------+-------------+-----------+------------+
| exec.sort.disable_managed  | BOOLEAN  | ALL               | BOOT         | DEFAULT  | null     | null        | false     | null       |
+----------------------------+----------+-------------------+--------------+----------+----------+-------------+-----------+------------+

Or override it to 'false' in the configuration:

0: jdbc:drill:zk=local> select * from sys.boot where name like '%managed%';
+-----------------------------------------------+----------+-------------------+--------------+---------+----------+-------------+-----------+------------+
|                     name                      |   kind   |  accessibleScopes |  optionScope |  status | num_val  | string_val  | bool_val  | float_val  |
+-----------------------------------------------+----------+-------------------+--------------+---------+----------+-------------+-----------+------------+
| drill.exec.options.exec.sort.disable_managed  | BOOLEAN  | BOOT              | BOOT         | BOOT    | null     | null        | false     | null       |
+-----------------------------------------------+----------+-------------------+--------------+---------+----------+-------------+-----------+------------+

i.e., in the drill-override.conf file:

sort: {
  external: {
    disable_managed: false
  }
}

Please let us know if this change helped,

-- Boaz

On 11/2/17, 1:12 PM, "Yun Liu" <y....@castsoftware.com> wrote:

Please help me as to
what further information I could provide to get this going.

I am also experiencing a separate issue:

RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
Unable to allocate sv2 for 8501 records, and not enough batchGroups to spill.
batchGroups.size 1
spilledBatchGroups.size 0
allocated memory 42768000
allocator limit 41943040

Current settings are:
planner.memory.max_query_memory_per_node = 10GB
Heap: 12G
Direct memory: 32G
Perm: 1024M

What is the issue here?

Thanks,
Yun

-----Original Message-----
From: Yun Liu [mailto:y....@castsoftware.com]
Sent: Thursday, November 2, 2017 3:52 PM
To: user@drill.apache.org
Subject: RE: Drill Capacity

Yes, I increased planner.memory.max_query_memory_per_node to 10GB, heap to 12G, direct memory to 16G, and perm to 1024M.

The file didn't have any schema changes; a file with the same format but less data works perfectly. I am unable to tell if there is corruption.

Yun

-----Original Message-----
From: Andries Engelbrecht [mailto:aengelbre...@mapr.com]
Sent: Thursday, November 2, 2017 3:35 PM
To: user@drill.apache.org
Subject: Re: Drill Capacity

What memory setting did you increase? Have you tried 6 or 8GB?

How much memory is allocated to Drill heap and direct memory for the embedded Drillbit?

Also, did you check that the larger document doesn't have any schema changes or corruption?

--Andries

On 11/2/17, 12:31 PM, "Yun Liu" <y....@castsoftware.com> wrote:

Hi Kunal and Andries,

Thanks for your reply. We need JSON in this case because Drill only supports up to 65,536 columns in a CSV file. I also tried increasing the memory size to 4GB, but I am still experiencing the same issues. Drill is installed in embedded mode.

Thanks,
Yun

-----Original Message-----
From: Kunal Khatua [mailto:kkha...@mapr.com]
Sent: Thursday, November 02, 2017 2:01 PM
To: user@drill.apache.org
Subject: RE: Drill Capacity

Hi Yun,

Andries' solution should address your problem.
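As a side note on the numbers in the RESOURCE ERROR above: the failure is simply that the sort tried to hold more memory than its allocator limit. A quick sanity check (the interpretation that the 10GB query budget gets divided into much smaller per-operator shares is an assumption here, not something stated in the thread):

```python
# Figures copied from the RESOURCE ERROR message above.
allocated = 42_768_000   # bytes the sort attempted to hold
limit     = 41_943_040   # allocator limit for this sort operator

# The limit is exactly 40 MiB, far below the 10 GB
# planner.memory.max_query_memory_per_node setting, which suggests
# the query-level budget was split into much smaller per-operator
# shares before reaching this sort (an assumption, not confirmed
# in the thread).
print(limit / (1024 * 1024))   # MiB available to the sort
print(allocated - limit)       # bytes over the limit
```

So the sort was roughly 825 KB over a 40 MiB ceiling, regardless of how generous the query-level setting was.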
However, do understand that, unlike CSV files, a JSON file cannot be processed in parallel, because there is no clear record delimiter (CSV data usually has a new-line character to indicate the end of a record). So the larger a file gets, the more work a single minor fragment has to do in processing it, including maintaining the internal data structures that represent the complex JSON document.

The preferable way would be to create more JSON files, so that the files can be processed in parallel.

Hope that helps.

~ Kunal

-----Original Message-----
From: Andries Engelbrecht [mailto:aengelbre...@mapr.com]
Sent: Thursday, November 02, 2017 10:26 AM
To: user@drill.apache.org
Subject: Re: Drill Capacity

How much memory is allocated to the Drill environment? Embedded or in a cluster?

I don't think there is a particular limit, but a single JSON file will be read by a single minor fragment; in general it is better to match the number/size of files to the Drill environment.

In the short term, try bumping up planner.memory.max_query_memory_per_node in the options and see if that works for you.

--Andries

On 11/2/17, 7:46 AM, "Yun Liu" <y....@castsoftware.com> wrote:

Hi,

I've been using Apache Drill actively and am wondering what the capacity of Drill is. I have a JSON file which is 390MB, and it keeps throwing a DATA_READ ERROR. I have another JSON file with the exact same format but only 150MB, and it processes fine. When I did a select on the large JSON file, it returned successfully for some of the fields. None of the documented errors really apply to me, so I am trying to understand what size of JSON file Drill supports, or whether there is something else I missed.

Thanks,

Yun Liu
Solutions Delivery Consultant
321 West 44th St | Suite 501 | New York, NY 10036
+1 212.871.8355 office | +1 646.752.4933 mobile

CAST, Leader in Software Analysis and Measurement
Achieve Insight. Deliver Excellence.
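Kunal's suggestion above (split one big JSON file into several smaller ones so Drill can scan them in parallel) can be sketched as follows. This assumes the input is newline-delimited JSON (one record per line); the function name, output layout, and chunk size are illustrative, not part of any Drill tooling:

```python
import os

def split_ndjson(src_path, out_dir, records_per_file=100_000):
    """Split a newline-delimited JSON file into smaller chunk files.

    Assumes one JSON record per line (NDJSON). A file containing a
    single top-level JSON array would need real parsing instead of
    line splitting.
    """
    os.makedirs(out_dir, exist_ok=True)
    out = None
    count = 0
    part = 0
    with open(src_path, "r", encoding="utf-8") as src:
        for line in src:
            # Start a new chunk file every records_per_file records.
            if count % records_per_file == 0:
                if out:
                    out.close()
                part += 1
                out = open(
                    os.path.join(out_dir, f"part_{part:04d}.json"),
                    "w", encoding="utf-8",
                )
            out.write(line)
            count += 1
    if out:
        out.close()
    return part  # number of chunk files written
```

Pointing Drill at the resulting directory instead of the single large file lets each chunk be read by its own minor fragment, which is the parallelism Kunal describes.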
Join the discussion: http://blog.castsoftware.com/
LinkedIn<http://www.linkedin.com/companies/162909> | Twitter<http://twitter.com/onquality> | Facebook<http://www.facebook.com/pages/CAST/105668942817177>