Re: what's the difference between drill and optiq

2015-05-28 Thread Ted Dunning
Answers in-line. On Thu, May 28, 2015 at 8:08 AM, Andrew Brust andrew.br...@bluebadgeinsights.com wrote: Absolutely nothing to apologize for, and the below explanation is very helpful. You are too kind. FWIW, I certainly understood that Hive's use of Calcite offered relatively little

Re: what's the difference between drill and optiq

2015-05-28 Thread Ted Dunning
Flink is very impressive (I helped bring them to Apache and maintain close contacts with the project founders). Flink is also very nicely complementary to Drill in that it brings a new kind of execution environment. This environment has some very cool capabilities that might work well in Drill.

Re: Drill logical plan optimization

2015-05-28 Thread Piotr Sokólski
What about the QueryType.LOGICAL parameter of the submit_plan script? Is this broken as well? -- Piotr Sokólski On Friday 29 May 2015 at 00:29, Jason Altekruse wrote: Currently Drill does not allow submission of logical plans. I think that the web interface is out of date and claims you

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Mehant Baid
I think the problem might be related to a single laggard: it looks like we are waiting for one minor fragment to complete. Based on the output you provided, it looks like fragment 1_1 hasn't completed. You might want to find out where the fragment was scheduled and what is going on on that node.

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Matt
That is a good point. The difference between the number of source rows and those that made it into the parquet files is about the same as the row count handled by each of the other fragments. Indeed, the query profile does show fragment 1_1 as CANCELED while the others all have State FINISHED. Additionally, the other
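To chase a single laggard like fragment 1_1, one approach is to list every minor fragment that did not reach FINISHED together with the node it ran on. The sketch below assumes a simplified, hypothetical profile record (`MinorFragment` with a `host` field); it is not Drill's actual profile JSON schema, just an illustration of the filtering step.

```python
# Find minor fragments that did not reach FINISHED in a query profile.
# Assumption: MinorFragment is a hypothetical, simplified stand-in for the
# data shown in the query profile, not Drill's real profile format.
from dataclasses import dataclass
from typing import List


@dataclass
class MinorFragment:
    major_id: int
    minor_id: int
    state: str  # e.g. "FINISHED", "CANCELED", "RUNNING"
    host: str   # node the fragment was scheduled on (hypothetical field)

    @property
    def label(self) -> str:
        # "<major>_<minor>", matching how fragments are named in this thread
        return f"{self.major_id}_{self.minor_id}"


def find_laggards(fragments: List[MinorFragment]) -> List[MinorFragment]:
    """Return every fragment whose state is anything other than FINISHED."""
    return [f for f in fragments if f.state != "FINISHED"]


if __name__ == "__main__":
    profile = [
        MinorFragment(1, 0, "FINISHED", "node-a"),
        MinorFragment(1, 1, "CANCELED", "node-b"),
        MinorFragment(1, 2, "FINISHED", "node-c"),
    ]
    for frag in find_laggards(profile):
        print(f"fragment {frag.label} is {frag.state} on {frag.host}")
```

The host of each laggard is then the node whose drillbit logs are worth reading first.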

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Matt
After bumping memory to DRILL_MAX_DIRECT_MEMORY=16G and DRILL_HEAP=8G, the 44GB file imported successfully in 25 minutes, which is acceptable on this hardware. I don't know whether the default memory setting was to blame. On 28 May 2015, at 14:22, Andries Engelbrecht wrote: That is the Drill direct

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Jason Altekruse
He mentioned in his original post that he saw CPU and IO on all of the nodes for a while when the query was active, but it suddenly dropped down to low CPU usage and stopped producing files. It seems like we are failing to detect an error and cancel the query. It is possible that the failure
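Since Drill may not detect the failure and cancel the query on its own, an external watchdog that notices when output stops growing is one way to catch a stuck CTAS. This is a minimal sketch under the assumption that something periodically samples a progress counter (for example, Parquet files or bytes written so far); how that counter is collected is left out and is not a Drill feature.

```python
# Stall watchdog sketch for a long-running CTAS.
# Assumption: a progress counter (files or bytes written) is sampled at
# intervals by some external process; collecting it is out of scope here.
from typing import List, Tuple


def is_stalled(samples: List[Tuple[float, int]], timeout_s: float) -> bool:
    """Return True if the counter has not increased for at least timeout_s.

    samples: (timestamp_seconds, counter_value) pairs, oldest first.
    """
    if not samples:
        return False
    # Find the most recent sample at which the counter went up.
    last_change = samples[0][0]
    for (_, prev_value), (t, value) in zip(samples, samples[1:]):
        if value > prev_value:
            last_change = t
    return samples[-1][0] - last_change >= timeout_s


if __name__ == "__main__":
    # Progress climbs, then flat-lines after t=120s (made-up numbers).
    observed = [(0, 0), (60, 5), (120, 9), (180, 9), (240, 9), (300, 9)]
    print(is_stalled(observed, timeout_s=120))  # True: no growth for 180 s
    print(is_stalled(observed, timeout_s=300))  # False
```

The timeout should be longer than any legitimate quiet period in the query (for example, a large sort that finishes before any files are written), otherwise healthy queries will be flagged.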

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Matt
Did you check the log files for any errors? There are no messages related to this query containing errors or warnings, and nothing mentioning memory or heap. I am querying now to determine what is missing in the parquet destination. drillbit.out on the master shows no error messages, and what looks like

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Andries Engelbrecht
That is the Drill direct memory per node. DRILL_HEAP is for the heap size per node. More info here http://drill.apache.org/docs/configuring-drill-memory/ —Andries On May 28, 2015, at 11:09 AM, Matt bsg...@gmail.com wrote: Referencing http://drill.apache.org/docs/configuring-drill-memory/
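Because direct memory and heap are separate per-node pools, the two settings add up on each node. The sketch below works through that arithmetic for the test cluster described in this thread (4 nodes, 32 GB RAM each). The Drill 1.0 defaults used here (8G direct, 4G heap) are an assumption; check the values in your own drill-env.sh.

```python
# Per-node memory arithmetic for the settings discussed in this thread.
# Assumptions: 4 nodes with 32 GB RAM each (Matt's test cluster) and Drill 1.0
# shipping with DRILL_MAX_DIRECT_MEMORY="8G" and DRILL_HEAP="4G".
NODE_RAM_GB = 32
NODES = 4


def drill_footprint_gb(direct_gb: int, heap_gb: int) -> int:
    """Direct memory and heap are separate pools, so a Drillbit may use both."""
    return direct_gb + heap_gb


def headroom_gb(direct_gb: int, heap_gb: int, node_ram_gb: int = NODE_RAM_GB) -> int:
    """RAM left on a node for the OS and co-located services (HDFS, ZooKeeper)."""
    return node_ram_gb - drill_footprint_gb(direct_gb, heap_gb)


if __name__ == "__main__":
    for label, direct, heap in [("assumed 1.0 defaults", 8, 4), ("16G/8G from the thread", 16, 8)]:
        print(
            f"{label}: {drill_footprint_gb(direct, heap)} GB per node, "
            f"{headroom_gb(direct, heap)} GB headroom, "
            f"{direct * NODES} GB direct memory across {NODES} nodes"
        )
```

With 16G/8G, each node commits 24 GB to Drill and leaves 8 GB for everything else. The JVM also uses some memory outside these two pools, so the real headroom is somewhat smaller.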

Re: Drill logical plan optimization

2015-05-28 Thread Jason Altekruse
I would assume so. One of the developers more familiar with the planning process might be able to give better insight into the implications of not sending these logical plans through the final physical rewrite rules. That being said, the code path that it is taking is not doing anything fancy to

Re: Custom UDFS slow

2015-05-28 Thread Steven Phillips
Could you include the physical plan generated for each query? Since you say you tried copying the exact code from Drill's EXTRACT function, you should see the same performance unless, for some reason, the plan is different. There is no difference whatsoever between UDFs and built-in functions.
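Once both physical plans are captured as text (for example, by running `EXPLAIN PLAN FOR <query>` for each query and saving the output), a line diff quickly shows where they diverge. This sketch uses Python's standard `difflib`; the plan strings and the function names `EXTRACT_BUILTIN` and `MY_EXTRACT` are made-up placeholders, not real Drill output.

```python
# Diff two captured query plans to spot where the planner treated them differently.
# Assumption: plans were saved as text; the sample plans below are hypothetical.
import difflib
from typing import List


def plan_diff(plan_a: str, plan_b: str) -> List[str]:
    """Unified diff of two plan texts; an empty list means identical plans."""
    return list(
        difflib.unified_diff(
            plan_a.splitlines(),
            plan_b.splitlines(),
            fromfile="builtin",
            tofile="udf",
            lineterm="",
        )
    )


if __name__ == "__main__":
    builtin_plan = "Screen\n  Project(y=[EXTRACT_BUILTIN($0)])\n    Scan(table=[t])"
    udf_plan = "Screen\n  Project(y=[MY_EXTRACT($0)])\n    Scan(table=[t])"
    for line in plan_diff(builtin_plan, udf_plan):
        print(line)
```

If the only differing lines are the function names, the plans have the same shape and the performance gap probably lies elsewhere; extra or reordered operators would point to a planning difference.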

Drill logical plan optimization

2015-05-28 Thread Rajkumar Singh
Hi, I am looking for some measures/parameters to look at when optimizing the Drill logical query plan, so that I can resubmit it through the Drill UI. Could you please point me to some docs I can go through? Rajkumar Singh MapR Technologies

RE: what's the difference between drill and optiq

2015-05-28 Thread Andrew Brust
Agreed, and very interesting. Lots of people at Datameer seem impressed by Flink. I have to look up Kylin... -----Original Message----- From: Jacques Nadeau [mailto:jacq...@apache.org] Sent: Thursday, May 28, 2015 1:20 AM To: user@drill.apache.org Subject: Re: what's the differenct between

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Matt
To make sure I am adjusting the correct config, these are heap parameters within the Drill configure path, not for Hadoop or Zookeeper? On May 28, 2015, at 12:08 PM, Jason Altekruse altekruseja...@gmail.com wrote: There should be no upper limit on the size of the tables you can create

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Matt
I did not note any memory errors or warnings in a quick scan of the logs, but to double check, is there a specific log I would find such warnings in? On May 28, 2015, at 12:01 PM, Andries Engelbrecht aengelbre...@maprtech.com wrote: I have used a single CTAS to create tables using parquet

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Jason Altekruse
That is correct. I guess it is possible that HDFS could run out of heap, but I'm guessing that is unlikely to be the cause of the failure you are seeing. We should not be taxing ZooKeeper enough to cause any issues there. On Thu, May 28, 2015 at 9:17 AM, Matt bsg...@gmail.com wrote: To

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Matt
How large is the data set you are working with, and your cluster/nodes? I'm currently just testing with that single 44GB source file, and my test cluster is made up of 4 nodes, each with 8 CPU cores, 32GB RAM and a 6TB Ext4 volume (RAID-10). Drill defaults are left as they come in v1.0. I will be adjusting

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Jason Altekruse
There should be no upper limit on the size of the tables you can create with Drill. Be advised that Drill currently operates entirely optimistically with regard to available resources. If a network connection between two drillbits fails during a query, we will not currently re-schedule the work

Re: what's the differenct between drill and optiq

2015-05-28 Thread Jinfeng Ni
AFAIK, traditional RDBMSes do not provide late-binding functionality for SQL queries, but their XQuery/XPath extensions may do late binding. This is because an XML document is simply a semi-structured doc, much like JSON. The query planner normally does not have knowledge of the schema of XML
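To make "late binding" concrete: when the planner has no schema, the type of a field can only be resolved while records are read. The toy model below is not how Drill is implemented; it simply shows an executor that knows only a field name (`price`, a hypothetical field) and decides per record how to interpret the value it finds, including records where the field is missing.

```python
# Toy model of late binding (schema-on-read); not Drill's implementation.
# The "plan" knows only the field name `price`; its type is discovered
# record by record during execution, as with semi-structured JSON or XML.
import json
from typing import Iterable, Optional

RECORDS = [
    '{"item": "a", "price": 3}',
    '{"item": "b", "price": "4.5"}',
    '{"item": "c"}',
    '{"item": "d", "price": 2.5}',
]


def price_as_float(record: dict) -> Optional[float]:
    """Bind the type of `price` at run time for this one record."""
    value = record.get("price")
    if value is None:
        return None  # field absent in this record
    if isinstance(value, (int, float, str)):
        return float(value)  # ints, floats and numeric strings all coerce
    raise TypeError(f"unsupported price type: {type(value).__name__}")


def total_price(lines: Iterable[str]) -> float:
    total = 0.0
    for line in lines:
        price = price_as_float(json.loads(line))
        if price is not None:
            total += price
    return total


if __name__ == "__main__":
    print(total_price(RECORDS))  # 10.0
```

A traditional RDBMS would instead fix the column type when the table is defined and reject or cast values up front, so the planner can choose operators before any data is read.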

Re: Drill logical plan optimization

2015-05-28 Thread Sudheesh Katkam
Hi Rajkumar, Here are some links: http://drill.apache.org/docs/performance-tuning-introduction/ (Performance Tuning Guide) http://drill.apache.org/docs/query-plans/ Did you mean optimize

Re: Drill logical plan optimization

2015-05-28 Thread Rajkumar Singh
I am looking to optimize the logical query plan so that I can resubmit it through the DRILL UI and see the results. Thanks On May 28, 2015, at 10:23 PM, Sudheesh Katkam skat...@maprtech.com wrote: Hi Rajkumar, Here are some links:
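Given the earlier note in this thread that Drill does not currently accept submitted logical plans, inspecting the logical plan Drill produces is probably a more practical first step. The sketch below builds, but does not send, a request that runs `EXPLAIN PLAN WITHOUT IMPLEMENTATION FOR <query>` (which returns the logical plan) or `EXPLAIN PLAN FOR <query>` (physical plan). It assumes the Drill REST endpoint `POST /query.json` on the web UI port 8047; whether that endpoint is available in your Drill version is an assumption to verify.

```python
# Build a REST request that asks Drill to EXPLAIN a query.
# Assumptions: Drill's REST API accepts POST /query.json with a JSON body
# {"queryType": "SQL", "query": ...} on the web UI port (8047 by default).
import json
import urllib.request


def explain_request(
    sql: str, host: str = "localhost", port: int = 8047, logical: bool = True
) -> urllib.request.Request:
    """Wrap `sql` in EXPLAIN and return an unsent POST request."""
    prefix = "EXPLAIN PLAN WITHOUT IMPLEMENTATION FOR " if logical else "EXPLAIN PLAN FOR "
    body = json.dumps({"queryType": "SQL", "query": prefix + sql}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = explain_request("SELECT * FROM cp.`employee.json` LIMIT 5")
    print(req.get_method(), req.full_url)
    # With a running Drillbit, send it via urllib.request.urlopen(req).
```

Running the same query with `logical=False` gives the physical plan, which is where most of the tuning advice in the Performance Tuning Guide applies.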