Answers in-line.
On Thu, May 28, 2015 at 8:08 AM, Andrew Brust
andrew.br...@bluebadgeinsights.com wrote:
Absolutely nothing to apologize for, and the below explanation is very
helpful.
You are too kind.
FWIW, I certainly understood that Hive's use of Calcite offered relatively
little
Flink is very impressive (I helped bring them to Apache and maintain close
contacts with the project founders).
Flink is also very nicely complementary to Drill in that it brings a new
kind of execution environment. This environment has some very cool
capabilities that might work well in Drill.
What about the QueryType.LOGICAL parameter of the submit_plan script? Is this
broken as well?
--
Piotr Sokólski
On Friday 29 May 2015 at 00:29, Jason Altekruse wrote:
Currently Drill does not allow submission of logical plans. I think that
the web interface is out of date and claims you
I think the problem might be related to a single laggard; it looks like we
are waiting for one minor fragment to complete. Based on the output you
provided, it looks like fragment 1_1 hasn't completed. You might want to
find out where the fragment was scheduled and what is going on on that
node.
That is a good point. The difference between the number of source rows
and those that made it into the Parquet files is about the same count as
in the other fragments.
Indeed the query profile does show fragment 1_1 as CANCELED while the
others all have State FINISHED. Additionally the other
Bumping memory to:
DRILL_MAX_DIRECT_MEMORY=16G
DRILL_HEAP=8G
The 44GB file imported successfully in 25 minutes - acceptable on this
hardware.
I don't know if the default memory setting was to blame or not.
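For reference, the settings above go in Drill's environment file, typically conf/drill-env.sh in the Drill install (path assumed here); a minimal sketch:

```shell
# conf/drill-env.sh (path assumed; restart the drillbits after editing)

# Per-node direct memory, used for query execution buffers:
export DRILL_MAX_DIRECT_MEMORY="16G"

# Per-node JVM heap, used for planning and other on-heap work:
export DRILL_HEAP="8G"
```

Both values are per node, per the docs linked below, so a 4-node cluster with these settings can use up to 4 x 16G of direct memory in total.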
On 28 May 2015, at 14:22, Andries Engelbrecht wrote:
That is the Drill direct
He mentioned in his original post that he saw CPU and IO on all of the
nodes for a while when the query was active, but it suddenly dropped down
to low CPU usage and stopped producing files. It seems like we are failing
to detect an error and cancel the query.
It is possible that the failure
Did you check the log files for any errors?
No messages related to this query contain errors or warnings, nor
anything mentioning memory or heap. Querying now to determine what is
missing in the Parquet destination.
drillbit.out on the master shows no error messages, and what looks like
That is the Drill direct memory per node.
DRILL_HEAP is for the heap size per node.
More info here
http://drill.apache.org/docs/configuring-drill-memory/
—Andries
On May 28, 2015, at 11:09 AM, Matt bsg...@gmail.com wrote:
Referencing http://drill.apache.org/docs/configuring-drill-memory/
I would assume so; one of the developers more familiar with the planning
process might be able to give better insight into the implications of not
sending these logical plans through the final physical rewrite rules.
That being said, the code path that it is taking is not doing anything
fancy to
Could you include the physical plan generated for each query?
Since you say you tried copying the exact code from Drill's EXTRACT
function, you should see the same performance, unless for some reason the
plan is different. There is no difference whatsoever between UDFs and
built-in functions.
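One way to check is to compare the two plans directly with EXPLAIN. A sketch, where the table, column, and the UDF name `my_extract_year` are placeholders:

```sql
-- Plan for the built-in function
EXPLAIN PLAN FOR
SELECT EXTRACT(YEAR FROM ts_col) FROM dfs.tmp.`events`;

-- Plan for the UDF copy (my_extract_year is a hypothetical name).
-- If both plans are identical apart from the function name,
-- performance should match as well.
EXPLAIN PLAN FOR
SELECT my_extract_year(ts_col) FROM dfs.tmp.`events`;
```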
Hi
I am looking for some measures/parameters to look at to optimize the Drill
logical query plan if I want to resubmit it through the Drill UI. Could you
please point me to some docs so that I can go through them?
Rajkumar Singh
MapR Technologies
Agreed, and very interesting. Lots of people at Datameer seem impressed by
Flink.
I have to look up Kylin...
-Original Message-
From: Jacques Nadeau [mailto:jacq...@apache.org]
Sent: Thursday, May 28, 2015 1:20 AM
To: user@drill.apache.org
Subject: Re: what's the difference between
To make sure I am adjusting the correct config: these are memory parameters
within the Drill config path, not for Hadoop or ZooKeeper?
On May 28, 2015, at 12:08 PM, Jason Altekruse altekruseja...@gmail.com
wrote:
There should be no upper limit on the size of the tables you can create
I did not note any memory errors or warnings in a quick scan of the logs, but
to double check, is there a specific log I would find such warnings in?
On May 28, 2015, at 12:01 PM, Andries Engelbrecht aengelbre...@maprtech.com
wrote:
I have used a single CTAS to create tables using parquet
That is correct. I guess it could be possible that HDFS might run out of
heap, but I'm guessing that is unlikely to be the cause of the failure you
are seeing. We should not be taxing ZooKeeper enough to cause any issues
there.
On Thu, May 28, 2015 at 9:17 AM, Matt bsg...@gmail.com wrote:
How large is the data set you are working with, and your
cluster/nodes?
Just testing with that single 44GB source file currently, and my test
cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM, a 6TB
Ext4 volume (RAID-10).
Drill defaults were left as they come in v1.0. I will be adjusting
There should be no upper limit on the size of the tables you can create
with Drill. Be advised that Drill currently operates entirely
optimistically with regard to available resources. If a network connection
between two drillbits fails during a query, we will not currently
re-schedule the work
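The kind of CTAS under discussion can be sketched as follows; the workspace and file paths are placeholders:

```sql
-- Write a large source file out as Parquet. Output is split into one
-- file per minor fragment, so a silently canceled fragment (like 1_1
-- above) shows up as missing rows rather than an explicit error.
CREATE TABLE dfs.tmp.`big_table_parquet` AS
SELECT * FROM dfs.`/data/big_table.csv`;
```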
AFAIK, traditional RDBMSes do not do late binding for SQL queries, but
their XQuery/XPath extensions may do late binding. This is because an XML
document is simply a semi-structured doc, much like JSON. The query planner
normally does not have knowledge of the schema of the XML
Hi Rajkumar,
Here are some links:
http://drill.apache.org/docs/performance-tuning-introduction/ (Performance
Tuning Guide)
http://drill.apache.org/docs/query-plans/
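To look at the logical plan specifically (as opposed to the physical plan), Drill's EXPLAIN supports a WITHOUT IMPLEMENTATION form; the query below is a placeholder:

```sql
-- Logical plan only, before physical implementation:
EXPLAIN PLAN WITHOUT IMPLEMENTATION FOR
SELECT * FROM dfs.tmp.`events` LIMIT 10;

-- Full physical plan, with all planner attributes:
EXPLAIN PLAN INCLUDING ALL ATTRIBUTES FOR
SELECT * FROM dfs.tmp.`events` LIMIT 10;
```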
Did you mean optimize
I am looking to optimize the logical query plan so that I can resubmit it
through the DRILL UI and see the results.
Thanks
On May 28, 2015, at 10:23 PM, Sudheesh Katkam skat...@maprtech.com wrote:
Hi Rajkumar,
Here are some links: