Apologies for the plug, but using MapR FS would help you a lot here. The
trick is that you can run an NFS server on every node and mount that server
via localhost.
The benefits are:
1) the entire cluster appears as a conventional POSIX-style file system in
addition to being available via HDFS
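Not from the thread itself, but a minimal sketch of what that localhost
mount can look like (the cluster name, mount point, and mount options are
assumptions; check the MapR NFS docs for your release):

    # Run the MapR NFS gateway on every node, then mount it via localhost.
    # "my.cluster.com" is a hypothetical cluster name.
    sudo mkdir -p /mapr
    sudo mount -o hard,nolock localhost:/mapr /mapr

    # The cluster then shows up as an ordinary POSIX tree:
    ls /mapr/my.cluster.com/user/

The same data stays reachable through the HDFS API, so tools that expect
either interface can share the files.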
What Ted just talked about is also explained in this On Demand Training
(which is free):
https://www.mapr.com/services/mapr-academy/mapr-distribution-essentials-training-course-on-demand
On Fri, May 29, 2015 at 5:29 PM, Ted Dunning ted.dunn...@gmail.com wrote:
There are two methods to …
I think the problem might be related to a single laggard; it looks like we
are waiting for one minor fragment to complete. Based on the output you
provided, it looks like fragment 1_1 hasn't completed. You might want to
find out where the fragment was scheduled and what is going on on that
node.
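One way to chase that down (not from this thread; the port and endpoints
follow my reading of the Drill 1.0 web UI and REST API, so verify against
your version) is to pull the query profile from any drillbit on port 8047:

    # List recent queries and note the query id ("2ad4f481-..." below is
    # a placeholder, and node1 is a hypothetical hostname):
    curl -s http://node1:8047/profiles.json

    # Fetch the full profile for the query; each minor fragment entry
    # reports its state and the endpoint (host) it was assigned to,
    # which tells you where fragment 1_1 ran.
    curl -s http://node1:8047/profiles/2ad4f481-....json

The same information is in the web UI's Profiles tab if you prefer
clicking through.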
That is a good point. The difference between the number of source rows
and those that made it into the parquet files is about the same as the
row counts of the other fragments.
Indeed, the query profile does show fragment 1_1 as CANCELED while the
others all have state FINISHED. Additionally the other …
Bumping memory to:
DRILL_MAX_DIRECT_MEMORY=16G
DRILL_HEAP=8G
The 44GB file imported successfully in 25 minutes - acceptable on this
hardware.
I don't know if the default memory settings were to blame or not.
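For anyone else hitting this, a sketch of where those knobs live in a
stock Drill 1.0 install (path per the default layout; adjust if yours
differs). They go in conf/drill-env.sh on every node, and each drillbit
has to be restarted to pick them up:

    # $DRILL_HOME/conf/drill-env.sh
    export DRILL_MAX_DIRECT_MEMORY="16G"   # direct memory per drillbit
    export DRILL_HEAP="8G"                 # JVM heap per drillbit

    # then, on each node:
    $DRILL_HOME/bin/drillbit.sh restart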
On 28 May 2015, at 14:22, Andries Engelbrecht wrote:
That is the Drill direct …
He mentioned in his original post that he saw CPU and IO on all of the
nodes for a while when the query was active, but it suddenly dropped to
low CPU usage and stopped producing files. It seems like we are failing
to detect an error and cancel the query.
It is possible that the failure …
Did you check the log files for any errors?
No messages related to this query contain errors or warnings, and
nothing mentions memory or heap. Querying now to determine what is
missing in the parquet destination.
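A quick way to sweep for such messages across the cluster (hostnames and
the log path here are assumptions; the default location is
$DRILL_HOME/log unless drill-env.sh overrides it):

    for h in node1 node2 node3 node4; do
      echo "== $h =="
      ssh "$h" 'grep -iE "error|exception|outofmemory" /opt/drill/log/drillbit.log | tail -n 20'
    done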
drillbit.out on the master shows no error messages, and what looks like …
That is the Drill direct memory per node.
DRILL_HEAP is for the heap size per node.
More info here:
http://drill.apache.org/docs/configuring-drill-memory/
—Andries
On May 28, 2015, at 11:09 AM, Matt bsg...@gmail.com wrote:
Referencing http://drill.apache.org/docs/configuring-drill-memory/
To make sure I am adjusting the correct config: these are heap parameters
within the Drill config path, not for Hadoop or ZooKeeper?
On May 28, 2015, at 12:08 PM, Jason Altekruse altekruseja...@gmail.com
wrote:
There should be no upper limit on the size of the tables you can create …
I did not note any memory errors or warnings in a quick scan of the logs, but
to double check, is there a specific log I would find such warnings in?
On May 28, 2015, at 12:01 PM, Andries Engelbrecht aengelbre...@maprtech.com
wrote:
I have used a single CTAS to create tables using parquet …
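For readers who want the shape of such a statement, a minimal sketch
(workspace, paths, and the ZooKeeper connect string are hypothetical, and
hedge on the --run flag matching your sqlline version):

    # ctas.sql (hypothetical script file)
    ALTER SESSION SET `store.format` = 'parquet';
    CREATE TABLE dfs.tmp.`big_table_parquet` AS
      SELECT * FROM dfs.`/staging/big_table.csv`;

    # run it through sqlline against the cluster:
    $DRILL_HOME/bin/sqlline -u "jdbc:drill:zk=node1:2181" --run=ctas.sql

dfs.tmp is writable out of the box, which is why it shows up in most CTAS
examples; point the source path at wherever your 44GB file lives.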
That is correct. I guess it is possible that HDFS might run out of
heap, but I'm guessing that is unlikely to be the cause of the failure you
are seeing. We should not be taxing ZooKeeper enough to cause any issues
there.
On Thu, May 28, 2015 at 9:17 AM, Matt bsg...@gmail.com wrote:
How large is the data set you are working with, and your
cluster/nodes?
Just testing with that single 44GB source file currently, and my test
cluster is made from 4 nodes, each with 8 CPU cores, 32GB RAM, and a 6TB
Ext4 volume (RAID-10).
Drill defaults left as they come in v1.0. I will be adjusting …
There should be no upper limit on the size of the tables you can create
with Drill. Be advised that Drill does currently operate entirely
optimistically with regard to available resources. If a network connection
between two drillbits fails during a query, we will not currently
re-schedule the work …