The time seems pretty long for that file size. What type of file is
it?
Tab-delimited UTF-8 text.
I left the query to run overnight to see if it would complete, but 24
hours for an import like this would indeed be too long.
Is the CTAS running single threaded?
In the first hour, with this being the only client connected to the
cluster, I observed activity on all 4 nodes.
Is multi-threaded query execution the default? I would not have changed
anything deliberately to force single-threaded execution.
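For reference, parallel execution is the default in Drill, and the per-node degree of parallelism is governed by options such as `planner.width.max_per_node`. A minimal sketch for checking (and, if desired, raising) the setting from sqlline; the value 8 is only illustrative:

~~~
-- Inspect the parallelism-related planner options currently in effect.
SELECT * FROM sys.options WHERE name LIKE 'planner.width%';

-- Optionally raise the per-node ceiling for this session
-- (the default is derived from the CPU core count; 8 is illustrative).
ALTER SESSION SET `planner.width.max_per_node` = 8;
~~~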
On 28 May 2015, at 13:06, Andries Engelbrecht wrote:
The time seems pretty long for that file size. What type of file is
it?
Is the CTAS running single threaded?
—Andries
On May 28, 2015, at 9:37 AM, Matt <bsg...@gmail.com> wrote:
How large is the data set you are working with, and your
cluster/nodes?
Just testing with that single 44GB source file currently. My test
cluster is made up of 4 nodes, each with 8 CPU cores, 32GB RAM, and a
6TB Ext4 volume (RAID-10).
Drill defaults left as they come in v1.0. I will be adjusting memory
and retrying the CTAS.
I know I can / should assign individual disks to HDFS, but as this is
a test cluster there are apps that expect full data volumes to work
on. A dedicated Hadoop production cluster would have a disk layout
specific to the task.
On 28 May 2015, at 12:26, Andries Engelbrecht wrote:
Just check the drillbit.log and drillbit.out files in the log
directory.
Before adjusting memory, see if that is an issue first. It was for
me, but as Jason mentioned there can be other causes as well.
You adjust memory allocation in the drill-env.sh files, and have to
restart the drillbits.
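For reference, a minimal drill-env.sh sketch; the variable names below match the Drill 1.0 docs, while the values are only examples to size for your nodes:

~~~
# conf/drill-env.sh -- example values only; restart each drillbit after editing.
DRILL_MAX_DIRECT_MEMORY="8G"   # direct memory for query execution buffers
DRILL_HEAP="4G"                # JVM heap; the parquet writer leans on heap
~~~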
How large is the data set you are working with, and your
cluster/nodes?
—Andries
On May 28, 2015, at 9:17 AM, Matt <bsg...@gmail.com> wrote:
To make sure I am adjusting the correct config: these are heap
parameters within the Drill config path, not for Hadoop or
ZooKeeper?
On May 28, 2015, at 12:08 PM, Jason Altekruse
<altekruseja...@gmail.com> wrote:
There should be no upper limit on the size of the tables you can
create with Drill. Be advised that Drill does currently operate
entirely optimistically with regard to available resources: if a
network connection between two drillbits fails during a query, we will
not currently re-schedule the work to make use of the remaining nodes
and network connections that are still live. While we have had a good
amount of success using Drill for data conversion, be aware that these
conditions could cause long-running queries to fail.

That being said, it isn't the only possible cause for such a failure.
In the case of a network failure we would expect to see a message
returned to you that part of the query was unsuccessful and had been
cancelled. Andries has a good suggestion about checking the heap
memory; this should also be detected and reported back to you at the
CLI, but we may be failing to propagate the error back to the head
node for the query. I believe writing parquet may still be the most
heap-intensive operation in Drill, despite our efforts to refactor the
write path to use direct memory instead of on-heap for the large
buffers needed in the process of creating parquet files.
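If parquet writer heap pressure turns out to be the culprit, one standard knob is the parquet block size, which bounds the row-group buffer each writer holds in memory. A minimal sketch; the option name is standard Drill, and the 256MB value is only an illustration:

~~~
-- Smaller row groups mean smaller in-memory buffers per parquet writer.
-- 268435456 bytes = 256MB; the Drill default is 512MB.
ALTER SESSION SET `store.parquet.block-size` = 268435456;
~~~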
On Thu, May 28, 2015 at 8:43 AM, Matt <bsg...@gmail.com> wrote:
Is 300MM records too much to do in a single CTAS statement?
After almost 23 hours I killed the query (^c) and it returned:
~~~
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 1_20      | 13568824                   |
| 1_15      | 12411822                   |
| 1_7       | 12470329                   |
| 1_12      | 13693867                   |
| 1_5       | 13292136                   |
| 1_18      | 13874321                   |
| 1_16      | 13303094                   |
| 1_9       | 13639049                   |
| 1_10      | 13698380                   |
| 1_22      | 13501073                   |
| 1_8       | 13533736                   |
| 1_2       | 13549402                   |
| 1_21      | 13665183                   |
| 1_0       | 13544745                   |
| 1_4       | 13532957                   |
| 1_19      | 12767473                   |
| 1_17      | 13670687                   |
| 1_13      | 13469515                   |
| 1_23      | 12517632                   |
| 1_6       | 13634338                   |
| 1_14      | 13611322                   |
| 1_3       | 13061900                   |
| 1_11      | 12760978                   |
+-----------+----------------------------+
23 rows selected (82294.854 seconds)
~~~
The sum of those record counts is 306,772,763, which is close to the
320,843,454 in the source file:
~~~
0: jdbc:drill:zk=es05:2181> select count(*) FROM root.`sample_201501.dat`;
+------------+
|   EXPR$0   |
+------------+
| 320843454  |
+------------+
1 row selected (384.665 seconds)
~~~
It represents one month of data, 4 key columns and 38 numeric measure
columns, which could also be partitioned daily. The test here was to
create monthly Parquet files to see how the min/max stats on Parquet
chunks help with range select performance.
Instead of a small number of large monthly RDBMS tables, I am
attempting to determine how many Parquet files should be used with
Drill / HDFS.
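One way to experiment with the file count is to split the CTAS into smaller chunks, filtering on the date key, e.g. one statement per day. A sketch with hypothetical names and positions (`dfs.tmp` as the writable workspace, the date assumed to be in the first column of the text file):

~~~
-- Hypothetical per-day CTAS; Drill's text reader exposes fields as the
-- columns[] array, so the names and positions here are assumptions.
CREATE TABLE dfs.tmp.`sample_20150101` AS
SELECT
  columns[0] AS date_key,    -- first of the 4 key columns
  columns[4] AS measure_01   -- first of the 38 measures, and so on
FROM root.`sample_201501.dat`
WHERE columns[0] = '2015-01-01';
~~~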
On 27 May 2015, at 15:17, Matt wrote:
Attempting to create a Parquet-backed table with a CTAS from a 44GB
tab-delimited file in HDFS. The process seemed to be running, as CPU
and IO were seen on all 4 nodes in this cluster, and .parquet files
were being created in the expected path.

However, in the last two hours or so, all nodes show near zero CPU or
IO, and the Last Modified dates on the .parquet files have not
changed. The same time delay is shown in the Last Progress column in
the active fragment profile.

What approach can I take to determine what is happening (or not)?
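For example, is there a way to confirm all drillbits are still registered with ZooKeeper? Assuming this build includes the sys.drillbits system table, something like:

~~~
-- Sanity check: all 4 nodes should still appear here.
SELECT * FROM sys.drillbits;
~~~

(The query profile page in the web UI on port 8047 should also show per-fragment progress.)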