I am running a CTAS from CSV files on a 4-node HDFS cluster into a Parquet
file, and I notice that the physical plan in the Drill UI references every
CSV source through a single node (es05).
collectl shows read and write IO on all 4 nodes. Does this mean that
the full cluster is used for scanning the source files, or does my
configuration pin the reads to a single node?
Nodes in cluster: es0{5-8}. The Scan operator from the physical plan:
Scan(groupscan=[EasyGroupScan
[selectionRoot=hdfs://es05:54310/csv/customer, numFiles=33,
columns=[`*`],
files=[hdfs://es05:54310/csv/customer/customer_20151001.csv,
hdfs://es05:54310/csv/customer/customer_20151002.csv,
hdfs://es05:54310/csv/customer/customer_20151003.csv,
hdfs://es05:54310/csv/customer/customer_20151004.csv,
hdfs://es05:54310/csv/customer/customer_20151005.csv,
hdfs://es05:54310/csv/customer/customer_20151006.csv,
hdfs://es05:54310/csv/customer/customer_20151007.csv,
hdfs://es05:54310/csv/customer/customer_20151008.csv,
hdfs://es05:54310/csv/customer/customer_20151009.csv,
hdfs://es05:54310/csv/customer/customer_20151010.csv,
hdfs://es05:54310/csv/customer/customer_20151011.csv,
hdfs://es05:54310/csv/customer/customer_20151012.csv,
hdfs://es05:54310/csv/customer/customer_20151013.csv,
hdfs://es05:54310/csv/customer/customer_20151014.csv,
hdfs://es05:54310/csv/customer/customer_20151015.csv,
hdfs://es05:54310/csv/customer/customer_20151016.csv,
hdfs://es05:54310/csv/customer/customer_20151017.csv,
hdfs://es05:54310/csv/customer/customer_20151018.csv,
hdfs://es05:54310/csv/customer/customer_20151019.csv,
hdfs://es05:54310/csv/customer/customer_20151020.csv,
hdfs://es05:54310/csv/customer/customer_20151021.csv,
hdfs://es05:54310/csv/customer/customer_20151022.csv,
hdfs://es05:54310/csv/customer/customer_20151023.csv,
hdfs://es05:54310/csv/customer/customer_20151024.csv,
hdfs://es05:54310/csv/customer/customer_20151025.csv,
hdfs://es05:54310/csv/customer/customer_20151026.csv,
hdfs://es05:54310/csv/customer/customer_20151027.csv,
hdfs://es05:54310/csv/customer/customer_20151028.csv,
hdfs://es05:54310/csv/customer/customer_20151029.csv,
hdfs://es05:54310/csv/customer/customer_20151030.csv,
hdfs://es05:54310/csv/customer/customer_20151031.csv,
hdfs://es05:54310/csv/customer/customer_20151101.csv,
hdfs://es05:54310/csv/customer/customer_20151102.csv]]]) : rowType =
(DrillRecordRow[*]): rowcount = 2.407374395E9, cumulative cost =
{2.407374395E9 rows, 2.407374395E9 cpu, 0.0 io, 0.0 network, 0.0
memory}, id = 4355
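
To make the observation concrete, here is a small sketch (not part of my Drill setup) that pulls the host:port authority out of each file URI in the plan text and counts them. The `plan_text` excerpt is abbreviated to three of the 33 files, and the helper name `authorities` is made up for illustration. In an HDFS URI the authority is the NameNode address, so my assumption is that `es05:54310` only shows where the filesystem metadata lives, not which node reads the blocks, but that is what I would like confirmed.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Abbreviated excerpt of the Scan operator text (three of the 33 files).
plan_text = """
files=[hdfs://es05:54310/csv/customer/customer_20151001.csv,
hdfs://es05:54310/csv/customer/customer_20151002.csv,
hdfs://es05:54310/csv/customer/customer_20151102.csv]]])
"""

def authorities(text):
    """Count the host:port authority of every hdfs:// URI found in text."""
    uris = re.findall(r"hdfs://[^\s,\]]+", text)
    return Counter(urlparse(u).netloc for u in uris)

hosts = authorities(plan_text)
print(hosts)  # Counter({'es05:54310': 3})
```

Every path shares the same authority, which is why the plan appears to reference a single node.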