I'm converting CSV files to Parquet with CTAS and getting errors on some of the
larger files. With a source file of 16.34 GB (as reported in the HDFS explorer):
~~~
CREATE TABLE `/parquet/customer_20151017` PARTITION BY (date_tm) AS
SELECT * FROM `/csv/customer/customer_20151017.csv`;
Error: SYSTEM ERROR: IllegalArgumentException: length: -484 (expected:
>= 0)
Fragment 1:1
[Error Id: da53d687-a8d5-4927-88ec-e56d5da17112 on es07:31010]
(state=,code=0)
~~~
But the same operation on a 70 MB file of the same format succeeds.
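For reference, one variant I have been considering casts each column explicitly
instead of using SELECT *, which I believe the Drill docs suggest when writing
Parquet via CTAS. This is a hedged sketch only: the column names and types
below are placeholders, not my real schema, and it assumes the CSV storage
format has extractHeader enabled:
~~~
-- Sketch with placeholder columns; assumes extractHeader is on for the CSV
-- format, otherwise the fields would be addressed as columns[0], columns[1], ...
CREATE TABLE `/parquet/customer_20151017` PARTITION BY (date_tm) AS
SELECT
  CAST(date_tm AS DATE)       AS date_tm,
  CAST(customer_id AS BIGINT) AS customer_id,
  CAST(amount AS DOUBLE)      AS amount
FROM `/csv/customer/customer_20151017.csv`;
~~~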
Given that the common HDFS advice is to avoid large numbers of small files [1],
is there a general guideline for the maximum source file size to ingest into
Parquet with CTAS?
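In case it helps frame the question: one workaround I could try is converting
in slices by filtering on the partition column, so each CTAS run touches a
bounded amount of input. A rough sketch, assuming date_tm parses cleanly as a
date (the target table name and the date value are placeholders):
~~~
-- Sketch: convert one day per CTAS run to bound the input size per run.
CREATE TABLE `/parquet/customer_20151017_slice1` PARTITION BY (date_tm) AS
SELECT * FROM `/csv/customer/customer_20151017.csv`
WHERE CAST(date_tm AS DATE) = DATE '2015-10-17';
~~~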
---
[1] HDFS put performance is very poor with a large number of small files, so
I'm trying to find the right amount of source roll-up to perform. Pointers to
HDFS configuration guides for beginners would also be appreciated; I have only
used HDFS with Drill and have no other Hadoop experience.