The original TSV file is 600GB, and the job generated 40k files of 15-25MB each.
From: Cheng Lian [mailto:lian.cs....@gmail.com]
Sent: October-07-15 3:18 PM
To: Younes Naguib; 'user@spark.apache.org'
Subject: Re: Parquet file size

Why do you want larger files? Doesn't the result Parquet file contain all the data in the original TSV file?

Cheng

On 10/7/15 11:07 AM, Younes Naguib wrote:

Hi,

I'm reading a large tsv file, and creating parquet files using sparksql:

insert overwrite table tbl partition(year, month, day)....
Select .... from tbl_tsv;

This works nicely, but generates small parquet files (15MB).
I wanted to generate larger files, any idea how to address this?

Thanks,
Younes Naguib
Triton Digital | 1440 Ste-Catherine W., Suite 1200 | Montreal, QC H3G 1R8
Tel.: +1 514 448 4037 x2688 | Tel.: +1 866 448 4037 x2688 | younes.nag...@tritondigital.com
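One common way to get fewer, larger files from a dynamic-partition insert like the one above is to add a DISTRIBUTE BY clause on the partition columns. All rows for a given (year, month, day) are then shuffled to the same task, so each partition directory tends to be written by one task instead of by every input task that happened to contain rows for that day. This is a minimal sketch, assuming Spark SQL with HiveQL support (a HiveContext) and dynamic partitioning already enabled, as it evidently is for the original statement. The select list in the thread is elided, so col_a and col_b below are hypothetical placeholders, not real column names.

```sql
-- Sketch only: col_a and col_b are hypothetical stand-ins for the elided select list.
-- Dynamic partition columns (year, month, day) must come last in the SELECT,
-- in the same order as in the PARTITION clause.
-- DISTRIBUTE BY routes all rows of one (year, month, day) to the same task,
-- so each partition is typically written as one file rather than many small ones.
INSERT OVERWRITE TABLE tbl PARTITION (year, month, day)
SELECT col_a, col_b, year, month, day
FROM tbl_tsv
DISTRIBUTE BY year, month, day;
```

The trade-off is that each partition is now handled by a single task, so an unusually large day produces one large file and can become a slow straggler task; the number of tasks after the shuffle is governed by spark.sql.shuffle.partitions.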