Hi

I have many thousands of small DataFrames that I would like to save to a
single Parquet file to avoid the HDFS 'small files' problem. My
understanding is that there is a 1:1 relationship between a DataFrame and
a Parquet file when a single partition is used.

Is it possible to have multiple DataFrames within a single Parquet file
using PySpark?
Or is the only way to achieve this to union the DataFrames into one? A
sketch of what I mean by the union approach is below.
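
For reference, this is roughly the union approach I have in mind (just a
sketch: the DataFrames and the output path are made up, and it assumes
all of them share the same schema):

    from functools import reduce
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Stand-ins for the real small DataFrames; all share one schema.
    dfs = [spark.createDataFrame([(i, "x")], ["id", "val"])
           for i in range(3)]

    # Fold the list down to a single DataFrame; union() matches
    # columns by position, unionByName() would match by name.
    combined = reduce(lambda a, b: a.union(b), dfs)

    # coalesce(1) keeps the output to one part file; Parquet still
    # writes a directory containing that single part-*.parquet file.
    combined.coalesce(1).write.parquet("hdfs:///tmp/combined.parquet")

My concern is that folding many thousands of union() calls like this
builds a very long query plan, which is partly why I'm asking whether
there is another way.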

Thanks,
Peter
