Re: Multiple DataFrames per Parquet file?

2015-05-17 Thread Davies Liu

You can union all the df together, then call repartition().

On Sun, May 10, 2015 at 8:34 AM, Peter Aberline wrote:
> Hi
>
> Thanks for the quick response.
>
> No I'm not using Streaming. Each DataFrame represents tabular data read from
> a CSV file. They have the same schema.
>
> There is also th
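A minimal sketch of the union-then-repartition suggestion, assuming PySpark and a list of DataFrames that share one schema. The helper names, `out_path`, and `num_partitions` are hypothetical and not from the thread. `DataFrame.union` is the Spark 2.0+ name; the 1.x releases current at the time of this thread called it `unionAll`, and `df.write.parquet` requires Spark 1.4 or later.

```python
from functools import reduce


def union_all(frames):
    """Fold a non-empty list of same-schema DataFrames into one by pairwise union."""
    if not frames:
        raise ValueError("need at least one DataFrame")
    # On Spark 1.x, replace .union with .unionAll.
    return reduce(lambda left, right: left.union(right), frames)


def write_combined(frames, out_path, num_partitions=8):
    """Union the frames, repartition, and write a single Parquet dataset.

    out_path and num_partitions are illustrative assumptions; pick a
    partition count that suits the data volume and cluster size.
    """
    combined = union_all(frames).repartition(num_partitions)
    combined.write.parquet(out_path)
    return combined
```

Repartitioning after the union keeps the number of output files bounded by `num_partitions` rather than growing with the number of input DataFrames.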

Re: Multiple DataFrames per Parquet file?

2015-05-10 Thread ayan guha

Hi

In that case, read the entire folder as an RDD and give it a reasonable number of partitions.

Best
Ayan

On 11 May 2015 01:35, "Peter Aberline" wrote:
> Hi
>
> Thanks for the quick response.
>
> No I'm not using Streaming. Each DataFrame represents tabular data read
> from a CSV file. They have the
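One way to read this suggestion, sketched under assumptions: `sc` is an existing SparkContext, `folder` is a directory of CSV files with the same columns, and `min_partitions=16` is an arbitrary placeholder. The function names are hypothetical. Header rows are not handled here, and the per-line parser assumes no field contains an embedded newline.

```python
import csv


def parse_line(line):
    """Split one CSV line into fields, honoring quoted commas."""
    return next(csv.reader([line]))


def read_csv_folder(sc, folder, min_partitions=16):
    """Read every file under folder as one RDD of parsed rows.

    SparkContext.textFile accepts a directory path and a minPartitions
    hint, so thousands of small files end up in a single RDD.
    """
    return sc.textFile(folder, minPartitions=min_partitions).map(parse_line)
```

Because all files land in one RDD, the partition count is chosen once for the whole data set instead of once per file.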

Re: Multiple DataFrames per Parquet file?

2015-05-10 Thread Peter Aberline

Hi

Thanks for the quick response.

No, I'm not using Streaming. Each DataFrame represents tabular data read from a CSV file. They have the same schema.

There is also the option of appending each DF to the parquet file, but then I can't maintain them as separate DF when reading back in without filt

Re: Multiple DataFrames per Parquet file?

2015-05-10 Thread ayan guha

How did you end up with thousands of df? Are you using streaming? In that case you can use foreachRDD to keep merging the incoming RDDs into a single RDD, and then save it through your own checkpoint mechanism. If not, please share your use case.

On 11 May 2015 00:38, "Peter Aberline" wrote:
> Hi
>
> I