You can union all the df together, then call repartition().
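Something like this (an untested sketch; it assumes your frames are already collected in a Python list called dfs and all share the same schema):

from functools import reduce
from pyspark.sql import DataFrame

# Pairwise union of all the frames. unionAll is the 1.x name; on
# Spark 2.x+ it is DataFrame.union.
combined = reduce(DataFrame.unionAll, dfs)

# Collapse the many small partitions before writing; 200 is just an
# arbitrary example. (On Spark 1.3 use saveAsParquetFile instead of
# the DataFrameWriter API shown here.)
combined.repartition(200).write.parquet("/path/to/combined.parquet")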
On Sun, May 10, 2015 at 8:34 AM, Peter Aberline wrote:
> Hi
>
> Thanks for the quick response.
>
> No, I'm not using Streaming. Each DataFrame represents tabular data read from
> a CSV file. They have the same schema.
>
> There is also the option of appending each DF to the parquet file, but then
> I can't maintain them as separate DFs when reading back in without filtering.
Hi
In that case, read the entire folder as an RDD and give it a reasonable number
of partitions.
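Something like this (untested; the path, partition count, and the shared
`schema` variable are placeholders):

# Read every file under the folder as a single RDD, asking Spark for a
# minimum partition count up front.
lines = sc.textFile("/path/to/csv_folder", minPartitions=100)

# If you still want a DataFrame, parse the lines and apply the shared
# schema. (CSV headers, quoting, and type casting are ignored here.)
rows = lines.map(lambda line: line.split(","))
df = sqlContext.createDataFrame(rows, schema)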
Best
Ayan
On 11 May 2015 01:35, "Peter Aberline" wrote:
> Hi
>
> Thanks for the quick response.
>
> No, I'm not using Streaming. Each DataFrame represents tabular data read
> from a CSV file. They have the same schema.
Hi
Thanks for the quick response.
No, I'm not using Streaming. Each DataFrame represents tabular data read
from a CSV file. They have the same schema.
There is also the option of appending each DF to the parquet file, but then
I can't maintain them as separate DFs when reading back in without filtering.
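The append variant I mean looks roughly like this (untested, with
placeholder names); each frame would need a tag column so it can only be
recovered by filtering on it later:

from pyspark.sql.functions import lit

# named_dfs is an assumed list of (filename, DataFrame) pairs.
for name, df in named_dfs:
    df.withColumn("source_file", lit(name)) \
      .write.mode("append").parquet("/path/to/all.parquet")

# Reading one frame back then requires the filter I'd rather avoid:
one_df = sqlContext.read.parquet("/path/to/all.parquet") \
                   .filter("source_file = 'data_001.csv'")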
How did you end up with thousands of DFs? Are you using Streaming? In that
case you can do foreachRDD and keep merging the incoming RDDs into a single
RDD, and then save it through your own checkpoint mechanism.
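Roughly like this (an untested sketch; `stream` is an assumed DStream and
the paths are placeholders):

sc.setCheckpointDir("/path/to/checkpoints")
merged = sc.emptyRDD()

def merge_batch(rdd):
    global merged
    merged = merged.union(rdd)
    merged.checkpoint()  # truncate the ever-growing union lineage
    merged.count()       # an action, to force the checkpoint to materialize

stream.foreachRDD(merge_batch)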
If not, please share your use case.