On 16 Nov 2017, at 10:22, Michael Shtelma wrote:
> you call repartition(1) before starting processing your files. This
> will ensure that you end up with just one partition.
One question and one remark:
Q) val ds = sqlContext.read.parquet(path).repartition(1)
Am I absolutely sure that my file h
Dear Sparkers,
A while back, I asked how to process non-splittable files in parallel, one file
per executor. The "scheduling within an application" approach that Vadim
suggested worked out beautifully.
I am now facing the 'opposite' problem:
- I have a bunch of Parquet files to process
- Once proce