If I understand correctly, this would set the split size in the Hadoop
configuration when reading a file. I can see that being useful when you want
to create more partitions than the HDFS block size would dictate.
Instead, what I want to do is create a single partition for each file
written.
You may try
`sparkContext.hadoopConfiguration().set("mapred.max.split.size",
"33554432")` to tune the partition size when reading from HDFS.
Thanks,
Manu Zhang
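
For reference, the effect of the split-size setting suggested above can be sketched in plain Java. This is only a rough model, based on my understanding of how `FileInputFormat` in Hadoop's newer `mapreduce` API sizes its splits: split size = max(min size, min(max size, block size)), with the last split allowed to run about 10% over. The names `SplitEstimate`, `splitSize` and `countSplits` are made up for illustration and are not part of Spark or Hadoop. Whether `mapred.max.split.size` is honored at all depends on which input format the read path uses, so treat the numbers as an approximation.

```java
// Illustrative only: SplitEstimate is a hypothetical helper, not a Spark or Hadoop class.
final class SplitEstimate {
    // Assumed slop factor: the final split may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    private SplitEstimate() {}

    // Assumed rule: clamp the block size between the configured min and max split sizes.
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Estimated number of splits for one non-empty, splittable file.
    static int countSplits(long fileLength, long splitSize) {
        if (fileLength <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileLength and splitSize must be positive");
        }
        int count = 0;
        long remaining = fileLength;
        // Carve off full-size splits while more than ~1.1 splits' worth of data remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            count++;
            remaining -= splitSize;
        }
        // Whatever is left becomes the final split.
        if (remaining != 0) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long block = 128L * 1024 * 1024; // assumed HDFS block size
        long file = 100L * 1024 * 1024;  // assumed size of one output file

        // Max split size of 32 MB, as in the suggestion above.
        System.out.println(countSplits(file, splitSize(block, 1L, 33554432L)));

        // Min split size larger than the file: the file is no longer divided.
        System.out.println(countSplits(file, splitSize(block, 1L << 40, Long.MAX_VALUE)));
    }
}
```

Under this model, lowering the max split size only increases the number of partitions per file. Getting a single partition per file would instead need a split size at least as large as the biggest file, for example through a large min split size, assuming the input format in use respects it.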
On Mon, Apr 15, 2019 at 11:28 PM M Bilal wrote:
Hi,
I have implemented a custom partitioning algorithm to partition graphs in
GraphX. Saving the partitioned graph (the edges) to HDFS creates separate
files in the output folder, with the number of files equal to the number of
partitions.
However, reading back the edges creates number of