Re: [GraphX] Preserving Partitions when reading from HDFS

2019-04-25 Thread M Bilal

If I understand correctly, this would set the split size in the Hadoop configuration when reading a file. I can see that being useful when you want to create more partitions than the HDFS block size would dictate. What I want instead is to create a single partition for each file written

Re: [GraphX] Preserving Partitions when reading from HDFS

2019-04-15 Thread Manu Zhang

You may try `sparkContext.hadoopConfiguration().set("mapred.max.split.size", "33554432")` to tune the partition size when reading from HDFS.

Thanks,
Manu Zhang

On Mon, Apr 15, 2019 at 11:28 PM M Bilal wrote:
> Hi,
>
> I have implemented a custom partitioning algorithm to partition graphs in
>
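
For reference, the value 33554432 in the suggestion above is 32 MiB. The following is a minimal sketch of how a maximum split size could translate into a number of input splits (and hence partitions) for a single file. It assumes the split-size rule used by Hadoop's newer `FileInputFormat`, `max(minSize, min(maxSize, blockSize))`, and approximates the split count with a ceiling division. The class name, method names, and the 128 MiB block size and 100 MiB file size are illustrative assumptions, not taken from the thread. Whether a given Spark read path honors this property is not confirmed here.

```java
public class SplitSizeSketch {

    // Assumed rule: clamp the block size between the minimum and maximum split size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of splits for one file, using ceiling division.
    // This ignores any slack the real implementation allows on the last split.
    static long approxSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // hypothetical HDFS block size (128 MiB)
        long maxSplit = 33554432L;           // value from the suggestion (32 MiB)
        long fileSize = 100L * 1024 * 1024;  // hypothetical file size (100 MiB)

        long splitSize = computeSplitSize(blockSize, 1L, maxSplit);
        System.out.println("split size (bytes): " + splitSize);
        System.out.println("approx. splits: " + approxSplits(fileSize, splitSize));
    }
}
```

Under these assumptions, lowering the maximum split size below the block size yields more splits per file, not fewer, which is why this setting on its own does not give one partition per file.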

[GraphX] Preserving Partitions when reading from HDFS

2019-04-15 Thread M Bilal

Hi, I have implemented a custom partitioning algorithm to partition graphs in GraphX. Saving the partitioned graph (the edges) to HDFS creates separate files in the output folder, with the number of files equal to the number of partitions. However, reading back the edges creates number of