Re: [GraphX] Preserving Partitions when reading from HDFS

2019-04-15 Thread Manu Zhang
You may try `sparkContext.hadoopConfiguration().set("mapred.max.split.size", "33554432")` to tune the partition size when reading from HDFS. Thanks, Manu Zhang
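A minimal sketch of that suggestion, for context. The input path `hdfs:///graph/edges` and the 32 MB cap are illustrative; `mapreduce.input.fileinputformat.split.maxsize` is the newer name for the same property on recent Hadoop versions.

```scala
import org.apache.spark.sql.SparkSession

object SplitSizeExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("split-size-tuning").getOrCreate()
    val sc = spark.sparkContext

    // Cap each input split at 32 MB (33554432 bytes) so the reader
    // produces more, smaller partitions instead of following the
    // default HDFS block-based splits.
    sc.hadoopConfiguration.set("mapred.max.split.size", "33554432")
    // Newer Hadoop versions read this property name instead:
    sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", "33554432")

    // Hypothetical path where one file per partition was written earlier.
    val edges = sc.textFile("hdfs:///graph/edges")
    println(s"partitions after read: ${edges.getNumPartitions}")
    spark.stop()
  }
}
```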

Re: JvmPauseMonitor

2019-04-15 Thread Arun Mahadevan
Spark TaskMetrics[1] has a "jvmGCTime" metric that captures the amount of time spent in GC. This should also be available via the listener. Thanks, Arun [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala#L89
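A hedged sketch of such a listener. The threshold and the `println` logging are illustrative, not a built-in Spark setting; `jvmGCTime` is the cumulative JVM GC time in milliseconds observed while the task ran, so this flags GC-heavy tasks rather than individual pauses.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Warns whenever a task's cumulative GC time exceeds a configured limit.
// gcTimeLimitMs is a hypothetical threshold chosen by the caller.
class GcTimeListener(gcTimeLimitMs: Long) extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (metrics != null && metrics.jvmGCTime > gcTimeLimitMs) {
      // In a real executor this would go through the normal logger.
      println(
        s"Task ${taskEnd.taskInfo.taskId} of stage ${taskEnd.stageId} " +
        s"spent ${metrics.jvmGCTime} ms in GC (limit: $gcTimeLimitMs ms)")
    }
  }
}

// Registration, given an existing SparkContext `sc`:
//   sc.addSparkListener(new GcTimeListener(gcTimeLimitMs = 5000))
```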

JvmPauseMonitor

2019-04-15 Thread Eugene Koifman
Hi, A number of projects in the Hadoop ecosystem use org.apache.hadoop.util.JvmPauseMonitor (or clones of it) to log long GC pauses. Is there something like that for a Spark executor that can write a log entry when GC time exceeds a configured limit? Thank you, Eugene
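For reference, a minimal sketch of the detection technique JvmPauseMonitor relies on, not a Spark or Hadoop API: a daemon thread sleeps for a fixed interval and treats any extra elapsed time as a JVM-wide pause (usually GC). The interval and threshold below are illustrative.

```scala
object PauseDetector {
  def start(sleepMs: Long = 500, warnThresholdMs: Long = 1000): Thread = {
    val thread = new Thread(new Runnable {
      override def run(): Unit = {
        var running = true
        while (running) {
          val before = System.nanoTime()
          try Thread.sleep(sleepMs)
          catch { case _: InterruptedException => running = false }
          // Any time beyond the requested sleep is attributed to a pause.
          val extraMs = (System.nanoTime() - before) / 1000000 - sleepMs
          if (running && extraMs > warnThresholdMs) {
            println(s"Detected a JVM pause of roughly $extraMs ms")
          }
        }
      }
    })
    thread.setDaemon(true)
    thread.start()
    thread
  }
}
```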

[GraphX] Preserving Partitions when reading from HDFS

2019-04-15 Thread M Bilal
Hi, I have implemented a custom partitioning algorithm to partition graphs in GraphX. Saving the partitioned graph (the edges) to HDFS creates separate files in the output folder, with the number of files equal to the number of partitions. However, reading the edges back creates a number of
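A toy round-trip that reproduces the reported behavior, under stated assumptions: the graph, the path, and the partition count are illustrative stand-ins for the custom-partitioned graph in the question.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object EdgeRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("edge-roundtrip").getOrCreate()
    val sc = spark.sparkContext

    // Toy graph; in the question the edges come from a custom partitioner.
    val edges: RDD[Edge[Int]] =
      sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1)), numSlices = 4)
    val graph = Graph.fromEdges(edges, defaultValue = 0)

    // One part-* file per partition is written under this hypothetical path.
    val path = "hdfs:///tmp/graph-edges"
    graph.edges.map(e => s"${e.srcId}\t${e.dstId}\t${e.attr}").saveAsTextFile(path)

    // Reading back: the partition count follows HDFS input splits,
    // not the original one-file-per-partition layout.
    val reloaded = sc.textFile(path)
    println(s"written with ${graph.edges.getNumPartitions} partitions, " +
      s"read back with ${reloaded.getNumPartitions}")
    spark.stop()
  }
}
```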

How to speed up your Spark ML training

2019-04-15 Thread chris
Hi Spark community, If you are using Spark ML and want to run your applications 10x faster, please check how you can utilize the performance of FPGAs without changing your code at all: