Re: [GraphX] Preserving Partitions when reading from HDFS

2019-04-15 Thread Manu Zhang
You may try `sparkContext.hadoopConfiguration().set("mapred.max.split.size", "33554432")` to tune the partition size when reading from HDFS. Thanks, Manu Zhang
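A minimal sketch of that suggestion, for context. The input path `hdfs:///graph/edges` and the 32 MB cap are illustrative; `mapreduce.input.fileinputformat.split.maxsize` is the newer name for the same property on recent Hadoop versions.

```scala
import org.apache.spark.sql.SparkSession

object SplitSizeExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("split-size-tuning").getOrCreate()
    val sc = spark.sparkContext

    // Cap each input split at 32 MB (33554432 bytes) so the reader
    // produces more, smaller partitions instead of following the
    // default HDFS block-based splits.
    sc.hadoopConfiguration.set("mapred.max.split.size", "33554432")
    // Newer Hadoop versions read this property name instead:
    sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", "33554432")

    // Hypothetical path where one file per partition was written earlier.
    val edges = sc.textFile("hdfs:///graph/edges")
    println(s"partitions after read: ${edges.getNumPartitions}")
    spark.stop()
  }
}
```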

Re: JvmPauseMonitor

2019-04-15 Thread Arun Mahadevan
Spark TaskMetrics[1] has a "jvmGCTime" metric that captures the amount of time spent in GC. This should also be available via the listener. Thanks, Arun [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala#L89
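A hedged sketch of such a listener. The threshold and the `println` logging are illustrative, not a built-in Spark setting; `jvmGCTime` is the cumulative JVM GC time in milliseconds observed while the task ran, so this flags GC-heavy tasks rather than individual pauses.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Warns whenever a task's cumulative GC time exceeds a configured limit.
// gcTimeLimitMs is a hypothetical threshold chosen by the caller.
class GcTimeListener(gcTimeLimitMs: Long) extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (metrics != null && metrics.jvmGCTime > gcTimeLimitMs) {
      // In a real executor this would go through the normal logger.
      println(
        s"Task ${taskEnd.taskInfo.taskId} of stage ${taskEnd.stageId} " +
        s"spent ${metrics.jvmGCTime} ms in GC (limit: $gcTimeLimitMs ms)")
    }
  }
}

// Registration, given an existing SparkContext `sc`:
//   sc.addSparkListener(new GcTimeListener(gcTimeLimitMs = 5000))
```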

JvmPauseMonitor

2019-04-15 Thread Eugene Koifman
Hi, A number of projects in the Hadoop ecosystem use org.apache.hadoop.util.JvmPauseMonitor (or clones of it) to log long GC pauses. Is there something like that for a Spark executor that can write a log entry when GC time exceeds a configured limit? Thank you, Eugene
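For reference, a minimal sketch of the detection technique JvmPauseMonitor relies on, not a Spark or Hadoop API: a daemon thread sleeps for a fixed interval and treats any extra elapsed time as a JVM-wide pause (usually GC). The interval and threshold below are illustrative.

```scala
object PauseDetector {
  def start(sleepMs: Long = 500, warnThresholdMs: Long = 1000): Thread = {
    val thread = new Thread(new Runnable {
      override def run(): Unit = {
        var running = true
        while (running) {
          val before = System.nanoTime()
          try Thread.sleep(sleepMs)
          catch { case _: InterruptedException => running = false }
          // Any time beyond the requested sleep is attributed to a pause.
          val extraMs = (System.nanoTime() - before) / 1000000 - sleepMs
          if (running && extraMs > warnThresholdMs) {
            println(s"Detected a JVM pause of roughly $extraMs ms")
          }
        }
      }
    })
    thread.setDaemon(true)
    thread.start()
    thread
  }
}
```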

[GraphX] Preserving Partitions when reading from HDFS

2019-04-15 Thread M Bilal
Hi, I have implemented a custom partitioning algorithm to partition graphs in GraphX. Saving the partitioned graph (the edges) to HDFS creates separate files in the output folder, with the number of files equal to the number of partitions. However, reading the edges back creates a number of
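A toy round-trip that reproduces the reported behavior, under stated assumptions: the graph, the path, and the partition count are illustrative stand-ins for the custom-partitioned graph in the question.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object EdgeRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("edge-roundtrip").getOrCreate()
    val sc = spark.sparkContext

    // Toy graph; in the question the edges come from a custom partitioner.
    val edges: RDD[Edge[Int]] =
      sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1)), numSlices = 4)
    val graph = Graph.fromEdges(edges, defaultValue = 0)

    // One part-* file per partition is written under this hypothetical path.
    val path = "hdfs:///tmp/graph-edges"
    graph.edges.map(e => s"${e.srcId}\t${e.dstId}\t${e.attr}").saveAsTextFile(path)

    // Reading back: the partition count follows HDFS input splits,
    // not the original one-file-per-partition layout.
    val reloaded = sc.textFile(path)
    println(s"written with ${graph.edges.getNumPartitions} partitions, " +
      s"read back with ${reloaded.getNumPartitions}")
    spark.stop()
  }
}
```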

How to speed up your Spark ML training

2019-04-15 Thread chris
Hi Spark community, If you are using Spark ML and want to run your applications 10x faster, please check how you can utilize the performance of FPGAs without changing your code at all: