(For what it is worth, I happened to look into this with Anton earlier and
am also pretty convinced it's related to GraphX rather than the app. It's
somewhat difficult to debug what gets sent in the closure AFAICT.)
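
One rough way to get a feel for what a closure drags along is to serialize it
by hand and look at the byte count. The sketch below is only an illustration
under stated assumptions: it uses plain Java serialization rather than Spark's
own closure serializer, and the names ClosureSize, serializedSize, matrix,
capturing and light are hypothetical, not taken from MatrixTest.scala. It is
not a diagnosis of the actual job.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object ClosureSize {
  // Number of bytes an object occupies under plain Java serialization.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for a matrix held on the driver (500 x 500 doubles).
    val matrix: Array[Array[Double]] = Array.fill(500, 500)(1.0)

    // This function references `matrix`, so the whole array is serialized with it.
    val capturing: Int => Double = i => matrix(i % 500)(i % 500)

    // This function references nothing from the enclosing scope.
    val light: Int => Double = i => i.toDouble

    println(s"capturing closure: ${serializedSize(capturing)} bytes")
    println(s"light closure:     ${serializedSize(light)} bytes")
  }
}
```

If the first number is in the megabyte range while the second is tiny, the
captured data is what inflates the serialized function. Whether the same
mechanism explains the 2905 KB task in the quoted warning is an assumption
that would need checking against the real code.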

On Tue, Dec 6, 2016 at 7:49 PM AntonIpp <an...@simudyne.com> wrote:

> Hi everyone,
>
> I have a small Scala test project which uses GraphX and for some reason has
> extreme scheduler delay when executed on the cluster. The problem is not
> related to the cluster configuration, as other GraphX applications run
> without any issue.
> I have attached the source code (MatrixTest.scala
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n28162/MatrixTest.scala>).
> It creates a sort of GraphGenerators.gridGraph
> <http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.util.GraphGenerators$>
> (but with diagonal edges too) using data from a matrix inside the Map
> class.
> There are in reality only 4 lines related to GraphX itself: creating a
> VertexRDD, creating an EdgeRDD, creating a Graph and then calling
> graph.edges.count.
> As you can see on the Spark History Server
> <http://cdhdns-mn0.westeurope.cloudapp.azure.com:18088/history/application_1480677653852_0050/jobs/>,
> the task has very significant scheduler delay. There is also the following
> warning in the logs (I have attached them too: MatrixTest.log
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n28162/MatrixTest.log>):
> "WARN scheduler.TaskSetManager: Stage 0 contains a task of very large
> size (2905 KB). The maximum recommended task size is 100 KB."
> This also happens with .aggregateMessages.collect and Pregel. I have tested
> with Spark 1.6 and 2.0, different levels of parallelism, different numbers
> of executors, etc., but the scheduler delay is still there and grows more
> extreme as the number of vertices and edges grows.
>
> Does anyone have any idea as to what could be the source of the issue?
> Thank you!
