Hi Folks, I'm running a five-step path-following algorithm on a movie graph with 120K vertices and 400K edges. The graph has vertices for actors, directors, movies, users, and user ratings, and my Scala code walks the path "rating > movie > rating > user > rating". There are 75K rating nodes, each with ~100 edges. The program iterates over the path items, calling aggregateMessages() and then joinVertices() for each one, and feeding that result into the next iteration. It never finishes the second 'rating' step, which makes sense: if I understand correctly, my back-of-the-napkin estimate puts the intermediate result at ~4B active vertices.
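For concreteness, here is a stripped-down sketch of the per-step pattern I described (the attribute layout and the name stepTo are illustrative, not from my actual code; I'm assuming a vertex attribute of (vertexType, pathCount)):

```scala
import org.apache.spark.graphx._

// One step of the walk: push each active vertex's path count to
// neighbors of the target type, then fold the sums back into the graph.
// This is a simplified sketch, not my real program.
def stepTo(g: Graph[(String, Long), Int], target: String): Graph[(String, Long), Int] = {
  val counts: VertexRDD[Long] = g.aggregateMessages[Long](
    ctx =>
      // Only forward counts from vertices active in the previous step,
      // and only to vertices of the next type on the path.
      if (ctx.srcAttr._2 > 0 && ctx.dstAttr._1 == target)
        ctx.sendToDst(ctx.srcAttr._2),
    _ + _ // merge messages by summing path counts
  )
  // joinVertices keeps the old attribute for vertices that received no
  // message, so non-target vertices would need resetting separately.
  g.joinVertices(counts) { case (_, (vType, _), sum) => (vType, sum) }
}
```

Each step multiplies the number of active paths by the fan-out of the current frontier, which is where my ~4B estimate for the second 'rating' step comes from.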
Spark is version 1.2.0, running in standalone mode on a small cluster of five hosts: four compute nodes and a head node. The compute nodes have 4 cores and 32GB RAM each; the head node has 32 cores and 128GB RAM. After restarting Spark just now, the Master web UI shows 15 workers (5 dead), two per node. Cores and memory are listed as "32 (0 Used)" and "125.0 GB (0.0 B Used)" for the two head-node workers, and "4 (0 Used)" and "30.5 GB (0.0 B Used)" for the 8 workers running on the compute nodes. (Note: I don't understand why it's configured to run two workers per node.) The small Spark example programs run to completion. I've listed the console output at http://pastebin.com/DPECKgQ9 (I'm running in spark-shell).

I hope you can provide some advice on things to try next (e.g., configuration vars). My guess is that the cluster is running out of memory, though I think it has adequate aggregate RAM to handle this app.

Thanks very much -- matt

----
Matthew Cornell, Research Fellow, Computer Science Department, UMass Amherst
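On the two-workers-per-node question: in standalone mode that behavior normally comes from spark-env.sh on the workers. I haven't verified our deployment, but a setup matching what the Master UI shows might look like this (values are my guess, not confirmed):

```sh
# spark-env.sh -- hypothetical; I have not confirmed this matches our cluster
# Two worker JVMs per host:
SPARK_WORKER_INSTANCES=2
# Per-worker resources as reported on the compute nodes in the Master UI:
SPARK_WORKER_CORES=4
SPARK_WORKER_MEMORY=30g
```

If anyone can confirm whether multiple worker instances per node is the right setup here, that would also help.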