It seems you do not have enough memory on the Spark driver. Hints below:

On 2014-04-15 12:10, Qin Wei wrote:
>      val resourcesRDD = jsonRDD.map(arg => arg.get("rid").toString.toLong).distinct
>
>      // the program crashes at this line of code
>      val bcResources = sc.broadcast(resourcesRDD.collect.toList)
What is returned by resourcesRDD.count()?

> The data file "/home/deployer/uris.dat" is 2 GB, with lines like this:
>      { "id" : 1, "a" : { "0" : 1 }, "rid" : 5487628, "zid" : "10550869" }

> And here is my spark-env.sh:
>      export SCALA_HOME=/usr/local/scala-2.10.3
>      export SPARK_MASTER_IP=192.168.2.184
>      export SPARK_MASTER_PORT=7077
>      export SPARK_LOCAL_IP=192.168.2.182
>      export SPARK_WORKER_MEMORY=20g
>      export SPARK_MEM=10g
>      export SPARK_JAVA_OPTS="-Xms4g -Xmx40g -XX:MaxPermSize=10g -XX:-UseGCOverheadLimit"
Try setting SPARK_DRIVER_MEMORY to a bigger value, as the default 512m is
probably too small for the resourcesRDD.collect().
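For example, something like this in spark-env.sh on the machine that launches the driver (the 2g value is only an illustration; size it to what the collected list actually needs):

     export SPARK_DRIVER_MEMORY=2g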
By the way, are you really sure you need to collect all that?
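If the rids are only needed to match against another dataset, a join between RDDs would keep everything distributed instead of pulling the whole list back to the driver. A minimal sketch, assuming a hypothetical eventsRDD keyed by rid (both sample datasets here are made up for illustration):

     import org.apache.spark.SparkContext
     import org.apache.spark.SparkContext._

     object JoinInsteadOfBroadcast {
       def main(args: Array[String]) {
         val sc = new SparkContext("local", "join-sketch")
         // stand-in for the distinct rids of the original program
         val resourcesRDD = sc.parallelize(Seq(5487628L, 5487629L)).distinct()
         // hypothetical dataset keyed by rid that would otherwise be
         // matched against the broadcast list
         val eventsRDD = sc.parallelize(Seq((5487628L, "payload")))
         // the join keeps both sides distributed; nothing is collected
         val matched = eventsRDD.join(resourcesRDD.map(rid => (rid, ())))
         matched.take(10).foreach(println)
         sc.stop()
       }
     }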

André Bois-Crettez

Software Architect
Big Data Developer
http://www.kelkoo.com/

