Seems you don't have enough memory on the Spark driver. Hints below:
On 2014-04-15 12:10, Qin Wei wrote:
val resourcesRDD = jsonRDD.map(arg => arg.get("rid").toString.toLong).distinct // the program crashes at this line of code
val bcResources = sc.broadcast(resourcesRDD.collect.toList)
What is returned by resourcesRDD.count()?
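(A minimal sketch of the check I mean; count() runs the same map/distinct pipeline but leaves the results on the executors, so it shows you the size without stressing the driver:)

val n = resourcesRDD.count()      // forces evaluation; results stay distributed
println(s"distinct rids: $n")     // if this is huge, collect() will strain the driver heap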
The data file "/home/deployer/uris.dat" is 2G, with lines like this:

{ "id" : 1, "a" : { "0" : 1 }, "rid" : 5487628, "zid" : "10550869" }

And here is my spark-env.sh:

export SCALA_HOME=/usr/local/scala-2.10.3
export SPARK_MASTER_IP=192.168.2.184
export SPARK_MASTER_PORT=7077
export SPARK_LOCAL_IP=192.168.2.182
export SPARK_WORKER_MEMORY=20g
export SPARK_MEM=10g
export SPARK_JAVA_OPTS="-Xms4g -Xmx40g -XX:MaxPermSize=10g -XX:-UseGCOverheadLimit"
/Try setting SPARK_DRIVER_MEMORY to a bigger value; the default of 512m is probably too small for the resourcesRDD.collect()./

By the way, are you really sure you need to collect all that?

/André Bois-Crettez
Software Architect
Big Data Developer
http://www.kelkoo.com/
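PS: for the memory route, an export line in your spark-env.sh would do it (e.g. export SPARK_DRIVER_MEMORY=4g; size the value to your data). And here is a sketch of the kind of alternative to collect() that I mean, in case it helps. `events` below is a hypothetical RDD[(Long, String)] keyed by rid, standing in for whatever you would have filtered with the broadcast list; keeping the ids as an RDD and joining avoids materialising the whole list on the driver:

import org.apache.spark.SparkContext._                 // brings join() onto pair RDDs

val resourceIds = jsonRDD.map(arg => arg.get("rid").toString.toLong)
                         .distinct
                         .map(rid => (rid, ()))        // key by rid so it can be joined
val filtered = events.join(resourceIds)                // keeps only rows whose rid appears in resourceIds
                     .map { case (rid, (value, _)) => (rid, value) }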