[ https://issues.apache.org/jira/browse/SPARK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-1967.
------------------------------
    Resolution: Cannot Reproduce

> Using parallelize method to create RDD, wordcount app just hangs there
> without errors or warnings
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-1967
>                 URL: https://issues.apache.org/jira/browse/SPARK-1967
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>        Environment: Ubuntu 12.04, single-machine Spark standalone, 8 cores,
>                     8 GB RAM, Spark 0.9.1, Java 1.7
>           Reporter: Min Li
>
> I was trying the parallelize method to create an RDD, using Java. It's a
> simple wordcount program, except that I first read the input into memory and
> then use the parallelize method to create the RDD, rather than the textFile
> method used in the bundled example.
> Pseudo code:
> JavaSparkContext ctx = new JavaSparkContext($SparkMasterURL, $NAME, $SparkHome, $jars);
> List<String> input = // read lines from the input file into an ArrayList<String>
> JavaRDD<String> lines = ctx.parallelize(input);
> // followed by wordcount
> ---- the above is not working.
> JavaRDD<String> lines = ctx.textFile(file);
> // followed by wordcount
> ---- this is working.
> The log is:
> 14/05/29 16:18:43 INFO Slf4jLogger: Slf4jLogger started
> 14/05/29 16:18:43 INFO Remoting: Starting remoting
> 14/05/29 16:18:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@spark:55224]
> 14/05/29 16:18:43 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@spark:55224]
> 14/05/29 16:18:43 INFO SparkEnv: Registering BlockManagerMaster
> 14/05/29 16:18:43 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140529161843-836a
> 14/05/29 16:18:43 INFO MemoryStore: MemoryStore started with capacity 1056.0 MB.
> 14/05/29 16:18:43 INFO ConnectionManager: Bound socket to port 42942 with id = ConnectionManagerId(spark,42942)
> 14/05/29 16:18:43 INFO BlockManagerMaster: Trying to register BlockManager
> 14/05/29 16:18:43 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager spark:42942 with 1056.0 MB RAM
> 14/05/29 16:18:43 INFO BlockManagerMaster: Registered BlockManager
> 14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server
> 14/05/29 16:18:43 INFO HttpBroadcast: Broadcast server started at http://10.227.119.185:43522
> 14/05/29 16:18:43 INFO SparkEnv: Registering MapOutputTracker
> 14/05/29 16:18:43 INFO HttpFileServer: HTTP File server directory is /tmp/spark-3704a621-789c-4d97-b1fc-9654236dba3e
> 14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server
> 14/05/29 16:18:43 INFO SparkUI: Started Spark Web UI at http://spark:4040
> 14/05/29 16:18:44 INFO SparkContext: Added JAR /home/maxmin/tmp/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar at http://10.227.119.185:55286/jars/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1401394724045
> 14/05/29 16:18:44 INFO AppClient$ClientActor: Connecting to master spark://spark:7077...
> 14/05/29 16:18:44 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140529161844-0001
> 14/05/29 16:18:44 INFO AppClient$ClientActor: Executor added: app-20140529161844-0001/0 on worker-20140529155406-spark-59658 (spark:59658) with 8 cores
> The app hangs here forever, and spark:8080 / spark:4040 show nothing unusual.
> The Spark Stages page shows the active stage is reduceByKey, with
> Succeeded/Total tasks at 0/2. I've also tried calling lines.count directly
> after parallelize, and the app gets stuck at the count stage.
> I've also tried a static, given string list with parallelize to create the
> RDD. In that case the app still hangs, but the Stages page shows nothing
> active. The log is similar.
> I used spark-0.9.1 with the default spark-env.sh. The slaves file contains
> only one host. I used Maven to compile a fat jar with Spark marked as
> provided, and I modified the run-example script to submit the jar.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
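For reference, the word-count logic the reporter describes can be sketched in plain Java without Spark. This is a hypothetical stand-in, not the reporter's job: it applies the same split-and-aggregate steps (the flatMap / mapToPair / reduceByKey pipeline of a Spark wordcount) to an in-memory list, i.e. the same data `ctx.parallelize(input)` would distribute. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Split each line on whitespace and count occurrences of each word.
    // Mirrors flatMap (split) -> mapToPair (word, 1) -> reduceByKey (sum)
    // from the Spark wordcount, but runs locally on an in-memory list.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(word -> !word.isEmpty())
                // TreeMap keeps output order deterministic for display
                .collect(Collectors.groupingBy(
                        word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Stand-in for the lines read from the input file
        List<String> input = Arrays.asList("a b a", "b c");
        System.out.println(count(input)); // prints {a=2, b=2, c=1}
    }
}
```

In the reporter's setup the same `count` input would instead be handed to `ctx.parallelize(...)`, so any hang is in the cluster scheduling of those partitions, not in the counting logic itself.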