[ 
https://issues.apache.org/jira/browse/SPARK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1967.
------------------------------
    Resolution: Cannot Reproduce

> Using parallelize method to create RDD, wordcount app just hanging there 
> without errors or warnings
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-1967
>                 URL: https://issues.apache.org/jira/browse/SPARK-1967
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>         Environment: Ubuntu-12.04, single machine spark standalone, 8 core, 
> 8G mem, spark 0.9.1, java-1.7
>            Reporter: Min Li
>
> I was trying the parallelize method to create an RDD, using Java. It's a 
> simple wordcount program, except that I first read the input into memory and 
> then use the parallelize method to create the RDD, rather than the textFile 
> method used in the given example. 
> Pseudo code:
> JavaSparkContext ctx = new JavaSparkContext($SparkMasterURL, $NAME, 
> $SparkHome, $jars);
> List<String> input = ...; // read lines from the input file into an ArrayList<String>
> JavaRDD<String> lines = ctx.parallelize(input);
> // followed by wordcount
> ---- the above does not work.
> JavaRDD<String> lines = ctx.textFile(file);
> // followed by wordcount
> ---- this works.
> The log is:
> 14/05/29 16:18:43 INFO Slf4jLogger: Slf4jLogger started
> 14/05/29 16:18:43 INFO Remoting: Starting remoting
> 14/05/29 16:18:43 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://spark@spark:55224]
> 14/05/29 16:18:43 INFO Remoting: Remoting now listens on addresses: 
> [akka.tcp://spark@spark:55224]
> 14/05/29 16:18:43 INFO SparkEnv: Registering BlockManagerMaster
> 14/05/29 16:18:43 INFO DiskBlockManager: Created local directory at 
> /tmp/spark-local-20140529161843-836a
> 14/05/29 16:18:43 INFO MemoryStore: MemoryStore started with capacity 1056.0 
> MB.
> 14/05/29 16:18:43 INFO ConnectionManager: Bound socket to port 42942 with id 
> = ConnectionManagerId(spark,42942)
> 14/05/29 16:18:43 INFO BlockManagerMaster: Trying to register BlockManager
> 14/05/29 16:18:43 INFO BlockManagerMasterActor$BlockManagerInfo: Registering 
> block manager spark:42942 with 1056.0 MB RAM
> 14/05/29 16:18:43 INFO BlockManagerMaster: Registered BlockManager
> 14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server
> 14/05/29 16:18:43 INFO HttpBroadcast: Broadcast server started at 
> http://10.227.119.185:43522
> 14/05/29 16:18:43 INFO SparkEnv: Registering MapOutputTracker
> 14/05/29 16:18:43 INFO HttpFileServer: HTTP File server directory is 
> /tmp/spark-3704a621-789c-4d97-b1fc-9654236dba3e
> 14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server
> 14/05/29 16:18:43 INFO SparkUI: Started Spark Web UI at http://spark:4040
> 14/05/29 16:18:44 INFO SparkContext: Added JAR 
> /home/maxmin/tmp/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar at 
> http://10.227.119.185:55286/jars/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar
>  with timestamp 1401394724045
> 14/05/29 16:18:44 INFO AppClient$ClientActor: Connecting to master 
> spark://spark:7077...
> 14/05/29 16:18:44 INFO SparkDeploySchedulerBackend: Connected to Spark 
> cluster with app ID app-20140529161844-0001
> 14/05/29 16:18:44 INFO AppClient$ClientActor: Executor added: 
> app-20140529161844-0001/0 on worker-20140529155406-spark-59658 (spark:59658) 
> with 8 cores
> The app hangs here forever, and spark:8080 / spark:4040 do not show 
> anything strange. The Spark Stages page shows the active stage is 
> reduceByKey, with tasks Succeeded/Total at 0/2. I've also tried calling 
> lines.count() directly after parallelize, and the app gets stuck at the 
> count stage.
> I've also tried parallelizing a static, hard-coded list of strings. In 
> that case the app still hangs, but the stages page shows nothing active, 
> and the log is similar. 
> I used spark-0.9.1 with the default spark-env.sh. The slaves file contains 
> only one host. I used Maven to build a fat jar with Spark marked as 
> provided, and I modified the run-example script to submit the jar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
