pyspark is crashing in this case. why?

genesis fatum Sun, 14 Dec 2014 06:04:52 -0800

Hi,

My environment is standalone Spark 1.1.1 on Windows 8.1 Pro.


The following case works fine:
>>> a = [1,2,3,4,5,6,7,8,9]
>>> b = []
>>> for x in range(100000):
...  b.append(a)
...
>>> rdd1 = sc.parallelize(b)
>>> rdd1.first()
[1, 2, 3, 4, 5, 6, 7, 8, 9]

The following case does not work. The only difference is the size of the
list. Note the loop range: 100K vs. 1M.
>>> a = [1,2,3,4,5,6,7,8,9]
>>> b = []
>>> for x in range(1000000):
...  b.append(a)
...
>>> rdd1 = sc.parallelize(b)
>>> rdd1.first()
>>>
14/12/14 07:52:19 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
java.net.SocketException: Connection reset by peer: socket write error
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(Unknown Source)
        at java.net.SocketOutputStream.write(Unknown Source)
        at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
        at java.io.BufferedOutputStream.write(Unknown Source)
        at java.io.DataOutputStream.write(Unknown Source)
        at java.io.FilterOutputStream.write(Unknown Source)
        at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:341)
        at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:339)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:339)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1364)
        at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
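
For anyone trying to reproduce this, the two cases can be folded into one function so that the row count is the only thing that changes. This is only a sketch: build_rows and first_row are hypothetical helper names, not part of PySpark, and sc is assumed to be the SparkContext that the pyspark shell creates.

```python
def build_rows(n):
    """Return a list holding n references to the same 9-element row."""
    a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    b = []
    for _ in range(n):
        # Same as the shell session: every entry is the same list object.
        b.append(a)
    return b


def first_row(sc, n):
    """Parallelize n rows on the given SparkContext and return the first one.

    sc is assumed to be an existing SparkContext (hypothetical wrapper,
    it only calls sc.parallelize(...) and rdd.first()).
    """
    rdd = sc.parallelize(build_rows(n))
    return rdd.first()
```

In the pyspark shell, first_row(sc, 100000) corresponds to the case that works and first_row(sc, 1000000) to the case that crashes.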

What I have tried:
1. Replaced the 32-bit JRE with a 64-bit JRE
2. Tried multiple memory settings when starting pyspark: --driver-memory,
--executor-memory
3. Tried different SparkConf settings
4. Also tried Spark 1.1.0

Being new to Spark, I am sure that it is something simple that I am missing
and would appreciate any thoughts.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-is-crashing-in-this-case-why-tp20675.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

