Could you run the single-threaded version on the worker machine to make sure that OpenCV is installed and configured correctly?
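A minimal sketch of such a standalone check (run with plain `python` on the worker, outside Spark). It assumes a test image at /tmp/img.jpg as used elsewhere in this thread, and `check_opencv` is a name invented for this example; if this fails, the problem is the OpenCV build or environment, not Spark:

```python
def check_opencv(img_path="/tmp/img.jpg"):
    """Return 'ok: ...' if OpenCV SIFT works end to end, else a reason string."""
    try:
        import numpy as np
        import cv2
    except ImportError as e:
        return "missing dependency: %s" % e
    if not hasattr(cv2, "xfeatures2d"):
        return "cv2.xfeatures2d missing (OpenCV contrib modules not installed)"
    try:
        imgbytes = open(img_path, "rb").read()
    except IOError as e:
        return "cannot read test image: %s" % e
    # Same decode path as the function in the thread below.
    nparr = np.frombuffer(imgbytes, np.uint8)
    img = cv2.imdecode(nparr, 1)
    if img is None:
        return "imdecode returned None: bytes are not a valid image"
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sift = cv2.xfeatures2d.SIFT_create()
    kp, descriptors = sift.detectAndCompute(gray, None)
    return "ok: %d keypoints" % len(kp)

if __name__ == "__main__":
    print(check_opencv())
```

Because it returns a reason string instead of raising, a failure is readable here rather than a silent native crash inside a Spark executor.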
On Sat, May 30, 2015 at 6:29 AM, Sam Stoelinga <sammiest...@gmail.com> wrote:
> I've verified the issue lies within Spark running OpenCV code and not
> within the sequence file BytesWritable formatting.
>
> This is the code which can reproduce that Spark is causing the failure:
> it does not use the sequence file as input at all, but runs the same
> function with the same input on Spark, and still fails:
>
>     def extract_sift_features_opencv(imgfile_imgbytes):
>         imgfilename, discardsequencefile = imgfile_imgbytes
>         imgbytes = bytearray(open("/tmp/img.jpg", "rb").read())
>         nparr = np.fromstring(buffer(imgbytes), np.uint8)
>         img = cv2.imdecode(nparr, 1)
>         gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>         sift = cv2.xfeatures2d.SIFT_create()
>         kp, descriptors = sift.detectAndCompute(gray, None)
>         return (imgfilename, "test")
>
> And corresponding tests.py:
> https://gist.github.com/samos123/d383c26f6d47d34d32d6
>
> On Sat, May 30, 2015 at 8:04 PM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>> Thanks for the advice! The following line causes Spark to crash:
>>
>>     kp, descriptors = sift.detectAndCompute(gray, None)
>>
>> But I do need this line to be executed, and the code does not crash when
>> run outside of Spark with the same parameters. You're saying maybe the
>> bytes from the sequence file somehow got transformed and no longer
>> represent an image, causing OpenCV to crash the whole Python executor.
>>
>> On Fri, May 29, 2015 at 2:06 AM, Davies Liu <dav...@databricks.com> wrote:
>>> Could you try commenting out some lines in
>>> `extract_sift_features_opencv` to find which line causes the crash?
>>>
>>> If the bytes coming from sequenceFile() are broken, it's easy to crash
>>> a C library in Python (OpenCV).
>>>
>>> On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>>> > Hi sparkers,
>>> >
>>> > I am working on a PySpark application which uses the OpenCV library.
>>> > It runs fine when running the code locally, but when I try to run it
>>> > on Spark on the same machine it crashes the worker.
>>> >
>>> > The code can be found here:
>>> > https://gist.github.com/samos123/885f9fe87c8fa5abf78f
>>> >
>>> > This is the error message taken from STDERR of the worker log:
>>> > https://gist.github.com/samos123/3300191684aee7fc8013
>>> >
>>> > I would like pointers or tips on how to debug further. It would be
>>> > nice to know the reason why the worker crashed.
>>> >
>>> > Thanks,
>>> > Sam Stoelinga
>>> >
>>> > org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
>>> >     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:172)
>>> >     at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
>>> >     at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
>>> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>> >     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> >     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>> >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >     at java.lang.Thread.run(Thread.java:745)
>>> > Caused by: java.io.EOFException
>>> >     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>> >     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:108)

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
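If the corrupted-bytes hypothesis raised in the thread is right, a cheap way to confirm it is to validate the JPEG markers in pure Python before calling into OpenCV's native code. A hedged sketch (`validate_jpeg_bytes` and `extract_sift_features_checked` are names invented for this example; 0xFFD8/0xFFD9 are the standard JPEG start-of-image and end-of-image markers):

```python
def validate_jpeg_bytes(data):
    """Return None if data plausibly holds a whole JPEG, else a reason string."""
    b = bytes(data)
    if len(b) < 4:
        return "too short (%d bytes)" % len(b)
    if b[:2] != b"\xff\xd8":
        return "missing JPEG SOI marker, got %r" % b[:2]
    if b[-2:] != b"\xff\xd9":
        return "missing JPEG EOI marker, got %r" % b[-2:]
    return None

def extract_sift_features_checked(imgfile_imgbytes):
    imgfilename, imgbytes = imgfile_imgbytes
    problem = validate_jpeg_bytes(imgbytes)
    if problem is not None:
        # Surface the broken record in the job output instead of
        # letting cv2.imdecode feed garbage to a C library.
        return (imgfilename, "BAD INPUT: " + problem)
    # ... otherwise proceed with cv2.imdecode / SIFT as in the
    # original extract_sift_features_opencv above.
    return (imgfilename, "ok")
```

Mapping `extract_sift_features_checked` over the sequenceFile RDD would show whether the bytes reaching the executors are still intact images, without risking a native crash.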