Re: PySpark with OpenCV causes python worker to crash

Sam Stoelinga Thu, 04 Jun 2015 23:32:55 -0700

I've changed the SIFT feature extraction to SURF feature extraction and it
works...


The following line was changed:
sift = cv2.xfeatures2d.SIFT_create()

to

sift = cv2.xfeatures2d.SURF_create()

Where should I file this as a bug? The same code works fine when not running
on Spark, so I'd say it's a Spark bug.
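
For a bug report it may help to show how the process actually dies. One way to check, outside Spark, is to run the suspect code in a fresh interpreter and inspect the exit status. The following is a minimal sketch using only the standard library; `run_isolated` is a hypothetical helper name (not part of Spark or OpenCV), and it assumes Python 3.5+ on a POSIX system, where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a fresh interpreter and describe how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # POSIX only: a negative return code is the number of the signal
        # that killed the child (for example SIGSEGV from native code).
        return "killed by " + signal.Signals(-proc.returncode).name
    # A positive code usually means an uncaught Python exception or sys.exit().
    return "python error (exit code %d)" % proc.returncode
```

For the case in this thread the snippet would import the module that defines extract_sift_features_opencv and call it once; the module name is up to whoever runs it. A result such as "killed by SIGSEGV" would point at native code rather than at a Python exception, though on its own it does not show whether Spark is involved.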

On Fri, Jun 5, 2015 at 2:17 PM, Sam Stoelinga <sammiest...@gmail.com> wrote:

> Yeah, I should have emphasized that. I'm running the same code on the same VM.
> It's a VM with Spark in standalone mode, and I run the unit test directly on
> that same VM. So OpenCV works correctly on that machine, but when the exact
> same OpenCV code is moved to Spark it just crashes.
>
> On Tue, Jun 2, 2015 at 5:06 AM, Davies Liu <dav...@databricks.com> wrote:
>
>> Could you run the single-threaded version on the worker machine to make sure
>> that OpenCV is installed and configured correctly?
>>
>> On Sat, May 30, 2015 at 6:29 AM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>> > I've verified that the issue lies in Spark running the OpenCV code and not
>> > in the sequence file BytesWritable formatting.
>> >
>> > The code below reproduces the failure without using the sequence file as
>> > input at all: the same function with the same input works outside Spark
>> > but fails when run on Spark:
>> >
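>> > import cv2
>> > import numpy as np
>> >
>> > def extract_sift_features_opencv(imgfile_imgbytes):
>> >     # The sequence-file payload is discarded on purpose; a local file is read instead.
>> >     imgfilename, discardsequencefile = imgfile_imgbytes
>> >     with open("/tmp/img.jpg", "rb") as f:
>> >         imgbytes = bytearray(f.read())
>> >     nparr = np.frombuffer(imgbytes, np.uint8)
>> >     img = cv2.imdecode(nparr, 1)  # 1 = decode as a 3-channel BGR image
>> >     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>> >     sift = cv2.xfeatures2d.SIFT_create()
>> >     # This is the call that crashes the Python worker under Spark.
>> >     kp, descriptors = sift.detectAndCompute(gray, None)
>> >     return (imgfilename, "test")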
>> >
>> > And corresponding tests.py:
>> > https://gist.github.com/samos123/d383c26f6d47d34d32d6
>> >
>> >
>> > On Sat, May 30, 2015 at 8:04 PM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>> >>
>> >> Thanks for the advice! The following line causes Spark to crash:
>> >>
>> >> kp, descriptors = sift.detectAndCompute(gray, None)
>> >>
>> >> But I do need this line to be executed, and the code does not crash when
>> >> run outside of Spark with the same parameters. Are you saying that the
>> >> bytes from the sequence file may have been transformed somehow and no
>> >> longer represent an image, causing OpenCV to crash the whole Python
>> >> executor?
>> >>
>> >> On Fri, May 29, 2015 at 2:06 AM, Davies Liu <dav...@databricks.com> wrote:
>> >>>
>> >>> Could you try commenting out some lines in
>> >>> `extract_sift_features_opencv` to find which line causes the crash?
>> >>>
>> >>> If the bytes that came from sequenceFile() are broken, it's easy to crash
>> >>> a C library (OpenCV) from Python.
>> >>>
>> >>> On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>> >>> > Hi sparkers,
>> >>> >
>> >>> > I am working on a PySpark application which uses the OpenCV library. It
>> >>> > runs fine when running the code locally, but when I try to run it on
>> >>> > Spark on the same machine it crashes the worker.
>> >>> >
>> >>> > The code can be found here:
>> >>> > https://gist.github.com/samos123/885f9fe87c8fa5abf78f
>> >>> >
>> >>> > This is the error message taken from STDERR of the worker log:
>> >>> > https://gist.github.com/samos123/3300191684aee7fc8013
>> >>> >
>> >>> > I would like pointers or tips on how to debug this further. It would be
>> >>> > nice to know the reason why the worker crashed.
>> >>> >
>> >>> > Thanks,
>> >>> > Sam Stoelinga
>> >>> >
>> >>> >
>> >>> > org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
>> >>> > at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:172)
>> >>> > at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
>> >>> > at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
>> >>> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>> >>> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>> >>> > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> >>> > at org.apache.spark.scheduler.Task.run(Task.scala:64)
>> >>> > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>> >>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >>> > at java.lang.Thread.run(Thread.java:745)
>> >>> > Caused by: java.io.EOFException
>> >>> > at java.io.DataInputStream.readInt(DataInputStream.java:392)
>> >>> > at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:108)
>> >>> >
>> >>> >
>> >>> >
>> >>
>> >>
>> >
>>
>
>
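
On the point quoted above that broken bytes from sequenceFile() can crash a C library: a cheap sanity check before handing bytes to OpenCV is to look for the JPEG start-of-image and end-of-image markers. This is only a sketch; `looks_like_jpeg` is a hypothetical helper, it assumes the input is a JPEG (as with /tmp/img.jpg above), and passing it does not guarantee that the image decodes.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_like_jpeg(data):
    """Return True if data starts and ends with the JPEG markers."""
    data = bytes(data)  # accept bytes, bytearray or memoryview
    return (
        len(data) >= 4
        and data.startswith(JPEG_SOI)
        and data.endswith(JPEG_EOI)
    )
```

In a job like the one in this thread, the check could be applied to each record's bytes before decoding, so that suspicious records are logged or skipped. Files with trailing data after the end marker would be rejected by this check, so treat a failure as a hint rather than proof of corruption.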
