Could you run the single-threaded version on the worker machine to make sure that OpenCV is installed and configured correctly?
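A minimal sketch of such a standalone check (run with plain `python` on the worker, outside Spark). It assumes a test image at /tmp/img.jpg as used elsewhere in this thread, and `check_opencv` is a name invented for this example; if this fails, the problem is the OpenCV build or environment, not Spark:

```python
def check_opencv(img_path="/tmp/img.jpg"):
    """Return 'ok: ...' if OpenCV SIFT works end to end, else a reason string."""
    try:
        import numpy as np
        import cv2
    except ImportError as e:
        return "missing dependency: %s" % e
    if not hasattr(cv2, "xfeatures2d"):
        return "cv2.xfeatures2d missing (OpenCV contrib modules not installed)"
    try:
        imgbytes = open(img_path, "rb").read()
    except IOError as e:
        return "cannot read test image: %s" % e
    # Same decode path as the function in the thread below.
    nparr = np.frombuffer(imgbytes, np.uint8)
    img = cv2.imdecode(nparr, 1)
    if img is None:
        return "imdecode returned None: bytes are not a valid image"
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sift = cv2.xfeatures2d.SIFT_create()
    kp, descriptors = sift.detectAndCompute(gray, None)
    return "ok: %d keypoints" % len(kp)

if __name__ == "__main__":
    print(check_opencv())
```

Because it returns a reason string instead of raising, a failure is readable here rather than a silent native crash inside a Spark executor.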
On Sat, May 30, 2015 at 6:29 AM, Sam Stoelinga <sammiest...@gmail.com> wrote:
> I've verified the issue lies within Spark running OpenCV code and not
> within the sequence file BytesWritable formatting.
>
> This is the code which can reproduce that Spark is causing the failure:
> it does not use the sequence file as input at all, but runs the same
> function with the same input on Spark, and still fails:
>
>     def extract_sift_features_opencv(imgfile_imgbytes):
>         imgfilename, discardsequencefile = imgfile_imgbytes
>         imgbytes = bytearray(open("/tmp/img.jpg", "rb").read())
>         nparr = np.fromstring(buffer(imgbytes), np.uint8)
>         img = cv2.imdecode(nparr, 1)
>         gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>         sift = cv2.xfeatures2d.SIFT_create()
>         kp, descriptors = sift.detectAndCompute(gray, None)
>         return (imgfilename, "test")
>
> And corresponding tests.py:
> https://gist.github.com/samos123/d383c26f6d47d34d32d6
>
> On Sat, May 30, 2015 at 8:04 PM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>> Thanks for the advice! The following line causes Spark to crash:
>>
>>     kp, descriptors = sift.detectAndCompute(gray, None)
>>
>> But I do need this line to be executed, and the code does not crash when
>> run outside of Spark with the same parameters. You're saying maybe the
>> bytes from the sequence file somehow got transformed and no longer
>> represent an image, causing OpenCV to crash the whole Python executor.
>>
>> On Fri, May 29, 2015 at 2:06 AM, Davies Liu <dav...@databricks.com> wrote:
>>> Could you try commenting out some lines in
>>> `extract_sift_features_opencv` to find which line causes the crash?
>>>
>>> If the bytes coming from sequenceFile() are broken, it's easy to crash
>>> a C library in Python (OpenCV).
>>>
>>> On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>>> > Hi sparkers,
>>> >
>>> > I am working on a PySpark application which uses the OpenCV library.
>>> > It runs fine when running the code locally, but when I try to run it
>>> > on Spark on the same machine it crashes the worker.
>>> >
>>> > The code can be found here:
>>> > https://gist.github.com/samos123/885f9fe87c8fa5abf78f
>>> >
>>> > This is the error message taken from STDERR of the worker log:
>>> > https://gist.github.com/samos123/3300191684aee7fc8013
>>> >
>>> > I would like pointers or tips on how to debug further. It would be
>>> > nice to know the reason why the worker crashed.
>>> >
>>> > Thanks,
>>> > Sam Stoelinga
>>> >
>>> > org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
>>> >     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:172)
>>> >     at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
>>> >     at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
>>> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>> >     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> >     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>> >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >     at java.lang.Thread.run(Thread.java:745)
>>> > Caused by: java.io.EOFException
>>> >     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>> >     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:108)

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
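If the corrupted-bytes hypothesis raised in the thread is right, a cheap way to confirm it is to validate the JPEG markers in pure Python before calling into OpenCV's native code. A hedged sketch (`validate_jpeg_bytes` and `extract_sift_features_checked` are names invented for this example; 0xFFD8/0xFFD9 are the standard JPEG start-of-image and end-of-image markers):

```python
def validate_jpeg_bytes(data):
    """Return None if data plausibly holds a whole JPEG, else a reason string."""
    b = bytes(data)
    if len(b) < 4:
        return "too short (%d bytes)" % len(b)
    if b[:2] != b"\xff\xd8":
        return "missing JPEG SOI marker, got %r" % b[:2]
    if b[-2:] != b"\xff\xd9":
        return "missing JPEG EOI marker, got %r" % b[-2:]
    return None

def extract_sift_features_checked(imgfile_imgbytes):
    imgfilename, imgbytes = imgfile_imgbytes
    problem = validate_jpeg_bytes(imgbytes)
    if problem is not None:
        # Surface the broken record in the job output instead of
        # letting cv2.imdecode feed garbage to a C library.
        return (imgfilename, "BAD INPUT: " + problem)
    # ... otherwise proceed with cv2.imdecode / SIFT as in the
    # original extract_sift_features_opencv above.
    return (imgfilename, "ok")
```

Mapping `extract_sift_features_checked` over the sequenceFile RDD would show whether the bytes reaching the executors are still intact images, without risking a native crash.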