Thanks for letting us know.

On Fri, Jun 5, 2015 at 8:34 AM, Sam Stoelinga <sammiest...@gmail.com> wrote:
> Please ignore this whole thread. It started working out of nowhere. I'm not sure what the root cause was. After I restarted the VM, the previous SIFT code also started working.
>
> On Fri, Jun 5, 2015 at 10:40 PM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>> Thanks, Davies. I will file a bug later with code and a single image as the dataset. Besides that, I can give anybody access to my Vagrant VM, which already has Spark with OpenCV and the dataset available.
>>
>> Or you can set up the same Vagrant machine on your end. It's all automated ^^
>> git clone https://github.com/samos123/computer-vision-cloud-platform
>> cd computer-vision-cloud-platform
>> ./scripts/setup.sh
>> vagrant ssh
>>
>> (Expect failures; I haven't cleaned it up or tested it for other people.) By the way, I'm also currently studying at Tsinghua.
>>
>> On Fri, Jun 5, 2015 at 2:43 PM, Davies Liu <dav...@databricks.com> wrote:
>>> Please file a bug here: https://issues.apache.org/jira/browse/SPARK/
>>>
>>> Could you also provide a way to reproduce this bug (including some datasets)?
>>>
>>> On Thu, Jun 4, 2015 at 11:30 PM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>>>> I've changed the SIFT feature extraction to SURF feature extraction and it works...
>>>>
>>>> The following line was changed:
>>>>
>>>> sift = cv2.xfeatures2d.SIFT_create()
>>>>
>>>> to
>>>>
>>>> sift = cv2.xfeatures2d.SURF_create()
>>>>
>>>> Where should I file this as a bug? When not running on Spark it works fine, so I'm saying it's a Spark bug.
>>>>
>>>> On Fri, Jun 5, 2015 at 2:17 PM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>>>>> Yeah, I should have emphasized that. I'm running the same code on the same VM. It's a VM with Spark in standalone mode, and I run the unit test directly on that same VM. So OpenCV is working correctly on that machine, but when I move the exact same OpenCV code to Spark it just crashes.
>>>>>
>>>>> On Tue, Jun 2, 2015 at 5:06 AM, Davies Liu <dav...@databricks.com> wrote:
>>>>>> Could you run the single-threaded version on the worker machine to make sure that OpenCV is installed and configured correctly?
>>>>>>
>>>>>> On Sat, May 30, 2015 at 6:29 AM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>>>>>>> I've verified that the issue lies in Spark running the OpenCV code and not in the sequence file BytesWritable formatting.
>>>>>>>
>>>>>>> This code shows that Spark is causing the failure: it doesn't use the sequence file as input at all, but runs the same function with the same input on Spark, and it still fails:
>>>>>>>
>>>>>>> import cv2
>>>>>>> import numpy as np
>>>>>>>
>>>>>>> def extract_sift_features_opencv(imgfile_imgbytes):
>>>>>>>     # The sequence-file bytes are discarded; a local test image is read instead
>>>>>>>     imgfilename, discardsequencefile = imgfile_imgbytes
>>>>>>>     with open("/tmp/img.jpg", "rb") as f:
>>>>>>>         imgbytes = f.read()
>>>>>>>     nparr = np.frombuffer(imgbytes, np.uint8)
>>>>>>>     img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
>>>>>>>     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>>>>>>     sift = cv2.xfeatures2d.SIFT_create()
>>>>>>>     kp, descriptors = sift.detectAndCompute(gray, None)
>>>>>>>     return (imgfilename, "test")
>>>>>>>
>>>>>>> And the corresponding tests.py:
>>>>>>> https://gist.github.com/samos123/d383c26f6d47d34d32d6
>>>>>>>
>>>>>>> On Sat, May 30, 2015 at 8:04 PM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>>>>>>>> Thanks for the advice! The following line causes Spark to crash:
>>>>>>>>
>>>>>>>> kp, descriptors = sift.detectAndCompute(gray, None)
>>>>>>>>
>>>>>>>> But I do need this line to be executed, and the code does not crash when running outside of Spark with the same parameters. You're saying that maybe the bytes from the sequence file somehow got transformed and no longer represent an image, causing OpenCV to crash the whole Python executor.
>>>>>>>>
>>>>>>>> On Fri, May 29, 2015 at 2:06 AM, Davies Liu <dav...@databricks.com> wrote:
>>>>>>>>> Could you try commenting out some lines in `extract_sift_features_opencv` to find which line causes the crash?
>>>>>>>>>
>>>>>>>>> If the bytes that came from sequenceFile() are broken, it's easy to crash a C library (OpenCV) from Python.
>>>>>>>>>
>>>>>>>>> On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>>>>>>>>>> Hi sparkers,
>>>>>>>>>>
>>>>>>>>>> I am working on a PySpark application that uses the OpenCV library. It runs fine when I run the code locally, but when I try to run it on Spark on the same machine, it crashes the worker.
>>>>>>>>>>
>>>>>>>>>> The code can be found here:
>>>>>>>>>> https://gist.github.com/samos123/885f9fe87c8fa5abf78f
>>>>>>>>>>
>>>>>>>>>> This is the error message taken from the STDERR of the worker log:
>>>>>>>>>> https://gist.github.com/samos123/3300191684aee7fc8013
>>>>>>>>>>
>>>>>>>>>> Does anyone have pointers or tips on how to debug this further? It would be nice to know why the worker crashed.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sam Stoelinga
>>>>>>>>>>
>>>>>>>>>> org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
>>>>>>>>>>     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:172)
>>>>>>>>>>     at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
>>>>>>>>>>     at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
>>>>>>>>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>>>>>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>>>>>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>>>>>>>>>     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>>>>>>>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>>>>> Caused by: java.io.EOFException
>>>>>>>>>>     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>>>>>>>>>     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:108)