Ok, maybe these test versions will help me then. I’ll check it out.

On 15.12.2014, at 22:33, Michael Armbrust <mich...@databricks.com> wrote:

> Using a single SparkContext should not cause this problem.  In the SQL tests 
> we use TestSQLContext and TestHive which are global singletons for all of our 
> unit testing.
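> 
> A minimal sketch of that kind of test-wide singleton (illustrative names only, not the actual TestSQLContext source):
> 
>     import org.apache.spark.{SparkConf, SparkContext}
>     import org.apache.spark.sql.SQLContext
> 
>     // One context per JVM, shared by every suite and never stopped between suites.
>     object SharedTestContext {
>       private val conf = new SparkConf()
>         .setMaster("local[2]")
>         .setAppName("unit-tests")
> 
>       lazy val sc: SparkContext = new SparkContext(conf)
>       lazy val sqlContext: SQLContext = new SQLContext(sc)
>     }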
> 
> On Mon, Dec 15, 2014 at 1:27 PM, Marius Soutier <mps....@gmail.com> wrote:
> Possible, yes, although I’m trying everything I can to prevent it, i.e. fork 
> in Test := true and isolated specs. Can you confirm that reusing a single 
> SparkContext for multiple tests poses a problem as well?
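> 
> (Roughly the sbt side of this, as a sketch -- the parallelExecution line is just an example of the kind of isolation setting involved, the exact build may differ:)
> 
>     // build.sbt
>     fork in Test := true                 // run tests in a separate JVM
>     parallelExecution in Test := false   // one suite at a time, so contexts don't overlap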
> 
> Other than that, just switching from SQLContext to HiveContext also provoked 
> the error.
> 
> 
> On 15.12.2014, at 20:22, Michael Armbrust <mich...@databricks.com> wrote:
> 
>> Is it possible that you are starting more than one SparkContext in a single 
>> JVM without stopping previous ones?  I'd try testing with Spark 1.2, which 
>> will throw an exception in this case.
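>> 
>> (A sketch of what I mean, assuming ScalaTest-style suites; the point is that each context is stopped before the next suite starts:)
>> 
>>     import org.apache.spark.{SparkConf, SparkContext}
>>     import org.scalatest.{BeforeAndAfterAll, FunSuite}
>> 
>>     class MyJobSuite extends FunSuite with BeforeAndAfterAll {
>>       private var sc: SparkContext = _
>> 
>>       override def beforeAll(): Unit = {
>>         sc = new SparkContext(
>>           new SparkConf().setMaster("local[2]").setAppName("MyJobSuite"))
>>       }
>> 
>>       override def afterAll(): Unit = {
>>         sc.stop()                                   // avoid two live contexts in one JVM
>>         System.clearProperty("spark.driver.port")   // common cleanup in older Spark test code
>>       }
>> 
>>       test("count") {
>>         assert(sc.parallelize(1 to 10).count() === 10)
>>       }
>>     }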
>> 
>> On Mon, Dec 15, 2014 at 8:48 AM, Marius Soutier <mps....@gmail.com> wrote:
>> Hi,
>> 
>> I’m seeing strange, random errors when running unit tests for my Spark jobs. 
>> In this particular case I’m using Spark SQL to read and write Parquet files, 
>> and one error that I keep running into is this one:
>> 
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 6.0 failed 1 times, most recent failure: Lost task 19.0 in stage 6.0 (TID 223, localhost): java.io.IOException: PARSING_ERROR(2)
>>         org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
>>         org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
>>         org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:545)
>> 
>> I can only prevent this from happening by using isolated Specs tests that 
>> always create a new SparkContext which is not shared between tests (and even 
>> then there can only be a single SparkContext per test), and by using the 
>> standard SQLContext instead of HiveContext. It does not seem to have anything 
>> to do with the actual files that I also create during the test run with 
>> SQLContext.saveAsParquetFile.
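>> 
>> For illustration, the kind of setup that works for me looks roughly like this (case class, path and numbers made up):
>> 
>>     import org.apache.spark.{SparkConf, SparkContext}
>>     import org.apache.spark.sql.SQLContext
>>     import org.specs2.mutable.Specification
>> 
>>     case class Record(key: String, value: Int)
>> 
>>     class ParquetJobSpec extends Specification {
>>       sequential  // run examples one after another
>>       isolated    // fresh spec instance -- and thus a fresh SparkContext -- per example
>> 
>>       val sc = new SparkContext(
>>         new SparkConf().setMaster("local[2]").setAppName("ParquetJobSpec"))
>>       val sqlContext = new SQLContext(sc)
>>       import sqlContext.createSchemaRDD
>> 
>>       "the job" should {
>>         "round-trip records through Parquet" in {
>>           val out = "/tmp/parquet-spec-test"  // made-up path
>>           sc.parallelize(Seq(Record("a", 1), Record("b", 2))).saveAsParquetFile(out)
>>           sqlContext.parquetFile(out).count() === 2L
>>         }
>>       }
>> 
>>       step(sc.stop())
>>     }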
>> 
>> 
>> Cheers
>> - Marius
>> 
>> 
>> PS The full trace:
>> 
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 6.0 failed 1 times, most recent failure: Lost task 19.0 in stage 6.0 (TID 223, localhost): java.io.IOException: PARSING_ERROR(2)
>>         org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
>>         org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
>>         org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:545)
>>         org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125)
>>         org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
>>         org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58)
>>         org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
>>         org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:232)
>>         org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readObject$1.apply$mcV$sp(TorrentBroadcast.scala:169)
>>         org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:927)
>>         org.apache.spark.broadcast.TorrentBroadcast.readObject(TorrentBroadcast.scala:155)
>>         sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>>         sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         java.lang.reflect.Method.invoke(Method.java:606)
>>         java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>>         org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:160)
>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         java.lang.Thread.run(Thread.java:745)
>> Driver stacktrace:
>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185) ~[spark-core_2.10-1.1.1.jar:1.1.1]
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174) ~[spark-core_2.10-1.1.1.jar:1.1.1]
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173) ~[spark-core_2.10-1.1.1.jar:1.1.1]
>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) ~[scala-library.jar:na]
>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) ~[scala-library.jar:na]
>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173) ~[spark-core_2.10-1.1.1.jar:1.1.1]
>> 
> 
