Ah, I see. That makes a lot of sense now.

You might be running into a class loader visibility issue. There have
been bugs filed in JIRA about this in the past; you may be hitting one
of them.

Until I have some time to investigate (or, if you're curious, feel
free to scavenge JIRA), a workaround could be to manually copy the
Guava jar to your executor nodes and add it to the executors'
classpath via spark.executor.extraClassPath. That places your Guava
classes in the Spark class loader (vs. your app's class loader when
using --jars), and things should work.
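
For example, something along these lines; the jar path and Guava
version are illustrative, point them at wherever you copy the jar on
each executor node:

  import org.apache.spark.{SparkConf, SparkContext}

  // assumes guava-14.0.1.jar has already been copied to
  // /opt/spark/extra/ on every executor node (path/version illustrative)
  val conf = new SparkConf()
    .setAppName("my-app")
    .set("spark.executor.extraClassPath", "/opt/spark/extra/guava-14.0.1.jar")
  val sc = new SparkContext(conf)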


On Fri, Feb 27, 2015 at 11:52 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
> I understand that I need to supply Guava to Spark. The HashBiMap is created
> in the client and broadcast to the workers, so it is needed in both. To
> achieve this there is a deps.jar with Guava (and Scopt, but that is only for
> the client). Scopt is found, so I know the jar is fine for the client.
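>
> Roughly, the relevant part looks like this (simplified; rdd stands in
> for the real data set):
>
>   import com.google.common.collect.HashBiMap
>
>   // created in the client (driver)...
>   val dict: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
>   // ...broadcast to the workers, which deserialize it on first use
>   val bcDict = sc.broadcast(dict)
>   rdd.map(token => bcDict.value.get(token))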
>
> I pass the deps.jar to the context creation code. I’ve checked the contents
> of the jar and verified that it is used at context creation time.
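>
> Context creation itself is roughly (path illustrative):
>
>   val conf = new SparkConf().setAppName("my-job")
>   val sc = new SparkContext(conf)
>   sc.addJar("/path/to/deps.jar")  // deps.jar contains Guava (and Scopt)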
>
> I register the serializer as follows:
>
> class MyKryoRegistrator extends KryoRegistrator {
>
>   override def registerClasses(kryo: Kryo) = {
>     val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
>     //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
>
>     // just to be sure this does indeed get logged
>     log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n")
>
>     kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]],
>       new JavaSerializer())
>   }
> }
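>
> The registrator is wired in via SparkConf, roughly (the package name
> is abbreviated):
>
>   val conf = new SparkConf()
>     .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>     .set("spark.kryo.registrator", "my.MyKryoRegistrator")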
>
> The job proceeds until the broadcast value, a HashBiMap, is deserialized,
> which is where I get the following error.
>
> Have I missed a step for deserialization of broadcast values? It is odd that
> serialization succeeded but deserialization failed. I’m running on a
> standalone, localhost-only cluster.
>
>
> 15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 (TID 9, 192.168.0.2): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
>         at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
>         at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
>         at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
>         at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
>         at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
>         at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
>         at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
>         at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
>         at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>         at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
>         at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
>         ... 19 more
>
> ======== root error ==========
> Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>         ...
>
> On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>
> Guava is not in Spark. (Well, long version: it's in Spark but it's
> relocated to a different package except for some special classes
> leaked through the public API.)
>
> If your app needs Guava, it needs to package Guava with it (e.g. by
> using maven-shade-plugin, or using "--jars" if only executors use
> Guava).
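>
> (If you go the shading route, the key part of the maven-shade-plugin
> config is a relocation; purely illustrative:
>
>   <relocation>
>     <pattern>com.google.common</pattern>
>     <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
>   </relocation>
>
> so your app's copy of Guava ends up under a package name that can't
> clash with anything on Spark's class path.)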
>
> On Wed, Feb 25, 2015 at 5:17 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>> The root Spark pom pins Guava at a specific version, and the shading XML is
>> very hard to read. Someone suggested that I try userClassPathFirst, but that
>> sounds too heavy-handed since I don’t really care which version of Guava I
>> get; I'm not picky.
>>
>> When I set my project to use the same version as Spark, I get a missing
>> class def (NoClassDefFoundError), which usually means a version conflict.
>>
>> At this point I am quite confused about what is actually in Spark as far as
>> Guava is concerned, and how to coexist with it happily.
>>
>> Let me rephrase my question: has anyone used Guava in a Spark project? Is
>> there a recommended way to use it in a job?
>
> --
> Marcelo
>



-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
