FWIW, I never built Spark using Maven; always use sbt assembly.

On Tue, Oct 21, 2014 at 11:55 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
> Ok, the mystery is solved.
>
> The safe sequence, from my limited testing, is:
> 1) delete ~/.m2/repository/org/spark and mahout
> 2) build Spark for your version of Hadoop, *but do not use "mvn package ..."*; use "mvn install ...". This will put a copy of the exact bits you need into the Maven cache for building Mahout against. In my case, using Hadoop 1.2.1, it was "mvn -Dhadoop.version=1.2.1 -DskipTests clean install". If you run tests on Spark, some failures can safely be ignored according to the Spark guys, so check before giving up.
> 3) build Mahout with "mvn clean install"
>
> This will create Mahout from exactly the same bits you will run on your cluster. It got rid of a missing anon function for me. The problem occurs when you use a different version of Spark on your cluster than you used to build Mahout, and this is rather hidden by Maven. Maven downloads from repos any dependency that is not in the local .m2 cache, so you have to make sure your version of Spark is there so Maven won't download one that is incompatible. Unless you really know what you are doing, I'd build both Spark and Mahout for now.
>
> BTW, I will check in the Spark 1.1.0 version of Mahout once I do some more testing.
>
> On Oct 21, 2014, at 10:26 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> Sorry to hear. I bet you'll find a way.
>
> The Spark JIRA trail leads to two suggestions:
> 1) use spark-submit to execute code with your own entry point (other than spark-shell). One theory points to not loading all needed Spark classes from the calling code (Mahout in our case). I can hand-check the jars for the anon function I am missing.
> 2) there may be different class names in the running code (created by building Spark locally) and the version referenced in the Mahout POM. If this turns out to be true, it means we can't rely on building Spark locally. Is there a Maven target that puts the artifacts of the Spark build in the .m2/repository local cache? That would be an easy way to test this theory.
>
> Either of these could cause missing classes.
>
> On Oct 21, 2014, at 9:52 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> No, I haven't used it with anything but 1.0.1 and 0.9.x.
>
> On a side note, I have just changed my employer. It is one of these big guys that make it very difficult to do any contributions, so I am not sure how much of anything I will be able to share/contribute.
>
> On Tue, Oct 21, 2014 at 9:43 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> > But unless you have the time to devote to errors, avoid it. I've built everything from scratch using 1.0.2 and 1.1.0 and am getting these and missing-class errors. The 1.x branch seems to have some kind of peculiar build-order dependencies. The errors sometimes don't show up until runtime, passing all build tests.
> >
> > Dmitriy, have you successfully used any Spark version other than 1.0.1 on a cluster? If so, do you recall the exact order and from what sources you built?
> >
> > On Oct 21, 2014, at 9:35 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >
> > You can't use a Spark client of one version and have the backend of another. You can try to change the Spark dependency in the Mahout POMs to match your backend (or, vice versa, you can change your backend to match what's on the client).
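For reference, the sequence above as plain shell commands -- a sketch that assumes Spark and Mahout source trees checked out side by side and a Hadoop 1.2.1 cluster, as in Pat's example; the cache paths follow the org.apache.spark and org.apache.mahout Maven coordinates:

    # 1) clear any cached Spark/Mahout artifacts so Maven can't pick up stale ones
    rm -rf ~/.m2/repository/org/apache/spark ~/.m2/repository/org/apache/mahout
    # 2) build Spark for the cluster's Hadoop and *install* (not package) it into the local cache
    (cd spark && mvn -Dhadoop.version=1.2.1 -DskipTests clean install)
    # 3) build Mahout against exactly those bits
    (cd mahout && mvn clean install)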
> > On Tue, Oct 21, 2014 at 7:12 AM, Mahesh Balija <balijamahesh....@gmail.com> wrote:
> >
> >> Hi All,
> >>
> >> Here are the errors I get, running in pseudo-distributed mode with Spark 1.0.2 and the latest Mahout code (clone).
> >>
> >> When I run the command from the page https://mahout.apache.org/users/sparkbindings/play-with-shell.html
> >>
> >> val drmX = drmData(::, 0 until 4)
> >>
> >> java.io.InvalidClassException: org.apache.spark.rdd.RDD; local class incompatible: stream classdesc serialVersionUID = 385418487991259089, local class serialVersionUID = -6766554341038829528
> >>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:592)
> >>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
> >>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
> >>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
> >>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
> >>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
> >>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
> >>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
> >>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
> >>     at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
> >>     at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
> >>     at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1836)
> >>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1795)
> >>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
> >>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
> >>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
> >>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
> >>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
> >>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>     at java.lang.Thread.run(Thread.java:701)
> >> 14/10/21 19:35:37 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
> >> 14/10/21 19:35:37 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
> >> 14/10/21 19:35:37 WARN TaskSetManager: Lost TID 3 (task 0.0:1)
> >> 14/10/21 19:35:38 WARN TaskSetManager: Lost TID 4 (task 0.0:0)
> >> 14/10/21 19:35:38 WARN TaskSetManager: Lost TID 5 (task 0.0:1)
> >> 14/10/21 19:35:38 WARN TaskSetManager: Lost TID 6 (task 0.0:0)
> >> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on host mahesh-VirtualBox.local: java.io.InvalidClassException: org.apache.spark.rdd.RDD; local class incompatible: stream classdesc serialVersionUID = 385418487991259089, local class serialVersionUID = -6766554341038829528
> >>     java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:592)
> >>     java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
> >>     java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
> >>     java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
> >>     java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
> >>     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
> >>     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
> >>     java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
> >>     org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
> >>     org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
> >>     org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
> >>     java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1836)
> >>     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1795)
> >>     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
> >>     java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
> >>     org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
> >>     org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
> >>     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
> >>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>     java.lang.Thread.run(Thread.java:701)
> >> Driver stacktrace:
> >>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
> >>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
> >>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
> >>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
> >>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> >>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> >>     at scala.Option.foreach(Option.scala:236)
> >>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
> >>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
> >>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> >>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> >>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> >>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> >>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> >>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >>
> >> Best,
> >> Mahesh Balija.
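The InvalidClassException above is the signature of mismatched Spark builds: the stream was written by one build of org.apache.spark.rdd.RDD and read by another. A quick way to confirm which jars disagree is to print the serialVersionUID each side computes -- a minimal sketch using the standard java.io.ObjectStreamClass API; the assembly jar name is only an example, substitute whatever each machine actually runs:

    // serialVersionCheck.scala -- hypothetical diagnostic script, not part of Mahout or Spark.
    // Run with each side's Spark jar on the classpath, e.g.
    //   scala -cp spark-assembly-1.0.2-hadoop1.2.1.jar serialVersionCheck.scala
    // If the printed numbers differ between client and cluster, the builds are incompatible.
    import java.io.ObjectStreamClass

    val cls = Class.forName("org.apache.spark.rdd.RDD")
    val desc = ObjectStreamClass.lookup(cls) // descriptor Java serialization will use
    println(s"${cls.getName} serialVersionUID = ${desc.getSerialVersionUID}")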
> >> On Tue, Oct 21, 2014 at 2:38 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >>
> >>> On Mon, Oct 20, 2014 at 1:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>
> >>>> Is anyone else nervous about ignoring this issue, or about relying on non-build (hand-run), test-driven transitive-dependency checking? I hope someone else will chime in.
> >>>>
> >>>> As to running unit tests on a TEST_MASTER, I'll look into it. Can we set up the build machine to do this? I'd feel better about eyeballing deps if we could have a TEST_MASTER automatically run during builds at Apache. Maybe the regular unit tests are OK for building locally ourselves.
> >>>>
> >>>>> On Oct 20, 2014, at 12:23 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >>>>>
> >>>>> On Mon, Oct 20, 2014 at 11:44 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>>>
> >>>>>> Maybe a more fundamental issue is that we don't know for sure whether we have missing classes or not. The job.jar at least used the POM dependencies to guarantee every needed class was present. So the job.jar seems to solve the problem but may ship some unnecessary duplicate code, right?
> >>>>>
> >>>>> No, as I wrote, Spark doesn't work with the job.jar format. Neither, as it turns out, does more recent Hadoop MR, btw.
> >>>>
> >>>> Not speaking literally of the format. Spark understands jars, and Maven can build one from transitive dependencies.
> >>>>
> >>>>> Yes, this is A LOT of duplicate code (it will normally take MINUTES to start up tasks with all of it, just on copy time). This is absolutely not the way to go with this.
> >>>>
> >>>> Lack of a guarantee to load seems like a bigger problem than startup time. Clearly we can't just ignore this.
> >>>
> >>> Nope. Given the highly iterative nature and dynamic task allocation in this environment, one is looking at effects similar to MapReduce. This is not the only reason why I never go to MR anymore, but it's one of the main ones.
> >>>
> >>> How about an experiment: why don't you create an assembly that copies ALL transitive dependencies into one folder, and then try to broadcast it from a single point (the front end) to, well... let's start with 20 machines. (Of course we ideally want to get into the 10^3..10^4 range -- but why bother if we can't do it for 20.)
> >>>
> >>> Or, heck, let's try to simply parallel-copy it 20 times between two machines that are not collocated on the same subnet.
> >>>
> >>>>>> There may be any number of bugs waiting for the time we try running on a node machine that doesn't have some class in its classpath.
> >>>>>
> >>>>> No. Assuming any given method is tested on all its execution paths, there will be no bugs. Bugs of that sort will only appear if the user is using algebra directly and calls something that is not on the path, from the closure. In which case our answer to this is the same as for the solver-methodology developers -- use a customized SparkConf while creating the context to include the stuff you really want (see the sketch at the end of this thread).
> >>>>>
> >>>>> Also, another right answer to this is that we probably should reasonably provide the toolset here. For example, all the stats stuff found in R base and the R stat packages, so the user is not compelled to go non-native.
> >>>>
> >>>> Huh? This is not true. The one I ran into was found by calling something in math from something in math-scala. It led outside, and you can encounter such things even in algebra. In fact you have no idea whether these problems exist, except for the fact that you have used it a lot personally.
> >>>
> >>> You ran it with your own code that never existed before.
> >>>
> >>> But there's a difference between released Mahout code (which is what you are working on) and user code. Released code must run through remote tests as you suggested, and thus guarantee there are no such problems with post-release code.
> >>>
> >>> For users, we can only provide a way for them to load the stuff that they decide to use. We don't have a priori knowledge of what they will use. It is the same thing that Spark does, and the same thing that MR does, isn't it?
> >>>
> >>> Of course Mahout should rigorously drop the stuff it doesn't load from the Scala scope. No argument about that. In fact that's what I suggested as the #1 solution. But there's nothing much to do here but to go dependency-cleansing for the math and spark code. Part of the reason there's so much is that newer modules still bring in everything from mrLegacy.
> >>>
> >>> You are right in saying it is hard to guess which other dependencies in the util/legacy code are actually used. But that's not a justification for the brute-force "copy them all" approach, which virtually guarantees re-creating one of the foremost legacy problems this work was intended to address.
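On Dmitriy's SparkConf suggestion above: a minimal sketch of what "include the stuff you really want" can look like, shipping only the jars a job's closures actually reference instead of a job.jar-style bundle of every transitive dependency. SparkConf.setJars is standard Spark 1.x API; the master URL and jar paths here are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://master:7077")   // hypothetical standalone master URL
      .setAppName("algebra-job")
      // Executors fetch these jars at task start, so list only what the
      // closures actually need rather than every transitive dependency.
      .setJars(Seq(
        "/opt/jars/my-solver.jar",        // hypothetical user code
        "/opt/jars/extra-dependency.jar"  // hypothetical extra dependency
      ))
    val sc = new SparkContext(conf)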