Either way, for now, compiling Spark (with a push to the local Maven repository) and then
Mahout (which will pick up the local Maven artifacts) on the same machine, and then
redistributing the artifacts to the worker nodes, should work regardless of the
compilation parameters.
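A minimal sketch of that sequence; the checkout paths and worker host names are
placeholders only:

    # 1) Build Spark and install its artifacts into the local Maven repo (~/.m2)
    cd /path/to/spark && mvn -DskipTests clean install

    # 2) Build Mahout on the same machine so it resolves those local artifacts
    cd /path/to/mahout && mvn -DskipTests clean install

    # 3) Redistribute the resulting artifacts to the worker nodes
    for host in worker1 worker2; do
      rsync -a /path/to/spark/ "$host":/path/to/spark/
    done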

On Tue, Oct 21, 2014 at 3:28 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Hm, no, they don't push different binary releases to Maven. I assume they
> only push the default one.
>
> On Tue, Oct 21, 2014 at 3:26 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>
>> PS: I remember a discussion about packaging binary Spark distributions, so
>> there are in fact a number of different Spark artifact releases. However, I
>> am not sure whether they push them to Maven repositories (if they did, they
>> might use different Maven classifiers for those). If that's the case, then
>> one plausible strategy here is to recommend rebuilding Mahout with a
>> dependency on the classifier corresponding to the actual Spark binary
>> release used.
>>
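If Spark did publish per-Hadoop binaries under Maven classifiers (which, per the
above, is not confirmed), their presence could at least be probed with the
dependency plugin. The "hadoop2" classifier below is purely a hypothetical example:

    # Hypothetical: try to resolve a classified Spark artifact from the repos
    # (the classifier "hadoop2" is made up for illustration)
    mvn dependency:get -Dartifact=org.apache.spark:spark-core_2.10:1.0.2:jar:hadoop2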
>> On Tue, Oct 21, 2014 at 2:21 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>> wrote:
>>
>>> If you are using the Mahout shell or the command-line drivers (which I
>>> don't), it would seem the correct thing to do is for the mahout script to
>>> simply take the Spark dependencies from the installed $SPARK_HOME rather
>>> than from Mahout's assembly. In fact, that would be consistent with what
>>> other projects do in a similar situation. It should also probably make
>>> things compatible between minor releases of Spark.
>>>
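A rough sketch of what that could look like in the mahout launcher script; this
is purely illustrative, and the variable names and jar layout are assumptions
rather than the actual script:

    # Prefer Spark jars from the installed distribution over bundled copies
    if [ -n "$SPARK_HOME" ]; then
      for jar in "$SPARK_HOME"/lib/*.jar; do
        CLASSPATH="$CLASSPATH:$jar"
      done
    fi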
>>> But I think you are right in the sense that the problem is that Spark jars
>>> are not uniquely identified by Maven artifact id and version, unlike most
>>> other products. (E.g., if we see mahout-math-0.9.jar we expect there to be
>>> one and only one released artifact in existence -- but one's local build
>>> may create incompatible variations.)
>>>
>>> On Tue, Oct 21, 2014 at 1:51 PM, Pat Ferrel <p...@occamsmachete.com>
>>> wrote:
>>>
>>>> The problem is not in building Spark; it is in building Mahout using the
>>>> correct Spark jars. If you are using CDH and Hadoop 2, the correct jars are
>>>> in the repos.
>>>>
>>>> For the rest of us, though the process below seems like an error-prone hack
>>>> to me, it does work on Linux and BSD/Mac. It should really be addressed by
>>>> Spark, IMO.
>>>>
>>>> BTW, the cache is laid out differently on Linux, but I don't think you
>>>> need to delete it anyway.
>>>>
>>>> On Oct 21, 2014, at 12:27 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>>>> wrote:
>>>>
>>>> FWIW, I never built Spark using Maven; I always use sbt assembly.
>>>>
>>>> On Tue, Oct 21, 2014 at 11:55 AM, Pat Ferrel <p...@occamsmachete.com>
>>>> wrote:
>>>>
>>>> > Ok, the mystery is solved.
>>>> >
>>>> > The safe sequence, from my limited testing, is:
>>>> > 1) Delete ~/.m2/repository/org/apache/spark and org/apache/mahout.
>>>> > 2) Build Spark for your version of Hadoop, *but do not use "mvn package
>>>> > ..."*; use "mvn install ...". This puts a copy of the exact bits you need
>>>> > into the local Maven cache for building Mahout against. In my case, using
>>>> > Hadoop 1.2.1, it was "mvn -Dhadoop.version=1.2.1 -DskipTests clean
>>>> > install". If you run the tests on Spark, some failures can safely be
>>>> > ignored according to the Spark guys, so check before giving up.
>>>> > 3) Build Mahout with "mvn clean install".
>>>> >
>>>> > This builds Mahout from exactly the same bits you will run on your
>>>> > cluster; it got rid of a missing anonymous function for me. The problem
>>>> > occurs when you use a different version of Spark on your cluster than you
>>>> > used to build Mahout, and this is rather hidden by Maven: Maven downloads
>>>> > from remote repos any dependency that is not in the local .m2 cache, so
>>>> > you have to make sure your version of Spark is there so Maven won't
>>>> > download one that is incompatible. Unless you really know what you are
>>>> > doing, I'd build both Spark and Mahout for now.
>>>> >
>>>> > BTW, I will check in the Spark 1.1.0 version of Mahout once I do some
>>>> > more testing.
>>>> >
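Condensed into commands, the quoted sequence looks roughly like this (the Hadoop
version is the one mentioned above; the checkout paths are placeholders):

    # 1) Clear any previously cached Spark and Mahout artifacts
    rm -rf ~/.m2/repository/org/apache/spark ~/.m2/repository/org/apache/mahout

    # 2) Build Spark for your Hadoop version and *install* it into ~/.m2
    cd /path/to/spark
    mvn -Dhadoop.version=1.2.1 -DskipTests clean install

    # 3) Build Mahout against those locally installed Spark bits
    cd /path/to/mahout
    mvn clean install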
>>>> > On Oct 21, 2014, at 10:26 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>> >
>>>> > Sorry to hear. I bet you’ll find a way.
>>>> >
>>>> > The Spark JIRA trail leads to two suggestions:
>>>> > 1) Use spark-submit to execute code with your own entry point (other than
>>>> > spark-shell). One theory points to the calling code (Mahout in our case)
>>>> > not loading all needed Spark classes. I can hand-check the jars for the
>>>> > anonymous function I am missing.
>>>> > 2) There may be different class names between the running code (created
>>>> > by building Spark locally) and the version referenced in the Mahout POM.
>>>> > If this turns out to be true, it means we can't rely on building Spark
>>>> > locally. Is there a Maven target that puts the artifacts of the Spark
>>>> > build into the local .m2/repository cache? That would be an easy way to
>>>> > test this theory.
>>>> >
>>>> > Either of these could cause missing classes.
>>>> >
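For the first suggestion, a hedged example of driving a job through spark-submit
with your own entry point; the class name, master URL, and jar path are
placeholders, not actual Mahout artifacts:

    spark-submit --class org.example.MyDriver \
      --master spark://your-master:7077 \
      /path/to/your-assembly.jar arg1 arg2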
>>>> >
>>>> > On Oct 21, 2014, at 9:52 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>> >
>>>> > No, I haven't used it with anything but 1.0.1 and 0.9.x.
>>>> >
>>>> > On a side note, I have just changed employers. It is one of those big
>>>> > companies that make it very difficult to do any contributions, so I am
>>>> > not sure how much of anything I will be able to share/contribute.
>>>> >
>>>> >> On Tue, Oct 21, 2014 at 9:43 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>> >
>>>> >> But unless you have the time to devote to chasing errors, avoid it.
>>>> >> I've built everything from scratch using 1.0.2 and 1.1.0 and am getting
>>>> >> these and missing-class errors. The 1.x branch seems to have some kind
>>>> >> of peculiar build-order dependency. The errors sometimes don't show up
>>>> >> until runtime, after passing all build tests.
>>>> >>
>>>> >> Dmitriy, have you successfully used any Spark version other than 1.0.1
>>>> >> on a cluster? If so, do you recall the exact order and from what sources
>>>> >> you built?
>>>> >>
>>>> >>
>>>> >> On Oct 21, 2014, at 9:35 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>> >>
>>>> >> You can't use a Spark client of one version and have the backend be
>>>> >> another. You can try to change the Spark dependency in the Mahout POMs
>>>> >> to match your backend (or, vice versa, change your backend to match
>>>> >> what's on the client).
>>>> >>
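For the first option, something along these lines could align the client with a
1.0.2 backend; the <spark.version> property name is an assumption about how the
POMs are parameterized, not a confirmed detail:

    # Point Mahout's build at the backend's Spark version, then rebuild
    sed -i 's#<spark.version>.*</spark.version>#<spark.version>1.0.2</spark.version>#' pom.xml
    mvn -DskipTests clean install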
>>>> >> On Tue, Oct 21, 2014 at 7:12 AM, Mahesh Balija <balijamahesh....@gmail.com> wrote:
>>>> >>
>>>> >>> Hi All,
>>>> >>>
>>>> >>> Here are the errors I get when I run in pseudo-distributed mode, with
>>>> >>> Spark 1.0.2 and the latest Mahout code (cloned).
>>>> >>>
>>>> >>> When I run the command from the page
>>>> >>> https://mahout.apache.org/users/sparkbindings/play-with-shell.html
>>>> >>>
>>>> >>> val drmX = drmData(::, 0 until 4)
>>>> >>>
>>>> >>> java.io.InvalidClassException: org.apache.spark.rdd.RDD; local class incompatible: stream classdesc serialVersionUID = 385418487991259089, local class serialVersionUID = -6766554341038829528
>>>> >>>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:592)
>>>> >>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
>>>> >>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
>>>> >>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
>>>> >>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
>>>> >>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
>>>> >>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
>>>> >>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
>>>> >>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>>>> >>>     at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
>>>> >>>     at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
>>>> >>>     at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1836)
>>>> >>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1795)
>>>> >>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
>>>> >>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
>>>> >>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>>>> >>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
>>>> >>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
>>>> >>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>> >>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> >>>     at java.lang.Thread.run(Thread.java:701)
>>>> >>> 14/10/21 19:35:37 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
>>>> >>> 14/10/21 19:35:37 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
>>>> >>> 14/10/21 19:35:37 WARN TaskSetManager: Lost TID 3 (task 0.0:1)
>>>> >>> 14/10/21 19:35:38 WARN TaskSetManager: Lost TID 4 (task 0.0:0)
>>>> >>> 14/10/21 19:35:38 WARN TaskSetManager: Lost TID 5 (task 0.0:1)
>>>> >>> 14/10/21 19:35:38 WARN TaskSetManager: Lost TID 6 (task 0.0:0)
>>>> >>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on host mahesh-VirtualBox.local: java.io.InvalidClassException: org.apache.spark.rdd.RDD; local class incompatible: stream classdesc serialVersionUID = 385418487991259089, local class serialVersionUID = -6766554341038829528
>>>> >>>         java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:592)
>>>> >>>         java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
>>>> >>>         java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
>>>> >>>         java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
>>>> >>>         java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
>>>> >>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
>>>> >>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
>>>> >>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
>>>> >>>         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>>>> >>>         org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
>>>> >>>         org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
>>>> >>>         java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1836)
>>>> >>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1795)
>>>> >>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
>>>> >>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
>>>> >>>         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>>>> >>>         org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
>>>> >>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
>>>> >>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>> >>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> >>>         java.lang.Thread.run(Thread.java:701)
>>>> >>> Driver stacktrace:
>>>> >>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>>> >>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>>> >>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>>> >>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>> >>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>> >>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>>> >>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>> >>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>> >>>     at scala.Option.foreach(Option.scala:236)
>>>> >>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>>> >>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>>> >>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>> >>>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>> >>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>> >>>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>> >>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>> >>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> >>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>> >>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> >>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> >>>
>>>> >>> Best,
>>>> >>> Mahesh Balija.
>>>> >>>
>>>> >>> On Tue, Oct 21, 2014 at 2:38 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>> >>>
>>>> >>>> On Mon, Oct 20, 2014 at 1:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>> >>>>
>>>> >>>>> Is anyone else nervous about ignoring this issue, or about relying
>>>> >>>>> on non-build (hand-run), test-driven transitive dependency checking?
>>>> >>>>> I hope someone else will chime in.
>>>> >>>>>
>>>> >>>>> As to running unit tests on a TEST_MASTER, I'll look into it. Can we
>>>> >>>>> set up the build machine to do this? I'd feel better about eyeballing
>>>> >>>>> deps if we could have a TEST_MASTER automatically run during builds
>>>> >>>>> at Apache. Maybe the regular unit tests are OK for building locally
>>>> >>>>> ourselves.
>>>> >>>>>
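A hedged sketch of what running the suite against a real cluster might look
like, assuming the Spark-module tests honor a TEST_MASTER environment variable
as discussed above; the master URL and module layout are assumptions:

    # Run the Spark-module unit tests against a real master instead of local[*]
    export TEST_MASTER=spark://build-machine:7077
    cd mahout/spark && mvn test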
>>>> >>>>>>
>>>> >>>>>> On Oct 20, 2014, at 12:23 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>> >>>>>>
>>>> >>>>>> On Mon, Oct 20, 2014 at 11:44 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>> >>>>>>
>>>> >>>>>>> Maybe a more fundamental issue is that we don't know for sure
>>>> >>>>>>> whether we have missing classes or not. The job.jar at least used
>>>> >>>>>>> the POM dependencies to guarantee that every needed class was
>>>> >>>>>>> present. So the job.jar seems to solve the problem, but may ship
>>>> >>>>>>> some unnecessary duplicate code, right?
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>>> No, as I wrote, Spark doesn't work with the job-jar format.
>>>> >>>>>> Neither, as it turns out, does more recent Hadoop MR, BTW.
>>>> >>>>>
>>>> >>>>> I'm not speaking literally of the format. Spark understands jars,
>>>> >>>>> and Maven can build one from the transitive dependencies.
>>>> >>>>>
>>>> >>>>>>
>>>> >>>>>> Yes, this is A LOT of duplicate code (it will normally take MINUTES
>>>> >>>>>> to start up tasks with all of it, just on copy time). This is
>>>> >>>>>> absolutely not the way to go.
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>> A lack of any guarantee that classes will load seems like a bigger
>>>> >>>>> problem than startup time. Clearly we can't just ignore this.
>>>> >>>>>
>>>> >>>>
>>>> >>>> Nope. Given the highly iterative nature and dynamic task allocation
>>>> >>>> in this environment, one is looking at effects similar to MapReduce's.
>>>> >>>> This is not the only reason why I never go back to MR anymore, but
>>>> >>>> it's one of the main ones.
>>>> >>>>
>>>> >>>> How about an experiment: why don't you create an assembly that copies
>>>> >>>> ALL transitive dependencies into one folder, and then try to broadcast
>>>> >>>> it from a single point (the front end) to, well... let's start with 20
>>>> >>>> machines. (Of course, ideally we want to get into the 10^3..10^4 range
>>>> >>>> -- but why bother if we can't do it for 20?)
>>>> >>>>
>>>> >>>> Or, heck, let's try to simply parallel-copy it 20 times between two
>>>> >>>> machines that are not collocated on the same subnet.
>>>> >>>>
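A sketch of the experiment described above, using Maven's dependency plugin to
gather the transitive dependencies; the host names and paths are placeholders:

    # Gather all transitive dependencies into one folder
    mvn dependency:copy-dependencies -DoutputDirectory=target/alldeps
    du -sh target/alldeps

    # Time copying that folder to a handful of machines (extend to ~20)
    for host in node01 node02 node03; do
      time scp -qr target/alldeps "$host":/tmp/alldeps
    done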
>>>> >>>>
>>>> >>>>>>
>>>> >>>>>>> There may be any number of bugs waiting for the time we try
>>>> >>>>>>> running on a node machine that doesn't have some class in its
>>>> >>>>>>> classpath.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> No. Assuming any given method is tested on all its execution paths,
>>>> >>>>>> there will be no bugs. Bugs of that sort will only appear if the
>>>> >>>>>> user is using algebra directly and, from a closure, calls something
>>>> >>>>>> that is not on the path. In that case our answer is the same as for
>>>> >>>>>> the solver methodology developers -- use a customized SparkConf
>>>> >>>>>> while creating the context to include the stuff you really want.
>>>> >>>>>>
>>>> >>>>>> Another right answer to this is that we should probably provide a
>>>> >>>>>> reasonable toolset here -- for example, all the stats stuff found in
>>>> >>>>>> R base and the R stats packages, so the user is not compelled to go
>>>> >>>>>> non-native.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>> Huh? This is not true. The one I ran into was triggered by calling
>>>> >>>>> something in math from something in math-scala. It led outside, and
>>>> >>>>> you can encounter such things even in algebra. In fact, you have no
>>>> >>>>> idea whether these problems exist, except that you have used it a lot
>>>> >>>>> personally.
>>>> >>>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> You ran it with your own code that never existed before.
>>>> >>>>
>>>> >>>> But there's a difference between released Mahout code (which is what
>>>> >>>> you are working on) and user code. Released code must run through
>>>> >>>> remote tests, as you suggested, and thus guarantee there are no such
>>>> >>>> problems with post-release code.
>>>> >>>>
>>>> >>>> For users, we can only provide a way for them to load the stuff they
>>>> >>>> decide to use. We don't have a priori knowledge of what they will use.
>>>> >>>> It is the same thing that Spark does, and the same thing that MR does,
>>>> >>>> isn't it?
>>>> >>>>
>>>> >>>> Of course Mahout should rigorously drop the stuff it doesn't load
>>>> >>>> from the Scala scope. No argument about that. In fact, that's what I
>>>> >>>> suggested as the #1 solution. But there's nothing much to do here
>>>> >>>> except go through dependency cleansing for the math and spark code.
>>>> >>>> Part of the reason there's so much is that the newer modules still
>>>> >>>> bring in everything from mrLegacy.
>>>> >>>>
>>>> >>>> You are right in saying that it is hard to guess which other
>>>> >>>> dependencies in the util/legacy code are actually used. But that's not
>>>> >>>> a justification for a brute-force "copy them all" approach that
>>>> >>>> virtually guarantees reviving one of the foremost legacy issues this
>>>> >>>> work was intended to address.
>>>> >>>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>
>>
>
