The Mahout context does not include _all_ possible transitive dependencies.
It would not be lightning fast to take along all the legacy etc. dependencies.

There's an "ignored" unit test that asserts context path correctness. You
can un-ignore it and run it to verify things still work as expected. The
reason it is set to "ignored" is that it requires a Mahout environment plus
an already-built Mahout in order to run successfully. I can probably look
it up if you don't find it immediately.
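Re-enabling it is just a matter of flipping ScalaTest's "ignore"
registration back to "test". A hypothetical sketch (the actual suite and
test names in the Mahout tree will differ):

import org.scalatest.FunSuite

class ContextPathSuite extends FunSuite {
  // was: ignore("context classpath is correct") { ... }
  test("context classpath is correct") {
    // requires MAHOUT_HOME set and an already-built mahout, per above
  }
}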


Now: the Mahout context only includes what's really used in the DRM
algebra, which is just a handful of jars. Apache commons-math is not one of
them.

But your driver can add it when creating the Mahout context, by
additionally tinkering with the method parameters there (such as the Spark
config); a sketch follows. However, you may encounter a problem: the Mahout
assembly currently may not build -- and copy -- the commons-math jar into
any part of the Mahout tree.
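For illustration, a minimal driver-side sketch. This assumes the
mahoutSparkContext helper in org.apache.mahout.sparkbindings takes
customJars/sparkConf parameters (check the actual signature) and that you
know where a commons-math jar lives on your box:

import org.apache.spark.SparkConf
import org.apache.mahout.sparkbindings._

// Sketch only: the parameter names (customJars, sparkConf) and the jar
// path are assumptions -- verify against the real mahoutSparkContext
// signature and your local repository layout.
val commonsMathJar = "/path/to/commons-math3-3.2.jar"

implicit val mahoutCtx = mahoutSparkContext(
  masterUrl = "spark://Maclaurin.local:7077",
  appName = "my-driver",
  customJars = Seq(commonsMathJar),
  sparkConf = new SparkConf())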

Finally, I am against adding commons-math by default, as the general
algebra does not depend on it. I'd suggest, in order of preference: (1) get
rid of the reliance on the commons-math random generator (surely, by now we
should be OK with scala.util.Random or even the standard JDK Random?) -- a
minimal swap is sketched below -- or (2) add the dependency in a custom
way, per the sketch above.
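For (1), the swap is mechanical at most call sites. A hypothetical
before/after, assuming the caller only needs the usual nextInt/nextDouble
surface:

import scala.util.Random

// Before: drags org.apache.commons.math3 onto the executor classpath.
//   val rng = org.apache.mahout.common.RandomUtils.getRandom()

// After: JDK-backed scala.util.Random, always on the classpath.
val rng = new Random()
val draw = rng.nextDouble()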

If there's an extremely compelling reason why the commons-math random
generator dependency cannot be eliminated, then a better way is to include
commons-math in the assembly (I think right now the only assembly that
really copies in dependencies is the examples one, which is probably wrong,
as the examples are not the core product here), and to add it explicitly to
the createMahoutContext (or whatever that method's name was) code.

My understanding is that the random from utils was mainly encouraged
because it is automatically made deterministic in tests. I am unaware of
any fundamental deficiencies of Scala's Random w.r.t. its uses in the
existing methods. So perhaps the Scala side needs its own "RandomUtils" for
testing that does not rely on commons-math, along the lines of the sketch
below.
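Something like this, say -- a hypothetical ScalaRandomUtils mirroring the
useTestSeed()/getRandom() contract of the commons-math-based RandomUtils,
with no commons-math anywhere:

import scala.util.Random

// Hypothetical sketch: deterministic once useTestSeed() is called (as in
// org.apache.mahout.common.RandomUtils), real entropy otherwise, backed
// purely by scala.util.Random.
object ScalaRandomUtils {
  private val TestSeed = 0xdeadbeefL
  @volatile private var testSeeded = false

  /** Call once from test setup; all later getRandom() calls are seeded. */
  def useTestSeed(): Unit = testSeeded = true

  def getRandom(): Random =
    if (testSeeded) new Random(TestSeed) else new Random()
}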


On Sun, Oct 19, 2014 at 4:36 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Trying to upgrade from Spark 1.0.1 to 1.1.0. Can’t imagine the upgrade is
> the problem but anyway...
>
> I get a NoClassDefFoundError for RandomGenerator when running a driver
> from the CLI. But only when using a named master, even a standalone master.
> If I run using master = local[4] the job executes correctly but if I set
> the master to spark://Maclaurin.local:7077 though they are the same machine
> I get the NoClassDefFoundError. The classpath seems correct on the CLI and
> the jars do indeed contain the offending class (see below). There must be
> some difference in how classes are loaded between local[4] and
> spark://Maclaurin.local:7077?
>
> Any ideas?
>
> ===============
>
> The driver is in mahout-spark_2.10-1.0-SNAPSHOT-job.jar, so its execution
> means it must be in the classpath. When I look at what’s in the jar I see
> RandomGenerator.
>
> Maclaurin:target pat$ jar tf mahout-spark_2.10-1.0-SNAPSHOT-job.jar | grep RandomGenerator
> cern/jet/random/engine/RandomGenerator.class
> org/apache/commons/math3/random/GaussianRandomGenerator.class
> org/apache/commons/math3/random/JDKRandomGenerator.class
> org/apache/commons/math3/random/UniformRandomGenerator.class
> org/apache/commons/math3/random/RandomGenerator.class  <==========!
> org/apache/commons/math3/random/NormalizedRandomGenerator.class
> org/apache/commons/math3/random/AbstractRandomGenerator.class
> org/apache/commons/math3/random/StableRandomGenerator.class
>
> But I get the following error executing the job:
>
> 14/10/19 15:39:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 6.9 (TID 84, 192.168.0.2): java.lang.NoClassDefFoundError: org/apache/commons/math3/random/RandomGenerator
>         org.apache.mahout.common.RandomUtils.getRandom(RandomUtils.java:65)
>         org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$5.apply(SimilarityAnalysis.scala:272)
>         org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$5.apply(SimilarityAnalysis.scala:267)
>         org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$1.apply(MapBlock.scala:33)
>         org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$1.apply(MapBlock.scala:32)
>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>         org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:235)
>         org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
>         org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>         java.lang.Thread.run(Thread.java:695)
>
>
>
