The Mahout context does not include _all_ possible transitive dependencies. It would not be lightning fast if it dragged in every legacy dependency and the like.
There's an "ignored" unit test that asserts context classpath correctness. You can "unignore" it and run it to verify things still work as expected. The reason it is set to "ignored" is that it requires a Mahout environment plus an already-built Mahout in order to run successfully. I can probably look it up if you don't find it immediately.

Now: the Mahout context only includes what's really used in the DRM algebra, which is just a handful of jars. Apache commons-math is not one of them. But your driver can add it when creating the Mahout context by tinkering additionally with the method parameters there (such as the Spark config). However, you may encounter a problem: the Mahout assembly currently may not build -- and copy -- the commons-math jar into any part of the Mahout tree.

Finally, I am against adding commons-math by default, as the general algebra does not depend on it. I'd suggest, in order of preference: (1) get rid of the reliance on the commons-math random generator (surely, by now we should be ok with scala.util.Random or even the standard random?), or (2) add the dependency in a custom way per the above. If there's an extremely compelling reason why the commons-math random generator dependency cannot be eliminated, then a better way is to include commons-math in the assembly (I think right now the only assembly that really copies in dependencies is the examples one, which is probably wrong, as the examples are not the core product here), and add it explicitly to the createMahoutContext (or whatever that method's name was) code.

My understanding is that the random from utils was mainly encouraged because it is automatically made deterministic in tests. I am unaware of any fundamental deficiencies of scala.util.Random w.r.t. its uses in the existing methods. So perhaps the Scala side needs its own "RandomUtils" for testing that does not rely on commons-math.

On Sun, Oct 19, 2014 at 4:36 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Trying to upgrade from Spark 1.0.1 to 1.1.0. Can’t imagine the upgrade is
> the problem but anyway...
>
> I get a NoClassDefFoundError for RandomGenerator when running a driver
> from the CLI. But only when using a named master, even a standalone master.
> If I run using master = local[4] the job executes correctly, but if I set
> the master to spark://Maclaurin.local:7077, though they are the same machine,
> I get the NoClassDefFoundError. The classpath seems correct on the CLI and
> the jars do indeed contain the offending class (see below). There must be
> some difference in how classes are loaded between local[4] and
> spark://Maclaurin.local:7077?
>
> Any ideas?
>
> ===============
>
> The driver is in mahout-spark_2.10-1.0-SNAPSHOT-job.jar, so its execution
> means it must be in the classpath. When I look at what’s in the jar I see
> RandomGenerator.
>
> Maclaurin:target pat$ jar tf mahout-spark_2.10-1.0-SNAPSHOT-job.jar | grep RandomGenerator
> cern/jet/random/engine/RandomGenerator.class
> org/apache/commons/math3/random/GaussianRandomGenerator.class
> org/apache/commons/math3/random/JDKRandomGenerator.class
> org/apache/commons/math3/random/UniformRandomGenerator.class
> org/apache/commons/math3/random/RandomGenerator.class <==========!
> org/apache/commons/math3/random/NormalizedRandomGenerator.class
> org/apache/commons/math3/random/AbstractRandomGenerator.class
> org/apache/commons/math3/random/StableRandomGenerator.class
>
> But get the following error executing the job:
>
> 14/10/19 15:39:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage
> 6.9 (TID 84, 192.168.0.2): java.lang.NoClassDefFoundError:
> org/apache/commons/math3/random/RandomGenerator
>     org.apache.mahout.common.RandomUtils.getRandom(RandomUtils.java:65)
>     org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$5.apply(SimilarityAnalysis.scala:272)
>     org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$5.apply(SimilarityAnalysis.scala:267)
>     org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$1.apply(MapBlock.scala:33)
>     org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$1.apply(MapBlock.scala:32)
>     scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>     org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:235)
>     org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
>     org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
>     org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>     org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>     org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>     org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>     org.apache.spark.scheduler.Task.run(Task.scala:54)
>     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>     java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     java.lang.Thread.run(Thread.java:695)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
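To make the suggestion above concrete: a Scala-side "RandomUtils" that is deterministic in tests but does not pull in commons-math might look something like the minimal sketch below. This is illustrative only -- the object name, the fixed seed, and the test-mode flag are assumptions mirroring the behavior of the commons-math-backed RandomUtils in mahout-common, not an existing API.

```scala
import scala.util.Random

// Hypothetical sketch of a commons-math-free RandomUtils.
// In test mode every generator handed out is seeded with the same
// fixed value, so runs are reproducible; otherwise seeding is random.
object RandomUtils {

  // Arbitrary fixed seed for deterministic test runs (illustrative value).
  private val TestSeed = 0xdeadbeefL

  @volatile private var testMode = false

  // Called from test setup, analogous to RandomUtils.useTestSeed()
  // in mahout-common.
  def useTestSeed(): Unit = { testMode = true }

  // Deterministic in test mode, entropy-seeded otherwise.
  def getRandom(): Random =
    if (testMode) new Random(TestSeed) else new Random()

  // Explicitly seeded generator for callers that manage their own seeds.
  def getRandom(seed: Long): Random = new Random(seed)
}
```

For option (2), the driver could instead ship the commons-math jar itself through the Spark config it passes into context creation (Spark's standard `spark.jars` property takes a comma-separated jar list), assuming the jar is actually present on the driver machine -- which, per the above, the current Mahout assembly may not guarantee.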