PS

All jar-finding routines rely on the MAHOUT_HOME variable to locate jars, so if
you add logic that adds a custom Mahout jar to the context, it should rely on
MAHOUT_HOME too.

Perhaps the solution could be along the following lines.

findMahoutJars() finds the minimal set of jars required to run. Perhaps we
can add all of Mahout's transitive dependencies (barring things like Hadoop and
HBase, which are already present in Spark) to some folder in the Mahout tree,
say $MAHOUT_HOME/libManaged (similar to SBT's managed libraries).

Given that, we could perhaps add a helper, findMahoutDependencyJars(),
which would accept one or more artifact names and find the corresponding jars
in $MAHOUT_HOME/libManaged, similarly to how findMahoutJars() does it.

findMahoutDependencyJars() should assert that it found all jars requested.
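
A rough sketch of what such a helper might look like (this is only the
proposal, not existing code; the "jar name starts with the artifact name"
matching rule and the error handling are placeholder assumptions):

  import java.io.File

  // Proposed helper (sketch only): resolve requested artifact names to jar
  // files under $MAHOUT_HOME/libManaged, analogous to how findMahoutJars()
  // resolves the core Mahout jars.
  def findMahoutDependencyJars(artifactNames: Iterable[String]): Seq[String] = {
    val mahoutHome = sys.env.getOrElse("MAHOUT_HOME",
      throw new IllegalStateException("MAHOUT_HOME is not set"))

    val libManaged = new File(mahoutHome, "libManaged")
    val jars = Option(libManaged.listFiles())
      .getOrElse(Array.empty[File])
      .filter(_.getName.endsWith(".jar"))

    // Assert that every requested artifact resolves to a jar.
    artifactNames.map { name =>
      jars.find(_.getName.startsWith(name))
        .getOrElse(throw new IllegalStateException(
          s"No jar matching '$name' found in ${libManaged.getAbsolutePath}"))
        .getAbsolutePath
    }.toSeq
  }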

Driver code could then use that helper to add the additional jars to
SparkConf before requesting the Spark context.

So, for example, in your case the driver would say

findMahoutDependencyJars("commons-math" :: Nil)


and then add the result to SparkConf.
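
For illustration, the driver side might then look roughly like this (the
helper above is hypothetical, the master URL is a placeholder, and the
mahoutSparkContext() parameter names may differ slightly between versions):

  import org.apache.spark.SparkConf
  import org.apache.mahout.sparkbindings._

  // Resolve the extra dependency jars via the proposed (not yet existing) helper.
  val extraJars = findMahoutDependencyJars("commons-math" :: Nil)

  // Register them on the SparkConf that is handed to mahoutSparkContext();
  // the method merges Mahout's own settings and standard jars into whatever
  // conf it is given before creating the context.
  val conf = new SparkConf().setJars(extraJars)

  implicit val sdc = mahoutSparkContext(
    masterUrl = "spark://host:7077",   // placeholder master URL
    appName = "my-mahout-driver",
    sparkConf = conf
  )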



On Mon, Oct 20, 2014 at 11:05 AM, Dmitriy Lyubimov <dlie...@gmail.com>
wrote:

>
>
> On Mon, Oct 20, 2014 at 10:49 AM, Pat Ferrel <p...@occamsmachete.com>
> wrote:
>
>> I agree. It’s just that different classes required by Mahout are missing
>> from the environment depending on what happens to be in Spark. These deps
>> should be supplied in the job.jar assemblies, right?
>>
>
> No. They should be physically available as jars somewhere, e.g. in the
> compiled Mahout tree.
>
> the "job.xml" assembly in the "spark" module is but a left over from an
> experiment i ran on job jars with Spark long ago. It's just hanging around
> there but not actually being built. Sorry for confusion. DRM doesn't use
> job jars. As far as I have established, Spark does not understand job jars
> (it's purely a Hadoop notion -- but even there it has been unsupported or
> depricated for a long time now).
>
> So we could, e.g., create a new assembly for Spark, such as an "optional
> dependencies" set of jars, and put it somewhere in the compiled tree (I guess
> similar to the "managed libraries" notion in SBT).
>
> Then, if you need any of those, your driver code needs to do the
> following. The mahoutSparkContext() method accepts an optional SparkConf
> parameter. Additional jars can be added to the SparkConf before passing it on
> to mahoutSparkContext(). If you don't supply a SparkConf, the method will
> create a default one. If you do, it will merge all Mahout-specific settings
> and standard jars into the context information you supply.
>
> As far as I can see, by default the context includes only the math,
> math-scala, spark and mrlegacy jars -- no third-party jars (line 212 in the
> sparkbindings package). The test that checks this is in
> SparkBindingsSuite.scala (yes, you are correct, the one you mentioned).
>
>
>
>
>
>
>>
>> Trying out the
>>   test("context jars") {
>>   }
>>
>> findMahoutContextJars(closeables) gets the .jars and seems to explicitly
>> filter out the job.jars. The job.jars include the needed dependencies, so
>> for a clustered environment shouldn’t these be the only ones used?
>>
>>
>> On Oct 20, 2014, at 10:39 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>
>> Either way, I don't believe there's something specific to 1.0.1, 1.0.2 or
>> 1.1.0 that is causing or not causing classpath errors. It's just that jars
>> are picked by an explicitly hardcoded artifact "opt-in" policy, not the other
>> way around.
>>
>> It is not enough just to modify the pom in order for something to appear in
>> the task classpath.
>>
>> On Mon, Oct 20, 2014 at 9:35 AM, Dmitriy Lyubimov <dlie...@gmail.com>
>> wrote:
>>
>> > Note that classpaths for a "cluster" environment are tested trivially by
>> > starting 1-2 workers and a standalone Spark manager process locally. No
>> > need to build anything "real". Workers would not know anything about
>> > Mahout, so unless the proper jars are exposed in the context, they would
>> > have no way of "faking" access to the classes.
>> >
>> > On Mon, Oct 20, 2014 at 9:28 AM, Pat Ferrel <p...@occamsmachete.com>
>> > wrote:
>> >
>> >> Yes, asap.
>> >>
>> >> To test this right it has to run on a cluster, so I’m upgrading. When
>> >> ready, it will just be a “mvn clean install" if you already have Spark
>> >> 1.1.0 running.
>> >>
>> >> I would have only expected errors in the CLI drivers, so if anyone else
>> >> sees runtime errors, please let us know. Some errors are very hard to
>> >> unit test since the environment is different for local (unit test) and
>> >> cluster execution.
>> >>
>> >>
>> >> On Oct 20, 2014, at 9:14 AM, Mahesh Balija <balijamahesh....@gmail.com>
>> >> wrote:
>> >>
>> >> Hi Pat,
>> >>
>> >> Can you please give detailed steps to build Mahout against Spark 1.1.0?
>> >> I built against 1.1.0 but still had class-not-found errors; that's why I
>> >> reverted back to Spark 1.0.2. Even though the first few steps are
>> >> successful, I am still facing some issues running the Mahout spark-shell
>> >> sample commands; (drmData) throws some errors even on 1.0.2.
>> >>
>> >> Best,
>> >> Mahesh.B.
>> >>
>> >> On Mon, Oct 20, 2014 at 1:46 AM, peng <pc...@uowmail.edu.au> wrote:
>> >>
>> >>> From my experience 1.1.0 is quite stable, plus there are some
>> >>> performance improvements that are totally worth the effort.
>> >>>
>> >>>
>> >>> On 10/19/2014 06:30 PM, Ted Dunning wrote:
>> >>>
>> >>>> On Sun, Oct 19, 2014 at 1:49 PM, Pat Ferrel <p...@occamsmachete.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Getting off the dubious Spark 1.0.1 version is turning out to be a bit
>> >>>>> of work. Does anyone object to upgrading our Spark dependency? I’m not
>> >>>>> sure if Mahout built for Spark 1.1.0 will run on 1.0.1, so it may mean
>> >>>>> upgrading your Spark cluster.
>> >>>>>
>> >>>>
>> >>>> It is going to have to happen sooner or later.
>> >>>>
>> >>>> Sooner may actually be less total pain.
>> >>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >
>>
>>
>
