On 17.02.2012 17:22, Ioan Eugen Stan wrote:
Hello,

For the company I work for, I have developed a custom vectorization
job that runs on Hadoop/HBase and creates vector files. The job is
designed to be run by MahoutDriver, but I'm having difficulties adding
it to the classpath. What is the best way to do this?

I use a custom script to set up all the path variables, but the output
I get is just the list of jobs that mahout finds in the mahout
distribution:

The frustrating thing is that I managed to do it right once, but now I
can't seem to find the right way to do it again.

I believe the reason things fail is that MAHOUT_JOB gets set to
/usr/local/mahout-distribution-0.6/mahout-examples-0.6-job.jar and
driver.classes.props is loaded from there first.
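
For context, driver.classes.props is the file MahoutDriver reads to
map short program names to driver classes. If I remember the format
correctly, it is one entry per line, roughly like this (the driver
class below is a made-up example):

# fully.qualified.DriverClass = shortName : description
com.example.vectorizer.VectorizationDriver = customvec : custom vectorization job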

Wouldn't it be more useful for people to be able to specify their own
job jar? This should be an easy fix and I can provide a patch.

This is the script I use:

#!/bin/bash
HADOOP_HOME=/usr/lib/hadoop
MAHOUT_HOME=/usr/local/mahout-distribution-0.6
MAHOUT_JAVA_HOME=/usr/lib/jvm/java-6-sun
MAHOUT_CONF_DIR=${debian.install.base}/${debian.install.dir}

# add the custom job jar to the classpath
# the job jar contains a custom driver.classes.props
CLASSPATH=$CLASSPATH:$MAHOUT_CONF_DIR/${project.build.finalName}.jar


PATH=$PATH:$MAHOUT_HOME/bin
export PATH

# add project dependencies to CLASSPATH
# WARNING: the same libs (same or different versions) will also be
# added by mahout; this may cause classpath hell problems
for f in $MAHOUT_CONF_DIR/lib/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done

export HADOOP_HOME MAHOUT_HOME MAHOUT_JAVA_HOME MAHOUT_CONF_DIR CLASSPATH
# forward the requested program name and its arguments
mahout "$@"


Cheers,

The solution I came up with is to run the job directly with the hadoop
script and to ship all the necessary jar dependencies inside the job
jar, under its lib directory, like this:

HADOOP_HOME=/usr/lib/hadoop
MAHOUT_HOME=/usr/local/mahout-distribution-0.6
export PATH=$PATH:$MAHOUT_HOME/bin

# use the java 6 wildcard style of adding jars to the classpath
# NOTE: MAHOUT_CONF_DIR is assumed to be set in the environment
export HADOOP_CLASSPATH=${MAHOUT_CONF_DIR}/lib/*
export MAHOUT_JOB=custom-mahout-job.jar
export HADOOP_HOME

# do what mahout script does
exec "$HADOOP_HOME/bin/hadoop" jar $MAHOUT_JOB "$@"

It's best to make the application that submits the job extend Mahout's
AbstractJob or implement Hadoop's Tool interface. The hadoop jar
command starts the org.apache.hadoop.util.RunJar class, which will run
your $MAHOUT_JOB jar.
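
A minimal sketch of such an entry point, assuming Hadoop's Tool
interface (the class name is hypothetical; Mahout's AbstractJob works
the same way and adds argument-parsing helpers on top):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// hypothetical driver class; this is what you would list in
// driver.classes.props and what RunJar ends up invoking
public class VectorizationDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // configure and submit the actual MapReduce job here,
        // using getConf() for the Hadoop configuration
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic hadoop options (-D, -libjars, ...)
        // before handing the remaining args to run()
        System.exit(ToolRunner.run(new Configuration(),
                new VectorizationDriver(), args));
    }
}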

Troubles you may encounter:

Hadoop adds its own libs to the classpath. Make sure you use the same
(or compatible) versions or you will get exceptions. In my case I was
using slf4j 1.6 while hadoop was built against version 1.4, and the
JVM complained that it couldn't find a method that does not exist in
the older version. Build your job with the same versions of the libs
hadoop uses, or it will fail when run on the nodes; you can see which
versions hadoop ships by listing $HADOOP_HOME/lib.

Hope this helps,

--
Ioan Eugen Stan
http://ieugen.blogspot.com
