Hello,
For the company I work for, I have developed a custom vectorization job that
runs on Hadoop/HBase and creates vector files. The job is designed to be
run by MahoutDriver, but I'm having difficulties adding it to the
classpath. What is the best way to do this?
I use a custom script to set up all the path variables, but the output I
get is just the list of jobs that Mahout finds in mahout-distribution.
The frustrating thing is that I managed to do it right once but now I
can't seem to find the right way to do it.
I believe the reason things fail is that MAHOUT-JOB gets set to
/usr/local/mahout-distribution-0.6/mahout-examples-0.6-job.jar, and
driver.classes.props is loaded from there first.
Wouldn't it be more useful for people to be able to specify their own job jar?
This should be an easy fix and I can provide a patch.
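As a rough sketch of what such a patch could look like (the MAHOUT_JOB variable name and the glob are assumptions; the actual launcher script may differ), the idea is to fall back to the bundled examples job jar only when the user has not already supplied one:

```shell
#!/bin/sh
# Hypothetical sketch of the proposed bin/mahout change: respect a
# user-supplied MAHOUT_JOB and only fall back to the bundled examples jar.
pick_job_jar() {
  if [ -n "$MAHOUT_JOB" ]; then
    # the user already chose a job jar; keep it
    echo "$MAHOUT_JOB"
  else
    # fall back to the examples job jar shipped with the distribution
    ls "$MAHOUT_HOME"/mahout-examples-*-job.jar 2>/dev/null | head -n 1
  fi
}

MAHOUT_JOB=$(pick_job_jar)
echo "$MAHOUT_JOB"
```

With that in place, a custom job (and its own driver.classes.props) could be selected simply by exporting MAHOUT_JOB before calling the launcher.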
This is the script I use:
#!/bin/bash
HADOOP_HOME=/usr/lib/hadoop
MAHOUT_HOME=/usr/local/mahout-distribution-0.6
MAHOUT_JAVA_HOME=/usr/lib/jvm/java-6-sun
MAHOUT_CONF_DIR=${debian.install.base}/${debian.install.dir}
# add the custom job jar to the classpath
# the job jar contains a custom driver.classes.props
CLASSPATH=$CLASSPATH:$MAHOUT_CONF_DIR/${project.build.finalName}.jar
PATH=$PATH:$MAHOUT_HOME/bin
export PATH
# add project dependencies to CLASSPATH
# WARNING: the same libs, in the same or different versions, will also be
# added by Mahout; this may cause classpath-hell problems
for f in $MAHOUT_CONF_DIR/lib/*.jar; do
  CLASSPATH=${CLASSPATH}:$f
done
export HADOOP_HOME MAHOUT_HOME MAHOUT_JAVA_HOME MAHOUT_CONF_DIR CLASSPATH
mahout
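One thing I have been experimenting with (a minimal sketch, assuming the launcher keeps the inherited CLASSPATH order and the jar name shown here, which is hypothetical) is prepending the custom job jar instead of appending it, so that its driver.classes.props is seen before the copy inside mahout-examples-0.6-job.jar:

```shell
#!/bin/sh
# Minimal sketch: prepend the custom job jar (name is hypothetical) so its
# driver.classes.props shadows the one in mahout-examples-0.6-job.jar.
CUSTOM_JOB_JAR="$MAHOUT_CONF_DIR/my-vectorizer-job.jar"
# ${CLASSPATH:+:$CLASSPATH} appends the old value only when it is non-empty,
# avoiding a stray leading/trailing colon.
CLASSPATH="$CUSTOM_JOB_JAR${CLASSPATH:+:$CLASSPATH}"
export CLASSPATH
echo "$CLASSPATH"
```

I have not verified whether bin/mahout preserves this ordering when it builds its final classpath, so this may or may not be enough on its own.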
Cheers,
--
Ioan Eugen Stan
http://ieugen.blogspot.com