Hello,

For the company I work for, I have developed a custom vectorization job that runs on Hadoop/HBase and creates vector files. The job is designed to be run by MahoutDriver, but I'm having difficulties adding it to the classpath. What is the best way to do this?

I use a custom script to set up all the path variables, but the output I get is just the list of jobs that Mahout finds in the mahout-distribution job jar, not my custom job.

The frustrating thing is that I managed to get it right once, but now I can't seem to reproduce it.

I believe the reason things fail is that MAHOUT-JOB gets set to /usr/local/mahout-distribution-0.6/mahout-examples-0.6-job.jar, so driver.classes.props is loaded from there first.

Wouldn't it be more useful for people to be able to specify their own job jar? This should be an easy fix, and I can provide a patch.
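
For example, bin/mahout could fall back to the examples job jar only when nothing else is given. Something along these lines (just an untested sketch; the exact variable names would of course have to follow whatever the script already uses):

# only default to the examples job jar if the user hasn't supplied one,
# e.g. MAHOUT_JOB=/path/to/my-custom-job.jar mahout myjob ...
if [ -z "$MAHOUT_JOB" ]; then
  MAHOUT_JOB=$(ls "$MAHOUT_HOME"/mahout-examples-*-job.jar | head -n 1)
fi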

This is the script I use:

#!/bin/bash
HADOOP_HOME=/usr/lib/hadoop
MAHOUT_HOME=/usr/local/mahout-distribution-0.6
MAHOUT_JAVA_HOME=/usr/lib/jvm/java-6-sun
MAHOUT_CONF_DIR=${debian.install.base}/${debian.install.dir}

# add the custom job jar to the classpath
# the job jar contains a custom driver.classes.props (see the example after the script)
CLASSPATH=$CLASSPATH:$MAHOUT_CONF_DIR/${project.build.finalName}.jar


PATH=$PATH:$MAHOUT_HOME/bin
export PATH

# add project dependencies to CLASSPATH
# WARNING: mahout will also add its own (possibly different) versions of the same libs,
# which may lead to classpath hell
for f in $MAHOUT_CONF_DIR/lib/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done

export HADOOP_HOME MAHOUT_HOME MAHOUT_JAVA_HOME MAHOUT_CONF_DIR CLASSPATH
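# invoking mahout with no arguments prints the list of programs it found,
# which is how I check whether my job was picked up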
mahout
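
For reference, if I remember the format correctly, the custom driver.classes.props inside the job jar uses the same "class = shortName : description" layout as the one shipped in the Mahout job jar. Mine looks roughly like this (the class and short name below are placeholders, not the real ones):

com.example.vectorizer.VectorizationDriver = myvectorize : create vector files from HBase tables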


Cheers,
--
Ioan Eugen Stan
http://ieugen.blogspot.com
