It seems like seqdirectory expects its input to be on HDFS rather than on the local filesystem. Running the command below writes an empty output directory to HDFS:

MAHOUT_LOCAL=true $MAHOUT seqdirectory \
        -i mahout-work/reuters-out \
        -o mahout-work/reuters-out-seqdir \
        -c UTF-8 -chunk 5

If I put the input directory into HDFS, everything works as expected. Does seqdirectory expect its input to be on HDFS, i.e. is this the expected behavior? If so, should the example be updated?
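
For anyone who wants to confirm where the output actually ended up, here is a rough sketch of a check. The helper name `local_dir_has_files` is made up for illustration, and the HDFS branch assumes the `hadoop` CLI is on the PATH; it is not taken from build-reuters.sh.

```shell
# Hypothetical helper (not part of Mahout or build-reuters.sh):
# succeeds only if the directory exists on the local filesystem
# and contains at least one regular file.
local_dir_has_files() {
    [ -d "$1" ] && [ -n "$(find "$1" -type f -print 2>/dev/null | head -n 1)" ]
}

OUT=mahout-work/reuters-out-seqdir

if local_dir_has_files "$OUT"; then
    echo "seqdirectory output found on local disk: $OUT"
elif command -v hadoop >/dev/null 2>&1 && hadoop fs -test -e "$OUT"; then
    # A relative path here resolves against the user's HDFS home directory.
    echo "seqdirectory output path exists on HDFS: $OUT"
else
    echo "no seqdirectory output found at: $OUT"
fi
```

Running this right after the seqdirectory step should show whether MAHOUT_LOCAL=true had any effect on where the sequence files were written.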

On 6/5/11 11:07 AM, Mark wrote:
Hi all. I'm trying to run the examples/bin/build-reuters.sh but I continue to run into the following exception.

INFO: Deleting mahout-work/reuters-kmeans-clusters
Jun 5, 2011 10:29:37 AM org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Jun 5, 2011 10:29:37 AM org.apache.hadoop.io.compress.CodecPool getCompressor
INFO: Got brand-new compressor
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:108)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:101)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:58)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)

I am also confused reading the build-reuters.sh code itself. There seems to be a disconnect between what is expected to be local and what should be on HDFS. For example, the comments on lines 77-79 are:

# we know reuters-out-seqdir exists on a local disk at
# this point, if we're running in clustered mode,
# copy it up to hdfs

However, upon inspection you'll notice that reuters-out-seqdir is actually on HDFS. It seems like seqdirectory will never write to local disk, even with the MAHOUT_LOCAL=true flag set.

Any ideas?

Thanks
