I've made a little bit of progress here, but not much. Here's what I ran:
elastic-mapreduce -j <JOB> --jar s3n://news-vecs/mahout-core-0.4-SNAPSHOT.job \
  --main-class org.apache.mahout.clustering.kmeans.KMeansDriver \
  --arg --input --arg s3n://news-vecs/part-out.vec \
  --arg --clusters --arg s3n://news-vecs/kmeans/clusters/ \
  --arg --k --arg 10 \
  --arg --output --arg s3n://news-vecs/out/ \
  --arg --distanceMeasure --arg org.apache.mahout.common.distance.CosineDistanceMeasure \
  --arg --convergenceDelta --arg 0.001 \
  --arg --overwrite \
  --arg --maxIter --arg 50 \
  --arg --clustering \
  -v --debug
In the controller log, I see:
2010-09-11T23:49:16.958Z INFO Fetching jar file.
2010-09-11T23:49:20.723Z INFO Working dir /mnt/var/lib/hadoop/steps/1
2010-09-11T23:49:20.723Z INFO Executing /usr/lib/jvm/java-6-sun/bin/java -cp
/home/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-0.20-core.jar:/home/hadoop/hadoop-0.20-tools.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/*
-Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/1
-Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop -Dhadoop.id.str=hadoop
-Dhadoop.root.logger=INFO,DRFA -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/1/tmp
-Djava.library.path=/home/hadoop/lib/native/Linux-i386-32
org.apache.hadoop.util.RunJar
/mnt/var/lib/hadoop/steps/1/mahout-core-0.4-SNAPSHOT.job
org.apache.mahout.clustering.kmeans.KMeansDriver --input
s3n://news-vecs/part-out.vec --clusters s3n://news-vecs/kmeans/clusters/ --k 10
--output s3n://news-vecs/out/ --distanceMeasure
org.apache.mahout.common.distance.CosineDistanceMeasure --convergenceDelta
0.001 --overwrite --maxIter 50 --clustering
2010-09-11T23:49:23.302Z INFO Execution ended with ret val 0
2010-09-11T23:49:25.415Z INFO Step created jobs:
2010-09-11T23:49:25.416Z INFO Step succeeded
But then in the stdout log I see:
<snip>
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
 -archives <paths>             comma separated archives to be unarchived
                               on the compute machines.
 -conf <configuration file>    specify an application configuration file
 -D <property=value>           use value for given property
 -files <paths>                comma separated files to be copied to the
                               map reduce cluster
 -fs <local|namenode:port>     specify a namenode
 -jt <local|jobtracker:port>   specify a job tracker
 -libjars <paths>              comma separated jar files to include in the
                               classpath.
Job-Specific Options:
  --input (-i) input                        Path to job input directory.
  --output (-o) output                      The directory pathname for output.
  --distanceMeasure (-dm) distanceMeasure   The classname of the
                                            DistanceMeasure. Default is
                                            SquaredEuclidean
  --clusters (-c) clusters                  The input centroids, as Vectors.
                                            Must be a SequenceFile of
                                            Writable, Cluster/Canopy. If k is
                                            also specified, then a random set
                                            of vectors will be selected and
                                            written out to this path first
  --numClusters (-k) k                      The k in k-Means. If specified,
                                            then a random selection of k
                                            Vectors will be chosen as the
                                            Centroid and written to the
                                            clusters input path.
  --convergenceDelta (-cd) convergenceDelta The convergence delta value.
                                            Default is 0.5
  --maxIter (-x) maxIter                    The maximum number of iterations.
  --overwrite (-ow)                         If present, overwrite the output
                                            directory before running job
  --maxRed (-r) maxRed                      The number of reduce tasks.
                                            Defaults to 2
  --clustering (-cl)                        If present, run clustering after
                                            the iterations have taken place
  --method (-xm) method                     The execution method to use:
                                            sequential or mapreduce. Default
                                            is mapreduce
  --help (-h)                               Print out help
  --tempDir tempDir                         Intermediate output directory
  --startPhase startPhase                   First phase to run
  --endPhase endPhase                       Last phase to run
</snip>
Which, of course, shows that it isn't parsing the arguments. Perhaps it's the
s3n:// paths? I'm going to try running it by hand over ssh.
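One thing I notice in the usage output above: the long option is listed as --numClusters, with -k as the short form, so the "--k 10" in my step arguments may be what trips the parser. Here's a rough sketch of what I'll try over ssh on the master node (the hostname is a placeholder; the jar path and the other arguments are copied from the controller log above):

```shell
# Sketch only: re-run the same step by hand on the EMR master node.
# <master-public-dns> is a placeholder; jar path matches the controller log.
ssh hadoop@<master-public-dns> \
  hadoop jar /mnt/var/lib/hadoop/steps/1/mahout-core-0.4-SNAPSHOT.job \
    org.apache.mahout.clustering.kmeans.KMeansDriver \
    --input s3n://news-vecs/part-out.vec \
    --clusters s3n://news-vecs/kmeans/clusters/ \
    --numClusters 10 \
    --output s3n://news-vecs/out/ \
    --distanceMeasure org.apache.mahout.common.distance.CosineDistanceMeasure \
    --convergenceDelta 0.001 \
    --overwrite --maxIter 50 --clustering
```

If that parses cleanly on the master, the problem is in how the EMR step passes arguments; if it prints the same usage text, the argument names themselves are wrong.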
-Grant
On Sep 2, 2010, at 1:04 PM, Drew Farris wrote:
> Were there specific issues you ran into? I suspect the documentation
> on the wiki is out of date.
>
> Drew
>
> On Sun, Aug 29, 2010 at 10:58 AM, Grant Ingersoll <[email protected]> wrote:
>> Has anyone successfully run any of the clustering algorithms on Amazon's
>> Elastic Map Reduce? If so, please share your steps.
>>
>> Thanks,
>> Grant
--------------------------
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8