Hi,

Thanks, I checked and you may be right. It looks like the RowSimilarityJob
did not run, which is why there was no similarityMatrix directory.

I tried running only the RowSimilarityJob using the command below:

hadoop jar core/target/mahout-core-0.4-job.jar \
  org.apache.mahout.math.hadoop.similarity.RowSimilarityJob \
  -Dmapred.input.dir=/data/temp/itemUserMatrix \
  -Dmapred.output.dir=/data/temp/similarityMatrix \
  --numberOfColumns 6040 \
  --similarityClassname SIMILARITY_COOCCURRENCE \
  --maxSimilaritiesPerRow 100 \
  --tempDir "/data/temp"
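For reference, the usage text printed below also lists --input/--output job-specific flags, so an equivalent invocation might look like the following. This is only an untested sketch using the same paths and values as above; I have not verified that it avoids the parse error:

```shell
# Hypothetical variant of the same invocation, passing the input and output
# paths via the job-specific --input/--output options instead of -D properties.
# All paths and values are copied from the original command.
hadoop jar core/target/mahout-core-0.4-job.jar \
  org.apache.mahout.math.hadoop.similarity.RowSimilarityJob \
  --input /data/temp/itemUserMatrix \
  --output /data/temp/similarityMatrix \
  --numberOfColumns 6040 \
  --similarityClassname SIMILARITY_COOCCURRENCE \
  --maxSimilaritiesPerRow 100 \
  --tempDir /data/temp
```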

Then I get the error below. It looks like the usage is not correct, but I am
wondering what might be wrong.

10/12/21 08:36:23 ERROR common.AbstractJob: Unexpected /data/temp while
processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
 Generic Options:
 -archives <paths>             comma separated archives to be unarchived
                               on the compute machines.
 -conf <configuration file>    specify an application configuration file
 -D <property=value>           use value for given property
 -files <paths>                comma separated files to be copied to the
                               map reduce cluster
 -fs <local|namenode:port>     specify a namenode
 -jt <local|jobtracker:port>   specify a job tracker
 -libjars <paths>              comma separated jar files to include in the
                               classpath.
Job-Specific Options:
  --input (-i) input                                    Path to job input
                                                        directory.
  --output (-o) output                                  The directory pathname
                                                        for output.
  --numberOfColumns (-r) numberOfColumns                Number of columns in
                                                        the input matrix
  --similarityClassname (-s) similarityClassname        Name of distributed
                                                        similarity class to
                                                        instantiate,
                                                        alternatively use one
                                                        of the predefined
                                                        similarities
                                                        ([SIMILARITY_COOCCURRENCE,
                                                        SIMILARITY_EUCLIDEAN_DISTANCE,
                                                        SIMILARITY_LOGLIKELIHOOD,
                                                        SIMILARITY_PEARSON_CORRELATION,
                                                        SIMILARITY_TANIMOTO_COEFFICIENT,
                                                        SIMILARITY_UNCENTERED_COSINE,
                                                        SIMILARITY_UNCENTERED_ZERO_ASSUMING_COSINE])
  --maxSimilaritiesPerRow (-m) maxSimilaritiesPerRow    Number of maximum
                                                        similarities per row
                                                        (default: 100)
  --help (-h)                                           Print out help
  --tempDir tempDir                                     Intermediate output
                                                        directory
  --startPhase startPhase                               First phase to run
  --endPhase endPhase                                   Last phase to run

Thanks,
Gayatri
On Mon, Dec 20, 2010 at 1:31 PM, Sebastian Schelter <[email protected]> wrote:

> Hi,
>
> can you post the exact parameters you used to call the job? And please have
> another look at your error logs; I suspect that something else already went
> wrong before the exception you posted occurred. Could you check that too?
>
> --sebastian
>
>
> On 20.12.2010 08:50, Gayatri Rao wrote:
>
>> Hi,
>>
>> I have been trying to run the Hadoop Item Based Collaborative Filtering
>> Job
>> as described in
>> https://cwiki.apache.org/confluence/display/MAHOUT/TasteCommandLine
>> A few MR jobs run successfully:
>>
>> (RecommenderJob-ItemIDIndexMapper-ItemIDIndexReduce,
>> RecommenderJob-ToItemPrefsMapper-ToUserVectorReduc,
>> RecommenderJob-CountUsersMapper-CountUsersReducer,
>> RecommenderJob-MaybePruneRowsMapper-ToItemVectorsR)
>>
>> After that, the job dies with the exception below:
>>
>>
>> Exception in thread "main"
>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
>> Input path does not exist: /data/temp/similarityMatrix
>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
>>     at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:234)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:328)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> I find the following files in /data/temp:
>>
>> [gaya...@e1aeu046110d mahout-distribution-0.4]$ hadoop dfs -ls /data/temp
>> Found 4 items
>> drwxr-xr-x   - gayatri supergroup          0 2010-12-17 16:53
>> /data/temp/countUsers
>> drwxr-xr-x   - gayatri supergroup          0 2010-12-17 16:51
>> /data/temp/itemIDIndex
>> drwxr-xr-x   - gayatri supergroup          0 2010-12-17 16:54
>> /data/temp/itemUserMatrix
>> drwxr-xr-x   - gayatri supergroup          0 2010-12-17 16:52
>> /data/temp/userVectors
>>
>> Is this a configuration issue? I cannot figure out what the error
>> might be.
>>
>> Thanks
>> Gayatri
>>
>>
>