Hmm, your parameters look correct. You could try removing the quotation
marks around /data/temp and checking for stray whitespace characters.
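For example, the invocation might look like this with the quotes dropped (this is just a sketch; the jar name, paths, and option values are copied from your earlier mail, so adjust them to your setup):

```shell
# Same RowSimilarityJob invocation, but with the unquoted --tempDir value
hadoop jar core/target/mahout-core-0.4-job.jar \
  org.apache.mahout.math.hadoop.similarity.RowSimilarityJob \
  -Dmapred.input.dir=/data/temp/itemUserMatrix \
  -Dmapred.output.dir=/data/temp/similarityMatrix \
  --numberOfColumns 6040 \
  --similarityClassname SIMILARITY_COOCCURRENCE \
  --maxSimilaritiesPerRow 100 \
  --tempDir /data/temp
```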
--sebastian
On 21.12.2010 04:12, Gayatri Rao wrote:
Hi,
Thanks, I checked and you may be right. It looks like the RowSimilarityJob
did not run, which is why there was no similarityMatrix directory.
I tried running only the RowSimilarityJob using the command below:

hadoop jar core/target/mahout-core-0.4-job.jar \
  org.apache.mahout.math.hadoop.similarity.RowSimilarityJob \
  -Dmapred.input.dir=/data/temp/itemUserMatrix \
  -Dmapred.output.dir=/data/temp/similarityMatrix \
  --numberOfColumns 6040 \
  --similarityClassname SIMILARITY_COOCCURRENCE \
  --maxSimilaritiesPerRow 100 \
  --tempDir "/data/temp"
Then I get the error below. It says the usage is not correct, but I am
not sure what might be wrong.
10/12/21 08:36:23 ERROR common.AbstractJob: Unexpected /data/temp while
processing Job-Specific Options:

usage: <command> [Generic Options] [Job-Specific Options]

Generic Options:
 -archives <paths>             comma separated archives to be unarchived
                               on the compute machines.
 -conf <configuration file>    specify an application configuration file
 -D <property=value>           use value for given property
 -files <paths>                comma separated files to be copied to the
                               map reduce cluster
 -fs <local|namenode:port>     specify a namenode
 -jt <local|jobtracker:port>   specify a job tracker
 -libjars <paths>              comma separated jar files to include in the
                               classpath.

Job-Specific Options:
  --input (-i) input                                  Path to job input directory.
  --output (-o) output                                The directory pathname for output.
  --numberOfColumns (-r) numberOfColumns              Number of columns in the input matrix
  --similarityClassname (-s) similarityClassname      Name of distributed similarity class to
                                                      instantiate, alternatively use one of
                                                      the predefined similarities
                                                      ([SIMILARITY_COOCCURRENCE,
                                                      SIMILARITY_EUCLIDEAN_DISTANCE,
                                                      SIMILARITY_LOGLIKELIHOOD,
                                                      SIMILARITY_PEARSON_CORRELATION,
                                                      SIMILARITY_TANIMOTO_COEFFICIENT,
                                                      SIMILARITY_UNCENTERED_COSINE,
                                                      SIMILARITY_UNCENTERED_ZERO_ASSUMING_COSINE])
  --maxSimilaritiesPerRow (-m) maxSimilaritiesPerRow  Number of maximum similarities per row
                                                      (default: 100)
  --help (-h)                                         Print out help
  --tempDir tempDir                                   Intermediate output directory
  --startPhase startPhase                             First phase to run
  --endPhase endPhase                                 Last phase to run
Thanks,
Gayatri
On Mon, Dec 20, 2010 at 1:31 PM, Sebastian Schelter<[email protected]> wrote:
Hi,
can you post the exact parameters you used to call the job? And please have
another look at your error logs: I suspect that something else already went
wrong before the exception you posted occurred. Could you check that too?
--sebastian
On 20.12.2010 08:50, Gayatri Rao wrote:
Hi,
I have been trying to run the Hadoop Item Based Collaborative Filtering Job
as described in
https://cwiki.apache.org/confluence/display/MAHOUT/TasteCommandLine
A few MR jobs run successfully
(RecommenderJob-ItemIDIndexMapper-ItemIDIndexReduce,
RecommenderJob-ToItemPrefsMapper-ToUserVectorReduc,
RecommenderJob-CountUsersMapper-CountUsersReducer,
RecommenderJob-MaybePruneRowsMapper-ToItemVectorsR),
after which the job dies with an exception:
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /data/temp/similarityMatrix
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:234)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:328)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I find the following files in /data/temp
[gaya...@e1aeu046110d mahout-distribution-0.4]$ hadoop dfs -ls /data/temp
Found 4 items
drwxr-xr-x - gayatri supergroup 0 2010-12-17 16:53
/data/temp/countUsers
drwxr-xr-x - gayatri supergroup 0 2010-12-17 16:51
/data/temp/itemIDIndex
drwxr-xr-x - gayatri supergroup 0 2010-12-17 16:54
/data/temp/itemUserMatrix
drwxr-xr-x - gayatri supergroup 0 2010-12-17 16:52
/data/temp/userVectors
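Since the exception complains about a missing /data/temp/similarityMatrix, one quick way to confirm it was never created (a sketch using the same `hadoop dfs` CLI as the listing above) is:

```shell
# Exit status 0 means the path exists and is a directory; non-zero means it is missing
hadoop dfs -test -d /data/temp/similarityMatrix && echo "exists" || echo "missing"
```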
Is this a configuration issue? I am not able to understand what the error
might be.
Thanks
Gayatri