Re: Run ItemSimilarityJob Problem

Pat Ferrel Thu, 16 Apr 2015 06:32:58 -0700

As I said below, use “mahout itemsimilarity …”

“mahout” will show a list of commands
“mahout itemsimilarity” will show the command help


You are using HDFS, and I suspect /home/hadoop/itembased/user_item is not a 
valid HDFS path. If so, put the data in HDFS and use that path. There is 
usually no need to specify the temp dir.
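
As a rough sketch of what that might look like: the HDFS directory /user/hadoop/itembased is only an assumed example (substitute whatever location you actually use), and `-mkdir -p` assumes a Hadoop 2.x shell. The job options are the ones from your own command.

```shell
#!/bin/sh
# Sketch only. HDFS_DIR is a hypothetical HDFS location, not a path from this thread.
HDFS_DIR="${HDFS_DIR:-/user/hadoop/itembased}"
# Local file from the original command; assumed to live on the local disk.
LOCAL_INPUT="${LOCAL_INPUT:-/home/hadoop/itembased/user_item}"

# Copy the local input file into HDFS, then list the directory to confirm.
stage_input() {
    hadoop fs -mkdir -p "$HDFS_DIR" &&
    hadoop fs -put "$LOCAL_INPUT" "$HDFS_DIR/user_item" &&
    hadoop fs -ls "$HDFS_DIR"
}

# Run the job against the HDFS paths, leaving the temp dir at its default.
run_itemsimilarity() {
    mahout itemsimilarity \
        -i "$HDFS_DIR/user_item" \
        -o "$HDFS_DIR/output" \
        -s SIMILARITY_EUCLIDEAN_DISTANCE
}
```

Calling stage_input and then run_itemsimilarity would copy the data and start the job. If the mahout script is not on your PATH, the same HDFS paths should also work with the hadoop jar form you used.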

On Apr 14, 2015, at 9:05 PM, lastarsenal <lastarse...@163.com> wrote:

Hi, Pat,


    I have tried giving ItemSimilarityJob a minimal set of arguments, as below:


hadoop jar mahout-core-0.9-job.jar \
  org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob \
  -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output \
  -s SIMILARITY_EUCLIDEAN_DISTANCE


The argument parser error went away, but another error appeared:
Exception in thread "main" java.io.IOException: resolve path must start with /, temp/prepareRatingMatrix/numUsers.bin
       at org.apache.hadoop.fs.viewfs.MountTree.resolve(MountTree.java:272)
       at org.apache.hadoop.fs.viewfs.ViewFs.open(ViewFs.java:139)
       at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:394)
       at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:339)
       at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:147)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
       at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:93)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:601)
       at org.apache.hadoop.util.RunJar.main(RunJar.java:166)


Then I tried adding the --tempDir argument:
hadoop jar mahout-core-0.9-job.jar \
  org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob \
  -i /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output \
  -s SIMILARITY_EUCLIDEAN_DISTANCE --tempDir=/tmp
The argument parser error came back:
ERROR common.AbstractJob: Unexpected --tempDir=/tmp while processing Job-Specific Options:
Unexpected --tempDir=/tmp while processing Job-Specific Options:
Usage:
[--input <input> --output <output> --similarityClassname <similarityClassname>
--maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
--minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
<threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
<startPhase> --endPhase <endPhase>]


So... you advised using the command line, “mahout xxx”; however, there is no 
mahout command on my system. How can I solve this?


Thanks a lot! 

On 2015-04-15 03:13:23, "Pat Ferrel" <p...@occamsmachete.com> wrote:

> Also, you don’t need to specify -mp 0; having no minimum is always allowed. 
> The option sets a minimum only if you want one, so -mp 0 is not valid. Omit it.
> 
> On Apr 14, 2015, at 11:59 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
> 
> use 
> 
> “mahout itemsimilarity …”
> 
> But be aware that you have to convert all your user and item ids into 
> non-negative ints. Basically inside Mahout-MapReduce they are assumed to be 
> row and column numbers in a big matrix of all input. 
> 
> BTW no need to move data, Mahout-Spark reads anything Mahout-MapReduce can 
> read without the ID restrictions.
> 
> On Apr 12, 2015, at 8:04 PM, lastarsenal <lastarse...@163.com> wrote:
> 
> Hi, Pat,
> I think it would be better to keep the existing system rather than make a 
> large-scale data transfer. 
> 
> 
> So I would greatly appreciate it if somebody could give advice based on 
> Hadoop. Thank you.
> 
> 
> 
> 
> 
> On 2015-04-13 00:33:48, "Pat Ferrel" <p...@occamsmachete.com> wrote:
>> You are invoking it incorrectly but I’d suggest using the newer Spark 
>> version. It’s easier to use and about 10x faster.
>> 
>> You’ll need to install Spark alongside Mahout then invoke with:
>> 
>> mahout spark-itemsimilarity -i input -o output ….
>> 
>> The driver is documented here: 
>> http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
>> 
>> 
>> On Apr 11, 2015, at 12:34 AM, lastarsenal <lastarse...@163.com> wrote:
>> 
>> Hi,
>> 
>> I'm new to Mahout. Recently, when I tried to run ItemSimilarityJob on 
>> my own Hadoop installation, I ran into a problem. The command is:
>> 
>> hadoop jar mahout-core-0.9-job.jar 
>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i 
>> /home/hadoop/itembased/user_item -o /home/hadoop/itembased/output -s 
>> SIMILARITY_EUCLIDEAN_DISTANCE -mp 0 -b true --startPhase 0 --endPhase 0
>> 
>> 
>> There is 1 error:
>> 15/04/10 15:06:02 ERROR common.AbstractJob: Unexpected 0 while processing Job-Specific Options:
>> Unexpected 0 while processing Job-Specific Options:
>> Usage:
>> [--input <input> --output <output> --similarityClassname <similarityClassname>
>> --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefs <maxPrefs>
>> --minPrefsPerUser <minPrefsPerUser> --booleanData <booleanData> --threshold
>> <threshold> --randomSeed <randomSeed> --help --tempDir <tempDir> --startPhase
>> <startPhase> --endPhase <endPhase>]
>> 
>> 
>> What's the reason for this error? Thank you!
>> 
>> 
>> Best Regards,
>> lastarsenal
>> 
> 
>
