Which version of Mahout?

On Mon, Jul 21, 2014 at 11:05 AM, Serega Sheypak <serega.shey...@gmail.com>
wrote:

> Hi, I've tried it and got: Unexpected --outputPathForSimilarityMatrix while
> processing Job-Specific
>
> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/output
> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/temp
> sudo -u oozie mahout recommenditembased \
>                     --input \
>                     hdfs://nameservice1/user/hive/warehouse/staging_weighted_visits_and_rec_clicks \
>                     --output \
>                     hdfs://nameservice1/recommenditembased/output \
>                     --similarityClassname \
>                     SIMILARITY_LOGLIKELIHOOD \
>                    --numRecommendations \
>                     500 \
>                     --booleanData \
>                     false \
>                     --maxPrefsPerUser \
>                     1000 \
>                     --maxSimilaritiesPerItem \
>                     1000 \
>                     --minPrefsPerUser \
>                     5 \
>                     --maxPrefsPerUserInItemSimilarity \
>                     30 \
>                     --threshold \
>                    1.1 \
>                     --tempDir \
>                     hdfs://nameservice1/recommenditembased/temp \
>                     --outputPathForSimilarityMatrix \
>                     hdfs://nameservice1/recommenditembased/sim_matrix
>
>
> I'm on Cloudera CDH 4.7; it looks like this option is not supported there.
>
>
> 2014-07-21 11:18 GMT+04:00 Peng Zhang <pzhang.x...@gmail.com>:
>
> > Serega,
> >
> > See the last line on how to pass outputPathForSimilarityMatrix options to
> > the recommenditembased command:
> >
> > sudo -u oozie mahout recommenditembased \
> >                    --input visited_items_with_inverted_items \
> >                    --output result \
> >                    --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> >                    --usersFile inverted_items \
> >                    --numRecommendations 500 \
> >                    --booleanData false \
> >                    --maxPrefsPerUser 100 \
> >                    --maxSimilaritiesPerItem 500 \
> >                    --minPrefsPerUser 0 \
> >                    --maxPrefsPerUserInItemSimilarity 30 \
> >                    --threshold 0.91 \
> >                    --tempDir temp \
> >                    --outputPathForSimilarityMatrix similarityMatri
> >
> >
> > Peng Zhang
> > pzhang.x...@gmail.com
> >
> >
> >
> >
> >
> > On Jul 21, 2014, at 3:09 PM, Serega Sheypak <serega.shey...@gmail.com>
> > wrote:
> >
> > > I've inspected the code; our approach wouldn't work with booleanData=false.
> > > We calculate item similarity the wrong way...(((
> > > Thank you
> > > 1. We provide a "fake" user_id and pass --usersFile in order to get
> > > recommendations for the "fake" user_id, where user_id is a negative item_id.
> > > It worked when we provided user_id->item_id pairs without preferences.
> > > 2. Our target is to get item similarities. We tried
> > > org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob, but it
> > > returns worse results than RecommenderJob with our "fake" user_id
> > > (inverted item_id).
> > >
> > > 1. I'll try the option you provided.
> > > 2. I will remove the input with fake user_ids and the usersFile with these
> > > fake ids.
> > >
> > > 3.
> > > https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
> > > I don't understand how to pass the --outputPathForSimilarityMatrix option to
> > > RecommenderJob.
> > >
> > >
> > > 2014-07-21 4:58 GMT+04:00 Peng Zhang <pzhang.x...@gmail.com>:
> > >
> > >> Serega,
> > >>
> > >> I have two comments:
> > >> 1. Don’t use negative user ids. Since Mahout uses the user id as well as the
> > >> item id as the row/column index, you’d better use 0, 1, 2, etc. as ids.
> > >> 2. If you want to get the item similarity information, you can use
> > >> --outputPathForSimilarityMatrix in the command.
> > >>
> > >> Regards,
> > >> Peng Zhang
> > >> M: +86 186-1658-7856
> > >> pzhang.x...@gmail.com
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Jul 21, 2014, at 4:00 AM, Serega Sheypak <serega.shey...@gmail.com>
> > >> wrote:
> > >>
> > >>> All bad things happen here:
> > >>>
> > >>>
> > >>>
> > >>> Name: RecommenderJob-PartialMultiplyMapper-Reducer
> > >>> User: oozie
> > >>> Process User: oozie
> > >>> Group: oozie
> > >>> Mapper Class: PartialMultiplyMapper
> > >>> Reducer Class: AggregateAndRecommendReducer
> > >>> Job Input Directory: hdfs://nameservice1/itemrec/temp/partialMultiply
> > >>> Job Output Directory: hdfs://nameservice1/itemrec/output/
> > >>>
> > >>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
> > >>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
> > >>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
> > >>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
> > >>>
> > >>> Why does Mahout return 0 rows? It works when booleanData=true
> > >>> (preferences are ignored...?)
> > >>>
> > >>>
> > >>>
> > >>> 2014-07-20 23:19 GMT+04:00 Serega Sheypak <serega.shey...@gmail.com>:
> > >>>
> > >>>> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
> > >>>> users_file:
> > >>>> --inverted_item_id
> > >>>> -1
> > >>>> -2
> > >>>> -3
> > >>>> -4
> > >>>>
> > >>>> users_items_prefs
> > >>>> --inverted item_id
> > >>>> -1 1 1.0
> > >>>> -2 2 1.0
> > >>>> -3 3 1.0
> > >>>> -4 4 1.0
> > >>>> --user_id item_id pref_value
> > >>>> 11   1 1.6
> > >>>> 11   2 1.6
> > >>>> 123 3 2.0
> > >>>> 123 4 2.0
> > >>>> 333 1 2.0
> > >>>> 333 2 1.6
> > >>>> --e.t.c.
> > >>>>
> > >>>> if I set --booleanData true
> > >>>> then mahout returns the result.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <andrew.mussel...@gmail.com>:
> > >>>>
> > >>>>> I'm confused about how you're constructing the user file, and why there
> > >>>>> are negated item ids here.
> > >>>>>
> > >>>>> Can you post some more details please, including Mahout version and some
> > >>>>> sample data sets?
> > >>>>>
> > >>>>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <serega.shey...@gmail.com>
> > >>>>> wrote:
> > >>>>>>
> > >>>>>> Hi, I'm trying to create item similarity.
> > >>>>>> I gather items which users visit during shopping and then create a file:
> > >>>>>> user_id, item_id, weight (where weight can be: [1.0, 1.6, 1.9], depending
> > >>>>>> on user action type and data source)
> > >>>>>> UNION
> > >>>>>> -item_id, item_id, 1.0 (from items dictionary)
> > >>>>>>
> > >>>>>> and I provide a usersFile, where user_id = -item_id
> > >>>>>>
> > >>>>>> The idea is to get item similarity. If any user visits an item named "A",
> > >>>>>> I want to show him items "B", "c", "xxx" using the preferences of other
> > >>>>>> users.
> > >>>>>>
> > >>>>>> The problem is that the last (???) mapreduce job returns 0 rows:
> > >>>>>>
> > >>>>>> Here are my settings:
> > >>>>>>
> > >>>>>>
> > >>>>>> sudo -u oozie mahout recommenditembased \
> > >>>>>>                  --input visited_items_with_inverted_items \
> > >>>>>>                  --output result \
> > >>>>>>                  --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> > >>>>>>                  --usersFile inverted_items \
> > >>>>>>                  --numRecommendations 500 \
> > >>>>>>                  --booleanData false \
> > >>>>>>                  --maxPrefsPerUser 100 \
> > >>>>>>                  --maxSimilaritiesPerItem 500 \
> > >>>>>>                  --minPrefsPerUser 0 \
> > >>>>>>                  --maxPrefsPerUserInItemSimilarity 30 \
> > >>>>>>                  --threshold 0.91 \
> > >>>>>>                  --tempDir temp
> > >>>>>>
> > >>>>>> Some counters... I don't get what they mean....
> > >>>>>>
> > >>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
> > >>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
> > >>>>>>
> > >>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
> > >>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
> > >>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
> > >>>>>>
> > >>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> > >>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
> > >>>>>>
> > >>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> > >>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
> > >>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
> > >>>>>>
> > >>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
> > >>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
> > >>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
> > >>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
> > >>>>>>
> > >>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> > >>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> > >>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> > >>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> > >>>>>>
> > >>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
> > >>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
> > >>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
> > >>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
> > >>>>>>
> > >>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
> > >>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
> > >>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
> > >>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
> > >>>>>>
> > >>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
> > >>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
> > >>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
> > >>>>>>
> > >>>>>> --------
> > >>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
> > >>>>>> --------
> > >>>>>>
> > >>>>>> why 0???
> > >>>>>
> > >>>>
> > >>>>
> > >>
> > >>
> >
> >
>
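
A note on Peng's first comment in the thread (use 0, 1, 2, etc. instead of negative ids): the remapping can be done as a preprocessing step before the data reaches Mahout. Below is a minimal Python sketch, not part of Mahout. The function name remap_ids and the tuple layout are hypothetical, and the rows are copied from the users_items_prefs sample quoted above. Whether this alone fixes the 0-row output is not established in the thread.

```python
def remap_ids(rows):
    """Map arbitrary (possibly negative) user and item ids to 0, 1, 2, ...

    rows: iterable of (user_id, item_id, pref) tuples.
    Returns the remapped rows plus both lookup tables, so results can be
    translated back to the original ids afterwards.
    """
    user_index, item_index = {}, {}
    remapped = []
    for user_id, item_id, pref in rows:
        # setdefault assigns the next free index on first sight of an id
        u = user_index.setdefault(user_id, len(user_index))
        i = item_index.setdefault(item_id, len(item_index))
        remapped.append((u, i, pref))
    return remapped, user_index, item_index


# Rows taken from the sample data in the thread (hypothetical subset).
rows = [(-1, 1, 1.0), (-2, 2, 1.0), (11, 1, 1.6), (11, 2, 1.6), (333, 1, 2.0)]
remapped, user_index, item_index = remap_ids(rows)
for u, i, pref in remapped:
    print(u, i, pref)
```

Keeping user_index and item_index around matters: the recommender output would carry the remapped ids, so the inverse mapping is needed to get back to the real item ids.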
