Both of those jobs require you to create Mahout IDs for users and items. For most
Hadoop-based Mahout jobs, whether they take text input or sequence files, the IDs
must follow the rules mentioned below. There are a few exceptions, but none that you
are using. The wiki was rewritten for 0.9, so the ID requirements may not be
documented well. You can file a Jira so someone documents this.
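
A minimal sketch of that ID translation, written in Python only for brevity. The IdDictionary class and the encode function are illustrative names and are not part of Mahout; in Java a Guava HashBiMap can play the same role. It assumes the input has already been parsed into (user, item, preference) tuples.

```python
# Illustrative only: IdDictionary and encode are hypothetical helpers,
# not Mahout APIs. They assign sequential 0-based IDs in first-seen order.
class IdDictionary:
    def __init__(self):
        self.to_mahout = {}  # application ID -> Mahout ID
        self.to_app = []     # Mahout ID -> application ID (list index is the Mahout ID)

    def mahout_id(self, app_id):
        # Give a new application ID the next unused integer, starting from 0.
        if app_id not in self.to_mahout:
            self.to_mahout[app_id] = len(self.to_app)
            self.to_app.append(app_id)
        return self.to_mahout[app_id]

    def app_id(self, mahout_id):
        # Reverse lookup, used to translate job output back.
        return self.to_app[mahout_id]


def encode(rows, users, items):
    """Turn (user, item, pref) tuples into 'userID,itemID,pref' text lines."""
    return ["%d,%d,%s" % (users.mahout_id(u), items.mahout_id(i), p)
            for u, i, p in rows]


users, items = IdDictionary(), IdDictionary()
rows = [("-92", "abc", "1.0"), ("75000x", "jkl", "2.0")]
lines = encode(rows, users, items)
print(lines)            # ['0,0,1.0', '1,1,2.0']
print(users.app_id(0))  # -92
```

Users and items get separate dictionaries because each has its own 0-based range. Both dictionaries have to be kept after the job runs so the Mahout IDs in the output can be mapped back.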

BTW spark-itemsimilarity will take any IDs and can read any text-delimited file
format; unfortunately, it’s not quite ready yet.
 
On Jul 26, 2014, at 3:14 AM, Serega Sheypak <serega.shey...@gmail.com> wrote:

Hm... rather confusing... You are talking about input for:
org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
or
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

My target is to get item-item similarity. Right now ItemSimilarityJob
returns only a few similarities.

I'm reading this:
https://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
and that:
https://mahout.apache.org/users/recommender/userbased-5-minutes.html

I don't see anything there about "Your IDs must be in the range from 0 to
the number of rows" for both items and users. Where does this requirement
come from?


2014-07-25 23:57 GMT+04:00 Pat Ferrel <pat.fer...@gmail.com>:

> I think I did explain below. Your IDs must be in the range from 0 to the
> number of rows - 1 and the same for item IDs. This is done by taking your
> application specific IDs and mapping them to sequential non-negative
> Integers. You need to maintain a mapping to/from Mahout IDs somewhere in
> your own code.
> 
> For example imagine input of the form
> -92, abc, 1.0
> 75000x, jkl, 2.0
> 
> Your first user ID is -92, give it Mahout ID = 0. For your next user ID
> 75000x give it Mahout ID = 1
> Your first item ID is abc, give it Mahout ID = 0. For your next item ID
> jkl give it Mahout ID = 1
> keep doing this the first time you see a unique id from your input. A Map
> will do this for you.
> 
> And so on. Then the input to Mahout would be:
> 0,0,1.0
> 1,1,2.0
> 
> The output will have Mahout IDs too so you need to map recommendations for
> Mahout User ID 0 back to your User ID of -92, and the same for all item IDs.
> 
> 
> On Jul 25, 2014, at 11:55 AM, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
> 
> I'm preparing data using Apache Hive: user_id:long, item_id:long,
> preference[1.0, 2.0]
> I don't understand "For most Mahout jobs you have to prepare your data to
> have Mahout IDs". What are "Mahout IDs"? I tried to follow the Mahout site
> docs, but I didn't find anything there related to Mahout IDs.
> Please explain.
> 
> 
> 2014-07-25 22:39 GMT+04:00 Pat Ferrel <pat.fer...@gmail.com>:
> 
>> Sorry I haven’t read this thread carefully but it looks like you may be
>> using the wrong IDs.
>> 
>> For most Mahout jobs you have to prepare your data to have Mahout IDs. You
>> do this by looking at each datum and, as you see a new unique application
>> specific user or item ID, you give it a Mahout ID starting from 0. So a
>> Mahout ID can be thought of as a row or column number in a matrix. The
>> Mahout IDs for rows will be 0 through # of rows - 1, and the same for columns.
>> 
>> This always requires that you translate into Mahout IDs, then after the job
>> is run translate back into your application IDs. You need a bi-directional
>> dictionary of some type. I use a HashBiMap from Guava.
>> 
>> Also I’d avoid the threshold for now. If you get that wrong it will mess
>> things up badly and is very hard to tune. It’s there for completeness but I
>> never use it.
>> 
>> 
>> On Jul 25, 2014, at 12:55 AM, Serega Sheypak <serega.shey...@gmail.com>
>> wrote:
>> 
>> Hi, nothing helps...
>> I do use mahout 0.9 compiled for CDH 4.7
>> I do provide only positive values
>> I do use itemsimilarityJob and do get 2000 similarities for 1400 unique
>> items
>> Input data is:
>> 16*10^6 preferences
>> 4*10^6 users
>> 0.6*10^ items
>> I do use Pearson correlation and preference values are: 1.0 and 2.0
>> 
>> 
>> 2014-07-22 9:32 GMT+04:00 Serega Sheypak <serega.shey...@gmail.com>:
>> 
>>> Ok, I have recompiled mahout 0.9 for CDH 4.7. I'll try this evening.
>>> Right now I don't see how it can help me. As far as I know the stuff I try
>>> to use is pretty old and stable.
>>> Looks like I apply it in the wrong way.
>>> 
>>> There is an option for recommenditembased named "--threshold". I do
>>> provide data for recommenditembased with preference values in range
>>> [1.1..2.0].
>>> I set --threshold to 1.2
>>> Is --threshold absolute, so it can be from [1.1 .. 2+], or is it relative,
>>> so it can be [0.0 .. 0.99999]?
>>> 
>>> 
>>> 2014-07-22 3:54 GMT+04:00 Ted Dunning <ted.dunn...@gmail.com>:
>>> 
>>>> That version is no longer supported.  You should upgrade to 0.9
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Mon, Jul 21, 2014 at 11:41 AM, Serega Sheypak <serega.shey...@gmail.com>
>>>> wrote:
>>>> 
>>>>> 0.7-cdh4.7.0
>>>>> Anyway, recommenditembased does produce these catalogs:
>>>>> 
>>>>> /recommenditembased/temp/maxValues.bin
>>>>> /recommenditembased/temp/norms.bin
>>>>> /recommenditembased/temp/numNonZeroEntries.bin
>>>>> /recommenditembased/temp/pairwiseSimilarity
>>>>> /recommenditembased/temp/partialMultiply
>>>>> /recommenditembased/temp/prePartialMultiply1
>>>>> /recommenditembased/temp/prePartialMultiply2
>>>>> /recommenditembased/temp/preparePreferenceMatrix
>>>>> /recommenditembased/temp/similarityMatrix
>>>>> /recommenditembased/temp/weights
>>>>> 
>>>>> I suppose that "/recommenditembased/temp/similarityMatrix" is the thing I
>>>>> need. Right now I try to read it using
>>>>> 
>>>>> matrix = LOAD '/recommenditembased/temp/similarityMatrix' USING
>>>>> com.twitter.elephantbird.pig.load.SequenceFileLoader(
>>>>>  '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
>>>>>  '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter'
>>>>> )  as (intId: int, vector:tuple(cardinality:int,
>>>>> entries:bag{t:tuple(some_id:long, some_value:double)}));
>>>>> 
>>>>> 
>>>>> Looks like the vector is empty... Or I'm doing something wrong.
>>>>> 
>>>>> 
>>>>> 
>>>>> 2014-07-21 22:09 GMT+04:00 Ted Dunning <ted.dunn...@gmail.com>:
>>>>> 
>>>>>> Which version of Mahout?
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jul 21, 2014 at 11:05 AM, Serega Sheypak <serega.shey...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi, I've tried: Unexpected --outputPathForSimilarityMatrix while processing
>>>>>>> Job-Specific
>>>>>>> 
>>>>>>> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/output
>>>>>>> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/temp
>>>>>>> sudo -u oozie mahout recommenditembased \
>>>>>>>                  --input \
>>>>>>>                  hdfs://nameservice1/user/hive/warehouse/staging_weighted_visits_and_rec_clicks \
>>>>>>>                  --output \
>>>>>>>                  hdfs://nameservice1/recommenditembased/output \
>>>>>>>                  --similarityClassname \
>>>>>>>                  SIMILARITY_LOGLIKELIHOOD \
>>>>>>>                  --numRecommendations \
>>>>>>>                  500 \
>>>>>>>                  --booleanData \
>>>>>>>                  false \
>>>>>>>                  --maxPrefsPerUser \
>>>>>>>                  1000 \
>>>>>>>                  --maxSimilaritiesPerItem \
>>>>>>>                  1000 \
>>>>>>>                  --minPrefsPerUser \
>>>>>>>                  5 \
>>>>>>>                  --maxPrefsPerUserInItemSimilarity \
>>>>>>>                  30 \
>>>>>>>                  --threshold \
>>>>>>>                  1.1 \
>>>>>>>                  --tempDir \
>>>>>>>                  hdfs://nameservice1/recommenditembased/temp \
>>>>>>>                  --outputPathForSimilarityMatrix \
>>>>>>>                  hdfs://nameservice1/recommenditembased/sim_matrix
>>>>>>> 
>>>>>>> 
>>>>>>> I'm on Cloudera cdh 4.7, looks like this feature is not supported.
>>>>>>> 
>>>>>>> 
>>>>>>> 2014-07-21 11:18 GMT+04:00 Peng Zhang <pzhang.x...@gmail.com>:
>>>>>>> 
>>>>>>>> Serega,
>>>>>>>> 
>>>>>>>> See the last line on how to pass the outputPathForSimilarityMatrix option to
>>>>>>>> the recommenditembased command:
>>>>>>>> 
>>>>>>>> sudo -u oozie mahout recommenditembased \
>>>>>>>>                 --input visited_items_with_inverted_items \
>>>>>>>>                 --output result \
>>>>>>>>                 --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>>>>>>>>                 --usersFile inverted_items \
>>>>>>>>                 --numRecommendations 500 \
>>>>>>>>                 --booleanData false \
>>>>>>>>                 --maxPrefsPerUser 100 \
>>>>>>>>                 --maxSimilaritiesPerItem 500 \
>>>>>>>>                 --minPrefsPerUser 0 \
>>>>>>>>                 --maxPrefsPerUserInItemSimilarity 30 \
>>>>>>>>                 --threshold 0.91 \
>>>>>>>>                 --tempDir temp \
>>>>>>>>                 --outputPathForSimilarityMatrix similarityMatri \
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Peng Zhang
>>>>>>>> pzhang.x...@gmail.com
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Jul 21, 2014, at 3:09 PM, Serega Sheypak <serega.shey...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I've inspected the code, our approach wouldn't work with
>>>>>>>>> booleanData=false.
>>>>>>>>> We do calculate item similarity in the wrong way...(((
>>>>>>>>> Thank you
>>>>>>>>> 1. We provide "fake" user_id and provide --usersFile in order to get
>>>>>>>>> recommendations for "fake" user_id, where user_id is a negative item_id.
>>>>>>>>> It worked when we did provide user_id->item_id pairs without preference.
>>>>>>>>> 2. Our target is to get item similarities. We tried
>>>>>>>>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob but
>>>>>>>>> it returns bad result comparing to RecommenderJob with our "fake" user_id
>>>>>>>>> (inverted item_id)
>>>>>>>>> 
>>>>>>>>> 1. I'll try the option you provided.
>>>>>>>>> 2. I will remove input with fake user_id and usersFile with these fake ids
>>>>>>>>> 
>>>>>>>>> 3.
>>>>>>>>> https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
>>>>>>>>> I don't understand how to pass the --outputPathForSimilarityMatrix option
>>>>>>>>> to RecommenderJob
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2014-07-21 4:58 GMT+04:00 Peng Zhang <pzhang.x...@gmail.com>:
>>>>>>>>> 
>>>>>>>>>> Serega,
>>>>>>>>>> 
>>>>>>>>>> I have two comments:
>>>>>>>>>> 1. Don’t use negative user ids. Since Mahout uses user id as well as
>>>>>>>>>> item id as the row/column index, you’d better use 0, 1, 2, etc. as ids
>>>>>>>>>> 2. If you want to get the item similarity information, you can use
>>>>>>>>>> --outputPathForSimilarityMatrix in the command
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Peng Zhang
>>>>>>>>>> M: +86 186-1658-7856
>>>>>>>>>> pzhang.x...@gmail.com
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Jul 21, 2014, at 4:00 AM, Serega Sheypak <serega.shey...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> All bad things happen here:
>>>>>>>>>>> 
>>>>>>>>>>> Name: RecommenderJob-PartialMultiplyMapper-Reducer
>>>>>>>>>>> User: oozie
>>>>>>>>>>> Process User: oozie
>>>>>>>>>>> Group: oozie
>>>>>>>>>>> Mapper Class: PartialMultiplyMapper
>>>>>>>>>>> Reducer Class: AggregateAndRecommendReducer
>>>>>>>>>>> Job Input Directory: hdfs://nameservice1/itemrec/temp/partialMultiply
>>>>>>>>>>> Job Output Directory: hdfs://nameservice1/itemrec/output/
>>>>>>>>>>> 
>>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
>>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
>>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
>>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
>>>>>>>>>>> 
>>>>>>>>>>> Why does Mahout return 0 rows? It works when booleanData=true
>>>>>>>>>>> (preferences are ignored...?)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 2014-07-20 23:19 GMT+04:00 Serega Sheypak <serega.shey...@gmail.com>:
>>>>>>>>>>> 
>>>>>>>>>>>> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
>>>>>>>>>>>> users_file:
>>>>>>>>>>>> --inverted_item_id
>>>>>>>>>>>> -1
>>>>>>>>>>>> -2
>>>>>>>>>>>> -3
>>>>>>>>>>>> -4
>>>>>>>>>>>> 
>>>>>>>>>>>> users_items_prefs
>>>>>>>>>>>> --inverted item_id
>>>>>>>>>>>> -1 1 1.0
>>>>>>>>>>>> -2 2 1.0
>>>>>>>>>>>> -3 3 1.0
>>>>>>>>>>>> -4 4 1.0
>>>>>>>>>>>> --user_id item_id pref_value
>>>>>>>>>>>> 11   1 1.6
>>>>>>>>>>>> 11   2 1.6
>>>>>>>>>>>> 123 3 2.0
>>>>>>>>>>>> 123 4 2.0
>>>>>>>>>>>> 333 1 2.0
>>>>>>>>>>>> 333 2 1.6
>>>>>>>>>>>> --e.t.c.
>>>>>>>>>>>> 
>>>>>>>>>>>> if I set --booleanData true
>>>>>>>>>>>> then mahout returns the result.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <andrew.mussel...@gmail.com>:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm confused about how you're constructing the user file, and why
>>>>>>>>>>>>> there are negated item ids here.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Can you post some more details please, including Mahout version and
>>>>>>>>>>>>> some sample data sets?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <serega.shey...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi, I'm trying to create item similarity.
>>>>>>>>>>>>>> I gather items which users visit during shopping and then create a file:
>>>>>>>>>>>>>> user_id, item_id, weight (where weight can be: [1.0, 1.6, 1.9], depends
>>>>>>>>>>>>>> on user action type and data source)
>>>>>>>>>>>>>> UNION
>>>>>>>>>>>>>> -item_id, item_id, 1.0 (from items dictionary)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> and I do provide a userFile, where user_id = -item_id
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The idea is to get item similarity. If any user visits item named "A", I
>>>>>>>>>>>>>> want to show him items "B", "c", "xxx" using preferences of other users.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The problem is that the last (???) mapreduce job returns 0 rows:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Here are my settings:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> sudo -u oozie mahout recommenditembased \
>>>>>>>>>>>>>>               --input visited_items_with_inverted_items \
>>>>>>>>>>>>>>               --output result \
>>>>>>>>>>>>>>               --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>>>>>>>>>>>>>>               --usersFile inverted_items \
>>>>>>>>>>>>>>               --numRecommendations 500 \
>>>>>>>>>>>>>>               --booleanData false \
>>>>>>>>>>>>>>               --maxPrefsPerUser 100 \
>>>>>>>>>>>>>>               --maxSimilaritiesPerItem 500 \
>>>>>>>>>>>>>>               --minPrefsPerUser 0 \
>>>>>>>>>>>>>>               --maxPrefsPerUserInItemSimilarity 30 \
>>>>>>>>>>>>>>               --threshold 0.91 \
>>>>>>>>>>>>>>               --tempDir temp \
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Some counters... I don't get what they mean....
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
>>>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
>>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
>>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
>>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>>>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
>>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
>>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
>>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
>>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
>>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
>>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
>>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
>>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
>>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
>>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
>>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
>>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
>>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
>>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
>>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
>>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
>>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
>>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
>>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
>>>>>>>>>>>>>> --------
>>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
>>>>>>>>>>>>>> --------
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> why 0???
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 
