Hi, great. Thanks a lot. How do I sort the result by score and select the top 20 (say)?
On Monday, October 22, 2012, Gunther Hagleitner <ghagleit...@hortonworks.com> wrote:
> This should work:
>
> matrix = load 'data1.txt' using PigStorage(',') as (row:chararray, column:chararray, value:float);
> vectors = load 'data2.txt' using PigStorage(',') as (user:chararray, column:chararray);
>
> joined = join vectors by column, matrix by column;
> groups = group joined by (user, row);
> result = foreach groups generate group.user, group.row, (float) SUM(joined.value);
>
> store result into 'result';
>
> Thanks,
> Gunther.
>
> On Sun, Oct 21, 2012 at 7:40 PM, jamal sasha <jamalsha...@gmail.com> wrote:
>
>> Hi,
>> I am trying to do matrix multiplication using Pig.
>>
>> Basically I have data in the form:
>> data1.txt
>> item1,item2,0.3
>> item1, item3, 0.4
>> item1, item5, 0.6
>>
>> And then I have other data in the form:
>> data2.txt
>> user1,item1
>> user1,item2
>> user1,item5
>> ...
>> user2,item2
>> etc.
>>
>> Just to give some context: I am trying to build a top-N recommendation
>> system, which works as follows.
>>
>> Matrix formed by data2.txt:
>>
>>         item1  item2  item3  item4  item5
>> user1   1      1      0      0      1
>>
>> Matrix formed by data1.txt:
>>
>>         item1  item2  item3  item4  item5
>> item1   1      0.3    0.4    0      0.6
>> item2          1
>> item3                 1
>> item4                        1
>> item5                               1
>>
>> So the recommendation score for user1 is computed as follows, where u1j is
>> user1's entry for itemj and item_1j is the (item1, itemj) entry of the
>> data1.txt matrix:
>>
>> Score for user1 for item1 (ignoring the item1,item1 entry)
>>   = u12*item_12 + u13*item_13 + u14*item_14 + u15*item_15
>>   = 1*0.3 + 0*0.4 + 0*0 + 1*0.6 = 0.9
>>
>> And then I find this score for user1 and item2, then for user2 and
>> item1, and so on.
>>
>> I understand this is more of an implementation challenge, and I am not
>> sure whether this is the right place to ask, but any suggestions would
>> be greatly appreciated.
>> Thanks
>> Jamal
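For illustration only, here is a minimal in-memory sketch in Python (not Pig) of what the join/group/SUM script computes, followed by the sort-and-take-top-N step asked about above. The sample rows are the ones quoted in the thread. The function names `scores`, `top_n` and `top_n_per_user` are hypothetical helpers, and whether "top 20" means overall or per user is an assumption, so both variants are shown.

```python
from collections import defaultdict

# Sample rows from the thread (data1.txt: row,column,value; data2.txt: user,column).
# Whitespace after the commas is assumed to be already stripped.
MATRIX = [
    ("item1", "item2", 0.3),
    ("item1", "item3", 0.4),
    ("item1", "item5", 0.6),
]
VECTORS = [
    ("user1", "item1"),
    ("user1", "item2"),
    ("user1", "item5"),
    ("user2", "item2"),
]


def scores(matrix, vectors):
    """Mimic: join vectors by column, matrix by column; group by (user, row); SUM(value)."""
    # Index the matrix entries by column so the join becomes a dictionary lookup.
    by_column = defaultdict(list)
    for row, column, value in matrix:
        by_column[column].append((row, value))
    # Each (user, column) pair contributes every matrix value in that column
    # to the (user, row) total, exactly like the grouped SUM.
    totals = defaultdict(float)
    for user, column in vectors:
        for row, value in by_column.get(column, []):
            totals[(user, row)] += value
    return dict(totals)


def top_n(score_map, n=20):
    """Sort all (user, row, score) triples by score, highest first, and keep n."""
    triples = [(user, row, s) for (user, row), s in score_map.items()]
    triples.sort(key=lambda t: t[2], reverse=True)
    return triples[:n]


def top_n_per_user(score_map, n=20):
    """Group the scores by user, then keep each user's n highest-scoring rows."""
    per_user = defaultdict(list)
    for (user, row), s in score_map.items():
        per_user[user].append((row, s))
    return {
        user: sorted(items, key=lambda t: t[1], reverse=True)[:n]
        for user, items in per_user.items()
    }


if __name__ == "__main__":
    result = scores(MATRIX, VECTORS)
    print(top_n(result, 20))
    print(top_n_per_user(result, 20))
```

With the sample rows, user1/item1 comes out at about 0.9 (0.3 + 0.6), matching the hand calculation in the thread. In Pig itself, the corresponding steps would presumably be to name the summed field (e.g. `AS score`), then use `ORDER result BY score DESC` followed by `LIMIT` for an overall top 20, or a nested `FOREACH` containing `ORDER` and `LIMIT` for a per-user top 20. As far as I know, `PigStorage(',')` does not trim whitespace, so the lines in data1.txt with a space after the comma may need `TRIM` before the join.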