Re: JobConf and ClassPath

2013-04-11 Thread Cyril Bogus
Hi, I am trying to use the Mahout jar instead of compiling it with my code. On Tue, Apr 9, 2013 at 6:01 PM, Dominik Hübner wrote: > Try adding this to your pom file > > > > > org.apache.maven.plugins > maven-assembly-plugin > >

Re: cross recommender

2013-04-11 Thread Pat Ferrel
Getting this running with co-occurrence rather than using a similarity calc on user rows finally forced me to understand what is going on in the base recommender. And the answer implies further work. [B'B] is usually not calculated in the usual item based recommender. The matrix that comes out

Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Billy
I am very new to Mahout and have only read up to chapter 5 of 'MIA'. After reading about the various user-centric and item-centric recommenders, they all still seem to need a userId, so I am still unsure whether Mahout can help with a fairly common recommendation. My requirement is to produce 'n' ite

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Sean Owen
This sounds like just a most-similar-items problem. That's good news because that's simpler. The only question is how you want to compute item-item similarities. That could be based on user-item interactions. If you're on Hadoop, try the RowSimilarityJob (where you will need rows to be items, colum

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Pat Ferrel
Or you may want to look at recording purchases by user ID. Then use the standard recommender to train on (userID, itemID, boolean). Then query the trained recommender thus: recommender.mostSimilarItems(long itemID, int howMany) This does what you want but uses more data than just what items wer

Re: cross recommender

2013-04-11 Thread Sebastian Schelter
> Do I have to create a SimilarityJob( matrixB, matrixA, similarityType ) to get this or have I missed something already in Mahout? It could be worth investigating whether MatrixMultiplicationJob could be extended to compute similarities instead of dot products. Best, Sebastian

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Sebastian Schelter
Use ItemSimilarityJob instead of RowSimilarityJob, it's the easy-to-use wrapper around that :) On 11.04.2013 19:28, Sean Owen wrote: > This sounds like just a most-similar-items problem. That's good news > because that's simpler. The only question is how you want to compute > item-item similarities

KMeans

2013-04-11 Thread Cyril Bogus
Hi everyone, Running Hadoop 1.0.4 with Mahout 0.7, I am currently trying to run a kmeans job on some data that I stored in hdfs. I already ran a canopy clustering to get initial clusters and it runs fine. Now I am trying to do the kmeans and get the errors below. My vectors are NamedVector(Dens

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Sean Owen
You can try treating your orders as the 'users'. Then just compute item-item similarities per usual. On Thu, Apr 11, 2013 at 7:59 PM, Billy wrote: > Thanks for replying, > > > I don't have users, well I do :-) but in this case it should not influence > the recommendations > > , > these need to be
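
The suggestion above (treat each order as a 'user', then compute item-item similarities) can be sketched in a few lines. This is only an illustration in plain Python, not Mahout code: the order IDs, item names, and the helper functions `cooccurrence_counts` and `most_similar_items` are all hypothetical, and raw co-occurrence counts stand in for whatever similarity measure one would actually choose.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(orders):
    """Count how often each pair of items appears in the same order.

    `orders` maps an order ID (playing the role of the 'user') to its items.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for items in orders.values():
        # Deduplicate items within an order so each pair counts once per order.
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def most_similar_items(counts, item_id, how_many):
    """Return up to `how_many` items that co-occur most often with `item_id`."""
    neighbours = counts.get(item_id, {})
    # Highest count first; ties are broken by item name for a stable order.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:how_many]]


# Hypothetical example data: four orders, no user IDs involved.
orders = {
    "order-1": ["bread", "butter", "jam"],
    "order-2": ["bread", "butter"],
    "order-3": ["bread", "jam"],
    "order-4": ["butter", "milk"],
}
counts = cooccurrence_counts(orders)
print(most_similar_items(counts, "bread", 2))
```

Because the query takes only an item ID, no user ID is needed at recommendation time; the orders are used only to build the item-item counts.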

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Ted Dunning
Actually, making this user based is a really good thing because you get recommendations from one session to the next. These may be much more valuable for cross-sell than things in the same order. On Thu, Apr 11, 2013 at 12:50 PM, Sean Owen wrote: > You can try treating your orders as the 'user

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Billy
As in the example data 'intro.csv' in the MIA it has users 1-5 so if I ask for recommendations for user 1 then this works but if I ask for recommendations for user 6 (a new user yet to be added to the data model) then I get no recommendations ... so if I substitute users for orders then again I wil

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Sean Owen
You can actually create a "user" #6 for your new order. Or you can use the "anonymous user" function of the library, although it's hacky. We may be mixing up terms here. "DataModel" is a class that has nothing to do with Hadoop. Hadoop in turn has no part in real-time anything, like recommending t

Re: log-likelihood ratio value in item similarity calculation

2013-04-11 Thread Ted Dunning
These numbers don't match what I get. I get LLR = 117. This is wildly anomalous so this pair should definitely be connected. Both items are quite rare (15/300,000 or 20/300,000 rates) but they occur together most of the time that they appear. On Wed, Apr 10, 2013 at 2:15 AM, Phoenix Bai wrot

Re: log-likelihood ratio value in item similarity calculation

2013-04-11 Thread Ted Dunning
Counts are critical here. Suppose that two rare events occur together the first time you ever see them. How exciting is this? Not very in my mind, but not necessarily trivial. Now suppose that they occur together 20 times and never occur alone after you have collected 20 times more data. This i

Re: log-likelihood ratio value in item similarity calculation

2013-04-11 Thread Sean Owen
Yes I also get (er, Mahout gets) 117 (116.69), FWIW. I think the second question concerned counts vs relative frequencies -- normalized, or not. Like whether you divide all the counts by their sum or not. For a fixed set of observations that does change the LLR because it is unnormalized, not beca
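
To make the point about counts versus relative frequencies concrete, here is a minimal sketch of the log-likelihood ratio (the G² statistic) for a 2x2 contingency table, written in plain Python rather than taken from Mahout. The function name `llr` and all counts below are assumptions for illustration; the original counts behind the value of 117 are not given in full in this thread, so this sketch does not try to reproduce that number.

```python
import math


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: both events together, k12: first without second,
    k21: second without first, k22: neither event.
    """
    n = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    cells = (
        (k11, row1, col1),
        (k12, row1, col2),
        (k21, row2, col1),
        (k22, row2, col2),
    )
    total = 0.0
    for k, row, col in cells:
        if k > 0:  # a zero cell contributes nothing (0 * log 0 is taken as 0)
            total += k * math.log(k * n / (row * col))
    return 2.0 * total


# Hypothetical counts: one co-occurrence in 10,000 observations ...
once = llr(1, 0, 0, 9999)
# ... versus the same pattern after collecting 20 times more data.
twenty_times = llr(20, 0, 0, 199980)
print(once, twenty_times)
```

Multiplying every cell by the same factor multiplies the result by that factor, so the second score is 20 times the first even though the relative frequencies are identical. For the same reason, dividing all counts by their sum before calling `llr` shrinks the score by that sum, which is why raw counts and normalized frequencies give different values.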

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Pat Ferrel
Do you not have a user ID? No matter (though if you do I'd use it) you can use the item ID as a surrogate for a user ID in the recommender. And there will be no filtering if you ask for recommender.mostSimilarItems(long itemID, int howMany), which has no user ID in the call and so will not filte

trainclassifier -type cbayes dumps text

2013-04-11 Thread Ryan Compton
I'm trying to train a simple text classifier using cbayes. I've got formatted sequence files created with com.twitter.elephantbird.pig.store.SequenceFileStorage(), eg: JOY actually turning decent new year ☺ JOY best New Years tonight! ready 2013. JOY playing Dream League Soccer i

Re: trainclassifier -type cbayes dumps text

2013-04-11 Thread Ryan Compton
Also, right before the screen dump I see: 13/04/11 15:46:40 INFO mapred.JobClient: Combine output records=462236 13/04/11 15:46:40 INFO mapred.JobClient: Physical memory (bytes) snapshot=1618497536 13/04/11 15:46:40 INFO mapred.JobClient: Reduce output records=419058 13/04/11 15:46:40

Re: trainclassifier -type cbayes dumps text

2013-04-11 Thread Ryan Compton
Ok I think I got it. The problem was that I wasn't naming the files properly. If I'm not mistaken I'll need to organize my training data like: -bash-3.2$ hadoop dfs -lsr /user/rfcompton/emotion-training-labeled/ -rw-r--r-- 3 rfcompton hadoop2896850 2013-04-11 16:23 /user/rfcompton/emotion-tr

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Sebastian Schelter
You can also use the new MultithreadedBatchItemSimilarities class to efficiently precompute item similarities on a single machine without having to go to MapReduce. On 12.04.2013 00:54, Pat Ferrel wrote: > Do you not have a user ID? No matter (though if you do I'd use it) you can > use the item I