Re: log-likelihood ratio value in item similarity calculation

2013-04-12 Thread Phoenix Bai
don't match what I get. I get LLR = 117. This is wildly anomalous, so this pair should definitely be connected. Both items are quite rare (15/300,000 or 20/300,000 rates), but they occur together most of the time that they appear.

Re: log-likelihood ratio value in item similarity calculation

2013-04-10 Thread Phoenix Bai
= row entropy + col entropy, and LLR = 0.

On Wed, Apr 10, 2013 at 10:15 AM, Phoenix Bai wrote:
Hi, the counts for the two events are:

                    Event A     Everything but A
Event B             k11 = 7     k12 = 8
Everything but B    k21 = 13    k22 = 300,000

log-likelihood ratio value in item similarity calculation

2013-04-10 Thread Phoenix Bai
Hi, the counts for the two events are:

                    Event A     Everything but A
Event B             k11 = 7     k12 = 8
Everything but B    k21 = 13    k22 = 300,000

According to the code, I will get:

rowEntropy = entropy(7, 8) + entropy(13, 300,000) = 222
colEntropy = entropy(7, 13) + entropy(8, 300,000) = 152
matrixEntropy = entropy(7
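The arithmetic in this thread can be reproduced in a few lines. Below is a minimal Java sketch of the log-likelihood ratio in the shape used by Mahout's org.apache.mahout.math.stats.LogLikelihood class (written from memory, so treat the helper names and exact structure as assumptions and check the real class). With the counts above (k11=7, k12=8, k21=13, k22=300,000) it produces a value near 117, consistent with the later reply in this thread:

```java
public class LlrSketch {

    // x * ln(x), with the convention 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy of a set of counts:
    // total*ln(total) minus the sum of x*ln(x) over the counts.
    static double entropy(long... counts) {
        long total = 0;
        double sumXLogX = 0.0;
        for (long c : counts) {
            total += c;
            sumXLogX += xLogX(c);
        }
        return xLogX(total) - sumXLogX;
    }

    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22); // marginal row sums
        double colEntropy = entropy(k11 + k21, k12 + k22); // marginal column sums
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Guard against tiny negative results from floating-point rounding.
        if (rowEntropy + colEntropy < matrixEntropy) {
            return 0.0;
        }
        return 2.0 * (rowEntropy + colEntropy - matrixEntropy);
    }

    public static void main(String[] args) {
        // Counts from the thread: k11=7, k12=8, k21=13, k22=300,000.
        System.out.println(logLikelihoodRatio(7, 8, 13, 300000)); // roughly 117
    }
}
```

The key detail is that the row and column entropies are taken over the marginal sums (15 and 300,013; 20 and 300,008), not over the individual cells; computing entropy(7, 8) + entropy(13, 300,000) as in the post above is easy to arrive at by misreading the code and would explain the mismatch.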

Re: How to map UUID to userId in Preference class to use mahout recommender?

2013-04-07 Thread Phoenix Bai
Instead, you can use a mapping to/from 64-bit values. See IDMigrator for instance.

On Mon, Apr 8, 2013 at 3:51 AM, Phoenix Bai wrote:
Hi All, the input format required for mahout recommender is: *userId (long), itemId (long),

How to map UUID to userId in Preference class to use mahout recommender?

2013-04-07 Thread Phoenix Bai
Hi All, the input format required for the mahout recommender is: *userId (long), itemId (long), rating (optional)*, while currently my input format is: *userId (UUID, which is 128 bits long), itemId (long), boolean*. So, my question is, how could I convert a userId in UUID format to the long datatype? e.
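One common way to do this, in the spirit of the IDMigrator approach mentioned in the reply, is to hash the UUID's string form down to 64 bits and keep a side table for reverse lookup. The sketch below is an illustration only, not Mahout's actual implementation; the class and method names are made up for the example:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class UuidIdMigrator {

    // Remembers long -> UUID so recommendations can be mapped back.
    private final Map<Long, UUID> reverse = new HashMap<>();

    // Hash the UUID's string form with MD5 and keep the first 8 bytes
    // as a 64-bit id. Deterministic: the same UUID always maps to the
    // same long, so it can be recomputed across runs.
    public long toLongID(UUID uuid) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] hash = md5.digest(uuid.toString().getBytes(StandardCharsets.UTF_8));
            long id = ByteBuffer.wrap(hash, 0, 8).getLong();
            reverse.put(id, uuid);
            return id;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Reverse lookup for ids seen by toLongID.
    public UUID toUUID(long id) {
        return reverse.get(id);
    }
}
```

Any 128-to-64-bit mapping can in principle collide, but the first 8 bytes of an MD5 digest make that vanishingly unlikely for realistic user counts, and the reverse map lets you recover the original UUID when presenting results.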

Re: Regarding ItemBased Recommendation Results

2013-04-01 Thread Phoenix Bai
Raju, like Sebastian said, it is probably due to the default sampling restriction of the hadoop-based implementation: maxPrefsPerUserInItemSimilarity, the "max number of preferences to consider per user in the item similarity computation phase; users with more preferences will be sampled down"
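The downsampling described above can be pictured with a small sketch (a hypothetical helper, not Mahout's code): when a user has more preferences than the cap, a random subset of size maxPrefsPerUser is kept and the rest are ignored during the similarity computation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PrefSampling {

    // If the user has no more preferences than the cap, keep them all;
    // otherwise shuffle a copy and keep a random subset of size
    // maxPrefsPerUser. This mirrors the idea behind the
    // maxPrefsPerUserInItemSimilarity option, not its exact code.
    static <T> List<T> samplePrefs(List<T> prefs, int maxPrefsPerUser, Random rng) {
        if (prefs.size() <= maxPrefsPerUser) {
            return prefs;
        }
        List<T> copy = new ArrayList<>(prefs);
        Collections.shuffle(copy, rng);
        return copy.subList(0, maxPrefsPerUser);
    }
}
```

This is why heavy users can see slightly different similarity results between runs: only a sample of their preferences enters the computation.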

Re: seq2sparse -a analyzerClass is throwing: ClassNotFoundException

2012-11-23 Thread Phoenix Bai
ChineseAnalyzer you'll have to add it as a dependency, either by modifying the maven dependencies and rebuilding, or just by injecting the ChineseAnalyzer class into the jar (using jar xf, jar cf, etc.).

Jeremie

2012/11/21 Phoenix Bai
Hi All,

Re: Issue: Canopy is processing extremly slow, what goes wrong?

2012-11-14 Thread Phoenix Bai
a single canopy, and you can go smaller until you get a reasonable number. There are also T3 and T4 arguments that allow you to specify the T1 and T2 values used by the reducer.

On 11/13/12 7:01 AM, Phoenix Bai wrote:
Hi All,

1) data size:
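For intuition on what T1 and T2 control, here is a toy one-dimensional sketch of canopy formation (an illustration only; Mahout's CanopyDriver works on vectors with a pluggable distance measure): points within the loose threshold T1 of a center join its canopy, while points within the tight threshold T2 are removed from further consideration, so large thresholds collapse everything into one canopy and smaller ones yield more.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class CanopySketch {

    // One-dimensional canopy clustering. Requires T2 <= T1.
    static List<List<Double>> canopies(List<Double> points, double t1, double t2) {
        List<Double> remaining = new LinkedList<>(points);
        List<List<Double>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Take the next remaining point as a new canopy center.
            double center = remaining.get(0);
            List<Double> canopy = new ArrayList<>();
            Iterator<Double> it = remaining.iterator();
            while (it.hasNext()) {
                double p = it.next();
                double d = Math.abs(p - center);
                if (d < t1) {
                    canopy.add(p);   // within the loose threshold: joins the canopy
                }
                if (d < t2) {
                    it.remove();     // within the tight threshold: never a center again
                }
            }
            result.add(canopy);
        }
        return result;
    }
}
```

With points {0, 1, 10, 11}, T1 = 3 and T2 = 2 produce two canopies, one around each cluster; raising T1 and T2 past 11 would merge everything into a single canopy, which is the behavior the reply describes.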

Re: hadoop-0.19 and mahout 0.7: throwing incompatible errors, how can I fix it?

2012-09-21 Thread Phoenix Bai
I imagine the best use of your time and effort is to convince your admins that running a 3-year-old version of hadoop is a bad idea. Things are only going to get worse...

Mat

On Sep 13, 2012 7:15 PM, "Phoenix Bai" wrote:

hadoop-0.19 and mahout 0.7: throwing incompatible errors, how can I fix it?

2012-09-13 Thread Phoenix Bai
Hi guys, I am trying to compile my application code using mahout 0.7 and hadoop 0.19. During the compile process, it throws errors as below:

$ hadoop jar cluster-0.0.1-SNAPSHOT-jar-with-dependencies.jar mahout.sample.ClusterVideos
12/09/13 20:36:18 INFO vectorizer.SparseVectorsFromSequenceFi

Re: Does clusterdump still support option "--seqFileDir"?

2012-09-12 Thread Phoenix Bai
In your current mahout version (0.7?), you should use --input (-i) instead of --seqDir. For the detailed usage, you should check out: $ mahout clusterdump -h

On Wed, Sep 5, 2012 at 3:26 PM, javaboom wrote:
I've tried to use "clusterdump". I followed this manual: https://cwiki.apache.o

Re: does seq2sparse or kmeans filter data ? I am losing data!

2012-08-29 Thread Phoenix Bai
a breakpoint in ClusterClassificationDriver.shouldClassify() (you'd need to edit it a bit first); you could determine if this was removing any of your input points.

On 8/27/12 10:26 PM, Phoenix Bai wrote:
Hi Jeff, first of all, thank

Re: does seq2sparse or kmeans filter data ? I am losing data!

2012-08-27 Thread Phoenix Bai
so, then using the directory instead might help:

--pointsDir /group/tbdev/zhimo.bmz/mahout/output/videotags-kmeans-clusters/clusteredPoints

On 8/27/12 2:49 AM, Phoenix Bai wrote:
--pointsDir /group/tbdev/zhi

does seq2sparse or kmeans filter data ? I am losing data!

2012-08-26 Thread Phoenix Bai
Hi All, good afternoon. I ran the following three steps and got the clustered data I expected. My input data is 1124 objects (in key:value format); however, from the output I only received 491 objects. What happened to the 1124 - 491 = 633 objects? I checked out the options of seq2sparse, kmea

Re: java.lang.NoClassDefFoundError: org/apache/commons/cli2/Option

2012-08-26 Thread Phoenix Bai
Or, instead of invoking mahout in the format "$ hadoop jar mahout-core-0.5.jar", you should try "$ mahout ..". In $MAHOUT_HOME/bin lies the mahout script, which will load all the necessary jar files before running any class. The jars required by mahout are normally put in $MAHOUT_HOME/lib. e.g.