Re: cvb/lda run time

2013-01-31 Thread Jack Pay
On a related note, I believe I have found a bug in the cvb implementation. How do I go about getting it fixed? On 31 Jan 2013, at 02:50, Andy Schlaikjer andrew.schlaik...@gmail.com wrote: I assume you mean input *matrix* with

Re: MiA NewsKMeansClustering Example Help

2013-01-31 Thread Chris Harrington
Yeah, I just did. It seems there was something very wrong with my setup, or I did something foolish during setup. Anyway, I removed it all (Mahout and Hadoop), started from scratch, and it's working now. Sorry for the trouble. On 30 Jan 2013, at 19:51, Robin Anil wrote: Could you try it with 0.7

Re: What will be the LDAPrintTopics compatible/equivalent feature in Mahout-0.7?

2013-01-31 Thread Jake Mannix
Hi Yutaka, On Thu, Jan 31, 2013 at 3:03 AM, 万代豊 20525entrad...@gmail.com wrote: Hi, here is a question about how to evaluate the results of Mahout 0.7 CVB (Collapsed Variational Bayes), which used to be LDA (Latent Dirichlet Allocation) in Mahout versions under 0.5. I believe I have no

Re: Interpreting the results of LDA CVB

2013-01-31 Thread Jake Mannix
Hi Thilina, The flag you missed on your vectordump command line is the --sort option, which sorts the results before taking the top k. Could you try that and send us what the output looks like? It should be much easier to interpret. On Mon, Jan 7, 2013 at 7:19 AM, Thilina Gunarathne cset...@gmail.com wrote:
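
To illustrate why sorting before taking the top k matters, here is a minimal Python sketch. It is not Mahout code: the function names and the `weights` dictionary (term to weight) are hypothetical, and the unsorted case simply assumes entries come back in storage order.

```python
def top_k_unsorted(weights, k):
    """Take the first k entries in storage order (no sorting)."""
    return list(weights.items())[:k]


def top_k_sorted(weights, k):
    """Sort by weight, highest first, then take the first k entries."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Hypothetical term weights for one topic.
weights = {"the": 0.01, "a": 0.02, "topic": 0.5, "model": 0.3}

print(top_k_unsorted(weights, 2))
print(top_k_sorted(weights, 2))
```

Without the sort, the k entries shown are whichever happen to come first, so low-weight terms can crowd out the terms that actually characterise a topic.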

Re: Interpreting the results of LDA CVB

2013-01-31 Thread Jack Pay
So the bug I found results in the document topic model being trained on a random matrix as opposed to the final (term|topic probability) distributions. Unless a bug fix has been released, this happens in all cases, at least for me. The result is a random (document|topic) model, with more

Re: Logistic Regression in Mahout

2013-01-31 Thread Ted Dunning
Here are a few notes: - TrainLogistic uses OnlineLogisticRegression, which uses L1 regularization. You don't say what you are using in R, but I would assume glm(family=binomial) or equivalent. Is this correct? - I don't think that there is a log issue here. - Can you share the data off-list so

(near) real time recommender/predictor

2013-01-31 Thread Frederik Kraus
Hi Guys, I'm rather new to the whole Mahout ecosystem, so please excuse me if my questions are rather dumb ;) Our problem basically boils down to this: we want to match users with either the content they are interested in and/or the content they could contribute to. To do this matching we

Re: (near) real time recommender/predictor

2013-01-31 Thread Sean Owen
It's a good question. I think you can achieve a partial solution in Mahout. Real-time suggests that you won't be able to make use of Hadoop-based implementations, since they are by nature big batch processes. All of the implementations accept the same input -- user,item,value. That's OK; you can

Question regarding Canopy Clustering (sequential)

2013-01-31 Thread Stefan Kreuzer
If I understand correctly, CosineDistanceMeasure has a range of [0,1]. So shouldn't Canopy Clustering return only one single cluster if 1 is used for T1 and T2 as in the example below? All points are within range 1 from the random starting point and should therefore be removed from the list
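
The question above can be explored with a small, self-contained Python sketch of sequential canopy clustering. This is not Mahout's implementation. It assumes cosine distance is computed as 1 minus cosine similarity, that the first remaining point is used as the canopy center instead of a random one, and that a point is removed only when its distance is strictly less than T2. Whether Mahout compares strictly is not confirmed here.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; lies in [0, 1] for non-negative vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def canopy(points, t1, t2, distance=cosine_distance):
    """Sequential canopy clustering sketch (assumes t1 >= t2)."""
    remaining = list(points)
    canopies = []
    while remaining:
        # Assumption: take the first remaining point as the center.
        center = remaining.pop(0)
        members = [center]
        kept = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                members.append(p)  # loosely bound to this canopy
            if d >= t2:
                kept.append(p)     # not tightly bound; may seed a later canopy
        remaining = kept
        canopies.append((center, members))
    return canopies


close_points = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5]]
with_orthogonal = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

print(len(canopy(close_points, 1.0, 1.0)))
print(len(canopy(with_orthogonal, 1.0, 1.0)))
```

Under these assumptions, a point whose distance to the center is exactly 1 (an orthogonal vector) is not removed when T2 is 1, so it seeds a second canopy. With a non-strict comparison it would be absorbed into the first.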