On a related note, I believe I have found a bug in the CVB implementation. How do I go about getting it fixed?
On 31 Jan 2013, at 02:50, Andy Schlaikjer andrew.schlaik...@gmail.com wrote:
I assume you mean input *matrix* with
Yeah, I just did. It seems there was something very wrong with my setup, or I did
something foolish during installation. Anyway, I removed it all (Mahout and Hadoop),
started from scratch, and it's working now. Sorry for the trouble.
On 30 Jan 2013, at 19:51, Robin Anil wrote:
Could you try it with 0.7?
Hi Yutaka,
On Thu, Jan 31, 2013 at 3:03 AM, 万代豊 20525entrad...@gmail.com wrote:
Hi
Here is a question about how to evaluate the results of Mahout 0.7 CVB
(Collapsed Variational Bayes), which used to be LDA
(Latent Dirichlet Allocation) in Mahout 0.5 and earlier.
I believe I have no
Hi Thilina,
The flag you missed on your vectordump command line is the --sort
option, which sorts each vector's entries before taking the top k. Try that and send
us what the output looks like? It should be much easier to interpret.
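A sketch of what that invocation might look like. The input and dictionary paths below are placeholders, not real job output, and option names vary somewhat by release (in 0.7 the sort flag's long form is --sortVectors, with -sort as a short alias); check `mahout vectordump --help` for your version before relying on this.

```shell
# Dump the top 10 entries per vector, sorted by weight (paths are hypothetical).
mahout vectordump \
  -i /tmp/lda/topic-term-dist \
  -d /tmp/lda/dictionary.file-0 -dt sequencefile \
  -sort -vs 10 \
  -o /tmp/lda/topterms.txt
```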
On Mon, Jan 7, 2013 at 7:19 AM, Thilina Gunarathne cset...@gmail.comwrote:
So the bug I found results in the document-topic model being trained on a
random matrix, as opposed to the final (term|topic) probability distributions.
Unless a bug fix has been released, this happens in all cases, at least for me.
The result is a random (document|topic) model, with more
Here are a few notes:
- TrainLogistic uses OnlineLogisticRegression which uses L1 regularization.
You don't say what you are using in R, but I would assume
glm(family=binomial) or equivalent.
Is this correct?
- I don't think that there is a log issue here.
- Can you share the data off-list so
Hi Guys,
I'm rather new to the whole Mahout ecosystem, so please excuse me if the questions
I have are rather dumb ;)
Our problem basically boils down to this: we want to match users with either
the content they're interested in and/or the content they could contribute to. To
do this matching we
It's a good question. I think you can achieve a partial solution in Mahout.
Real-time suggests that you won't be able to make use of
Hadoop-based implementations, since they are by nature big batch
processes.
All of the implementations accept the same input -- user,item,value.
That's OK; you can
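For illustration, a preferences file in the user,item,value form these implementations consume looks like this (the IDs and ratings are invented, and the value column can be omitted for boolean preference data):

```
1,101,5.0
1,102,3.0
2,101,2.0
2,103,4.5
```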
If I understand correctly, CosineDistanceMeasure has a range of [0,1].
So shouldn't Canopy Clustering return only a single cluster if 1 is
used for both T1 and T2, as in the example below? All points are within distance
1 of the random starting point and should therefore be removed from
the list
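A small sketch of the reasoning behind this question. Mahout's CosineDistanceMeasure computes 1 minus the cosine similarity; one caveat is that this is only bounded by 1 when the vectors are non-negative (as with TF-IDF term vectors), since for arbitrary vectors it can reach 2. The sample points below are made up for illustration.

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity, mirroring Mahout's CosineDistanceMeasure
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# For non-negative vectors the distance stays in [0, 1], so with
# T1 = T2 = 1 every point is within T1 of the first canopy center
# and would be removed from the candidate list in one pass.
points = [[1.0, 0.0], [0.7, 0.3], [0.0, 1.0], [0.5, 0.5]]
center = points[0]
print(all(cosine_distance(center, p) <= 1.0 for p in points))  # True
```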