Hi,
I am new to Mahout and need to work through an example to get started.
Please give me detailed steps for building an application with Mahout.
Please help.
Hi Ray,
welcome to the list. On Twitter we talked about your evaluation of
Mahout's recommender code. I'd like to go into detail on this to either
clear up your doubts or learn from your input. Your evaluation listed a
bunch of pros and cons regarding Mahout; can you share them here to
start?
Hi All,
I tried classifying my dataset using the BuildForest and TestForest
tools and it worked perfectly fine. But the final output is reported only
in terms of standard accuracy. Is there an easy way to also compute the AUC
for the forest that was built?
Thank you,
--
Praneet Mhatre
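TestForest reports plain accuracy, not AUC. If you can extract a per-instance score and true label from the forest, AUC can be computed separately via the rank-sum (Mann-Whitney) formulation. A standalone sketch, not tied to Mahout's API — the class name and data below are illustrative:

```java
import java.util.Arrays;
import java.util.Comparator;

public class AucSketch {
    // AUC via the Mann-Whitney U statistic: the probability that a random
    // positive instance is scored higher than a random negative one
    // (ties count as 0.5).
    static double auc(double[] scores, int[] labels) {
        Integer[] idx = new Integer[scores.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> scores[i]));
        // Assign average ranks (1-based), handling tied scores.
        double[] ranks = new double[scores.length];
        int i = 0;
        while (i < idx.length) {
            int j = i;
            while (j + 1 < idx.length && scores[idx[j + 1]] == scores[idx[i]]) j++;
            double avg = (i + j + 2) / 2.0; // average of ranks i+1 .. j+1
            for (int k = i; k <= j; k++) ranks[idx[k]] = avg;
            i = j + 1;
        }
        double rankSumPos = 0;
        int nPos = 0;
        for (int k = 0; k < labels.length; k++) {
            if (labels[k] == 1) { rankSumPos += ranks[k]; nPos++; }
        }
        int nNeg = labels.length - nPos;
        return (rankSumPos - nPos * (nPos + 1) / 2.0) / ((double) nPos * nNeg);
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.7, 0.6, 0.55, 0.5};
        int[] labels   = {1,   1,   0,   1,   0,    0};
        // One negative (0.7) outranks one positive (0.6): 8 of 9 pairs correct.
        System.out.println(auc(scores, labels)); // 8/9
    }
}
```

This only needs (score, label) pairs, so it works downstream of any classifier output.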
What is a good name for what LDA and SVD do? "Basis concentration"?
"Basis isolation"?
On 4/26/11, Ted Dunning wrote:
> I think you are right.
>
> On Tue, Apr 26, 2011 at 2:32 PM, Jake Mannix wrote:
>
>>
>> Ted, I think what they are asking is for the output of the gamma matrix
>> (i.e.
>> the LDA version of the *left* singular vectors, living in
>> document-by-topic-space, not topic-by-word space), which is currently not
Hi Ted,
For the data: currently, we dig through the logs for a specific cookie. For
example, we will check how many times they have seen the banner from the
advertiser in the last 7 days. We don't have 1000 non-zero values yet; I
think we only have 100-200 now, but we expect to have 1000 at most.
I think you are right.
On Tue, Apr 26, 2011 at 2:32 PM, Jake Mannix wrote:
>
> Ted, I think what they are asking is for the output of the gamma matrix
> (i.e.
> the LDA version of the *left* singular vectors, living in
> document-by-topic-space,
> not topic-by-word space), which is currently not
On Tue, Apr 26, 2011 at 2:08 PM, Ted Dunning wrote:
>
> - LDA isn't really clustering. It is more along the lines of SVD as a
> dimensionality reduction. It should
> be possible to display the internals to find which terms or documents have
> the highest components on
> a single topic, but combi
Two things:
- use trunk. We are about to release 0.5 and there has been a ton of
progress since 0.4 including
several important bug fixes.
- LDA isn't really clustering. It is more along the lines of SVD as a
dimensionality reduction. It should
be possible to display the internals to find which terms or documents have
the highest components on a single topic.
I'm looking at using LDA to cluster documents based on topics. I've
gotten LDA to work in Mahout 0.4 and I am able to get keywords and
topics using the built-in mahout utilities.
Is there any simple way to view which documents are assigned to which
clusters after performing LDA? This could easily
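As the replies in this thread note, the document-by-topic output is not directly exposed yet. Once you do have per-document topic scores (however obtained), mapping each document to its strongest topic is straightforward. A minimal sketch with an illustrative matrix:

```java
public class TopicAssign {
    // Given per-document topic scores (rows = documents, cols = topics),
    // assign each document to its highest-scoring topic.
    static int[] argmaxTopics(double[][] docTopic) {
        int[] assignment = new int[docTopic.length];
        for (int d = 0; d < docTopic.length; d++) {
            int best = 0;
            for (int t = 1; t < docTopic[d].length; t++) {
                if (docTopic[d][t] > docTopic[d][best]) best = t;
            }
            assignment[d] = best;
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] gamma = {
            {0.1, 0.7, 0.2},  // doc 0 -> topic 1
            {0.6, 0.3, 0.1},  // doc 1 -> topic 0
        };
        System.out.println(java.util.Arrays.toString(argmaxTopics(gamma))); // [1, 0]
    }
}
```

This hard assignment is what "clustering by topic" amounts to, though keeping the full topic distribution per document is often more useful.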
I've done a new, clean implementation of this (just the kNN piece) at my
current company, which has agreed to allow an open-source contribution.
Thanks,
Randy
On Mon, Apr 25, 2011 at 11:09 PM, Ted Dunning wrote:
> Available cheaper at my old company.
>
>
> http://www.deepdyve.com/lp/association
On Tue, Apr 26, 2011 at 9:12 AM, Sean Owen wrote:
> That reduces to something like the Jaccard / Tanimoto coefficient -- not
> precisely since you're dividing by the length of those vectors rather than
> the size of their "union", but practically similar. And that's implemented
> as TanimotoCoefficientSimilarity.
That reduces to something like the Jaccard / Tanimoto coefficient -- not
precisely since you're dividing by the length of those vectors rather than
the size of their "union", but practically similar. And that's implemented
as TanimotoCoefficientSimilarity.
Perhaps my point is that in Mahout (well
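For binary (bought / didn't-buy) data, the Tanimoto (Jaccard) coefficient discussed above is just |A ∩ B| / |A ∪ B| over the two users' item sets. A minimal standalone sketch of the formula — not Mahout's TanimotoCoefficientSimilarity itself:

```java
import java.util.HashSet;
import java.util.Set;

public class TanimotoSketch {
    // Tanimoto (Jaccard) coefficient over two item sets:
    // |intersection| / |union|.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<Long> inter = new HashSet<>(a);
        inter.retainAll(b);
        // |A ∪ B| = |A| + |B| - |A ∩ B|
        return (double) inter.size() / (a.size() + b.size() - inter.size());
    }

    public static void main(String[] args) {
        Set<Long> u1 = Set.of(1L, 2L, 3L, 4L);
        Set<Long> u2 = Set.of(3L, 4L, 5L);
        System.out.println(tanimoto(u1, u2)); // 2 shared of 5 distinct -> 0.4
    }
}
```

Cosine on the same binary vectors would instead divide |A ∩ B| by the geometric mean of the set sizes, which is the "practically similar but not identical" distinction Sean draws above.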
Setting didn't-buy to 0 and getting a valid cosine distance is pretty common
in these scenarios.
I still prefer what Sean is recommending in terms of LLR for item to item
links, but the cosine version does make sense to support, especially for
purchase histories.
Even better would be to remember
On Mon, Apr 25, 2011 at 11:46 PM, Stanley Xu wrote:
> 1 hour is acceptable, but I guess you misunderstood the data scale I meant
> here. The 900M records don't mean 900M bytes, but 900M lines of training
> set (900M training examples). If every training example has 1000 dimensions, it
> means 900 mill
I am reading the book now, and will refer to you if I have any questions then.
Thanks.
On Fri, Apr 22, 2011 at 6:16 AM, Ted Dunning wrote:
> The trainlogistic command is (as Stanley says) only a simple example.
>
> You will need to write a program something like TrainNewsGroups for your
> model.
Maybe he used to be a window?
On Tue, Apr 26, 2011 at 1:57 AM, Ted Dunning wrote:
> Welcome!
>
> (like the email name ... as long as you don't toss too much out the window)
>
> On Mon, Apr 25, 2011 at 8:00 PM, Raymond Richardson
> wrote:
>
>> I represent Simularity.com, an organization which is p
What exactly does 'didn't buy' mean here? Was the user shown the item, or is
it just an item they never considered?
To find the 'best' metric here you could simply run an offline evaluation
across your dataset. But what appears to be the most important thing is what
does each representation actually
Thanks, I'm going to look at that
2011/4/26 Sean Owen
> There are Mapper / Reducer pairs in org.apache.mahout.cf.taste.hadoop.item
> that would do the conversion on Hadoop. If you want something that's not on
> Hadoop, you would have to write your own code, but it's pretty easy.
>
> On Tue, Apr
There are Mapper / Reducer pairs in org.apache.mahout.cf.taste.hadoop.item
that would do the conversion on Hadoop. If you want something that's not on
Hadoop, you would have to write your own code, but it's pretty easy.
On Tue, Apr 26, 2011 at 8:25 AM, Mathieu sgard wrote:
> Hello,
>
> I'm playin
Hello,
I'm playing with Mahout to get to know it, and I would like to cluster a
sample of customers. I have a preference matrix file (userID, itemID, score)
and I would like to use the clustering functions. How can I convert this file
into VectorWritable/SequenceFile form?
Thanks,
Best Regards,
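A sketch of the non-Hadoop route Sean mentions: group the (userID, itemID, score) triples into one sparse vector per user. This plain-Java sketch shows only the grouping step; in Mahout you would then wrap each per-user map in a sparse vector and write it as VectorWritable into a SequenceFile. The helper names here are illustrative, not Mahout API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrefToVectors {
    // Group "userID,itemID,score" lines into one sparse vector per user
    // (itemID -> score). To finish the conversion in Mahout you would put
    // each map into a RandomAccessSparseVector and write it as
    // VectorWritable via SequenceFile.Writer; that part is omitted here.
    static Map<Long, Map<Integer, Double>> group(List<String> lines) {
        Map<Long, Map<Integer, Double>> byUser = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            long user = Long.parseLong(f[0].trim());
            int item = Integer.parseInt(f[1].trim());
            double score = Double.parseDouble(f[2].trim());
            byUser.computeIfAbsent(user, u -> new HashMap<>()).put(item, score);
        }
        return byUser;
    }

    public static void main(String[] args) {
        List<String> prefs = List.of("1,10,3.0", "1,11,5.0", "2,10,4.0");
        System.out.println(group(prefs)); // two users, sparse item->score maps
    }
}
```

Each user's map is exactly the sparse row the clustering jobs expect: item IDs as vector indices, scores as values.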
I think my comment mostly addressed his comments. Yes, this is the
definition of cosine distance, and it is implemented. No, it doesn't work over
true binary data. There is no "0", only "1" or non-existent.
What is the remaining question?
On Tue, Apr 26, 2011 at 3:21 AM, Chris Waggoner wrote:
>
>
> I
Peter (/Ted),
Yes this is all answered in the framework already. You would never directly
use the recommenders intended for data sets with ratings, as most don't make
sense when all ratings are 1.0. You would use, for example,
GenericBooleanPrefItemBasedRecommender, a variant on
GenericItemBasedRecommender.