Great to hear that you use Mahout in production! If you want to start
working on it, you can either browse our JIRA issues or propose an
issue to work on yourself.
If you need some input, it would be awesome to enhance our ALS
recommenders with cross-validation and tooling for finding a good
reg
We would love to have you!
I will let others answer about things to do since I have to fly.
On Fri, Apr 5, 2013 at 1:56 AM, Andrew Musselman wrote:
> In case this thread is still a good place to reply with an offer to help,
> I'd love to pitch in. I have built a few production recommenders, m
[
https://issues.apache.org/jira/browse/MAHOUT-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623338#comment-13623338
]
Suneel Marthi commented on MAHOUT-998:
--
Grant, would you like me to take a stab at th
In case this thread is still a good place to reply with an offer to help,
I'd love to pitch in. I have built a few production recommenders, most
recently using Mahout at a large retailer along with my partner where we
used ALS, with a pipeline of transforming transactions in XML into vectors
using
All of this doesn't normally matter when cosine distance is used since
usually it is used with normalized vectors. For that set of vectors it is
a measure.
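Here is a quick numerical check of why normalization sidesteps the problem. For unit-length vectors, 1 - cos(x, y) equals half the squared Euclidean distance ||x - y||^2 / 2, so d(x, x) = 0 always holds and the zero-vector corner case cannot arise. The sketch uses plain double arrays rather than Mahout's Vector classes, and the class and method names are hypothetical.

```java
/**
 * Illustration only (not Mahout code): for unit-length vectors the cosine
 * distance 1 - cos(x, y) equals half the squared Euclidean distance.
 */
public class NormalizedCosine {

    public static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    /** Scales v to unit length; assumes v is not the zero vector. */
    public static double[] normalize(double[] v) {
        double norm = Math.sqrt(dot(v, v));
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    /** Cosine distance for vectors already known to have unit length. */
    public static double cosineDistanceUnit(double[] x, double[] y) {
        return 1.0 - dot(x, y);
    }

    /** ||x - y||^2 / 2 */
    public static double halfSquaredEuclidean(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            s += d * d;
        }
        return s / 2.0;
    }

    public static void main(String[] args) {
        double[] x = normalize(new double[] {3.0, 4.0, 0.0});
        double[] y = normalize(new double[] {1.0, 0.0, 1.0});
        System.out.println(cosineDistanceUnit(x, y));   // equals the next line up to rounding
        System.out.println(halfSquaredEuclidean(x, y));
        System.out.println(cosineDistanceUnit(x, x));   // 0 up to rounding
    }
}
```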
On Thu, Apr 4, 2013 at 11:25 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:
> I agree 1 is wrong :)
>
>
> On Thu, Apr 4, 2013 at
I agree 1 is wrong :)
On Thu, Apr 4, 2013 at 2:22 PM, Dan Filimon wrote:
> Ah, okay then. :)
> I thought that you depend on the current convention that it returns 1. So,
> disclaimers aside, you're fine with the change?
>
>
> On Fri, Apr 5, 2013 at 12:20 AM, Sebastian Schelter <
> ssc.o...@googl
On 04.04.2013 23:22, Dan Filimon wrote:
> Ah, okay then. :)
> I thought that you depend on the current convention that it returns 1. So,
> disclaimers aside, you're fine with the change?
Yes, I concur that the distance between two identical vectors should be
zero.
>
>
> On Fri, Apr 5, 2013 at 1
Ah, okay then. :)
I thought that you depend on the current convention that it returns 1. So,
disclaimers aside, you're fine with the change?
On Fri, Apr 5, 2013 at 12:20 AM, Sebastian Schelter wrote:
> You can ignore the recommender stuff for the DistanceMeasure classes, as
> the recommenders u
You can ignore the recommender stuff for the DistanceMeasure classes, as
the recommenders use their own distance/similarity implementations.
I just wanted to comment on the example that Andrew gave, to mention
that there are some common pitfalls with modeling ratings/interactions.
On 04.04.2013
Right, that's fair. So, you're saying there needs to be a special value
when both vectors are 0 for the recommender system to work?
And that 0 means dislike, which isn't in fact accurate. You want to convey
lack of information.
But now, the code returns 1. Is that a special value? I'd guess it mea
In recommender systems, it's dangerous to interpret "no interaction" as
dislike. Think of all the movies you never watched: do you really dislike
them all? :)
On 04.04.2013 23:03, Andrew Musselman wrote:
> I agree; I misspoke before if I said "dislike". Zero to me means
> literally nothing. No int
I'm not familiar with the recommender code at all. I was only thinking of
the clustering.
How is dislike related to the cosine distance?
Also, CosineDistanceMeasure isn't really behaving like a measure in this
case (the whole d(x, x) = 0 thing). Maybe it makes sense to have a specific
subclass spe
I agree; I misspoke before if I said "dislike". Zero to me means
literally nothing: no interaction, which could mean "don't like",
"don't like today", "dislike", etc. That adds to the meaninglessness of
it.
On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter
wrote:
> I think that in ou
I think that in our recommender code, 0 should mean no rating or no
interaction observed. I think modeling dislike with 0 creates a lot of
unnecessary problems.
On 04.04.2013 22:56, Andrew Musselman wrote:
> I see the arguments for having it defined, just raising the point that it's
> a very strange
I see the arguments for having it defined, just raising the point that it's
a very strange spot to be in.
If all users are zero except for one person who likes the lentil soup, then
the other users are equally different from that person.
The problem for me is the discontinuity Sean mentions, wher
Dislike should not be modeled by a zero rating IMHO. This might also
create problems with the iterateNonZero() method in our vectors.
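To illustrate the iterateNonZero() concern, here is a hypothetical sparse vector (a TreeMap-backed stand-in, not Mahout's Vector implementation) whose non-zero iterator skips zero entries. Assuming, as in this stand-in, that zeros are not stored, a "dislike" recorded as 0 is dropped exactly like an item that was never rated, so the two become indistinguishable.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sparse vector (not Mahout's class) that never stores zeros. */
public class SparseRatings {

    private final TreeMap<Integer, Double> entries = new TreeMap<>();

    /** Sets a value; a zero removes the entry instead of storing it. */
    public void set(int index, double value) {
        if (value == 0.0) {
            entries.remove(index);
        } else {
            entries.put(index, value);
        }
    }

    /** Missing entries read back as 0.0. */
    public double get(int index) {
        return entries.getOrDefault(index, 0.0);
    }

    /** Iterates only over stored (non-zero) entries, in index order. */
    public Iterator<Map.Entry<Integer, Double>> iterateNonZero() {
        return entries.entrySet().iterator();
    }

    public int numNonZero() {
        return entries.size();
    }

    public static void main(String[] args) {
        SparseRatings user = new SparseRatings();
        user.set(0, 5.0);  // item 0: liked
        user.set(1, 0.0);  // item 1: "dislike" encoded as 0
        // item 2 was never rated at all
        Iterator<Map.Entry<Integer, Double>> it = user.iterateNonZero();
        while (it.hasNext()) {
            Map.Entry<Integer, Double> e = it.next();
            System.out.println("item " + e.getKey() + " -> " + e.getValue());
        }
        // Items 1 and 2 both read back as 0.0, so the "dislike" is lost.
        System.out.println(user.get(1) == user.get(2));
    }
}
```

Running main prints only item 0; the disliked item never shows up in the iteration.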
On 04.04.2013 22:40, Andrew Musselman wrote:
> I think it should return an "undefined" symbol. There is no angle between
> two zero vectors.
>
> In a practical
While I agree that it's fairly meaningless mathematically, this ensures
that the distance between two identical vectors is always 0.
Think of yourself using this class through the DistanceMeasure interface.
The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
[1] http://e
It is a good argument, and cosine distance is discontinuous at 0. In
the context here they're trying to define a distance metric rather
than caring about the actual angle in question, and 0 is probably a
better way to define it than anything else. I think it's OK to say
that two users for whom you
I think it should return an "undefined" symbol. There is no angle between
two zero vectors.
In a practical sense, taking two zero vectors to be equivalent in the
context of user-item vectors, say, is dodgy in my opinion. That is akin to
saying "If we both hate everything on this restaurant's men
Suneel is right. :)
Let me explain how this came up:
- When clustering, and assigning a point to a cluster, the centroid needs
to be updated.
- To update the centroid in the nearest neighbor searcher classes, the
centroid must first be removed.
- To remove the centroid, we get the closest vector (
Code from CosineDistanceMeasure
// correct for zero-vector corner case
if (denominator == 0 && dotProduct == 0) {
return 1;
}
Seems like a bug to me; I agree with Dan that it should be 0 (and not 1).
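A standalone sketch of the fix being discussed, assuming plain double arrays rather than Mahout's Vector API (the class and method names are hypothetical, and this is not the actual CosineDistanceMeasure source): two all-zero vectors get distance 0, so d(x, x) = 0 holds for every x. When exactly one vector is zero, the sketch keeps returning 1, as the quoted snippet does; that choice is an assumption, since the thread does not settle it.

```java
/**
 * Sketch of cosine distance with the zero-vector convention discussed
 * above. Not Mahout's CosineDistanceMeasure; names are hypothetical.
 */
public class CosineDistanceSketch {

    public static double distance(double[] p, double[] v) {
        if (p.length != v.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dotProduct = 0.0;
        double lengthSquaredP = 0.0;
        double lengthSquaredV = 0.0;
        for (int i = 0; i < p.length; i++) {
            dotProduct += p[i] * v[i];
            lengthSquaredP += p[i] * p[i];
            lengthSquaredV += v[i] * v[i];
        }
        // Both vectors are zero: treat them as identical, so d(x, x) = 0.
        if (lengthSquaredP == 0.0 && lengthSquaredV == 0.0) {
            return 0.0;
        }
        // Exactly one vector is zero: the angle is undefined; this sketch
        // returns 1 (cosine 0) for that case, as an assumption.
        if (lengthSquaredP == 0.0 || lengthSquaredV == 0.0) {
            return 1.0;
        }
        double denominator = Math.sqrt(lengthSquaredP) * Math.sqrt(lengthSquaredV);
        return 1.0 - dotProduct / denominator;
    }

    public static void main(String[] args) {
        double[] zero = {0.0, 0.0, 0.0};
        double[] a = {1.0, 2.0, 0.0};
        System.out.println(distance(zero, zero)); // 0.0
        System.out.println(distance(zero, a));    // 1.0
        System.out.println(distance(a, a));       // 0 up to rounding
    }
}
```

Splitting the old `denominator == 0 && dotProduct == 0` check into a "both zero" case and a "one zero" case is what lets the identical-vector property hold without changing the result for a zero vector compared against a non-zero one.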
From: Dan Filimon
To: dev@mahout.apache.org
According to the GSoC calendar, accepted organizations aren't posted
until April 8 (Monday), at which point (assuming Apache is accepted...I
can't imagine it wouldn't be) slots will be doled out internally. This
will probably take at least a day or two, so probably by middle of next
week we'll
It sounds pretty undefined, but I would tend to define the distance as
0 in this case of course. And that means defining the cosine as 1.
Which class in particular? There are a few implementations of this
distance measure.
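Written out, the standard cosine distance shows where the corner case comes from: when both norms are 0 the fraction is 0/0, and the convention proposed above assigns a cosine of 1, hence a distance of 0.

```latex
\begin{aligned}
d(x, y) &= 1 - \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}, \\
d(\mathbf{0}, \mathbf{0}) &= 1 - \tfrac{0}{0} \quad (\text{undefined}), \\
\text{convention: } \cos(\mathbf{0}, \mathbf{0}) &:= 1 \;\Rightarrow\; d(\mathbf{0}, \mathbf{0}) := 0.
\end{aligned}
```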
On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon wrote:
> In the case where bot
Any news on this front? Did we get approved/assigned a slot/anything?
On Fri, Mar 29, 2013 at 7:44 PM, Dan Filimon wrote:
> Ok, updated!
>
>
> On Fri, Mar 29, 2013 at 7:36 PM, Andy Twigg wrote:
>
>> Dan,
>>
>> I think what you've written is fine (I wanted to edit to remove the
>> '?' around ran
In the case where both vectors are all zeros, the angle between them is 0,
so the cosine is 1 and the distance returned should be 0
(unless I misunderstood what the distance does).
In Mahout, when calling distance() however, if both the denominator and
dotProduct are 0 (which is t
[
https://issues.apache.org/jira/browse/MAHOUT-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622351#comment-13622351
]
Sebastian Schelter commented on MAHOUT-1161:
@rohit did you apply the patch f
[
https://issues.apache.org/jira/browse/MAHOUT-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622190#comment-13622190
]
Grant Ingersoll commented on MAHOUT-1161:
-
Sebastian, that's a reasonable approac
[
https://issues.apache.org/jira/browse/MAHOUT-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622171#comment-13622171
]
Rohit Haritash commented on MAHOUT-1161:
Hi Sebastian,
Tested with the jars i
[
https://issues.apache.org/jira/browse/MAHOUT-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622088#comment-13622088
]
Sebastian Schelter commented on MAHOUT-1161:
Hi Rohit,
can you test whether t
[
https://issues.apache.org/jira/browse/MAHOUT-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622086#comment-13622086
]
Rohit Haritash commented on MAHOUT-1161:
Hi, when invoking the CJK Analyser gettin
Congratulations! :)
On 4/4/13 6:30 AM, Grant Ingersoll wrote:
In recognition of the contributions of Suneel Marthi and Dan Filimon to the
Mahout project, the PMC is pleased to announce both have accepted our
invitations to join the Mahout project as committers.
As is customary, I will leave i
In recognition of the contributions of Suneel Marthi and Dan Filimon to the
Mahout project, the PMC is pleased to announce both have accepted our
invitations to join the Mahout project as committers.
As is customary, I will leave it to Suneel and Dan to provide a little bit of
background on who
Has anyone looked at seq2sparse performance in recent memory? I'm wondering if
anyone has any ideas for improving it. Based on my reading of the code, it
is likely slow due to the sheer number of steps it has to do, but I'm hoping
there are some other cheaper wins hiding in there. (I am awa