User 3 gave a recommendation to item 107.
User 5 did not rate 107.
On Thu, Feb 13, 2014 at 1:57 AM, Suresh M suresh4mas...@gmail.com wrote:
user 5 has given rating for all 5 books,
So there will be no recommendations for him.
On 12 February 2014 08:55, jiangwen jiang jiangwen...@gmail.com
I guess you would get a 107 as a recommendation for 5
if you switched to user-based?
On Thu, Feb 13, 2014 at 8:21 AM, Koobas koo...@gmail.com wrote:
User 3 gave a recommendation to item 107.
User 5 did not rate 107.
On Thu, Feb 13, 2014 at 1:57 AM, Suresh M suresh4mas...@gmail.com wrote:
5 should get 107 as a recommendation, whether user-based or item-based.
No clue why you're not getting it.
On Wed, Feb 12, 2014 at 11:50 PM, jiangwen jiang jiangwen...@gmail.com wrote:
Hi, all:
I tried to use the Mahout API to make recommendations, but I found that some
userIds have no recommendations,
to approximate the rating values.
That's exactly what I was thinking.
Thanks for your reply.
On Sat, Jan 25, 2014 at 5:08 AM, Koobas koo...@gmail.com wrote:
A generic latent variable recommender question.
I passed the user-item matrix through a low rank approximation,
with either something like ALS or SVD, and now I have the feature
vectors for all users and all items.
Case 1:
I want to recommend items to a user.
I compute a dot product of the user’s
In ALS the coincidence matrix is approximated by XY',
where X is user-feature, Y is item-feature.
Now, here is the question:
are/should the feature vectors be normalized before computing
recommendations?
Now, what happens in the case of SVD?
The vectors are normalized by definition.
Are singular
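The trade-off in question can be sketched with made-up numbers (not from the thread): with raw dot products, items whose feature vectors have larger norms score higher across the board, while cosine scoring divides those norms out, so the two rankings can differ.

```python
import numpy as np

# Toy factors, k=2 latent features; all values are invented for illustration.
rng = np.random.default_rng(0)
X = rng.random((4, 2))          # user-feature matrix
Y = rng.random((5, 2))          # item-feature matrix

u = X[0]                        # one user's feature vector
dot_scores = Y @ u              # raw dot products: item norms influence scores
cos_scores = dot_scores / (np.linalg.norm(Y, axis=1) * np.linalg.norm(u))

print(np.argsort(-dot_scores))  # ranking by dot product
print(np.argsort(-cos_scores))  # ranking by cosine; item norms divided out
```

Whether dividing the norms out is desirable is exactly the open question here: in ALS the norm of an item vector often carries popularity/confidence information that cosine discards.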
dlie...@gmail.com wrote:
On Wed, Sep 4, 2013 at 10:07 AM, Koobas koo...@gmail.com wrote:
In ALS the coincidence matrix is approximated by XY',
where X is user-feature, Y is item-feature.
Now, here is the question:
are/should the feature vectors be normalized before computing
!
Straight to the point.
That's the answer I was looking for.
Also, thanks to Ted. He pretty much said the same thing.
On Wed, Sep 4, 2013 at 6:07 PM, Koobas koo...@gmail.com wrote:
In ALS the coincidence matrix is approximated by XY',
where X is user-feature, Y is item-feature.
Now, here
Same request here.
Can you share the paper?
On Tue, Jul 23, 2013 at 6:47 AM, 刘鎏 liuliu@gmail.com wrote:
Congratulations~
By the way, could the paper be shared? THX~
Best,
LiuLiu
On Mon, Jul 22, 2013 at 2:22 AM, Sebastian Schelter s...@apache.org
wrote:
I'm happy to announce
Is a factorizing recommender a better idea for low volume data in general?
On Mon, Jul 15, 2013 at 11:35 AM, Ted Dunning ted.dunn...@gmail.com wrote:
With such small data, this sounds (without thinking too much) like you are
doing reasonably well with LLR similarity.
Have you tried a
I am guessing (comments welcome) that it is going to be difficult
to guarantee reproducibility under parallel execution conditions.
MapReduce has reduction in its name.
Reduction operations are the main cause of irreproducibility in parallel
codes,
because changing the order of summations changes
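The point about summation order is easy to demonstrate: IEEE-754 addition is not associative, so a parallel reduction that reorders its summands can change the result bit-for-bit.

```python
# Floating-point addition is not associative, so a reduction that changes
# summation order (as parallel execution can) changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)   # False with IEEE-754 doubles
print(left, right)
```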
On Mon, Jun 24, 2013 at 5:07 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
On Mon, Jun 24, 2013 at 1:35 PM, Michael Kazekin kazm...@hotmail.com
wrote:
I agree with you, I should have mentioned earlier that it would be good
to
separate noise from data and deal with only what is separable.
this will change as soon as CaaS machine learning goes mainstream.
On Mon, Jun 24, 2013 at 2:29 PM, Koobas koo...@gmail.com wrote:
On Mon, Jun 24, 2013 at 5:07 PM, Dmitriy Lyubimov dlie...@gmail.com
wrote:
On Mon, Jun 24, 2013 at 1:35 PM, Michael Kazekin kazm...@hotmail.com
wrote
Well, you know, the issue is there, whether we like it or not.
Maybe replication is enough, maybe not.
If there is a workshop on that issue, it's on the radar.
http://beamtenherrschaft.blogspot.com/2013/06/acm-recsys-2013-workshop-on.html
On Mon, Jun 24, 2013 at 6:36 PM, Sean Owen
Since I am primarily an HPC person, probably a naive question from the ML
perspective.
What if, when computing recommendations, we don't exclude what the user
already has,
and then see if the items he has end up being recommended to him (compute
some appropriate metric / ratio)?
Wouldn't that be
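The idea sketched above — score all items without excluding what the user already has, then check whether the known items surface near the top — might look like this (the interaction data and factor matrices are invented for illustration):

```python
import numpy as np

# Score every item WITHOUT excluding what the user already has, then check
# what fraction of the user's known items land in the top-N.
R = np.array([[1, 0, 1, 0, 1],    # binary user-item interactions (made up)
              [0, 1, 0, 1, 0]])
rng = np.random.default_rng(1)
X = rng.random((2, 3))            # user factors (stand-in for ALS output)
Y = rng.random((5, 3))            # item factors
scores = X @ Y.T                  # predicted preference for every item

N = 2
for u in range(R.shape[0]):
    top_n = set(np.argsort(-scores[u])[:N])
    owned = set(np.flatnonzero(R[u]))
    hit_ratio = len(top_n & owned) / min(N, len(owned))
    print(u, hit_ratio)
```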
of explains why precision/recall can be really low in these
tests. I would not be surprised if you get 0 in some cases, on maybe
small input. Is it a bad predictor? maybe, but it's not clear.
On Fri, Jun 7, 2013 at 8:06 PM, Koobas koo...@gmail.com wrote:
Since I am primarily an HPC person
I am also very interested in the answer to this question.
Just to reiterate, if you use different recommenders, e.g.,
kNN user-based, kNN item-based, ALS, each one produces
recommendations on a different scale. So how do you combine them?
On Fri, May 31, 2013 at 3:07 PM, Dominik Hübner
as a weak, though still useful,
inspirational guide.
On Fri, May 31, 2013 at 3:18 PM, Koobas koo...@gmail.com wrote:
I am also very interested in the answer to this question.
Just to reiterate, if you use different recommenders, e.g.,
kNN user-based, kNN item-based, ALS, each one produces
Since Dominik mentioned item-based and ALS, let me throw in a question here.
I believe that one of the Netflix Prize solutions combined KNN and ALS.
1) What is the best way to combine the results of both?
2) Is there really merit to this approach?
3) Are there other combinations that make sense?
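One common answer to question 1, sketched with made-up scores: map each recommender's output to percentile ranks so the scales become comparable, then take a weighted blend.

```python
# Rank-normalize each recommender's scores to [0, 1], then blend.
# All scores and the 0.5/0.5 weights are illustrative, not from the thread.
def to_percentiles(scores):
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pct = [0.0] * len(scores)
    for rank, i in enumerate(order):
        pct[i] = rank / (len(scores) - 1)
    return pct

knn_scores = [0.9, 0.2, 0.5]      # e.g. similarity sums, one scale
als_scores = [3.1, 5.0, 7.4]      # e.g. dot products, a different scale
blend = [0.5 * k + 0.5 * a
         for k, a in zip(to_percentiles(knn_scores), to_percentiles(als_scores))]
print(blend)                      # [0.5, 0.25, 0.75]
```

Rank-based blending sidesteps the different-scales problem entirely, at the cost of discarding score magnitudes.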
I think I see the picture now.
Thanks!
On Mon, May 6, 2013 at 5:25 PM, Ted Dunning ted.dunn...@gmail.com wrote:
On Mon, May 6, 2013 at 12:50 PM, Koobas koo...@gmail.com wrote:
Since Dominik mentioned item-based and ALS, let me throw in a question
here.
I believe that one of the Netflix
to purchases. All
these are implicit preferences but that's not the important part for this
technique.
On Apr 10, 2013, at 4:15 PM, Koobas koo...@gmail.com wrote:
Retail data may be hard to impossible, but one can improvise.
It seems to be fairly common to use Wikipedia articles (Myrrix, GraphLab).
Another idea is to use StackOverflow tags (Myrrix examples).
Although they are only good for emulating implicit feedback.
On Wed, Apr 10, 2013 at 6:48 PM, Ted
Okay, it sheds some light on the problem.
Thanks for sharing.
On Mon, Apr 8, 2013 at 4:33 AM, Sean Owen sro...@gmail.com wrote:
PS I think the issue is really more like this, after some more testing.
When lambda (overfitting parameter) is high, the X and Y in the
factorization A = X*Y' are
On Fri, Apr 5, 2013 at 8:07 AM, Sean Owen sro...@gmail.com wrote:
OK yes you're on to something here. I should clarify. Koobas you are
right that the ALS algorithm itself is fine here as far as my
knowledge takes me. The thing it inverts to solve for a row of X is
something like (Y' * Cu * Y
, but it's not really a matter of
condition number or machine precision. Condition numbers are 1 in
these cases but not that large.
On Sun, Apr 7, 2013 at 12:19 AM, Koobas koo...@gmail.com wrote:
I don't see why the inverse of Y'*Y does not exist.
What Y do you end up with?
Let me try to wrap my head around it
On Fri, Apr 5, 2013 at 8:07 AM, Sean Owen sro...@gmail.com wrote:
OK yes you're on to something here. I should clarify. Koobas you are
right that the ALS algorithm itself is fine here as far as my
knowledge takes me. The thing it inverts to solve
On Thu, Apr 4, 2013 at 9:13 AM, Ted Dunning ted.dunn...@gmail.com wrote:
Typically, to deal with this kind of problem, you need to follow one of two
courses.
First, you can use a so-called rank-revealing QR which uses a pivoting
strategy to push all of the small elements of R as far down the
On Thu, Apr 4, 2013 at 9:36 AM, Sean Owen sro...@gmail.com wrote:
Yeah I've got the pivoting part down -- I think. The problem is that I
can't seem to identify the problem by simple thresholding. For
example, a diagonal like 10 9 8 0.0001 0.001 obviously has a
problem. But so might 100 90
. Is there established procedure for evaluating the
ill-conditioned-ness of matrices -- like a principled choice of
threshold above which you say it's ill-conditioned, based on k, etc.?
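For what it's worth, the standard numerical-linear-algebra answer is a relative threshold: compare each singular value (or |R| diagonal entry) against the largest one, scaled by machine epsilon and the matrix size. A sketch — the tolerance rule shown is the one NumPy's `matrix_rank` uses by default, not anything Mahout-specific:

```python
import numpy as np

# Judge conditioning by the RATIO of largest to smallest singular value,
# not by an absolute cutoff. Rule of thumb: treat sigma_i as negligible when
# sigma_i < max(m, n) * machine_eps * sigma_max.
def effectively_singular(M):
    s = np.linalg.svd(M, compute_uv=False)       # descending singular values
    tol = max(M.shape) * np.finfo(M.dtype).eps * s[0]
    return s[-1] < tol, s[0] / s[-1]             # (near-singular?, condition number)

good = np.diag([10.0, 9.0, 8.0])
bad = np.diag([10.0, 9.0, 1e-16])
print(effectively_singular(good))
print(effectively_singular(bad))
```

Under this rule a diagonal like 10 9 8 is fine, while 10 9 1e-16 is flagged, regardless of the overall scale of the matrix.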
On Thu, Apr 4, 2013 at 3:19 PM, Koobas koo...@gmail.com wrote:
So, the problem is that the kxk matrix is ill
of matrices -- like a principled choice of
threshold above which you say it's ill-conditioned, based on k, etc.?
On Thu, Apr 4, 2013 at 3:19 PM, Koobas koo...@gmail.com wrote:
So, the problem is that the kxk matrix is ill-conditioned, or is there
more
to it?
I took Movie Lens 100K data without ratings and ran non-weighted ALS in
Matlab.
I set the number of features to k=2000, which is larger than either dimension of the input matrix
(1000 x 1700).
I used QR to do the inversion.
It runs without problems.
Can you share your data?
On Thu, Apr 4, 2013 at 1:10 PM, Koobas koo
dummy data like below, with maybe k=10. If it
completes with an error, that's a problem!
Okay, let me try it
0,0,1
0,1,4
0,2,3
1,2,3
2,1,4
2,3,3
2,4,2
3,0,5
3,2,2
3,4,3
4,3,5
5,0,2
5,1,4
On Thu, Apr 4, 2013 at 7:05 PM, Koobas koo...@gmail.com wrote:
I took Movie Lens 100K
:04 PM, Koobas koo...@gmail.com wrote:
On Thu, Apr 4, 2013 at 2:23 PM, Sean Owen sro...@gmail.com wrote:
Does it complete without problems? It may complete without error but
the result may be garbage. The matrix that's inverted is not going to
be singular due to round-off. Even if it's
k was 10
On Thu, Apr 4, 2013 at 3:37 PM, Koobas koo...@gmail.com wrote:
No major problems:
A =
1 4 3 0 0
0 0 3 0 0
0 4 0 3 2
5 0 2 0 3
0 0 0 5 0
2 4 0 0 0
Sorry, the image was off.
This is more like it:
[image: Inline image 1]
On Thu, Apr 4, 2013 at 3:38 PM, Koobas koo...@gmail.com wrote:
k was 10
On Thu, Apr 4, 2013 at 3:37 PM, Koobas koo...@gmail.com wrote:
No major problems:
A =
1 4 3 0 0
0 0 3
0 0
0 0 1.4307 0.1803
0 0 0 0 0 0
0 0 0 1.1404
On Thu, Apr 4, 2013 at 3:46 PM, Koobas koo...@gmail.com wrote:
Sorry, the image was off.
This is more like it:
[image: Inline image 1]
On Thu, Apr 4
BTW, my initialization of X and Y is simply random:
X = rand(m,k);
Y = rand(k,n);
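For readers following along, the whole non-weighted ALS experiment can be transcribed into NumPy roughly as below. The dimensions and the ridge term lam are illustrative choices, not the exact Matlab script from the thread:

```python
import numpy as np

# Non-weighted ALS with random init, dimensions as in the thread's notation:
# A is m x n, X is m x k, Y is k x n, and A is approximated by X @ Y.
rng = np.random.default_rng(0)
m, n, k = 20, 15, 3
A = rng.random((m, n))            # stand-in for the ratings matrix

X = rng.random((m, k))
Y = rng.random((k, n))
lam = 0.1                         # ridge term keeps the k x k systems invertible

for _ in range(10):
    # Solve for X with Y fixed:  X = A Y' (Y Y' + lam I)^-1
    X = np.linalg.solve(Y @ Y.T + lam * np.eye(k), Y @ A.T).T
    # Solve for Y with X fixed:  Y = (X' X + lam I)^-1 X' A
    Y = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ A)

err = np.linalg.norm(A - X @ Y) / np.linalg.norm(A)
print(err)                        # relative reconstruction error
```

Note that each half-iteration inverts (or rather solves against) only a small k x k matrix — which is exactly the matrix whose conditioning is being debated in this thread.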
On Thu, Apr 4, 2013 at 3:51 PM, Koobas koo...@gmail.com wrote:
It's done in one iteration.
This is the R from QR factorization:
5.0663 5.8122 4.9704 4.3987 6.3400 4.5970 5.0334
4.2581
Makes perfect sense.
Thanks for the explanation.
On Thu, Apr 4, 2013 at 6:11 PM, Ted Dunning ted.dunn...@gmail.com wrote:
On Thu, Apr 4, 2013 at 4:16 PM, Koobas koo...@gmail.com wrote:
The Mahout QR that I whipped up a couple of months ago is not rank
revealing, but it is pretty easy
at 8:54 PM, Koobas koo...@gmail.com wrote:
BTW, my initialization of X and Y is simply random:
X = rand(m,k);
Y = rand(k,n);
On Thu, Apr 4, 2013 at 3:51 PM, Koobas koo...@gmail.com wrote:
It's done in one iteration.
This is the R from QR factorization:
5.0663 5.8122
Are the suggestions completely different, or somewhat different?
What about the neighborhoods?
On Thu, Mar 28, 2013 at 10:09 AM, ch raju ch.raju...@gmail.com wrote:
Hi all,
I am working on mahout-0.7 recommendations and ran the following command
from the command line:
./bin/mahout
matrix?
Are there any indicators that it results in better recommendations?
Koobas
On Mon, Mar 25, 2013 at 9:52 AM, Sean Owen sro...@gmail.com wrote:
On Mon, Mar 25, 2013 at 1:41 PM, Koobas koo...@gmail.com wrote:
But the assumption works nicely for click-like data. Better still when
you can weakly prefer to reconstruct the 0 for missing observations
and much more
regularization entirely.
I misspoke.
I meant lambda=1.
On Mon, Mar 25, 2013 at 2:14 PM, Koobas koo...@gmail.com wrote:
On Mon, Mar 25, 2013 at 9:52 AM, Sean Owen sro...@gmail.com wrote:
On Mon, Mar 25, 2013 at 1:41 PM, Koobas koo...@gmail.com wrote:
But the assumption works nicely for click-like
will chip in.
Koobas
On Sun, Mar 24, 2013 at 10:19 PM, Dominik Huebner cont...@dhuebner.comwrote:
It's quite hard for me to get the mathematical concepts of the ALS
recommenders. It would be great if someone could help me to figure out
the details. This is my current status:
1. The item-feature (M
. Not sure about KNN though.
On Sun, Mar 17, 2013 at 3:03 AM, Koobas koo...@gmail.com wrote:
Can anybody shed any light on the issue of reproducibility in Mahout,
with and without Hadoop, specifically in the context of kNN and ALS
recommenders?
On Sun, Mar 17, 2013 at 1:43 PM, Koobas koo...@gmail.com wrote:
I am asking the basic reproducibility question.
If I run twice on the same dataset, with the same hardware setup, will I
always get the same results?
Or is there any chance that on two different runs, the same user will get
On Wed, Mar 13, 2013 at 5:01 AM, Manuel Blechschmidt
manuel.blechschm...@gmx.de wrote:
Hi Reinhard,
here you go:
https://github.com/ManuelB/facebook-recommender-demo
The example above provides a SOAP interface and a REST interface using
Java EE 6. It is not scalable for a lot of reasons
In the GenericUserBasedRecommender the concept of a neighborhood seems to
be fundamental.
I.e., it is a classic implementation of the kNN algorithm.
But it is not the case with the GenericItemBasedRecommender.
I understand that the two approaches are not meant to be completely
symmetric,
but
do I find more information about this?
And thanks for the instantaneous reply :)
On Thu, Feb 21, 2013 at 2:37 PM, Koobas koo...@gmail.com wrote:
In the GenericUserBasedRecommender the concept of a neighborhood seems to
be fundamental.
I.e., it is a classic implementation of the kNN
to N users but the quality of recommendations overall.
In this particular data set, which is rich and not noisy, the ratings
are probably valuable information and I imagine you will do better
with any approach that doesn't drop them.
On Fri, Jan 25, 2013 at 2:19 AM, Koobas koo...@gmail.com
A naive question:
Boolean recommender means that we are ignoring ratings,
but aren't recommendations still weighted by user-user similarities or
item-item similarities?
Which would also mean that increasing the neighborhood will not deteriorate
the results,
because bad contributions from farther
On Thu, Jan 24, 2013 at 7:41 PM, Ted Dunning ted.dunn...@gmail.com wrote:
That doesn't mean that is a bad recommendation.
People don't rate things for simple reasons. Generally, they rate things
that are close to what they like and they rate things negatively that are
very close to what
, Jan 21, 2013 at 1:12 AM, Colin Wang colin.bin.wang.mah...@gmail.com
wrote:
Hi Koobas,
I am trying a dense matrix in Hadoop, a thousand by a thousand square in size.
How do HPC folks solve this problem? Any references?
Thank you,
Colin
On Mon, Jan 21, 2013 at 11:49 AM, Koobas koo
Colin,
I am more of an HPC guy.
I am a Mahout noob myself.
Are we talking about a dense matrix?
What size?
On Sun, Jan 20, 2013 at 9:34 PM, Colin Wang colin.bin.wang.mah...@gmail.com
wrote:
Hi Koobas,
I want the first one. Do you have any suggestions?
Thank you,
Colin
On Fri, Jan 18
Matrix inversion, as in explicitly computing the inverse,
e.g. computing variance / covariance,
or matrix inversion, as in solving a linear system of equations?
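The distinction matters in practice: to solve a linear system you rarely form the inverse explicitly, while covariance-style uses need the inverse's actual entries. A small sketch:

```python
import numpy as np

# Solving A x = b: a factorization-based solve is cheaper and more accurate
# than forming inv(A); form the explicit inverse only when you truly need
# its entries (e.g. a covariance matrix).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x_solve = np.linalg.solve(A, b)    # LU-based solve (preferred)
x_inv = np.linalg.inv(A) @ b       # explicit inverse, then multiply

print(np.allclose(x_solve, x_inv))
```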
On Thu, Jan 17, 2013 at 7:49 PM, Colin Wang colin.bin.wang.mah...@gmail.com
wrote:
Hi All,
I want to solve the matrix inversion,
, Koobas koo...@gmail.com wrote:
Okay, I got a little bit further in my understanding.
The matrix of ratings R is replaced with the binary matrix P.
Then R is used again in regularization.
I get it.
This takes care of the situations when you have user-item interactions,
but you don't have
notation?
Because it seems to me that I have to go one row of X, (one column of Y) at
a time.
Is that really so, or am I missing something?
On Wed, Jan 9, 2013 at 10:13 AM, Koobas koo...@gmail.com wrote:
On Wed, Jan 9, 2013 at 12:40 AM, Sean Owen sro...@gmail.com wrote:
I think the model you're
storage and hit
with Householder?
The underlying question is the computational complexity, i.e., the number of
floating-point operations involved.
On Tue, Jan 8, 2013 at 4:03 PM, Sebastian Schelter s...@apache.org wrote:
Hi Koobas,
We have two classes that implement the solutions described
On Tue, Jan 8, 2013 at 6:41 PM, Sean Owen sro...@gmail.com wrote:
There's definitely a QR decomposition in there for me since solving A
= X Y' for X is X = A Y (Y' * Y)^-1 and you need some means to
compute the inverse of that (small) matrix.
Sean,
I think I got it.
1) A Y is a handful of
On Tue, Jan 8, 2013 at 7:18 PM, Ted Dunning ted.dunn...@gmail.com wrote:
But is it actually QR of Y?
Ted,
This is my understanding:
In the process of solving the least squares problem,
you end up inverting a small square matrix, (Y' * Y)^-1.
How it is done is irrelevant.
Since the matrix is
On Tue, Jan 8, 2013 at 7:17 PM, Koobas koo...@gmail.com wrote:
On Tue, Jan 8, 2013 at 6:41 PM, Sean Owen sro...@gmail.com wrote:
There's definitely a QR decomposition in there for me since solving A
= X Y' for X is X = A Y (Y' * Y)^-1 and you need some means to
compute the inverse
As a n00b, I am still revolving in the kNN space.
Could you please point me to some details on ALS.
Thanks!
On Thu, Dec 6, 2012 at 10:14 AM, Sean Owen sro...@gmail.com wrote:
The tree-based ones are very old and not fast, and were more of an
experiment. I recall a few questions about them but
it is not going away.
I need a starting point to get up to speed.
Thanks for clarifying.
On Thu, Dec 6, 2012 at 3:18 PM, Koobas koo...@gmail.com wrote:
As a n00b, I am still revolving in the kNN space.
Could you please point me to some details on ALS.
Thanks!
On Thu, Dec 6, 2012 at 10
I am very happy to see that I started a lively thread.
I am a newcomer to the field, so this is all very useful.
Now yet another naive question.
Ted is probably going to go ballistic ;)
Assuming that simple overlap methods suck,
is there still a metric that works better than others
(i.e. Tanimoto
On Wed, Dec 5, 2012 at 7:03 PM, Ted Dunning ted.dunn...@gmail.com wrote:
On Wed, Dec 5, 2012 at 5:29 PM, Koobas koo...@gmail.com wrote:
...
Now yet another naive question.
Ted is probably going to go ballistic ;)
I hope not.
Assuming that simple overlap methods suck