: Thursday, March 22, 2012 13:51
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
1. These are the JDBC-related classes. For example see
MySQLJDBCDiffStorage or MySQLJDBCDataModel in integration/
2. The distributed and non-distributed code are quite separate
?
-Original Message-
From: Sean Owen [mailto:sro...@gmail.com]
Sent: Thursday, March 22, 2012 17:57
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
A distributed and non-distributed recommender are really quite
separate. They perform the same task in quite different
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
Hi Oren,
If you use an item-based approach, it's sufficient to use the top-k
similar items per item (with k somewhere between 25 and 100). That means
the data to hold in memory is num_items * k data points.
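To make the num_items * k estimate concrete, here is a rough back-of-envelope sketch in plain Java. It does not use any Mahout API. The class name, the catalog size of one million items, and the 12 bytes per entry (an 8-byte item ID plus a 4-byte float score) are all assumptions for illustration; real per-entry overhead depends on the data structure used.

```java
// Back-of-envelope sizing for a top-k item-similarity table.
// Hypothetical helper, not part of Mahout.
final class TopKSimilaritySizing {

    /** Number of similarity entries held in memory: numItems * k. */
    static long entries(long numItems, int k) {
        return Math.multiplyExact(numItems, (long) k);
    }

    /** Approximate size in bytes, given an assumed cost per entry. */
    static long approxBytes(long numItems, int k, int bytesPerEntry) {
        return Math.multiplyExact(entries(numItems, k), (long) bytesPerEntry);
    }

    public static void main(String[] args) {
        long numItems = 1_000_000L; // assumed catalog size
        int bytesPerEntry = 12;     // assumed: 8-byte item ID + 4-byte float score
        for (int k : new int[] {25, 50, 100}) {
            long n = entries(numItems, k);
            long mb = approxBytes(numItems, k, bytesPerEntry) / (1024L * 1024L);
            System.out.printf("k=%d: %d entries, about %d MB%n", k, n, mb);
        }
    }
}
```

The point of the sketch is only that the footprint grows with the number of items, not with the number of users or interactions.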
While
...@apache.org]
Sent: Thursday, April 05, 2012 10:34
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
Hi Oren,
If you use an item-based approach, it's sufficient to use the top-k
similar items per item (with k somewhere between 25 and 100). That means
the data to hold in memory
It might or might not be interesting to comment on this discussion in
light of the new product/project I mentioned last night, Myrrix.
It's definitely an example of precisely this two-layered architecture
we've been discussing on this thread. http://myrrix.com/design/
The nice thing about a
Subject: Re: Mahout beginner questions...
On Sun, Mar 25, 2012 at 3:36 PM, Razon, Oren oren.ra...@intel.com wrote:
...
The system I need should of course give the recommendation itself in no
time.
...
But because I'm talking about very large scales, I guess that I want to
push much of my
I'm sure he's referring to the off-line model-building bit, not an online
component.
On Mon, Mar 26, 2012 at 9:27 AM, Razon, Oren oren.ra...@intel.com wrote:
By saying "At Veoh, we built our models from several billion interactions
on a tiny cluster", you meant that you used the distributed
Message-
From: Sean Owen [mailto:sro...@gmail.com]
Sent: Monday, March 26, 2012 11:48
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
I'm sure he's referring to the off-line model-building bit, not an online
component.
On Mon, Mar 26, 2012 at 9:27 AM, Razon, Oren oren.ra
necessarily need
to load the entire intermediate file (similarity results) into the memory?!
-Original Message-
From: Sean Owen [mailto:sro...@gmail.com]
Sent: Monday, March 26, 2012 11:48
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
I'm sure he's referring
: Mahout beginner questions...
On Sun, Mar 25, 2012 at 3:36 PM, Razon, Oren oren.ra...@intel.com wrote:
...
The system I need should of course give the recommendation itself in no
time.
...
But because I'm talking about very large scales, I guess that I want to
push much of my model
from the DB into
your memory
So what are the pros of doing so? When should I consider it?
-Original Message-
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Monday, March 26, 2012 15:52
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
No. I meant that I used
Message-
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Monday, March 26, 2012 15:52
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
No. I meant that I used the same sort of combined offline and online
processes that I have recommended to you. The cluster did
, meaning I could scale up), or is it because of the recommendation time
it takes?
Thanks,
Oren
-Original Message-
From: Sean Owen [mailto:sro...@gmail.com]
Sent: Thursday, March 22, 2012 17:57
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
A distributed and non
[mailto:sro...@gmail.com]
Sent: Sunday, March 25, 2012 21:25
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
It is memory. You will need a pretty large heap to put 100M data in memory
-- probably 4GB, if not a little more (so the machine would need 8GB+ RAM).
You can go bigger if you
It sounds like the original poster isn't clear about the division between
off-line and on-line work.
Almost all production recommendation systems have a large off-line
component which analyzes logs of behavior and produces a recommendation
model. This model typically consists of item-item
the recommendations in advance
(refresh them every X minutes/hours) and always recommend using the most up-to-date
recommendations, right?!
-Original Message-
From: Sean Owen [mailto:sro...@gmail.com]
Sent: Sunday, March 25, 2012 21:25
To: user@mahout.apache.org
Subject: Re: Mahout beginner
, March 25, 2012 21:25
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
It is memory. You will need a pretty large heap to put 100M data in memory
-- probably 4GB, if not a little more (so the machine would need 8GB+ RAM).
You can go bigger if you have more memory
the reading from the
DB offline so I'm not too afraid of losing some of my speed...
-Original Message-
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Sunday, March 25, 2012 21:35
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
Not really. See my previous posting
On Sun, Mar 25, 2012 at 3:36 PM, Razon, Oren oren.ra...@intel.com wrote:
...
The system I need should of course give the recommendation itself in no
time.
...
But because I'm talking about very large scales, I guess that I want to
push much of my model computation to offline mode (which
@mahout.apache.org
Subject: Re: Mahout beginner questions...
On Sun, Mar 25, 2012 at 3:36 PM, Razon, Oren oren.ra...@intel.com wrote:
...
The system I need should of course give the recommendation itself in no
time.
...
But because I'm talking about very large scales, I guess that I want to
push much of my
On Sun, Mar 25, 2012 at 4:02 PM, Razon, Oren oren.ra...@intel.com wrote:
So let's continue with your example... I will compute an item-to-item similarity
matrix on Hadoop and then do online recommendation based on it and the user's
ranked items.
Yes.
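As a minimal sketch of that split, assuming the offline job has already produced a top-k similarity table per item, the online step could look roughly like the following. This is plain Java with hypothetical names, not Mahout's implementation; it scores candidates with a simple similarity-weighted sum of the user's ratings, which is a simplification chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical online scorer over a precomputed item-item similarity table.
final class ItemBasedOnlineSketch {

    /**
     * Scores each item the user has not rated as the sum, over the user's
     * rated items, of similarity(rated, candidate) * rating, and returns the
     * IDs of the highest-scoring candidates.
     */
    static List<Long> recommend(Map<Long, Map<Long, Double>> topKSimilar,
                                Map<Long, Double> userRatings,
                                int howMany) {
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Double> rated : userRatings.entrySet()) {
            Map<Long, Double> neighbors =
                topKSimilar.getOrDefault(rated.getKey(), Collections.<Long, Double>emptyMap());
            for (Map.Entry<Long, Double> n : neighbors.entrySet()) {
                if (userRatings.containsKey(n.getKey())) {
                    continue; // skip items the user already rated
                }
                scores.merge(n.getKey(), n.getValue() * rated.getValue(), Double::sum);
            }
        }
        List<Map.Entry<Long, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(howMany, ranked.size()); i++) {
            result.add(ranked.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up similarity table, standing in for the offline output.
        Map<Long, Map<Long, Double>> sim = new HashMap<>();
        Map<Long, Double> n1 = new HashMap<>();
        n1.put(2L, 0.9);
        n1.put(3L, 0.2);
        sim.put(1L, n1);
        Map<Long, Double> n4 = new HashMap<>();
        n4.put(3L, 0.8);
        n4.put(5L, 0.5);
        sim.put(4L, n4);

        // Made-up ratings for one user.
        Map<Long, Double> ratings = new HashMap<>();
        ratings.put(1L, 5.0);
        ratings.put(4L, 3.0);

        System.out.println(recommend(sim, ratings, 2));
    }
}
```

The online work per request touches only the user's own items and their k neighbors each, so the cost does not depend on the total number of users.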
So where will the online part sit? Is it
, March 25, 2012 21:35
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
Not really. See my previous posting.
The best way to get fast recommendations is to use an item-based
recommender. Pre-computing recommendations for all users is not usually a
win because you wind up
1. These are the JDBC-related classes. For example see
MySQLJDBCDiffStorage or MySQLJDBCDataModel in integration/
2. The distributed and non-distributed code are quite separate. At
this scale I don't think you can use the non-distributed code to a
meaningful degree. For example you could
@mahout.apache.org
Subject: Re: Mahout beginner questions...
1. These are the JDBC-related classes. For example see
MySQLJDBCDiffStorage or MySQLJDBCDataModel in integration/
2. The distributed and non-distributed code are quite separate. At
this scale I don't think you can use the non-distributed code
, March 22, 2012 13:51
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...
1. These are the JDBC-related classes. For example see
MySQLJDBCDiffStorage or MySQLJDBCDataModel in integration/
2. The distributed and non-distributed code are quite separate. At
this scale I don't