Hi all,
I'm curious what approaches are recommended for generating user-user
similarity, when I've got two (or more) distinct types of item data, both of
which are fairly large.
E.g. let's say I had a set of users where I knew both (a) what books they had
bought on Amazon, and (b) what YouTube
Item-item similarity is a property of the information you have on two
items and just those items. Whether there are just those 2 items over
500K users, or 2M items over 500K users, makes no difference. So no, I
don't think that this skew implies you should use any particular
algorithm, by itself.
I
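To illustrate the point that item-item similarity depends only on the data for the two items being compared, here is a minimal sketch. It uses cosine similarity as one example metric, and the toy preference data and names (`item_a`, `item_b`, `other_*`) are made up for illustration. This is not Mahout code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length preference vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy data: each list holds one item's preferences across 5 users.
small = {
    "item_a": [1, 0, 1, 1, 0],
    "item_b": [1, 1, 1, 0, 0],
}

# Same two items, plus 1000 unrelated items in the catalogue.
large = dict(small)
for i in range(1000):
    large[f"other_{i}"] = [(i + u) % 2 for u in range(5)]

sim_small = cosine(small["item_a"], small["item_b"])
sim_large = cosine(large["item_a"], large["item_b"])
```

Under these assumptions, `sim_small` and `sim_large` are identical, because the extra items never enter the computation for this pair.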
Hello Everyone, I was reading through the documentation on the different
itemsimilarity algorithms in Mahout and had a question: if one has a scenario
where the number of items is significantly smaller than the number of users (say
500,000 users to 1000 items), are there particular item similarity
Because the order of the examples is randomized.
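As a rough illustration of why randomized example order changes the trained model, here is a minimal sketch. It assumes an order-sensitive learner (a tiny logistic regression trained with stochastic gradient descent). The data, the `train` function and the seeds are all hypothetical, and this is not the Mahout example code.

```python
import math
import random

def train(examples, seed, epochs=5, lr=0.1):
    """Train a 2-weight logistic regression with SGD, shuffling each epoch."""
    rng = random.Random(seed)  # the seed controls the shuffle order
    w = [0.0, 0.0]
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            z = w[0] * x[0] + w[1] * x[1]
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
            g = p - y                       # gradient of the log loss w.r.t. z
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
    return w

# Hypothetical toy training set: ((feature1, feature2), label)
examples = [
    ((1.0, 0.2), 1), ((0.9, 0.4), 1), ((0.7, 0.6), 1),
    ((0.1, 1.0), 0), ((0.3, 0.8), 0), ((0.2, 0.5), 0),
]

w_seed1 = train(examples, seed=1)
w_seed1_again = train(examples, seed=1)  # same order, same weights
w_seed2 = train(examples, seed=2)        # different order, different weights
```

In this sketch, fixing the seed makes the run repeatable, while changing it yields slightly different weights and therefore potentially different test results.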
On Tue, Jul 3, 2012 at 8:13 AM, Caspar Hsieh wrote:
> I use Mahout classification example "TrainNewsGroup" to train the model
> with leak type 3, then use "TestNewsGroups" to test the model,
> then I re-trained the model and test again, the test r
I use the Mahout classification example "TrainNewsGroup" to train the model
with leak type 3, then use "TestNewsGroups" to test the model.
Then I re-trained the model and tested again, and the test results are not the
same as before.
Why is the trained model not the same each time?
Thanks.
(Please don't "ping" your questions on the list -- bad form and makes
people less likely to answer.)
You do not have to have equal numbers of positive/negative examples. I
think you need to go back and read up on the basics of how Bayesian
classification works before you dig in to Mahout. This is
I'm not sure if Mridul's suggestion does what you want. Do you want to
recommend items to users? Then no, you do not start with item IDs and
recommend to them.
It sounds like your question is how to compute similarity data. The
first answer is that you do not use Hadoop unless you must use Hadoop.
Can someone help me with this?
Regards,
Damodar
On Tue, Jul 3, 2012 at 4:27 PM, damodar shetyo wrote:
> Hi,
> I plan to use mahout classification feature.I have a lot of data on which
> i am planning to train my model.Now i have few queries as follows:
> 1)Suppose i have 2 types of data: Spam
Thanks Mridul, I'll try this out. Does getItemIDs return every item id
from the file in your example?
This kind of leads me to another, related question... I want to have
my recommender engine recommend items to a user, but the items should
be from a known set of item ids. For example, if a user i
> I'm thinking the session ID (in the cookie) would be used as the user ID.
> The events
> are tied to product IDs, so these would be used in generating the
> preferences.
I guess if you consider product preference on a per-session basis (i.e.
only items for which a user expresses a preference,
Hi,
I'm just beginning to play with the Mahout recommendation framework.
I'm wondering if I could get some advice for implementing this thing.
My data comes from a web app's event logs, where the user accounts
are only persisted for 30 days -- cookie data. I'm thinking the
session ID (in the co
Hi,
I plan to use the Mahout classification feature. I have a lot of data on which I
am planning to train my model. Now I have a few queries, as follows:
1) Suppose I have 2 types of data: spam and not spam (this is just an
example and not the real use case, but similar to my real use case). The
amount of sp
Hi Lance
I understand that pages are pages, but Nutch stores pages in its own format
while Mahout operates with other data formats.
I would like to merge Nutch and Mahout with minimum effort, which is why I
ask which is easier: alter Mahout and implement logic to read/write
Nutch data or impleme