User-item similarity and time-based recommendations

2012-03-10 Thread Alex Geller
Hi, I want to write a recommendation system which recommends items to customers based on the following parameters (and some others): - User-item similarity (for example recommend items which target certain gender,age etc. to users which meet these criteria) - Time of year (recommend

Not all Mapper/Reducer slots are taken when running K-Means cluster

2012-03-10 Thread WangRamon
Hi all, I submitted 5 K-Means jobs simultaneously. My Hadoop cluster has 42 map and 42 reduce slots configured, and I set the default number of reduce tasks per job to 73 (42 * 1.75). I find that only about 12 of the reduce tasks are running at any time, although there are 73 reduce tasks created for

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Sean Owen
If by #3 you mean you have preferences for many users, this is of course the standard input for a recommender, yes. If you also have some user-user similarity info beyond that, you can implement UserSimilarity and use GenericUserBasedRecommender to incorporate that. If you want to boost items

Vector based queries

2012-03-10 Thread Pat Ferrel
I have a case where I'd like to get documents which most closely match a particular vector. The RowSimilarityJob of Mahout is ideal for precalculating similarity between existing documents but in my case the query is constructed at run time. So the UI constructs a vector to be used as a query.
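
A minimal sketch of the run-time query described above, assuming the document vectors are small enough to hold in memory as dense arrays. It is not how RowSimilarityJob works and uses no Mahout API; the class `VectorQuery` and its methods are hypothetical names introduced only for illustration. It ranks documents by cosine similarity to a query vector built at request time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: brute-force nearest documents to a run-time query vector.
final class VectorQuery {
    private VectorQuery() {}

    /** Cosine similarity of two equal-length dense vectors; 0.0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Indices of the k documents most similar to the query, best match first. */
    static List<Integer> topK(double[][] docs, double[] query, int k) {
        double[] scores = new double[docs.length];
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            scores[i] = cosine(docs[i], query);
            indices.add(i);
        }
        // Sort by descending score; the sort is stable, so ties keep document order.
        indices.sort(Comparator.comparingDouble((Integer i) -> -scores[i]));
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }
}
```

A linear scan like this costs one pass over all documents per query, so it only suits modest collections; larger ones would likely need an index or a search engine.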

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Alex Geller
By time-based I meant something that supports recommendation by time of year (#2 on my list). IDRescorer looks interesting, but (correct me if I'm wrong, I'm a complete newbie with Mahout and generally in this field) it seems more like a tool to refine the order of recommended items after the

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Sean Owen
It really depends on what you mean by based on time, as it could mean many things. I'm assuming you mean that an item's seasonality should somehow boost its importance, and boost its perceived value, by some multiplier. The useful application of that idea is in fact what you get in IDRescorer. I
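
The multiplier idea above can be sketched as follows. To keep the example self-contained, the `Rescorer` interface here is a local stand-in that mirrors the two methods of Mahout's `IDRescorer` (`rescore(long, double)` and `isFiltered(long)`); `SeasonalRescorer`, the boost map, and the excluded set are hypothetical names and data, not part of Mahout or of this thread.

```java
import java.util.Map;
import java.util.Set;

// Local stand-in with the same two methods as Mahout's IDRescorer (assumption:
// a real implementation would implement org.apache.mahout.cf.taste.recommender.IDRescorer).
interface Rescorer {
    double rescore(long id, double originalScore);
    boolean isFiltered(long id);
}

// Hypothetical rescorer: multiplies an item's score by a seasonal factor and
// drops items that should not be recommended at all in the current season.
final class SeasonalRescorer implements Rescorer {
    private final Map<Long, Double> seasonalBoost; // item ID -> multiplier
    private final Set<Long> excluded;              // item IDs to filter out

    SeasonalRescorer(Map<Long, Double> seasonalBoost, Set<Long> excluded) {
        this.seasonalBoost = seasonalBoost;
        this.excluded = excluded;
    }

    @Override
    public double rescore(long id, double originalScore) {
        // Items without a seasonal entry keep their original score.
        return originalScore * seasonalBoost.getOrDefault(id, 1.0);
    }

    @Override
    public boolean isFiltered(long id) {
        return excluded.contains(id);
    }
}
```

With Mahout on the classpath, an equivalent object would presumably be passed as the rescorer argument of the recommender's `recommend` call, with the boost map rebuilt whenever the season changes.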

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Alex Geller
Yep, filtering is really what I need in this case, I'll give IDRescorer a look. Regarding the perfect item (simplified for the sake of example) - let's assume I have the info that the user is a 20 y.o. woman who likes the color red, and it's going to be Christmas in a week's time. So if she's

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Sean Owen
It sounds like you have substantially a search problem. You know the user's attributes, you know the items' attributes, and are just finding the closest match. That by itself doesn't need a recommender at all; it would just be extra complexity and fiddling to make it look like a recommender

Re: Clustering techniques, tips and tricks

2012-03-10 Thread Pat Ferrel
Deploying a jar with a single class extending Analyzer results in an error for a missing org.apache.lucene.analysis.Analyzer: mahout seq2sparse -i wp-seqfiles/part-r-0 -o wp-vectors -ow *-a com.custom.analyzers.LuceneStemmingAnalyzer* -chunk 100 -wt tfidf -s 2 -md 3 -x 95 -ng 2 -ml

site plugin versus modules at nonstandard paths

2012-03-10 Thread Benson Margulies
With site plugin 3.0, is there a way to make the deployed site work with modules like: <module>src/somemodule</module> ? I'm getting links without the src/ but web pages delivered to the src/ directory.

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Alex Geller
Hmm, I haven't thought about it this way; the required functionality is recommending items to users, so I always looked at the problem from that angle. Still, the attributes of users and items won't always be that easy to match, so as I said earlier, I'll need to integrate user-similarity-based suggestions as

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Sean Owen
Yes... returning the best items to users is a slightly more general statement than what a recommender does. That's the output, but a recommender's input is usually user-item associations rather than attributes. I split hairs only to make the point that you really have two problems here, and the data to

Re: Not all Mapper/Reducer slots are taken when running K-Means cluster

2012-03-10 Thread Paritosh Ranjan
Can you run the K-Means jobs again (all with the same block size) and give the same statistics for: a) only 1 job running b) 2 jobs running simultaneously c) 5 jobs running simultaneously On 10-03-2012 21:08, WangRamon wrote: Hi All I submit 5 K-Means Jobs simultaneously, my Hadoop cluster have

Re: Not all Mapper/Reducer slots are taken when running K-Means cluster

2012-03-10 Thread Paritosh Ranjan
And to answer the question about the K-Means configuration: K-Means has two jobs: 1) buildClusters: has a reducer and has no limitation on the number of reducer tasks 2) clusterData: executes if runClustering = true, has no reducer tasks On 11-03-2012 09:10, Paritosh Ranjan wrote: Can you run

RE: Not all Mapper/Reducer slots are taken when running K-Means cluster

2012-03-10 Thread WangRamon
Hi Paritosh, I did the tests with 1 job and 5 jobs, and they all have the same problem. The job I'm running is the buildClusters one. I can see in the monitor GUI that 73 reduce tasks are created, but only 12 of them are running at any time (the rest are in the pending state), the task finished

Re: Not all Mapper/Reducer slots are taken when running K-Means cluster

2012-03-10 Thread Jeff Eastman
What's your Hadoop config in terms of the maximum number of reducers? It's a function of the available RAM on each node and the number of nodes. On 3/10/12 8:55 PM, WangRamon wrote: Hi Paritosh, I did the tests with 1 job and 5 jobs, they all have the same problem, the job i'm running is the

RE: Not all Mapper/Reducer slots are taken when running K-Means cluster

2012-03-10 Thread WangRamon
Here is the configuration: <property> <name>mapred.tasktracker.map.tasks.maximum</name> <value>14</value> </property> <property> <name>mapred.tasktracker.reduce.tasks.maximum</name> <value>14</value> </property> <property> <name>mapred.reduce.tasks</name>
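
The figures quoted in this thread can be checked with a little arithmetic. This is only a sketch: `SlotMath` and its methods are hypothetical names, and the node count is an inference from 42 total reduce slots at 14 per tasktracker, not something stated in the messages.

```java
// Hypothetical helper for the slot arithmetic discussed in the thread.
final class SlotMath {
    private SlotMath() {}

    /** Reduce tasks per job as slots times a factor, truncated (42 * 1.75 = 73.5 gives 73). */
    static int reduceTasks(int totalSlots, double factor) {
        return (int) (totalSlots * factor);
    }

    /** Number of scheduling waves needed to run all tasks if every slot were usable. */
    static int waves(int tasks, int slots) {
        return (tasks + slots - 1) / slots; // ceiling division
    }

    /** Tasktracker count implied by a cluster-wide slot total and a per-node maximum. */
    static int impliedNodes(int totalSlots, int slotsPerNode) {
        return totalSlots / slotsPerNode;
    }
}
```

If all 42 reduce slots were in use, 73 reduce tasks would finish in two waves; seeing only about 12 running at once is the discrepancy the thread is trying to explain.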

Re: Not all Mapper/Reducer slots are taken when running K-Means cluster

2012-03-10 Thread Paritosh Ranjan
Can you try reducing/increasing your block size and see the impact? I suspect the block size is the problem. I faced the same problem once (for a different Hadoop job, and it was very hard to debug). In that case, CompositeInputFormat was being used as input, which used to fix the block