Hi,
I want to write a recommendation system which recommends items to customers
based on the following parameters (and some others):
- User-item similarity (for example, recommend items which target a certain
gender, age, etc. to users who meet these criteria)
- Time of year (recommend
Hi All, I submit 5 K-Means jobs simultaneously. My Hadoop cluster has 42 map
and 42 reduce slots configured, and I set the default number of reduce tasks
per job to 73 (42 * 1.75). I find that only about 12 of the reduce tasks are
running at any time, although there are 73 reduce tasks created for
If by #3 you mean you have preferences for many users, this is of
course the standard input for a recommender, yes. If you also have
some user-user similarity info beyond that, you can implement
UserSimilarity and use GenericUserBasedRecommender to incorporate
that.
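For what it's worth, a rough, untested sketch of that wiring; the user-user
score map, loadScores(), and prefs.csv are placeholders for whatever side data
you actually have:

// Sketch: a profile-based UserSimilarity plugged into GenericUserBasedRecommender.
import java.io.File;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.PreferenceInferrer;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class ProfileBasedExample {

  // Similarity backed by precomputed user-user scores (e.g. from age/gender overlap).
  static class ProfileUserSimilarity implements UserSimilarity {
    private final Map<Long, Map<Long, Double>> scores; // hypothetical side data, range [-1, 1]

    ProfileUserSimilarity(Map<Long, Map<Long, Double>> scores) {
      this.scores = scores;
    }

    @Override
    public double userSimilarity(long userID1, long userID2) {
      Map<Long, Double> row = scores.get(userID1);
      Double s = (row == null) ? null : row.get(userID2);
      return (s == null) ? 0.0 : s;
    }

    @Override
    public void setPreferenceInferrer(PreferenceInferrer inferrer) {
      // not needed for this sketch
    }

    @Override
    public void refresh(Collection<Refreshable> alreadyRefreshed) {
      // nothing to refresh; the map is static
    }
  }

  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("prefs.csv"));        // userID,itemID,pref
    UserSimilarity similarity = new ProfileUserSimilarity(loadScores()); // loadScores() is hypothetical
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(25, similarity, model);
    GenericUserBasedRecommender recommender =
        new GenericUserBasedRecommender(model, neighborhood, similarity);
    List<RecommendedItem> top5 = recommender.recommend(123L, 5);
    System.out.println(top5);
  }

  static Map<Long, Map<Long, Double>> loadScores() {
    return java.util.Collections.emptyMap(); // placeholder
  }
}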
If you want to boost items
I have a case where I'd like to get documents which most closely match a
particular vector. The RowSimilarityJob of Mahout is ideal for
precalculating similarity between existing documents, but in my case the
query is constructed at run time. So the UI constructs a vector to be
used as a query.
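Roughly what I have in mind, sketched with Mahout's in-memory vectors (the
dimensions and values below are toy placeholders for whatever seq2sparse
actually produced):

// Sketch: score a query vector built at run time against a document vector
// using plain cosine similarity via Mahout's Vector API.
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class QueryVectorExample {
  public static void main(String[] args) {
    int dims = 1000; // vocabulary size; placeholder

    // A document vector as it might come out of seq2sparse (made-up values).
    Vector doc = new RandomAccessSparseVector(dims);
    doc.set(3, 0.7);
    doc.set(42, 1.2);

    // The query vector constructed by the UI at run time (also made up).
    Vector query = new RandomAccessSparseVector(dims);
    query.set(3, 1.0);
    query.set(17, 0.5);

    double cosine = query.dot(doc) / (query.norm(2) * doc.norm(2));
    System.out.println("cosine similarity = " + cosine);
  }
}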
By time-based I meant something that supports recommendation by time of
year (#2 on my list).
IDRescorer looks interesting, but (correct me if I'm wrong, I'm a complete
newbie with Mahout and generally in this field) it seems more like a tool
to refine the order of recommended items after the
It really depends on what you mean by "based on time", as it could
mean many things. I'm assuming you mean that an item's seasonality
should somehow boost its importance, and boost its perceived value, by
some multiplier.
The useful application of that idea is in fact what you get in
IDRescorer. I
Yep, filtering is really what I need in this case; I'll give IDRescorer a
look.
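For reference, my rough understanding of what such a rescorer would look like
(the in-season / out-of-stock ID sets and the boost factor are hypothetical):

// Sketch: an IDRescorer that boosts in-season items and filters out-of-stock ones.
import java.util.Set;
import org.apache.mahout.cf.taste.recommender.IDRescorer;

public class SeasonalRescorer implements IDRescorer {

  private final Set<Long> inSeasonItemIds;   // items to boost right now
  private final Set<Long> outOfStockItemIds; // items to drop entirely
  private final double boost;                // e.g. 1.5

  public SeasonalRescorer(Set<Long> inSeasonItemIds, Set<Long> outOfStockItemIds, double boost) {
    this.inSeasonItemIds = inSeasonItemIds;
    this.outOfStockItemIds = outOfStockItemIds;
    this.boost = boost;
  }

  @Override
  public double rescore(long id, double originalScore) {
    return inSeasonItemIds.contains(id) ? originalScore * boost : originalScore;
  }

  @Override
  public boolean isFiltered(long id) {
    return outOfStockItemIds.contains(id); // true means never recommend this item
  }
}

// Used as: recommender.recommend(userId, 10, new SeasonalRescorer(inSeason, outOfStock, 1.5));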
Regarding the perfect item (simplified for the sake of example) - let's
assume I have the info that the user is a 20 y.o. woman who likes the color
red, and it's going to be Christmas in a week's time. So if she's
It sounds like you have substantially a search problem. You know the
user's attributes, you know the items' attributes, and are just
finding the closest match. That by itself doesn't need a recommender
at all; it would just be extra complexity and fiddling to make it look
like a recommender
Deploying a jar with a single class extending Analyzer results in an
error for a missing org.apache.lucene.analysis.Analyzer
mahout seq2sparse -i wp-seqfiles/part-r-0 -o wp-vectors -ow -a
com.custom.analyzers.LuceneStemmingAnalyzer -chunk 100 -wt tfidf -s
2 -md 3 -x 95 -ng 2 -ml
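For reference, a minimal, untested sketch of what that class could look like,
assuming the Lucene 3.x API this Mahout release bundles (the package and class
names just mirror the command above); the missing-Analyzer error suggests
lucene-core isn't visible on the task classpath, so bundling it into the
deployed jar is worth trying:

package com.custom.analyzers;

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class LuceneStemmingAnalyzer extends Analyzer {
  // Keep a no-arg constructor so the class can be created by reflection.
  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream stream = new StandardTokenizer(Version.LUCENE_36, reader);
    stream = new LowerCaseFilter(Version.LUCENE_36, stream);
    return new PorterStemFilter(stream);
  }
}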
With site plugin 3.0, is there a way to make the deployed site work
with modules like:
<module>src/somemodule</module>
?
I'm getting links without the src/, but the web pages are delivered to the
src/ directory.
Hmm, I haven't thought about it this way; the required functionality is
recommending items to users, so I always looked at the problem from that
angle. Still, the attributes of users and items won't always be that easy to
match, so as I said earlier, I'll need to integrate user-similarity-based
suggestions as
Yes... returning the best items to users is a slightly more general statement
than what a recommender does. That's the output, but a
recommender's input is usually user-item associations rather than
attributes.
I split hairs only to make the point that you really have two problems
here, and the data to
Can you run the K-means jobs again (all with the same block size) and give
the same statistics for:
a) only 1 job running
b) 2 jobs running simultaneously
c) 5 jobs running simultaneously
On 10-03-2012 21:08, WangRamon wrote:
Hi All I submit 5 K-Means Jobs simultaneously, my Hadoop cluster have
And to answer the question about the KMeans configuration:
KMeans has two jobs:
1) buildClusters: has a reducer and no limit on the number of reducer tasks
2) clusterData: executes only if runClustering = true and has no reducer
tasks (see the sketch below)
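Sketched at the driver level (untested; this assumes the 0.6-era
KMeansDriver.run signature, which changed in later releases, and all the paths
are placeholders):

// Both phases from one call: buildClusters is the iterative job with reducers;
// clusterData only runs because runClustering is set to true.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;

public class RunKMeans {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    KMeansDriver.run(conf,
        new Path("input-vectors"),     // placeholder input path
        new Path("initial-clusters"),  // placeholder seed clusters
        new Path("kmeans-output"),     // placeholder output path
        new EuclideanDistanceMeasure(),
        0.5,   // convergence delta
        10,    // max iterations (buildClusters)
        true,  // runClustering -> also run the map-only clusterData job
        false  // runSequential
    );
  }
}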
On 11-03-2012 09:10, Paritosh Ranjan wrote:
Can you run
Hi Paritosh, I did the tests with 1 job and 5 jobs; they all have the same
problem. The job I'm running is the buildClusters one. I can see there are 73
reduce tasks created in the monitoring GUI, but only 12 of them are running at
any time (the rest are in the pending state), the tasks finished
What's your Hadoop config in terms of the maximum number of reducers?
It's a function of your available RAM on each node and the number of nodes.
On 3/10/12 8:55 PM, WangRamon wrote:
Hi Paritosh, I did the tests with 1 job and 5 jobs; they all have the same
problem. The job I'm running is the
Here is the configuration:
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>14</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>14</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
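(For what it's worth, if the 42 reduce slots mentioned earlier come from this
setting, then 42 / 14 per tasktracker works out to 3 nodes, so the cluster
tops out at 3 * 14 = 42 concurrent reducers shared across all five jobs.)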
Can you try reducing/increasing your block size and see the impact?
I suspect the block size to be the problem.
I have faced the same problem once (for a different Hadoop job, and it
was very hard to debug). In that case, CompositeInputFormat was
being used as input, which used to fix the block