I see the architecture as something like the following:

Asynchronously: given a set of feature vectors, run clustering/classification
algorithms over those vectors to create the appropriate buckets for the set of
users, then feed the results of these computations into the synchronous
database.

Synchronously: for each bucket, run item-similarity recommendation algorithms
to display a real-time set of recommendations for each user.
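
For the synchronous piece, something along the lines of Mahout's Taste
item-based recommender is what I have in mind. Here's a minimal sketch (the
preference file name, the per-bucket file layout, and the choice of
LogLikelihoodSimilarity are just placeholder assumptions on my part):

    import java.io.File;
    import java.util.List;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

    public class BucketRecommender {
      public static void main(String[] args) throws Exception {
        long userId = Long.parseLong(args[0]);
        // Preference data restricted to the user's bucket (file name is made up).
        DataModel model = new FileDataModel(new File("bucket-42-prefs.csv"));
        ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
        GenericItemBasedRecommender recommender =
            new GenericItemBasedRecommender(model, similarity);
        // Real-time top-10 recommendations for this user within the bucket.
        List<RecommendedItem> recs = recommender.recommend(userId, 10);
        for (RecommendedItem item : recs) {
          System.out.println(item.getItemID() + "\t" + item.getValue());
        }
      }
    }

The bucket assignments coming out of the async step would just determine which
slice of preference data each user's recommender sees.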

For the asynchronous computations we need the ability to tweak the weight
associated with each feature in the feature vectors (typical features might
include income, age, dining preferences, etc.), and we need the business folks
to be able to adjust these weights so that we can regenerate the async buckets.
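
To make the weight-tweaking concrete, the simplest thing I can think of (just
a sketch; the feature names and weight values below are invented) is to scale
each component of a user's feature vector by its business-supplied weight
before the vectors go into clustering:

    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.Vector;

    public class FeatureWeighting {
      // Element-wise scaling of a user's feature vector by the current weights.
      public static Vector applyWeights(Vector features, Vector weights) {
        return features.times(weights);
      }

      public static void main(String[] args) {
        // Hypothetical features: income, age, dining-preference score.
        Vector user = new DenseVector(new double[] {85000.0, 34.0, 0.7});
        // Weights the business folks tweak (values are made up).
        Vector weights = new DenseVector(new double[] {0.2, 0.5, 1.0});
        System.out.println(applyWeights(user, weights));
      }
    }

Mahout's weighted distance measures (e.g. WeightedEuclideanDistanceMeasure)
might be a cleaner place to plug the weights in, since then the stored vectors
don't have to change every time the weights do.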

So, given the above architecture, the async computations need to be able to
judge which algorithm to use based on a set of performance-measurement
criteria. That was the heart of my initial question: have folks built this
sort of framework, and what are some things to think about when building it?
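
For the recommender algorithms in that basket, the offline judging step I have
in mind would look roughly like the sketch below, using Mahout's Taste
evaluator (the candidate similarities, the file name, and the 90/10 train/test
split are arbitrary choices of mine; clustering quality would need its own
measures, but the shape of the harness would be the same):

    import java.io.File;
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.Recommender;

    public class BasketEvaluation {
      public static void main(String[] args) throws Exception {
        DataModel model = new FileDataModel(new File("prefs.csv")); // file name is made up

        // Two candidates from the basket; more would be added the same way.
        RecommenderBuilder loglikelihood = new RecommenderBuilder() {
          public Recommender buildRecommender(DataModel m) throws TasteException {
            return new GenericItemBasedRecommender(m, new LogLikelihoodSimilarity(m));
          }
        };
        RecommenderBuilder tanimoto = new RecommenderBuilder() {
          public Recommender buildRecommender(DataModel m) throws TasteException {
            return new GenericItemBasedRecommender(m, new TanimotoCoefficientSimilarity(m));
          }
        };

        // Train on 90% of each user's preferences, score on the held-out rest;
        // lower average absolute difference is better.
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        double llScore = evaluator.evaluate(loglikelihood, null, model, 0.9, 1.0);
        double taniScore = evaluator.evaluate(tanimoto, null, model, 0.9, 1.0);
        System.out.println("loglikelihood=" + llScore + "  tanimoto=" + taniScore);
        System.out.println("winner: " + (llScore <= taniScore ? "loglikelihood" : "tanimoto"));
      }
    }

The idea would be to run something like this whenever the training data or the
weights change, persist the winner, and have the synchronous layer pick it up.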
Thanks for your feedback



> Date: Tue, 10 Apr 2012 14:33:56 -0500
> Subject: Re: Evalutation of recommenders
> From: sro...@gmail.com
> To: user@mahout.apache.org
> 
> You're talking about recommendations now... are we talking about a
> clustering, classification or recommender system?
> 
> In general I don't know if it makes sense for business users to be
> deciding aspects of the internal model. At most someone should input
> the tradeoffs -- how important is accuracy vs speed? those kinds of
> things. Then it's an optimization problem. But, understood, maybe you
> need to let people explore these things manually at first.
> 
> On Tue, Apr 10, 2012 at 2:21 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
> >
> > The question really is: what are some tried approaches for measuring the
> > quality of a set of algorithms currently being used for
> > clustering/classification?
> >
> > And in thinking about this some more, we also need to be able to regenerate
> > models as soon as the business users tweak the weights associated with
> > features inside a feature vector, and we need to figure out a way to
> > efficiently tie this into our online workflow, which could show updated
> > recommendations every few hours.
> >
> > When I say picking an algorithm on the fly, what I mean is that we need to
> > continuously test our basket of algorithms against a new set of training
> > data and make the determination offline as to which of the algorithms to
> > use at that moment to regenerate our recommendations.
> >> Date: Tue, 10 Apr 2012 14:08:17 -0500
> >> Subject: Re: Evalutation of recommenders
> >> From: sro...@gmail.com
> >> To: user@mahout.apache.org
> >>
> >> Picking an algorithm 'on the fly' is almost surely not realistic --
> >> well, I am not sure what eval process you would run in milliseconds.
> >> But it's also unnecessary; you usually run evaluations offline on
> >> training/test data that reflects real input, and then, the resulting
> >> tuning should be fine for that real input that comes the next day.
> >>
> >> Is that really the question, or are you just asking about how you
> >> measure the quality of clustering or a classifier?
> >>
> >> On Tue, Apr 10, 2012 at 10:41 AM, Saikat Kanjilal <sxk1...@hotmail.com> 
> >> wrote:
> >> >
> >> > Hi everyone,
> >> >
> >> > We're looking at building out some clustering and classification
> >> > algorithms using Mahout, and one of the things we're also looking at
> >> > doing is building performance metrics around each of these algorithms
> >> > as we go down the path of choosing the best model in an iterative,
> >> > closed feedback loop (i.e., our business users manipulate the weights
> >> > for each attribute of our feature vectors -> we use these changes to
> >> > regenerate an asynchronous model with the appropriate
> >> > clustering/classification algorithms, and then replenish our online
> >> > component with this newly recalculated data for fresh recommendations).
> >> > So our end goal is to have a basket of algorithms and use a set of
> >> > performance metrics to pick and choose the right algorithm on the fly.
> >> > I was wondering if anyone has done this type of analysis before and, if
> >> > so, whether there are approaches that have worked well and approaches
> >> > that haven't when it comes to measuring the "quality" of each of the
> >> > recommendation algorithms.
> >> > Regards
> >