Not sure if this is what you are looking for. I assume you are talking about 
Ted’s paper describing a Solr-based recommender pipeline?

Much of the paper was implemented in the solr-recommender referenced below, 
which has a fairly flexible parallel logfile reader that uses Cascading for 
mapreduce. It picks out columns in delimited text files. You choose a constant 
string for your action id, like “purchase” or “thumbs-up”, then specify the 
field index for user, item, and action. It assumes strings for all these inputs 
and creates string-id->Mahout-Integer-id->string-id bidirectional hashmaps as 
the dictionary and reverse dictionary. Everything is scalable except the 
BiHashmaps, which are in-memory. They aren’t usually too big for that. There is 
also a pattern for the input log file names, and matching files are searched 
for recursively from some root directory. 
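
As a rough illustration of the reader described above, here is a minimal 
single-process sketch in Python. It is not code from the solr-recommender and 
does not use Cascading; the names BiDict and read_actions, the column layout, 
and the sample log lines are all made up for the example.

```python
import csv
import io


class BiDict:
    """In-memory two-way mapping between string ids and integer ids."""

    def __init__(self):
        self.to_int = {}   # string id -> integer id (dictionary)
        self.to_str = []   # integer id -> string id (reverse dictionary)

    def get_or_add(self, key):
        # Assign the next free integer the first time a string id is seen.
        if key not in self.to_int:
            self.to_int[key] = len(self.to_str)
            self.to_str.append(key)
        return self.to_int[key]


def read_actions(lines, action_id, user_col, item_col, action_col,
                 delimiter="\t"):
    """Keep rows whose action column equals action_id and map the
    user and item strings to integer ids."""
    users, items = BiDict(), BiDict()
    prefs = []
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) <= max(user_col, item_col, action_col):
            continue  # skip short or malformed rows
        if row[action_col] != action_id:
            continue  # skip other actions
        prefs.append((users.get_or_add(row[user_col]),
                      items.get_or_add(row[item_col])))
    return prefs, users, items


# Hypothetical log: timestamp, user, action, item (tab-delimited).
log = io.StringIO(
    "2014-02-14\tu1\tpurchase\tipad\n"
    "2014-02-14\tu2\tview\tipad\n"
    "2014-02-14\tu1\tpurchase\tnexus\n"
    "2014-02-14\tu2\tpurchase\tipad\n"
)
prefs, users, items = read_actions(log, "purchase",
                                   user_col=1, item_col=3, action_col=2)
```

The integer pairs in prefs are what would be handed to Mahout, and 
items.to_str turns the integer item ids in its output back into the original 
strings.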

Caveat emptor: not all the options are implemented or tested. One person has 
already implemented a scaffolded option and their pull request was merged, so 
feel free to contribute.

It is an example of how to digest logfiles, build Mahout data, and run the 
recommender. It creates Solr indexing data too, but getting recommendations 
out of it is up to you to implement: it is either a Solr query or a lookup in 
the Mahout recommender DRM output.

https://github.com/pferrel/solr-recommender


On Feb 14, 2014, at 12:39 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

Yes!

But it is very hard to find the time.



On Fri, Feb 14, 2014 at 11:51 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> I'd like to see cross-recommendations added too.
> 
> But I also want some automation of the steps required to build a simple
> recommender like the solr/mahout example Ted and Ellen have in their
> pamphlet.
> 
> Lowering the barrier to entry by providing a sample pipeline would help a
> lot of folks get started and hopefully would keep them interested.  Perhaps
> in examples/bin?
> 
> 
> On Fri, Feb 14, 2014 at 10:56 AM, Pat Ferrel <p...@occamsmachete.com>
> wrote:
> 
>> There's been work done on the cross-recommender. There is a Mahout-style
>> XRecommenderJob that has two preference models for two actions or
>> preference types. It uses matrix multiply to get a cooccurrence type
>> similarity matrix. If we had a cross-row-similarity-job, it could pretty
>> easily be integrated and I'd volunteer to integrate it. The XRSJ is
>> probably beyond me right now so if we can scare up someone to do that we'd
>> be a long way down the road.
>> 
>> I'll put a feature request into Jira and take this to the dev list
>> 
>> BTW this is already integrated with the solr-recommender.
>> 
>> On Feb 8, 2014, at 7:19 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> 
>> I have different opinions about each piece.
>> 
>> I think that cross recommendation is as core as RowSimilarityJob and should
>> be a parallel implementation or integrated.  Parallel is probably easier.
>> It is even plausible to have a version of RowSimilarityJob that doesn't
>> support all the different distance measures but does support multiple cross
>> and direct processing using LLR or related cooccurrence based measures. It
>> would be very cool if a single pass over the data could do many kinds of co
>> or cross occurrence operations.
>> 
>> For dithering, it really is post processing.  That said, it is also the
>> single largest improvement that anybody typically gets when testing
>> different options so it is a bit goofy to not have good support for some
>> kinds of dithering.
>> 
>> For Thompson sampled recommenders, I am not sure where to start hacking on
>> our current code.
>> 
>> 
>> 
>> 
>> 
>> 
>> On Sat, Feb 8, 2014 at 4:53 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>> 
>>> That was by no means to criticize effort level, which has been impressive
>>> especially during the release.
>>> 
>>> It was more a question about the best place to add these things and
>>> whether they are important. Whether people see these things as custom post
>>> processing or core.
>>> 
>>> On Feb 8, 2014, at 12:13 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>> 
>>> ...
>>> 
>>> The reason that we aren't adding this like cross-rec and other things is
>>> that "we" have full-time jobs, mostly.  Suneel is full-time on Mahout, but
>>> the rest are not.  You seem more active than most.
>>> 
>>> 
>>> 
>> 
>> 
> 
