Done,

BTW I have the thing running on a demo site but am getting very poor results 
that I think are related to the Solr setup. I’d appreciate any ideas.

The sample data has 27,000 items and something like 4000 users. The preference 
data is fairly dense since the users are professional reviewers and the items 
are videos.

1) The number of item-item similarities that are kept is 100. Is this a good 
starting point? Ted, do you recall how many you used before?
2) The query is a simple text query made of space delimited video id strings. 
These are the same ids as are stored in the item-item similarity docs that Solr 
indexes.

Hit thumbs up on one video and you get several recommendations. Hit thumbs up 
on several videos and you get no recs. I’m either using the wrong query type or 
have it set up to be too restrictive. As I read through the docs, if someone 
has a suggestion or pointer I’d appreciate it.

BTW the same sort of thing happens with Title search. Search for “vampire 
werewolf zombie” and you get no results; search for “zombie” and you get several.
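
One possible cause, stated as an assumption and not confirmed by anything above, is that the multi-term query is being evaluated with AND semantics (through a default operator or a minimum-should-match setting), so a doc must contain every id to match. A minimal sketch of building the query with explicit OR clauses in Lucene query syntax follows. The field name "item_ids", the sample ids, and the class and method names are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OrQueryBuilder {

    // Wrap a term in double quotes, escaping backslashes and quotes, so that
    // characters inside an id are not interpreted as query syntax.
    public static String quote(String term) {
        return "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Join the ids with an explicit OR so a doc matching ANY id can be returned.
    // The field name is supplied by the caller; nothing here talks to Solr.
    public static String buildOrQuery(String field, List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("at least one id is required");
        }
        String clauses = ids.stream()
                .map(OrQueryBuilder::quote)
                .collect(Collectors.joining(" OR "));
        return field + ":(" + clauses + ")";
    }

    public static void main(String[] args) {
        // Hypothetical ids standing in for the liked videos.
        List<String> liked = Arrays.asList("v101", "v202", "v303");
        System.out.println(buildOrQuery("item_ids", liked));
        // prints: item_ids:("v101" OR "v202" OR "v303")
    }
}
```

If the explicit OR form returns results where the space-delimited form does not, that would point at the operator or minimum-match configuration and not at the indexed data.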

On Nov 6, 2013, at 2:18 PM, Sebastian Schelter <s...@apache.org> wrote:

Hi Pat,

can you create issues for 1) and 2) ? Then I will try to get this into
trunk asap.

Best,
Sebastian

On 06.11.2013 19:13, Pat Ferrel wrote:
> Trying to integrate the Solr-recommender with the latest Mahout snapshot. 
> The project uses a modified RecommenderJob because it needs SequenceFile 
> output and to get the location of the preparePreferenceMatrix directory. If 
> #1 and #2 are addressed I can remove the modified Mahout code from the 
> project and rely on the default implementations in Mahout 0.9. #3 is a longer 
> term issue related to the creation of a CrossRowSimilarityJob. 
> 
> I have dropped the modified code from the Solr-recommender project and have a 
> modified build of the current Mahout 0.9 snapshot. If the following changes 
> are made to Mahout I can test and release a Mahout 0.9 version of the 
> Solr-recommender.
> 
> 1. Option to change RecommenderJob output format
> 
> Can someone add an option to output a SequenceFile? I modified the code to do 
> the following. Note the SequenceFileOutputFormat.class as the last parameter, 
> but this should really be determined by an option, I think.
> 
>      Job aggregateAndRecommend = prepareJob(
>              new Path(aggregateAndRecommendInput), outputPath,
>              SequenceFileInputFormat.class,
>              PartialMultiplyMapper.class, VarLongWritable.class,
>              PrefAndSimilarityColumnWritable.class,
>              AggregateAndRecommendReducer.class, VarLongWritable.class,
>              RecommendedItemsWritable.class,
>              SequenceFileOutputFormat.class);
> 
> 2. Visibility of preparePreferenceMatrix directory location
> 
> The Solr-recommender needs to find where the RecommenderJob is putting its 
> output. 
> 
> Mahout 0.8 RecommenderJob code was:
>    public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
> 
> Mahout 0.9 RecommenderJob code just puts “preparePreferenceMatrix” inline in 
> the code:
>    Path prepPath = getTempPath("preparePreferenceMatrix");
> 
> This change to Mahout 0.9 works:
>    public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
> and
>    Path prepPath = getTempPath(DEFAULT_PREPARE_DIR);
> 
> You could also expose this through a getter method on the RecommenderJob 
> class instead of using a public constant.
> 
> 3. Downsampling
> 
> The downsampling for maximum prefs per user has been moved from 
> PreparePreferenceMatrixJob to RowSimilarityJob. The XRecommenderJob uses 
> matrix math instead of RSJ so it will no longer support downsampling until 
> there is a hypothetical CrossRowSimilarityJob with downsampling in it.
> 
> 

