The multivalued field will obey the "positionIncrementGap" value you specify
(default=100).  So for querying purposes, those ids will be 100 (or whatever
you specified) positions apart.  A phrase search for adjacent ids would
therefore not match unless you set the slop to at least positionIncrementGap.
Other than this, both scenarios index the same.
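A minimal plain-Java model of the positions involved. This is not Lucene's indexing code. It assumes each value of the multivalued field holds exactly one id, and it only considers two-term, in-order phrase queries:

```java
import java.util.ArrayList;
import java.util.List;

public class PositionGapDemo {

    // Simplified model: each value of the multivalued field is a single token (one id).
    // The first token gets position 0; each following value is placed after the
    // normal +1 increment plus positionIncrementGap.
    static List<Integer> positions(int numValues, int positionIncrementGap) {
        List<Integer> result = new ArrayList<>();
        int pos = 0;
        for (int i = 0; i < numValues; i++) {
            if (i > 0) {
                pos += 1 + positionIncrementGap;
            }
            result.add(pos);
        }
        return result;
    }

    // Two-term, in-order phrase query "a b": it matches when the extra distance
    // between the terms (0 for adjacent terms) does not exceed the slop.
    static boolean phraseMatches(int posA, int posB, int slop) {
        int extra = (posB - posA) - 1;
        return extra >= 0 && extra <= slop;
    }

    public static void main(String[] args) {
        List<Integer> multivalued = positions(2, 100); // ids as two separate values
        List<Integer> delimited = positions(2, 0);     // ids in one space-delimited string are adjacent
        System.out.println(multivalued); // [0, 101]
        System.out.println(delimited);   // [0, 1]
        System.out.println(phraseMatches(multivalued.get(0), multivalued.get(1), 0));   // false
        System.out.println(phraseMatches(multivalued.get(0), multivalued.get(1), 100)); // true
        System.out.println(phraseMatches(delimited.get(0), delimited.get(1), 0));       // true
    }
}
```

In this model the two ids stored as separate values end up 101 positions apart, so the phrase needs a slop of at least 100 (the gap) to match. The same ids in one space-delimited string are adjacent.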

For stored fields, Solr returns an array of values for multivalued fields,
which is convenient when writing a UI.

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Dominik Hübner [mailto:cont...@dhuebner.com] 
Sent: Thursday, November 07, 2013 11:23 AM
To: user@mahout.apache.org
Subject: Re: Solr-recommender for Mahout 0.9

Does anyone know what the difference is between keeping the ids in a
space-delimited string and indexing a multivalued field of ids? I recently
tried the latter since ... it felt right, but I am not sure what the
advantages of each approach are.

On 07 Nov 2013, at 18:18, Pat Ferrel <pat.fer...@gmail.com> wrote:

> I have dismax (not edismax) but am not using it yet; I'm using the default
> query parser, which does use 'AND'. I had much the same thought as I slept on
> it. Changing to OR is now working much, much better. So obvious it almost bit
> me, not good in this case...
> 
> With only a trivially small amount of testing I'd say we have a new 
> recommender on the block.
> 
> If anyone would like to help eyeball-test the thing, let me know off-list.
> There are a few instructions I'll need to give. And it can't handle much load 
> right now due to intentional design limits.
> 
> 
> On Nov 7, 2013, at 6:11 AM, Dyer, James <james.d...@ingramcontent.com> wrote:
> 
> Pat,
> 
> Can you give us the query it generates when you enter "vampire werewolf
> zombie" (q/qt/defType)?
> 
> My guess is you're using the default query parser with "q.op=AND", or
> you're using dismax/edismax with a high "mm" (minimum should match) value.
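For reference, here is a minimal sketch of the two request variants James describes. It is written as plain Java that only builds the query string and does not talk to Solr. It assumes Java 10+ for the `URLEncoder.encode(String, Charset)` overload and uses a hypothetical "title" field for dismax's `qf`. `q`, `q.op`, `defType`, `qf` and `mm` are standard Solr request parameters:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class QueryParamsDemo {

    // Encode a parameter map as a URL query string, preserving insertion order.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Standard (lucene) parser: make the default operator OR instead of AND.
        Map<String, String> standard = new LinkedHashMap<>();
        standard.put("q", "vampire werewolf zombie");
        standard.put("q.op", "OR");

        // dismax: require only one of the terms to match ("title" is a hypothetical field).
        Map<String, String> dismax = new LinkedHashMap<>();
        dismax.put("q", "vampire werewolf zombie");
        dismax.put("defType", "dismax");
        dismax.put("qf", "title");
        dismax.put("mm", "1");

        System.out.println(toQueryString(standard)); // q=vampire+werewolf+zombie&q.op=OR
        System.out.println(toQueryString(dismax));   // q=vampire+werewolf+zombie&defType=dismax&qf=title&mm=1
    }
}
```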
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311
> 
> 
> -----Original Message-----
> From: Pat Ferrel [mailto:pat.fer...@gmail.com] 
> Sent: Wednesday, November 06, 2013 5:53 PM
> To: s...@apache.org Schelter; user@mahout.apache.org
> Subject: Re: Solr-recommender for Mahout 0.9
> 
> Done,
> 
> BTW I have the thing running on a demo site but am getting very poor results 
> that I think are related to the Solr setup. I'd appreciate any ideas.
> 
> The sample data has 27,000 items and something like 4000 users. The
> preference data is fairly dense since the users are professional reviewers
> and the items are videos.
> 
> 1) The number of item-item similarities that are kept is 100. Is this a good 
> starting point? Ted, do you recall how many you used before?
> 2) The query is a simple text query made of space-delimited video id strings.
> These are the same ids as are stored in the item-item similarity docs that
> Solr indexes.
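Keeping 100 similarities per item (point 1 above) is a per-item top-k selection. Here is a minimal sketch of one standard way to do that, a bounded min-heap. It assumes the similarities for one item arrive as a map from item id to score, and it does not claim to show how Mahout itself does this:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopKSimilarities {

    // Keep only the k highest-scoring items using a min-heap of size k:
    // O(n log k) time and O(k) memory. Returns ids, most similar first.
    static List<String> topK(Map<String, Double> similarities, int k) {
        PriorityQueue<Map.Entry<String, Double>> heap =
                new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        for (Map.Entry<String, Double> e : similarities.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // drop the current weakest similarity
            }
        }
        List<Map.Entry<String, Double>> kept = new ArrayList<>(heap);
        kept.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Double> e : kept) {
            ids.add(e.getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Double> sims = Map.of("v1", 0.9, "v2", 0.1, "v3", 0.5, "v4", 0.7);
        System.out.println(topK(sims, 2)); // [v1, v4]
    }
}
```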
> 
> Hit thumbs up on one video and you get several recommendations. Hit thumbs up
> on several videos and you get no recs. I'm either using the wrong query type or
> have it set up to be too restrictive. I'm reading through the docs, but if
> someone has a suggestion or pointer I'd appreciate it.
> 
> BTW the same sort of thing happens with Title search. Search for "vampire 
> werewolf zombie" you get no results, search for "zombie" you get several.
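The symptom above (one liked video returns recommendations, several return none) is what AND semantics produce. Here is a toy in-memory model of AND vs. OR matching. The "index" is a hypothetical map from item to the ids of items similar to it, and scoring is left out entirely:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class AndVsOrDemo {

    // Return ids of docs matching the query terms.
    // requireAll = true models q.op=AND; false models q.op=OR (any term may match).
    static List<String> search(Map<String, Set<String>> docs, List<String> terms, boolean requireAll) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> doc : docs.entrySet()) {
            long matched = terms.stream().filter(doc.getValue()::contains).count();
            boolean ok = requireAll ? matched == terms.size() : matched > 0;
            if (ok) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical index: item -> ids of similar items (TreeMap for stable output order).
        Map<String, Set<String>> docs = new TreeMap<>();
        docs.put("itemA", Set.of("v1", "v2"));
        docs.put("itemB", Set.of("v2", "v3"));
        docs.put("itemC", Set.of("v3"));

        List<String> oneLike = List.of("v2");
        List<String> threeLikes = List.of("v1", "v2", "v3");
        System.out.println(search(docs, oneLike, true));     // [itemA, itemB]
        System.out.println(search(docs, threeLikes, true));  // []
        System.out.println(search(docs, threeLikes, false)); // [itemA, itemB, itemC]
    }
}
```

With AND, every added id shrinks the result set. With OR, Solr still scores documents that match more of the query ids higher, which is why OR works for this kind of recommendation query.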
> 
> On Nov 6, 2013, at 2:18 PM, Sebastian Schelter <s...@apache.org> wrote:
> 
> Hi Pat,
> 
> can you create issues for 1) and 2)? Then I will try to get this into
> trunk asap.
> 
> Best,
> Sebastian
> 
> On 06.11.2013 19:13, Pat Ferrel wrote:
>> Trying to integrate the Solr-recommender with the latest Mahout snapshot.
>> The project uses a modified RecommenderJob because it needs SequenceFile
>> output and needs to get the location of the preparePreferenceMatrix
>> directory. If #1 and #2 are addressed I can remove the modified Mahout code
>> from the project and rely on the default implementations in Mahout 0.9. #3
>> is a longer-term issue related to the creation of a CrossRowSimilarityJob.
>> 
>> I have dropped the modified code from the Solr-recommender project and have 
>> a modified build of the current Mahout 0.9 snapshot. If the following 
>> changes are made to Mahout I can test and release a Mahout 0.9 version of 
>> the Solr-recommender.
>> 
>> 1. Option to change RecommenderJob output format
>> 
>> Can someone add an option to output a SequenceFile? I modified the code as
>> follows; note SequenceFileOutputFormat.class as the last parameter. I think
>> this should really be determined by an option.
>> 
>>    Job aggregateAndRecommend = prepareJob(
>>            new Path(aggregateAndRecommendInput), outputPath, SequenceFileInputFormat.class,
>>            PartialMultiplyMapper.class, VarLongWritable.class, PrefAndSimilarityColumnWritable.class,
>>            AggregateAndRecommendReducer.class, VarLongWritable.class, RecommendedItemsWritable.class,
>>            // last argument is the output format; changed here to write SequenceFiles
>>            SequenceFileOutputFormat.class);
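Here is a sketch of how such an option could map to the last `prepareJob` argument. The option name (`--outputFormat`) and its values are hypothetical and not part of Mahout. The sketch returns Hadoop class names as strings so it compiles without Hadoop on the classpath; real job code would pass the corresponding `Class` objects. It assumes the current default is text output:

```java
import java.util.Locale;

public class OutputFormatOption {

    // Hypothetical values for a hypothetical "--outputFormat" option on RecommenderJob.
    enum Choice { TEXT, SEQUENCEFILE }

    // Parse the option value; fall back to TEXT (assumed current behavior) when absent.
    static Choice parse(String value) {
        if (value == null || value.isEmpty()) {
            return Choice.TEXT;
        }
        switch (value.toLowerCase(Locale.ROOT)) {
            case "text":
                return Choice.TEXT;
            case "sequencefile":
                return Choice.SEQUENCEFILE;
            default:
                throw new IllegalArgumentException("Unknown output format: " + value);
        }
    }

    // Hadoop (mapreduce API) OutputFormat class that would be passed to prepareJob.
    static String outputFormatClassName(Choice choice) {
        return choice == Choice.SEQUENCEFILE
                ? "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat"
                : "org.apache.hadoop.mapreduce.lib.output.TextOutputFormat";
    }

    public static void main(String[] args) {
        // prints org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
        System.out.println(outputFormatClassName(parse("sequencefile")));
    }
}
```

Defaulting to text keeps existing users of RecommenderJob unaffected, and only callers like the Solr-recommender would opt in to SequenceFile output.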
>> 
>> 2. Visibility of preparePreferenceMatrix directory location
>> 
>> The Solr-recommender needs to find where the RecommenderJob is putting its
>> output.
>> 
>> Mahout 0.8 RecommenderJob code was:
>>  public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
>> 
>> Mahout 0.9 RecommenderJob code just puts "preparePreferenceMatrix" inline in 
>> the code:
>>  Path prepPath = getTempPath("preparePreferenceMatrix");
>> 
>> This change to Mahout 0.9 works:
>>  public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
>> and
>>  Path prepPath = getTempPath(DEFAULT_PREPARE_DIR);
>> 
>> You could also make this a getter method on the RecommenderJob class instead
>> of using a public constant.
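Here is a minimal sketch showing both suggestions side by side: a public constant and a getter built on it. `PrepareDirSketch` and `getPreparePreferenceMatrixPath` are hypothetical names. `java.nio.file.Path` stands in for Hadoop's `Path`, and the resolution against a temp directory is meant to mirror roughly what `getTempPath(DEFAULT_PREPARE_DIR)` does:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for RecommenderJob, exposing the prepare directory
// both as a constant and through a getter.
public class PrepareDirSketch {

    public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";

    private final Path tempPath;

    public PrepareDirSketch(Path tempPath) {
        this.tempPath = tempPath;
    }

    // Resolve the prepare directory under the job's temp directory, so callers
    // never hard-code the "preparePreferenceMatrix" string themselves.
    public Path getPreparePreferenceMatrixPath() {
        return tempPath.resolve(DEFAULT_PREPARE_DIR);
    }

    public static void main(String[] args) {
        PrepareDirSketch job = new PrepareDirSketch(Paths.get("/tmp/mahout-temp")); // hypothetical temp dir
        System.out.println(job.getPreparePreferenceMatrixPath());
    }
}
```

A getter hides how the path is derived, so the job could later change where it puts the directory without breaking callers. A constant only pins down the directory name.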
>> 
>> 3. Downsampling
>> 
>> The downsampling for maximum prefs per user has been moved from
>> PreparePreferenceMatrixJob to RowSimilarityJob. The XRecommenderJob uses
>> matrix math instead of RSJ so it will no longer support downsampling until
>> there is a hypothetical CrossRowSimilarityJob with downsampling in it.
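One common way to cap the number of preferences per user is reservoir sampling (Algorithm R). Here is a minimal sketch with hypothetical names. It shows the general technique only and is not necessarily the sampling scheme that PreparePreferenceMatrixJob or RowSimilarityJob uses:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PrefDownsampler {

    // Reservoir sampling (Algorithm R): keep at most maxPrefs of a user's
    // preferences, each retained with equal probability, in a single pass.
    // Assumes maxPrefs >= 0.
    static <T> List<T> downsample(List<T> prefs, int maxPrefs, Random random) {
        List<T> sample = new ArrayList<>(Math.min(prefs.size(), maxPrefs));
        for (int i = 0; i < prefs.size(); i++) {
            if (i < maxPrefs) {
                sample.add(prefs.get(i));
            } else {
                int j = random.nextInt(i + 1); // uniform in [0, i]
                if (j < maxPrefs) {
                    sample.set(j, prefs.get(i));
                }
            }
        }
        return sample;
    }

    public static void main(String[] args) {
        List<Integer> prefs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            prefs.add(i);
        }
        List<Integer> kept = downsample(prefs, 500, new Random(42));
        System.out.println(kept.size()); // 500
    }
}
```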