The multivalued field will obey the "positionIncrementGap" value you specify (default=100). So for querying purposes, those ids will be 100 (or whatever you specified) positions apart. A phrase search for adjacent ids would therefore not match unless you set the slop to >= positionIncrementGap. Other than this, both scenarios index the same.
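To make the gap concrete, here is a rough sketch in plain Java. It does not use the Lucene or Solr APIs; the class and method names (PositionGapSketch, positions, phraseMatches) are made up for illustration. It assumes a simplified position model: each value holds a single id token, and the first token of every later value is pushed out by the gap. It also assumes that an in-order two-term phrase needs a slop equal to the number of positions between the terms.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of token positions in a multivalued field.
// Hypothetical helper, not Lucene's actual indexing code.
class PositionGapSketch {

    // One single-token value per id; every value after the first is
    // shifted by `gap` extra positions (assumed model).
    static List<Integer> positions(int valueCount, int gap) {
        List<Integer> out = new ArrayList<>();
        int next = 0;
        for (int i = 0; i < valueCount; i++) {
            if (i > 0) {
                next += gap; // the position increment gap between values
            }
            out.add(next);
            next++;
        }
        return out;
    }

    // In-order two-term phrase: matches when the number of positions
    // skipped between the terms does not exceed the slop.
    static boolean phraseMatches(int posA, int posB, int slop) {
        return posB > posA && (posB - posA - 1) <= slop;
    }

    public static void main(String[] args) {
        List<Integer> multi = positions(2, 100); // multivalued field, gap 100
        List<Integer> flat = positions(2, 0);    // space-delimited string

        System.out.println(phraseMatches(flat.get(0), flat.get(1), 0));
        System.out.println(phraseMatches(multi.get(0), multi.get(1), 0));
        System.out.println(phraseMatches(multi.get(0), multi.get(1), 100));
    }
}
```

Under these assumptions, two adjacent ids match as an exact phrase in the space-delimited case, while in the multivalued case they only match once the slop reaches the gap.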
For stored fields, Solr returns an array of values for multivalued fields, which is convenient when writing a UI.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Dominik Hübner [mailto:cont...@dhuebner.com]
Sent: Thursday, November 07, 2013 11:23 AM
To: user@mahout.apache.org
Subject: Re: Solr-recommender for Mahout 0.9

Does anyone know what the difference is between keeping the ids in a space delimited string and indexing a multivalued field of ids? I recently tried the latter since ... it felt right, however I am not sure which of both has which advantages.

On 07 Nov 2013, at 18:18, Pat Ferrel <pat.fer...@gmail.com> wrote:

> I have dismax (no edismax) but am not using it yet, using the default query,
> which does use 'AND'. I had much the same thought as I slept on it. Changing
> to OR is now working much much better. So obvious it almost bit me, not good
> in this case...
>
> With only a trivially small amount of testing I'd say we have a new
> recommender on the block.
>
> If anyone would like to help eyeball test the thing let me know off-list.
> There are a few instructions I'll need to give. And it can't handle much load
> right now due to intentional design limits.
>
>
> On Nov 7, 2013, at 6:11 AM, Dyer, James <james.d...@ingramcontent.com> wrote:
>
> Pat,
>
> Can you give us the query it generates when you enter "vampire werewolf
> zombie", q/qt/defType ?
>
> My guess is you're using the default query parser with "q.op=AND", or
> you're using dismax/edismax with a high "mm" (min-must-match) value.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Pat Ferrel [mailto:pat.fer...@gmail.com]
> Sent: Wednesday, November 06, 2013 5:53 PM
> To: s...@apache.org Schelter; user@mahout.apache.org
> Subject: Re: Solr-recommender for Mahout 0.9
>
> Done,
>
> BTW I have the thing running on a demo site but am getting very poor results
> that I think are related to the Solr setup. I'd appreciate any ideas.
>
> The sample data has 27,000 items and something like 4000 users. The
> preference data is fairly dense since the users are professional reviewers
> and the items videos.
>
> 1) The number of item-item similarities that are kept is 100. Is this a good
> starting point? Ted, do you recall how many you used before?
> 2) The query is a simple text query made of space delimited video id strings.
> These are the same ids as are stored in the item-item similarity docs that
> Solr indexes.
>
> Hit thumbs up on one video you get several recommendations. Hit thumbs up
> on several videos you get no recs. I'm either using the wrong query type or
> have it set up to be too restrictive. As I read through the docs if someone
> has a suggestion or pointer I'd appreciate it.
>
> BTW the same sort of thing happens with Title search. Search for "vampire
> werewolf zombie" you get no results, search for "zombie" you get several.
>
> On Nov 6, 2013, at 2:18 PM, Sebastian Schelter <s...@apache.org> wrote:
>
> Hi Pat,
>
> can you create issues for 1) and 2) ? Then I will try to get this into
> trunk asap.
>
> Best,
> Sebastian
>
> On 06.11.2013 19:13, Pat Ferrel wrote:
>> Trying to integrate the Solr-recommender with the latest Mahout snapshot.
>> The project uses a modified RecommenderJob because it needs SequenceFile
>> output and to get the location of the preparePreferenceMatrix directory. If
>> #1 and #2 are addressed I can remove the modified Mahout code from the
>> project and rely on the default implementations in Mahout 0.9. #3 is a
>> longer term issue related to the creation of a CrossRowSimilarityJob.
>>
>> I have dropped the modified code from the Solr-recommender project and have
>> a modified build of the current Mahout 0.9 snapshot. If the following
>> changes are made to Mahout I can test and release a Mahout 0.9 version of
>> the Solr-recommender.
>>
>> 1. Option to change RecommenderJob output format
>>
>> Can someone add an option to output a SequenceFile. I modified the code to
>> do the following, note the SequenceFileOutputFormat.class as the last
>> parameter but this should really be determined with an option I think.
>>
>> Job aggregateAndRecommend = prepareJob(
>>     new Path(aggregateAndRecommendInput), outputPath,
>>     SequenceFileInputFormat.class,
>>     PartialMultiplyMapper.class, VarLongWritable.class,
>>     PrefAndSimilarityColumnWritable.class,
>>     AggregateAndRecommendReducer.class, VarLongWritable.class,
>>     RecommendedItemsWritable.class,
>>     SequenceFileOutputFormat.class);
>>
>> 2. Visibility of preparePreferenceMatrix directory location
>>
>> The Solr-recommender needs to find where the RecommenderJob is putting its
>> output.
>>
>> Mahout 0.8 RecommenderJob code was:
>> public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
>>
>> Mahout 0.9 RecommenderJob code just puts "preparePreferenceMatrix" inline in
>> the code:
>> Path prepPath = getTempPath("preparePreferenceMatrix");
>>
>> This change to Mahout 0.9 works:
>> public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
>> and
>> Path prepPath = getTempPath(DEFAULT_PREPARE_DIR);
>>
>> You could also make this a getter method on the RecommenderJob class instead
>> of using a public constant.
>>
>> 3. Downsampling
>>
>> The downsampling for maximum prefs per user has been moved from
>> PreparePreferenceMatrixJob to RowSimilarityJob. The XRecommenderJob uses
>> matrix math instead of RSJ so it will no longer support downsampling until
>> there is a hypothetical CrossRowSimilarityJob with downsampling in it.