Pat,

Perhaps I am missing something here, but why not use a String field if you do not need any of the analysis? From your previous email ("The query is a simple text query made of space delimited video id strings") it seems that you basically have a keyword-style query, which would fit better with a String field than with a neutered Text field.
Thanks,
Andrew

On 11/7/13 10:44 AM, "Pat Ferrel" <pat.fer...@gmail.com> wrote:

> One difference is that a "text" field has analyzers like Porter stemming
> applied. I had to take these out of the schema.xml. I think TFIDF is also
> applied to the terms in "text" but may not be to MV fields. I think TFIDF
> is good in the application. The idea is that if everyone likes a movie,
> it isn't much of a differentiator. Also changing to MV fields is simply
> applying a different type to the field in the schema I think, so trivial
> to try out.
>
> At this point the only test is an eyeball test so measuring differences
> is problematic. If anyone has intuition fire away.
>
> On Nov 7, 2013, at 9:23 AM, Dominik Hübner <cont...@dhuebner.com> wrote:
>
> Does anyone know what the difference is between keeping the ids in a
> space delimited string and indexing a multivalued field of ids? I
> recently tried the latter since ... it felt right, however I am not sure
> which of both has which advantages.
>
> On 07 Nov 2013, at 18:18, Pat Ferrel <pat.fer...@gmail.com> wrote:
>
>> I have dismax (no edismax) but am not using it yet, using the default
>> query, which does use 'AND'. I had much the same thought as I slept on
>> it. Changing to OR is now working much much better. So obvious it almost
>> bit me, not good in this case...
>>
>> With only a trivially small amount of testing I'd say we have a new
>> recommender on the block.
>>
>> If anyone would like to help eyeball test the thing let me know
>> off-list. There are a few instructions I'll need to give. And it can't
>> handle much load right now due to intentional design limits.
>>
>>
>> On Nov 7, 2013, at 6:11 AM, Dyer, James <james.d...@ingramcontent.com>
>> wrote:
>>
>> Pat,
>>
>> Can you give us the query it generates when you enter "vampire werewolf
>> zombie", q/qt/defType ?
>>
>> My guess is you're using the default query parser with "q.op=AND", or
>> you're using dismax/edismax with a high "mm" (min-must-match) value.
>>
>> James Dyer
>> Ingram Content Group
>> (615) 213-4311
>>
>>
>> -----Original Message-----
>> From: Pat Ferrel [mailto:pat.fer...@gmail.com]
>> Sent: Wednesday, November 06, 2013 5:53 PM
>> To: s...@apache.org Schelter; user@mahout.apache.org
>> Subject: Re: Solr-recommender for Mahout 0.9
>>
>> Done,
>>
>> BTW I have the thing running on a demo site but am getting very poor
>> results that I think are related to the Solr setup. I'd appreciate any
>> ideas.
>>
>> The sample data has 27,000 items and something like 4000 users. The
>> preference data is fairly dense since the users are professional
>> reviewers and the items videos.
>>
>> 1) The number of item-item similarities that are kept is 100. Is this a
>> good starting point? Ted, do you recall how many you used before?
>> 2) The query is a simple text query made of space delimited video id
>> strings. These are the same ids as are stored in the item-item
>> similarity docs that Solr indexes.
>>
>> Hit thumbs up on one video and you get several recommendations. Hit
>> thumbs up on several videos and you get no recs. I'm either using the
>> wrong query type or have it set up to be too restrictive. As I read
>> through the docs, if someone has a suggestion or pointer I'd appreciate
>> it.
>>
>> BTW the same sort of thing happens with Title search. Search for
>> "vampire werewolf zombie" and you get no results; search for "zombie"
>> and you get several.
>>
>> On Nov 6, 2013, at 2:18 PM, Sebastian Schelter <s...@apache.org> wrote:
>>
>> Hi Pat,
>>
>> can you create issues for 1) and 2) ? Then I will try to get this into
>> trunk asap.
>>
>> Best,
>> Sebastian
>>
>> On 06.11.2013 19:13, Pat Ferrel wrote:
>>> Trying to integrate the Solr-recommender with the latest Mahout
>>> snapshot. The project uses a modified RecommenderJob because it needs
>>> SequenceFile output and to get the location of the
>>> preparePreferenceMatrix directory.
>>> If #1 and #2 are addressed I can remove the modified Mahout code from
>>> the project and rely on the default implementations in Mahout 0.9. #3
>>> is a longer term issue related to the creation of a
>>> CrossRowSimilarityJob.
>>>
>>> I have dropped the modified code from the Solr-recommender project and
>>> have a modified build of the current Mahout 0.9 snapshot. If the
>>> following changes are made to Mahout I can test and release a Mahout
>>> 0.9 version of the Solr-recommender.
>>>
>>> 1. Option to change RecommenderJob output format
>>>
>>> Can someone add an option to output a SequenceFile? I modified the
>>> code to do the following. Note the SequenceFileOutputFormat.class as
>>> the last parameter, but this should really be determined with an option
>>> I think.
>>>
>>> Job aggregateAndRecommend = prepareJob(
>>>     new Path(aggregateAndRecommendInput), outputPath,
>>>     SequenceFileInputFormat.class,
>>>     PartialMultiplyMapper.class, VarLongWritable.class,
>>>     PrefAndSimilarityColumnWritable.class,
>>>     AggregateAndRecommendReducer.class, VarLongWritable.class,
>>>     RecommendedItemsWritable.class,
>>>     SequenceFileOutputFormat.class);
>>>
>>> 2. Visibility of preparePreferenceMatrix directory location
>>>
>>> The Solr-recommender needs to find where the RecommenderJob is putting
>>> its output.
>>>
>>> Mahout 0.8 RecommenderJob code was:
>>>     public static final String DEFAULT_PREPARE_DIR =
>>>         "preparePreferenceMatrix";
>>>
>>> Mahout 0.9 RecommenderJob code just puts "preparePreferenceMatrix"
>>> inline in the code:
>>>     Path prepPath = getTempPath("preparePreferenceMatrix");
>>>
>>> This change to Mahout 0.9 works:
>>>     public static final String DEFAULT_PREPARE_DIR =
>>>         "preparePreferenceMatrix";
>>> and
>>>     Path prepPath = getTempPath(DEFAULT_PREPARE_DIR);
>>>
>>> You could also make this a getter method on the RecommenderJob class
>>> instead of using a public constant.
>>>
>>> 3. Downsampling
>>>
>>> The downsampling for maximum prefs per user has been moved from
>>> PreparePreferenceMatrixJob to RowSimilarityJob. The XRecommenderJob
>>> uses matrix math instead of RSJ, so it will no longer support
>>> downsampling until there is a hypothetical CrossRowSimilarityJob with
>>> downsampling in it.
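
For anyone reading the thread later, here is a minimal Java sketch of the AND versus OR behavior reported above (one liked video returns recommendations, several return none). It is a toy in-memory model and not Solr code: the class `QueryOperatorSketch`, the `INDEX` map, the `search` method, and the ids `a`, `b`, `c`, `v1`..`v4` are all hypothetical names made up for illustration. Each "document" stands in for an item whose indexed field holds the ids of its similar items, and the query is a space delimited string of ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy illustration only; this does not use or imitate any Solr API. */
class QueryOperatorSketch {

    // Hypothetical index: item id -> ids of the items it is similar to.
    static final Map<String, Set<String>> INDEX = new LinkedHashMap<>();
    static {
        INDEX.put("a", new HashSet<>(Arrays.asList("v1", "v2")));
        INDEX.put("b", new HashSet<>(Arrays.asList("v2", "v3")));
        INDEX.put("c", new HashSet<>(Arrays.asList("v3", "v4")));
    }

    /**
     * Returns the ids of documents matching a space delimited query.
     * requireAll = true models AND (every term must match);
     * requireAll = false models OR (any single term is enough).
     */
    static List<String> search(String query, boolean requireAll) {
        String[] terms = query.trim().split("\\s+");
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> doc : INDEX.entrySet()) {
            int matched = 0;
            for (String term : terms) {
                if (doc.getValue().contains(term)) {
                    matched++;
                }
            }
            boolean isHit = requireAll ? matched == terms.length : matched > 0;
            if (isHit) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // One liked video: AND and OR agree.
        System.out.println(search("v2", true));        // [a, b]
        // Several liked videos under AND: no document lists all of them.
        System.out.println(search("v1 v3 v4", true));  // []
        // The same query under OR: every document sharing any id matches.
        System.out.println(search("v1 v3 v4", false)); // [a, b, c]
    }
}
```

The sketch ignores scoring entirely. In the real setup the ranking of the OR matches would presumably come from Solr's relevance scoring, such as the TFIDF weighting mentioned earlier in the thread.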