say &debugQuery=true&mlt=true and look at the scores for the MLT query itself, not a sample query. You can use Amazon EC2 to bring up your Solr; you should be able to get a micro instance on the free trial.
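
In case it helps, here is a rough sketch of building such a request in Python. The host, core name, and document id are taken from the URLs quoted below; the helper name `build_mlt_debug_url` is only illustrative, and nothing here actually contacts Solr.

```python
from urllib.parse import urlencode


def build_mlt_debug_url(base_url, doc_query, mlt_fields):
    """Return a select URL that runs MLT and asks Solr to explain the scores.

    build_mlt_debug_url is a hypothetical helper, not part of any Solr client.
    """
    params = {
        "q": doc_query,          # the source document, e.g. id:2
        "mlt": "true",           # turn on the MoreLikeThis component
        "mlt.fl": mlt_fields,    # field(s) MLT should draw terms from
        "debugQuery": "true",    # include the explain output in the response
    }
    return base_url + "?" + urlencode(params)


url = build_mlt_debug_url("http://localhost:8983/solr/cars/select", "id:2", "simi")
print(url)
```

Open the printed URL in a browser and compare the explain output for the top few MLT hits.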

On Mon, Apr 1, 2013 at 5:10 AM, dc tech <dctech1...@gmail.com> wrote:

> I did try the raw query against the *simi* field and those seem to return
> results in the order expected.
> For instance, Acura MDX has (large, SUV, 4WD, Luxury) in the simi field.
> Running a query with those words against the simi field returns the
> expected models (X5, Audi Q5, etc.) and then the subsequent documents have
> decreasing relevance. So the basic query mechanism seems to be fine.
>
> The issue just seems to be with the MoreLikeThis component and handler.
> I can post the index on a public SOLR instance - any suggestions (or for
> hosting)?
>
> On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh <gagan.g...@gmail.com> wrote:
>
> > If you can bring up your Solr setup on a public machine then I'm sure a
> > lot of debugging can be done. Without that, I think what you should look
> > at is the tf-idf scores of terms like "camry" etc. Usually idf is the
> > deciding factor in which results show at the top (tf should be 1 for your
> > data).
> > Enable &debugQuery=true and look at the explain section to see how the
> > score is calculated.
> >
> > You should try giving different boosts to class, type, drive, size to
> > control the results.
> >
> > On Sun, Mar 31, 2013 at 8:52 PM, dc tech <dctech1...@gmail.com> wrote:
> >
> >> I am running some experiments on more like this and the results seem
> >> rather odd - I am doing something wrong but just cannot figure out what.
> >> Basically, the similarity results are decent - but not great.
> >>
> >> *Issue 1 = Quality*
> >> Toyota Camry: finds Altima (good) but then the next one is Camry Hybrid
> >> whereas it should have found Accord.
> >> I have normalized the data into a simi field which has only the
> >> attributes that I care about.
> >> Without the simi field, I could not get mlt.qf boosts to work well
> >> enough to return results.
> >>
> >> *Issue 2*
> >> Some fields do not work at all.
> >> For instance, text+simi (in mlt.fl) works whereas just simi does not.
> >> So there is some weirdness that I am just not understanding.
> >>
> >> Would be grateful for your guidance!
> >>
> >> Here is the setup:
> >>
> >> *1. SOLR Version*
> >> solr-spec 4.2.0.2013.03.06.22.32.13
> >> solr-impl 4.2.0 1453694 rmuir - 2013-03-06 22:32:13
> >> lucene-spec 4.2.0
> >> lucene-impl 4.2.0 1453694 - rmuir - 2013-03-06 22:25:29
> >>
> >> *2. Machine Information*
> >> Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23 19.0-b09)
> >> Windows 7 Home 64 Bit with 4 GB RAM
> >>
> >> *3. Sample Data*
> >> I created this 'dummy' data of cars - the idea being that these would be
> >> sufficient and simple to generate similarity and understand how it would
> >> work.
> >> There are 181 rows in the data set (I have attached it for reference in
> >> CSV format)
> >>
> >> [image: Inline image 1]
> >>
> >> *4. SCHEMA*
> >> *Field Definitions*
> >> <field name="id" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >> <field name="make" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >> <field name="model" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >> <field name="class" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >> <field name="type" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >> <field name="drive" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >> <field name="comment" type="text_general" indexed="true" stored="true" termVectors="true" multiValued="true"/>
> >> <field name="size" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >>
> >> *Copy Fields*
> >> <copyField source="make" dest="make_en" /> <!-- Search -->
> >> <copyField source="model" dest="model_en" /> <!-- Search -->
> >> <copyField source="class" dest="class_en" /> <!-- Search -->
> >> <copyField source="type" dest="type_en" /> <!-- Search -->
> >> <copyField source="drive" dest="drive_en" /> <!-- Search -->
> >> <copyField source="comment" dest="comment_en" /> <!-- Search -->
> >> <copyField source="size" dest="size_en" /> <!-- Search -->
> >> <copyField source="id" dest="text" /> <!-- Glob -->
> >> <copyField source="make" dest="text" /> <!-- Glob -->
> >> <copyField source="model" dest="text" /> <!-- Glob -->
> >> <copyField source="class" dest="text" /> <!-- Glob -->
> >> <copyField source="type" dest="text" /> <!-- Glob -->
> >> <copyField source="drive" dest="text" /> <!-- Glob -->
> >> <copyField source="comment" dest="text" /> <!-- Glob -->
> >> <copyField source="size" dest="text" /> <!-- Glob -->
> >> <copyField source="size" dest="text" /> <!-- Glob -->
> >> <copyField source="class" dest="simi_en" /> <!-- similarity -->
> >> <copyField source="type" dest="simi_en" /> <!-- similarity -->
> >> <copyField source="drive" dest="simi_en" /> <!-- similarity -->
> >> <copyField source="size" dest="simi_en" /> <!-- similarity -->
> >>
> >> Note that the "simi" field ends up with values like make, class, size
> >> and drive:
> >> - Luxury SUV 4WD Large
> >> - Standard Sedan Front Family
> >>
> >> *5. MLT Setup*
> >> a. mlt.FL = *text* QF=*text* Works but results are obviously not good
> >> (make is not a good similarity indicator)
> >>
> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=text&mlt.qf=text
> >>
> >> b. mlt.FL = *simi* QF=*simi* Does not work at all (0 results)
> >>
> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi&mlt.qf=simi
> >>
> >> c. mlt.FL = *simi,text* QF=*simi^10 text^.1* Works with decent
> >> results in most cases
> >>
> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi,text&mlt.qf=simi^10%20text^.01
> >>
> >> Works for getting similarity for Acura MDX (Luxury SUV 4WD Large).
> >> But for Toyota Camry - it finds hybrid family cars (Prius) ahead of Honda.
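
On the idf point quoted above, a small sketch (not from the thread) of why idf ends up deciding the order when every term has tf = 1. It assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)). Only the 181-row count comes from the message; the document frequencies and the function name `classic_idf` are made up for illustration.

```python
import math


def classic_idf(num_docs, doc_freq):
    """idf under the classic Lucene TF-IDF formula (assumed here)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 181  # rows in the sample data set

# Hypothetical document frequencies, NOT taken from the real index.
doc_freqs = {"sedan": 60, "luxury": 25, "hybrid": 8}

# With tf fixed at 1, the rarer a term is, the more a match on it is worth.
for term, df in sorted(doc_freqs.items(), key=lambda kv: kv[1]):
    print(term, df, round(classic_idf(NUM_DOCS, df), 3))
```

Under these assumed counts a rare term such as "hybrid" outweighs a common one such as "sedan", which is one way a hybrid could rank ahead of a closer match; the explain output from debugQuery shows the real numbers.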