OK - so I have my SOLR instance running on AWS. Any suggestions on how to safely share the link? Right now, the whole SOLR instance is totally open.
Gagandeep singh <gagan.g...@gmail.com> wrote:
>Say &debugQuery=true&mlt=true and see the scores for the MLT query, not a
>sample query. You can use Amazon EC2 to bring up your Solr; you should be
>able to get a micro instance on the free trial.
>
>
>On Mon, Apr 1, 2013 at 5:10 AM, dc tech <dctech1...@gmail.com> wrote:
>
>> I did try the raw query against the *simi* field and those seem to return
>> results in the order expected.
>> For instance, Acura MDX has (large, SUV, 4WD, Luxury) in the simi field.
>> Running a query with those words against the simi field returns the
>> expected models (X5, Audi Q5, etc.) and then the subsequent documents have
>> decreasing relevance. So the basic query mechanism seems to be fine.
>>
>> The issue just seems to be with the MoreLikeThis component and handler.
>> I can post the index on a public SOLR instance - any suggestions? (or for
>> hosting)
>>
>>
>> On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh <gagan.g...@gmail.com> wrote:
>>
>> > If you can bring up your Solr setup on a public machine then I'm sure a
>> > lot of debugging can be done. Without that, I think what you should look
>> > at is the tf-idf scores of terms like "camry" etc. Usually idf is the
>> > deciding factor in which results show at the top (tf should be 1 for
>> > your data).
>> > Enable &debugQuery=true and look at the explain section to see how the
>> > score is getting calculated.
>> >
>> > You should try giving different boosts to class, type, drive, size to
>> > control the results.
>> >
>> >
>> > On Sun, Mar 31, 2013 at 8:52 PM, dc tech <dctech1...@gmail.com> wrote:
>> >
>> >> I am running some experiments on MoreLikeThis and the results seem
>> >> rather odd - I am doing something wrong but just cannot figure out what.
>> >> Basically, the similarity results are decent - but not great.
>> >>
>> >> *Issue 1 = Quality*
>> >> Toyota Camry: finds Altima (good) but then the next one is Camry Hybrid
>> >> whereas it should have found Accord.
>> >> I have normalized the data into a simi field which has only the
>> >> attributes that I care about.
>> >> Without the simi field, I could not get mlt.qf boosts to work well
>> >> enough to return results.
>> >>
>> >> *Issue 2*
>> >> Some fields do not work at all. For instance, text+simi (in mlt.fl)
>> >> works whereas just simi does not.
>> >> So there is some weirdness that I am just not understanding.
>> >>
>> >> Would be grateful for your guidance!
>> >>
>> >>
>> >> Here is the setup:
>> >> *1. SOLR Version*
>> >> solr-spec 4.2.0.2013.03.06.22.32.13
>> >> solr-impl 4.2.0 1453694 - rmuir - 2013-03-06 22:32:13
>> >> lucene-spec 4.2.0
>> >> lucene-impl 4.2.0 1453694 - rmuir - 2013-03-06 22:25:29
>> >>
>> >> *2. Machine Information*
>> >> Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23 19.0-b09)
>> >> Windows 7 Home 64 Bit with 4 GB RAM
>> >>
>> >> *3. Sample Data*
>> >> I created this 'dummy' data of cars - the idea being that it would be
>> >> sufficient and simple enough to generate similarity and understand how
>> >> it would work.
>> >> There are 181 rows in the data set (I have attached it for reference in
>> >> CSV format)
>> >>
>> >> [image: Inline image 1]
>> >>
>> >> *4. SCHEMA*
>> >> *Field Definitions*
>> >> <field name="id" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >> <field name="make" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >> <field name="model" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >> <field name="class" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >> <field name="type" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >> <field name="drive" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >> <field name="comment" type="text_general" indexed="true" stored="true" termVectors="true" multiValued="true"/>
>> >> <field name="size" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >>
>> >> *Copy Fields*
>> >> <copyField source="make" dest="make_en" /> <!-- Search -->
>> >> <copyField source="model" dest="model_en" /> <!-- Search -->
>> >> <copyField source="class" dest="class_en" /> <!-- Search -->
>> >> <copyField source="type" dest="type_en" /> <!-- Search -->
>> >> <copyField source="drive" dest="drive_en" /> <!-- Search -->
>> >> <copyField source="comment" dest="comment_en" /> <!-- Search -->
>> >> <copyField source="size" dest="size_en" /> <!-- Search -->
>> >> <copyField source="id" dest="text" /> <!-- Glob -->
>> >> <copyField source="make" dest="text" /> <!-- Glob -->
>> >> <copyField source="model" dest="text" /> <!-- Glob -->
>> >> <copyField source="class" dest="text" /> <!-- Glob -->
>> >> <copyField source="type" dest="text" /> <!-- Glob -->
>> >> <copyField source="drive" dest="text" /> <!-- Glob -->
>> >> <copyField source="comment" dest="text" /> <!-- Glob -->
>> >> <copyField source="size" dest="text" /> <!-- Glob -->
>> >> <copyField source="size" dest="text" /> <!-- Glob -->
>> >> *<copyField source="class" dest="simi_en" /> <!-- similarity -->*
>> >> *<copyField source="type" dest="simi_en" /> <!-- similarity -->*
>> >> *<copyField source="drive" dest="simi_en" /> <!-- similarity -->*
>> >> *<copyField source="size" dest="simi_en" /> <!-- similarity -->*
>> >>
>> >> Note that the "simi" field ends up with values like make, class, size
>> >> and drive:
>> >> - Luxury SUV 4WD Large
>> >> - Standard Sedan Front Family
>> >>
>> >>
>> >> *5. MLT Setup*
>> >> a. mlt.fl = *text*, mlt.qf = *text*: works, but results are obviously
>> >> not good (make is not a good similarity indicator)
>> >>
>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=text&mlt.qf=text
>> >>
>> >> b. mlt.fl = *simi*, mlt.qf = *simi*: does not work at all (0 results)
>> >>
>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi&mlt.qf=simi
>> >>
>> >> c. mlt.fl = *simi,text*, mlt.qf = *simi^10 text^.1*: works with decent
>> >> results in most cases
>> >>
>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi,text&mlt.qf=simi^10%20text^.01
>> >>
>> >> Works for getting similarity for Acura MDX (Luxury SUV 4WD Large).
>> >> But for Toyota Camry - it finds hybrid family cars (Prius) ahead of
>> >> Honda.
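
For anyone wanting to reproduce the comparison, the three MLT requests quoted above, with the &debugQuery=true flag suggested in the reply, could be generated along these lines. This is only a sketch: build_mlt_url and VARIANTS are made-up names for illustration, the host, core and field names are copied from the thread, and the mlt.qf value for case c follows the URL as posted (text^.01) rather than the label (text^.1). Note also that the schema excerpt lists copyField destinations named simi_en while the requests use simi; which name the index actually has is not clear from the thread, so the field list is left as a parameter.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Select URL as posted in the thread (local Solr, "cars" core).
BASE_URL = "http://localhost:8983/solr/cars/select/"


def build_mlt_url(doc_id, mlt_fl, mlt_qf, debug=True, base_url=BASE_URL):
    """Build a MoreLikeThis request URL for one seed document.

    Hypothetical helper for illustration; not part of Solr or any client.
    """
    params = [
        ("q", f"id:{doc_id}"),   # seed document, as in the thread
        ("mlt", "true"),         # turn on the MoreLikeThis component
        ("fl", "text"),          # returned fields, as in the thread
        ("mlt.fl", mlt_fl),      # fields used to compute similarity
        ("mlt.qf", mlt_qf),      # per-field boosts
    ]
    if debug:
        # Flag suggested in the reply, to inspect how scores are computed.
        params.append(("debugQuery", "true"))
    # urlencode percent-escapes ':', ',', '^' and encodes spaces as '+'.
    return f"{base_url}?{urlencode(params)}"


# The three setups from section 5 of the original post: (mlt.fl, mlt.qf).
VARIANTS = {
    "a": ("text", "text"),
    "b": ("simi", "simi"),
    "c": ("simi,text", "simi^10 text^.01"),  # boost values as in the posted URL
}

if __name__ == "__main__":
    for label, (fl, qf) in VARIANTS.items():
        url = build_mlt_url(2, fl, qf)
        # Decode again to confirm the parameters survive escaping unchanged.
        decoded = parse_qs(urlsplit(url).query)
        print(label, url)
        print("   mlt.fl =", decoded["mlt.fl"][0], "| mlt.qf =", decoded["mlt.qf"][0])
```

Swapping "simi" for "simi_en" in VARIANTS is a one-line change, which makes it easy to check whether case b returns 0 results only because of the field name.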