Isn't this an AWS security groups question? You should probably post this question on the AWS forums, but for the moment, here's the basic reading material - go set up your EC2 security groups and lock down your systems.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html

If you just want to password protect Solr, here are the instructions:
http://wiki.apache.org/solr/SolrSecurity

But I most certainly would not leave it open to the world even with a password (note that basic password authentication sends passwords in clear text if you're not using HTTPS; best to lock the thing down behind a firewall).

Dave

-----Original Message-----
From: DC tech [mailto:dctech1...@gmail.com]
Sent: Tuesday, April 02, 2013 1:02 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis - Odd results - what am I doing wrong?

OK - so I have my SOLR instance running on AWS.
Any suggestions on how to safely share the link? Right now, the whole SOLR instance is totally open.

Gagandeep singh <gagan.g...@gmail.com> wrote:

>say &debugQuery=true&mlt=true and see the scores for the MLT query, not
>a sample query. You can use Amazon ec2 to bring up your solr, you
>should be able to get a micro instance for free trial.
>
>
>On Mon, Apr 1, 2013 at 5:10 AM, dc tech <dctech1...@gmail.com> wrote:
>
>> I did try the raw query against the *simi* field and those seem to
>> return results in the order expected.
>> For instance, Acura MDX has ( large, SUV, 4WD Luxury) in the simi field.
>> Running a query with those words against the simi field returns the
>> expected models (X5, Audi Q5, etc) and then the subsequent documents
>> have decreasing relevance. So the basic query mechanism seems to be fine.
>>
>> The issue just seems to be with MoreLikeThis component and handler.
>> I can post the index on a public SOLR instance - any suggestions? (or
>> for hosting)
>>
>>
>> On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh <gagan.g...@gmail.com> wrote:
>>
>> > If you can bring up your solr setup on a public machine then I'm
>> > sure a lot of debugging can be done. Without that, I think what you
>> > should look at is the tf-idf scores of the terms like "camry" etc.
>> > Usually idf is the
>> > deciding factor in which results show at the top (tf should be 1
>> > for your data).
>> > Enable &debugQuery=true and look at the explain section to see how
>> > the score is getting calculated.
>> >
>> > You should try giving different boosts to class, type, drive, size
>> > to control the results.
>> >
>> >
>> > On Sun, Mar 31, 2013 at 8:52 PM, dc tech <dctech1...@gmail.com> wrote:
>> >
>> >> I am running some experiments on more like this and the results
>> >> seem rather odd - I am doing something wrong but just cannot figure
>> >> out what.
>> >> Basically, the similarity results are decent - but not great.
>> >>
>> >> *Issue 1 = Quality*
>> >> Toyota Camry : finds Altima (good) but then next one is Camry
>> >> Hybrid whereas it should have found Accord.
>> >> I have normalized the data into a simi field which has only the
>> >> attributes that I care about.
>> >> Without the simi field, I could not get mlt.qf boosts to work well
>> >> enough to return results
>> >>
>> >> *Issue 2*
>> >> Some fields do not work at all. For instance, text+simi (in
>> >> mlt.fl) works whereas just simi does not.
>> >> So some weirdness that I am just not understanding.
>> >>
>> >> Would be grateful for your guidance !
>> >>
>> >>
>> >> Here is the setup:
>> >> *1. SOLR Version*
>> >> solr-spec 4.2.0.2013.03.06.22.32.13
>> >> solr-impl 4.2.0 1453694 rmuir - 2013-03-06 22:32:13
>> >> lucene-spec 4.2.0
>> >> lucene-impl 4.2.0 1453694 - rmuir - 2013-03-06 22:25:29
>> >>
>> >> *2. Machine Information*
>> >> Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23
>> >> 19.0-b09)
>> >> Windows 7 Home 64 Bit with 4 GB RAM
>> >>
>> >> *3. Sample Data *
>> >> I created this 'dummy' data of cars - the idea being that these
>> >> would be sufficient and simple to generate similarity and understand
>> >> how it would work.
>> >> There are 181 rows in the data set (I have attached it for
>> >> reference in CSV format)
>> >>
>> >> [image: Inline image 1]
>> >>
>> >> *4. SCHEMA*
>> >> *Field Definitions*
>> >> <field name="id" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >> <field name="make" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >> <field name="model" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >> <field name="class" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >> <field name="type" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >> <field name="drive" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >> <field name="comment" type="text_general" indexed="true" stored="true" termVectors="true" multiValued="true"/>
>> >> <field name="size" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >>
>> >> *Copy Fields*
>> >> <copyField source="make" dest="make_en" /> <!-- Search -->
>> >> <copyField source="model" dest="model_en" /> <!-- Search -->
>> >> <copyField source="class" dest="class_en" /> <!-- Search -->
>> >> <copyField source="type" dest="type_en" /> <!-- Search -->
>> >> <copyField source="drive" dest="drive_en" /> <!-- Search -->
>> >> <copyField source="comment" dest="comment_en" /> <!-- Search -->
>> >> <copyField source="size" dest="size_en" /> <!-- Search -->
>> >> <copyField source="id" dest="text" /> <!-- Glob -->
>> >> <copyField source="make" dest="text" /> <!-- Glob -->
>> >> <copyField source="model" dest="text" /> <!-- Glob -->
>> >> <copyField source="class" dest="text" /> <!-- Glob -->
>> >> <copyField source="type" dest="text" /> <!-- Glob -->
>> >> <copyField source="drive" dest="text" /> <!-- Glob -->
>> >> <copyField source="comment" dest="text" /> <!-- Glob -->
>> >> <copyField source="size" dest="text" /> <!-- Glob -->
>> >> <copyField source="size" dest="text" /> <!-- Glob -->
>> >> *<copyField source="class" dest="simi_en" /> <!-- similarity -->*
>> >> *<copyField source="type" dest="simi_en" /> <!-- similarity -->*
>> >> *<copyField source="drive" dest="simi_en" /> <!-- similarity -->*
>> >> *<copyField source="size" dest="simi_en" /> <!-- similarity -->*
>> >>
>> >> Note that the "simi" field ends up with values like make, class,
>> >> size and drive:
>> >> - Luxury SUV 4WD Large
>> >> - Standard Sedan Front Family
>> >>
>> >>
>> >> *5. MLT Setup*
>> >> a. mlt.FL = *text* QF=*text* Works but results are obviously not
>> >> good (make is not a good similarity indicator)
>> >>
>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=text&mlt.qf=text
>> >>
>> >> b. mlt.FL = *simi* QF=*simi* Does not work at all (0 results)
>> >>
>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi&mlt.qf=simi
>> >>
>> >> c. mlt.FL = *simi,text * QF=*simi^10 text^.1* Works with decent
>> >> results in most cases
>> >>
>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi,text&mlt.qf=simi^10%20text^.01
>> >> Works for getting similarity for Acura MDX (Luxury SUV 4WD Large)
>> >> But for Toyota Camry - it finds hybrid family cars (Prius) ahead
>> >> of Honda.
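
For anyone replaying the three MLT requests from section 5 with the debugQuery=true flag suggested earlier in the thread, here is a minimal Python sketch that builds the URLs with proper escaping. The host, core name (cars), document id, and parameter values are copied from the URLs in the thread; the helper name build_mlt_url is hypothetical and not part of Solr, and nothing here has been run against the poster's index.

```python
from urllib.parse import urlencode

# Assumption: base URL and core name are taken from the URLs quoted in the thread.
BASE_URL = "http://localhost:8983/solr/cars/select/"


def build_mlt_url(doc_id, mlt_fl, mlt_qf, debug=True):
    """Build a MoreLikeThis request URL (hypothetical helper, not a Solr API)."""
    params = [
        ("q", f"id:{doc_id}"),
        ("mlt", "true"),
        ("fl", "text"),
        ("mlt.fl", mlt_fl),
        ("mlt.qf", mlt_qf),
    ]
    if debug:
        # Adds the score explanation to the response, as suggested in the thread.
        params.append(("debugQuery", "true"))
    # urlencode escapes spaces, commas, colons and carets in the values.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # The three setups (a), (b), (c) described in section 5.
    setups = [
        ("text", "text"),
        ("simi", "simi"),
        ("simi,text", "simi^10 text^.01"),
    ]
    for fl, qf in setups:
        print(build_mlt_url(2, fl, qf))
```

Building the query string with urlencode avoids the hand-escaped %20 and the line-wrapped URLs that are easy to break when pasting into mail.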