Yes, I understand that reindexing is necessary. However, for some reason I was not able to invoke the JS script from the update processor, so I ended up using a Java-only solution at index time.
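For illustration only, a Java-only word count computed in the indexing code could look roughly like the sketch below. The class name WordCounter and the method countWords are hypothetical, and the rule of skipping tokens that contain no letter or digit is an assumption, chosen so that the two example titles from this thread give 11 and 9.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Solr: computes a wordCount value in the
// indexing code before the document is sent to Solr.
class WordCounter {

    // Splits on whitespace and counts only tokens that contain at least one
    // letter or digit, so a standalone "-" is not counted as a word.
    // (Assumption: this is how the wordCount values in the example were derived.)
    static int countWords(String text) {
        if (text == null || text.trim().isEmpty()) {
            return 0;
        }
        return (int) Arrays.stream(text.trim().split("\\s+"))
                .filter(token -> token.chars().anyMatch(Character::isLetterOrDigit))
                .count();
    }

    public static void main(String[] args) {
        String doc1 = "Details about Apple iPhone 4s - 16GB - White (Verizon) "
                + "Smartphone Factory Unlocked";
        String doc2 = "Apple iPhone 4S 16GB for Net10, No Contract, White";
        System.out.println(countWords(doc1)); // prints 11
        System.out.println(countWords(doc2)); // prints 9
    }
}
```

With SolrJ, the result would presumably be set on the document before it is added, for example with SolrInputDocument.addField("wordCount", count), and the field can then be used in boost=div(1,wordCount) or as a sort tie-breaker.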
Thanks.

On Thu, Dec 11, 2014 at 7:18 AM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:

> Hi,
>
> No special steps need to be taken for a cloud setup. Please note that for
> both solutions, a re-index is mandatory.
>
> Ahmet
>
> On Thursday, December 11, 2014 12:15 PM, S.L <simpleliving...@gmail.com> wrote:
> Ahmet,
>
> Thank you. As the configurations in SolrCloud are uploaded to ZooKeeper,
> are there any special steps that need to be taken to make this work in
> SolrCloud?
>
> On Wed, Dec 10, 2014 at 4:32 AM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:
>
> > Hi,
> >
> > Or even better, you can use your new field for tie-break purposes, where
> > scores are identical.
> > e.g. sort=score desc, wordCount asc
> >
> > Ahmet
> >
> > On Wednesday, December 10, 2014 11:29 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
> > Hi,
> >
> > You mean update processor factory?
> >
> > Here is an augmented (wordCount field added) version of your example:
> >
> > doc1:
> >
> > phoneName:"Details about Apple iPhone 4s - 16GB - White (Verizon) Smartphone Factory Unlocked"
> > wordCount: 11
> >
> > doc2:
> >
> > phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > wordCount: 9
> >
> > The first task is simply to calculate the wordCount values. You can do it
> > in your indexing code, or in other places.
> > I quickly skimmed the existing update processors but I couldn't find a
> > stock implementation.
> > CountFieldValuesUpdateProcessorFactory fooled me, but it looks like it is
> > all about multivalued fields.
> >
> > I guess a simple JavaScript that splits on whitespace and returns the
> > produced array size would do the trick:
> > StatelessScriptUpdateProcessorFactory
> >
> > At this point you have an int field named wordCount.
> > boost=div(1,wordCount) should work. Or you can come up with a more
> > sophisticated math formula.
> >
> > Ahmet
> >
> > On Wednesday, December 10, 2014 11:12 AM, S.L <simpleliving...@gmail.com> wrote:
> > Hi Ahmet,
> >
> > Is there already an implementation of the suggested workaround? Thanks.
> >
> > On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:
> >
> > > Hi,
> > >
> > > The default length norm is not the best option for differentiating very
> > > short documents, like product names.
> > > Please see:
> > > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> > >
> > > I suggest you create an additional integer field that holds the number
> > > of tokens. You can populate it via an update processor, and then
> > > penalise (using function queries) according to that field. This way you
> > > have more fine-grained and flexible control over it.
> > >
> > > Ahmet
> > >
> > > On Tuesday, December 9, 2014 12:22 PM, S.L <simpleliving...@gmail.com> wrote:
> > > Hi,
> > >
> > > Mikhail, thanks. I looked at the explain output, and this is what I see
> > > for the two documents in question: they have identical scores even
> > > though document 2 has a shorter productName field. I do not see any
> > > lengthNorm-related information in the explain output.
> > >
> > > Also, I am not exactly clear on what needs to be looked at in the API.
> > >
> > > *Search Query*: q=iphone+4s+16gb&qf=productName&mm=1&pf=productName&ps=1&pf2=productName&pf3=productName&stopwords=true&lowercaseOperators=true
> > >
> > > *productName: Details about Apple iPhone 4s 16GB Smartphone AT&T Factory Unlocked*
> > >
> > > - *100%* 10.649221 sum of the following:
> > >    - *10.58%* 1.1270299 sum of the following:
> > >       - *2.1%* 0.22383358 productName:iphon
> > >       - *3.47%* 0.36922288 productName:"4 s"
> > >       - *5.01%* 0.53397346 productName:"16 gb"
> > >    - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >    - *27.79%* 2.959255 sum of the following:
> > >       - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >       - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >    - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > > *productName: Apple iPhone 4S 16GB for Net10, No Contract, White*
> > >
> > > - *100%* 10.649221 sum of the following:
> > >    - *10.58%* 1.1270299 sum of the following:
> > >       - *2.1%* 0.22383358 productName:iphon
> > >       - *3.47%* 0.36922288 productName:"4 s"
> > >       - *5.01%* 0.53397346 productName:"16 gb"
> > >    - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >    - *27.79%* 2.959255 sum of the following:
> > >       - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >       - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >    - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > > On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> > >
> > > > It's worth looking into <explain> to check the particular scoring
> > > > values. But the most likely suspect is the reduced precision when
> > > > float norms are stored in byte values. See the javadoc for
> > > > DefaultSimilarity.encodeNormValue(float).
> > > >
> > > > On Mon, Dec 8, 2014 at 5:49 PM, S.L <simpleliving...@gmail.com> wrote:
> > > >
> > > > > I have two documents, doc1 and doc2, and each one of those has a
> > > > > field called phoneName.
> > > > >
> > > > > doc1:phoneName:"Details about Apple iPhone 4s - 16GB - White (Verizon) Smartphone Factory Unlocked"
> > > > >
> > > > > doc2:phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > > > >
> > > > > Here if I search for
> > > > >
> > > > > q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true
> > > > >
> > > > > Doc1 and Doc2 both have the same score. Since the phoneName field in
> > > > > doc2 is shorter, I would expect it to have a higher score, but both
> > > > > have an identical score of 9.961212.
> > > > >
> > > > > The phoneName field is defined as follows. As we can see, nowhere am
> > > > > I specifying omitNorms=True, yet the length norm does not seem to be
> > > > > functioning at all. Can someone let me know what the issue is here?
> > > > >
> > > > > <field name="phoneName" type="text_en_splitting" indexed="true"
> > > > >        stored="true" required="true" />
> > > > > <fieldType name="text_en_splitting" class="solr.TextField"
> > > > >            positionIncrementGap="100" autoGeneratePhraseQueries="true">
> > > > >   <analyzer type="index">
> > > > >     <tokenizer class="solr.WhitespaceTokenizerFactory" />
> > > > >     <!-- in this example, we will only use synonyms at query time
> > > > >          <filter class="solr.SynonymFilterFactory"
> > > > >                  synonyms="index_synonyms.txt" ignoreCase="true"
> > > > >                  expand="false"/> -->
> > > > >     <!-- Case insensitive stop word removal. add
> > > > >          enablePositionIncrements=true in both the index and query
> > > > >          analyzers to leave a 'gap' for more accurate phrase queries.
> > > > >     -->
> > > > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > > >             words="lang/stopwords_en.txt"
> > > > >             enablePositionIncrements="true" />
> > > > >     <filter class="solr.WordDelimiterFilterFactory"
> > > > >             generateWordParts="1" generateNumberParts="1"
> > > > >             catenateWords="1" catenateNumbers="1" catenateAll="0"
> > > > >             splitOnCaseChange="1" />
> > > > >     <filter class="solr.LowerCaseFilterFactory" />
> > > > >     <filter class="solr.KeywordMarkerFilterFactory"
> > > > >             protected="protwords.txt" />
> > > > >     <filter class="solr.PorterStemFilterFactory" />
> > > > >   </analyzer>
> > > > >   <analyzer type="query">
> > > > >     <tokenizer class="solr.WhitespaceTokenizerFactory" />
> > > > >     <filter class="solr.SynonymFilterFactory"
> > > > >             synonyms="synonyms.txt" ignoreCase="true" expand="true" />
> > > > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > > >             words="lang/stopwords_en.txt"
> > > > >             enablePositionIncrements="true" />
> > > > >     <filter class="solr.WordDelimiterFilterFactory"
> > > > >             generateWordParts="1" generateNumberParts="1"
> > > > >             catenateWords="0" catenateNumbers="0" catenateAll="0"
> > > > >             splitOnCaseChange="1" />
> > > > >     <filter class="solr.LowerCaseFilterFactory" />
> > > > >     <filter class="solr.KeywordMarkerFilterFactory"
> > > > >             protected="protwords.txt" />
> > > > >     <filter class="solr.PorterStemFilterFactory" />
> > > > >   </analyzer>
> > > > > </fieldType>
> > > >
> > > > --
> > > > Sincerely yours
> > > > Mikhail Khludnev
> > > > Principal Engineer,
> > > > Grid Dynamics
> > > >
> > > > <http://www.griddynamics.com>
> > > > <mkhlud...@griddynamics.com>