Re: Length norm not functioning in solr queries.

S.L Thu, 11 Dec 2014 06:34:44 -0800

Yes, I understand that reindexing is necessary. However, for some reason I
was not able to invoke the JS script from the update processor, so I ended
up using a Java-only solution at index time.
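
For illustration only, here is a minimal sketch of what such an index-time
word count could look like in plain Java. It is not the code actually used
here: the class and method names (WordCount, countWords) are hypothetical,
and it assumes a token counts as a word only if it contains at least one
letter or digit, so the bare "-" separators are skipped. Under that
assumption it gives 11 and 9 for the two example product names quoted below.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: counts "words" in a product name at index time. */
final class WordCount {

    // Assumption: a token is a word only if it has at least one letter or
    // digit, so lone punctuation such as "-" is not counted.
    private static final Pattern HAS_LETTER_OR_DIGIT =
            Pattern.compile("[\\p{L}\\p{N}]");

    static int countWords(String text) {
        if (text == null || text.trim().isEmpty()) {
            return 0;
        }
        int count = 0;
        // Split on runs of whitespace so double spaces create no empty tokens.
        for (String token : text.trim().split("\\s+")) {
            if (HAS_LETTER_OR_DIGIT.matcher(token).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String name = "Apple iPhone 4S 16GB for Net10, No Contract, White";
        System.out.println(countWords(name));
    }
}
```

The returned value would then be stored in the wordCount field of the
document before it is sent to Solr.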


Thanks.

On Thu, Dec 11, 2014 at 7:18 AM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:
>
> Hi,
>
> No special steps are needed for a cloud setup. Please note that for both
> solutions, re-indexing is mandatory.
>
> Ahmet
>
>
>
> On Thursday, December 11, 2014 12:15 PM, S.L <simpleliving...@gmail.com>
> wrote:
> Ahmet,
>
> Thank you. Since the configurations in SolrCloud are uploaded to ZooKeeper,
> are there any special steps that need to be taken to make this work in
> SolrCloud?
>
>
> On Wed, Dec 10, 2014 at 4:32 AM, Ahmet Arslan <iori...@yahoo.com.invalid>
> wrote:
> >
> > Hi,
> >
> > Or, even better, you can use your new field for tie-breaking when
> > scores are identical.
> > e.g. sort=score desc, wordCount asc
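
To make the tie-break concrete, here is a small plain-Java sketch of the
ordering that sort=score desc, wordCount asc asks for. It makes no Solr
calls: the Hit class and its fields are hypothetical stand-ins for search
results, and the score used in main is the identical value reported in the
explain output further down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TieBreakDemo {

    /** Hypothetical stand-in for one search hit. */
    static final class Hit {
        final String id;
        final float score;
        final int wordCount;

        Hit(String id, float score, int wordCount) {
            this.id = id;
            this.score = score;
            this.wordCount = wordCount;
        }
    }

    // Same ordering as "sort=score desc, wordCount asc": highest score
    // first, and among equal scores the shorter name first.
    static final Comparator<Hit> SCORE_DESC_WORDCOUNT_ASC =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                      .thenComparingInt(h -> h.wordCount);

    /** Returns the hit ids in ranked order without modifying the input. */
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(SCORE_DESC_WORDCOUNT_ASC);
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("doc1", 10.649221f, 11),
                new Hit("doc2", 10.649221f, 9));
        System.out.println(rank(hits));
    }
}
```

With equal scores, doc2 (9 words) is ranked ahead of doc1 (11 words).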
> >
> > Ahmet
> >
> >
> > On Wednesday, December 10, 2014 11:29 AM, Ahmet Arslan <iori...@yahoo.com>
> > wrote:
> > Hi,
> >
> > You mean an update processor factory?
> >
> > Here is an augmented version of your example (wordCount field added):
> >
> > doc1:
> >
> > phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> > Smartphone Factory Unlocked"
> > wordCount: 11
> >
> > doc2:
> >
> > phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > wordCount: 9
> >
> >
> > The first task is simply to calculate the wordCount values. You can do it
> > in your indexing code or elsewhere.
> > I quickly skimmed the existing update processors but couldn't find a stock
> > implementation.
> > CountFieldValuesUpdateProcessorFactory fooled me, but it looks like it is
> > all about multivalued fields.
> >
> > I guess a simple JavaScript function that splits on whitespace and returns
> > the size of the resulting array would do the trick, run through
> > StatelessScriptUpdateProcessorFactory.
> >
> >
> >
> > At this point you have an int field named wordCount.
> > boost=div(1,wordCount) should work. Or you can come up with a more
> > sophisticated math formula.
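
As a rough worked example of what boost=div(1,wordCount) would do to the two
documents: the sketch below assumes the boost acts as a multiplier on the
otherwise identical score of 10.649221 from the explain output quoted
further down, which is how the edismax boost parameter behaves. It is plain
Java arithmetic with no Solr calls, and the class and method names are
hypothetical.

```java
final class BoostDemo {

    // Mirrors the function query div(1,wordCount).
    static double lengthBoost(int wordCount) {
        if (wordCount <= 0) {
            // Choice made for this sketch only: reject empty names outright.
            throw new IllegalArgumentException("wordCount must be positive");
        }
        return 1.0 / wordCount;
    }

    // Multiplicative boost applied to the relevance score.
    static double boostedScore(double baseScore, int wordCount) {
        return baseScore * lengthBoost(wordCount);
    }

    public static void main(String[] args) {
        double base = 10.649221; // identical score of both documents
        System.out.printf("doc1 (11 words): %.4f%n", boostedScore(base, 11));
        System.out.printf("doc2 (9 words):  %.4f%n", boostedScore(base, 9));
    }
}
```

Since 1/9 is larger than 1/11, the shorter doc2 ends up with the higher
boosted score.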
> >
> > Ahmet
> >
> >
> > On Wednesday, December 10, 2014 11:12 AM, S.L <simpleliving...@gmail.com>
> > wrote:
> > Hi Ahmet,
> >
> > Is there already an implementation of the suggested workaround? Thanks.
> >
> >
> > On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan <iori...@yahoo.com.invalid>
> > wrote:
> >
> > > Hi,
> > >
> > > The default length norm is not the best option for differentiating very
> > > short documents, such as product names.
> > > Please see :
> > > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> > >
> > > I suggest creating an additional integer field that holds the number of
> > > tokens. You can populate it via an update processor, and then penalise
> > > (using function queries) according to that field. This way you have more
> > > fine-grained and flexible control over it.
> > >
> > > Ahmet
> > >
> > >
> > >
> > > On Tuesday, December 9, 2014 12:22 PM, S.L <simpleliving...@gmail.com>
> > > wrote:
> > > Hi ,
> > >
> > > Mikhail, thanks. I looked at the explain output, and this is what I see
> > > for the two documents in question: they have identical scores even
> > > though document 2 has a shorter productName field. I do not see any
> > > lengthNorm-related information in the explain output.
> > >
> > > Also, I am not exactly clear on what I should look at in the API.
> > >
> > > *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> > > productName&ps=1&pf2= productName&pf3=
> > > productName&stopwords=true&lowercaseOperators=true
> > >
> > > *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> > > Unlocked *
> > >
> > >
> > >    - *100%* 10.649221 sum of the following:
> > >       - *10.58%* 1.1270299 sum of the following:
> > >          - *2.1%* 0.22383358 productName:iphon
> > >          - *3.47%* 0.36922288 productName:"4 s"
> > >          - *5.01%* 0.53397346 productName:"16 gb"
> > >       - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >       - *27.79%* 2.959255 sum of the following:
> > >          - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >          - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >       - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > >
> > > *productName Apple iPhone 4S 16GB for Net10, No Contract, White*
> > >
> > >
> > >    - *100%* 10.649221 sum of the following:
> > >       - *10.58%* 1.1270299 sum of the following:
> > >          - *2.1%* 0.22383358 productName:iphon
> > >          - *3.47%* 0.36922288 productName:"4 s"
> > >          - *5.01%* 0.53397346 productName:"16 gb"
> > >       - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >       - *27.79%* 2.959255 sum of the following:
> > >          - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >          - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >       - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
> > > mkhlud...@griddynamics.com> wrote:
> > >
> > > > It's worth looking into <explain> to check the particular scoring
> > > > values. But the most likely suspect is the loss of precision when
> > > > float norms are stored in byte values. See the javadoc for
> > > > DefaultSimilarity.encodeNormValue(float)
> > > >
> > > >
> > > > On Mon, Dec 8, 2014 at 5:49 PM, S.L <simpleliving...@gmail.com> wrote:
> > > >
> > > > > I have two documents, doc1 and doc2, and each one of those has a
> > > > > field called phoneName.
> > > > >
> > > > > doc1:phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> > > > > Smartphone Factory Unlocked"
> > > > >
> > > > > doc2:phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > > > >
> > > > > Here if I search for
> > > > >
> > > > > q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true
> > > > >
> > > > > Doc1 and Doc2 both have the same score, but since the phoneName field
> > > > > in doc2 is shorter I would expect it to score higher. Instead, both
> > > > > have an identical score of 9.961212.
> > > > >
> > > > > The phoneName field is defined as follows. As you can see, nowhere am I
> > > > > specifying omitNorms=true, yet the length norm does not seem to be
> > > > > functioning at all. Can someone let me know what the issue is here?
> > > > >
> > > > >         <field name="phoneName" type="text_en_splitting" indexed="true"
> > > > >             stored="true" required="true" />
> > > > >         <fieldType name="text_en_splitting" class="solr.TextField"
> > > > >             positionIncrementGap="100" autoGeneratePhraseQueries="true">
> > > > >             <analyzer type="index">
> > > > >                 <tokenizer class="solr.WhitespaceTokenizerFactory" />
> > > > >                 <!-- in this example, we will only use synonyms at query time
> > > > >                     <filter class="solr.SynonymFilterFactory"
> > > > >                     synonyms="index_synonyms.txt" ignoreCase="true"
> > > > >                     expand="false"/> -->
> > > > >                 <!-- Case insensitive stop word removal. add
> > > > >                     enablePositionIncrements=true in both the index and query
> > > > >                     analyzers to leave a 'gap' for more accurate phrase
> > > > >                     queries. -->
> > > > >                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > > >                     words="lang/stopwords_en.txt"
> > > > >                     enablePositionIncrements="true" />
> > > > >                 <filter class="solr.WordDelimiterFilterFactory"
> > > > >                     generateWordParts="1" generateNumberParts="1"
> > > > >                     catenateWords="1" catenateNumbers="1" catenateAll="0"
> > > > >                     splitOnCaseChange="1" />
> > > > >                 <filter class="solr.LowerCaseFilterFactory" />
> > > > >                 <filter class="solr.KeywordMarkerFilterFactory"
> > > > >                     protected="protwords.txt" />
> > > > >                 <filter class="solr.PorterStemFilterFactory" />
> > > > >             </analyzer>
> > > > >             <analyzer type="query">
> > > > >                 <tokenizer class="solr.WhitespaceTokenizerFactory" />
> > > > >                 <filter class="solr.SynonymFilterFactory"
> > > > >                     synonyms="synonyms.txt" ignoreCase="true"
> > > > >                     expand="true" />
> > > > >                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > > >                     words="lang/stopwords_en.txt"
> > > > >                     enablePositionIncrements="true" />
> > > > >                 <filter class="solr.WordDelimiterFilterFactory"
> > > > >                     generateWordParts="1" generateNumberParts="1"
> > > > >                     catenateWords="0" catenateNumbers="0" catenateAll="0"
> > > > >                     splitOnCaseChange="1" />
> > > > >                 <filter class="solr.LowerCaseFilterFactory" />
> > > > >                 <filter class="solr.KeywordMarkerFilterFactory"
> > > > >                     protected="protwords.txt" />
> > > > >                 <filter class="solr.PorterStemFilterFactory" />
> > > > >             </analyzer>
> > > > >         </fieldType>
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Sincerely yours
> > > > Mikhail Khludnev
> > > > Principal Engineer,
> > > > Grid Dynamics
> > > >
> > > > <http://www.griddynamics.com>
> > > > <mkhlud...@griddynamics.com>
> > > >
> > >
> >
>
