wrong fieldNorm when title is empty

2011-01-07 Thread Andre Wallat
I noticed that when the title of a document inside an index is empty, the fieldNorm value is set to 7.5161928E9. This would lead to a big unwanted boost of documents with an empty title, I imagine. Is this a bug?

Custom similarity calculation ignoring fieldnorm

2010-11-18 Thread Philippe
Dear Lucene group, I wrote my own scorer by extending Similarity. The scorer works quite well, but I would like to ignore the fieldNorm value. Is this somehow possible at search time? Or do I have to add a field indexed with NO_NORMS? Best, Philippe

How to calculate the fieldNorm

2010-09-21 Thread Qi Li
Hi, guys: I read http://lucene.apache.org/java/3_0_2/api/core/index.html, but I am still confused about how the fieldNorm is calculated after reading the explanation. (I am using StandardAnalyzer for both index and search.) 1. Index part: document 0: doc.add(new Field(

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
how many characters was it in the url before and after the update? karl [quoting] 5 okt 2009 kl. 10.21 skrev Ole-Martin Mørk:

Re: Help understanding fieldNorm

2009-10-05 Thread Simon Willnauer
[quoted text only] ...in the url before and after the update? karl [quoting] 5 okt 2009 kl. 10.21 skrev Ole-Martin Mørk:

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
[quoted] 5 okt 2009 kl. 10.21 skrev Ole-Martin Mørk: Hi. I am trying to understand Lucene's scoring algorithm. We're getting some strange results. First we search

Re: Help understanding fieldNorm

2009-10-05 Thread Karl Wettin
Could it be that the tokenization schema for the URL has changed between the times you added documents? I.e. yielding more tokens when you got the low fieldNorm value. The number of documents should not impact the fieldNorm; the value is based on the number of tokens in the field, and on the field and document

Re: Help understanding fieldNorm

2009-10-05 Thread Simon Willnauer
[quoted] ...scoring algorithm. We're getting some strange results. First we search for a given page by its url. We get this result: 0.0014793393 = fieldWeight(url:"our super secret url"

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
given page by its url. We get this result:
0.0014793393 = fieldWeight(url:"our super secret url" in 22), product of:
  1.0 = tf(phraseFreq=1.0)
  32.31666 = idf(url: www=7327 host=321

Re: Help understanding fieldNorm

2009-10-05 Thread Karl Wettin
..." in 22), product of:
  1.0 = tf(phraseFreq=1.0)
  32.31666 = idf(url: www=7327 host=321 com=7327 article=2456 something=2 something=44 704290075=1)
  4.5776367E-5 = fieldNorm(field=url, doc=22)
When this is done, we use solrJ to read and write the document. The only change is the title of the docum

Re: Help understanding fieldNorm

2009-10-05 Thread Simon Willnauer
[quoted] ...url" in 22), product of:
  1.0 = tf(phraseFreq=1.0)
  32.31666 = idf(url: www=7327 host=321 com=7327 article=2456 something=2 something=44 704290075=1)
  4.5776367E-5 = fieldNorm(field=url, doc=22)
When this is done, we use solrJ to read and wr

Re: Help understanding fieldNorm

2009-10-05 Thread Karl Wettin
...url. We get this result:
0.0014793393 = fieldWeight(url:"our super secret url" in 22), product of:
  1.0 = tf(phraseFreq=1.0)
  32.31666 = idf(url: www=7327 host=321 com=7327 article=2456 something=2 something=44 704290075=1)
  4.5776367E-5 = fieldNorm(field=url, doc=22)
When this is done,

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
Did another update:
9.707364 = fieldWeight(url:"our super secret url" in 0), product of:
  1.0 = tf(phraseFreq=1.0)
  31.063566 = idf(url: www=7329 host=323 com=7329 article=2458 something=4 something=46 704290075=3)
  0.3125 = fieldNorm(field=url, doc=0)
FieldNorm value is not changed
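The explain() lines above multiply out exactly as printed: fieldWeight is the product of tf, idf and fieldNorm. A quick check with the posted numbers:

```java
public class FieldWeightCheck {
    public static void main(String[] args) {
        // Factors copied from the explain() output quoted above.
        float tf = 1.0f;                 // tf(phraseFreq=1.0)
        float idf = 31.063566f;          // idf(url: ...)
        float fieldNorm = 0.3125f;       // fieldNorm(field=url, doc=0)
        float fieldWeight = tf * idf * fieldNorm;
        System.out.println(fieldWeight); // ~9.707364, the fieldWeight line above
    }
}
```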

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
[quoted] ..." in 22), product of:
  1.0 = tf(phraseFreq=1.0)
  32.31666 = idf(url: www=7327 host=321 com=7327 article=2456 something=2 something=44 704290075=1)
  4.5776367E-5 = fieldNorm(field=url, doc=22)
When this is done, we use solrJ to read and write

Re: Help understanding fieldNorm

2009-10-05 Thread Simon Willnauer
[quoted] ...in 22), product of:
  1.0 = tf(phraseFreq=1.0)
  32.31666 = idf(url: www=7327 host=321 com=7327 article=2456 something=2 something=44 704290075=1)
  4.5776367E-5 = fieldNorm(field=url, doc=22)
When this is done, we use solrJ to read and write the document. The only chan

Re: Calculation of fieldNorm causes irritating effect of sort order

2008-10-02 Thread Chris Hostetter
If I'm reading your message correctly, you (and everyone who has replied so far) have gotten caught in a red herring. While an "explain" on the results from your queryB will most likely show you that the fieldNorm is the main differentiator in score between document-153 an

RE: Calculation of fieldNorm causes irritating effect of sort order

2008-10-02 Thread Jimi Hullegård
Karl wrote: > 2 okt 2008 kl. 14.47 skrev Jimi Hullegård: > But apparently this setOmitNorms(true) also disables boosting as well. That is ok for now, but what if we want to use boosting in the future? Is there no way to disable the length normalization while still keeping the boost

Re: Calculation of fieldNorm causes irritating effect of sort order

2008-10-02 Thread Karl Wettin
2 okt 2008 kl. 14.47 skrev Jimi Hullegård: But apparently this setOmitNorms(true) also disables boosting as well. That is ok for now, but what if we want to use boosting in the future? Is there no way to disable the length normalization while still keeping the boost calculation? You can m

RE: Calculation of fieldNorm causes irritating effect of sort order

2008-10-02 Thread Jimi Hullegård
Erick wrote: > Another possibility (and I'm not sure it'll work, but what the heck) would be to create a Filter for active ideas. So rather than add a "category:14" clause, you create a Category14Filter that you send to the query along with your +type:idea +alltext:betyg clauses.

Re: Calculation of fieldNorm causes irritating effect of sort order

2008-10-02 Thread Erick Erickson
Another possibility (and I'm not sure it'll work, but what the heck) would be to create a Filter for active ideas. So rather than add a "category:14" clause, you create a Category14Filter that you send to the query along with your +type:idea +alltext:betyg clauses. Now, category won't be considered

RE: Calculation of fieldNorm causes irritating effect of sort order

2008-10-02 Thread Jimi Hullegård
Erik wrote: > On Oct 2, 2008, at 7:39 AM, Jimi Hullegård wrote: > Is it possible to disable the lengthNorm calculation for particular fields? > Yes, use Field#setOmitNorms(true) when indexing. Ok, thanks. I will just have to look at how to do this the best way (since the CMS is handling

Re: Calculation of fieldNorm causes irritating effect of sort order

2008-10-02 Thread Erik Hatcher
On Oct 2, 2008, at 7:39 AM, Jimi Hullegård wrote: Is it possible to disable the lengthNorm calculation for particular fields? Yes, use Field#setOmitNorms(true) when indexing. Erik

Calculation of fieldNorm causes irritating effect of sort order

2008-10-02 Thread Jimi Hullegård
Hi, Maybe I have misunderstood the general concept of how search results should be scored in regard to the fieldNorm, but the way I see it, it causes an irritating effect on the sort order for me. Here's the deal: I'm building a simple site with documents that represent ideas. Eac

Re: fieldNorm and fieldValueUniqueness

2008-06-11 Thread Cam Bazz
yes, figured it out. thanks. how about checking for uniqueness? Best. On Wed, Jun 11, 2008 at 5:39 PM, Karl Wettin <[EMAIL PROTECTED]> wrote: > 11 jun 2008 kl. 16.04 skrev Cam Bazz: > When you look at the fields of a document with Luke, there is a norm column. I have not been able

Re: fieldNorm and fieldValueUniqueness

2008-06-11 Thread Karl Wettin
11 jun 2008 kl. 16.04 skrev Cam Bazz: When you look at the fields of a document with Luke, there is a norm column. I have not been able to figure out what that is. Norms are the 8-bit discretization of length normalization and field boost combined. See IndexReader#norms, Similarity#leng
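Karl's "8-bit discretization" can be illustrated concretely. The sketch below is a from-memory reconstruction of the 3-bit-mantissa, 5-bit-exponent byte encoding Lucene has used for norms (SmallFloat.floatToByte315 / byte315ToFloat); treat it as an illustration, not the shipped source:

```java
public class NormEncoding {
    // Squeeze a float into one byte: 3 mantissa bits, 5 exponent bits,
    // with the exponent zero point at 15 (hence "315").
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // Negative numbers and zero map to byte 0; underflow maps to
            // the smallest non-zero byte.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow maps to the largest byte
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Values made of a few powers of two survive the round trip; all
        // three appear verbatim in the explain() outputs quoted in these
        // threads (4.5776367E-5 is exactly 1.5 * 2^-15).
        for (float f : new float[] {1.5f, 0.3125f, 4.5776367E-5f}) {
            System.out.println(f + " -> " + byte315ToFloat(floatToByte315(f)));
        }
        // Nearby values collapse onto the same byte: 1.45 decodes as 1.25.
        System.out.println(byte315ToFloat(floatToByte315(1.45f))); // prints 1.25
    }
}
```

Because only 256 distinct norm values exist, small differences in boost or field length can vanish entirely after encoding, which explains several of the surprises reported in these threads.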

fieldNorm and fieldValueUniqueness

2008-06-11 Thread Cam Bazz
Hello, When you look at the fields of a document with Luke, there is a norm column. I have not been able to figure out what that is. The reason I am asking is that I am trying to build a uniqueness model. My Index is structured as follows: classID, textID, K, V classID is a given class. textID

Re: explain() - fieldnorm

2008-03-25 Thread Grant Ingersoll
books got different publication years - but explain() tells me that my fieldNorm value is 1.5. Document boosts do not have much granularity due to the limited number of bits in the norm. I seem to recall Yonik publishing a list of values at one time on the mailing list, but I can't fo

Re: explain() - fieldnorm

2008-03-25 Thread JensBurkhardt
10.577795 = idf(docFreq=270)
0.0014624415 = queryNorm
15.866693 = (MATCH) fieldWeight(ti:genetik in 1849319), product of:
  1.0 = tf(termFreq(ti:genetik)=1)
  10.577795 = idf(docFreq=270)
  1.5 = fieldNorm(field=ti, doc=1849319)
0.58184767

Re: explain() - fieldnorm

2008-03-03 Thread JensBurkhardt
Okay, thanks a lot. Maybe I should change my indexing behavior ;-). Greetings, Jens. hossman wrote: > : As my subject says, I have a little problem with analyzing the > : explain() output. > : I know that the fieldnorm value consists of "documentboost,

Re: explain() - fieldnorm

2008-03-02 Thread Chris Hostetter
: As my subject says, I have a little problem with analyzing the : explain() output. : I know that the fieldnorm value consists of "documentboost, fieldboost : and lengthNorm". : Is it possible to retrieve the single values? I know that they are multiplied : while indexing

explain() - fieldnorm

2008-02-27 Thread JensBurkhardt
Hey everybody, As my subject says, I have a little problem with analyzing the explain() output. I know that the fieldnorm value consists of "documentboost, fieldboost and lengthNorm". Is it possible to retrieve the single values? I know that they are multiplied while indexing

Re: Very high fieldNorm for a field resulting in bad results

2006-10-02 Thread Chris Hostetter
: This should solve most of my heartache. : What's the suggested way to use this? Copy a Solr jar? Or just copy : the code for this one query? That's entirely up to you; it depends on what kind of source management you want to have -- the suggested way to use it is to run Solr and use it via the

Re: Very high fieldNorm for a field resulting in bad results

2006-09-29 Thread Mek
You might want to look into the DisjunctionMaxQuery class ... in particular building a BooleanQuery containing a DisjunctionMaxQuery for each 'word' of your input in the various fields ... i've found it to be very effective. when it was first proposed it was called "MaxDisjunctionQuery" and you c

Re: Very high fieldNorm for a field resulting in bad results

2006-09-29 Thread Chris Hostetter
: Assuming I want to boost the fields with the same value for all documents, : can this be replaced by query-time boosting? If I'm understanding what you mean, then yes. : I, though, am storing the norms & yet do not get exact matches ranking : higher than others. The notion that norms help "ex

Re: Very high fieldNorm for a field resulting in bad results

2006-09-28 Thread Mek
It depends on your goal. Index-time field boosts are a way to express things like "this document's title is worth twice as much as the title of most documents"; query-time boosts are a way to express "I care about matches on this clause of my query twice as much as I do about matches to other claus

Re: Very high fieldNorm for a field resulting in bad results

2006-09-27 Thread Chris Hostetter
: 1. Can I do away with index-time boosting for fields & tweak : query-time boosting for them? I understand that doc-level boosting is : very useful while indexing. : But for fields, both index-boost & query-boost are multiples which lead : to the score, so would it be safe to say that I can repla

Re: Very high fieldNorm for a field resulting in bad results

2006-09-26 Thread Mek
lengthNorm is by default 1/sqrt(num of terms). That explains the very high value for the fieldNorm. The boost value became boost_value^(# of values in the field). A couple more questions: 1. Can I do away with index-time boosting for fields & tweak query-time boosting for them? I understan
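Mek's observation about boost compounding can be sketched numerically. The 3.5 boost is the one mentioned in this thread; the value count of 3 is a hypothetical number for illustration. With several values added under the same field name, each value's boost multiplies into the single stored norm:

```java
public class MultiValuedBoost {
    public static void main(String[] args) {
        // A boost of 3.5 set on each of 3 values of the same field
        // compounds multiplicatively into one norm: 3.5^3 = 42.875,
        // which then also gets multiplied by the lengthNorm over the
        // field's total term count.
        float boostPerValue = 3.5f;
        int numValues = 3; // hypothetical count, for illustration
        double combinedBoost = Math.pow(boostPerValue, numValues);
        System.out.println(combinedBoost); // 42.875
    }
}
```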

Re: Very high fieldNorm for a field resulting in bad results

2006-09-26 Thread Chris Hostetter
: The symptom: : Very high fieldNorm for field A.(explain output pasted below) The boost i am : applying to the troublesome field is 3.5 & the max boost applied per doc is : 1.8 : Given that information, the very high fieldNorm is very surprising to me. : Based on what I read, FieldNorm

Very high fieldNorm for a field resulting in bad results

2006-09-25 Thread Mek
on exact matches in 1 other field. (I change the userQuery => "userQuery"^4 userQuery .. any other ideas for ranking exact matches higher?) The symptom: very high fieldNorm for field A (explain output pasted below). The boost I am applying to the troublesome field is 3.5 & the max

Re: fieldNorm

2006-06-08 Thread Otis Gospodnetic
In English, fieldNorm essentially means: "give term hits in shorter fields more weight/importance than those in longer fields". I believe the implementation is 1/sqrt(number of terms in field). Keep in mind that index-time boost is calculated into the field norm. Otis
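Otis's formula can be sketched directly (a minimal illustration, assuming the default Similarity where lengthNorm = 1/sqrt(numTerms) and any index-time field boost is multiplied in before the norm is stored):

```java
public class LengthNormSketch {
    // Default length normalization: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Shorter fields get larger norms, hence more weight per hit:
        System.out.println(lengthNorm(1));  // 1.0
        System.out.println(lengthNorm(4));  // 0.5
        System.out.println(lengthNorm(16)); // 0.25
        // An index-time boost of 3.5 on a 4-term field folds in as:
        System.out.println(3.5f * lengthNorm(4)); // 1.75
    }
}
```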

fieldNorm

2006-06-08 Thread Seeta Somagani
...each document, all the values made sense except the fieldNorm, which is comparatively very high for the topmost hit. Can someone please explain how the fieldNorm factor is calculated? Thanks so much. Seeta

Re: understand the queryNorm and the fieldNorm.

2006-02-06 Thread jason
On 2/6/06, jason <[EMAIL PROTECTED]> wrote: > Hi, > I have a problem of understanding the queryNorm and fieldNorm. > The following is an example. I try to follow what is said in the Javadoc: > "Computes the normalization value for a query gi

Re: understand the queryNorm and the fieldNorm.

2006-02-06 Thread Yonik Seeley
Hi Jason, I get the same thing for the queryNorm when I calculate it by hand: 1/((1.7613963**2 + 1.326625**2)**.5) = 0.45349488111693986 -Yonik On 2/6/06, jason <[EMAIL PROTECTED]> wrote: > Hi, > I have a problem of understanding the queryNorm and fieldNorm. > The follo
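Yonik's hand calculation is easy to reproduce: queryNorm is 1 over the square root of the sum of the squared term weights, using the two weights from Jason's example:

```java
public class QueryNormCheck {
    public static void main(String[] args) {
        // queryNorm = 1 / sqrt(sum of squared query term weights),
        // with the two term weights from the example above.
        double w1 = 1.7613963;
        double w2 = 1.326625;
        double queryNorm = 1.0 / Math.sqrt(w1 * w1 + w2 * w2);
        System.out.println(queryNorm); // ~0.45349488111693986
    }
}
```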

To understand the queryNorm and fieldNorm

2006-02-06 Thread jason
Hi, I have a problem of understanding the queryNorm and fieldNorm. The following is an example. I try to follow what is said in the Javadoc: "Computes the normalization value for a query given the sum of the squared weights of each of the query terms". But the result is different. ID:0 C

understand the queryNorm and the fieldNorm.

2006-02-06 Thread jason
Hi, I have a problem of understanding the queryNorm and fieldNorm. The following is an example. I try to follow what is said in the Javadoc: "Computes the normalization value for a query given the sum of the squared weights of each of the query terms". But the result is different. ID:0 C