I noticed that when the title of a document inside an index is empty, the
fieldNorm value is set to 7.5161928E9. I imagine this leads to a big
unwanted boost for documents with an empty title. Is this a bug?
Dear Lucene group,
I wrote my own scorer by extending Similarity. The scorer works quite
well, but I would like to ignore the fieldNorm value. Is this somehow
possible at search time? Or do I have to add a field indexed with
NO_NORMS?
Best,
Philippe
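Norms are baked into the index at indexing time, so with the stock Lucene
2.x/3.x query classes there is no clean way to ignore them at search time
only; the NO_NORMS route mentioned above is the usual fix. A minimal
sketch, assuming the Lucene 2.x/3.x API (field name and value are
illustrative, not from the original mail):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // Index the field without norms: neither length normalization nor
    // index-time boosts will be folded into this field's scores.
    Document doc = new Document();
    Field title = new Field("title", "some title text",
                            Field.Store.YES, Field.Index.ANALYZED);
    title.setOmitNorms(true); // drop norms for this field entirely
    doc.add(title);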
Hi, guys:
I read http://lucene.apache.org/java/3_0_2/api/core/index.html, but I am
still confused about how the fieldNorm is calculated after reading the
explanation.
(I am using StandardAnalyzer for both indexing and search.)
*1. Index Part*
document 0:
doc.add(new Field("
How many characters were in the url before and after the update?

karl

On 5 Oct 2009 at 10:21, Ole-Martin Mørk wrote:

> Hi. I am trying to understand Lucene's scoring algorithm. We're
> getting some strange results. First we search
Could it be that the tokenization schema for the URL has changed between
the times you added the documents, i.e. yielding more tokens when you got
the low fieldNorm value? The number of documents should not impact the
fieldNorm; the value is based on the number of tokens in the field plus
the field and document boosts.
Hi. I am trying to understand Lucene's scoring algorithm. We're getting
some strange results. First we search for a given page by its url. We
get this result:

0.0014793393 = fieldWeight(url:"our super secret url" in 22), product of:
  1.0 = tf(phraseFreq=1.0)
  32.31666 = idf(url: www=7327 host=321 com=7327 article=2456
      something=2 something=44 704290075=1)
  4.5776367E-5 = fieldNorm(field=url, doc=22)

When this is done, we use solrJ to read and write the document. The only
change is the title of the document.

Did another update:

9.707364 = fieldWeight(url:"our super secret url" in 0), product of:
  1.0 = tf(phraseFreq=1.0)
  31.063566 = idf(url: www=7329 host=323 com=7329
      article=2458 something=4 something=46 704290075=3)
  0.3125 = fieldNorm(field=url, doc=0)

The fieldNorm value is not changed.
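The scoring breakdowns quoted in this thread come from Lucene's explain()
facility; a minimal sketch of producing one, assuming the Lucene 3.x API
(index path, field, and doc id are illustrative):

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Explanation;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.FSDirectory;

    // Print the tf/idf/fieldNorm/queryNorm breakdown for one document.
    IndexReader reader =
        IndexReader.open(FSDirectory.open(new File("/path/to/index")));
    IndexSearcher searcher = new IndexSearcher(reader);
    Explanation exp =
        searcher.explain(new TermQuery(new Term("url", "something")), 22);
    System.out.println(exp.toString());
    searcher.close();
    reader.close();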
If I'm reading your message correctly, you (and everyone who has replied
so far) have been caught by a red herring. While an explain() on the
results from your queryB will most likely show you that the fieldNorm is
the main differentiator in score between document-153 an
Karl wrote:

> On 2 Oct 2008 at 14.47, Jimi Hullegård wrote:
>
> > But apparently this setOmitNorms(true) also disables boosting as
> > well. That is ok for now, but what if we want to use boosting in
> > the future? Is there no way to disable the length normalization
> > while still keeping the boost calculation?

You can m
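The coupling Jimi runs into exists because the index stores boost and
length normalization multiplied into a single norm byte. A sketch of
keeping the boost while dropping the length factor, assuming the Lucene
2.x/3.x DefaultSimilarity API (the class name here is made up):

    import org.apache.lucene.search.DefaultSimilarity;

    // Length normalization becomes a constant 1.0, so the norm written
    // at index time carries only the document and field boosts.
    public class BoostOnlySimilarity extends DefaultSimilarity {
        @Override
        public float lengthNorm(String fieldName, int numTerms) {
            return 1.0f; // ignore how many tokens the field has
        }
    }

    // Install it before indexing, e.g.:
    //   writer.setSimilarity(new BoostOnlySimilarity());
    // Norms are computed at index time, so existing documents keep
    // their old norms until they are reindexed.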
Erick wrote:

Another possibility (and I'm not sure it'll work, but what the heck)
would be to create a Filter for active ideas. So rather than add a
"category:14" clause, you create a Category14Filter that you send to the
query along with your +type:idea +alltext:betyg clauses. Now, category
won't be considered in the scoring.
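A minimal sketch of that approach with the Lucene 2.x/3.x filter API;
Category14Filter is the poster's hypothetical class, so a stock
QueryWrapperFilter stands in for it here (given an IndexSearcher named
searcher):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.QueryWrapperFilter;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    // Only these clauses participate in scoring.
    BooleanQuery query = new BooleanQuery();
    query.add(new TermQuery(new Term("type", "idea")),
              BooleanClause.Occur.MUST);
    query.add(new TermQuery(new Term("alltext", "betyg")),
              BooleanClause.Occur.MUST);

    // The category restriction only narrows the result set; it
    // contributes nothing to the score.
    Filter activeIdeas = new QueryWrapperFilter(
            new TermQuery(new Term("category", "14")));

    TopDocs hits = searcher.search(query, activeIdeas, 10);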
Erik wrote:

> On Oct 2, 2008, at 7:39 AM, Jimi Hullegård wrote:
> > Is it possible to disable the lengthNorm calculation for particular
> > fields?
>
> Yes, use Field#setOmitNorms(true) when indexing.

Ok, thanks. I will just have to look into how to do this the best way
(since the CMS is handling
Hi,
Maybe I have misunderstood the general concept of how search results
should be scored with regard to the fieldNorm, but the way I see it, it
causes an irritating effect on the sort order for me.
Here's the deal:
I'm building a simple site with documents that represent ideas. Each
Yes, figured it out, thanks.
How about checking for uniqueness?
Best.
On Wed, Jun 11, 2008 at 5:39 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:

> On 11 Jun 2008 at 16.04, Cam Bazz wrote:
>
> > When you look at the fields of a document with Luke, there is a norm
> > column. I have not been able to figure out what that is.
>
> Norms are the 8-bit discretization of length normalization and field
> boost combined.
>
> See IndexReader#norms, Similarity#lengthNorm
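A short sketch of inspecting those norm bytes directly, assuming the
Lucene 2.x/3.x API (the index path and field name are illustrative):

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Similarity;
    import org.apache.lucene.store.FSDirectory;

    // Norms are stored as one byte per document per field; decodeNorm
    // turns that byte back into the (lossy) float that Luke displays.
    IndexReader reader =
        IndexReader.open(FSDirectory.open(new File("/path/to/index")));
    byte[] norms = reader.norms("title");
    for (int doc = 0; doc < norms.length && doc < 10; doc++) {
        System.out.println("doc " + doc + " norm = "
                + Similarity.decodeNorm(norms[doc]));
    }
    reader.close();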
Hello,
When you look at the fields of a document with Luke, there is a norm column.
I have not been able to figure out what that is.
The reason I am asking is that I am trying to build a uniqueness model. My
Index is structured as follows:
classID, textID, K, V
classID is a given class. textID
books got different publication years - but explain() tells me that my
fieldNorm value is 1.5.

Document boosts do not have much granularity due to the limited number
of bits in the norm. I seem to recall Yonik publishing a list of values
at one time on the mailing list, but I can't fo
  10.577795 = idf(docFreq=270)
  0.0014624415 = queryNorm
15.866693 = (MATCH) fieldWeight(ti:genetik in 1849319), product of:
  1.0 = tf(termFreq(ti:genetik)=1)
  10.577795 = idf(docFreq=270)
  1.5 = fieldNorm(field=ti, doc=1849319)
0.58184767
Okay, thanks a lot. Maybe I should change my indexing behavior ;-) .
Greetings
Jens
hossman wrote:

: As my subject is telling, I have a little problem with analyzing the
: explain() output.
: I know that the fieldNorm value consists of "documentboost, fieldboost
: and lengthNorm".
: Is it possible to receive the single values? I know that they are
: multiplied while indexing
Hey everybody,
As my subject is telling, I have a little problem with analyzing the
explain() output.
I know that the fieldNorm value consists of "documentboost, fieldboost
and lengthNorm".
Is it possible to receive the single values? I know that they are
multiplied while indexing
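As the replies in this thread note, those factors are multiplied together
and squeezed into a single byte at index time, so explain() can only show
the decoded product. A small demonstration, assuming the Lucene 2.x/3.x
static encode/decode helpers (all values illustrative):

    import org.apache.lucene.search.Similarity;

    // documentboost * fieldboost * lengthNorm collapses to ONE byte.
    float docBoost = 2.0f;
    float fieldBoost = 1.5f;
    float lengthNorm = (float) (1.0 / Math.sqrt(9.0)); // 9 tokens
    float product = docBoost * fieldBoost * lengthNorm; // = 1.0

    byte stored = Similarity.encodeNorm(product);
    float decoded = Similarity.decodeNorm(stored);
    System.out.println(product + " -> " + decoded); // lossy round trip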
: This should solve most of my heartache.
: What's the suggested way to use this? Copy a Solr jar? Or just copy
: the code for this one query?

That's entirely up to you; it depends on what kind of source management
you want to have -- the suggested way to use it is to run Solr and use it
via the
You might want to look into the DisjunctionMaxQuery class ... in
particular, building a BooleanQuery containing a DisjunctionMaxQuery for
each 'word' of your input in the various fields ... I've found it to be
very effective. When it was first proposed it was called
"MaxDisjunctionQuery" and you c
: Assuming I want to boost the fields with the same value for all
: documents, can this be replaced by query-time boosting?

If I'm understanding what you mean, then yes.

: I, though, am storing the norms & yet do not get exact matches ranking
: higher than others.

The notion that norms help "ex
It depends on your goal. Index-time field boosts are a way to express
things like "this document's title is worth twice as much as the title of
most documents"; query-time boosts are a way to express "I care about
matches on this clause of my query twice as much as I do about matches to
other clauses".
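A compact illustration of the two mechanisms, assuming the Lucene 2.x/3.x
API (field names and boost values are made up):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.TermQuery;

    // Index time: "this document's title is worth twice as much".
    Document doc = new Document();
    Field title = new Field("title", "important title",
                            Field.Store.YES, Field.Index.ANALYZED);
    title.setBoost(2.0f); // folded into this field's norm on disk
    doc.add(title);

    // Query time: "I care about this clause twice as much".
    TermQuery clause = new TermQuery(new Term("title", "important"));
    clause.setBoost(2.0f); // applied when the query is scored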
: 1. Can I do away with index-time boosting for fields & tweak
: query-time boosting for them? I understand that doc-level boosting is
: very useful while indexing.
: But for fields, both index-boost & query-boost are multiples which
: lead to the score, so would it be safe to say that I can repla
lengthNorm is by default 1/sqrt(number of terms).
That explains the very high value for the fieldNorm. The boost value
became boost_value^(number of values in the field).
A couple more questions:
1. Can I do away with index-time boosting for fields & tweak
query-time boosting for them? I understan
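The exponent behavior comes from multi-valued fields: every Field
instance added under the same name contributes its own boost, and the
boosts are multiplied into the single norm. A tiny worked example under
that assumption (Lucene 2.x/3.x semantics, numbers illustrative):

    // Field added 3 times with boost 3.5 per value:
    //   combined boost = 3.5 * 3.5 * 3.5 = 3.5^3 = 42.875
    // With 4 terms in the field:
    //   lengthNorm = 1 / sqrt(4) = 0.5
    //   norm       = 42.875 * 0.5 = 21.4375  -> a "very high" fieldNorm
    float perValueBoost = 3.5f;
    int numValues = 3;
    float combinedBoost = (float) Math.pow(perValueBoost, numValues);
    float lengthNorm = (float) (1.0 / Math.sqrt(4.0));
    System.out.println(combinedBoost * lengthNorm); // 21.4375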
: The symptom:
: Very high fieldNorm for field A (explain output pasted below). The
: boost I am applying to the troublesome field is 3.5 & the max boost
: applied per doc is 1.8.
: Given that information, the very high fieldNorm is very surprising to
: me. Based on what I read, FieldNorm
on exact matches in 1 other field.
(I change the userQuery => "userQuery"^4 userQuery .. any other ideas for
ranking exact matches higher?)

The symptom:
Very high fieldNorm for field A (explain output pasted below). The boost
I am applying to the troublesome field is 3.5 & the max
In English, fieldNorm essentially means:
"give term hits in shorter fields more weight/importance than those in
longer fields".
I believe the implementation is 1/sqrt(number of terms in field).
Keep in mind that the index-time boost is calculated into the field norm.

Otis
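Otis's description matches the stock behavior; roughly what
DefaultSimilarity computes in Lucene 2.x/3.x (a paraphrase, not the
verbatim source):

    import org.apache.lucene.search.DefaultSimilarity;

    // Shorter fields get a larger norm, so term hits in them weigh more.
    public class SketchSimilarity extends DefaultSimilarity {
        @Override
        public float lengthNorm(String fieldName, int numTerms) {
            return (float) (1.0 / Math.sqrt(numTerms));
        }
    }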
For each document, all the values made sense except the fieldNorm, which
is comparatively very high for the topmost hit.
Can someone please explain how the fieldNorm factor is calculated?
Thanks so much.
Seeta
Hi Jason,
I get the same thing for the queryNorm when I calculate it by hand:
1/((1.7613963**2 + 1.326625**2)**.5) = 0.45349488111693986
-Yonik
On 2/6/06, jason <[EMAIL PROTECTED]> wrote:
> Hi,
> I have a problem understanding the queryNorm and fieldNorm.
Hi,
I have a problem understanding the queryNorm and fieldNorm.
The following is an example. I try to follow what is said in the Javadoc
("Computes the normalization value for a query given the sum of the
squared weights of each of the query terms"), but the result is
different.
ID:0 C
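Yonik's hand calculation above is the default queryNorm formula,
1/sqrt(sum of squared query term weights); a minimal check under that
assumption, using the two weights from his reply:

    // queryNorm = 1 / sqrt(sumOfSquaredWeights)
    double w1 = 1.7613963, w2 = 1.326625;
    double queryNorm = 1.0 / Math.sqrt(w1 * w1 + w2 * w2);
    System.out.println(queryNorm); // ~0.45349488111693986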