It takes the score of the highest-scoring document, if greater than 1.0, and
divides every hit's score by this number, leaving them all <= 1.0.
Actually, I just looked at the code, and it does this by
taking 1/maxScore and then multiplying this by each score (equivalent
results in the end, maybe more efficient(?)).
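For readers following the normalization discussion, here is a minimal plain-Java sketch of the behaviour described above: compute 1/maxScore once, then multiply each score by it, but only when the top score exceeds 1.0. It is an illustration, not Lucene's Hits source, and the class and method names are made up.

```java
import java.util.Arrays;

/** Plain-Java illustration of the score normalization discussed above (not Lucene's code). */
public class ScoreNormalization {

    /** Returns a copy of rawScores, scaled so the top score becomes 1.0 if it exceeded 1.0. */
    public static float[] normalize(float[] rawScores) {
        float[] scores = Arrays.copyOf(rawScores, rawScores.length);
        float maxScore = 0.0f;
        for (float s : scores) {
            maxScore = Math.max(maxScore, s);
        }
        if (maxScore > 1.0f) {
            // One division up front, then a cheap multiplication per hit.
            float scoreNorm = 1.0f / maxScore;
            for (int i = 0; i < scores.length; i++) {
                scores[i] *= scoreNorm;
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Top score 4.0 exceeds 1.0, so every score is multiplied by 0.25.
        System.out.println(Arrays.toString(normalize(new float[] {4.0f, 2.0f, 1.0f}))); // prints [1.0, 0.5, 0.25]
        // Top score 0.7 does not exceed 1.0, so the scores are left untouched.
        System.out.println(Arrays.toString(normalize(new float[] {0.7f, 0.35f}))); // prints [0.7, 0.35]
    }
}
```

The second call shows the case raised elsewhere in this thread: a result list whose top raw score is 0.7 is returned unscaled, so a 0.7 there is not on the same scale as a 0.7 produced by scaling.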
Well,
We are sporting Solaris 10 on a Sun Fire machine with four cores,
12GB of RAM and mirrored Ultra320 disks. I guess I could try switching
to FSDirectory and hope for the best.
-----Original Message-----
From: Chris Lamprecht [mailto:[EMAIL PROTECTED]]
Sent: 27 January 2006 08:50
To:
On Friday 27 January 2006 02:36, Chun Wei Ho wrote:
Thanks for the info :) One last related question.
If I delete documents using an IndexReader, can I assume that the
internal document numbers of other undeleted documents (obtained using
the same IndexReader instance) will not change until
hi,
thank you for your help.
On 1/27/06, Chris Lamprecht [EMAIL PROTECTED] wrote:
It takes the score of the highest-scoring document, if greater than 1.0, and
divides every hit's score by this number, leaving them all <= 1.0.
Actually, I just looked at the code, and it does this by
taking
Daniel Pfeifer wrote:
We are sporting Solaris 10 on a Sun Fire machine with four cores,
12GB of RAM and mirrored Ultra320 disks. I guess I could try switching
to FSDirectory and hope for the best.
Or, since you're on a 64-bit platform, try MMapDirectory, which supports
greater parallelism
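As far as I understand it, MMapDirectory gets its parallelism by memory-mapping index files with java.nio, so concurrent readers do absolute reads on the mapping instead of synchronizing on one shared file handle to seek and then read. The sketch below shows only that underlying JDK mechanism (FileChannel.map plus a per-thread duplicate() view); it is not Lucene's MMapDirectory code, and the class and method names are hypothetical. Mapping large files consumes virtual address space, which is presumably why the 64-bit platform matters here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** JDK-only sketch of memory-mapped, lock-free concurrent reads (not Lucene code). */
public class MappedReadSketch {

    /** Maps a whole file read-only; the mapping stays valid after the channel is closed. */
    public static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    /** Sums the bytes in [from, to) with absolute reads, which do not move the buffer's position. */
    public static long sumRange(ByteBuffer buf, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += buf.get(i);
        }
        return sum;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 100);
        }
        Files.write(tmp, data);

        MappedByteBuffer mapped = mapReadOnly(tmp);
        long[] partial = new long[2];
        // Each thread gets its own duplicate() view: shared bytes, independent position.
        ByteBuffer view1 = mapped.duplicate();
        ByteBuffer view2 = mapped.duplicate();
        Thread t1 = new Thread(() -> partial[0] = sumRange(view1, 0, 500));
        Thread t2 = new Thread(() -> partial[1] = sumRange(view2, 500, 1000));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(partial[0] + partial[1]); // prints 49500
    }
}
```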
petite_abeille wrote:
I would love to see this. I presently have a somewhat unwieldy
conversion table [1] that I would love to get rid of :))
[snip]
[1] http://dev.alt.textdrive.com/browser/lu/LUStringBasicLatin.txt
I've attached the perl script -- feed
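The conversion table and the Perl script fold accented characters down to basic Latin. On JDK 6 or later, much of that can be done without a hand-written table by Unicode-decomposing the text (NFD) and dropping the combining marks. This is a swapped-in technique, not the attached script, and the class name is hypothetical. It only covers characters that have a canonical decomposition: letters such as ø, Æ, Ł or ß pass through unchanged and would still need explicit table entries.

```java
import java.text.Normalizer;

/** Accent folding via Unicode decomposition (a substitute for a hand-written table). */
public class BasicLatinFolding {

    /** Decomposes to NFD, then removes combining marks such as acute accents and umlauts. */
    public static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches Unicode combining marks (general category M).
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Input is "Cafe Muller" with an acute accent on the e and an umlaut on the u.
        System.out.println(fold("Caf\u00e9 M\u00fcller")); // prints Cafe Muller
        // U+00D8 (capital O with stroke) has no decomposition, so it comes back unchanged.
        System.out.println(fold("\u00d8resund"));
    }
}
```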
Are we both talking about Lucene? I am using Lucene 1.4.3 and can't find
a class called MapDirectory or MMapDirectory.
/Daniel
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: 27 January 2006 11:43
To: java-user@lucene.apache.org
Subject: [SPAM] - Re:
...but this means that the scores are not comparable across queries,
because a hit with the score '0.7' from one query isn't necessarily as 'good' as
a '0.7' from another query, and whether that happens depends only on whether the original,
unnormalized top score was less than 1.0.
Does this really look like a
On 1/27/06, Chris Lamprecht [EMAIL PROTECTED] wrote:
Actually, I just looked at the code, and it does this by
taking 1/maxScore and then multiplying this by each score (equivalent
results in the end, maybe more efficient(?)).
Very much so... fdiv commonly takes 20 to 40 clock cycles,
The lucene info is:
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.6.1
Created-By: Apache Jakarta
Name: org/apache/lucene
Specification-Title: Lucene Search Engine
Specification-Version: 1.4.3
Specification-Vendor: Lucene
Implementation-Title: org.apache.lucene
Implementation-Version: build
I'm having some trouble coming up with a good search strategy for geographical
data. e.g., given:
[1] city: London, United Kingdom
[2] city: London, Ontario, Canada
[3] city: Ontario, California, United States
[4] state: Ontario, Canada
[5] city: Vancouver, Washington, United States
[6] city:
Daniel Pfeifer wrote:
Are we both talking about Lucene? I am using Lucene 1.4.3 and can't find
a class called MapDirectory or MMapDirectory.
It is post-1.4.
You can download a nightly build of the current trunk at:
http://cvs.apache.org/dist/lucene/java/nightly/
Doug
Hi Colin,
Even assuming you came up with a good way of indexing, the example
query "Ontario, CA" should yield 3 hits: [2], [3] and [4] are all valid
retrievals. Could you please justify which 2 hits you want and why?
Thanks,
Rajesh Munavalli
On 1/27/06, Colin Young [EMAIL PROTECTED] wrote:
Hi Colin,
Even assuming you came up with a good way of indexing, the
example query "Ontario, CA" should yield 3 hits: [2], [3] and [4] are all
valid retrievals. Could you please justify which 2 hits you want and why?
Thanks,
Rajesh Munavalli
Colin Young wrote:
I'm having some trouble
The reason I only want 2 hits is because [2] is more specific in my
domain -- I could also have Toronto, Ontario; Kingston, Ontario etc.
which would take the hits up to 5 now.
What I'm really after is finding a way to index and search that would
make [2] an invalid retrieval.
My latest attempt
Few questions.
(1) Does each document contain only one geographical location?
(2) Given a document, how are you tokenizing it into city, state and
country? I am assuming ',' as the delimiter here. Otherwise determining the
boundary for names like St. Louis du Ha Ha would be difficult.
(3) Are
Hi,
I'm trying to figure out a way to locate tokens which include special
characters. The actual text in the file being indexed is something like
function() { statement1; statement2; }
The query I'm using is function\() since I want to locate precisely
function() - the query succeeds but
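Parentheses, like the other query-syntax characters (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \), have to be backslash-escaped for the query parser to treat them literally, so the query string for function() would normally be function\(\). A small helper for that is sketched below; escapeQueryChars is a hypothetical name, not a Lucene method. Escaping only gets the characters past the parser: whether "()" can match at all also depends on the analyzer used at index time, since, for example, SimpleAnalyzer splits on non-letters and never indexes the parentheses.

```java
/** Hypothetical helper that backslash-escapes query-syntax characters. */
public class QueryEscapeSketch {

    // Characters with special meaning in the query syntax; & and | are escaped one at a time.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\&|";

    public static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeQueryChars("function()")); // prints function\(\)
    }
}
```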
1) Yes. One location per document.
2) Using the SimpleAnalyzer (for now). I have city, state and country as
separate fields, so I could tokenize each as a single token if that
would work better. I think that avoids the need for a delimiter at index
time.
3) I am not making any assumptions now at
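Given one location per document and separate city/state/country fields, one way to make [2] an invalid retrieval for "Ontario, CA" is to require the first comma-separated part of the query to match the record's most specific level (its leaf name), and each remaining part to match one of the levels enclosing it. The sketch below shows that rule in plain Java rather than as Lucene queries; in Lucene it would presumably mean indexing the leaf name in a field of its own, which is an assumption, not something established in this thread. The class names and the alias table mapping "CA" to both Canada and California are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Plain-Java sketch: leaf name must match, qualifiers must match enclosing levels. */
public class GeoMatchSketch {

    /** One location per document, with separate city/state/country fields (null if absent). */
    public static final class Location {
        final int id;
        final String city;
        final String state;
        final String country;

        public Location(int id, String city, String state, String country) {
            this.id = id;
            this.city = city;
            this.state = state;
            this.country = country;
        }

        /** The most specific populated level: the city for a city, the state for a state. */
        String leafName() {
            return city != null ? city : (state != null ? state : country);
        }

        /** The populated levels that enclose the leaf. */
        List<String> enclosing() {
            List<String> out = new ArrayList<>();
            if (city != null && state != null) {
                out.add(state);
            }
            if ((city != null || state != null) && country != null) {
                out.add(country);
            }
            return out;
        }
    }

    // Hypothetical alias table: "CA" may mean Canada or California.
    static final Map<String, List<String>> ALIASES =
            Map.of("ca", List.of("canada", "california"));

    static boolean termMatches(String term, String value) {
        String t = term.trim().toLowerCase(Locale.ROOT);
        String v = value.toLowerCase(Locale.ROOT);
        return t.equals(v) || ALIASES.getOrDefault(t, List.of()).contains(v);
    }

    /** The first comma-separated part must name the leaf; every other part an enclosing level. */
    public static List<Integer> search(List<Location> docs, String query) {
        String[] parts = query.split(",");
        List<Integer> hits = new ArrayList<>();
        for (Location loc : docs) {
            if (!termMatches(parts[0], loc.leafName())) {
                continue;
            }
            boolean ok = true;
            for (int i = 1; i < parts.length && ok; i++) {
                String part = parts[i];
                ok = loc.enclosing().stream().anyMatch(level -> termMatches(part, level));
            }
            if (ok) {
                hits.add(loc.id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Location> docs = Arrays.asList(
                new Location(1, "London", null, "United Kingdom"),
                new Location(2, "London", "Ontario", "Canada"),
                new Location(3, "Ontario", "California", "United States"),
                new Location(4, null, "Ontario", "Canada"),
                new Location(5, "Vancouver", "Washington", "United States"));
        System.out.println(search(docs, "Ontario, CA"));     // prints [3, 4]
        System.out.println(search(docs, "London, Ontario")); // prints [2]
    }
}
```

With the sample records [1] to [5], "Ontario, CA" returns [3] and [4] only. Adding Toronto, Ontario or Kingston, Ontario would not change that result, since their leaf names are not Ontario.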
: ...but this means that the scores are not comparable across queries,
: because a hit with the score '0.7' from one query isn't necessarily as 'good' as
: a '0.7' from another query, and whether that happens depends only on whether the original,
: unnormalized top score was less than 1.0.
Scores are not
hi list,
i'm trying to use Lucene (1.4.3) to replace an existing MySQL search system.
so far, this is working great, but i have a couple of questions.
firstly, when my index updater is (re)indexing a lot of documents at once, i
often get errors like
FileNotFoundException: