Re: How does the lucene normalize the score?

2006-01-27 Thread Chris Lamprecht
It takes the highest scoring document, if greater than 1.0, and divides every hit's score by this number, leaving them all <= 1.0. Actually, I just looked at the code, and it does this by taking 1/maxScore and then multiplying this by each score (equivalent results in the end, maybe more

RE: Performance tips?

2006-01-27 Thread Daniel Pfeifer
Well, we are sporting Solaris 10 on a Sun Fire machine with four cores and 12GB of RAM and mirrored Ultra 320 disks. I guess I could try switching to FSDirectory and hope for the best. -----Original Message----- From: Chris Lamprecht [mailto:[EMAIL PROTECTED] Sent: 27 January 2006 08:50 To:

Re: Getting the document number (with IndexReader)

2006-01-27 Thread Paul Elschot
On Friday 27 January 2006 02:36, Chun Wei Ho wrote: Thanks for the info :) One last related question. If I delete documents using an IndexReader, can I assume that the internal document numbers of other undeleted documents (obtained using the same IndexReader instance) will not change until

Re: How does the lucene normalize the score?

2006-01-27 Thread xing jiang
Hi, thank you for your help. On 1/27/06, Chris Lamprecht [EMAIL PROTECTED] wrote: It takes the highest scoring document, if greater than 1.0, and divides every hit's score by this number, leaving them all <= 1.0. Actually, I just looked at the code, and it actually does this by taking

Re: Performance tips?

2006-01-27 Thread Doug Cutting
Daniel Pfeifer wrote: We are sporting Solaris 10 on a Sun Fire-machine with four cores and 12GB of RAM and mirrored Ultra 320-disks. I guess I could try switching to FSDirectory and hope for the best. Or, since you're on a 64-bit platform, try MMapDirectory, which supports greater parallelism

Re: encoding

2006-01-27 Thread John Haxby
petite_abeille wrote: I would love to see this. I presently have a somewhat unwieldy conversion table [1] that I would love to get rid of :)) [snip] [1] http://dev.alt.textdrive.com/browser/lu/LUStringBasicLatin.txt I've attached the Perl script -- feed

RE: [SPAM] - Re: Performance tips? - Sending mail server found on bl.spamcop.net

2006-01-27 Thread Daniel Pfeifer
Are we both talking about Lucene? I am using Lucene 1.4.3 and can't find a class called MapDirectory or MMapDirectory. /Daniel -----Original Message----- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: 27 January 2006 11:43 To: java-user@lucene.apache.org Subject: [SPAM] - Re:

Re: How does the lucene normalize the score?

2006-01-27 Thread duiduder
...but this means that the scores are not comparable across queries, because a hit with the score '0.7' from one query need not be as 'good' as a '0.7' from another query... and this is only the case if the original, unnormalized top score value was less than 1.0. Does this really look like a

Re: How does the lucene normalize the score?

2006-01-27 Thread Yonik Seeley
On 1/27/06, Chris Lamprecht [EMAIL PROTECTED] wrote: Actually, I just looked at the code, and it actually does this by taking 1/maxScore and then multiplying this by each score (equivalent results in the end, maybe more efficient(?)). Very much so... fdiv commonly takes 20 to 40 clock cycles,
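
The normalization described in this thread can be sketched in plain Java. This is a minimal illustration of the behaviour as Chris describes it, not Lucene's actual source: the class name `ScoreNormalizationSketch` and the method `normalize` are hypothetical, and the raw scores are made-up values. The reciprocal is computed once so the loop uses a multiplication per hit instead of a division.

```java
import java.util.Arrays;

// Hypothetical sketch of the normalization described in the thread;
// not taken from the Lucene code base.
final class ScoreNormalizationSketch {

    /** Returns a copy of rawScores, scaled so that no score exceeds 1.0. */
    static float[] normalize(float[] rawScores) {
        float[] out = Arrays.copyOf(rawScores, rawScores.length);

        // Find the highest raw score among the hits.
        float maxScore = 0.0f;
        for (float s : out) {
            if (s > maxScore) {
                maxScore = s;
            }
        }

        // Scores are only scaled when the top score is above 1.0.
        if (maxScore > 1.0f) {
            float scoreNorm = 1.0f / maxScore; // one division, computed once
            for (int i = 0; i < out.length; i++) {
                out[i] *= scoreNorm;           // one multiplication per hit
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Top score above 1.0: every score is scaled by 1/4.
        System.out.println(Arrays.toString(normalize(new float[] {4.0f, 2.0f, 1.0f})));
        // prints [1.0, 0.5, 0.25]

        // Top score below 1.0: scores are left untouched.
        System.out.println(Arrays.toString(normalize(new float[] {0.7f, 0.35f})));
        // prints [0.7, 0.35]
    }
}
```

The second call shows why a normalized score from one query says little about a score from another: whether any scaling happens at all depends on that query's own top score.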

RE: problem updating a document: no segments file?

2006-01-27 Thread John Powers
The lucene info is: Manifest-Version: 1.0 Ant-Version: Apache Ant 1.6.1 Created-By: Apache Jakarta Name: org/apache/lucene Specification-Title: Lucene Search Engine Specification-Version: 1.4.3 Specification-Vendor: Lucene Implementation-Title: org.apache.lucene Implementation-Version: build

Help with indexing and query strategy

2006-01-27 Thread Colin Young
I'm having some trouble coming up with a good search strategy for geographical data. e.g., given: [1] city: London, United Kingdom [2] city: London, Ontario, Canada [3] city: Ontario, California, United States [4] state: Ontario, Canada [5] city: Vancouver, Washington, United States [6] city:

Re: [SPAM] - Re: Performance tips? - Sending mail server found on bl.spamcop.net

2006-01-27 Thread Doug Cutting
Daniel Pfeifer wrote: Are we both talking about Lucene? I am using Lucene 1.4.3 and can't find a class called MapDirectory or MMapDirectory. It is post-1.4. You can download a nightly build of the current trunk at: http://cvs.apache.org/dist/lucene/java/nightly/ Doug

Re: Help with indexing and query strategy

2006-01-27 Thread Rajesh Munavalli
Hi Colin, Even assuming you came up with a good way of indexing, the example query Ontario, CA should yield 3 hits. All 2, 3 and 4 are valid retrievals. Could you please justify which 2 hits you want and why? Thanks, Rajesh Munavalli On 1/27/06, Colin Young [EMAIL PROTECTED] wrote:

Re: Help with indexing and query strategy

2006-01-27 Thread Rajesh Munavalli
Hi Colin, Even assuming you came up with a good way of indexing, the example query Ontario, CA should yield 3 hits. All 2, 3 and 4 are valid retrievals. Could you please justify which 2 hits you want and why? Thanks, Rajesh Munavalli Colin Young wrote: I'm having some trouble

RE: Help with indexing and query strategy

2006-01-27 Thread Colin Young
The reason I only want 2 hits is because [2] is more specific in my domain -- I could also have Toronto, Ontario; Kingston, Ontario etc. which would take the hits up to 5 now. What I'm really after is finding a way to index and search that would make [2] an invalid retrieval. My latest attempt

Re: Help with indexing and query strategy

2006-01-27 Thread Rajesh Munavalli
A few questions. (1) Does each document contain only one geographical location? (2) Given a document, how are you tokenizing it into city, state and country? I am assuming "," as the delimiter here. Otherwise determining the boundary for names like St. Louis du Ha Ha would be difficult. (3) Are

How to find function() - ?

2006-01-27 Thread Dmitry Goldenberg
Hi, I'm trying to figure out a way to locate tokens which include special characters. The actual text in the file being indexed is something like function() { statement1; statement2; } The query I'm using is function\() since I want to locate precisely function() - the query succeeds but
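
For reference, backslash-escaping every query-syntax character in a search term can be sketched as below. This is a standalone illustration, not Lucene code: the class `QueryEscapeSketch`, the method `escape`, and the list of special characters are assumptions made for the example. Escaping only affects how the query string is parsed; whether the parentheses are present in the index at all depends on the analyzer used at indexing time.

```java
// Hypothetical helper, written for illustration only; the set of
// characters treated as special is an assumption, not an official list.
final class QueryEscapeSketch {

    // Characters assumed to have a meaning in the query syntax.
    private static final String SPECIAL = "\\+-!(){}[]^\"~*?:&|";

    /** Prefixes each assumed-special character in text with a backslash. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both parentheses are escaped, not only the opening one.
        System.out.println(escape("function()")); // prints function\(\)
    }
}
```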

RE: Help with indexing and query strategy

2006-01-27 Thread Colin Young
1) Yes. One location per document. 2) Using the SimpleAnalyzer (for now). I have city, state and country as separate fields, so I could tokenize each as a single token if that would work better. I think that avoids the need for a delimiter at index time. 3) I am not making any assumptions now at

Re: How does the lucene normalize the score?

2006-01-27 Thread Chris Hostetter
: ...but this means that the scores are not comparable across queries, : because a hit with the score '0.7' from one query need not be as 'good' as : a '0.7' from another query... and this is only the case if the original, : unnormalized top score value was less than 1.0. Scores are not

index concurrency result order

2006-01-27 Thread kate
hi list, i'm trying to use Lucene (1.4.3) to replace an existing MySQL search system. so far, this is working great, but i have a couple of questions. firstly, when my index updater is (re)indexing a lot of documents at once, i often get errors like FileNotFoundException: