[jira] Created: (LUCENE-760) Spellchecker could/should use n-gram tokenizers instead of rolling its own n-gramming

2006-12-22 Thread Otis Gospodnetic (JIRA)
Spellchecker could/should use n-gram tokenizers instead of rolling its own n-gramming - Key: LUCENE-760 URL: http://issues.apache.org/jira/browse/LUCENE-760 Project:

[jira] Resolved: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

2006-12-22 Thread Otis Gospodnetic (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-759?page=all ] Otis Gospodnetic resolved LUCENE-759. - Resolution: Fixed Unit tests pass, committed. > Add n-gram tokenizers to contrib/analyzers > -- > >

[jira] Updated: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

2006-12-22 Thread Otis Gospodnetic (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-759?page=all ] Otis Gospodnetic updated LUCENE-759: Attachment: LUCENE-759.patch Included: NGramTokenizer NGramTokenizerTest EdgeNGramTokenizer EdgeNGramTokenizerTest > Add n-gram tokenizers to
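
For readers unfamiliar with the two tokenizer kinds named in the attachment, the following is a minimal sketch of character n-gramming versus edge n-gramming. It is not the code from LUCENE-759.patch: the function names, the parameter names `min_gram` and `max_gram`, and the order in which grams are emitted are all assumptions made for illustration.

```python
def ngrams(text, min_gram, max_gram):
    """Return every character n-gram of `text` whose length n satisfies
    min_gram <= n <= max_gram, shortest grams first (ordering is an
    assumption of this sketch, not taken from the patch)."""
    grams = []
    for n in range(min_gram, max_gram + 1):
        # Slide a window of width n across the text.
        for start in range(len(text) - n + 1):
            grams.append(text[start:start + n])
    return grams


def edge_ngrams(text, min_gram, max_gram):
    """Return only the n-grams anchored at the front edge of `text`."""
    longest = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, longest + 1)]
```

Under these assumptions, `ngrams("abcd", 1, 2)` yields the four single characters followed by `ab`, `bc`, `cd`, while `edge_ngrams("abcd", 1, 3)` yields only the prefixes `a`, `ab`, `abc`.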

Re: Payloads

2006-12-22 Thread Michael Busch
Nicolas Lalevée wrote: I have just looked at it. It looks great :) Thanks! :-) But I still don't understand why a new entry in the fieldinfo is needed. The entry is not really *needed*, but I use it for backwards-compatibility and as an optimization for fields that don't have any

[jira] Created: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

2006-12-22 Thread Otis Gospodnetic (JIRA)
Add n-gram tokenizers to contrib/analyzers -- Key: LUCENE-759 URL: http://issues.apache.org/jira/browse/LUCENE-759 Project: Lucene - Java Issue Type: Improvement Components: Analysis

[jira] Commented: (LUCENE-708) Setup nightly build website links and docs

2006-12-22 Thread Grant Ingersoll (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-708?page=comments#action_12460595 ] Grant Ingersoll commented on LUCENE-708: Nightly build distribution of binary jars should not contain instrumented classes from Clover. > Setup nightly bu

Re: Payloads

2006-12-22 Thread Marvin Humphrey
On Dec 22, 2006, at 10:36 AM, Doug Cutting wrote: The easiest way to do this would be to have separate files in each segment for each PostingFormat. It would be better if different posting formats could share files, but that's harder to coordinate. The approach I'm taking in KinoSearch 0.

Re: Payloads

2006-12-22 Thread Ning Li
On 12/22/06, Doug Cutting <[EMAIL PROTECTED]> wrote: Ning Li wrote: > The draft proposal seems to suggest the following (roughly): > A dictionary entry is . Perhaps this ought to be , where TermInfo contains a FilePointer and perhaps other information (e.g., frequency data). Yes. Another exam

Re: Payloads

2006-12-22 Thread Doug Cutting
Ning Li wrote: I'm aware of this design. Boolean and phrase queries are an example. The point is, there are different queries whose processing will (continue to) require different information of terms, especially when flexible posting is allowed. The question is, should the number of files used t

Re: Payloads

2006-12-22 Thread Doug Cutting
Ning Li wrote: The draft proposal seems to suggest the following (roughly): A dictionary entry is . Perhaps this ought to be , where TermInfo contains a FilePointer and perhaps other information (e.g., frequency data). A posting entry for a term in a document is . Classes which implement

Re: Payloads

2006-12-22 Thread Marvin Humphrey
On Dec 22, 2006, at 9:17 AM, Ning Li wrote: The question is, should the number of files used to store postings be customizable? I think it ought to remain an implementation detail for now. Using multiple files is an optimization of unknown advantage. Optimizations have to work very hard

[jira] Commented: (LUCENE-758) IndexReader.isCurrent fails when using two IndexReaders

2006-12-22 Thread Michael McCandless (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-758?page=comments#action_12460555 ] Michael McCandless commented on LUCENE-758: --- Thank you for the full test case showing the issue! However, I believe this is by design. When you init a R

[jira] Updated: (LUCENE-662) Extendable writer and reader of field data

2006-12-22 Thread JIRA
[ http://issues.apache.org/jira/browse/LUCENE-662?page=all ] Nicolas Lalevée updated LUCENE-662: --- Attachment: generic-fieldIO-4.patch Patch synchronized with the trunk. I also tried to minimize the diff. And in fact I just realized that there are two

Re: Payloads

2006-12-22 Thread Ning Li
On 12/22/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote: Precision would be enhanced if boolean scoring took position into account, and could be further enhanced if each position were assigned a boost. For that purpose, having everything in one file is an advantage, as it cuts down disk seeks. T

Re: Payloads

2006-12-22 Thread Marvin Humphrey
On Dec 21, 2006, at 1:58 PM, Ning Li wrote: Storing all the posting content, e.g. frequencies and positions, in a single file greatly simplifies things. However, this could cause some performance penalty. For example, boolean query 'Apache AND Lucene' would have to paw through positions. But po
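
The trade-off quoted above can be sketched roughly as follows. This is an illustration only, under an assumed layout in which each term's postings are one flat stream of `doc_id, freq, pos_1 ... pos_freq`; it is not the Lucene or KinoSearch file format, and the function names are hypothetical. A pure boolean AND needs only the document ids, yet with everything in one stream it still has to step over every position.

```python
def read_docs(stream):
    """Walk a flat postings stream laid out as
    doc_id, freq, pos_1 .. pos_freq, doc_id, freq, ...
    Return (doc_ids, skipped): the document ids in order, and how many
    position values had to be stepped over to reach them."""
    doc_ids = []
    skipped = 0
    i = 0
    while i < len(stream):
        doc_id, freq = stream[i], stream[i + 1]
        doc_ids.append(doc_id)
        skipped += freq      # positions are unused by a boolean AND
        i += 2 + freq        # jump past this posting's positions
    return doc_ids, skipped


def boolean_and(stream_a, stream_b):
    """Intersect two postings streams on document id only."""
    docs_a, skipped_a = read_docs(stream_a)
    docs_b, skipped_b = read_docs(stream_b)
    hits = []
    i = j = 0
    # Standard two-pointer merge over sorted doc id lists.
    while i < len(docs_a) and j < len(docs_b):
        if docs_a[i] == docs_b[j]:
            hits.append(docs_a[i])
            i += 1
            j += 1
        elif docs_a[i] < docs_b[j]:
            i += 1
        else:
            j += 1
    return hits, skipped_a + skipped_b
```

With `apache = [1, 2, 0, 5, 3, 1, 7, 8, 1, 2]` and `lucene = [3, 2, 1, 9, 8, 1, 3]`, `boolean_and(apache, lucene)` matches documents 3 and 8 and reports 7 position values stepped over without being used, which is the cost the message refers to.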

Re: Payloads

2006-12-22 Thread Nicolas Lalevée
Le Mercredi 20 Décembre 2006 20:42, Michael Busch a écrit : > Doug Cutting wrote: > > Michael, > > > > This sounds like very good work. The back-compatibility of this > > approach is great. But we should also consider this in the broader > > context of index-format flexibility. > > > > Three gene

[jira] Commented: (LUCENE-755) Payloads

2006-12-22 Thread Grant Ingersoll (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-755?page=comments#action_12460496 ] Grant Ingersoll commented on LUCENE-755: Great patch, Michael, and something that will come in handy for a lot of people. I can vouch it applies cleanly a

[jira] Created: (LUCENE-758) IndexReader.isCurrent fails when using two IndexReaders

2006-12-22 Thread Bernhard Messer (JIRA)
IndexReader.isCurrent fails when using two IndexReaders --- Key: LUCENE-758 URL: http://issues.apache.org/jira/browse/LUCENE-758 Project: Lucene - Java Issue Type: Bug Affects Versions:

[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

2006-12-22 Thread Doron Cohen (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-756?page=all ] Doron Cohen updated LUCENE-756: --- Attachment: nrm.patch.2.txt nrm.patch.2.txt: Updated as Doug suggested: - ".nrm" extension now maintained in a constant. - .nrm file now has a 4-byte header.
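
As a rough illustration of the idea in this update (all norms in one file that starts with a 4-byte header), here is a minimal sketch. The header bytes, the one-byte-per-document layout, the sorted field order, and the function names are assumptions for the example; they are not taken from nrm.patch.2.txt.

```python
import io

# Hypothetical 4-byte header; the actual bytes used by the patch are not
# stated in the message above.
NRM_HEADER = b"NRM\xff"


def write_norms(out, norms_by_field):
    """Write the header, then each field's norms (one byte per document),
    with fields in sorted name order so the reader can find them."""
    out.write(NRM_HEADER)
    for field in sorted(norms_by_field):
        out.write(bytes(norms_by_field[field]))


def read_norms(inp, field_names, doc_count):
    """Check the header, then read doc_count norm bytes per field."""
    header = inp.read(len(NRM_HEADER))
    if header != NRM_HEADER:
        raise ValueError("not a norms file: bad header")
    norms = {}
    for field in sorted(field_names):
        norms[field] = list(inp.read(doc_count))
    return norms
```

A header of this kind lets a reader reject a file that is not a norms file (or that uses an unknown layout) before interpreting any norm bytes.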