Re: Lucene/Netbean Newbie looking for help

2006-07-10 Thread Chuck Williams
Hi Peter, I'm also a Netbeans user, albeit a very happy one who would never consider eclipse! The following sequence of steps has worked for me in netbeans 4.0 and 5.0 (haven't upgraded to 5.5 quite yet). The reason for the unusual directory structure is that Lucene's interleaving of the core an

Re: Java 1.5 (was ommented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread Daniel John Debrunner
Vic Bancroft wrote: >> On Jul 10, 2006, at 11:17 PM, Daniel John Debrunner wrote: >> >>> Doug Cutting wrote: >>> Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem

Re: Java 1.5 (was ommented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread Vic Bancroft
robert engels wrote: Seems silly to support 1.5 and not do it this way. Sometimes a little silliness is some serious fun! Just give me a rubber nose, since I am just clowning around trying to build Andi's kewly contrib/db using gcj on the slightly stylish db-4.4.20 and je-3.0.12 . . . O

Re: Java 1.5 (was ommented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread robert engels
Agreed. I think those that are reliant on GCJ should plan on expending the effort to do whatever backporting is needed to make Lucene work on it. It should also be a GCJ branch or version. Seems silly to support 1.5 and not do it this way. On Jul 10, 2006, at 11:17 PM, Daniel John Debrunne

Re: Java 1.5 (was ommented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread Daniel John Debrunner
Doug Cutting wrote: > Since GCJ is effectively available on all platforms, we could say that > we will start accepting 1.5 features when a GCJ release supports those > features. Does that seem reasonable? Seems potentially a little strange to me. Does this mean Lucene would be limited to the set

Re: Java 1.5 (was ommented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread Vic Bancroft
Andi Vajda wrote: On Mon, 10 Jul 2006, Doug Cutting wrote: Andi Vajda wrote: On Sat, 8 Jul 2006, Doug Cutting wrote: Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reaso

Re: MultiSegmentQueryFilter enhancement for interactive indexes?

2006-07-10 Thread robert engels
Creation of the filters is very expensive - usually involves a large range query. We also convert all range and prefix queries to filters since scoring these does not make sense to us... For example, show sales where the sales price was > 0 and less than 500k. Frequently the user will get too

RE: MultiSegmentQueryFilter enhancement for interactive indexes?

2006-07-10 Thread Bruce Ritchie
Robert, Can you quantify 'through the roof' a bit? Are the filters that you are creating that expensive to create or is it the usage of BitSets that are the real cause of the performance improvement you've seen? Regards, Bruce Ritchie -Original Message- From: robert engels [mailto:[EM

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-10 Thread Ning Li
Random comment... ... An alternate implementation could use a HashMap to associate term with maxSegment. ... Very well taken. :-) I won't submit a new version of the patch at this point to avoid too many versions of the patch. Thanks, Ning ---

Lucene/Netbean Newbie looking for help

2006-07-10 Thread peter decrem
I am trying to contribute to the dot lucene port, but I am having no luck in getting the tests to compile and debug for the java version. I tried eclipse and failed and now I am stuck in Netbean. More specifically I am using Netbean 5.5 (same problems with 5.0). My understanding is that it comes

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-10 Thread Yonik Seeley
Random comment... when applying deletes you can break out of the loop early. + while (docs.next()) { + int doc = docs.doc(); + if (doc <= (((DeleteTerm)deleteTerms.elementAt(i)).maxSegment)) { + reader.deleteDocument(doc); +
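The early-exit idea in the preview above can be illustrated with a minimal, Lucene-free sketch. It assumes the doc ids for a term arrive in ascending order, so once one id exceeds the cutoff, every later id will too. The class name, method name, and the `maxDoc` parameter are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyBreakDelete {
    /**
     * Collects the doc ids that would be deleted: those at or below maxDoc.
     * Assumption: sortedDocIds is in ascending order.
     */
    static List<Integer> docsToDelete(int[] sortedDocIds, int maxDoc) {
        List<Integer> deleted = new ArrayList<>();
        for (int doc : sortedDocIds) {
            if (doc > maxDoc) {
                break; // every remaining id is larger, so stop scanning
            }
            deleted.add(doc);
        }
        return deleted;
    }

    public static void main(String[] args) {
        int[] postings = {2, 5, 9, 14, 20};
        System.out.println(docsToDelete(postings, 9)); // prints [2, 5, 9]
    }
}
```

The saving comes from skipping the tail of the list; when most ids fall below the cutoff the gain is small.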

Re: Java 1.5 (was ommented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread Andi Vajda
On Mon, 10 Jul 2006, Doug Cutting wrote: Andi Vajda wrote: On Sat, 8 Jul 2006, Doug Cutting wrote: Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable? +1 If we u

Re: Global field semantics

2006-07-10 Thread Chuck Williams
Chris Hostetter wrote on 07/10/2006 12:31 PM: > So i guess we are on the same page that this kind of thing can be done at > the App level -- what benefits do you see moving them into the Lucene > index level? > Other than performance per David's and Marvin's ideas, the functionality benefits

Re: Global field semantics

2006-07-10 Thread Chris Hostetter
: previously mentioned a very simple one: validating fields in the query : parser. More interesting examples are: This strikes me as something that can be done with an abstraction layer above and separate from the physical index (this is in fact what Solr does) without needing to add any hard c

Re: Global field semantics

2006-07-10 Thread David Balmain
On 7/11/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 7/10/06, David Balmain <[EMAIL PROTECTED]> wrote: > I don't think declaring all fields up front is necessary for > substantial optimizations. I've found that the key to some really good > optimizations is having constant field numbers. That i

Re: Global field semantics

2006-07-10 Thread Yonik Seeley
On 7/10/06, David Balmain <[EMAIL PROTECTED]> wrote: I don't think declaring all fields up front is necessary for substantial optimizations. I've found that the key to some really good optimizations is having constant field numbers. That is, once a field is added to the index it is assigned a fie

Re: Global field semantics

2006-07-10 Thread David Balmain
On 7/11/06, Chuck Williams <[EMAIL PROTECTED]> wrote: David Balmain wrote on 07/10/2006 01:04 AM: > The only problem I could find with this solution is that > fields are no longer in alphabetical order in the term dictionary but > I couldn't think of a use-case where this is necessary although I'

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-10 Thread Chuck Williams
Yonik Seeley wrote on 07/10/2006 09:27 AM: > I'll rephrase my original question: > When implementing NewIndexModifier, what type of efficiencies do we > get by using the new protected methods of IndexWriter vs using the > public APIs of IndexReader and IndexWriter? I won't comment on Ning's imp

Re: [jira] Commented: (LUCENE-623) RAMDirectory.close() should have a comment about not releasing any resources

2006-07-10 Thread Yonik Seeley
On 7/10/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: [performance stuff] Is this serious enough to revert? Definitely not serious at all. It won't even be measurable. Uncovering bugs that may only have manifested in FSDirectory is more than enough reason to not revert. -Yonik http://incubat

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-10 Thread Yonik Seeley
On 7/10/06, Ning Li <[EMAIL PROTECTED]> wrote: Almost all the Lucene newbies that I know went through this learning curve of realizing you have to batch inserts and deletes to achieve good performance. I agree that having the ability of interleave inserts and deletes to users of Lucene is a goo

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-10 Thread robert engels
Then I submit that my proposed "BufferedWriter" is far simpler and probably performs equally as well, if not better, especially for the case where a document can be uniquely identified. On Jul 10, 2006, at 10:47 AM, Ning Li wrote: You keep stating that you never need to close the IndexWrite

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-10 Thread Ning Li
You keep stating that you never need to close the IndexWriter. I don't believe this is the case, and you are possibly misleading people as to the extent of your patch. Don't you need to close (or flush) to get the documents on disk, so a new IndexReader can find them? If not any documents added

Re: Global field semantics

2006-07-10 Thread Chuck Williams
Chris Hostetter wrote on 07/10/2006 02:06 AM: > As near as i can tell, the large issue can be summarized with the following > sentiment: > > Performance gains could be realized if Field > properties were made fixed and homogeneous for > all Documents in an index. > This is cert

Re: Global field semantics

2006-07-10 Thread Chuck Williams
David Balmain wrote on 07/10/2006 01:04 AM: > The only problem I could find with this solution is that > fields are no longer in alphabetical order in the term dictionary but > I couldn't think of a use-case where this is necessary although I'm > sure there probably is one. So presumably fields ar

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-10 Thread robert engels
You keep stating that you never need to close the IndexWriter. I don't believe this is the case, and you are possibly misleading people as to the extent of your patch. Don't you need to close (or flush) to get the documents on disk, so a new IndexReader can find them? If not any documents

Re: Java 1.5 (was ommented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread Doug Cutting
Andi Vajda wrote: On Sat, 8 Jul 2006, Doug Cutting wrote: Since GCJ is effectively available on all platforms, we could say that we will start accepting 1.5 features when a GCJ release supports those features. Does that seem reasonable? +1 If we use this criteria, then we should probably of

Re: Global field semantics

2006-07-10 Thread Chris Hostetter
: > Are there good reasons this path has not been followed? : : Hoss, that's your cue. I must admit, I haven't been able to fully follow this thread, perhaps it's just because it's late (no, that can't be it ... i started reading it at 3:30 this afternoon and then stopped because it was making my

Re: Global field semantics

2006-07-10 Thread David Balmain
On 7/10/06, Doug Cutting <[EMAIL PROTECTED]> wrote: Chuck Williams wrote: > Lucene today allows many field properties to vary at the Field level. > E.g., the same field name might be tokenized in one Field on a Document > while it is untokenized in another Field on the same or different > Documen

Re: [jira] Commented: (LUCENE-623) RAMDirectory.close() should have a comment about not releasing any resources

2006-07-10 Thread Chris Hostetter
: I should mention that there is an upside to the patch it can : uncover bugs by detecting access after a close(). Before, this would : have worked with a RAMDirectory, but failed with a FSDirectory. yeah, seeing that test failure when i made the change locally is what sold me on committing it

Re: Global field semantics

2006-07-10 Thread Doug Cutting
Chuck Williams wrote: Lucene today allows many field properties to vary at the Field level. E.g., the same field name might be tokenized in one Field on a Document while it is untokenized in another Field on the same or different Document. The rationale for this design was to keep the API simp