Bug in FieldInfo (omitTF)?

2008-11-18 Thread Shai Erera
Hi I looked at FieldInfo and found this line (95): if (this.omitTf != omitTf) { this.omitTf = true;// if one require omitTf at least once, it remains off for life } Shouldn't it be: if (this.omitTf != other.omitTf) { this.omitTf = true;//

Re: Bug in FieldInfo (omitTF)?

2008-11-18 Thread Adriano Crestani
I'm almost sure this was not the expected logic. Otherwise the this.omitTf = true statement will never be executed. Based on code logic, it should probably be what you are saying: this.omitTf != other.omitTf instead of this.omitTf != omitTf :) Regards, Adriano Crestani Campos On Tue, Nov 18,
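
To make the point in this thread concrete, here is a minimal sketch of why the quoted comparison can never be true. The class name `FieldInfoSketch` and the two method names are assumptions made for illustration and are not the Lucene source; only the `omitTf` field and the two comparisons come from the messages above.

```java
// Hypothetical stand-in for Lucene's FieldInfo. Only the omitTf field and the
// two comparisons are taken from the thread; everything else is illustrative.
final class FieldInfoSketch {
    boolean omitTf;

    FieldInfoSketch(boolean omitTf) {
        this.omitTf = omitTf;
    }

    // Quoted form: no parameter or local is named omitTf here, so the bare
    // name resolves to this.omitTf and the field is compared with itself.
    void updateBuggy(FieldInfoSketch other) {
        if (this.omitTf != omitTf) {
            this.omitTf = true; // never reached
        }
    }

    // Proposed form: compare against the other instance's flag.
    void updateFixed(FieldInfoSketch other) {
        if (this.omitTf != other.omitTf) {
            this.omitTf = true; // once either side omits TF, the flag stays set
        }
    }
}
```

Calling `updateBuggy` with an instance whose flag differs leaves the receiver unchanged, while `updateFixed` sets the flag.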

Re: Bug in FieldInfo (omitTF)?

2008-11-18 Thread Michael Busch
see http://issues.apache.org/jira/browse/LUCENE-1456 Shai Erera wrote: Hi I looked at FieldInfo and found this line (95): if (this.omitTf != omitTf) { this.omitTf = true;// if one require omitTf at least once, it remains off for life } Shouldn't it be: if

[jira] Assigned: (LUCENE-1456) FieldInfo omitTerms bug

2008-11-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1456: -- Assignee: Michael McCandless FieldInfo omitTerms bug

Re: Bug in FieldInfo (omitTF)?

2008-11-18 Thread Michael McCandless
It is a bug, but it's in dead code that's never called. I'll remove the code. Mike Michael Busch wrote: see http://issues.apache.org/jira/browse/LUCENE-1456 Shai Erera wrote: Hi I looked at FieldInfo and found this line (95): if (this.omitTf != omitTf) { this.omitTf = true;

[jira] Resolved: (LUCENE-1456) FieldInfo omitTerms bug

2008-11-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1456. Resolution: Fixed Fix Version/s: 2.9 Committed revision 718537. Thanks

[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2008-11-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648552#action_12648552 ] Michael McCandless commented on LUCENE-1453: OK thanks for the review guys!

[jira] Updated: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1458: --- Attachment: LUCENE-1458.patch Further steps towards flexible indexing

[jira] Created: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Michael McCandless (JIRA)
Further steps towards flexible indexing --- Key: LUCENE-1458 URL: https://issues.apache.org/jira/browse/LUCENE-1458 Project: Lucene - Java Issue Type: New Feature Components: Index Affects

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648613#action_12648613 ] Mark Miller commented on LUCENE-1458: - Hmmm...I think something is missing -

[jira] Updated: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1458: --- Attachment: LUCENE-1458.patch Woops, sorry... I was missing a bunch of files. Try

Re: Build failed in Hudson: Lucene-trunk #644

2008-11-18 Thread Chris Hostetter
: Sorry that the build was broken for three days. I don't have a hudson account. absolutely, 100%, not your fault ... i wasn't giving you a hard time at all, i was just trying to goad the other PMC members to take a more active role in maintaining our hudson setup. : So I can't get one unless

Re: [jira] Updated: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Jason Rutherglen
Michael, Can you describe a bit more about why the term dictionary index is no longer required? Jason On Tue, Nov 18, 2008 at 7:41 AM, Michael McCandless (JIRA) [EMAIL PROTECTED] wrote: [

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648727#action_12648727 ] Marvin Humphrey commented on LUCENE-1458: - The work on streamlining the term

Re: [jira] Updated: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Michael McCandless
It's not that it isn't required -- it's just that it stores less info than before. I changed the _X.tis format such that at each seekable point (every 128 terms by default), everything is written as absolutes (term text, freq prox offset). This means the _X.tii file only has to store the
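
The idea of writing absolutes at each seekable point can be sketched as follows. This is not Lucene's _X.tis format: the class `TermBlockSketch`, its `Entry` type, and the shared-prefix encoding are assumptions chosen for illustration, and the sketch covers term text only (not the freq and prox offsets mentioned above). It shows that when every interval-th entry is written in full, a reader can start decoding at that entry without any earlier state.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only; the names and the encoding are hypothetical.
final class TermBlockSketch {
    static final class Entry {
        final int sharedPrefix; // characters shared with the previous term
        final String suffix;    // remaining characters of this term

        Entry(int sharedPrefix, String suffix) {
            this.sharedPrefix = sharedPrefix;
            this.suffix = suffix;
        }
    }

    // Prefix-compress sorted terms, but write every interval-th term in full.
    static List<Entry> encode(List<String> sortedTerms, int interval) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (int i = 0; i < sortedTerms.size(); i++) {
            String term = sortedTerms.get(i);
            int shared = 0;
            if (i % interval != 0) { // not a seekable point: delta against prev
                int max = Math.min(prev.length(), term.length());
                while (shared < max && prev.charAt(shared) == term.charAt(shared)) {
                    shared++;
                }
            }
            out.add(new Entry(shared, term.substring(shared)));
            prev = term;
        }
        return out;
    }

    // Decode up to count terms beginning at an entry that was written in full.
    static List<String> decodeFrom(List<Entry> entries, int start, int count) {
        if (start < entries.size() && entries.get(start).sharedPrefix != 0) {
            throw new IllegalArgumentException("start is not a seekable point");
        }
        List<String> terms = new ArrayList<>();
        String prev = "";
        int end = Math.min(entries.size(), start + count);
        for (int i = start; i < end; i++) {
            Entry e = entries.get(i);
            String term = prev.substring(0, e.sharedPrefix) + e.suffix;
            terms.add(term);
            prev = term;
        }
        return terms;
    }
}
```

With an interval of 2, decoding can begin at entries 0, 2, 4 and so on; beginning at a delta-coded entry is rejected because its prefix depends on the previous term.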

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648739#action_12648739 ] Michael McCandless commented on LUCENE-1458: bq. Can we design a format that

Re: Luke bugs (Re: [jira] Commented: (LUCENE-1454) Corrupted index produced by lucene 2.4)

2008-11-18 Thread Michael McCandless
OK will do :) Nice that you're paying attention! Mike Andrzej Bialecki wrote: Michael McCandless (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647960#action_12647960 ] Michael

Re: Proposal for introducing CharFilter

2008-11-18 Thread Chris Hostetter
: If a given Tokenizer does not need to do any character normalization (I : would think most wouldn't) is there any added cost during tokenization with : this change? : : Thank you for your reply, Mike! : There is no added cost if Tokenizer doesn't need to call correctOffset(). But every

Re: Allow IndexReader to take ownership of Directory

2008-11-18 Thread Michael McCandless
I think this makes sense. But: I think we'd need to add incRef/decRef to Directory? And fix the newly added logic in DirectoryIndexReader that now clones the dir during reopen (because it's hardwired to only work with FSDir). Mike Mark Miller wrote: Does anyone object to making
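
For readers unfamiliar with the incRef/decRef pattern mentioned here, a minimal sketch follows. `RefCountedDirectory` is a hypothetical class written for illustration, not Lucene's `Directory`; it only shows the manual bookkeeping the message proposes, under the assumption that the creator holds the first reference.

```java
// Hypothetical sketch of manual reference counting; not a Lucene class.
final class RefCountedDirectory {
    private int refCount = 1; // assumption: the creator holds the first reference
    private boolean closed;

    // A new owner (for example a reopened reader) registers its interest.
    synchronized void incRef() {
        if (closed) {
            throw new IllegalStateException("directory already closed");
        }
        refCount++;
    }

    // An owner releases its interest; the last release closes the directory.
    synchronized void decRef() {
        if (closed) {
            throw new IllegalStateException("directory already closed");
        }
        refCount--;
        if (refCount == 0) {
            closed = true; // a real implementation would release resources here
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }
}
```

Every `incRef` must be paired with exactly one `decRef`; a missed or doubled call either leaks the directory or closes it while still in use.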

Re: Allow IndexReader to take ownership of Directory

2008-11-18 Thread robert engels
Why not create new lightweight references to the directory, using WeakReferences and ReferenceQueues, and avoid the need to manually use incRef and decRef? Tracking state like this almost always leads to problems - this is why Java has GC in the first place - because it is very
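
One possible reading of this suggestion is sketched below. `WeaklyTrackedDirectory` and its `Handle` type are hypothetical names, and the design (each owner holds a lightweight handle, and the directory polls a `ReferenceQueue` to learn which handles were collected) is an assumption about what the message has in mind, not code from the thread.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: owners hold Handle objects instead of calling
// incRef/decRef, and garbage collection signals when a handle is gone.
final class WeaklyTrackedDirectory {
    static final class Handle { }

    private final ReferenceQueue<Handle> queue = new ReferenceQueue<>();
    private final Set<Reference<Handle>> live = new HashSet<>();
    private boolean closed;

    // Hand out a lightweight handle and track it through a weak reference.
    synchronized Handle acquire() {
        if (closed) {
            throw new IllegalStateException("directory already closed");
        }
        Handle handle = new Handle();
        live.add(new WeakReference<>(handle, queue));
        return handle;
    }

    // Drop references whose handles were collected; close when none remain.
    synchronized boolean reap() {
        Reference<? extends Handle> collected;
        while ((collected = queue.poll()) != null) {
            live.remove(collected);
        }
        if (live.isEmpty()) {
            closed = true; // a real implementation would release resources here
        }
        return closed;
    }
}
```

The trade-off is timing: the directory is closed only after the collector has enqueued the last reference and `reap` has run, so release is no longer deterministic.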

[jira] Commented: (LUCENE-1342) 64bit JVM crashes on Linux

2008-11-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648748#action_12648748 ] Michael McCandless commented on LUCENE-1342: Just to confirm, it was at least

[jira] Created: (LUCENE-1459) CachingWrapperFilter crashes if you call both bits() and getDocIdSet()

2008-11-18 Thread Matt Jones (JIRA)
CachingWrapperFilter crashes if you call both bits() and getDocIdSet() -- Key: LUCENE-1459 URL: https://issues.apache.org/jira/browse/LUCENE-1459 Project: Lucene - Java

[jira] Updated: (LUCENE-1459) CachingWrapperFilter crashes if you call both bits() and getDocIdSet()

2008-11-18 Thread Matt Jones (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Jones updated LUCENE-1459: --- Attachment: caching-wrapper-filter.diff Patch against 2.4.0 to be more careful about returning from

[jira] Commented: (LUCENE-1342) 64bit JVM crashes on Linux

2008-11-18 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648768#action_12648768 ] Paul Smith commented on LUCENE-1342: java version 1.6.0_10 Java(TM) SE Runtime

[jira] Updated: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1458: --- Attachment: LUCENE-1458.patch [Attached patch] To test whether the new pluggable

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648781#action_12648781 ] Michael Busch commented on LUCENE-1458: --- I'll look into this patch soon. Just

Re: [jira] Updated: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Jason Rutherglen
Nice! I'm looking at using PForDelta in creating the tag index type of system. Do you think there is an elegant way to add realtime updates to individual fields using the current (or future) flexible indexing API? On Tue, Nov 18, 2008 at 2:11 PM, Michael McCandless (JIRA) [EMAIL

Re: [jira] Updated: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Jason Rutherglen
On a side note, and I have not looked at the flexible indexing API enough to know if there is some equivalent, but are we moving to something like MG4J's MutableString http://mg4j.dsi.unimi.it/docs/it/unimi/dsi/mg4j/util/MutableString.html instead of java.lang.String objects? On Tue, Nov 18, 2008

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Jason Rutherglen
it'd be nice to genericize MultiLevelSkipListWriter so that it could index arbitrary files +1 on this idea. Using skip lists for the term index would be an improvement. On Tue, Nov 18, 2008 at 12:27 PM, Michael McCandless (JIRA) [EMAIL PROTECTED] wrote: [

[jira] Resolved: (LUCENE-1422) New TokenStream API

2008-11-18 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch resolved LUCENE-1422. --- Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch Available,

[jira] Created: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API

2008-11-18 Thread Michael Busch (JIRA)
Change all contrib TokenStreams/Filters to use the new TokenStream API -- Key: LUCENE-1460 URL: https://issues.apache.org/jira/browse/LUCENE-1460 Project: Lucene - Java

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648835#action_12648835 ] Marvin Humphrey commented on LUCENE-1458: - I'm not sure I'd trust the OS's IO

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-18 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648839#action_12648839 ] Michael Busch commented on LUCENE-1458: --- {quote} We could also explore something

[jira] Created: (LUCENE-1461) Cached filter for a single term field

2008-11-18 Thread Tim Sturge (JIRA)
Cached filter for a single term field - Key: LUCENE-1461 URL: https://issues.apache.org/jira/browse/LUCENE-1461 Project: Lucene - Java Issue Type: New Feature Reporter: Tim Sturge These

[jira] Updated: (LUCENE-1461) Cached filter for a single term field

2008-11-18 Thread Tim Sturge (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Sturge updated LUCENE-1461: --- Attachment: DisjointMultiFilter.java Base code which builds the integer array. Cached filter for

[jira] Updated: (LUCENE-1461) Cached filter for a single term field

2008-11-18 Thread Tim Sturge (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Sturge updated LUCENE-1461: --- Attachment: RangeMultiFilter.java Constructs a virtual RangeFilter on top of an already existing

[jira] Commented: (LUCENE-1461) Cached filter for a single term field

2008-11-18 Thread Tim Sturge (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648869#action_12648869 ] Tim Sturge commented on LUCENE-1461: Here's some benchmark data to demonstrate the

Re: Proposal for introducing CharFilter

2008-11-18 Thread Koji Sekiguchi
Chris Hostetter wrote: : If a given Tokenizer does not need to do any character normalization (I : would think most wouldn't) is there any added cost during tokenization with : this change? : : Thank you for your reply, Mike! : There is no added cost if Tokenizer doesn't need to call