[jira] Updated: (LUCENE-1679) Make WildcardTermEnum#difference() non-final

2009-06-10 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1679: Attachment: WildcardTermEnum_cleanup.patch WildcardTermEnum.patch > Make W

[jira] Created: (LUCENE-1679) Make WildcardTermEnum#difference() non-final

2009-06-10 Thread Simon Willnauer (JIRA)
Make WildcardTermEnum#difference() non-final Key: LUCENE-1679 URL: https://issues.apache.org/jira/browse/LUCENE-1679 Project: Lucene - Java Issue Type: Improvement Components: Search

[jira] Created: (LUCENE-1680) Make prefixLength accessible to PrefixTermEnum subclasses

2009-06-10 Thread Simon Willnauer (JIRA)
Make prefixLength accessible to PrefixTermEnum subclasses Key: LUCENE-1680 URL: https://issues.apache.org/jira/browse/LUCENE-1680 Project: Lucene - Java Issue Type: Improvement Af

[jira] Updated: (LUCENE-1680) Make prefixLength accessible to PrefixTermEnum subclasses

2009-06-10 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1680: Attachment: PrefixTermEnum.patch > Make prefixLength accessible to PrefixTermEnum subclass

[jira] Updated: (LUCENE-1680) Make prefixLength accessible to PrefixTermEnum subclasses

2009-06-10 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1680: Priority: Minor (was: Major) > Make prefixLength accessible to PrefixTermEnum subclasses

[jira] Updated: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1453: Attachment: LUCENE-1453.patch Hi Earwin, attached is a patch that simply reuses SegmentReader

[jira] Issue Comment Edited: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718004#action_12718004 ] Uwe Schindler edited comment on LUCENE-1453 at 6/10/09 2:40 AM:

[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718008#action_12718008 ] Michael McCandless commented on LUCENE-1453: bq. Mike will you do this, or sho

[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718009#action_12718009 ] Earwin Burrfoot commented on LUCENE-1453: bq. As the Filter is just a deprecated

[jira] Assigned: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-1453: Assignee: Uwe Schindler (was: Michael McCandless) > When reopen returns a new IndexRead

[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718013#action_12718013 ] Uwe Schindler commented on LUCENE-1453: Mike: OK, I commit the latest patch soon!

[jira] Closed: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler closed LUCENE-1453. Resolution: Fixed Committed revision 783280. 2.4 branch is untouched, if backporting is needed

[jira] Assigned: (LUCENE-1680) Make prefixLength accessible to PrefixTermEnum subclasses

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1680: Assignee: Michael McCandless > Make prefixLength accessible to PrefixTermEnum

[jira] Commented: (LUCENE-1680) Make prefixLength accessible to PrefixTermEnum subclasses

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718024#action_12718024 ] Michael McCandless commented on LUCENE-1680: Should we just add a getter for t

[jira] Reopened: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-1453: I'm seeing a failure in back-compat tests ("ant test-tag -Dtestcase=TestIndexReader")

[jira] Assigned: (LUCENE-1679) Make WildcardTermEnum#difference() non-final

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1679: Assignee: Michael McCandless > Make WildcardTermEnum#difference() non-final >

[jira] Updated: (LUCENE-1679) Make WildcardTermEnum#difference() non-final

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1679: Fix Version/s: 2.9 > Make WildcardTermEnum#difference() non-final >

[jira] Commented: (LUCENE-1679) Make WildcardTermEnum#difference() non-final

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718029#action_12718029 ] Michael McCandless commented on LUCENE-1679: I like the cleanup patch, but, I

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Michael McCandless
Use them how? (Sounds interesting...). Mike On Tue, Jun 9, 2009 at 10:32 PM, Jason Rutherglen wrote: > At the SF Lucene User's group, Michael Busch mentioned using > payloads with TrieRangeQueries. Is this something that's being > worked on? I'm interested in what sort of performance benefits > the

[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718031#action_12718031 ] Michael McCandless commented on LUCENE-1678: bq. So, given it is already broke

[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718032#action_12718032 ] Michael McCandless commented on LUCENE-1678: bq. The sane/smart way is to do i

[jira] Updated: (LUCENE-1680) Make prefixLength accessible to PrefixTermEnum subclasses

2009-06-10 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1680: Attachment: PrefixTermEnum_2nd.patch > Make prefixLength accessible to PrefixTermEnum subc

[jira] Updated: (LUCENE-1679) Make WildcardTermEnum#difference() non-final

2009-06-10 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1679: Attachment: WildcardTermEnum_cleanup_2nd.patch > Make WildcardTermEnum#difference() non-fi

[jira] Commented: (LUCENE-1680) Make prefixLength accessible to PrefixTermEnum subclasses

2009-06-10 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718038#action_12718038 ] Simon Willnauer commented on LUCENE-1680: You are right, adding a getter for pref

[jira] Commented: (LUCENE-1679) Make WildcardTermEnum#difference() non-final

2009-06-10 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718039#action_12718039 ] Simon Willnauer commented on LUCENE-1679: I created a new patch containing the #c

[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718041#action_12718041 ] Uwe Schindler commented on LUCENE-1453: Thanks Mike, it is from this fix. The test

[jira] Updated: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1453: Attachment: LUCENE-1453-fix-TestIndexReader.patch This fixes this special case and the test on

[jira] Issue Comment Edited: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718041#action_12718041 ] Uwe Schindler edited comment on LUCENE-1453 at 6/10/09 5:07 AM:

[jira] Closed: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler closed LUCENE-1453. Resolution: Fixed Committed revision 783314. Thanks Mike! Next time I will also test-tag, sorry.

[jira] Commented: (LUCENE-1679) Make WildcardTermEnum#difference() non-final

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718051#action_12718051 ] Michael McCandless commented on LUCENE-1679: bq. I created a new patch contain

[jira] Commented: (LUCENE-1680) Make prefixLength accessible to PrefixTermEnum subclasses

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718054#action_12718054 ] Michael McCandless commented on LUCENE-1680: bq. Also, I think we can't sudden

[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718055#action_12718055 ] Michael McCandless commented on LUCENE-1453: OK thanks Uwe! > When reopen ret

[jira] Updated: (LUCENE-1595) Split DocMaker into ContentSource and DocMaker

2009-06-10 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1595: Attachment: LUCENE-1595.patch Some updates: # Added to PerfTask a log.step config parameter, and imp

[jira] Commented: (LUCENE-1679) Make WildcardTermEnum#difference() non-final

2009-06-10 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718069#action_12718069 ] Simon Willnauer commented on LUCENE-1679: I see. I could not think of anything wh

[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718081#action_12718081 ] Michael McCandless commented on LUCENE-1678: bq. Adopting a fixed release cycl

[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718079#action_12718079 ] Michael McCandless commented on LUCENE-1678: bq. Mike was gung ho for it for

[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718080#action_12718080 ] Michael McCandless commented on LUCENE-1678: bq. The way Lucene stuff general

[jira] Created: (LUCENE-1681) DocValues infinite loop caused by - a call to getMinValue | getMaxValue | getAverageValue

2009-06-10 Thread Simon Willnauer (JIRA)
DocValues infinite loop caused by - a call to getMinValue | getMaxValue | getAverageValue Key: LUCENE-1681 URL: https://issues.apache.org/jira/browse/LUCENE-1681

[jira] Updated: (LUCENE-1681) DocValues infinite loop caused by - a call to getMinValue | getMaxValue | getAverageValue

2009-06-10 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1681: Attachment: DocValues.patch > DocValues infinite loop caused by - a call to getMinValue |

Re: [jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-10 Thread Mark Miller
Michael McCandless (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718080#action_12718080 ] Michael McCandless commented on LUCENE-1678:

[jira] Commented: (LUCENE-1595) Split DocMaker into ContentSource and DocMaker

2009-06-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718090#action_12718090 ] Mark Miller commented on LUCENE-1595: Someone else can nab this from me if they want

[jira] Resolved: (LUCENE-1680) Make prefixLength accessible to PrefixTermEnum subclasses

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1680. Resolution: Fixed Thanks Simon! > Make prefixLength accessible to PrefixTermEnum

[jira] Resolved: (LUCENE-1679) Make WildcardTermEnum#difference() non-final

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1679. Resolution: Fixed Thanks Simon! > Make WildcardTermEnum#difference() non-final >

Re: Some thoughts around the use of reader.isDeleted and hasDeletions

2009-06-10 Thread Shai Erera
> > it makes sense because isDeleted() is essentially the *only* thing > being done in the loop, and hence we can eliminate the loop entirely > You mean that in case there is a matching segment, we can call matchingVectorsReader.rawDocs(rawDocLengths, rawDocLengths2, 0, maxDoc)? But in case it doe

[jira] Updated: (LUCENE-1682) unit tests should use private directories

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1682: --- Attachment: LUCENE-1682.patch I plan to commit later today... > unit tests should u

[jira] Created: (LUCENE-1682) unit tests should use private directories

2009-06-10 Thread Michael McCandless (JIRA)
unit tests should use private directories Key: LUCENE-1682 URL: https://issues.apache.org/jira/browse/LUCENE-1682 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandle

Re: Some thoughts around the use of reader.isDeleted and hasDeletions

2009-06-10 Thread Michael McCandless
On Wed, Jun 10, 2009 at 11:16 AM, Shai Erera wrote: >> it makes sense because isDeleted() is essentially the *only* thing >> being done in the loop, and hence we can eliminate the loop entirely > > You mean that in case there is a matching segment, we can call > matchingVectorsReader.rawDocs(rawDoc

Re: Some thoughts around the use of reader.isDeleted and hasDeletions

2009-06-10 Thread Yonik Seeley
On Wed, Jun 10, 2009 at 11:16 AM, Shai Erera wrote: >> it makes sense because isDeleted() is essentially the *only* thing >> being done in the loop, and hence we can eliminate the loop entirely > > You mean that in case there is a matching segment, we can call > matchingVectorsReader.rawDocs(rawDo

[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-10 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718122#action_12718122 ] Shai Erera commented on LUCENE-1678: We've had this thread http://www.nabble.com/Luce

Re: Lucene's default settings & back compatibility

2009-06-10 Thread Mark Miller
No one really responded to this Shai? And I take it that the user list never saw it? Perhaps we should just ask for opinion from the user list based on what you already have - just to gauge the reaction on different points. Unless someone responds shortly, we could take a year waiting to shake

Re: [jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-10 Thread Mark Miller
Michael McCandless (JIRA) wrote: bq. Adopting a fixed release cycle with small intervals between releases (compared to what we have now). I think this is almost a good solution, though instead of "fixed" it could be that we try [harder] to do major

back compat is good

2009-06-10 Thread Yonik Seeley
I'm starting to feel like the lone holdout that thinks back compat for commonly used interfaces and index formats is important. So I'll sum up some of my thoughts and leave it at that: - I doubt that the number of new users for each release of Lucene exceeds the sum total of all existing users of

Re: [jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-10 Thread Michael McCandless
On Wed, Jun 10, 2009 at 12:45 PM, Mark Miller wrote: > I've heard that one before ;) In fact, we pretty much committed to releasing > more often. Now if 2.9 would just fall into line with our darn commitments > :) I hear you! So... how about we try to wrap up 2.9/3.0 and ship with what we have,

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Jason Rutherglen
I think instead of ORing postings (trie range, rangequery, etc), have a custom Query + Scorer that examines the payload (somehow)? It could encode the multiple levels of trie bits in it? (I'm just guessing here). On Wed, Jun 10, 2009 at 4:04 AM, Michael McCandless < [email protected]> wr

Re: back compat is good

2009-06-10 Thread Michael McCandless
Well... Lucene still seems to be experiencing strong adoption/growth, eg combined user+dev email traffic: http://lucene.markmail.org/ Net/net, I also think that back-compat is important and we shouldn't up and abandon it or relax our policy too much. However, I wish we had better tools for *im

Re: back compat is good

2009-06-10 Thread Mark Miller
Yonik Seeley wrote: I'm starting to feel like the lone holdout that thinks back compat for commonly used interfaces and index formats is important. I think the fact that you're not the only one is why things got stymied. I wouldn't personally support anything that didn't try and maintain stabili

Re: back compat is good

2009-06-10 Thread Yonik Seeley
On Wed, Jun 10, 2009 at 2:01 PM, Michael McCandless wrote: > Well... Lucene still seems to be experiencing strong adoption/growth, > eg combined user+dev email traffic: > http://lucene.markmail.org/ I think that includes all Lucene sub-projects (Solr, Tika, Mahout, Nutch, Droids, etc). http://lu

RE: Payloads and TrieRangeQuery

2009-06-10 Thread Uwe Schindler
Hi, sorry I missed the first mail. The idea we discussed in Amsterdam during ApacheCon was: Instead of indexing all trie precisions from e.g. the leftmost 8 bits down to all 64 bits, the TrieTokenStream only creates terms from e.g. precisions 8 to 56. The last precision is left out. Instead

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Michael McCandless
Ooh that sounds compelling! So you would not need to use payloads for the "inside" brackets, right? Only for the edges? I wonder how performance would compare. Without payloads, there are many more terms (for the tiny ranges) in the index, and your OR query will have lots of these tiny terms.

Re: back compat is good

2009-06-10 Thread Simon Willnauer
On Wed, Jun 10, 2009 at 7:00 PM, Yonik Seeley wrote: > I'm starting to feel like the lone holdout that thinks back compat for > commonly used interfaces and index formats is important.  So I'll sum > up some of my thoughts and leave it at that: > > - I doubt that the number of new users for each re

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Yonik Seeley
Yep, makes sense. It could be a little slower, but it would decrease the number of terms indexed by a factor of 256 (for 8 bits). But the payload part... seems like another case of using that because CSF isn't there yet, right? (well, perhaps except if you didn't want to store the field...) -Yon

Lucene / Solr Function API

2009-06-10 Thread Simon Willnauer
Hey there, I'm curious if anybody is working on the issue https://issues.apache.org/jira/browse/LUCENE-1085 and the blocker https://issues.apache.org/jira/browse/LUCENE-1085 ? I would love to see both solr and lucene using the same api for search functions. The issues have been idle for a while so

RE: Payloads and TrieRangeQuery

2009-06-10 Thread Uwe Schindler
> Ooh that sounds compelling! > > So you would not need to use payloads for the "inside" brackets, > right? Only for the edges? Exactly. > I wonder how performance would compare. Without payloads, there are > many more terms (for the tiny ranges) in the index, and your OR query > will have lot

Re: back compat is good

2009-06-10 Thread Michael McCandless
On Wed, Jun 10, 2009 at 2:23 PM, Yonik Seeley wrote: >> Well... Lucene still seems to be experiencing strong adoption/growth, >> eg combined user+dev email traffic: >> http://lucene.markmail.org/ > > I think that includes all Lucene sub-projects (Solr, Tika, Mahout, > Nutch, Droids, etc). > > http

Re: Lucene / Solr Function API

2009-06-10 Thread Michael McCandless
Well, it's unassigned and has no comments so my guess is: it's all yours! This would be a great step forward. The line between Solr & Lucene ought to be more "crisp" and this issue is a step towards that... Mike On Wed, Jun 10, 2009 at 2:59 PM, Simon Willnauer wrote: > Hey there, > > I'm curiou

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Yonik Seeley
On Wed, Jun 10, 2009 at 3:07 PM, Uwe Schindler wrote: > My problem with all this is how to optimize after which shift value to > switch between terms and payloads. Just make it a configurable number of bits at the end that are "stored" instead of indexed. People will want to select different tra

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Michael McCandless
On Wed, Jun 10, 2009 at 3:07 PM, Uwe Schindler wrote: >> I wonder how performance would compare.  Without payloads, there are >> many more terms (for the tiny ranges) in the index, and your OR query >> will have lots of these tiny terms.  But then these tiny terms don't >> hit many docs, and with

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Michael McCandless
On Wed, Jun 10, 2009 at 3:19 PM, Yonik Seeley wrote: >> And this information about the trie >> structure and where payloads are should be stored in FieldInfos. > > As is the case today, the info is encoded in the class you use (and > its settings)... no need to add it to the index structure.  In

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Earwin Burrfoot
>>> And this information about the trie >>> structure and where payloads are should be stored in FieldInfos. >> >> As is the case today, the info is encoded in the class you use (and >> its settings)... no need to add it to the index structure.  In any >> case, it's a completely different issue an

[jira] Commented: (LUCENE-1609) Eliminate synchronization contention on initial index reading in TermInfosReader ensureIndexIsRead

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718175#action_12718175 ] Michael McCandless commented on LUCENE-1609: Alas... the big problem with doin

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Yonik Seeley
On Wed, Jun 10, 2009 at 3:43 PM, Michael McCandless wrote: > On Wed, Jun 10, 2009 at 3:19 PM, Yonik Seeley > wrote: > >>> And this information about the trie >>> structure and where payloads are should be stored in FieldInfos. >> >> As is the case today, the info is encoded in the class you use (

[jira] Commented: (LUCENE-1448) add getFinalOffset() to TokenStream

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718176#action_12718176 ] Michael McCandless commented on LUCENE-1448: Michael are you going to get to t

[jira] Updated: (LUCENE-1584) Callback for intercepting merging segments in IndexWriter

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1584: Fix Version/s: (was: 2.9) Moving out. > Callback for intercepting merging segme

Re: back compat is good

2009-06-10 Thread Mark Miller
As far as default settings, it seems like it can be mostly fixed with documentation (i.e. recommended settings for maximum performance). That seems like a very small burden for people writing new applications with Lucene anyway (compare to the cost of writing the whole application). On the othe

[jira] Updated: (LUCENE-1577) Benchmark of different in RAM realtime techniques

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1577: Fix Version/s: (was: 2.9) Moving out. > Benchmark of different in RAM realtime

Re: Lucene's default settings & back compatibility

2009-06-10 Thread Shai Erera
Well .. to be honest I haven't monitored java-user for quite some time, so I don't know if it hasn't been raised there. But now there's the other thread that Yonik started, so I'm not really sure where to answer. I think that if we look back at 2.0 and compare to 2.9, anyone upgrading from that v

Re: Lucene memory usage

2009-06-10 Thread Jason Rutherglen
Great! If I understand correctly it looks like RAM savings? Will there be an improvement in lookup speed? (We're using binary search here?). Is there a precedent in database systems for what was mentioned about placing the term dict, delDocs, and filters onto disk and reading them from there (wit

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718181#action_12718181 ] Michael McCandless commented on LUCENE-1607: Yonik is this ready to go in...?

[jira] Resolved: (LUCENE-1682) unit tests should use private directories

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1682. Resolution: Fixed > unit tests should use private directories

[jira] Updated: (LUCENE-1671) FSDirectory internally caches and clones FSIndexInput

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1671: Fix Version/s: (was: 2.9) Moving out. > FSDirectory internally caches and clone

[jira] Commented: (LUCENE-1584) Callback for intercepting merging segments in IndexWriter

2009-06-10 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718185#action_12718185 ] Jason Rutherglen commented on LUCENE-1584: Can we put this one in 2.9? It seems

Re: Lucene's default settings & back compatibility

2009-06-10 Thread Mark Miller
Right - I'd actually hold off now. I figured the threat of sending might prompt some action ;) It still wouldn't hurt to know what the users think, perhaps at a more digestible, overview level though. I do think Yonik torpedoed something this liberal :) That's not a bad thing though. We will fi

Re: back compat is good

2009-06-10 Thread Yonik Seeley
On Wed, Jun 10, 2009 at 4:11 PM, Mark Miller wrote: > The computer should handle that > for me. It really should be as easy > as saying, look I want the best new defaults, or I want the back compat > defaults. The computer should figure > out the rest for me. actsAsVersion ;-) nice and back compa

Re: Lucene memory usage

2009-06-10 Thread Michael McCandless
On Wed, Jun 10, 2009 at 4:13 PM, Jason Rutherglen wrote: > Great! If I understand correctly it looks like RAM savings? Will > there be an improvement in lookup speed? (We're using binary > search here?). Yes, sizable RAM reduction for apps that have many unique terms. And, init'ing (warming) the

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-10 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718188#action_12718188 ] Yonik Seeley commented on LUCENE-1607: -- I think so... but I was waiting for some kind

Re: back compat is good

2009-06-10 Thread Grant Ingersoll
I'm not against back compatibility. In fact, I agree with your points, especially the use of the phrase "commonly used interfaces". My main problem is our approach seems to be very dogmatic and detrimental for _less_ commonly used interfaces (more importantly less commonly _implemented_ In

[jira] Updated: (LUCENE-1607) String.intern() faster alternative

2009-06-10 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-1607: - Attachment: LUCENE-1607.patch latest patch - could use a multi-threaded testcase to ensure no ex

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-10 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718198#action_12718198 ] Earwin Burrfoot commented on LUCENE-1607: - bq. but I was waiting for some kind of

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Michael McCandless
On Wed, Jun 10, 2009 at 4:04 PM, Earwin Burrfoot wrote: > And then, when you merge segments indexed with different Trie* > settings, you need to convert them to some common form. > Sounds like something too complex and with minimum returns. Oh yeah... tricky. So... there are various situations t

RE: Payloads and TrieRangeQuery

2009-06-10 Thread Uwe Schindler
> On Wed, Jun 10, 2009 at 3:43 PM, Michael McCandless > wrote: > > On Wed, Jun 10, 2009 at 3:19 PM, Yonik > Seeley wrote: > > > >>> And this information about the trie > >>> structure and where payloads are should be stored in FieldInfos. > >> > >> As is the case today, the info is encoded in the

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Michael McCandless
On Wed, Jun 10, 2009 at 5:07 PM, Uwe Schindler wrote: > I would really like to leave this optimization out for 2.9. We can still add > this after 2.9 as an optimization. The number of bits encoded into the > TermPosition (this is really a cool idea, thanks Yonik, I was missing > exactly that, becau

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Yonik Seeley
On Wed, Jun 10, 2009 at 5:03 PM, Michael McCandless wrote: > On Wed, Jun 10, 2009 at 4:04 PM, Earwin Burrfoot wrote: >  * Was the field even indexed w/ Trie, or indexed as "simple text"? Why the special treatment for Trie? >    It's useful to know this "automatically" at search time, so eg a >  

RE: Payloads and TrieRangeQuery

2009-06-10 Thread Uwe Schindler
> I would like to go forward with moving the classes into the right packages > and optimize how queries and analyzers are created (only one > class > for each). The idea from LUCENE-1673 to use static factories to create > these > classes for the different data types seems to be more elega

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Earwin Burrfoot
>  * Was the field even indexed w/ Trie, or indexed as "simple text"? >    It's useful to know this "automatically" at search time, so eg a >    RangeQuery can do the right thing by default.  FieldInfos seems >    like the natural place to store this.  It's basically Lucene's >    per-segment write

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Michael McCandless
I think we'd need richer communication between MTQ and its subclasses, so that eg your enum would return a Query instead of a Term? Then you'd either return a TermQuery, or, a BooleanQuery that's filtering the TermQuery? But yes, doing after 3.0 seems good! Mike On Wed, Jun 10, 2009 at 5:26 PM,

[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718209#action_12718209 ] Michael McCandless commented on LUCENE-1673: bq. NumericRangeQuery.newFloatRan

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Michael McCandless
On Wed, Jun 10, 2009 at 5:24 PM, Yonik Seeley wrote: > On Wed, Jun 10, 2009 at 5:03 PM, Michael McCandless > wrote: >> On Wed, Jun 10, 2009 at 4:04 PM, Earwin Burrfoot wrote: >> * Was the field even indexed w/ Trie, or indexed as "simple text"? > > Why the special treatment for Trie? So that at

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Yonik Seeley
> Another question not so simple to answer: When embedding these TermPositions > into the whole process, how would this work with MultiTermQuery? There's no reason why Trie has to use MultiTermQuery, right? -Yonik http://www.lucidimagination.com --

RE: Payloads and TrieRangeQuery

2009-06-10 Thread Uwe Schindler
> I think we'd need richer communication between MTQ and its subclasses, > so that eg your enum would return a Query instead of a Term? > > Then you'd either return a TermQuery, or, a BooleanQuery that's > filtering the TermQuery? > > But yes, doing after 3.0 seems good! There is one other thing

RE: Payloads and TrieRangeQuery

2009-06-10 Thread Uwe Schindler
> > Another question not so simple to answer: When embedding these > TermPositions > > into the whole process, how would this work with MultiTermQuery? > > There's no reason why Trie has to use MultiTermQuery, right? No, but it is elegant and simplifies much (see current code in trunk). Uwe --
