[jira] Commented: (LUCENE-1672) Deprecate all String/File ctors/opens in IndexReader/IndexWriter/IndexSearcher

2009-06-01 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715093#action_12715093 ] Earwin Burrfoot commented on LUCENE-1672: - Yahoo! I was going to create the same

[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-01 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715144#action_12715144 ] Earwin Burrfoot commented on LUCENE-1673: - Sudden thought. Leave it in contribs

Re: Lucene's default settings back compatibility

2009-05-30 Thread Earwin Burrfoot
As far as I understand the policy-making process, someone from PMC has to start the vote, and then PMC members should, well, vote. Without them taking action we can beep to our hearts' content without any consequences. On Sat, May 30, 2009 at 07:22, Shai Erera ser...@gmail.com wrote: So ... I've

[jira] Created: (LUCENE-1668) Trunk fails tests, FSD.open() - related

2009-05-30 Thread Earwin Burrfoot (JIRA)
Reporter: Earwin Burrfoot [junit] Testcase: testReadAfterClose(org.apache.lucene.index.TestCompoundFile): FAILED [junit] expected readByte() to throw exception [junit] junit.framework.AssertionFailedError: expected readByte() to throw exception [junit

[jira] Commented: (LUCENE-1656) When sorting by field, IndexSearcher should not compute scores by default

2009-05-27 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12713613#action_12713613 ] Earwin Burrfoot commented on LUCENE-1656: - bq. I'm actually wondering if we should

[jira] Commented: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-05-26 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712930#action_12712930 ] Earwin Burrfoot commented on LUCENE-1658: - Yay for base class + three concrete

[jira] Commented: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-05-26 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712997#action_12712997 ] Earwin Burrfoot commented on LUCENE-1658: - bq. Wait, are you saying Win 64 bit has

[jira] Commented: (LUCENE-1654) Include diagnostics per-segment when writing a new segment

2009-05-26 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12713086#action_12713086 ] Earwin Burrfoot commented on LUCENE-1654: - bq. It's the reading side of that API

[jira] Commented: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-05-25 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712696#action_12712696 ] Earwin Burrfoot commented on LUCENE-1658: - bq. Excellent point - I think

[jira] Commented: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-05-24 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712615#action_12712615 ] Earwin Burrfoot commented on LUCENE-1658: - bq. enum java 1.5, unless you're going

[jira] Commented: (LUCENE-1654) Include diagnostics per-segment when writing a new segment

2009-05-24 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712616#action_12712616 ] Earwin Burrfoot commented on LUCENE-1654: - Let's use Collections.EMPTY_MAP instead

[jira] Commented: (LUCENE-1654) Include diagnostics per-segment when writing a new segment

2009-05-23 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712413#action_12712413 ] Earwin Burrfoot commented on LUCENE-1654: - We can start with string key-value

Re: Lucene's default settings back compatibility

2009-05-22 Thread Earwin Burrfoot
A funny thought: we can give those methods/classes really stupid/nasty names, to emphasize the beauty of the existing API, to encourage people to stick with the better API :) I believe I've seen Google internally using names like thisisbadbadbadInstanceMap. :) One thing we didn't address

Re: Lucene's default settings back compatibility

2009-05-22 Thread Earwin Burrfoot
1. If we deprecate an API in the 2.1 release, we can remove it in the next minor release (2.2). Agree. Maybe also this? 1a. If deprecated functionality is trivially implemented with the new one, we reserve the right to delete deprecated things right away with an appropriate CHANGES note. Sample I:

Re: Lucene's default settings back compatibility

2009-05-22 Thread Earwin Burrfoot
In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter have to be passed a Schema, which contains all the Analyzers.  Analyzers aren't satellite classes under this model -- they are a fixed property of a FullTextType field spec.  Think of them as baked into an SQL field

Re: Lucene's default settings back compatibility

2009-05-22 Thread Earwin Burrfoot
Custom analyzers. No problem. How are they recorded in the index? Several indexes using the same analyzer. No problem.  Only necessary if the analyzer is costly or has some esoteric need for shared state.  And possible via subclassing Schema or Analyzer. It is. Intentionally different

[jira] Commented: (LUCENE-1654) Include diagnostics per-segment when writing a new segment

2009-05-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712269#action_12712269 ] Earwin Burrfoot commented on LUCENE-1654: - Let's have string key-value pairs per

[jira] Issue Comment Edited: (LUCENE-1654) Include diagnostics per-segment when writing a new segment

2009-05-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712269#action_12712269 ] Earwin Burrfoot edited comment on LUCENE-1654 at 5/22/09 2:26 PM

[jira] Commented: (LUCENE-1648) when you clone or reopen an IndexReader with pending changes, the new reader doesn't commit the changes

2009-05-21 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711571#action_12711571 ] Earwin Burrfoot commented on LUCENE-1648: - bq. Try the patch? Yup, it fixed

[jira] Commented: (LUCENE-1648) when you clone or reopen an IndexReader with pending changes, the new reader doesn't commit the changes

2009-05-21 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711575#action_12711575 ] Earwin Burrfoot commented on LUCENE-1648: - Or to be more exact, it fixed the tests

[jira] Updated: (LUCENE-1648) when you clone or reopen an IndexReader with pending changes, the new reader doesn't commit the changes

2009-05-21 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1648: Attachment: LUCENE-1648-followup.patch bq. Bad news is something is wrong w/ your patch

[jira] Updated: (LUCENE-1648) when you clone or reopen an IndexReader with pending changes, the new reader doesn't commit the changes

2009-05-21 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1648: Attachment: LUCENE-1648-followup.patch And here's the fix. The problem - it's not elegant

SegmentReader instantiation

2009-05-21 Thread Earwin Burrfoot
Right now a set of system properties and Class.newInstance() is used to create SegmentReader. I've tracked down this code's origins to: r150531 | cutting | 2004-09-22 22:32:27 +0400 (Wed, 22 Sep 2004) | 2 lines Add GCJ native code for SegmentTermDocs.read(int[],int[]) to accellerate TermScorer.

Re: SegmentReader instantiation

2009-05-21 Thread Earwin Burrfoot
2009/5/21 Michael McCandless luc...@mikemccandless.com: It looks like this was done in order to implement SegmentTermDocs.read(int[], int[]) natively, when using a gcj environment, since that gave performance improvements? Yup, you're right. But something tells me, since Lucene 1.9 many things

Re: Lucene's default settings back compatibility

2009-05-21 Thread Earwin Burrfoot
That bug has led to 'base' having a compromised reputation among elite users because of intermittent, inexplicable flakiness.  Is that what you want for Lucene? While I agree with that point, Lucene already has lots and lots of static configuration. Having actsAsVersion won't add any new woes.

[jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean

2009-05-21 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711773#action_12711773 ] Earwin Burrfoot commented on LUCENE-1614: - bq. Oh, it turns out OBSI.nextDoc

Re: Lucene's default settings back compatibility

2009-05-21 Thread Earwin Burrfoot
Sounds like a good proposition. There's one problem I'd like to address. Good names for classes/members matter, and matter much. They directly affect how fast a newcomer is able to understand a particular API, and how comfortably you work with it once you do understand it. When we're

Re: Lucene's default settings back compatibility

2009-05-21 Thread Earwin Burrfoot
Why not store an actsAs in the index, just for the changes that affect what's in the index?  Ie the index records the version that created it, and by default TokenStreams emulate their behavior as of that version? Because you don't always have access to index at the time you create your

[jira] Commented: (LUCENE-1645) Deleted documents are visible across reopened MSRs

2009-05-20 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711139#action_12711139 ] Earwin Burrfoot commented on LUCENE-1645: - bq. Right; I think we should simply

Re: Lucene's default settings back compatibility

2009-05-20 Thread Earwin Burrfoot
Exactly what happens when you call BooleanQuery.setMaxClauseCount(n) from two libraries. Last one wins. On Wed, May 20, 2009 at 17:50, Marvin Humphrey mar...@rectangular.com wrote: But since 3.0 is a major release anyway, we could change the default of actsAsVersion with each 3.x release (or

[jira] Updated: (LUCENE-1645) Deleted documents are visible across reopened MSRs

2009-05-20 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1645: Attachment: LUCENE-1645.patch Here's the fix. Plus slightly modified test that fails

Re: Lucene's default settings back compatibility

2009-05-20 Thread Earwin Burrfoot
In fact, there's no reason to upgrade Lucene (save for bugfixes) if you absolutely require a drop-in jar and don't want to touch any of your code. See, you upgrade either for new features or for performance improvements. You have to write code for the former, and you have to write code for the

Re: Lucene's default settings back compatibility

2009-05-20 Thread Earwin Burrfoot
Mark Miller: If you have upgraded Lucene over the years and you never touched code to tweak performance, you still got fantastic performance improvements. You just didn't get them all. If you never touched the code over the years, your project is probably already dead. Shai Erera: Exactly !

Re: Lucene's default settings back compatibility

2009-05-20 Thread Earwin Burrfoot
That said, I see the points and value of relaxing the back compat policy as well. It's been discussed a lot in the past, and it has been eased in the past. Afraid to ask which additional shackles Lucene bore in the past. I mean, 'what' has to be eased to produce policies we have right now?

[jira] Commented: (LUCENE-1648) when you clone or reopen an IndexReader with pending changes, the new reader doesn't commit the changes

2009-05-20 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711378#action_12711378 ] Earwin Burrfoot commented on LUCENE-1648: - I wonder if the only two tests I still

Re: Lucene's default settings back compatibility

2009-05-19 Thread Earwin Burrfoot
On Tue, May 19, 2009 at 16:56, Grant Ingersoll gsing...@apache.org wrote: There's a difference between std. coding practices and purposefully putting in lots of if checks to solve back compatibility issues that are created in order to satisfy some naming convention. Given the length of time

[jira] Created: (LUCENE-1645) Deleted documents are visible across reopened MSRs

2009-05-19 Thread Earwin Burrfoot (JIRA)
Reporter: Earwin Burrfoot

Re: Re(opening) (Multi)SegmentReaders

2009-05-19 Thread Earwin Burrfoot
, Michael McCandless luc...@mikemccandless.com wrote: On Mon, May 18, 2009 at 7:56 AM, Earwin Burrfoot ear...@gmail.com wrote: Will post one soon. Had to slightly modify a whole bunch of tests relying on IndexReader.open(dir) returning SegmentReader instance for single-segment indexes. Currently

[jira] Updated: (LUCENE-1645) Deleted documents are visible across reopened MSRs

2009-05-19 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1645: Attachment: LUCENE-1645.patch If you reopen() MSR with unchanged segments, the resulting

[jira] Commented: (LUCENE-1645) Deleted documents are visible across reopened MSRs

2009-05-19 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12710894#action_12710894 ] Earwin Burrfoot commented on LUCENE-1645: - Either that. Or having boolean

[jira] Commented: (LUCENE-1645) Deleted documents are visible across reopened MSRs

2009-05-19 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711006#action_12711006 ] Earwin Burrfoot commented on LUCENE-1645: - Lazy clone() is a bad idea, since

Re: Re(opening) (Multi)SegmentReaders

2009-05-18 Thread Earwin Burrfoot
that doesn't hamper backwards compatibility? 2009/5/17 Michael McCandless luc...@mikemccandless.com: I tentatively think that's a good idea.  The reopen logic is quite hairy... Wanna make a separate patch for that? Mike On Sun, May 17, 2009 at 8:37 AM, Earwin Burrfoot ear...@gmail.com wrote

Re(opening) (Multi)SegmentReaders

2009-05-17 Thread Earwin Burrfoot
While experimenting with indexReader 'components', I've got this thought: What if we always create MultiSegmentReader when (re)opening an index, even if index contains a single segment? Using unwrapped SegmentReader for single-segment case was a valid optimization for the times when Lucene did

Random test failure

2009-05-16 Thread Earwin Burrfoot
Running latest lucene trunk with some patches applied, but they do not touch IndexWriter and friends anywhere. Happened once, I failed to reproduce it, with and without patches. Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153) Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57,

[jira] Issue Comment Edited: (LUCENE-1387) Add LocalLucene

2009-05-10 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707821#action_12707821 ] Earwin Burrfoot edited comment on LUCENE-1387 at 5/10/09 11:16 AM

[jira] Commented: (LUCENE-1387) Add LocalLucene

2009-05-10 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707821#action_12707821 ] Earwin Burrfoot commented on LUCENE-1387: - LatLonDistanceFilter.java: public

Re: Sort on TermEnum

2009-05-08 Thread Earwin Burrfoot
Isn't it better to have specially prepared sort fields? Like lowercased, if you want case-insensitive comparisons, or stripped of whitespace and punctuation, like I did once. That way you have more flexibility and also don't kill performance outright. On Fri, May 8, 2009 at 11:58, Federica Falini

[jira] Commented: (LUCENE-1593) Optimizations to TopScoreDocCollector and TopFieldCollector

2009-05-05 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12705977#action_12705977 ] Earwin Burrfoot commented on LUCENE-1593: - bq. So for this issue: create

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-05-05 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12706129#action_12706129 ] Earwin Burrfoot commented on LUCENE-1607: - Bug in previous algo (unbounded hash

Score calculation with new by-segment collection

2009-04-30 Thread Earwin Burrfoot
Did I miss something, or when trunk switched to collecting on SegmentReaders we've lost proper scores? I mean, before score depended on TF calculated across all the index, and now it depends on TF for a given segment (yup, unless I missed something). Per-segment TF can vary wildly, especially in

Re: Score calculation with new by-segment collection

2009-04-30 Thread Earwin Burrfoot
On Fri, May 1, 2009 at 00:47, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Apr 30, 2009 at 4:44 PM, Earwin Burrfoot ear...@gmail.com wrote: Did I miss something, or when trunk switched to collecting on SegmentReaders we've lost proper scores? I mean, before score depended on TF

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12704225#action_12704225 ] Earwin Burrfoot commented on LUCENE-1607: - Mmm.. what's the status of this one

[jira] Updated: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1607: Attachment: LUCENE-1607.patch This should do. I replaced a pair of intern()s

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12704313#action_12704313 ] Earwin Burrfoot commented on LUCENE-1607: - Is there 'any' benefit of dumping

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12704315#action_12704315 ] Earwin Burrfoot commented on LUCENE-1607: - A top bound on cache size will do

[jira] Commented: (LUCENE-1618) Allow setting the IndexWriter docstore to be a different directory

2009-04-28 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703656#action_12703656 ] Earwin Burrfoot commented on LUCENE-1618: - bq. You mean an opened IndexOutput

[jira] Updated: (LUCENE-1618) Allow setting the IndexWriter docstore to be a different directory

2009-04-28 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1618: Attachment: MemoryCachedDirectory.java Allow setting the IndexWriter docstore

[jira] Commented: (LUCENE-1618) Allow setting the IndexWriter docstore to be a different directory

2009-04-28 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703666#action_12703666 ] Earwin Burrfoot commented on LUCENE-1618: - bq. what is this diff anyway? That's

[jira] Commented: (LUCENE-1618) Allow setting the IndexWriter docstore to be a different directory

2009-04-28 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703684#action_12703684 ] Earwin Burrfoot commented on LUCENE-1618: - bq. Sorry, by diff I meant

[jira] Issue Comment Edited: (LUCENE-1622) Multi-word synonym filter (synonym expansion at indexing time).

2009-04-28 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703790#action_12703790 ] Earwin Burrfoot edited comment on LUCENE-1622 at 4/28/09 11:50 AM

[jira] Commented: (LUCENE-1622) Multi-word synonym filter (synonym expansion at indexing time).

2009-04-28 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703790#action_12703790 ] Earwin Burrfoot commented on LUCENE-1622: - I'll shortly cite my experiences

[jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703054#action_12703054 ] Earwin Burrfoot commented on LUCENE-1616: - Separate setters might have their own

[jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703067#action_12703067 ] Earwin Burrfoot commented on LUCENE-1616: - I have two cases. In one case I can't

[jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703157#action_12703157 ] Earwin Burrfoot commented on LUCENE-1616: - bq. removing separate looks a bit

[jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703288#action_12703288 ] Earwin Burrfoot commented on LUCENE-1616: - bq. Span*Query api is a perfect

[jira] Commented: (LUCENE-1593) Optimizations to TopScoreDocCollector and TopFieldCollector

2009-04-27 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703368#action_12703368 ] Earwin Burrfoot commented on LUCENE-1593: - Use FMPP? It is pretty nice

[jira] Commented: (LUCENE-1593) Optimizations to TopScoreDocCollector and TopFieldCollector

2009-04-27 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703382#action_12703382 ] Earwin Burrfoot commented on LUCENE-1593: - bq. Forgive my ignorance, but what

Re: Synonym filter with support for phrases?

2009-04-23 Thread Earwin Burrfoot
On Wed, Apr 22, 2009 at 5:12 AM, Earwin Burrfoot ear...@gmail.com wrote: Your synonyms will break if you try searching for phrases. Building on your example, food place in new york will find nothing, because 'place' and 'in' share the same position. It'd be great to get multi-word synonyms

Re: Synonym filter with support for phrases?

2009-04-23 Thread Earwin Burrfoot
engine. So guys looking for MSU CMC really want to get Московский Государственный Университет, факультет ВМиК and his friends. And? How often do they extend this particular phrase with further terms? They don't need to. Variations of this phrase alone killed my first several approaches to

[jira] Commented: (LUCENE-1609) Eliminate synchronization contention on initial index reading in TermInfosReader ensureIndexIsRead

2009-04-23 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12701986#action_12701986 ] Earwin Burrfoot commented on LUCENE-1609: - The problem is not with indexState

[jira] Issue Comment Edited: (LUCENE-1609) Eliminate synchronization contention on initial index reading in TermInfosReader ensureIndexIsRead

2009-04-23 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12701986#action_12701986 ] Earwin Burrfoot edited comment on LUCENE-1609 at 4/23/09 9:41 AM

[jira] Commented: (LUCENE-1609) Eliminate synchronization contention on initial index reading in TermInfosReader ensureIndexIsRead

2009-04-23 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12702011#action_12702011 ] Earwin Burrfoot commented on LUCENE-1609: - You cannot put all these fields

Re: Synonym filter with support for phrases?

2009-04-22 Thread Earwin Burrfoot
Hello everyone, I'm looking for feedback and thoughts on the following problem (it's more of development than user-centered problem, hope the dev list is appropriate): - a token stream is given, - a set of synonyms is given, where synonyms are token sequences to be matched and token

Re: Synonym filter with support for phrases?

2009-04-22 Thread Earwin Burrfoot
Building on your example, food place in new york will find nothing, because 'place' and 'in' share the same position. You're right, but is it such a big problem in real life? Well, everyone has his own requirements for the search quality. For us it was a problem. User enters a query, then

Re: Synonym filter with support for phrases?

2009-04-22 Thread Earwin Burrfoot
Your example concerns phrase queries, so somebody would have to keep adding terms to a phrase. My experience with open search queries (I had access to a larger slice of queries from Microsoft Live) is that phrases are a minority of all searches. In the most common case, people will look for a

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12701626#action_12701626 ] Earwin Burrfoot commented on LUCENE-1607: - I tried it out. Works a little bit

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-20 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12700696#action_12700696 ] Earwin Burrfoot commented on LUCENE-1607: - Okay, you're probably right. It's

[jira] Updated: (LUCENE-1607) String.intern() faster alternative

2009-04-20 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1607: Attachment: LUCENE-1607.patch Okay, I thought more about that. Yonik is amazing

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-20 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12700928#action_12700928 ] Earwin Burrfoot commented on LUCENE-1607: - Hehe, ten minute difference. Take over

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-20 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12700935#action_12700935 ] Earwin Burrfoot commented on LUCENE-1607: - bq. Collisions should also be very

String.intern() alternative for field names

2009-04-19 Thread Earwin Burrfoot
Okay, we'd like to have equality-by-reference for field names, yielding überfast comparisons in all our tight inner loops. But we dislike default String.intern() for its java-native transitions and general lentitude. There's a perfect solution. Too dumb to come up with it myself, but fortunately

[jira] Created: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot (JIRA)
String.intern() faster alternative -- Key: LUCENE-1607 URL: https://issues.apache.org/jira/browse/LUCENE-1607 Project: Lucene - Java Issue Type: Improvement Reporter: Earwin Burrfoot

[jira] Updated: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1607: Attachment: intern.patch String.intern() faster alternative

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12700601#action_12700601 ] Earwin Burrfoot commented on LUCENE-1607: - bq. This default would be more back

Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot
On Sun, Apr 19, 2009 at 23:16, Chris Miller chris.mil...@kbcfp.com wrote: As far as I can see, both these implementations only suffer from threadsafety problems in that they don't guarantee visibility across threads, ie it's possible for threads to see stale data. So the code should work fine

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12700604#action_12700604 ] Earwin Burrfoot commented on LUCENE-1607: - bq. What was the field count

Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot
On Sun, Apr 19, 2009 at 23:42, Chris Miller chris.mil...@kbcfp.com wrote: As soon as all possible fields are in the pool, we're essentially readonly. The problem is, there's no guarantee we will ever reach this point. For example suppose you have a server app that spawns a new thread per

Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot
Sorry I wasn't as clear as I could have been - I realise JEE servers use a threadpool for handling requests, I was thinking of many other applications in the real world I'm aware of that don't (be that good design or otherwise...). You were. I just wanted to point out that in real apps you're

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-18 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12700430#action_12700430 ] Earwin Burrfoot commented on LUCENE-831: {quote} Allowing values to change, just

Re: I wanna contribute a Chinese analyzer to lucene

2009-04-16 Thread Earwin Burrfoot
On Thu, Apr 16, 2009 at 18:16, Ken Krugler kkrugler_li...@transpac.com wrote: I wrote a Analyzer for apache lucene for analyzing sentences in Chinese language, it's called imdict-chinese-analyzer as it is a subproject of imdict, which is an intelligent online dictionary. The project on google

[jira] Commented: (LUCENE-1593) Optimizations to TopScoreDocCollector and TopFieldCollector

2009-04-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699524#action_12699524 ] Earwin Burrfoot commented on LUCENE-1593: - Lucene failed drop in replacement

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot
IndexReader.java is littered with the likes of: public static IndexReader open(final Directory directory, IndexDeletionPolicy deletionPolicy) throws CorruptIndexException, IOException; But I don't understand why this is a problem... Doubling the number of factory methods? We have to keep old

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot
With the early binding approach, you wouldn't pass all plugins during creation; you'd pass a factory object that exposes methods like: getPostingsComponent(SegmentInfo), getStoredFieldsComponent(SegmentInfo), getValueSourceComponent(SegmentInfo). That basically kills the whole idea.

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot
The original example justification was to avoid putting a ValueSource in the IndexReader (I guess avoiding the funky init code? valueSource = new CachingValueSource(this, new UninversionValueSource(this)) That was a bit of drama for the sake of drama, I couldn't restrain myself :) My

[jira] Commented: (LUCENE-1604) Stop creating huge arrays to represent the absense of field norms

2009-04-14 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698869#action_12698869 ] Earwin Burrfoot commented on LUCENE-1604: - bq. There is also a [presumably

[jira] Commented: (LUCENE-1604) Stop creating huge arrays to represent the absense of field norms

2009-04-14 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698886#action_12698886 ] Earwin Burrfoot commented on LUCENE-1604: - Yep, that was my blunder. :) bq

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot
Mark Miller wrote: The distinction I am making with core is that we will have to call known methods on those core 'modules' that are not very generic? Doesn't that keep it from playing nice with the very generic 'attach this to this segment'? Genericity spans binding, notifications and

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot
Michael McCandless wrote: I gave the example to show the init vs inflight distinction, because inflight makes me nervous. I'm thinking of some (bad name follows) PluginBundle, that has add/remove/inspect methods and constructor/method for filling it with default Lucene components. Then instead

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot
On Wed, Apr 15, 2009 at 00:15, Mark Miller markrmil...@gmail.com wrote: Mark Miller wrote: Earwin Burrfoot wrote: Mark Miller wrote: The distinction I am making with core is that we will have to call known methods on those core 'modules' that are not very generic? Doesn't that keep

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot
On Wed, Apr 15, 2009 at 00:55, Mark Miller markrmil...@gmail.com wrote: Earwin Burrfoot wrote: On Wed, Apr 15, 2009 at 00:15, Mark Miller markrmil...@gmail.com wrote: Mark Miller wrote: Earwin Burrfoot wrote: Mark Miller wrote: The distinction I am making with core is that we
