Re: Lucene 2.9 and deprecated IR.open() methods

2009-10-02 Thread Earwin Burrfoot
On Sat, Oct 3, 2009 at 03:29, Uwe Schindler u...@thetaphi.de wrote: It is also probably a good idea to move various settings methods from IW to that builder and have IW immutable in regards to configuration. I'm speaking of the likes of setWriteLockTimeout, setRAMBufferSizeMB, setMergePolicy,

Re: Lucene 2.9 and deprecated IR.open() methods

2009-10-02 Thread Earwin Burrfoot
, Oct 2, 2009 at 7:45 PM, Earwin Burrfoot ear...@gmail.com wrote: On Sat, Oct 3, 2009 at 03:29, Uwe Schindler u...@thetaphi.de wrote: It is also probably a good idea to move various settings methods from IW to that builder and have IW immutable in regards to configuration. I'm speaking

Re: Lucene 2.9 and deprecated IR.open() methods

2009-10-02 Thread Earwin Burrfoot
?) EG:   IndexWriter.builder(Version.29, dir, analyzer)     .setRAMBufferSizeMB(128)     .setUseCompoundFile(false)     ...     .create() ? Mike On Fri, Oct 2, 2009 at 7:45 PM, Earwin Burrfoot ear...@gmail.com wrote: On Sat, Oct 3, 2009 at 03:29, Uwe Schindler u...@thetaphi.de wrote
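As a rough illustration of the builder idea discussed in this thread, here is a minimal sketch in plain Java. It is not Lucene API: `WriterConfigSketch`, its fields, and the default values are hypothetical stand-ins, and only the setter names echo the ones mentioned above. The point it shows is that all settings are collected on a mutable builder and the resulting object is immutable.

```java
// Hypothetical sketch of "settings live on a builder, the built object is immutable".
// None of these types are Lucene classes; the defaults are placeholders for the example.
final class WriterConfigSketch {
    final double ramBufferSizeMB;
    final boolean useCompoundFile;
    final long writeLockTimeoutMillis;

    // Only the builder can construct an instance; all fields are final.
    private WriterConfigSketch(Builder b) {
        this.ramBufferSizeMB = b.ramBufferSizeMB;
        this.useCompoundFile = b.useCompoundFile;
        this.writeLockTimeoutMillis = b.writeLockTimeoutMillis;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private double ramBufferSizeMB = 16.0;
        private boolean useCompoundFile = true;
        private long writeLockTimeoutMillis = 1000L;

        Builder setRAMBufferSizeMB(double mb) {
            if (mb <= 0) {
                throw new IllegalArgumentException("ramBufferSizeMB must be positive");
            }
            this.ramBufferSizeMB = mb;
            return this;
        }

        Builder setUseCompoundFile(boolean value) {
            this.useCompoundFile = value;
            return this;
        }

        Builder setWriteLockTimeout(long millis) {
            this.writeLockTimeoutMillis = millis;
            return this;
        }

        // Freezes the current settings into an immutable configuration.
        WriterConfigSketch create() {
            return new WriterConfigSketch(this);
        }
    }

    public static void main(String[] args) {
        WriterConfigSketch cfg = WriterConfigSketch.builder()
                .setRAMBufferSizeMB(128)
                .setUseCompoundFile(false)
                .create();
        System.out.println(cfg.ramBufferSizeMB + " MB, compound=" + cfg.useCompoundFile);
    }
}
```

Because the built object has no setters, a writer holding it cannot have its configuration changed after construction, which is the property the thread is asking for.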

Re: Optimization and Corruption Issues

2009-10-01 Thread Earwin Burrfoot
2.0 is pre Mike's fabulous indexing updates - which just for one means one thread doing the merging rather than multiple. I'm sure overall it's much slower. If you're doing a full optimize, you're still using a single thread. Am I wrong? -- Kirill Zakharenko/Кирилл Захаренко

Re: Optimization and Corruption Issues

2009-10-01 Thread Earwin Burrfoot
If you're doing a full optimize, you're still using a single thread. Am I wrong? Depends on how many merges are required, and, the merge scheduler.  In this case (w/ 7000 segments, which is way too many, normally!), assuming ConcurrentMergeScheduler, multiple threads will be used since

Re: Query Parsing was Fwd: Lab - Esqueranto

2009-09-25 Thread Earwin Burrfoot
We use antlr, though without its tree api, it's a bit of overkill. It directly builds a query in our intermediate format which is traversed for synonym/phrase detection and converted to lucene query. The library/language itself is pretty easy to learn, flexible, and has a nice IDE. On Fri, Sep

Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?

2009-09-22 Thread Earwin Burrfoot
On Tue, Sep 22, 2009 at 19:08, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Sep 22, 2009 at 10:48 AM, Michael McCandless luc...@mikemccandless.com wrote: John are you using IndexWriter.setMergedSegmentWarmer, so that a newly merged segment is warmed before it's put into production

Re: who clears attributes?

2009-08-11 Thread Earwin Burrfoot
On Tue, Aug 11, 2009 at 15:09, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Aug 11, 2009 at 6:50 AM, Robert Muir rcm...@gmail.com wrote: On Tue, Aug 11, 2009 at 4:28 AM, Michael Busch busch...@gmail.com wrote: There was a performance test in Solr that apparently ran much slower after

Re: who clears attributes?

2009-08-11 Thread Earwin Burrfoot
The only person that tried to disprove this claim is Uwe. Others either say the problems are solved, so it's okay to move to the new API, or this will be usable when flexindexing arrives. Others (not me) have spent a lot of time going over this before (more than once I think) - they prob are

[jira] Commented: (LUCENE-1799) Unicode compression

2009-08-11 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741868#action_12741868 ] Earwin Burrfoot commented on LUCENE-1799: - I think right now this can

Re: indexing_slowdown_with_latest_lucene_udpate

2009-08-10 Thread Earwin Burrfoot
Or, we can just throw that detection out of the window, for less smooth back-compat experience, less hacky code and no slowdown. On Mon, Aug 10, 2009 at 19:02, Uwe Schindler u...@thetaphi.de wrote: The question is, if that would get better if the reflection calls are only done one time per class

[jira] Commented: (LUCENE-1793) remove custom encoding support in Greek/Russian Analyzers

2009-08-10 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741372#action_12741372 ] Earwin Burrfoot commented on LUCENE-1793: - bq. I am guessing the rationale

Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
I'll deviate from the topic somewhat. What are exact benefits that new tokenstream API yields? Are we sure we want it released with 2.9? By now I only see various elaborate problems, but haven't seen a single piece of code becoming simpler. On Mon, Aug 10, 2009 at 21:50, Uwe

Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
On Mon, Aug 10, 2009 at 22:50, Grant Ingersoll gsing...@apache.org wrote: On Aug 10, 2009, at 2:00 PM, Earwin Burrfoot wrote: I'll deviate from the topic somewhat. What are exact benefits that new tokenstream API yields? Are we sure we want it released with 2.9? By now I only see various

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
On Tue, Aug 11, 2009 at 00:37, Michael Busch busch...@gmail.com wrote: On 8/10/09 1:30 PM, Grant Ingersoll wrote: I think your 2.5 proposal has drawbacks: if we release 2.5 now to test the new major features in the field, then do you want to stop adding new features to trunk until we release

Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
On Tue, Aug 11, 2009 at 00:54, Uwe Schindler u...@thetaphi.de wrote: I have serious doubts about releasing this new API until these performance issues are resolved and better proven out from a usability standpoint. I think LUCENE-1796 has fixed the performance problems, which was

Re: pieces missing in reusable analyzers?

2009-08-10 Thread Earwin Burrfoot
I had thought that implementing reusable analyzers in solr was going to be cake... but either I'm missing something, or Lucene is missing something. Here's the way that one used to create custom analyzers: class CustomAnalyzer extends Analyzer {  public TokenStream tokenStream(String

Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
Well, I have real use cases for it, but all of it is still missing the biggest piece:  search side support.  It's the 900 lb. elephant in the room.   The 500 lb. elephant is the fact that all these attributes, AIUI, require you to hook in your own indexing chain, etc. in order to even be

Re: pieces missing in reusable analyzers?

2009-08-10 Thread Earwin Burrfoot
I'm just keeping a reference to Tokenizer, so I can reset it with a new reader. Though this situation is awkward, TS definitely does not need a reset(Reader). Then how do you notify the other filters that they should reset their state? TokenStream.reset()?  The javadoc specifies that it's

Re: ConcurrentMergeScheduler and MergePolicy question

2009-08-09 Thread Earwin Burrfoot
On Sun, Aug 9, 2009 at 08:38, Jason Rutherglen jason.rutherg...@gmail.com wrote: You don't have to copy. You can have one machine optimize your indexes whilst other serves user requests, then they switch roles, rinse, repeat. This approach also works with sharding, and more than 2-way

Re: ConcurrentMergeScheduler and MergePolicy question

2009-08-08 Thread Earwin Burrfoot
Perhaps the ideal search system architecture that requires optimizing is to dedicate a server to it, copy the index to the optimize server, do the optimize, copy the index off (to a search server) and start again for the next optimize task. I wonder how/if this would work with Hadoop/HDFS as

Re: Attributes, DocConsumer, Flexible Indexing, etc.

2009-08-06 Thread Earwin Burrfoot
I always thought flexible indexing is not only for storing your app-specific data next to terms/docs. Something more along the lines of efficient geo search, or ability to try out various index encoding schemes without patching lucene. In other words, this is something that can be a basis for

Re: IndexWriter.getReader usage

2009-08-03 Thread Earwin Burrfoot
The biggest win for NRT was switching to per-segment Collector because that meant we could re-use FieldCache entries for all segments that hadn't changed. In my opinion, this switch was enough to get as NRT-ey, as you want. Fusing IR/IW together makes Lucene a great deal more complicated and

Re: Java caching of low-level index data?

2009-08-03 Thread Earwin Burrfoot
I'm curious if anyone has thought about (or even tried) caching the low-level index data in Java, rather than in the OS. For example, at the IndexInput level there could be an LRU cache of byte[] blocks, similar to how a RDBMS caches index pages. (Conveniently, BufferedIndexInput already
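To make the idea in the question concrete, here is a minimal sketch of an LRU cache of `byte[]` blocks in plain Java. It assumes blocks are keyed by a block number and that capacity is counted in blocks; `BlockCacheSketch` is a hypothetical name and nothing here is tied to Lucene's `IndexInput` classes. It relies on `java.util.LinkedHashMap` in access order, which is one common way to get LRU eviction, not necessarily what the poster had in mind.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU cache of fixed-size byte[] blocks keyed by block number.
// Not thread-safe; a real cache shared between searches would need synchronization.
final class BlockCacheSketch extends LinkedHashMap<Long, byte[]> {
    private static final long serialVersionUID = 1L;

    private final int maxBlocks;

    BlockCacheSketch(int maxBlocks) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxBlocks = maxBlocks;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxBlocks;
    }

    public static void main(String[] args) {
        BlockCacheSketch cache = new BlockCacheSketch(2);
        cache.put(0L, new byte[1024]);
        cache.put(1L, new byte[1024]);
        cache.get(0L);                  // touch block 0 so block 1 becomes the eldest
        cache.put(2L, new byte[1024]);  // evicts block 1
        System.out.println(cache.keySet());
    }
}
```

Whether such a cache beats the OS page cache is exactly the open question of the thread; the sketch only shows the eviction mechanics.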

[jira] Commented: (LUCENE-1748) getPayloadSpans on org.apache.lucene.search.spans.SpanQuery should be abstract

2009-07-18 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732938#action_12732938 ] Earwin Burrfoot commented on LUCENE-1748: - bq. We should drop PayloadSpans

[jira] Commented: (LUCENE-1748) getPayloadSpans on org.apache.lucene.search.spans.SpanQuery should be abstract

2009-07-16 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731939#action_12731939 ] Earwin Burrfoot commented on LUCENE-1748: - bq. Shouldnt it throw a runtime

[jira] Commented: (LUCENE-1748) getPayloadSpans on org.apache.lucene.search.spans.SpanQuery should be abstract

2009-07-16 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731972#action_12731972 ] Earwin Burrfoot commented on LUCENE-1748: - I took a glance at the code, the whole

[jira] Issue Comment Edited: (LUCENE-1748) getPayloadSpans on org.apache.lucene.search.spans.SpanQuery should be abstract

2009-07-16 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731972#action_12731972 ] Earwin Burrfoot edited comment on LUCENE-1748 at 7/16/09 7:54 AM

[jira] Commented: (LUCENE-1743) MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS

2009-07-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731632#action_12731632 ] Earwin Burrfoot commented on LUCENE-1743: - The initial motive for the issue seems

[jira] Issue Comment Edited: (LUCENE-1743) MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS

2009-07-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731632#action_12731632 ] Earwin Burrfoot edited comment on LUCENE-1743 at 7/15/09 12:14 PM

[jira] Commented: (LUCENE-1743) MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS

2009-07-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731639#action_12731639 ] Earwin Burrfoot commented on LUCENE-1743: - bq. My problem was more with all

Re: A Comparison of Open Source Search Engines

2009-07-06 Thread Earwin Burrfoot
I'd say out of these libraries only Lucene and Sphinx are worth mentioning. There's also MG4J, which wasn't covered and has a nice algorithmic background. Anybody knows other interesting open-source search engines? On Tue, Jul 7, 2009 at 00:39, John Wang john.w...@gmail.com wrote: Vik did a very

[jira] Commented: (LUCENE-1488) issues with standardanalyzer on multilingual text

2009-07-02 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726571#action_12726571 ] Earwin Burrfoot commented on LUCENE-1488: - bq. There is no morphological

Re: Improving TimeLimitedCollector

2009-06-27 Thread Earwin Burrfoot
Why don't you use Thread.interrupt(), .isInterrupted() ? On Sat, Jun 27, 2009 at 16:16, Shai Erera ser...@gmail.com wrote: A downside of breaking it out into static methods like this is that a thread cannot run >1 time-limited activity simultaneously but I guess that might be a reasonable
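A minimal sketch of what the `Thread.interrupt()` / `isInterrupted()` suggestion could look like, in plain Java with no Lucene types. `InterruptTimeLimitSketch`, `collect` and `collectWithLimit` are hypothetical names, the counting loop is a stand-in for a collector, and returning -1 on timeout is an arbitrary convention for the example.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: time-limit a loop by interrupting the thread that runs it.
final class InterruptTimeLimitSketch {

    // Stand-in for a collect loop: counts up to maxSteps, or returns -1 once interrupted.
    static long collect(long maxSteps) {
        long collected = 0;
        for (long i = 0; i < maxSteps; i++) {
            // isInterrupted() only reads the flag; it does not clear it.
            if (Thread.currentThread().isInterrupted()) {
                return -1;
            }
            collected++;
        }
        return collected;
    }

    // Runs collect() on the calling thread and interrupts it after timeoutMillis.
    static long collectWithLimit(long maxSteps, long timeoutMillis) {
        final Thread searcher = Thread.currentThread();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        ScheduledFuture<?> pending =
                timer.schedule(searcher::interrupt, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            return collect(maxSteps);
        } finally {
            pending.cancel(false);
            timer.shutdownNow();
            // Clear a possibly set flag so the caller's later blocking calls are unaffected.
            // A timer task that is already running could still fire just after this line;
            // a real implementation would have to close that window.
            Thread.interrupted();
        }
    }

    public static void main(String[] args) {
        System.out.println(collectWithLimit(1_000L, 10_000L));        // finishes in time
        System.out.println(collectWithLimit(Long.MAX_VALUE, 50L));    // cut off by the timer
    }
}
```

Since the interrupt flag is per thread, each thread still gets only one time limit at a time, which matches the limitation quoted above.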

[jira] Commented: (LUCENE-1342) 64bit JVM crashes on Linux

2009-06-26 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724441#action_12724441 ] Earwin Burrfoot commented on LUCENE-1342: - bq. Sun can't ignore a HotSpot compiler

Re: Improving TimeLimitedCollector

2009-06-24 Thread Earwin Burrfoot
Having scorers check timeouts while advancing will definitely increase the frequency of said timeouts. On Wed, Jun 24, 2009 at 13:13, eks dev eks...@yahoo.co.uk wrote: Re: I think such a parameter should not exist on individual search methods since it's more of a global setting (i.e., I want my

[jira] Commented: (LUCENE-1712) Set default precisionStep for NumericField and NumericRangeFilter

2009-06-23 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722996#action_12722996 ] Earwin Burrfoot commented on LUCENE-1712: - Having half of your methods constantly

[jira] Commented: (LUCENE-1715) DirectoryIndexReader finalize() holding TermInfosReader longer than necessary

2009-06-23 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723224#action_12723224 ] Earwin Burrfoot commented on LUCENE-1715: - I object nulling references in attempt

[jira] Commented: (LUCENE-1715) DirectoryIndexReader finalize() holding TermInfosReader longer than necessary

2009-06-23 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723225#action_12723225 ] Earwin Burrfoot commented on LUCENE-1715: - And support removing finalizers

[jira] Commented: (LUCENE-1715) DirectoryIndexReader finalize() holding TermInfosReader longer than necessary

2009-06-23 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723289#action_12723289 ] Earwin Burrfoot commented on LUCENE-1715: - There's in fact one case where nulling

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-23 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723352#action_12723352 ] Earwin Burrfoot commented on LUCENE-1607: - Okay, let's have an extra class

[jira] Commented: (LUCENE-1677) Remove GCJ IndexReader specializations

2009-06-23 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723355#action_12723355 ] Earwin Burrfoot commented on LUCENE-1677: - Mike, are we going to postpone actual

[jira] Commented: (LUCENE-1677) Remove GCJ IndexReader specializations

2009-06-23 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723378#action_12723378 ] Earwin Burrfoot commented on LUCENE-1677: - I thought we're doing everything right

[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

2009-06-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722769#action_12722769 ] Earwin Burrfoot commented on LUCENE-1701: - Using 4 for int, 6 for long. Dates

[jira] Issue Comment Edited: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

2009-06-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722769#action_12722769 ] Earwin Burrfoot edited comment on LUCENE-1701 at 6/22/09 12:18 PM

[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

2009-06-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722775#action_12722775 ] Earwin Burrfoot commented on LUCENE-1701: - Design for today. And spend two years

Re: Shouldn't IndexWriter.commit(Map) accept Properties instead?

2009-06-22 Thread Earwin Burrfoot
What other issues would we be taking on by using Java's serialization here...? It's insanely slow. Though, that doesn't apply to a once-per-commit call. The other point is, if you store Object, you can no longer mix lucene and user data. With Map<String, whatever> approach you could reserve some

[jira] Commented: (LUCENE-1712) Set default precisionStep for NumericField and NumericRangeFilter

2009-06-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722843#action_12722843 ] Earwin Burrfoot commented on LUCENE-1712: - Am I misunderstanding something

[jira] Commented: (LUCENE-1712) Set default precisionStep for NumericField and NumericRangeFilter

2009-06-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722851#action_12722851 ] Earwin Burrfoot commented on LUCENE-1712: - Aha! And each time you invoke

Re: 3MB lucene-analyzers.jar?

2009-06-21 Thread Earwin Burrfoot
But: I do not understand the problems with this JAR file. If somebody really wants to have smaller files, one could use some tools, that do it automatically on class usage. I personally have a couple of usecases for that as I have to work in very limited environments. Imagine embedded systems

[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

2009-06-19 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721787#action_12721787 ] Earwin Burrfoot commented on LUCENE-1701: - I vote for factories - escaping back

[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

2009-06-19 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721830#action_12721830 ] Earwin Burrfoot commented on LUCENE-1701: - Mike, I very much agree with everything

[jira] Issue Comment Edited: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

2009-06-19 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721830#action_12721830 ] Earwin Burrfoot edited comment on LUCENE-1701 at 6/19/09 8:50 AM

[jira] Commented: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

2009-06-19 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722060#action_12722060 ] Earwin Burrfoot commented on LUCENE-1701: - bq. Someday maybe I'll convince you

[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

2009-06-17 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720619#action_12720619 ] Earwin Burrfoot commented on LUCENE-1630: - I wasn't following the issue closely

Re: madvise(ptr, len, MADV_SEQUENTIAL)

2009-06-16 Thread Earwin Burrfoot
Except, you don't know the size of the file to be written upfront. One probable solution is to map output file in pages. As a complementary solution you can map a huge area of the file, and hope few real memory is allocated by OS unless you actually write all over that area. Dunno. The idea of

Re: Proposal for changing the backwards-compatibility policy

2009-06-16 Thread Earwin Burrfoot
Oh yes! Again! +1 One point is missing. What about incompatible behavioral changes that do not touch API and file format? Like posIncr=0 at the first token in stream, or analyzer fixes, or something along these lines. Are we free to introduce them in a minor release without warning, or are we

[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720231#action_12720231 ] Earwin Burrfoot commented on LUCENE-1673: - bq. This is that baking in a specific

[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

2009-06-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719539#action_12719539 ] Earwin Burrfoot commented on LUCENE-1630: - I like the last option most. Creating

[jira] Issue Comment Edited: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

2009-06-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719539#action_12719539 ] Earwin Burrfoot edited comment on LUCENE-1630 at 6/15/09 5:36 AM

Re: Payloads and TrieRangeQuery

2009-06-14 Thread Earwin Burrfoot
Just to throw something out, the new Token API is not very consumable in my experience. The old one was very intuitive and very easy to follow the code. I've had to refigure out what the heck was going on with the new one more than once now. Writing some example code with it is hard to follow

[jira] Commented: (LUCENE-1488) issues with standardanalyzer on multilingual text

2009-06-14 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719322#action_12719322 ] Earwin Burrfoot commented on LUCENE-1488: - bq. But this can't replace

[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-10 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718009#action_12718009 ] Earwin Burrfoot commented on LUCENE-1453: - bq. As the Filter is just a deprecated

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Earwin Burrfoot
And this information about the trie structure and where payloads are should be stored in FieldInfos. As is the case today, the info is encoded in the class you use (and its settings)... no need to add it to the index structure.  In any case, it's a completely different issue and shouldn't

[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-10 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718198#action_12718198 ] Earwin Burrfoot commented on LUCENE-1607: - bq. but I was waiting for some kind

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Earwin Burrfoot
 * Was the field even indexed w/ Trie, or indexed as simple text?    It's useful to know this automatically at search time, so eg a    RangeQuery can do the right thing by default.  FieldInfos seems    like the natural place to store this.  It's basically Lucene's    per-segment write-once

[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-09 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717657#action_12717657 ] Earwin Burrfoot commented on LUCENE-1453: - Patch looks fine. I read the last one

Re: Some thoughts around the use of reader.isDeleted and hasDeletions

2009-06-09 Thread Earwin Burrfoot
Actually: I think we should also change IndexReader.document to not check if it's deleted?  (Renaming it to something like rawDocument(), storedDocument(), something, in the process, and deprecating the old one). Yup. After all the most common use-case is to load a document after finding it in

[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-09 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717769#action_12717769 ] Earwin Burrfoot commented on LUCENE-1453: - bq. I think it should (be closed

[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-09 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717823#action_12717823 ] Earwin Burrfoot commented on LUCENE-1678: - Second this. Though I lost any hope

[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-09 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717862#action_12717862 ] Earwin Burrfoot commented on LUCENE-1678: - bq. If there are sane/smart ways

[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-09 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717866#action_12717866 ] Earwin Burrfoot commented on LUCENE-1453: - Two suggestions: Factor out RefCount

Re: Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-09 Thread Earwin Burrfoot
@Mark: Okay, there's an escape hatch I (and someone else) mentioned on the list before. Adopting a fixed release cycle with small intervals between releases (compared to what we have now). Fixed - as in, releases are made each N months instead of when everyone feels they finished and polished

[jira] Commented: (LUCENE-1648) when you clone or reopen an IndexReader with pending changes, the new reader doesn't commit the changes

2009-06-07 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717089#action_12717089 ] Earwin Burrfoot commented on LUCENE-1648: - As LUCENE-1651 is now committed

[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-07 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717107#action_12717107 ] Earwin Burrfoot commented on LUCENE-1453: - bq. There are two possibilities to fix

[jira] Commented: (SOLR-706) Fast auto-complete suggestions

2009-06-07 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717108#action_12717108 ] Earwin Burrfoot commented on SOLR-706: -- When I did autocompletion for my project, simple

[jira] Commented: (SOLR-236) Field collapsing

2009-06-07 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717110#action_12717110 ] Earwin Burrfoot commented on SOLR-236: -- I have implemented collapsing on a high-volume

Re: IR static methods

2009-06-04 Thread Earwin Burrfoot
/SegmentInfos.java How about IndexInfo?  Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Earwin Burrfoot ear...@gmail.com To: java-dev@lucene.apache.org Sent: Wednesday, June 3, 2009 8:08:50 AM Subject: IR static methods I have

[jira] Commented: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715836#action_12715836 ] Earwin Burrfoot commented on LUCENE-1651: - Seems the patch didn't apply completely

[jira] Commented: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715908#action_12715908 ] Earwin Burrfoot commented on LUCENE-1651: - bq. Patch looks good Earwin, thanks! I

IR static methods

2009-06-03 Thread Earwin Burrfoot
I have a strong desire to remove all these static methods from IR - lastModified, getCurrentVersion, getCommitUserData, indexExists. But haven't found a good place for them yet. Directory - is a bad place, it shouldn't concern itself with details of what exactly is stored inside, it should think

[jira] Commented: (LUCENE-1672) Deprecate all String/File ctors/opens in IndexReader/IndexWriter/IndexSearcher

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715944#action_12715944 ] Earwin Burrfoot commented on LUCENE-1672: - bq. I will later try to solve

[jira] Commented: (LUCENE-1672) Deprecate all String/File ctors/opens in IndexReader/IndexWriter/IndexSearcher

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715962#action_12715962 ] Earwin Burrfoot commented on LUCENE-1672: - bq. And DirectoryIR/MSR still have

[jira] Updated: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1651: Attachment: LUCENE-1651-tag.patch LUCENE-1651.patch Argh! The rename

[jira] Updated: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1651: Attachment: LUCENE-1651.patch One more version, applies against current trunk without

Re: Enhance StandardTokenizer to support words which will not be tokenized

2009-06-03 Thread Earwin Burrfoot
Not sure you can easily marry generated JFlex grammar and runtime-provided list of protected words. I took the approach of creating tokens for punctuation inside my tokenizer and later gluing them with nearby text tokens or dropping from the stream with a tokenfilter. On Wed, Jun 3, 2009 at
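A minimal sketch of the approach described in this message, in plain Java rather than against the Lucene TokenStream API. Everything here is hypothetical and only for illustration: the `ProtectedWordGluer` class, its `Token` type, and the `split` and `glue` helpers are not from the original post. The tokenizer step emits punctuation as separate tokens; a filter step then glues contiguous tokens back together when they form a protected word and drops any leftover punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not Lucene code and not the poster's implementation.
class ProtectedWordGluer {

    /** A token plus its character offsets in the original text. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Tokenizer step: runs of letters/digits become word tokens, every other
     *  non-whitespace character becomes its own punctuation token. */
    static List<Token> split(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
            } else {
                i++;
            }
            tokens.add(new Token(input.substring(start, i), start, i));
        }
        return tokens;
    }

    /** Filter step: glue contiguous tokens that spell a protected word
     *  (longest match wins), then drop punctuation that was not glued. */
    static List<String> glue(List<Token> tokens, Set<String> protectedWords) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String matched = null;
            int matchEnd = -1;
            StringBuilder sb = new StringBuilder(tokens.get(i).text);
            for (int j = i + 1; j < tokens.size(); j++) {
                // Only tokens with no gap between them may be glued.
                if (tokens.get(j).start != tokens.get(j - 1).end) {
                    break;
                }
                sb.append(tokens.get(j).text);
                if (protectedWords.contains(sb.toString())) {
                    matched = sb.toString();
                    matchEnd = j;
                }
            }
            if (matched != null) {
                out.add(matched);
                i = matchEnd + 1;
            } else {
                Token t = tokens.get(i);
                if (Character.isLetterOrDigit(t.text.charAt(0))) {
                    out.add(t.text); // keep ordinary words
                }                    // unglued punctuation is dropped
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [c++, is, fun, no]
        System.out.println(glue(split("c++ is fun, no?"), Set.of("c++")));
    }
}
```

Because the protected-word set is only consulted in the filter step, it can be supplied at runtime without touching the generated grammar.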

[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715973#action_12715973 ] Earwin Burrfoot commented on LUCENE-1630: - Searcher is supposed to be a little

[jira] Created: (LUCENE-1677) Remove GCJ IndexReader specializations

2009-06-03 Thread Earwin Burrfoot (JIRA)
Remove GCJ IndexReader specializations -- Key: LUCENE-1677 URL: https://issues.apache.org/jira/browse/LUCENE-1677 Project: Lucene - Java Issue Type: Task Reporter: Earwin Burrfoot

[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

2009-06-02 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715509#action_12715509 ] Earwin Burrfoot commented on LUCENE-1630: - You can't, because Weights produced

[jira] Commented: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-02 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715672#action_12715672 ] Earwin Burrfoot commented on LUCENE-1651: - Hm.. okay, I've got back to work

[jira] Updated: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-02 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1651: Attachment: LUCENE-1651-tag.patch LUCENE-1651.patch Here are the patches

[jira] Commented: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-06-01 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715008#action_12715008 ] Earwin Burrfoot commented on LUCENE-1658: - I told you, Java mmap doesn't work

[jira] Issue Comment Edited: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-06-01 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715016#action_12715016 ] Earwin Burrfoot edited comment on LUCENE-1658 at 6/1/09 1:14 AM

[jira] Commented: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-06-01 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715016#action_12715016 ] Earwin Burrfoot commented on LUCENE-1658: - Really? Let me quote some code (MacOS

[jira] Commented: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-06-01 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715018#action_12715018 ] Earwin Burrfoot commented on LUCENE-1658: - Ah! You were referring to your code

[jira] Commented: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-06-01 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715026#action_12715026 ] Earwin Burrfoot commented on LUCENE-1658: - I tested on MacOS: Invalid memory

[jira] Commented: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-06-01 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715027#action_12715027 ] Earwin Burrfoot commented on LUCENE-1658: - bq. It uses less virtual memory

[jira] Commented: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-06-01 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715057#action_12715057 ] Earwin Burrfoot commented on LUCENE-1658: - bq. I'm a bit nervous about creating

[jira] Issue Comment Edited: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-06-01 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715063#action_12715063 ] Earwin Burrfoot edited comment on LUCENE-1658 at 6/1/09 4:16 AM

[jira] Commented: (LUCENE-1658) Absorb NIOFSDirectory into FSDirectory

2009-06-01 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715063#action_12715063 ] Earwin Burrfoot commented on LUCENE-1658: - bq. On a couple of projects I've worked
