Re: IndexReader plugins

2009-04-13 Thread Earwin Burrfoot
, or is it an unnecessary complication? Like: indexReader.bindPlugin(instance).to(Iface1.class, Iface2.class); And then: indexReader.plugin(Iface1.class) == indexReader.plugin(Iface2.class) Mike On Sun, Apr 12, 2009 at 7:34 PM, Earwin Burrfoot ear...@gmail.com wrote: To support my dream
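The binding idea quoted above, one plugin instance reachable under several interfaces, could look roughly like the sketch below. This is only an illustration: PluginRegistry, Binding and BothPlugin are hypothetical names, the registry stands in for IndexReader, and none of this is an existing Lucene API. Only bindPlugin(...).to(...), plugin(...), Iface1 and Iface2 come from the message itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interfaces, named after the ones used in the message.
interface Iface1 {}
interface Iface2 {}

// Hypothetical plugin that implements both interfaces.
final class BothPlugin implements Iface1, Iface2 {}

// Hypothetical stand-in for the plugin-holding part of IndexReader.
final class PluginRegistry {
    private final Map<Class<?>, Object> plugins = new HashMap<>();

    /** Starts a binding for a single plugin instance. */
    <T> Binding<T> bindPlugin(T instance) {
        return new Binding<>(this, instance);
    }

    /** Returns the plugin registered under the given interface, or null if none. */
    <I> I plugin(Class<I> iface) {
        return iface.cast(plugins.get(iface));
    }

    static final class Binding<T> {
        private final PluginRegistry registry;
        private final T instance;

        Binding(PluginRegistry registry, T instance) {
            this.registry = registry;
            this.instance = instance;
        }

        /** Registers the same instance under every listed supertype. */
        @SafeVarargs
        final void to(Class<? super T>... ifaces) {
            for (Class<? super T> iface : ifaces) {
                registry.plugins.put(iface, instance);
            }
        }
    }
}
```

With this shape, the identity check from the message holds because both lookups return the one bound object; the `Class<? super T>` bound makes the compiler reject an interface the instance does not implement.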

Re: IndexReader plugins

2009-04-13 Thread Earwin Burrfoot
Can we outline some requirements for the plugin API? Do we want to attach/detach them to IndexReader after it is created, or only during construction? I think I'd lean towards only at construction.  Seems dangerous to allow swap in/out at some later time. I have several points pro-runtime:

Re: IndexReader plugins

2009-04-13 Thread Earwin Burrfoot
On Mon, Apr 13, 2009 at 17:14, Michael McCandless luc...@mikemccandless.com wrote: On Mon, Apr 13, 2009 at 9:02 AM, Earwin Burrfoot ear...@gmail.com wrote: I think I'd lean towards only at construction.  Seems dangerous to allow swap in/out at some later time. I have several points pro

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698215#action_12698215 ] Earwin Burrfoot commented on LUCENE-831: I'm using a similar approach. There's

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698233#action_12698233 ] Earwin Burrfoot commented on LUCENE-831: bq. I guess you would just have to set

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698243#action_12698243 ] Earwin Burrfoot commented on LUCENE-831: bq. At least, we should upgrade

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-04-12 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698248#action_12698248 ] Earwin Burrfoot commented on LUCENE-831: bq. I'm kind of worried that any change

IndexReader plugins

2009-04-12 Thread Earwin Burrfoot
To support my dream of kicking fieldCache out of the core and to add some extensibility to Lucene, I want to introduce IndexReaderPlugins. Rough pseudocode follows: interface IndexReaderPlugin { void attach(SegmentReader reader); void detach(SegmentReader reader); void

Re: IndexReader plugins

2009-04-12 Thread Earwin Burrfoot
Earwin Burrfoot wrote: Benefits are numerous. We get rid of alien code like: +++ src/java/org/apache/lucene/index/SegmentReader.java (working copy) @@ -83,6 +86,8 @@ +  protected ValueSource valueSource; + @@ -555,6 +560,8 @@ + +      valueSource = new CachingValueSource(this, new

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-08 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696977#action_12696977 ] Earwin Burrfoot commented on LUCENE-1231: - I can share my design for doc loading

possible TermInfosReader speedup

2009-04-08 Thread Earwin Burrfoot
Currently, when we're seeking a given Term, it does a binary search across all term space, including terms belonging to other fields. I propose augmenting fields file with two pointers (firstTerm, lastTerm) for each field. That reduces range we need to search, and instead of comparing Terms we
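A rough sketch of the proposal above, under loudly stated assumptions: the term dictionary is modeled as two parallel in-memory arrays sorted by (field, text), and the per-field (firstTerm, lastTerm) pointers are array indices. FieldTermIndex and seek are hypothetical names, not TermInfosReader code, and the on-disk layout is not modeled. Since the message is cut off after "instead of comparing Terms we", comparing only the term text inside the field's range is an assumption about where it was heading.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a term dictionary with per-field ranges.
final class FieldTermIndex {
    private final String[] termTexts;                           // term text, sorted by (field, text)
    private final Map<String, int[]> ranges = new HashMap<>();  // field -> {firstTerm, lastTerm}

    /** fields[i] and texts[i] describe term i; input must be sorted by (field, text). */
    FieldTermIndex(String[] fields, String[] texts) {
        this.termTexts = texts.clone();
        for (int i = 0; i < fields.length; i++) {
            int[] range = ranges.get(fields[i]);
            if (range == null) {
                ranges.put(fields[i], new int[] {i, i});  // first term seen for this field
            } else {
                range[1] = i;                             // extend lastTerm
            }
        }
    }

    /** Binary search restricted to the field's own range; returns the term index or -1. */
    int seek(String field, String text) {
        int[] range = ranges.get(field);
        if (range == null) {
            return -1;  // unknown field: nothing to search
        }
        int lo = range[0];
        int hi = range[1];
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = termTexts[mid].compareTo(text);  // text only, the field is already fixed
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }
}
```

The search cost then depends on the number of terms in the one field rather than on the whole term space, and each comparison is a plain string comparison.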

Re: possible TermInfosReader speedup

2009-04-08 Thread Earwin Burrfoot
On Thu, Apr 9, 2009 at 00:14, Michael McCandless luc...@mikemccandless.com wrote: On Wed, Apr 8, 2009 at 3:46 PM, Earwin Burrfoot ear...@gmail.com wrote: Currently, when we're seeking a given Term, it does a binary search across all term space, including terms belonging to other fields. I

Re: possible TermInfosReader speedup

2009-04-08 Thread Earwin Burrfoot
On Thu, Apr 9, 2009 at 02:01, Uwe Schindler u...@thetaphi.de wrote: Also, on the other topic - how hard is it to boost TermEnum.skipTo(term) speed to IndexReader.terms(term) level? Would be nice for TrieRangeFilter and probably some other filters. I think all that's needed is to implement

[jira] Commented: (LUCENE-1584) Callback for intercepting merging segments in IndexWriter

2009-04-07 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696511#action_12696511 ] Earwin Burrfoot commented on LUCENE-1584: - bq. I'd like to step back

[jira] Commented: (LUCENE-1584) Callback for intercepting merging segments in IndexWriter

2009-04-07 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696583#action_12696583 ] Earwin Burrfoot commented on LUCENE-1584: - bq. The problem is you need more

Re: Modularization

2009-04-01 Thread Earwin Burrfoot
Lucene is in fact already available through maven. poms do exist, all that is left is to find out who manages and releases them. On Thu, Apr 2, 2009 at 01:40, Douglas Campos doug...@theros.info wrote: +1 on maven, and I volunteer to aid in the creation of the maven project files (pom's) On Wed,

Re: Possible IndexInput optimization

2009-03-29 Thread Earwin Burrfoot
A while ago I tried overriding the read* methods in BufferedIndexInput like this: I'm still surprised there was no performance improvement at all. Maybe something was wrong with my test and I should try it again... For BufferedIndexInput improvement should be

Re: Possible IndexInput optimization

2009-03-29 Thread Earwin Burrfoot
Earwin, I did not experiment lately, but I'd like to add a general compressed integer array to the basic types in an index, that would be compressed on writing and decompressed on reading. A first attempt is at LUCENE-1410, and one of the choices I had there was whether or not to use NIO

Re: Possible IndexInput optimization

2009-03-29 Thread Earwin Burrfoot
In my case I have to switch to MMap/Buffers, Java behaves ugly with 8Gb heaps. Do you mean that because garbage collection does not perform well on these larger heaps, one should avoid to create arrays to have heaps of that size, and rather use (direct) MMap/Buffers? Yes, exactly. Keeping big

Re: NIO.2

2009-03-28 Thread Earwin Burrfoot
On Sat, Mar 28, 2009 at 16:44, Michael Busch busch...@gmail.com wrote: NIO.2 sounds great. Though, it will probably take a pretty long time before we can switch Lucene to Java 1.7 :( We could write a (contrib) module that we don't ship together with the core that has a Directory

Re: NIO.2

2009-03-28 Thread Earwin Burrfoot
I think having async IO will be great, though I wonder how we would change Lucene to take advantage of it.  It ought to gain us concurrency (eg we can score last chunk while we have an io request out to retrieve next chunk, of term docs / positions / etc.). A presentation given above

Possible IndexInput optimization

2009-03-28 Thread Earwin Burrfoot
While drooling over MappedBigByteBuffer, which we'll (hopefully) see in JDK7, I revisited my own Directory code and noticed a certain peculiarity, shared by Lucene core classes: Each and every IndexInput implementation only implements readByte() and readBytes(), never trying to override

[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2009-03-27 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689867#action_12689867 ] Earwin Burrfoot commented on LUCENE-831: Adding to Tim, I'd like to see the ability

Re: Is TopDocCollector's collect() implementation correct?

2009-03-26 Thread Earwin Burrfoot
I'd say it is a bad name. Raw hit is way far from being result of a search. If you're already breaking back compat with 3.0 release (by incrementing java version), maybe it's worth breaking it in some more places, just so ugly names like MRHC and special code paths that check for n-year-old

Re: Is TopDocCollector's collect() implementation correct?

2009-03-26 Thread Earwin Burrfoot
BTW, I like the name ResultsCollector, as it's just like HitCollector, but does not commit too much to hits .. i.e., facets aren't hits ... I think? What this class consumes and what it produces is a totally different thing. HitCollector always collects 'hits', and then produces whatever

Re: Is TopDocCollector's collect() implementation correct?

2009-03-26 Thread Earwin Burrfoot
On Thu, Mar 26, 2009 at 08:44:57AM -0400, Michael McCandless wrote: do you have an alternative? Brainstorming  * Harvester  * Trawler  * HitPicker  * HitGrabber Marvin Humphrey NitPicker - that absolutely made my day -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home

Re: Is TopDocCollector's collect() implementation correct?

2009-03-26 Thread Earwin Burrfoot
I think ResultsCollector (or maybe ResultCollector) is my favorite so far... But how about simply Collector?  (I realize it's very generic... but we don't collect anything else in Lucene?). That's exactly what I'm using in my app - abstract class Collector extends HitCollector, that serves as

Re: Modularization

2009-03-23 Thread Earwin Burrfoot
- contrib has always had a lower bar and stuff was committed under that lower bar - there should be no blanket promotion. - contrib items may have different dependencies... putting it all under the same source root can make a developers job harder - many contrib items are less related to

Re: Modularization

2009-03-23 Thread Earwin Burrfoot
On Mon, Mar 23, 2009 at 22:13, Mark Miller markrmil...@gmail.com wrote: Earwin Burrfoot wrote: - contrib has always had a lower bar and stuff was committed under that lower bar - there should be no blanket promotion. - contrib items may have different dependencies... putting it all under

Re: move TrieRange* to core?

2009-03-18 Thread Earwin Burrfoot
On Wed, Mar 18, 2009 at 23:08, Andi Vajda va...@osafoundation.org wrote: On Mar 18, 2009, at 13:01, Michael McCandless luc...@mikemccandless.com wrote: I think we should move TrieRange* into core before 2.9? It's received a lot of attention, from both developers (Uwe & Yonik did lots of

Re: extending the query parser

2009-03-12 Thread Earwin Burrfoot
Take ANTLR and roll your own query parser from scratch? It's pretty easy. On Thu, Mar 12, 2009 at 04:24, Candide Kemmler cand...@palacehotel.org wrote: Hello, I'm looking at a way to extend the lucene query parser to allow for semantic computations in IEML space (see http://ieml.org). What

Re: extending the query parser

2009-03-12 Thread Earwin Burrfoot
On Thu, Mar 12, 2009 at 21:16, Candide Kemmler cand...@palacehotel.org wrote: On 11 Mar 2009, at 23:21, Earwin Burrfoot wrote: Take ANTLR and roll your own query parser from scratch? It's pretty easy. Hi Earwin, That would be fantastic, since our parser is already specified as an ANTLR

Re: Sorting and multi-term fields again

2009-03-02 Thread Earwin Burrfoot
My opinion is that if you want to enable sorting on multi-term fields, you need a pluggable selection policy. I see someone wanting biggest/smallest term represent a document when sorting. Or maybe a function of the terms. On Mon, Mar 2, 2009 at 20:34, Uwe Schindler u...@thetaphi.de wrote: I

Re: Bitmap index

2009-02-27 Thread Earwin Burrfoot
Maybe we can use the compression technology mentioned in this Wikipedia article to further optimize filters and their DocIdSetIterators. We already use WAH-encoded bitmap filters over here for roughly a year. And yes, they are nice. -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)

Re: Integrating Language Models into Lucene

2009-02-25 Thread Earwin Burrfoot
Have you looked at MG4J (http://mg4j.dsi.unimi.it/)? Last time I did, it looked like an opposite of lucene - nice and up-to-date algorithmics, but hard to apply to complex real-world tasks. On Thu, Feb 26, 2009 at 04:21, Koren Krupko krup...@gmail.com wrote: Hello Lucene Developers! My name

[jira] Commented: (LUCENE-1524) True reverse sorting

2009-01-26 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667175#action_12667175 ] Earwin Burrfoot commented on LUCENE-1524: - If you're not using a deterministic tie

Re: wiki

2009-01-24 Thread Earwin Burrfoot
Looks like Czech to my slavic eyes :) On Sat, Jan 24, 2009 at 18:14, Paul Elschot paul.elsc...@xs4all.nl wrote: On Saturday 24 January 2009 15:29:12 Grant Ingersoll wrote: Anyone know what this is: http://wiki.apache.org/lucene-java/IndeksRe%C4%8Di After looking around on the lucene wiki a

[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-12 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662984#action_12662984 ] Earwin Burrfoot commented on LUCENE-1345: - What about complete merge of filters

Re: 2.9/3.0 plan Java 1.5

2008-12-14 Thread Earwin Burrfoot
2. Generics' utility is not limited to collections, we use it for type-safe index fields storage/querying for example. Define field: FieldInfo<EmployerCategory> EMPLOYER_CATEGORY = field(ENUM(EmployerCategory.class), INDEX); Store it: add(vacancy.getEmployerCategory(), EMPLOYER_CATEGORY);

Re: 2.9/3.0 plan Java 1.5

2008-12-14 Thread Earwin Burrfoot
For return parameters, I think you should return the most specific interface you can give to the user (without fixing to something you may change in future versions). Maybe a user wants to use the return value of getFields() as List? If it's only Iterable, he cannot e.g. access the list

Re: Java logging in Lucene

2008-12-08 Thread Earwin Burrfoot
The common problem with native logging, log4j and slf4j (logback impl) is that they are totally unsuitable for actually logging something. They do good work checking if the logging can be avoided, but use almost-global locking if you really try to write this line to a file. My research shows there

Re: Java logging in Lucene

2008-12-08 Thread Earwin Burrfoot
- you're not into performance, but for debugging. Anyway, as far as SLF4J goes, I've written a patch using it, and replacing infoStream. I'm about to open an issue and submit the patch, for everyone to review. We can continue the discussion there. Shai On Mon, Dec 8, 2008 at 10:13 AM, Earwin

Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

2008-12-08 Thread Earwin Burrfoot
Building your own parser with Antlr is really easy. Using Ragel is harder, but yields insane parsing performance. Is there any reason to worry about library-bundled parsers if you're making something more complex than a college project? On Tue, Dec 9, 2008 at 01:49, robert engels [EMAIL

Re: [jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2008-12-04 Thread Earwin Burrfoot
It would be cool to be able to explicitly list subreaders that were added/removed as a result of reopen(), or have some kind of notification mechanism. We have filter caches, custom field/sort caches here and they are all reader-bound. Currently warm-up delay is negated by reopening and warming up

PhraseQuery with non-strict offsets

2008-12-03 Thread Earwin Burrfoot
Not sure if this belongs to java-dev or java-user, correct me if I'm wrong. I need a variation of PhraseQuery for which position difference between adjacent terms shouldn't match exactly, but in equals-or-less fashion. Example: a+1 b+1 c+3 should match a b c, a b e c, a b e e c should not match a
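One possible reading of the requested matching rule, sketched over a plain token array rather than Lucene postings. The interpretation is an assumption: each number is taken as the maximum allowed position difference from the previously matched term, and the first term's number is ignored. LoosePhraseMatcher and its method names are hypothetical; this is not a Query implementation and does not touch Lucene's API.

```java
// Hypothetical matcher illustrating "equals-or-less" position differences.
final class LoosePhraseMatcher {

    /**
     * Returns true if terms occur in order in tokens and each term i > 0 lies
     * between 1 and maxGaps[i] positions after the previously matched term.
     * maxGaps[0] is ignored (assumption: the first term has nothing to be relative to).
     */
    static boolean matches(String[] tokens, String[] terms, int[] maxGaps) {
        if (terms.length == 0) {
            return true;  // an empty phrase matches trivially
        }
        for (int start = 0; start < tokens.length; start++) {
            if (tokens[start].equals(terms[0]) && matchFrom(tokens, terms, maxGaps, start, 1)) {
                return true;
            }
        }
        return false;
    }

    // Tries every admissible position for terms[termIdx], backtracking on failure.
    private static boolean matchFrom(String[] tokens, String[] terms, int[] maxGaps,
                                     int prevPos, int termIdx) {
        if (termIdx == terms.length) {
            return true;  // all terms placed
        }
        int limit = Math.min(tokens.length - 1, prevPos + maxGaps[termIdx]);
        for (int pos = prevPos + 1; pos <= limit; pos++) {
            if (tokens[pos].equals(terms[termIdx])
                    && matchFrom(tokens, terms, maxGaps, pos, termIdx + 1)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this reading, terms {a, b, c} with limits {1, 1, 3} accept "a b c", "a b e c" and "a b e e c", as in the message's example, and reject a document where b does not directly follow a. A real implementation would walk term positions from the index instead of scanning tokens.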

[jira] Commented: (LUCENE-1461) Cached filter for a single term field

2008-11-26 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650974#action_12650974 ] Earwin Burrfoot commented on LUCENE-1461: - bq. RangeQuery no longer relies

[jira] Issue Comment Edited: (LUCENE-1470) Add TrieRangeQuery to contrib

2008-11-26 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651014#action_12651014 ] earwin edited comment on LUCENE-1470 at 11/26/08 6:35 AM:

[jira] Issue Comment Edited: (LUCENE-1470) Add TrieRangeQuery to contrib

2008-11-26 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651014#action_12651014 ] earwin edited comment on LUCENE-1470 at 11/26/08 6:37 AM:

[jira] Issue Comment Edited: (LUCENE-1461) Cached filter for a single term field

2008-11-26 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650974#action_12650974 ] earwin edited comment on LUCENE-1461 at 11/26/08 6:37 AM:

[jira] Commented: (LUCENE-1470) Add TrieRangeQuery to contrib

2008-11-26 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651056#action_12651056 ] Earwin Burrfoot commented on LUCENE-1470: - bq. in base 2^15, you only have 4

[jira] Commented: (LUCENE-1470) Add TrieRangeQuery to contrib

2008-11-26 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651070#action_12651070 ] Earwin Burrfoot commented on LUCENE-1470: - bq. But the encoding format

[jira] Commented: (LUCENE-1461) Cached filter for a single term field

2008-11-25 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650882#action_12650882 ] Earwin Burrfoot commented on LUCENE-1461: - Somewhat off topic, but nonetheless, my
