, or is it an
unnecessary complication?
Like:
indexReader.bindPlugin(instance).to(Iface1.class, Iface2.class);
And then:
indexReader.plugin(Iface1.class) == indexReader.plugin(Iface2.class)
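A minimal sketch of how such a binding could behave, assuming a plain map from interface class to instance. PluginRegistry, Binder, and BothPlugin are hypothetical names used only for illustration; nothing here is an existing IndexReader API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical marker interfaces and a plugin that implements both.
interface Iface1 {}
interface Iface2 {}
class BothPlugin implements Iface1, Iface2 {}

// Hypothetical stand-in for the plugin-holding part of an IndexReader.
class PluginRegistry {
    private final Map<Class<?>, Object> bindings = new HashMap<>();

    // Intermediate object so the call reads bindPlugin(x).to(A.class, B.class).
    final class Binder {
        private final Object instance;

        Binder(Object instance) { this.instance = instance; }

        void to(Class<?>... ifaces) {
            for (Class<?> iface : ifaces) {
                // Reject bindings the instance cannot actually satisfy.
                if (!iface.isInstance(instance)) {
                    throw new IllegalArgumentException(
                        instance.getClass().getName() + " does not implement " + iface.getName());
                }
                bindings.put(iface, instance);
            }
        }
    }

    Binder bindPlugin(Object instance) { return new Binder(instance); }

    // Returns the instance bound to the interface, or null if none was bound.
    <T> T plugin(Class<T> iface) { return iface.cast(bindings.get(iface)); }
}
```

With this shape, one instance bound to two interfaces is returned for both lookups, so the identity comparison above holds.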
Mike
On Sun, Apr 12, 2009 at 7:34 PM, Earwin Burrfoot ear...@gmail.com wrote:
To support my dream
Can we outline some requirements for the plugin API?
Do we want to attach/detach them to IndexReader after it is created,
or only during construction?
I think I'd lean towards only at construction. Seems dangerous to
allow swap in/out at some later time.
I have several points pro-runtime:
On Mon, Apr 13, 2009 at 17:14, Michael McCandless
luc...@mikemccandless.com wrote:
On Mon, Apr 13, 2009 at 9:02 AM, Earwin Burrfoot ear...@gmail.com wrote:
I think I'd lean towards only at construction. Seems dangerous to
allow swap in/out at some later time.
I have several points pro
[
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698215#action_12698215
]
Earwin Burrfoot commented on LUCENE-831:
I'm using a similar approach.
There's
[
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698233#action_12698233
]
Earwin Burrfoot commented on LUCENE-831:
bq. I guess you would just have to set
[
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698243#action_12698243
]
Earwin Burrfoot commented on LUCENE-831:
bq. At least, we should upgrade
[
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698248#action_12698248
]
Earwin Burrfoot commented on LUCENE-831:
bq. I'm kind of worried that any change
To support my dream of kicking fieldCache out of the core and to add
some extensibility to Lucene, I want to introduce IndexReaderPlugins.
Rough pseudocode follows:
interface IndexReaderPlugin {
void attach(SegmentReader reader);
void detach(SegmentReader reader);
void
Earwin Burrfoot wrote:
Benefits are numerous. We get rid of alien code like:
+++ src/java/org/apache/lucene/index/SegmentReader.java (working copy)
@@ -83,6 +86,8 @@
+ protected ValueSource valueSource;
+
@@ -555,6 +560,8 @@
+
+ valueSource = new CachingValueSource(this, new
[
https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696977#action_12696977
]
Earwin Burrfoot commented on LUCENE-1231:
-
I can share my design for doc loading
Currently, when we're seeking a given Term, it does a binary search
across all term space, including terms belonging to other fields.
I propose augmenting fields file with two pointers (firstTerm,
lastTerm) for each field. That reduces range we need to search, and
instead of comparing Terms we
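The range-narrowing idea can be sketched roughly as follows. This is not Lucene's term dictionary code: FieldRangeSeek and the flat String[] layout are assumptions made for illustration, and comparing only term text inside the range is a choice of this sketch, since the message is cut off before it says what replaces the Term comparison.

```java
import java.util.Arrays;

// Hypothetical flattened term dictionary: terms sorted by field, then by text.
class FieldRangeSeek {
    /**
     * Looks for target among the terms of one field only.
     * firstTerm and lastTerm are the inclusive bounds of that field's terms.
     * Returns the term's index, or -1 if the field does not contain it.
     */
    static int seek(String[] texts, int firstTerm, int lastTerm, String target) {
        // Arrays.binarySearch takes an exclusive upper bound, hence lastTerm + 1.
        int idx = Arrays.binarySearch(texts, firstTerm, lastTerm + 1, target);
        return idx >= 0 ? idx : -1;
    }
}
```

Because every term in the range belongs to the same field, the search never has to look at terms from other fields.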
On Thu, Apr 9, 2009 at 00:14, Michael McCandless
luc...@mikemccandless.com wrote:
On Wed, Apr 8, 2009 at 3:46 PM, Earwin Burrfoot ear...@gmail.com wrote:
Currently, when we're seeking a given Term, it does a binary search
across all term space, including terms belonging to other fields.
I
On Thu, Apr 9, 2009 at 02:01, Uwe Schindler u...@thetaphi.de wrote:
Also, on the other topic - how hard is it to boost
TermEnum.skipTo(term) speed to IndexReader.terms(term) level? Would be
nice for TrieRangeFilter and probably some other filters.
I think all that's needed is to implement
[
https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696511#action_12696511
]
Earwin Burrfoot commented on LUCENE-1584:
-
bq. I'd like to step back
[
https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696583#action_12696583
]
Earwin Burrfoot commented on LUCENE-1584:
-
bq. The problem is you need more
Lucene is in fact already available through Maven. POMs do exist; all
that is left is to find out who manages and releases them.
On Thu, Apr 2, 2009 at 01:40, Douglas Campos doug...@theros.info wrote:
+1 on maven, and I volunteer to aid in the creation of the maven project
files (pom's)
On Wed,
A while ago I tried overriding the read* methods in BufferedIndexInput like
this:
I'm still surprised there was no performance improvement at all. Maybe
something was wrong with my test and I should try it again...
For BufferedIndexInput improvement should be
Earwin,
I did not experiment lately, but I'd like to add a general compressed
integer array to the basic types in an index, that would be compressed
on writing and decompressed on reading.
A first attempt is at LUCENE-1410, and one of the choices I had there
was whether or not to use NIO
In my case I have to switch to MMap/Buffers, Java behaves ugly with
8Gb heaps.
Do you mean that because garbage collection does not perform well
on these larger heaps, one should avoid creating arrays that lead to heaps
of that size, and rather use (direct) MMap/Buffers?
Yes, exactly. Keeping big
On Sat, Mar 28, 2009 at 16:44, Michael Busch busch...@gmail.com wrote:
NIO.2 sounds great.
Though, it will probably take a pretty long time before we can switch Lucene
to Java 1.7 :(
We could write a (contrib) module that we don't ship together with the core
that has a Directory
I think having async IO will be great, though I wonder how we would
change Lucene to take advantage of it. It ought to gain us
concurrency (eg we can score last chunk while we have an io request
out to retrieve next chunk, of term docs / positions / etc.).
A presentation given above
While drooling over MappedBigByteBuffer, which we'll (hopefully) see
in JDK7, I revisited my own Directory code and noticed a certain
peculiarity, shared by Lucene core classes:
Each and every IndexInput implementation only implements readByte()
and readBytes(), never trying to override
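To illustrate the general pattern, here is a self-contained sketch over java.nio.ByteBuffer. ByteInput, BufferInput, and BulkBufferInput are hypothetical classes, not Lucene's IndexInput hierarchy, and readInt is used only as an example because the message is cut off before naming the methods it has in mind.

```java
import java.nio.ByteBuffer;

// Hypothetical base class: multi-byte reads are built from single-byte reads.
abstract class ByteInput {
    abstract byte readByte();

    // Default: assemble a big-endian int from four readByte() calls.
    int readInt() {
        return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
             | ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
    }
}

// Implements only readByte() and inherits the byte-by-byte readInt().
class BufferInput extends ByteInput {
    protected final ByteBuffer buffer;

    BufferInput(ByteBuffer buffer) { this.buffer = buffer; }

    @Override
    byte readByte() { return buffer.get(); }
}

// Overrides readInt() with a single bulk read from the buffer.
class BulkBufferInput extends BufferInput {
    BulkBufferInput(ByteBuffer buffer) { super(buffer); }

    // ByteBuffer is big-endian by default, so this matches the base class.
    @Override
    int readInt() { return buffer.getInt(); }
}
```

Both variants return the same values; whether the override is measurably faster depends on the JVM and the workload, so it would need benchmarking.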
[
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12689867#action_12689867
]
Earwin Burrfoot commented on LUCENE-831:
Adding to Tim, I'd like to see the ability
I'd say it is a bad name. A raw hit is a long way from being the result of a search.
If you're already breaking back compat with the 3.0 release (by
incrementing the Java version), maybe it's worth breaking it in some more
places, just so ugly names like MRHC and special code paths that check
for n-year-old
BTW, I like the name ResultsCollector, as it's just like HitCollector, but
does not commit too much to hits .. i.e., facets aren't hits ... I think?
What this class consumes and what it produces is a totally different
thing. HitCollector always collects 'hits', and then produces whatever
On Thu, Mar 26, 2009 at 08:44:57AM -0400, Michael McCandless wrote:
do you have an alternative?
Brainstorming
* Harvester
* Trawler
* HitPicker
* HitGrabber
Marvin Humphrey
NitPicker - that absolutely made my day
--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home
I think ResultsCollector (or maybe ResultCollector) is my favorite so far...
But how about simply Collector? (I realize it's very generic... but
we don't collect anything else in Lucene?).
That's exactly what I'm using in my app - abstract class Collector
extends HitCollector, that serves as
- contrib has always had a lower bar and stuff was committed under
that lower bar - there should be no blanket promotion.
- contrib items may have different dependencies... putting it all
under the same source root can make a developer's job harder
- many contrib items are less related to
On Mon, Mar 23, 2009 at 22:13, Mark Miller markrmil...@gmail.com wrote:
Earwin Burrfoot wrote:
- contrib has always had a lower bar and stuff was committed under
that lower bar - there should be no blanket promotion.
- contrib items may have different dependencies... putting it all
under
On Wed, Mar 18, 2009 at 23:08, Andi Vajda va...@osafoundation.org wrote:
On Mar 18, 2009, at 13:01, Michael McCandless luc...@mikemccandless.com
wrote:
I think we should move TrieRange* into core before 2.9?
It's received a lot of attention, from both developers (Uwe and Yonik did
lots of
Take ANTLR and roll your own query parser from scratch? It's pretty easy.
On Thu, Mar 12, 2009 at 04:24, Candide Kemmler cand...@palacehotel.org wrote:
Hello,
I'm looking at a way to extend the lucene query parser to allow for semantic
computations in IEML space (see http://ieml.org). What
On Thu, Mar 12, 2009 at 21:16, Candide Kemmler cand...@palacehotel.org wrote:
On 11 Mar 2009, at 23:21, Earwin Burrfoot wrote:
Take ANTLR and roll your own query parser from scratch? It's pretty easy.
Hi Earwin,
That would be fantastic, since our parser is already specified as an ANTLR
My opinion is that if you want to enable sorting on multi-term fields,
you need a pluggable selection policy. I see someone wanting
biggest/smallest term represent a document when sorting. Or maybe a
function of the terms.
On Mon, Mar 2, 2009 at 20:34, Uwe Schindler u...@thetaphi.de wrote:
I
Maybe we can use the
compression technology mentioned in this Wikipedia article to further
optimize filters and their DocIdSetIterators.
We already use WAH-encoded bitmap filters over here for roughly a
year. And yes, they are nice.
--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Have you looked at MG4J (http://mg4j.dsi.unimi.it/)?
Last time I did, it looked like the opposite of Lucene - nice and
up-to-date algorithmics, but hard to apply to complex real-world
tasks.
On Thu, Feb 26, 2009 at 04:21, Koren Krupko krup...@gmail.com wrote:
Hello Lucene Developers!
My name
[
https://issues.apache.org/jira/browse/LUCENE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12667175#action_12667175
]
Earwin Burrfoot commented on LUCENE-1524:
-
If you're not using a deterministic tie
Looks like Czech to my Slavic eyes :)
On Sat, Jan 24, 2009 at 18:14, Paul Elschot paul.elsc...@xs4all.nl wrote:
On Saturday 24 January 2009 15:29:12 Grant Ingersoll wrote:
Anyone know what this is:
http://wiki.apache.org/lucene-java/IndeksRe%C4%8Di
After looking around on the lucene wiki a
[
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662984#action_12662984
]
Earwin Burrfoot commented on LUCENE-1345:
-
What about complete merge of filters
2. Generics' utility is not limited to collections, we use it for
type-safe index fields storage/querying for example.
Define field:
FieldInfo<EmployerCategory> EMPLOYER_CATEGORY =
field(ENUM(EmployerCategory.class), INDEX);
Store it:
add(vacancy.getEmployerCategory(), EMPLOYER_CATEGORY);
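A rough, self-contained sketch of how a generically typed field can make the store call type-safe. FieldInfo and EmployerCategory take their names from the snippet, but their bodies here, the AGENCY and DIRECT constants, and the TypedDocument class are assumptions for illustration and not the author's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical enum values; the original message does not list them.
enum EmployerCategory { DIRECT, AGENCY }

// A field descriptor that remembers the Java type of its values.
final class FieldInfo<T> {
    final String name;
    final Function<T, String> encoder; // turns a typed value into its indexed form

    FieldInfo(String name, Function<T, String> encoder) {
        this.name = name;
        this.encoder = encoder;
    }
}

// Hypothetical document builder: add() only accepts values matching the field's type.
final class TypedDocument {
    private final Map<String, String> values = new LinkedHashMap<>();

    <T> void add(T value, FieldInfo<T> field) {
        values.put(field.name, field.encoder.apply(value));
    }

    String stored(String name) { return values.get(name); }
}
```

With these definitions, passing a String where an EmployerCategory is expected fails at compile time instead of producing a badly encoded field at runtime.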
For return parameters, I think you should return the most specific interface
you can give to the user (without fixing to something you may change in
future versions). Maybe a user wants to use the return value of getFields()
as List? If it's only Iterable, he cannot e.g. access the list
The common problem with native logging, log4j and slf4j (logback impl)
is that they are totally unsuitable for actually logging something.
They do good work checking if the logging can be avoided, but use
almost-global locking if you really try to write this line to a file.
My research shows there
- you're not into performance, but for debugging.
Anyway, as far as SLF4J goes, I've written a patch using it, and replacing
infoStream. I'm about to open an issue and submit the patch, for everyone to
review. We can continue the discussion there.
Shai
On Mon, Dec 8, 2008 at 10:13 AM, Earwin
Building your own parser with Antlr is really easy. Using Ragel is
harder, but yields insane parsing performance.
Is there any reason to worry about library-bundled parsers if you're
making something more complex than a college project?
On Tue, Dec 9, 2008 at 01:49, robert engels [EMAIL
It would be cool to be able to explicitly list subreaders that were
added/removed as a result of reopen(), or have some kind of
notification mechanism.
We have filter caches, custom field/sort caches here and they are all
reader-bound. Currently warm-up delay is negated by reopening and
warming up
Not sure if this belongs to java-dev or java-user, correct me if I'm wrong.
I need a variation of PhraseQuery for which position difference
between adjacent terms shouldn't match exactly, but in equals-or-less
fashion.
Example:
a+1 b+1 c+3
should match
a b c, a b e c, a b e e c
should not match
a
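One reading of the example above is that each number is the maximum allowed position increment from the previous phrase term, with the first number ignored. Under that assumption, the matching rule can be sketched over a plain token list as below. RelaxedPhrase is a hypothetical helper, not a Lucene Query implementation, and it scans tokens directly rather than using postings.

```java
import java.util.List;

class RelaxedPhrase {
    /**
     * Returns true if terms occur in order and terms[i] appears at most
     * maxGap[i] positions after terms[i - 1]. maxGap[0] is ignored.
     */
    static boolean matches(List<String> tokens, String[] terms, int[] maxGap) {
        for (int start = 0; start < tokens.size(); start++) {
            if (tokens.get(start).equals(terms[0])
                    && matchFrom(tokens, start, terms, maxGap, 1)) {
                return true;
            }
        }
        return false;
    }

    // Tries every allowed position for terms[i] after the previous match at prev.
    private static boolean matchFrom(List<String> tokens, int prev,
                                     String[] terms, int[] maxGap, int i) {
        if (i == terms.length) {
            return true;
        }
        int limit = Math.min(tokens.size() - 1, prev + maxGap[i]);
        for (int pos = prev + 1; pos <= limit; pos++) {
            if (tokens.get(pos).equals(terms[i])
                    && matchFrom(tokens, pos, terms, maxGap, i + 1)) {
                return true;
            }
        }
        return false;
    }
}
```

Called with terms a, b, c and gaps 1, 1, 3, this accepts the three matching token sequences listed in the message.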
[
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12650974#action_12650974
]
Earwin Burrfoot commented on LUCENE-1461:
-
bq. RangeQuery no longer relies
[
https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12651014#action_12651014
]
earwin edited comment on LUCENE-1470 at 11/26/08 6:35 AM:
[
https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12651014#action_12651014
]
earwin edited comment on LUCENE-1470 at 11/26/08 6:37 AM:
[
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12650974#action_12650974
]
earwin edited comment on LUCENE-1461 at 11/26/08 6:37 AM:
[
https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12651056#action_12651056
]
Earwin Burrfoot commented on LUCENE-1470:
-
bq. in base 2^15, you only have 4
[
https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12651070#action_12651070
]
Earwin Burrfoot commented on LUCENE-1470:
-
bq. But the encoding format
[
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12650882#action_12650882
]
Earwin Burrfoot commented on LUCENE-1461:
-
Somewhat off topic, but nonetheless, my