Re: non-overlapping Span queries

2006-12-07 Thread Paul Elschot
On Thursday 07 December 2006 22:57, Ruslan Sivak wrote: > I see back in Jul 2005 there was a thread about SpanNearQueries which > were overlapping. > > http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200507.mbox/[EMAIL PROTECTED] > > A fix was posted by Paul El

Re: non-overlapping Span queries

2006-12-08 Thread Paul Elschot
, and to have ordered span queries without overlap. This could be done by replacing the trunk NearSpansOrdered.java by the one at Lucene issue 413. Regards, Paul Elschot - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: 15 minute hang in IndexInput.clone() involving finalizers

2006-12-16 Thread Paul Elschot
able for the copying. This might be another source of delays, like the network accesses mentioned earlier. The way around this is to make sure the swap space is never needed, i.e. limit the memory to the JVM, or make more physical RAM availab

Re: Beyond Lucene 2.0 Index Design

2007-01-12 Thread Paul Elschot
require the impact-sorted index representation. A weighted filter clause could already be used as a prescored clause in a boolean query. That makes weighted filters a useful addition to the current search methods. Regards, Paul Elschot

Fwd: Re: svn commit: r525669 - /lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer.java

2007-04-05 Thread Paul Elschot
Once more, now to java-dev instead of to java-commits: Otis, Can I ask which tool you used to catch this, and the previous one? Regards, Paul Elschot On Thursday 05 April 2007 03:06, [EMAIL PROTECTED] wrote: > Author: otis > Date: Wed Apr 4 18:06:16 2007 > New Revision: 525669 >

TestIndexWriter.testAddIndexOnDiskFull failed

2007-04-05 Thread Paul Elschot
) [junit] Is there anything I can do to make this test pass locally? Regards, Paul Elschot

Re: TestIndexWriter.testAddIndexOnDiskFull failed

2007-04-06 Thread Paul Elschot
On Thursday 05 April 2007 20:11, Michael McCandless wrote: > > "Paul Elschot" <[EMAIL PROTECTED]> wrote: > > At revision 525912: ... > > I just got a fresh checkout and the test is passing. That's one scary output > from the tes

Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-04-10 Thread Paul Elschot
mething like. It looks easy to add it, but I may be missing something: > BooleanQuery.add(Matcher mtr, > BooleanClause.Occur occur) That's one of the things I'd like to see added. It would allow a single ConjunctionScorer to do a filtered

Re: Why ORScorer delayed init?

2007-04-10 Thread Paul Elschot
did not really like it at the time either. I thought it would avoid accessing the index as much as possible before actually doing a search, but I did not verify whether that is important. In case it is not, any simplification is of course welcome. Regards, Paul Elschot

Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730

2007-04-13 Thread Paul Elschot
ification of IndexSearcher is not in the LUCENE-730 patch, because LUCENE-584 is not committed. At the moment I don't know precisely what IndexSearcher would look like after LUCENE-730. With LUCENE-730 BooleanScorer.setUseScorer14() could also be removed/deprecated, but that is also not yet in

Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730

2007-04-14 Thread Paul Elschot
stead of doing the filtering in IndexSearcher, but that still needs to be added. Regards, Paul Elschot

Re: [jira] Resolved: (LUCENE-730) Restore top level disjunction performance

2007-04-18 Thread Paul Elschot
sense to deprecate the setUseScorer14() method and the corresponding get...() method. If you want a patch for that, I'll gladly provide one. Actually I prefer to have these methods removed altogether now, but that is probably not compatible with

Re: TestIndexWriter.testAddIndexOnDiskFull failed

2007-05-15 Thread Paul Elschot
Michael, In spite of my intention, I'm afraid this won't make it to the top of my todo list within reasonable time ... Regards, Paul Elschot On Friday 06 April 2007 13:32, Paul Elschot wrote: > On Thursday 05 April 2007 20:11, Michael McCandless wrote: > > > > "P

Re: Multi-field distinct query

2007-05-16 Thread Paul Elschot
tering, but if it does, decoupling filters from bitsets might help: http://issues.apache.org/jira/browse/LUCENE-584 Regards, Paul Elschot

Re: Tests, Contribs, and Releases

2007-05-17 Thread Paul Elschot
olicy of removing a contrib before a release when its unit tests fail could make a nice fit for that. Regards, Paul Elschot

Re: Tests, Contribs, and Releases

2007-05-19 Thread Paul Elschot
On Friday 18 May 2007 18:52, Doug Cutting wrote: ... > > Paul Elschot wrote: > > When a contrib fails and is not fixed, that might be a good reason to > > remove it from the distribution. With such a policy the present contribs would > > also stay up to date, provided the

Re: Documentation Brainstorming

2007-05-30 Thread Paul Elschot
mo. Currently it is not clear in the javadocs whether a class belongs to core or contrib. Having separate javadocs would probably improve that. I have no experience in linking between javadoc "packages", so I have no suggestion on how to make such a separation. Regards, Paul Elschot

Re: enabling java assertions in the tests

2007-05-31 Thread Paul Elschot
have the > same effect, but it doesn't, at least not for me. > > Adding: > > > > > to the task would enable assertions during tests > regardless of ANT_OPTS variable (and hopefully on all OSs). > > Anyone sees a problem with addi

Re: RLE Compressing bit vectors, just toughts

2007-08-04 Thread Paul Elschot
(how about 0xFF ?) to start an encoded run length encoded series of bits. For example 0xFF would be followed by the next delta as a VInt, and by the run length as the next VInt. You might also try and generalize the bytes of VInt to nibbles (half bytes).

Re: RLE Compressing bit vectors, just toughts

2007-08-04 Thread Paul Elschot
en it works only slightly better than reasonable, there is probably no need to try any compression tricks on its input, except for delta encoding to reduce the total size of the uncompressed data stream. Regards, Paul Elschot On Saturday 04 August 2007 20:26, eks dev wrote: > Hi Paul, th

Re: RLE Compressing bit vectors, just toughts

2007-08-05 Thread Paul Elschot
Scorer does now, but without the frequency information. Regards, Paul Elschot

Fwd: Decouple Filter from BitSet: API change and xml query parser

2007-08-10 Thread Paul Elschot
tFilter instead */ public BitSet bits(IndexReader) {return null;} abstract public Matcher getMatcher(IndexReader); } Finally, are DocIdSet and DocIdSetIterator currently part of Lucene? I don't know how to go about these. Regards, Paul Elschot -- Forwarded Message --

Re: Fwd: Decouple Filter from BitSet: API change and xml query parser

2007-08-10 Thread Paul Elschot
e take the cost of calling > newFilter.getDocIdSet(reader) and cache that result. > > This is effectively how the remote FilterManager and the XMLQueryParser > filter caching stuff work with Filters/Bitsets today. > > Hope this makes sense. It

Re: Fwd: Decouple Filter from BitSet: API change and xml query parser

2007-08-10 Thread Paul Elschot
> > public class Filter { > abstract public Matcher getMatcher(IndexReader); > } > > The patch proposes to do this by moving all current use of Filter to > BitSetFilter: > > public class BitSetFilter extends Filter { > abstract public BitSet bits(IndexReader);

Re: Spans questions

2007-08-30 Thread Paul Elschot
The split into ordered and unordered was a split into (ordered + non overlapping) and (unordered + overlapping), and this is what you see in your test cases for unordered spans. To totally clear the semantics of NearSpans, it is probably a good idea to make all four cases for the subspans separat

Possible thread safety problem in CachingWrapperFilter

2007-09-04 Thread Paul Elschot
e the cache: synchronized(this) { if (cache == null) { cache = new WeakHashMap(); } } and should the cache accesses also use synchronized(this) ? Regards, Paul Elschot

Re: Possible thread safety problem in CachingWrapperFilter

2007-09-04 Thread Paul Elschot
nchronized(this) blocks with the first one containing the lazy initialisation of the cache. A patch for this will probably conflict with LUCENE-584 on CachingWrapperFilter, so I'd rather not provide a patch myself this time. Regards, Paul Elschot

Re: Possible thread safety problem in CachingWrapperFilter

2007-09-05 Thread Paul Elschot
he first synchronized(this) block. I'll post a new patch at LUCENE-584 soon, and this will have two synchronized(this) blocks in CachingWrapperFilter with a lazy initialisation of cache in the first block. In the patch, the caching has moved from the bits() method to the getMatcher() method of
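The two-block pattern discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the LUCENE-584 patch: the class name `LazyWeakCache`, its `get` method, and the `compute` callback are hypothetical stand-ins for the filter wrapper and its expensive per-reader computation.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a caching filter wrapper; not Lucene's CachingWrapperFilter. */
class LazyWeakCache<K, V> {
    // Created lazily; every access is guarded by synchronized(this).
    private Map<K, V> cache;

    V get(K key, Function<K, V> compute) {
        // First block: lazy initialisation of the cache, then the lookup.
        synchronized (this) {
            if (cache == null) {
                cache = new WeakHashMap<>();
            }
            V cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
        }
        // The expensive computation runs outside the lock, so two threads
        // may occasionally compute the same value; the last put wins.
        V value = compute.apply(key);
        // Second block: store the computed value.
        synchronized (this) {
            cache.put(key, value);
        }
        return value;
    }
}
```

Keeping the computation outside the lock trades an occasional duplicate computation for not blocking other threads while one value is being built.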

Re: Spans questions

2007-09-16 Thread Paul Elschot
treated as a case closely related to SpanNotQuery. Regards, Paul Elschot On Sunday 16 September 2007 04:43, Grant Ingersoll wrote: > > On Aug 30, 2007, at 2:42 PM, Grant Ingersoll wrote: > > > > > On Aug 30, 2007, at 11:21 AM, Paul Elschot wrote: > > > >> Gra

Re: Span queries, API and difficulties

2007-09-23 Thread Paul Elschot
ors AND, OR, ANDoptional, and ANDNOT. For some cases of top level OR, BooleanScorer can also be a target scorer when scoring out of document order is allowed. Most of the complexity of BooleanScorer2 comes from mapping the + and - query operators for required and prohibited subqueries to these ta

Re: Span queries, API and difficulties

2007-09-23 Thread Paul Elschot
e one advantage of BooleanScorer is that it is very fast for disjunctions. Regards, Paul Elschot On Sunday 23 September 2007 13:11, melix wrote: > > Hi Paul, > > His there any document which explains how those scorers interact ? My main > problem is finding out how to create

Re: Span queries, API and difficulties

2007-09-25 Thread Paul Elschot
e more, and I'm glad I added some comments to allow me to understand the code again. I'd prefer a version in which fewer comments are necessary, but I don't know a simpler way. Regards, Paul Elschot > > > melix wrote: > > > > I think I'll focus on that l

Re: Span queries, API and difficulties

2007-09-25 Thread Paul Elschot
ks about the document beyond the current greater than or equal to the given argument. Regards, Paul Elschot

Re: svn commit: r582054: typo

2007-10-04 Thread Paul Elschot
Hoss, Thanks for that. There is a typo, it says "llthough" in one of the javadocs. Regards, Paul Elschot

Re: setSimilarity on Query

2007-11-11 Thread Paul Elschot
On Sunday 11 November 2007 18:28:09 John Wang wrote: > Anyone has comments on this one or should I forward this to the user list? An inline subclass overriding Query.getSimilarity() to have a Query use another Similarity worked for me. Regards, Paul Elschot > thanks > > -John >

Re: Let's release Lucene 2.3 soon?

2007-12-06 Thread Paul Elschot
re discussion will be needed, whichever way is chosen. Regards, Paul Elschot

Re: Performance Improvement for Search using PriorityQueue

2007-12-10 Thread Paul Elschot
to have better scoring docs at the end. I wouldn't expect a 30% improvement out of that, but it would help, if only to reduce occasional performance deteriorations. Regards, Paul Elschot On Monday 10 December 2007 08:11:50 Shai Erera wrote: > Hi > > Lucene's PQ implemen

Re: Performance Improvement for Search using PriorityQueue

2007-12-10 Thread Paul Elschot
should be quite helpful for comparing with the current implementation and to get the last bits of performance out of this. Regards, Paul Elschot > > On Dec 10, 2007 10:15 AM, Paul Elschot <[EMAIL PROTECTED]> wrote: > > > The current TopDocCollector only allocates a ScoreDoc

Re: IndexOutput writeVInt and others

2007-12-14 Thread Paul Elschot
I've copied writing and reading VInts into SortedVIntList in LUCENE-584, and I added a comment there referring to the original code. A utility class on byte[] (or ByteBuffer) would be good to get rid of that copy. Regards, Paul Elschot On Saturday 15 December 2007 07:51:06 Shai E
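A utility class of the kind suggested here might look like the sketch below. It assumes the usual VInt layout (seven data bits per byte, low-order bits first, high bit set when more bytes follow); the class name `VIntBytes` and both method signatures are hypothetical and are not taken from Lucene or from the LUCENE-584 patch.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical byte[]-based VInt helper; names and signatures are illustrative only. */
final class VIntBytes {
    private VIntBytes() {}

    /** Appends i as a variable-length int: 7 bits per byte, high bit means "more follows". */
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    /** Reads one VInt from buf starting at pos[0] and advances pos[0] past it. */
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }
}
```

The position is passed as a one-element array only to keep the sketch free of extra types; a small cursor object or a ByteBuffer would serve the same purpose.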

Re: Too ambitious : but wanting to know the exact procedure.

2008-01-07 Thread Paul Elschot
e you still want to know the exact procedure on how storing and indexing is done, get the lucene source code and use a debugger to step into that from your example code below. Regards, Paul Elschot On Monday 07 January 2008 06:12:16 java_is_everything wrote: > > Hi all. > >

Re: EnwikiDocMaker ?

2008-01-09 Thread Paul Elschot
helps nonetheless. Regards, Paul Elschot On Wednesday 09 January 2008 14:55:05 Grant Ingersoll wrote: > As one can probably guess, I have been looking at the EnwikiDocMaker a > bit and using it outside of the benchmark suite, as related to the new > contrib/wikipedia stuff. Just wante

Re: Bug or Feature in BooleanQuery.setMinimumNumberShouldMatch

2008-01-15 Thread Paul Elschot
Shai, I think it would be enough to add to the javadocs of BooleanQuery that the minimum number of SHOULD clauses is ignored when no such clauses are added. Regards, Paul Elschot On Tuesday 15 January 2008 13:04:51 Shai Erera wrote: > Hi > > I'm not sure if this is a bug

Re: Bug or Feature in BooleanQuery.setMinimumNumberShouldMatch

2008-01-15 Thread Paul Elschot
nScorer2 before answering. When the number of SHOULD clauses is smaller than the minimum number required, no results should be returned, except in the case given below. > > Regards, > Paul Elschot > > > On Tuesday 15 January 2008 13:04:51 Shai Erera wrote: > > Hi > > >

Re: DisjunctionSumScorer small tweak maybe?

2008-01-18 Thread Paul Elschot
you showed above. I have not looked at contrib/benchmark yet, could that provide a way to test this? Regards, Paul Elschot

Re: Unique doc ids

2008-01-23 Thread Paul Elschot
Michael, How would IndexWriter.addIndexes() work with unique doc ids? Regards, Paul Elschot On Tuesday 22 January 2008 12:07:16 Michael Busch wrote: > Hi Team, > > the question of how to delete with IndexWriter using doc ids is > currently being discussed on java-user > (http

LUCENE-584, Decouple Filter from BitSet

2008-01-31 Thread Paul Elschot
Can I suggest to commit the take4 patch, together with the two 20080111 patches? I'd normally request this at the issue itself, but it's too big as it is. In case more discussion is needed, it's still better to do it there, though: https://issues.apache.org/jira/browse/LUCENE-584

Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-01-31 Thread Paul Elschot
it's not the normal case. Less (using a boolean) is more (performance) in this case, I think, but benchmarking may show something else. This skipTo() is also Scorer.skipTo(), so a change there could have an even bigger impact than a change in Filter. Have a look at the s

Re: [Lucene-java Wiki] Update of "TREC 2007 Million Queries Track - IBM Haifa Team" by PaulElschot

2008-02-06 Thread Paul Elschot
Oh well, I ticked the "remove trailing white space" box. The only real addition is at the end: >* Easier and more efficient ways to add proximity scoring? > +For example specialize Span-Near-Query for the case when all subqueries > are terms. Re

Re: svn commit: r619685 - /lucene/java/trunk/src/java/org/apache/lucene/search/IndexSearcher.java

2008-02-07 Thread Paul Elschot
Timing is everything :) On Friday 08 February 2008 00:22:39 [EMAIL PROTECTED] wrote: > -DocIdSetIterator docIdSetIterator = > filter.getDocIdSet(reader).iterator(); // CHECKME: use ConjunctionScorer here? > +DocIdSetIterator filterDocIdIterator = > filter.getDocIdSet(reader).iterator(

Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-09 Thread Paul Elschot
anyway, any hand optimizing > might actually reduce performance. I don't like the bloat either, but I'll gladly admit to having copied some code, adapted it a bit, and proposed to have that adapted copy added back into the code base. I wish there was a better way. Regards, Pau

Re: Usefulness of Similarity.queryNorm()

2008-02-12 Thread Paul Elschot
er compilation phase so > that it's easier to write Query subclasses, and I'm happy to sacrifice > consistency of scoring range if it'll help simplify things. For consistency of scoring ranges on the leaf side of the scorer tree LUCENE-293

Re: Out of memory - CachingWrappperFilter and multiple threads

2008-02-18 Thread Paul Elschot
on this object, i.e. declare it as public synchronized, and then the synchronized(cache) occurrences can be removed. It might be better to initialize the cache in the constructor, and then synchronize on the cache even while calling filter.bits(reader). This is safe when the cache is private. R

Re: Out of memory - CachingWrappperFilter and multiple threads

2008-02-18 Thread Paul Elschot
> content on a new reader. > > This last one sounds a bit nasty? Yes, that could be problematic. But I'm afraid more detailed information will be needed before this OOM can be solved. Regards, Paul Elschot > > Cheers > Mark > > - Original Message > From: P

Re: Out of memory - CachingWrappperFilter and multiple threads

2008-02-19 Thread Paul Elschot
rrent version, and this could be used as a prefix to encode a run of set bits. Regards, Paul Elschot On Tuesday 19 February 2008 12:58:34 eks dev wrote: > hi Mark, > > just out of curiosity, do you know the distribution of set bits in > these terms you have tried to cache? maybe this

Memory requirements for filters (was Re: Out of memory - CachingWrappperFilter and multiple threads)

2008-02-19 Thread Paul Elschot
> http://repositories.cdlib.org/cgi/viewcontent.cgi?article=3104&context=lbnl First impression: Nice article, good for relational dbs, and for bitwise boolean ops. In Lucene there is normally the need to score each matching doc though, and for that the doc number is needed, and that do

Wiki link from documentation page

2008-03-27 Thread Paul Elschot
Currently, on this page: http://lucene.apache.org/java/2_3_1/ the wiki is linked to as: http://wiki.apache.org/lucene-java but it should probably be this: http://wiki.apache.org/jakarta-lucene/FrontPage Regards, Paul Elschot

Re: Wiki link from documentation page

2008-03-27 Thread Paul Elschot
The link is working normally again, I think I tried it at a moment when the redirection failed. On Thursday 27 March 2008 08:32:25 Paul Elschot wrote: > Currently, on this page: >http://lucene.apache.org/java/2_3_1/ > the wiki is linked to as: > http://wiki.apache.org/lucene-j

Contrib: ChainedFilter and BooleanFilter in LUCENE-1187

2008-05-13 Thread Paul Elschot
date. Regards, Paul Elschot

Re: Fwd: SpanNearQuery: how to get the "intra-span" matching positions?

2008-06-06 Thread Paul Elschot
tching position. This will probably involve some fruitless copying for incomplete matches that never become a real match. There is also a difference beyond ordered/unordered. In the ordered case, no overlaps between the matching subspans are allowed, and in the unordered case overlaps are allowed.

Re: Fwd: changing index format

2008-06-25 Thread Paul Elschot
f the term, and these are easily determined. Btw, I've just started to add encoding intervals of consecutive doc ids to SortedVIntList. For very high document frequencies, that might actually be faster than TermScorer and more compact than the current index. Once I've g

Re: Fwd: changing index format

2008-06-25 Thread Paul Elschot
plain(docid), what happens if termDoc is already closed from the > next() call? When explain() is called on a Scorer, next() and skipTo() should not be called. A Scorer can either explain, or search, but not both. Regards, Paul Elschot > > Thanks > > -John > > On Wed, Jun 25

BooleanQuery and DocIdSet; Was: Fwd: changing index format

2008-06-25 Thread Paul Elschot
my todo list? Perhaps I could even take a vacation :) More seriously: would DocIdSetQuery be superfluous when a DocIdSet could be added directly to a BooleanQuery? Could you elaborate a bit on the customized scoring? Regards, Paul Elschot

Re: ScorerDocQueue.HeapedScorerDoc

2008-07-26 Thread Paul Elschot
ze where the ScorerDocQueue is used, so I might as well try and remove this doc value caching. Regards, Paul Elschot

Re: ScorerDocQueue.HeapedScorerDoc

2008-07-27 Thread Paul Elschot
To()/doc(), especially with good inlining. But when it improves performance, I'm all ears. Also, would sentinel testing keep its speed when doc numbers change from int to long? I really don't know... Regards, Paul Elschot > > > > > > - Original Message > > >

Re: SpanNearQuery: All matches within slop

2008-09-02 Thread Paul Elschot
some differences too, especially in the queue ordering conditions. Regards, Paul Elschot On Thursday 28 August 2008 01:09:51 Mark Miller wrote: > Its a matter of speed. Once you know the document matches the query, > it would in general, make no sense to keep looking unless you had a >

Re: Can I filter the results returned by IndexReader.terms(term)?

2008-09-03 Thread Paul Elschot
, have a look here to see whether it could help in your case: https://issues.apache.org/jira/browse/LUCENE-1296 Regards, Paul Elschot On Wednesday 03 September 2008 18:00:27 mark harwood wrote: > One way is to read TermDocs for each candidate term and see if they > are in your filter - bu

Re: Realtime Search for Social Networks Collaboration

2008-09-06 Thread Paul Elschot
first step to be taken from this patch that would be an improvement on its own? Regards, Paul Elschot

Re: docid set compression and boolean docid set operations

2008-09-11 Thread Paul Elschot
t of work to get such "details" right. Fortunately, the existing Lucene TermDocs and TermPositions appear to be just right for this. Regards, Paul Elschot On Wednesday 10 September 2008 23:09:18 John Wang wrote: > Sorry, I meant lucene 2.4 > > -John > > On Wed, Se

Re: docid set compression and boolean docid set operations

2008-09-13 Thread Paul Elschot
> at all we can assimilate them. This fits nicely with the recent flexible indexing efforts. Most of the performance improvements are reported from the positions, so we might try and start there. Alternatively, to get going, the p4delta data structure might be initially used to support bool

Could positions/payloads in SegmentMerger be copied directly?

2008-09-19 Thread Paul Elschot
irect copy from the input postings to proxOutput. Regards, Paul Elschot

Re: Could positions/payloads in SegmentMerger be copied directly?

2008-09-20 Thread Paul Elschot
tions by using a proxPointer itself, as it accesses all positions serially. This leaves me without an example on how to use proxPointer from a TermInfo. Any tips on how to continue? Regards, Paul Elschot > Mike > > Paul Elschot wrote: > > I'm looking at the for loop in SegmentM

Re: Could positions/payloads in SegmentMerger be copied directly?

2008-09-22 Thread Paul Elschot
Mike, I had another look at SegmentTermDocs.skipTo() and at SegmentTermPositions, and I think I'm beginning to get your point. Could it be doable per skipInterval docs? Regards, Paul Elschot On Monday 22 September 2008 19:24:38 Michael McCandless wrote: > OK, on closer inspection,

Re: Could positions/payloads in SegmentMerger be copied directly?

2008-09-23 Thread Paul Elschot
On Tuesday 23 September 2008 10:56:04 Michael McCandless wrote: > Paul Elschot wrote: > > I had another look at SegmentTermDocs.skipTo() and at > > SegmentTermPositions, and I think I'm beginning to get > > your point. > > > > Could it be doable per ski

Re: Could positions/payloads in SegmentMerger be copied directly?

2008-09-23 Thread Paul Elschot
On Tuesday 23 September 2008 20:26:18 Michael McCandless wrote: > Paul Elschot wrote: > > So, adding a document offset from the documents/frequencies > > into the positions/payloads for each document would allow: > > - bulk copying of the position/payloads during mergi

Re: [jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2008-12-04 Thread Paul Elschot
e memory, as every > field used needs to be cached. With my code you would only have a > single "bitset" for the filter. But with many ranges that would mean many bitsets, and MemoryCachedRangeFilter only needs to cache the field values once for any number of ranges. It's a tradeo

Re: Realtime Search

2008-12-24 Thread Paul Elschot
Nes, P Boncz - cwi.nl currently available from this link: http://www.cwi.nl/htbin/ins1/publications?request=pdfgz&key=ZuHeNeBo:ICDE:06 Also, some preliminary results on lucene indexes are available at LUCENE-1410. Regards, Paul Elschot > > But, if you also add a 'least term and greatest

Re: DisjunctionScorer performance

2009-01-06 Thread Paul Elschot
DisjunctionDISI, probably the same function as the OrDocIdSetIterator you mentioned above. In case you have something faster than that, could you post it at LUCENE-1345 or at a new issue? An AndDocIdSetIterator could also be useful for the PhraseScorers and for the SpanNear queries, but that is of later concern. So I'd prefer option 2. Regards, Paul Elschot

Re: DisjunctionScorer performance

2009-01-07 Thread Paul Elschot
? No need for that I think, the DisjunctionDISI there is still based on basically the same priority queue that Disjunction...Scorer uses. Regards, Paul Elschot
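The priority-queue approach mentioned here can be illustrated with a small self-contained sketch. It is not the DisjunctionDISI from LUCENE-1345: the class `OrDocIds`, its `Cursor` helper, and the use of sorted int arrays in place of real doc id iterators are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical OR over sorted doc id lists, driven by a priority queue. */
final class OrDocIds {
    private OrDocIds() {}

    /** Cursor over one sorted array of doc ids; stands in for a doc id iterator. */
    private static final class Cursor {
        final int[] docs;
        int idx;
        Cursor(int[] docs) { this.docs = docs; }
        int doc() { return docs[idx]; }
        boolean advance() { return ++idx < docs.length; }
    }

    /** Returns the union of the given sorted doc id lists, in increasing order, without duplicates. */
    static List<Integer> union(int[]... sortedDocLists) {
        // The queue keeps the cursor with the smallest current doc id on top.
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (int[] docs : sortedDocLists) {
            if (docs.length > 0) {
                queue.add(new Cursor(docs));
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();
            int doc = top.doc();
            // Emit each doc id once, even when several cursors are positioned on it.
            if (result.isEmpty() || result.get(result.size() - 1) != doc) {
                result.add(doc);
            }
            if (top.advance()) {
                queue.add(top);
            }
        }
        return result;
    }
}
```

For example, `OrDocIds.union(new int[] {1, 3, 5}, new int[] {2, 3, 6})` yields the doc ids 1, 2, 3, 5, 6.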

Re: DisjunctionScorer performance

2009-01-07 Thread Paul Elschot
pt to use skipTo() in this way. Regards, Paul Elschot

Re: Filesystem based bitset

2009-01-18 Thread Paul Elschot
rting from the ground up, but I don't have any practical programming experience with transaction semantics, so it may be better to start from something that has transactions right from the start. Regards, Paul Elschot

Re: Filesystem based bitset

2009-01-19 Thread Paul Elschot
On Monday 19 January 2009 11:32:17 Michael McCandless wrote: > > Paul Elschot wrote: > > > Since this started by thinking out loud, I'd like to continue doing > > that. > > I've been thinking about how to add a decent skipTo() to something > > that

Re: Lucene’s Missing Term/Field Query Structure

2009-01-22 Thread Paul Elschot
und query language in contrib, so it's not too difficult to add a similar solution as a layer on top of Lucene. Regards, Paul Elschot

Re: wiki

2009-01-24 Thread Paul Elschot
Of Words. Both pages are automatically generated. I don't know the language of the one you referenced. It could be a Slavic language, but that's really no more a guess. Regards, Paul Elschot

Re: wiki

2009-01-24 Thread Paul Elschot
On Saturday 24 January 2009 21:18:12 eks dev wrote: > "It could be a Slavic language, but that's really no more a guess." ... than a guess. Thanks for the confirmations. Would that do, Grant? Regards, Paul Elschot

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread Paul Elschot
gets implicitly transformed to "OriginalQuery AND > isDeleted:false" without scoring on second clause. > > skipTo() performance here is obviously relevant. > That part is already done. https://issues.apache.org/jira/browse/LUCENE-1436 Regards, Paul Elschot

Re: 2.4.0 missing javadocs

2009-02-12 Thread Paul Elschot
e used to be a javadocs-internal target with private access, but it was removed when the javadocs building targets were extended some time ago: https://issues.apache.org/jira/browse/LUCENE-376 Regards, Paul Elschot > > > Doing “ant javadocs” locally does not generate the javadoc for these

Re: Integrating Language Models into Lucene

2009-02-26 Thread Paul Elschot
become a part of official future Lucene > versions? A contrib module with an alternative set of scorers would be a nice goal, for example starting from the one referenced above. > 4.How would you recommend implementing the index additions with minimal > changes as a temporary patch? No need for a temporary patch, just create a separate issue for each index addition, and see what happens. Regards, Paul Elschot

Re: Integrating Language Models into Lucene

2009-02-26 Thread Paul Elschot
On Thursday 26 February 2009 13:41:30 Grant Ingersoll wrote: > I think there is a group in the Netherlands that has open sourced a > version of Lucene using Language Models. http://ilps.science.uva.nl/resources/lm-lucene Regards, Paul Elschot

Re: New flexible query parser

2009-03-17 Thread Paul Elschot
; query syntaxes, as the actual code for the first layer is not very much. > All syntaxes would benefit from patches and improvements we make to the > underlying layers, which will make supporting different syntaxes much > more manageable. For example, an option to get rid of redundant layers of BooleanQueries would be welcome. > > So if there is interest we would like to contribute this work to Lucene. I'd like to port the Surround language onto it, and perhaps even create a syntax extension (from the standard parser) for the result. > ... Regards, Paul Elschot

Re: Possible IndexInput optimization

2009-03-29 Thread Paul Elschot
imizations you'd like to see applied here. Regards, Paul Elschot On Sunday 29 March 2009 00:43:28 Earwin Burrfoot wrote: > While drooling over MappedBigByteBuffer, which we'll (hopefully) see > in JDK7, I revisited my own Directory code and noticed a certain > peculiarity,

Re: Possible IndexInput optimization

2009-03-29 Thread Paul Elschot
ernative. > > I think my question boils down to whether or not these NIO buffers will > > (in the end) get in the way of similar low level optimizations > > you'd like to see applied here. > > Regards, > > > > Paul Elschot > In my case I have to switch t

Re: Question on CachingWrapperFilter

2009-06-02 Thread Paul Elschot
javadocs of the current CWF it could be sufficient to mention more prominently that the default CWF caches the given DocIdSet, basically assuming that its disi is cheap. But it might be a good idea to change the default implementation to check whether the given DocIdSet is an OpenBitSet, and use that to be cached in that case, and otherwise provide an OpenBitSetDISI. Regards, Paul Elschot

Precedence parser: NOT/AND, disableCoord

2005-03-13 Thread Paul Elschot
However, from what I see now in the precedence parser, giving up might have been premature. It seems to be possible to make the mix after all. I also noticed a BooleanQuery(disableCoord) constructor. This would be straightforward to implement in the new BooleanScorer2 by dropping the Coordinator t

Re: Precedence parser: NOT/AND, disableCoord

2005-03-15 Thread Paul Elschot
On Tuesday 15 March 2005 01:55, Erik Hatcher wrote: > > On Mar 13, 2005, at 2:35 AM, Paul Elschot wrote: > > I had a short look through the new precedence parser > > and noticed a possible issue. > > > > Adding this in the TestPrecedenceParser testSimple() method:

Re: DO NOT REPLY [Bug 32965] - [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2005-04-04 Thread Paul Elschot
java 1.4 is not acceptable? That would leave them useable for later. Regards, Paul Elschot

Re: DO NOT REPLY [Bug 32965] - [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2005-04-04 Thread Paul Elschot
nt? The FilteredQuery as posted there requires jdk 1.4 because it uses BitSet.nextSetBit(): http://issues.apache.org/bugzilla/show_bug.cgi?id=32965#c2 Regards, Paul Elschot

Re: UnscoredRangeQuery

2005-04-15 Thread Paul Elschot
it currently impersonates a BooleanQuery because of > > http://issues.apache.org/bugzilla/show_bug.cgi?id=34407 > > - no per-doc scoring (a small constant is returned). we don't have > > any range queries where scorin

Re: Troubling with StandarTokenizer/QueryParser code generate in JavaCC

2005-04-18 Thread Paul Elschot
cc.dev.java.net/ Regards, Paul Elschot

Re: Troubling with StandarTokenizer/QueryParser code generate in JavaCC

2005-04-18 Thread Paul Elschot
ectory. Regards, Paul Elschot

Re: lucene 2.0?

2005-04-20 Thread Paul Elschot
ghts? I agree that it shouldn't be released as it is, so pulling it out and putting it back in when its development continues after 2.0 seems the right way to go. I've just renamed my copy with a _2_1 suffix to keep it ou
