Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Michael Busch
On 10/16/09 10:27 AM, Steven A Rowe wrote: On 10/16/2009 at 2:58 AM, Michael Busch wrote: B) best effort drop-in back compatibility for the next minor version number only, and deprecations may be removed after one minor release (e.g. v3.3 will be compat with v3.2, but not v3.4) This

Proposal for changing Lucene's backwards-compatibility policy

2009-10-15 Thread Michael Busch
*after* the 3.0 release. On behalf of the Lucene developers, Michael Busch

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-05 Thread Michael Busch
On 10/5/09 5:30 PM, Nigel wrote: Before Lucene 2.9, I don't think this made any difference, as (I think) the only advantage to calling reopen vs. just creating another IndexReader was having reopen figure out whether the index had actually changed. (And we have a different way to figure that out

Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

2009-09-01 Thread Michael Busch
= 0; } public boolean incrementToken() throws IOException { reset(); if (this.peekedTokens.size() > 0) { restoreState(this.peekedTokens.removeFirst()); return true; } return this.input.incrementToken(); } } On 9/1/09 4:44 PM, Michael Busch wrote: Daniel, take a look

Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

2009-09-01 Thread Michael Busch
Daniel, take a look at the captureState() and restoreState() APIs in AttributeSource and TokenStream. captureState() returns a State object containing all attributes with their current values. restoreState(State) takes a given State and copies its values back into the TokenStream. You should b
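The capture/restore idea described above can be sketched without any Lucene dependency. The following is a minimal stand-alone model, not Lucene code: the classes Stream and LookaheadFilter, the single String "term" attribute, and the peek(int) helper are all hypothetical names chosen for illustration. It only shows the pattern of capturing each looked-ahead state into a queue and restoring the queued states later, one per incrementToken() call.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Stand-alone model of the capture/restore lookahead pattern.
// NOT Lucene code: every class and method here is a hypothetical stand-in.
class LookaheadSketch {

    /** Stand-in for a token stream with one mutable attribute (the term). */
    static class Stream {
        private final Iterator<String> source;
        String term; // overwritten in place on every advance

        Stream(List<String> tokens) { this.source = tokens.iterator(); }

        boolean incrementToken() {
            if (!source.hasNext()) return false;
            term = source.next();
            return true;
        }

        // In this model a "state" is just a copy of the single attribute.
        String captureState() { return term; }
        void restoreState(String state) { term = state; }
    }

    /** Filter that can look ahead without losing the current token. */
    static class LookaheadFilter {
        private final Stream input;
        private final Deque<String> peeked = new ArrayDeque<>();

        LookaheadFilter(Stream input) { this.input = input; }

        /** Returns the term n tokens ahead (1-based), or null if the stream ends first. */
        String peek(int n) {
            String saved = input.captureState();         // remember the current token
            while (peeked.size() < n && input.incrementToken()) {
                peeked.addLast(input.captureState());    // queue each looked-ahead state
            }
            input.restoreState(saved);                   // put the current token back
            if (peeked.size() < n) return null;
            Iterator<String> it = peeked.iterator();
            String result = null;
            for (int i = 0; i < n; i++) result = it.next();
            return result;
        }

        boolean incrementToken() {
            if (!peeked.isEmpty()) {                     // replay queued states first
                input.restoreState(peeked.removeFirst());
                return true;
            }
            return input.incrementToken();
        }

        String term() { return input.term; }
    }

    public static void main(String[] args) {
        LookaheadFilter f = new LookaheadFilter(new Stream(List.of("a", "b", "c")));
        f.incrementToken();
        System.out.println(f.term() + " sees ahead: " + f.peek(2));
    }
}
```

In a real filter the state would cover all attributes of the stream rather than a single string, which is what the State object mentioned above is for.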

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-05-29 Thread Michael Busch
Mark Miller wrote: Paul J. Lucas wrote: Also, if you get a ton of concurrent searches, you will have an IndexReader open for each...not only is this very wasteful in terms of RAM and time, but as your IndexWriter merges you can have all kinds of momentary references to normally unneeded inde

Re: IndexReader.reopen memory leak

2008-05-29 Thread Michael Busch
()); } finally { this.in=newReader(); <-- new instance of IndexReader _cache.clear(); _indexData.load(this, true); init(_fieldConfig); } fixes my leak. -John On Thu, May 29, 2008 at 12:35 AM, Michael Busch <[EMAIL PROTECTED]> wrote: Could you s

Re: IndexReader.reopen memory leak

2008-05-29 Thread Michael Busch
On Wed, May 28, 2008 at 4:23 PM, Mark Miller <[EMAIL PROTECTED]> wrote: As someone that has done a lot of reopens, I can vouch there is no leak under simple, normal usage. Are you sure you're closing the original reader after getting the reopened reference? Michael Busch wrote: Hi John, h
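The question about closing the original reader can be illustrated with a small stand-alone model. This is not Lucene code: ModelIndex, ModelReader, the openReaders counter and the refresh() helper are hypothetical names. The model assumes the behaviour documented for IndexReader.reopen() in the 2.3-era API, namely that the same instance comes back when the index is unchanged, so the old reader should be closed only when a different instance is returned.

```java
// Stand-alone model of the reopen-then-close pattern.
// NOT Lucene code: ModelIndex, ModelReader and refresh() are hypothetical.
class ReopenSketch {

    /** Stand-in for an index; counts readers that are currently open on it. */
    static class ModelIndex {
        long version = 0;
        int openReaders = 0;
    }

    /** Stand-in for a reader pinned to the index version it was opened at. */
    static class ModelReader {
        private final ModelIndex index;
        private final long version;
        private boolean closed;

        ModelReader(ModelIndex index) {
            this.index = index;
            this.version = index.version;
            index.openReaders++;
        }

        /** Returns this instance if nothing changed, otherwise a new reader. */
        ModelReader reopen() {
            return index.version == version ? this : new ModelReader(index);
        }

        void close() {
            if (!closed) {
                closed = true;
                index.openReaders--;
            }
        }
    }

    /** Reopen, and close the old reader only if a different one came back. */
    static ModelReader refresh(ModelReader current) {
        ModelReader reopened = current.reopen();
        if (reopened != current) {
            current.close(); // forgetting this line is the "leak" in the model
        }
        return reopened;
    }

    public static void main(String[] args) {
        ModelIndex index = new ModelIndex();
        ModelReader reader = new ModelReader(index);
        index.version++;                 // simulate a change to the index
        reader = refresh(reader);
        System.out.println("open readers: " + index.openReaders);
    }
}
```

If the close() call in refresh() is left out, the counter in this model grows by one on every refresh after a change, which mirrors the kind of accumulation the thread is trying to rule out.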

Re: IndexReader.reopen memory leak

2008-05-28 Thread Michael Busch
Hi John, hmm not good. I will take a look. It probably has to do with the reference counting. Are you doing anything special? E. g. do you have your own reader implementations that you call reopen() on? What kinds of readers are you using? Are you maybe able to provide a heapdump? -Michael John

[ANNOUNCE] Lucene Java 2.3.2 release available

2008-05-12 Thread Michael Busch
Release 2.3.2 of Lucene Java is now available! This release contains fixes for bugs found in 2.3.1. It does not contain any new features, API or file format changes, which makes it fully compatible with 2.3.0 and 2.3.1. The detailed change log is at:

Re: index corruption with latest lucene

2008-05-05 Thread Michael Busch
Mark Miller wrote: MB: Ah, thanks for clearing the version stuff up...I just assumed that trunk last week was pretty close to 2.3.1. I am def trunk last thurs or fri. Perhaps the problem is after 2.3.1, and perhaps the problem is only with me. OK, thanks for verifying. I'll go ahead and publis

Re: index corruption with latest lucene

2008-05-05 Thread Michael Busch
em or make one. The two installs that I have detected the problem were rebuilt, one inadvertently. - Mark On Mon, 2008-05-05 at 14:32 -0700, Michael Busch wrote: If that is the case then I will go ahead and publish the 2.3.2 release? Have you seen this on 2.3.x, Mark? -Michael Michael McCand

Re: index corruption with latest lucene

2008-05-05 Thread Michael Busch
If that is the case then I will go ahead and publish the 2.3.2 release? Have you seen this on 2.3.x, Mark? -Michael Michael McCandless wrote: Actually that stack trace looks like it's from trunk, not from 2.3.2(pre)? OK, I think you said it's from "post 2.3 trunk". Another question: is au

Re: Does (LUCENE-831) "Complete overhaul of FieldCache API" provide fieldcache offloading to disk?

2008-04-18 Thread Michael Busch
Chris Hostetter wrote: : But then the FieldCache is just starting to feel a lot like column-stride : fields : (LUCENE-1231). that's what i've been thinking ... my goal with LUCENE-831 was to make it easier to manage FieldCache and hopefully the norms[] as well particularly in the case of reopen

Re: Build Lucene maven artifacts

2008-03-13 Thread Michael Busch
Hi Patrick, I noticed that we do not package the *.pom.template files in the source release files. That's why it is not possible to build the maven artifacts using official releases. I'll open a JIRA issue and make sure that we will ship 2.3.2 with the template files. In the meantime, you ca

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-13 Thread Michael Busch
Daniel Noll wrote: For interest's sake I also timed fetching the document with no FieldSelector, that takes around 410ms for the same documents. So there is still a big benefit in using the field selector, it just isn't anywhere near enough to get it close to the time it takes to retrieve th

[ANNOUNCE] Lucene Java 2.3.0 release available

2008-01-24 Thread Michael Busch
Release 2.3.0 of Lucene Java is now available! Many new features, optimizations, and bug fixes have been added since 2.2, including: * significantly improved indexing performance * segment merging in background threads * refreshable IndexReaders * faster StandardAnalyzer and improved Toke

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-22 Thread Michael Busch
e code tomorrow (have to go to bed now, it's 3am), maybe I can find something out. Toke Eskildsen wrote: > On Tue, 2008-01-22 at 02:22 -0800, Michael Busch wrote: >> Is your default operator AND or OR? > > AND > > > --

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-22 Thread Michael Busch
Thanks for your detailed answer, Toke! Is your default operator AND or OR? Toke Eskildsen wrote: > On Mon, 2008-01-21 at 11:40 -0800, Michael Busch wrote: >> what kind of queries are you using for your tests? (num query terms, >> boolean clauses, phrases, wildcards?) > >

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-21 Thread Michael Busch
Hi Toke, what kind of queries are you using for your tests? (num query terms, boolean clauses, phrases, wildcards?) -Michael Yonik Seeley wrote: > On Jan 21, 2008 10:32 AM, Toke Eskildsen <[EMAIL PROTECTED]> wrote: >> If we >> only look at the first 50.000 queries, the difference in speed for

Lucene 2.3 RC3 available for testing

2008-01-14 Thread Michael Busch
Hi all, I just uploaded Lucene 2.3 RC3 to: http://people.apache.org/~buschmi/staging_area/lucene_2_3/ RC3 fixes a problem in the indexer that could cause it to hang after a disk full exception occurred. (see https://issues.apache.org/jira/browse/LUCENE-1130 for details). Please switch to RC3 and

Lucene 2.3 RC2 available for testing

2008-01-11 Thread Michael Busch
Hi Lucene Users, good news: we are planning to release Lucene 2.3 in about ten days from now! Lucene 2.3 will have significant performance improvements and various other new features. (see http://people.apache.org/~buschmi/staging_area/lucene_2_3/CHANGES.txt for a full list of new features and API

Nightly Snapshots available in the Apache Maven Snapshot Repository

2007-12-23 Thread Michael Busch
Dear Lucene Users, we are now publishing nightly artifacts to the Maven Snapshot Repository [1]. The current version is '2.3-SNAPSHOT'. The artifacts include * Binary jars * Sources * Javadocs You can find separate artifacts for the core, demo, and the different contrib modules. Merry

Re: Scoring for all the documents in the index relative to a query

2007-11-19 Thread Michael Busch
Hi Sonia, I agree with Erick here. Negative scores don't make sense and Lucene never computes scores for documents that don't match a query. E. g. if your query is: "term1 OR term2", then every document that contains term1 or term2 or both will have a score greater than 0. But if two docs don't c

Re: Reply: how to efficiently implement the statistical scores like pagerank?

2007-11-15 Thread Michael Busch
John Wang wrote: > Would payload work? > -John > > Yes, if you used payloads instead of stored fields your performance should be much better. Try and index one special term per document (e. g. score:pagerank), and index one position with a payload for each doc. Then when you retrieve hits open

Re: How's 2.3 doing?

2007-11-13 Thread Michael Busch
testn wrote: > Hi, > > Are we close to releasing Lucene 2.3? Is it stable enough for production? I > thought it's supposed to be released in October. > > Thanks, I think it's very close. There are a couple of outstanding issues: http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&m

Re: TermDocs.skipTo error

2007-11-09 Thread Michael Busch
Mike Streeton wrote: > I have just tried this again using the index I built with lucene 2.1 but > running the test using lucene 2.2 and it works okay, so it seems to be > something related to an index built using lucene 2.2. > > Mike > Hi Mike, does this also happen with the current trunk ver

Re: TermDocs.skipTo

2007-10-29 Thread Michael Busch
Mike Streeton wrote: > e.g. Iterating using TermDocs.next() and TermDocs.doc() 1,50,1,2 but > using TermDocs.skipTo(51) returns false indicating that no doc id > 50 exists. Hi Mike, I quickly tried to reproduce this (with the same docids), but for me skipTo() works fine, i. e. td.skipTo(
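For readers following the skipTo() discussion, here is a minimal stand-alone model of the expected semantics. It is not Lucene code: ModelTermDocs and the doc ids used are hypothetical. It assumes the contract described in the TermDocs Javadoc, where skipTo(target) moves to the first entry beyond the current one whose document number is greater than or equal to target and returns false only when no such entry exists.

```java
// Stand-alone model of skipTo() semantics over a sorted posting list.
// NOT Lucene code: ModelTermDocs is a hypothetical stand-in.
class SkipToSketch {

    static class ModelTermDocs {
        private final int[] docs; // document ids in ascending order
        private int pos = -1;     // index of the current entry, -1 before the first

        ModelTermDocs(int... sortedDocIds) { this.docs = sortedDocIds; }

        /** Advances to the next entry; returns false when the list is exhausted. */
        boolean next() {
            if (pos + 1 >= docs.length) {
                pos = docs.length;
                return false;
            }
            pos++;
            return true;
        }

        /** Current document id; only valid after next() or skipTo() returned true. */
        int doc() { return docs[pos]; }

        /** Always advances at least once, then stops at the first id >= target. */
        boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (target > doc());
            return true;
        }
    }

    public static void main(String[] args) {
        ModelTermDocs td = new ModelTermDocs(1, 50, 75);
        if (td.skipTo(51)) {
            System.out.println("landed on doc " + td.doc());
        }
    }
}
```

Under this contract, a false return from skipTo(51) is only expected when every remaining document id is at most 50, which is why a false result on a list that does contain larger ids would point to a problem.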

Re: How to speed-up index opening

2007-08-31 Thread Michael Busch
Antoine Baudoux wrote: > From what I have seen in the patch, it re-opens the segments that > have changed. > > So imagine I always change the biggest segment (because that's where > most docs are and I need to update them frequently). Will there still > be a benefit of IndexReader.reopen()?

Re: IndexReader#docFreq(Term)

2007-08-30 Thread Michael Busch
Chris Hostetter wrote: > > unless i'm mistaken, docFreq isn't the only method affected by deleted > docs, things like termDocs, termPositions, terms, ... pretty much all of > the IndexReader methods work that way (even getFieldNames could be > misleading if the only doc with a field of that name

Re: How to speed-up index opening

2007-08-30 Thread Michael Busch
Antoine Baudoux wrote: > > > That's some good news! > > Any idea on the release date for 2.3? We're aiming for a release in early October. Keep your fingers crossed ;) - Michael

Re: How to speed-up index opening

2007-08-29 Thread Michael Busch
Chris Lu wrote: > Hi, Antoine, > > It does take a long time to open the index reader. > One thing you could do is to put new documents into one smaller index and > re-open it, it should be much faster. > We're planning to add a reopen() method to IndexReader that should significantly speed up re

Re: NPE in IndexReader

2007-08-22 Thread Michael Busch
Chris Hostetter wrote: > > This is one of the reasons why i was suggesting in a java-dev thread that > *all* of the references to SegmentInfos be refactored out of IndexReader > and into the subclasses -- any attempt to access the SegmentInfos in a OK, I'm convinced that we should refactor segment

Re: NPE in IndexReader

2007-08-21 Thread Michael Busch
Eric Louvard wrote: > Hello, while calling IndexReader.deletedoc(int) I am getting an NPE. > > java.lang.NullPointerException >at > org.apache.lucene.index.IndexReader.acquireWriteLock(IndexReader.java:658) >at > org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:6

Re: [Fwd: Exception in MultiLevelSkipListReader$SkipBuffer.readByte]

2007-08-16 Thread Michael Busch
Scott Montgomerie wrote: > I just tried it with the latest nightly build, the problem still happens. > > I think it must have to do with a corrupted index somehow. I've also > noticed, as a separate issue, that after this period of time (4-5 days), > certain documents aren't indexed correctly.

Re: Problem using RAMDirectory as a buffer

2007-06-21 Thread Michael Busch
Tanya Levshina wrote: > Nope, doesn't work. > I've tried: > > ramWriter.addDocument(doc); > ramWriter.flush(); > ramWriter.close(); > fsWriter.addIndexes(new Directory[] {ramDir,}); > > > > Any other suggestions? > Are you sure? That's strange, I just took your code and tried it out m

Re: Problem using RAMDirectory as a buffer

2007-06-21 Thread Michael Busch
Daniel Noll wrote: > On Friday 22 June 2007 09:34:44 Tanya Levshina wrote: >> ramWriter.addDocument(doc); >> >> fsWriter.addIndexes(new Directory[] {ramDir,}); > > As IndexWriter already does this internally, I'm not exactly sure why you're > trying to implement it again on the

Lucene 2.2.0 release available

2007-06-19 Thread Michael Busch
Release 2.2.0 of Lucene is now available! Many new features, optimizations, and bug fixes have been added since 2.1, including "point-in-time" searching, payloads, function queries and new APIs for pre-analyzed fields. The detailed change log is at: http://svn.apache.org/repos/asf/lucene/java/

Re: phrases containing escaped quotes

2007-05-15 Thread Michael Busch
Martin Kobele wrote: Hi, I tried to parse the following phrase: "foo \"bar\"" I get the following exception: org.apache.lucene.queryParser.ParseException: Lexical error at line 1, column 18. Encountered: after : "\") " Am I mistaken that "foo \"bar\"" is a valid phrase? Thanks! Martin

Re: Multiple lock files

2006-08-08 Thread Michael Busch
Yeah. But how do I know if a lock file is related to an index or app? I don't want to remove a lock file that another app is using Leandro, check out the static method of IndexReader: unlock(Directory). Link: http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#unl

Re: Best way to Add items to Index in Real Time

2006-07-05 Thread Michael Busch
Otis Gospodnetic wrote: If you are getting errors while searching and at the same time either adding or deleting documents, chances are you are not using the API correctly and following the concurrency rules (described many times on this list). You can search and modify your index at the same time