"Deleting" documents without deleting them

2010-03-15 Thread Daniel Noll
Hi all. I'm trying to implement a form of document deletion where the previous versions are kept around forever (a primitive form of versioning) but excluded from the search results. I notice that after calling IndexWriter.deleteDocuments, even if you close and reopen the index, the documents ar

Re: "Deleting" documents without deleting them

2010-03-16 Thread Michael McCandless
An incidental merge will delete them. I think you'll have to maintain your own filter... but it shouldn't be that large? I.e., it's as large as the deleted docs BitVector would be anyway... except that the docs never go away. Mike On Mon, Mar 15, 2010 at 11:20 PM, Daniel Noll wrote: > Hi all. > > I'm
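
A minimal sketch of the "maintain your own filter" idea, in plain Java rather than against the Lucene API. The class SoftDeleteFilter and its methods are hypothetical names invented for illustration; the sketch only shows a bit set of hidden document numbers being applied to a list of hits. Lucene's internal document numbers are not guaranteed to stay stable, so a real implementation would probably key the filter on a stable identifier field instead.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical helper: tracks "soft-deleted" documents outside the index. */
final class SoftDeleteFilter {
    // One bit per document number; a set bit means "hide from results".
    private final BitSet hidden = new BitSet();

    /** Mark a document as hidden without removing it from the index. */
    void hide(int docId) {
        hidden.set(docId);
    }

    /** True if the document has been soft-deleted. */
    boolean isHidden(int docId) {
        return hidden.get(docId);
    }

    /** Return only the hits that are not soft-deleted, preserving their order. */
    List<Integer> apply(int[] hits) {
        List<Integer> visible = new ArrayList<>();
        for (int docId : hits) {
            if (!hidden.get(docId)) {
                visible.add(docId);
            }
        }
        return visible;
    }
}
```

As the reply notes, the bit set costs about as much memory as a deleted-docs bit vector would, with the difference that the hidden documents are never merged away.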

Re: "Deleting" documents without deleting them

2010-03-16 Thread Rene Hackl-Sommer
Hi Daniel, Unless you have only a few documents and a small index, I don't think never calling optimize is going to be a means you should rely upon. What about if you reindexed the documents you are deleting, adding a field with the value "true"? This would imply that either 1) all fields

Re: "Deleting" documents without deleting them

2010-03-16 Thread TCK
Wouldn't these excluded/filtered documents skew the scores even though they are supposed to be marked as deleted? Don't the idf values used in scoring depend on the entire document set and not just the matching hits for a query? Thanks, TCK On Tue, Mar 16, 2010 at 5:45 AM, Rene Hackl-Sommer wr
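
To make the idf concern concrete, here is a small worked sketch. It assumes the classic formula used by Lucene's DefaultSimilarity, idf = 1 + ln(numDocs / (docFreq + 1)); the class name IdfExample and the document counts are made up for illustration. Because term statistics are computed over the whole index, old versions that are merely filtered out of the results would still count towards numDocs and docFreq.

```java
final class IdfExample {
    /** Classic idf (assumed formula): 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        // Hypothetical index: 1000 live documents, term appears in 9 of them.
        double liveOnly = idf(9, 1000);
        // Same index after keeping 1000 old versions, 90 of which contain the term.
        double withOldVersions = idf(99, 2000);
        System.out.println("idf, live documents only:   " + liveOnly);
        System.out.println("idf, old versions retained: " + withOldVersions);
    }
}
```

With these made-up counts the term becomes relatively more common once the old versions are retained, so its idf drops even though none of the retained versions can appear in the results.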

Re: "Deleting" documents without deleting them

2010-03-16 Thread Rene Hackl-Sommer
I cannot comment on the "marked-as-deleted" documents, but for the approach I outlined: this might impact the scores. I prefer to say 'impact' instead of 'skew', because to me 'skew' would imply that the original scores are some kind of ideal state which is distorted. I don't think this is nece

Re: "Deleting" documents without deleting them

2010-03-16 Thread Daniel Noll
On Tue, Mar 16, 2010 at 20:45, Rene Hackl-Sommer wrote: > Hi Daniel, > > Unless you have only a few documents and a small index, I don't think never > calling optimize is going to be a means you should rely upon. > > What about if you reindexed the documents you are deleting, adding a field > wit