Re: adding "explicit commits" to Lucene?

2007-01-15 Thread Doron Cohen
The problem Ning pointed out seems to stem from the two roles of IndexReader: (1) reading (read only) the index for searching and for inspecting its content; (2) modifying the index by deleting documents. This is further complicated by the fact that often a reader is used for search and then retur

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread robert engels
That is true, but you need to use the same techniques as any db. You need to write a tx log file. This has the semantics that you know if it has committed. Just like a db. You check that it has committed before writing anything to the actual index. Since Lucene does not modify any segments,

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread Chuck Williams
robert engels wrote on 01/15/2007 08:11 PM: > If that is all you need, I think it is far simpler: > > If you have an OID, then all that is required is to write to a > separate disk file the operations (delete this OID, insert this > document, etc...) > > Once the file is permanently on disk. Then

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread robert engels
If that is all you need, I think it is far simpler: If you have an OID, then all that is required is to write to a separate disk file the operations (delete this OID, insert this document, etc...) Once the file is permanently on disk. Then it is simple to just keep playing the file back
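
The operation-file idea in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread and not Lucene API: the JSON line format, the explicit "commit" marker, the function names `append_ops` and `replay`, and the plain dict standing in for the real index are all hypothetical. The point it tries to show is that once a batch is forced to disk, playing the file back is safe to repeat, because every operation is keyed by OID.

```python
import json
import os
import tempfile  # used only by the usage example below


def append_ops(log_path, ops):
    """Append one batch of operations, then a commit marker, and force it to disk."""
    with open(log_path, "a", encoding="utf-8") as f:
        for op in ops:
            f.write(json.dumps(op) + "\n")
        # Hypothetical marker: a batch counts only once this line is on disk.
        f.write(json.dumps({"op": "commit"}) + "\n")
        f.flush()
        os.fsync(f.fileno())


def replay(log_path, index):
    """Apply every committed batch to `index` (a dict standing in for the index).

    Batches without a trailing commit marker are ignored. Replaying the same
    file again yields the same state, since inserts overwrite by OID and
    deleting a missing OID is a no-op.
    """
    pending = []
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            try:
                op = json.loads(line)
            except json.JSONDecodeError:
                break  # torn trailing write: stop, nothing after it is trusted
            if op["op"] == "commit":
                for p in pending:
                    if p["op"] == "delete":
                        index.pop(p["oid"], None)
                    elif p["op"] == "insert":
                        index[p["oid"]] = p["doc"]
                pending = []
            else:
                pending.append(op)
    return index


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ops.log")
    append_ops(path, [{"op": "insert", "oid": "a", "doc": "first"}])
    append_ops(path, [{"op": "delete", "oid": "a"},
                      {"op": "insert", "oid": "b", "doc": "second"}])
    print(replay(path, {}))
```

In a real layer on top of Lucene the dict updates would presumably become a delete-by-OID-term followed by an add, but that mapping is an assumption and is not spelled out in the snippet.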

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread Chuck Williams
My interest is transactions, not making doc-id's permanent. Specifically, the ability to ensure that a group of adds either all go into the index or none go into the index, and to ensure that if none go into the index that the index is not changed in any way. I have UID's but they cannot ensure t

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread robert engels
I honestly think that having a unique OID as an indexed field and putting a layer on top of Lucene is the best solution to all of this. It makes it almost trivial, and you can implement transaction handling in a variety of ways. Attempting to make the doc ids "permanent" is a tough challenge,

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread Chuck Williams
Ning Li wrote on 01/15/2007 06:29 PM: > On 1/14/07, Michael McCandless <[EMAIL PROTECTED]> wrote: >> * The "support deleteDocuments in IndexWriter" (LUCENE-565) feature >> could have a more efficient implementation (just like Solr) when >> autoCommit is false, because deletes don't need t

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread Ning Li
On 1/14/07, Michael McCandless <[EMAIL PROTECTED]> wrote: * The "support deleteDocuments in IndexWriter" (LUCENE-565) feature could have a more efficient implementation (just like Solr) when autoCommit is false, because deletes don't need to be flushed until commit() is called. Whe

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread robert engels
Actually, my comment below was not quite accurate. It only matters on multiple CPU machines if you are writing everything to a memory index first. If writing to a filesystem, then multiple threads on a single processor would allow more documents to be inverted while the disk write were occ

Re: allowing applications to control docids change? (e.g. setKeepDeletes(boolean)?)

2007-01-15 Thread robert engels
I did a cursory review of the discussion. The problem I see is that in the checkpoint tx files you need a 'delete file' for every segment where a deletion SHOULD occur when it is committed, but if you have multiple open transactions being created, as soon as one is applied (committed), the d

allowing applications to control docids change? (e.g. setKeepDeletes(boolean)?)

2007-01-15 Thread Doron Cohen
Note: discussion started originally in http://www.nabble.com/adding-%22explicit-commits%22-to-Lucene--t3011270.html robert engels <[EMAIL PROTECTED]> wrote on 15/01/2007 13:23:14: > I think that you will find a much larger performance decrease in > doing things this way - if the external resour

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread robert engels
I think that you will find a much larger performance decrease in doing things this way - if the external resource is a db, or any networked accessed resource. When even just a single document is changed in the Lucene index you could have MILLIONS of changes to internal doc ids (if say an ea

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread Doron Cohen
Also related is the request made several times in the list to be able to control when docids are changing, for applications that need to maintain some mapping between external IDs to Lucene docs but for some performance reasons cannot afford to only count on storing external (DB) IDs in Lucene's fi

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread robert engels
I looked at doing a similar thing with the parallel 'inverting'. I then decided that it will only make a difference on a multiple CPU machine, so I put it on the back burner. But if you have code already done... On Jan 15, 2007, at 12:24 PM, Chuck Williams wrote: robert engels wrote on 01/

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread Chuck Williams
robert engels wrote on 01/15/2007 08:01 AM: > Is your parallel adding code available? > There is an early version in LUCENE-600, but without the enhancements described. I didn't update that version because it didn't capture any interest and requires Java 1.5 and so it seems will not be committed.

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread robert engels
Is your parallel adding code available? On Jan 15, 2007, at 11:54 AM, Chuck Williams wrote: Michael McCandless wrote on 01/15/2007 01:49 AM: Chuck, Possibly related, one of the ways I improved concurrency in ParallelWriter was to break up IndexWriter.addDocument() into one method to inver

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread Chuck Williams
Michael McCandless wrote on 01/15/2007 01:49 AM: > Chuck, > >> Possibly related, one of the ways I improved concurrency in >> ParallelWriter was to break up IndexWriter.addDocument() into one method >> to invert the document and create a RAMSegment and a second method that >> takes the RAMSegment

Re: adding "explicit commits" to Lucene?

2007-01-15 Thread Michael McCandless
Chuck, This seems to me to be a great idea, especially the ability to support index transactions. ParallelWriter (original implementation in LUCENE-600 -- I have a much better one now) provides a companion writer to ParallelReader. It takes a Document, breaks it up into subdocuments associated

Re: [jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-01-15 Thread Otis Gospodnetic
Man! I think you need to ask your girlfriend to move closer! ;) Otis - Original Message From: Hoss Man (JIRA) <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Monday, January 15, 2007 4:36:27 AM Subject: [jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consumi

[jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-01-15 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464716 ] Karl Wettin commented on LUCENE-550: Thanks a lot Hoss, for taking the time. I sure do appreciate it. I'll get ba

[jira] Commented: (LUCENE-774) TopDocs and TopFieldDocs does not implement equals and hashCode

2007-01-15 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464711 ] Karl Wettin commented on LUCENE-774: > The summary refers to TopDocs and TopFieldDocs, but the diff changes Field

[jira] Commented: (LUCENE-775) Searcher code creating Hits is somewhat messy

2007-01-15 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464708 ] Karl Wettin commented on LUCENE-775: > can you explain this... > > + /** Sub class ad hoc IndexReader coupling

[jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-01-15 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464707 ] Hoss Man commented on LUCENE-550: - Karl: the trunk.diff i just attached fixes a small autoboxing dependency your pat

[jira] Updated: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-01-15 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-550: Attachment: (was: trunk.diff) > InstantiatedIndex - faster but memory consuming index > ---

[jira] Updated: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-01-15 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-550: Attachment: trunk.diff test-reports.zip > InstantiatedIndex - faster but memory consuming i

[jira] Updated: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-01-15 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-550: Attachment: (was: test-reports.zip) > InstantiatedIndex - faster but memory consuming index > -

[jira] Updated: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-01-15 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-550: Attachment: trunk.diff test-reports.zip > InstantiatedIndex - faster but memory consuming i

[jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-01-15 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464692 ] Hoss Man commented on LUCENE-550: - I just realized that all of the tests in contrib/instantiated/src/test/java/org/a

[jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-01-15 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464676 ] Hoss Man commented on LUCENE-550: - I've been trying to follow the work you've been doing Karl, but i must admit a lo