The problem Ning pointed out seems to stem from the two roles of
IndexReader:
(1) reading (read only) the Index for searching and for inspecting its
content;
(2) modifying the index by deleting documents.
This is further complicated by the fact that often a reader is used for
search and then retur
That is true, but you need to use the same techniques as any db. You
need to write a tx log file. This has the semantics that you know if
it has committed. Just like a db. You check that it has committed
before writing anything to the actual index. Since Lucene does not
modify any segments,
robert engels wrote on 01/15/2007 08:11 PM:
> If that is all you need, I think it is far simpler:
>
> If you have an OID, then all that is required is to write to a
> separate disk file the operations (delete this OID, insert this
> document, etc...)
>
> Once the file is permanently on disk. Then
If that is all you need, I think it is far simpler:
If you have an OID, then all that is required is to write to a
separate disk file the operations (delete this OID, insert this
document, etc...)
Once the file is permanently on disk. Then it is simple to just keep
playing the file back
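
A minimal sketch of the operation-log idea described in the message above, not code from the thread. It assumes each operation is one tab-separated line keyed by OID, and it uses a plain in-memory Map as a stand-in for the index; the class name OperationLog and its methods are hypothetical, and no Lucene API is called. Because every operation is keyed by OID, replaying the whole file a second time leaves the map in the same state, which is what makes repeated playback safe.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;

/** Hypothetical append-only log of index operations keyed by OID. */
public class OperationLog {
    private final Path file;

    public OperationLog(Path file) {
        this.file = file;
    }

    // Append one operation per line. SYNC asks for the write to reach the
    // storage device before the call returns, so a logged operation survives a crash.
    private void append(String line) throws IOException {
        Files.write(file, (line + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND,
                StandardOpenOption.SYNC);
    }

    public void logDelete(String oid) throws IOException {
        append("DELETE\t" + oid);
    }

    // Assumption: oid and body contain no tab or newline characters.
    public void logInsert(String oid, String body) throws IOException {
        append("INSERT\t" + oid + "\t" + body);
    }

    /** Apply every logged operation, in order, to the stand-in index. */
    public Map<String, String> replay(Map<String, String> index) throws IOException {
        if (!Files.exists(file)) {
            return index; // nothing logged yet
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] parts = line.split("\t", 3);
            if (parts.length == 2 && parts[0].equals("DELETE")) {
                index.remove(parts[1]);
            } else if (parts.length == 3 && parts[0].equals("INSERT")) {
                index.put(parts[1], parts[2]);
            }
            // Malformed lines (for example a torn final write) are skipped.
        }
        return index;
    }
}
```

A layer on top of the real index would replace the Map operations with a delete-by-OID followed by an add, and would truncate the log only after the index changes are known to be durable.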
My interest is transactions, not making doc-id's permanent.
Specifically, the ability to ensure that a group of adds either all go
into the index or none go into the index, and to ensure that if none go
into the index that the index is not changed in any way.
I have UID's but they cannot ensure t
I honestly think that having a unique OID as an indexed field and
putting a layer on top of Lucene is the best solution to all of this.
It makes it almost trivial, and you can implement transaction
handling in a variety of ways.
Attempting to make the doc ids "permanent" is a tough challenge,
Ning Li wrote on 01/15/2007 06:29 PM:
> On 1/14/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
>> * The "support deleteDocuments in IndexWriter" (LUCENE-565) feature
>> could have a more efficient implementation (just like Solr) when
>> autoCommit is false, because deletes don't need t
On 1/14/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
* The "support deleteDocuments in IndexWriter" (LUCENE-565) feature
could have a more efficient implementation (just like Solr) when
autoCommit is false, because deletes don't need to be flushed
until commit() is called. Whe
Actually, my comment below was not quite accurate. It only matters on
multiple CPU machines if you are writing everything to a memory index
first.
If writing to a filesystem, then multiple threads on a single
processor would allow more documents to be inverted while the disk
write were occ
I did a cursory review of the discussion.
The problem I see is that in the checkpoint tx files you need a
'delete file' for every segment where a deletion SHOULD occur when it
is committed, but if you have multiple open transactions being
created, as soon as one is applied (committed), the d
Note: discussion started originally in
http://www.nabble.com/adding-%22explicit-commits%22-to-Lucene--t3011270.html
robert engels <[EMAIL PROTECTED]> wrote on 15/01/2007 13:23:14:
> I think that you will find a much larger performance decrease in
> doing things this way - if the external resour
I think that you will find a much larger performance decrease in
doing things this way - if the external resource is a db, or any
networked accessed resource.
When even just a single document is changed in the Lucene index you
could have MILLIONS of changes to internal doc ids (if say an ea
Also related is the request made several times in the list to be able to
control when docids are changing, for applications that need to maintain
some mapping between external IDs to Lucene docs but for some performance
reasons cannot afford to only count on storing external (DB) IDs in
Lucene's fi
I looked at doing a similar thing with the parallel 'inverting'.
I then decided that it will only make a difference on a multiple CPU
machine, so I put it on the back burner.
But if you have code already done...
On Jan 15, 2007, at 12:24 PM, Chuck Williams wrote:
robert engels wrote on 01/
robert engels wrote on 01/15/2007 08:01 AM:
> Is your parallel adding code available?
>
There is an early version in LUCENE-600, but without the enhancements
described. I didn't update that version because it didn't capture any
interest and requires Java 1.5 and so it seems will not be committed.
Is your parallel adding code available?
On Jan 15, 2007, at 11:54 AM, Chuck Williams wrote:
Michael McCandless wrote on 01/15/2007 01:49 AM:
Chuck,
Possibly related, one of the ways I improved concurrency in
ParallelWriter was to break up IndexWriter.addDocument() into one
method
to inver
Michael McCandless wrote on 01/15/2007 01:49 AM:
> Chuck,
>
>> Possibly related, one of the ways I improved concurrency in
>> ParallelWriter was to break up IndexWriter.addDocument() into one method
>> to invert the document and create a RAMSegment and a second method that
>> takes the RAMSegment
Chuck,
This seems to me to be a great idea, especially the ability to support
index transactions.
ParallelWriter (original implementation in LUCENE-600 -- I have a much
better one now) provides a companion writer to ParallelReader. It takes
a Document, breaks it up into subdocuments associated
Man!
I think you need to ask your girlfriend to move closer! ;)
Otis
----- Original Message -----
From: Hoss Man (JIRA) <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Monday, January 15, 2007 4:36:27 AM
Subject: [jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory
consumi
[
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464716
]
Karl Wettin commented on LUCENE-550:
Thanks a lot, Hoss, for taking the time. I sure do appreciate it.
I'll get ba
[
https://issues.apache.org/jira/browse/LUCENE-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464711
]
Karl Wettin commented on LUCENE-774:
> The summary refers to TopDocs and TopFieldDocs, but the diff changes Field
[
https://issues.apache.org/jira/browse/LUCENE-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464708
]
Karl Wettin commented on LUCENE-775:
> can you explain this...
>
> + /** Sub class ad hoc IndexReader coupling
[
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464707
]
Hoss Man commented on LUCENE-550:
-
Karl: the trunk.diff i just attached fixes a small autoboxing dependency your
pat
[
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated LUCENE-550:
Attachment: (was: trunk.diff)
> InstantiatedIndex - faster but memory consuming index
> ---
[
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated LUCENE-550:
Attachment: trunk.diff
test-reports.zip
> InstantiatedIndex - faster but memory consuming i
[
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated LUCENE-550:
Attachment: (was: test-reports.zip)
> InstantiatedIndex - faster but memory consuming index
> -
[
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated LUCENE-550:
Attachment: trunk.diff
test-reports.zip
> InstantiatedIndex - faster but memory consuming i
[
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464692
]
Hoss Man commented on LUCENE-550:
-
I just realized that all of the tests in
contrib/instantiated/src/test/java/org/a
[
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464676
]
Hoss Man commented on LUCENE-550:
-
I've been trying to follow the work you've been doing Karl, but i must admit a
lo