I see, that makes more sense now.
The query is a BooleanQuery. Here is what I do:
https://gist.github.com/Kudret/56879bf30fa129e752895305e1db5a80
On Wed, May 10, 2017 at 1:31 PM Michael McCandless <
luc...@mikemccandless.com> wrote:
IndexWriter simply buffers that Query you passed to deleteDocuments, so
that's very fast.
Only later, when a merge wants to kick off, or on a refresh or a commit, will
it (lazily) resolve that Query to the docIDs to delete, and that resolution
is the costly part.
What Query are you using to identify documents?
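That buffer-then-resolve behavior can be sketched as follows. This is a
minimal sketch, assuming Lucene 5.3+ on the classpath (for
BooleanQuery.Builder); the `path` field and the in-memory RAMDirectory are
illustrative, not the poster's actual setup:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class DeleteByQuerySketch {
    static int deleteByQuery() throws Exception {
        try (IndexWriter writer = new IndexWriter(
                 new RAMDirectory(),
                 new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("path", "/src/a.c", Field.Store.YES));
            writer.addDocument(doc);
            writer.commit();

            // deleteDocuments(Query) only buffers the query; this call is cheap.
            BooleanQuery.Builder b = new BooleanQuery.Builder();
            b.add(new TermQuery(new Term("path", "/src/a.c")),
                  BooleanClause.Occur.SHOULD);
            writer.deleteDocuments(b.build());

            // The buffered query is resolved to concrete docIDs here (or on a
            // refresh or a merge); that resolution is the expensive part.
            writer.commit();
            return writer.numDocs(); // deleted docs are no longer counted
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(deleteByQuery());
    }
}
```

The deleteDocuments call returns almost immediately; the second commit is
where the buffered query is actually walked against the index.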
Fair enough, however, I see this:
$ cat log
Tue May 9 07:19:45 EDT 2017: Indexing starts
Tue May 9 07:32:33 EDT 2017: Deletion starts with a list of 1278635 files
Tue May 9 07:49:47 EDT 2017: Deletion complete, Addition starts with
1272334 files
$ date
Tue May 9 13:12:58 EDT 2017
I am using t
addDocument can be a significant gain compared to updateDocument, as doing a
PK lookup on a unique field has a cost that is not negligible compared to
indexing a document, especially if the indexing chain is simple (no large
text fields with complex analyzers). Reindexing in place will also cause
more merging.
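For the from-scratch path, the per-document PK lookup can be skipped entirely
by opening the writer with OpenMode.CREATE and calling plain addDocument. A
minimal sketch, assuming Lucene 5.x; the `path` field and the document count
are illustrative:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.RAMDirectory;

public class ScratchRebuildSketch {
    // Rebuild the index from scratch: OpenMode.CREATE discards any existing
    // segments, so plain addDocument (no delete-by-term lookup) is safe.
    static int rebuild(RAMDirectory dir, int nDocs) throws Exception {
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        cfg.setOpenMode(OpenMode.CREATE); // drop previous index contents
        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            for (int i = 0; i < nDocs; i++) {
                Document doc = new Document();
                doc.add(new StringField("path", "/src/file" + i,
                                        Field.Store.YES));
                writer.addDocument(doc); // no PK lookup, unlike updateDocument
            }
            writer.commit();
            return writer.numDocs();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rebuild(new RAMDirectory(), 3));
    }
}
```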
As far as I know, the updateDocument method on the IndexWriter does a delete
and an add. See also the javadoc:
[..] Updates a document by first deleting the document(s)
containing term and then adding the new
document. The delete and then add are atomic as seen
by a reader on the same index (flush may happen only after the add).
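A minimal sketch of that contract, assuming Lucene 5.x (the `id` field is an
illustrative primary key, not from the poster's index): calling
updateDocument twice with the same term leaves exactly one live document,
because each call atomically deletes the previous one before adding.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

public class UpdateDocumentSketch {
    // updateDocument(term, doc) == atomic deleteDocuments(term) + addDocument(doc).
    static int updateTwice() throws Exception {
        try (IndexWriter writer = new IndexWriter(
                 new RAMDirectory(),
                 new IndexWriterConfig(new StandardAnalyzer()))) {
            for (int rev = 1; rev <= 2; rev++) {
                Document doc = new Document();
                doc.add(new StringField("id", "42", Field.Store.YES)); // PK field
                doc.add(new TextField("body", "revision " + rev, Field.Store.NO));
                writer.updateDocument(new Term("id", "42"), doc); // delete + add
            }
            writer.commit();
            return writer.numDocs(); // one live document, not two
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(updateTwice());
    }
}
```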
I do update the entire document each time. Furthermore, this sometimes
means deleting compressed archives, which are stored as multiple documents
per compressed archive file, and re-adding them.
Is there an update method, and does it perform better than remove-then-add? I
was simply removing modif
Do you update each entire document? (vs updating numeric docvalues?)
That is implemented as 'delete and add', so I guess it will be slower than
clean-sheet indexing. Not sure it should be 3x slower, though; that seems a
bit much?
On Tue, May 9, 2017 at 3:24 PM, Kudrettin Güleryüz
wrote:
Hi,
For a 5.2.1 index that contains around 1.2 million documents, updating the
index with 1.3 million files seems to take 3x longer than indexing from
scratch. (Files are crawled over NFS; indexes are stored locally on a
mechanical disk with Btrfs.)
Is this expected for Lucene's update logic?