On 28/02/2017 20:17, Serkan Mulayim wrote:
> So as I see it:
> 1. When we do an indexing operation on an existing index, a new
> segment is created, but it is not added to the index until it is
> committed. When it is committed, the segment is kept as a separate
> set of files and the snapshot.json file is updated to include the
> new segment.

That's right, but segments are merged occasionally.
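For concreteness, here's a minimal sketch of one add-and-commit cycle through the C bindings. The index path and field name are placeholders, and the Schema is assumed to be built elsewhere; the point is that the new segment only becomes visible once the commit writes a fresh snapshot that references it.

    #define CFISH_USE_SHORT_NAMES
    #define LUCY_USE_SHORT_NAMES

    #include "Clownfish/String.h"
    #include "Lucy/Document/Doc.h"
    #include "Lucy/Index/Indexer.h"
    #include "Lucy/Plan/Schema.h"

    static void
    add_and_commit(Schema *schema, const char *text) {
        String  *path    = Str_newf("/path/to/index");   /* placeholder */
        Indexer *indexer = Indexer_new(schema, (Obj*)path, NULL,
                                       Indexer_CREATE);
        String  *field   = Str_newf("content");          /* placeholder */
        String  *value   = Str_newf("%s", text);
        Doc     *doc     = Doc_new(NULL, 0);

        Doc_Store(doc, field, (Obj*)value);
        Indexer_Add_Doc(indexer, doc, 1.0f);

        /* Writes the new segment's files and updates the snapshot
         * to include them. */
        Indexer_Commit(indexer);

        DECREF(doc);
        DECREF(value);
        DECREF(field);
        DECREF(indexer);
        DECREF(path);
    }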

> 2. Lock files are generated and kept separate based on the PID (no
> shared-filesystem adjustments).

> What I would like to do is index thousands of documents in batches
> with asynchronous calls to the library. The asynchronous calls would
> try to update the newly created segment from different callers. If
> the PIDs are the same, it seems like the system will crash because
> write.lock contains the PIDs.

This has nothing to do with PIDs (they're only used to remove stale lock files). You'll receive a LockErr exception if an Indexer can't acquire the write lock after several retries, regardless of the process ID.
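In the C bindings, that failure surfaces as a Clownfish exception. A sketch of trapping it, assuming the Clownfish runtime's Err_trap(); note that the write lock is acquired when the Indexer is constructed, so the whole cycle has to run inside the trap (path is a placeholder):

    #define CFISH_USE_SHORT_NAMES
    #define LUCY_USE_SHORT_NAMES

    #include <stdbool.h>
    #include "Clownfish/Err.h"
    #include "Clownfish/String.h"
    #include "Lucy/Index/Indexer.h"
    #include "Lucy/Plan/Schema.h"

    /* Runs inside Err_trap() so a LockErr is caught rather than fatal. */
    static void
    index_batch_cb(void *context) {
        Schema  *schema  = (Schema*)context;
        String  *path    = Str_newf("/path/to/index");  /* placeholder */
        Indexer *indexer = Indexer_new(schema, (Obj*)path, NULL, 0);
        /* ... Indexer_Add_Doc() for each document in the batch ... */
        Indexer_Commit(indexer);
        DECREF(indexer);
        DECREF(path);
    }

    static bool
    try_index_batch(Schema *schema) {
        Err *error = Err_trap(index_batch_cb, schema);
        if (error != NULL) {
            /* Most likely a LockErr: another process held the write
             * lock past the timeout.  Back off and retry, or bail. */
            DECREF(error);
            return false;
        }
        return true;
    }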

> Do you think there is a way to make this work with calls from
> different PIDs, with the addition of a commit.lock file? I hope this
> makes sense :( :)

Parallel indexing isn't supported by Lucy. We only support background merging, which is mostly geared toward interactive applications that index only a few documents at a time. Non-interactive batch jobs that index thousands of documents in parallel aren't handled well by Lucy, although this could probably be improved. Your only options right now are:

- If it's OK for your indexing processes to potentially wait for a long
  time, increase the write lock timeout to a huge value (see the sketch
  after this list), or catch LockErrs and implement your own retry
  logic, as in the Err_trap sketch above.

- Implement your own document queue where multiple processes can add
  documents and a single indexing process removes them.
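For the first option, the timeout is raised through an IndexManager. A sketch, assuming the C bindings' IxManager_new() with a NULL host and lock factory; the five-minute value is arbitrary:

    #define CFISH_USE_SHORT_NAMES
    #define LUCY_USE_SHORT_NAMES

    #include "Clownfish/String.h"
    #include "Lucy/Index/IndexManager.h"
    #include "Lucy/Index/Indexer.h"
    #include "Lucy/Plan/Schema.h"

    static void
    index_batch_patiently(Schema *schema) {
        String       *path    = Str_newf("/path/to/index"); /* placeholder */
        IndexManager *manager = IxManager_new(NULL, NULL);

        /* Wait up to five minutes for the write lock instead of the
         * default of one second. */
        IxManager_Set_Write_Lock_Timeout(manager, 300000);

        Indexer *indexer = Indexer_new(schema, (Obj*)path, manager, 0);
        /* ... Indexer_Add_Doc() for each queued document ... */
        Indexer_Commit(indexer);

        DECREF(indexer);
        DECREF(manager);
        DECREF(path);
    }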

> One more question: when I index documents and commit each time (let's
> say 5,000 batches of commits in a synchronous way), I see that the
> indexing works fine. How are the segments being handled? I do not see
> 5,000 different segments created. Is it because, after a certain
> number of segments (say 32), the segments are merged and optimized?

Yes, that's how it works. The FastUpdates cookbook entry contains more details:

    https://lucy.apache.org/docs/c/Lucy/Docs/Cookbook/FastUpdates.html
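In the C bindings, a merge pass from that recipe looks roughly like this (a sketch; the path is a placeholder, and the second constructor argument is an optional IndexManager):

    #define CFISH_USE_SHORT_NAMES
    #define LUCY_USE_SHORT_NAMES

    #include "Clownfish/String.h"
    #include "Lucy/Index/BackgroundMerger.h"

    /* Consolidate segments in a separate process while indexers keep
     * committing small ones. */
    static void
    merge_pass(void) {
        String *path = Str_newf("/path/to/index");  /* placeholder */
        BackgroundMerger *merger = BGMerger_new((Obj*)path, NULL);
        BGMerger_Commit(merger);
        DECREF(merger);
        DECREF(path);
    }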

But I don't think background merging would help much in your case.

Nick
