[ http://issues.apache.org/jira/browse/SOLR-65?page=all ]
Mike Klaas updated SOLR-65:
---------------------------
Attachment: autocommit_patch.diff
New patch.
First, the locking semantics were actually wrong. Since every addDoc call
grabbed the commit lock and downgraded to the access lock, subsequent calls would
block on the commit. I tried a few vastly different schemes, and it took a
while to figure out something that allowed concurrency but also gave the same
protections as before. I finally settled on using the read/write commit lock
as the principal lock, with a touch of synchronization to protect the addDoc
calls.
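
A minimal sketch of that scheme, assuming a plain java.util.concurrent
read/write lock. This is not the code from the patch: LockSketch, its fields,
and the in-memory list standing in for the index writer are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the update handler; names are illustrative only.
class LockSketch {
    private final ReentrantReadWriteLock commitLock = new ReentrantReadWriteLock();
    private final List<String> pending = new ArrayList<>(); // stands in for the index writer
    private int committed = 0;

    // Many threads may add at once: each holds only the shared (read) lock.
    void addDoc(String id) {
        commitLock.readLock().lock();
        try {
            // Per-document work (e.g. analysis) would run here, in parallel.
            String analyzed = id.trim();
            synchronized (pending) { // short critical section around shared state
                pending.add(analyzed);
            }
        } finally {
            commitLock.readLock().unlock();
        }
    }

    // commit takes the exclusive (write) lock, so it waits for in-flight adds
    // and blocks new ones until it finishes.
    int commit() {
        commitLock.writeLock().lock();
        try {
            committed += pending.size();
            pending.clear();
            return committed;
        } finally {
            commitLock.writeLock().unlock();
        }
    }
}
```

The point of the sketch is that addDoc never takes the exclusive lock, so adds
only serialize on the short synchronized section.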
That finally enabled concurrency, but other bottlenecks emerged. checkCommit()
was grabbing the commit lock, which created a barrier at the end of every
addDoc call: each call was forced to wait for all pending addDoc calls. I
switched to synchronizing on the tracker (synchronizing on DUH2 could provoke
a deadlock).
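
As a rough illustration of synchronizing on the tracker instead of on the
commit lock, here is a sketch. It is an assumption about the general shape,
not the patch itself: TrackerSketch, docsUpperBound, and the method names are
hypothetical.

```java
// Hypothetical autocommit tracker; it guards only its own counter, so
// callers never need the commit lock just to record that a doc was added.
class TrackerSketch {
    private final int docsUpperBound; // commit after this many added docs
    private int docsSinceCommit = 0;

    TrackerSketch(int docsUpperBound) {
        this.docsUpperBound = docsUpperBound;
    }

    // Called at the end of each add; returns true when a commit is due.
    synchronized boolean addedDocument() {
        docsSinceCommit++;
        return docsSinceCommit >= docsUpperBound;
    }

    // Called once a commit has completed; resets the counter.
    synchronized void didCommit() {
        docsSinceCommit = 0;
    }

    synchronized int pendingDocs() {
        return docsSinceCommit;
    }
}
```

Because the monitor here is the tracker object, an addDoc call that only
needs to bump the counter does not wait on other adds that still hold the
commit lock.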
Finally, there was significant contention on the lock for the logger output
stream. When merging wasn't occurring, the doc rate could reach 200-300 docs
per second, and each docId was being logged. I modified the bulk add code to
log the docIds of all documents in a single log statement. While I was at it,
I converted the <result> output for multi-adds to a single XML element. Was
more information going to be added to this?
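
The logging change can be sketched as follows, assuming java.util.logging.
BulkLogSketch and the message format are hypothetical and are not taken from
the patch.

```java
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper: one log statement per bulk add instead of one per doc.
class BulkLogSketch {
    private static final Logger log = Logger.getLogger(BulkLogSketch.class.getName());

    // Joins all ids into a single message (the format is illustrative only).
    static String formatAdded(List<String> ids) {
        return "added id={" + String.join(",", ids) + "}";
    }

    // Takes the logger's output lock once per batch rather than once per id.
    static void logAdded(List<String> ids) {
        if (!ids.isEmpty()) {
            log.info(formatAdded(ids));
        }
    }
}
```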
The gains from multi-threaded indexing for my application are modest. CPU
usage is consistently above 100%; it drops a bit during medium merges and
drops a lot during large merges (merges effectively serialize adding documents).
Still, the throughput gain is about 20-30%. In retrospect, this isn't terribly
surprising, as our analysis is relatively modest. Applications with heavier
analysis needs would see more gains.
> autoCommit/autoOptimize implementation + multithreaded document adding
> ----------------------------------------------------------------------
>
> Key: SOLR-65
> URL: http://issues.apache.org/jira/browse/SOLR-65
> Project: Solr
> Issue Type: New Feature
> Components: update
> Reporter: Mike Klaas
> Assigned To: Mike Klaas
> Attachments: autocommit_patch.diff, autocommit_patch.diff
>
>
> Basic implementation of autoCommit/autoOptimize functionality, plus overhaul
> of DUH2 threading to reduce contention