Great explanation, Shawn! BTW, soft-committed documents will not be recovered on a JVM crash.
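
For anyone who wants to try the schedule Shawn describes below (frequent soft commits, plus a periodic hard commit with openSearcher=false), here is a rough sketch of how it might look in the updateHandler section of solrconfig.xml. The intervals (1 second for soft commits, 5 minutes for hard commits) are illustrative assumptions only, not values given in this thread; tune them to your indexing load and to how large you are willing to let the transaction logs grow.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log: each hard commit closes the current log and starts a new one -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit every 5 minutes (assumed value). openSearcher=false means
       no new searcher is opened, so visibility is left to the soft commits -->
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit every second (assumed value): opens a new searcher so that
       newly indexed documents become searchable -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```
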
On 8 February 2013 13:27, Shawn Heisey <s...@elyograg.org> wrote:

> On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote:
>
>> Hello,
>>
>> What actually happens when using soft (as opposed to hard) commit?
>>
>> I understand somewhat very high-level picture (documents become
>> available faster, but you may lose them on power loss).
>> I don't care about low-level implementation details.
>>
>> But I am trying to understand what is happening on the medium level of
>> details.
>>
>> For example what are stages of a document if we are using all available
>> transaction log, soft commit, hard commit options? It feels like there
>> are three stages:
>> *) Uncommitted (soft or hard): accessible only via direct real-time get?
>> *) Soft-committed: accessible through all search operations? (but not on
>>    disk? but where is it? in memory?)
>> *) Hard-committed: all the same as soft-committed but it is now on disk
>>
>> Similarly, in performance section of Wiki, it says: "A commit (including
>> a soft commit) will free up almost all heap memory" - why would soft
>> commit free up heap memory? I thought it was not flushed to disk.
>>
>> Also, with soft-commits and transaction log enabled, doesn't transaction
>> log allow to replay/recover the latest state after crash? I believe
>> that's what transaction log does for the database. If not, how does one
>> recover, if at all?
>>
>> And where does openSearcher=false fit into that? Does it cause
>> inconsistent results somehow?
>>
>> I am missing something, but I am not sure what or where. Any points in
>> the right direction would be appreciated.
>>
>
> Let's see if I can answer your questions without giving you incorrect
> information.
>
> New indexed content is not searchable until you open a new searcher,
> regardless of the type of commit that you do.
>
> A hard commit will close the current transaction log and start a new one.
> It will also instruct the Directory implementation to flush to disk. If
> you specify openSearcher=false, then the content that has just been
> committed will NOT be searchable, as discussed in the previous paragraph.
> The existing searcher will remain open and continue to serve queries
> against the same index data.
>
> A soft commit does not flush the new content to disk, but it does open a
> new searcher. I'm sure that the amount of memory available for caching
> this content is not large, so it's possible that if you do a lot of
> indexing with soft commits and your hard commits are too infrequent,
> you'll end up flushing part of the cached data to disk anyway. I'd love
> to hear from a committer about this, because I could be wrong.
>
> There's a caveat with that 'flush to disk' operation -- the default
> Directory implementation in the Solr example config, which is
> NRTCachingDirectoryFactory, will cache the last few megabytes of indexed
> data and not flush it to disk even with a hard commit. If your commits
> are small, then the net result is similar to a soft commit. If the server
> or Solr were to crash, the transaction logs would be replayed on Solr
> startup, recovering that last few megabytes. The transaction log may also
> recover documents that were soft committed, but I'm not 100% sure about
> that.
>
> To take full advantage of NRT functionality, you can commit as often as
> you like with soft commits. On some reasonable interval, say every one to
> fifteen minutes, you can issue a hard commit with openSearcher set to
> false, to flush things to disk and cycle through transaction logs before
> they get huge. Solr will keep a few of the transaction logs around, and
> if they are huge, it can take a long time to replay them. You'll want to
> choose a hard commit interval that doesn't create giant transaction logs.
>
> If any of the info I've given here is wrong, someone should correct me!
>
> Thanks,
> Shawn
>

--
Regards,
Prakhar Birla
+91 9739868086