Re: Trying to understand soft vs hard commit vs transaction log

Shawn Heisey Thu, 07 Feb 2013 23:57:20 -0800

On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote:

Hello,


What actually happens when using soft (as opposed to hard) commit?

I understand the very high-level picture somewhat (documents become available
faster, but you may lose them on power loss).
I don't care about low-level implementation details.

But I am trying to understand what is happening at a medium level of
detail.

For example, what are the stages of a document if we are using all the available
transaction log, soft commit, and hard commit options? It feels like there are
three stages:
*) Uncommitted (soft or hard): accessible only via direct real-time get?
*) Soft-committed: accessible through all search operations? (but not on
disk? but where is it? in memory?)
*) Hard-committed: all the same as soft-committed but it is now on disk

Similarly, in the performance section of the Wiki, it says: "A commit (including a
soft commit) will free up almost all heap memory" - why would a soft commit
free up heap memory? I thought it was not flushed to disk.

Also, with soft commits and the transaction log enabled, doesn't the transaction
log allow replaying/recovering the latest state after a crash? I believe that's
what a transaction log does in a database. If not, how does one recover,
if at all?

And where does openSearcher=false fit into that? Does it cause
inconsistent results somehow?

I am missing something, but I am not sure what or where. Any pointers in the
right direction would be appreciated.

Let's see if I can answer your questions without giving you incorrect information.

New indexed content is not searchable until you open a new searcher, regardless of the type of commit that you do.
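To make the searcher point concrete, here is a toy model in Python. `ToyCore` and its methods are hypothetical names invented for illustration, not Solr classes; the only thing modeled is that a search sees the snapshot taken when the searcher was opened, not whatever has been indexed since.

```python
class ToyCore:
    """Hypothetical stand-in for a Solr core (illustration only)."""

    def __init__(self):
        self.indexed = []         # every document added so far
        self.searcher_view = ()   # what the current searcher can see

    def add(self, doc_id):
        # Indexed, but not yet visible to searches.
        self.indexed.append(doc_id)

    def open_searcher(self):
        # Whatever triggered it (soft commit, or hard commit with
        # openSearcher=true), only this step changes what queries see.
        self.searcher_view = tuple(self.indexed)

    def search(self, doc_id):
        return doc_id in self.searcher_view


core = ToyCore()
core.add("doc1")
print(core.search("doc1"))  # False: the old searcher is still in use
core.open_searcher()
print(core.search("doc1"))  # True
```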

A hard commit will close the current transaction log and start a new one. It will also instruct the Directory implementation to flush to disk. If you specify openSearcher=false, then the content that has just been committed will NOT be searchable, as discussed in the previous paragraph. The existing searcher will remain open and continue to serve queries against the same index data.
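An explicit hard commit that leaves the current searcher in place can be sent as a request to the update handler. The sketch below only builds the URL and sends nothing. It assumes a Solr instance at `localhost:8983` with a core named `collection1` (both hypothetical here) and that the `/update` handler accepts `commit` and `openSearcher` as request parameters, so check this against the documentation for your Solr version.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust for your installation.
UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def hard_commit_url(update_url, open_searcher=False):
    """Build (but do not send) an explicit hard-commit request URL."""
    params = {
        "commit": "true",
        # With openSearcher=false the data is flushed and the transaction
        # log is rolled, but queries keep using the existing searcher.
        "openSearcher": "true" if open_searcher else "false",
    }
    return update_url + "?" + urlencode(params)


print(hard_commit_url(UPDATE_URL))
# http://localhost:8983/solr/collection1/update?commit=true&openSearcher=false
```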

A soft commit does not flush the new content to disk, but it does open a new searcher. I'm sure that the amount of memory available for caching this content is not large, so it's possible that if you do a lot of indexing with soft commits and your hard commits are too infrequent, you'll end up flushing part of the cached data to disk anyway. I'd love to hear from a committer about this, because I could be wrong.
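The possibility raised above (soft-committed data spilling to disk before any hard commit) can be pictured with a toy model. Everything in it is an assumption for illustration: `ToySoftCommitBuffer` is a hypothetical name, the memory budget is counted in documents, and the oldest content is written out first when the budget is exceeded. Real Lucene/Solr flushing is governed by its own settings and is not modeled here.

```python
class ToySoftCommitBuffer:
    """Hypothetical model: soft-committed docs stay in memory until a
    memory budget (counted in documents here) is exceeded."""

    def __init__(self, budget_docs):
        self.budget_docs = budget_docs
        self.in_memory = []   # soft-committed, searchable, not on disk
        self.on_disk = []     # flushed, by overflow or by a hard commit

    def soft_commit(self, docs):
        self.in_memory.extend(docs)
        # Assumed behaviour: once the budget is exceeded, the oldest
        # in-memory content is written out even without a hard commit.
        while len(self.in_memory) > self.budget_docs:
            self.on_disk.append(self.in_memory.pop(0))

    def hard_commit(self):
        # A hard commit flushes everything still held in memory.
        self.on_disk.extend(self.in_memory)
        self.in_memory.clear()


buf = ToySoftCommitBuffer(budget_docs=3)
buf.soft_commit(["a", "b"])
buf.soft_commit(["c", "d", "e"])
print(buf.in_memory, buf.on_disk)  # ['c', 'd', 'e'] ['a', 'b']
```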

There's a caveat with that 'flush to disk' operation -- the default Directory implementation in the Solr example config, which is NRTCachingDirectoryFactory, will cache the last few megabytes of indexed data and not flush it to disk even with a hard commit. If your commits are small, then the net result is similar to a soft commit. If the server or Solr were to crash, the transaction logs would be replayed on Solr startup, recovering those last few megabytes. The transaction log may also recover documents that were soft committed, but I'm not 100% sure about that.
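Replay on startup can be pictured roughly as follows. This is a simplified model, not Solr's recovery code; the `recover` function and the tuple format of the log entries are hypothetical. It only shows the idea that whatever actually reached disk is the starting point, and the operations logged since then are re-applied in order.

```python
def recover(durable_docs, tlog_entries):
    """Return the index contents after replaying a transaction log.

    durable_docs: dict of doc_id -> doc that actually reached disk.
    tlog_entries: list of ("add", doc_id, doc) or ("delete", doc_id, None)
                  recorded since the last durable flush (hypothetical format).
    """
    index = dict(durable_docs)  # start from what survived the crash
    for op, doc_id, doc in tlog_entries:
        if op == "add":
            index[doc_id] = doc
        elif op == "delete":
            index.pop(doc_id, None)
        else:
            raise ValueError(f"unknown tlog operation: {op}")
    return index


on_disk = {"1": {"title": "flushed before the crash"}}
tlog = [("add", "2", {"title": "cached, not yet on disk"}),
        ("delete", "1", None)]
print(recover(on_disk, tlog))  # {'2': {'title': 'cached, not yet on disk'}}
```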

To take full advantage of NRT functionality, you can commit as often as you like with soft commits. On some reasonable interval, say every one to fifteen minutes, you can issue a hard commit with openSearcher set to false, to flush things to disk and cycle through transaction logs before they get huge. Solr will keep a few of the transaction logs around, and if they are huge, it can take a long time to replay them. You'll want to choose a hard commit interval that doesn't create giant transaction logs.
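The commit schedule described above can also be configured in solrconfig.xml instead of being sent by the client. The sketch generates the relevant `updateHandler` fragment with Python's standard `xml.etree.ElementTree`. The element names (`updateLog`, `autoCommit`, `maxTime`, `openSearcher`, `autoSoftCommit`) are assumed to match the Solr 4.x example config, and the intervals in the usage line (five minutes for hard commits, one second for soft commits) are example values only.

```python
import xml.etree.ElementTree as ET


def commit_policy_xml(hard_commit_ms, soft_commit_ms):
    """Render an updateHandler fragment with autoCommit/autoSoftCommit."""
    handler = ET.Element("updateHandler", {"class": "solr.DirectUpdateHandler2"})

    # Transaction log: what gets replayed on startup after a crash.
    update_log = ET.SubElement(handler, "updateLog")
    ET.SubElement(update_log, "str", {"name": "dir"}).text = "${solr.ulog.dir:}"

    # Periodic hard commit: flush to disk, roll the tlog, keep the searcher.
    auto_commit = ET.SubElement(handler, "autoCommit")
    ET.SubElement(auto_commit, "maxTime").text = str(hard_commit_ms)
    ET.SubElement(auto_commit, "openSearcher").text = "false"

    # Frequent soft commit: open a new searcher so new docs become visible.
    auto_soft = ET.SubElement(handler, "autoSoftCommit")
    ET.SubElement(auto_soft, "maxTime").text = str(soft_commit_ms)

    return ET.tostring(handler, encoding="unicode")


print(commit_policy_xml(hard_commit_ms=5 * 60 * 1000, soft_commit_ms=1000))
```

Putting the schedule in config keeps the hard commit interval fixed on the server side, so client code only has to index documents.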


If any of the info I've given here is wrong, someone should correct me!

Thanks,
Shawn
