If you check the revision history of the wiki page, a Mr. jayqhacker added the quoted statement on November 26, 2012. I don't recognize his "name" as being a "known authority" on anything related to Solr, so maybe his uncorroborated comments should be taken with a grain of salt.

-- Jack Krupansky

-----Original Message-----
From: Alexandre Rafalovitch
Sent: Friday, February 08, 2013 6:11 PM
To: solr-user@lucene.apache.org
Subject: Re: Trying to understand soft vs hard commit vs transaction log

Sorry Shawn,

Somehow I am still not quite grasping it. I would really appreciate it if
somebody (or even you) could have another go at a very small part of this.
Maybe it will clear things up:
Similarly, in the performance section of the wiki, it says: "A commit
(including a soft commit) will free up almost all heap memory"
Why? What is the "hard work" that a hard commit does and a soft commit does
not, if the soft commit may still end up writing to disk? Is it some sort of
Lucene segment finalization and new segment creation?

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Feb 8, 2013 at 2:57 AM, Shawn Heisey <s...@elyograg.org> wrote:

On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote:

Hello,

What actually happens when using soft (as opposed to hard) commit?

I somewhat understand the very high-level picture (documents become
available faster, but you may lose them on power loss).
I don't care about low-level implementation details.

But I am trying to understand what is happening at a medium level of
detail.

For example, what are the stages of a document if we are using all the
available options: transaction log, soft commit, and hard commit? It feels
like there are three stages:
*) Uncommitted (soft or hard): accessible only via direct real-time get?
*) Soft-committed: accessible through all search operations? (but not on
disk? but where is it? in memory?)
*) Hard-committed: all the same as soft-committed, but it is now on disk

Similarly, in the performance section of the wiki, it says: "A commit
(including a soft commit) will free up almost all heap memory". Why would a
soft commit free up heap memory? I thought it was not flushed to disk.

Also, with soft commits and the transaction log enabled, doesn't the
transaction log allow replaying/recovering the latest state after a crash?
I believe that's what a transaction log does for a database. If not, how
does one recover, if at all?

And where does openSearcher=false fit into that? Does it cause
inconsistent results somehow?

I am missing something, but I am not sure what or where. Any pointers in
the right direction would be appreciated.


Let's see if I can answer your questions without giving you incorrect
information.

New indexed content is not searchable until you open a new searcher,
regardless of the type of commit that you do.

A hard commit will close the current transaction log and start a new one.
It will also instruct the Directory implementation to flush to disk. If
you specify openSearcher=false, then the content that has just been
committed will NOT be searchable, as discussed in the previous paragraph.
The existing searcher will remain open and continue to serve queries
against the same index data.
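
As an illustration of the hard commit described above, here is a minimal Python sketch that builds an update request URL asking for a hard commit with openSearcher=false. It assumes a Solr 4.x example install listening on localhost:8983 with the default core name "collection1"; the helper name hard_commit_url is hypothetical, and the commit/openSearcher request parameters are used as I understand the Solr 4.x update handler to accept them.

```python
from urllib.parse import urlencode

# Assumption: Solr 4.x example install, default core "collection1" on port 8983.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def hard_commit_url(base_url: str = SOLR_UPDATE_URL, open_searcher: bool = False) -> str:
    """Build a URL requesting a hard commit (hypothetical helper).

    With openSearcher=false the commit flushes index data and rolls the
    transaction log, but the existing searcher keeps serving queries, so
    the newly committed documents are not yet searchable.
    """
    params = {"commit": "true", "openSearcher": "true" if open_searcher else "false"}
    return f"{base_url}?{urlencode(params)}"


print(hard_commit_url())
# http://localhost:8983/solr/collection1/update?commit=true&openSearcher=false
```

Sending that URL (for example with curl or urllib.request.urlopen) is left out so the sketch runs without a live server.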

A soft commit does not flush the new content to disk, but it does open a
new searcher.  I'm sure that the amount of memory available for caching
this content is not large, so it's possible that if you do a lot of
indexing with soft commits and your hard commits are too infrequent, you'll
end up flushing part of the cached data to disk anyway.  I'd love to hear
from a committer about this, because I could be wrong.
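
The visibility rules in the last two paragraphs can be summarized with a toy model. This is a hypothetical ToyIndex class, not Solr or Lucene code: it only tracks which documents are searchable (the current searcher's snapshot) and which are durable (flushed by a hard commit). It deliberately ignores NRTCachingDirectory caching, transaction logs, and the possibility that a soft commit flushes some data anyway.

```python
class ToyIndex:
    """Toy model (not Solr code) of searchability vs. durability."""

    def __init__(self) -> None:
        self.indexed: list[str] = []  # every document added so far
        self.durable: list[str] = []  # documents flushed to disk by a hard commit
        self.visible: list[str] = []  # snapshot served by the current searcher

    def add(self, doc_id: str) -> None:
        # Indexed, but neither searchable nor durable until a commit.
        self.indexed.append(doc_id)

    def soft_commit(self) -> None:
        # Opens a new searcher over everything indexed; nothing goes to disk.
        self.visible = list(self.indexed)

    def hard_commit(self, open_searcher: bool = True) -> None:
        # Flushes everything indexed so far to disk.
        self.durable = list(self.indexed)
        if open_searcher:
            self.visible = list(self.indexed)
        # With open_searcher=False the old searcher keeps its old snapshot.

    def search(self) -> list[str]:
        return list(self.visible)


idx = ToyIndex()
idx.add("doc1")
print(idx.search())   # []  (no new searcher yet)
idx.soft_commit()
print(idx.search())   # ['doc1']  (searchable, but not durable)
print(idx.durable)    # []
idx.add("doc2")
idx.hard_commit(open_searcher=False)
print(idx.search())   # ['doc1']  (doc2 is durable but not searchable)
print(idx.durable)    # ['doc1', 'doc2']
```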

There's a caveat with that 'flush to disk' operation -- the default
Directory implementation in the Solr example config, which is
NRTCachingDirectoryFactory, will cache the last few megabytes of indexed
data and not flush it to disk even with a hard commit. If your commits are
small, then the net result is similar to a soft commit.  If the server or
Solr were to crash, the transaction logs would be replayed on Solr startup,
recovering those last few megabytes. The transaction log may also recover
documents that were soft committed, but I'm not 100% sure about that.
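
To make the replay idea concrete, here is a toy model (hypothetical ToyTlogCore class, not Solr's implementation) in which every update is logged, a hard commit captures the index state and starts a new log, and a restart after a crash rebuilds the state from the last hard commit plus a replay of the log. It models hard commits only, and it simply discards the old log on commit, whereas Solr keeps a few old logs around.

```python
class ToyTlogCore:
    """Toy model of hard commits plus transaction-log replay (not Solr code)."""

    def __init__(self) -> None:
        self.on_disk: list[str] = []    # state captured by the last hard commit
        self.tlog: list[str] = []       # updates logged since the last hard commit
        self.in_memory: list[str] = []  # live index state; lost on a crash

    def add(self, doc_id: str) -> None:
        self.tlog.append(doc_id)        # every update is written to the log
        self.in_memory.append(doc_id)

    def hard_commit(self) -> None:
        self.on_disk = list(self.in_memory)
        self.tlog = []                  # close the old log and start a new one

    def crash_and_restart(self) -> int:
        """Lose in-memory state, replay the log, return the number replayed."""
        self.in_memory = list(self.on_disk)
        for doc_id in self.tlog:
            self.in_memory.append(doc_id)
        return len(self.tlog)


core = ToyTlogCore()
core.add("doc1")
core.hard_commit()
core.add("doc2")
core.add("doc3")
replayed = core.crash_and_restart()
print(replayed, core.in_memory)  # 2 ['doc1', 'doc2', 'doc3']
```

In this model, the replay work on restart grows with the number of updates since the last hard commit.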

To take full advantage of NRT functionality, you can commit as often as
you like with soft commits.  On some reasonable interval, say every one to
fifteen minutes, you can issue a hard commit with openSearcher set to
false, to flush things to disk and cycle through transaction logs before
they get huge. Solr will keep a few of the transaction logs around, and if
they are huge, it can take a long time to replay them.  You'll want to
choose a hard commit interval that doesn't create giant transaction logs.
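
A rough way to pick that interval is back-of-the-envelope arithmetic. The sketch below assumes, as a simplification, that the transaction log grows by roughly docs_per_second * avg_doc_bytes bytes per second; the function name, the log-size budget, and the load figures are all hypothetical. The result is clamped to the one-to-fifteen-minute range suggested above.

```python
def suggested_hard_commit_interval(
    docs_per_second: int,
    avg_doc_bytes: int,
    max_tlog_bytes: int,
    lower_seconds: int = 60,
    upper_seconds: int = 900,
) -> int:
    """Longest hard-commit interval whose estimated tlog stays under budget.

    Assumes tlog growth of about docs_per_second * avg_doc_bytes per second.
    If the lower bound wins the clamp, the estimated log exceeds the budget.
    """
    bytes_per_second = docs_per_second * avg_doc_bytes
    interval = max_tlog_bytes // bytes_per_second
    return max(lower_seconds, min(upper_seconds, interval))


# Hypothetical load: 200 docs/s of about 2 KiB each, with a 100 MiB log budget.
print(suggested_hard_commit_interval(200, 2048, 100 * 1024 * 1024))  # 256
```

Real transaction log sizes depend on how Solr serializes updates, so the estimate is only a starting point for measurement.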

If any of the info I've given here is wrong, someone should correct me!

Thanks,
Shawn


