Shawn, what about 'flush to disk' behaviour on MMapDirectoryFactory?

On Fri, Feb 8, 2013 at 11:12 AM, Prakhar Birla <prakharbi...@gmail.com> wrote:

> Great explanation, Shawn! BTW, soft-committed documents will not be
> recovered on a JVM crash.
>
> On 8 February 2013 13:27, Shawn Heisey <s...@elyograg.org> wrote:
>
> > On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote:
> >
> >> Hello,
> >>
> >> What actually happens when using soft (as opposed to hard) commit?
> >>
> >> I somewhat understand the very high-level picture (documents become
> >> available faster, but you may lose them on power loss).
> >> I don't care about low-level implementation details.
> >>
> >> But I am trying to understand what is happening at a medium level of
> >> detail.
> >>
> >> For example, what are the stages of a document if we are using all the
> >> available transaction log, soft commit, and hard commit options? It feels
> >> like there are three stages:
> >> *) Uncommitted (soft or hard): accessible only via direct real-time get?
> >> *) Soft-committed: accessible through all search operations? (but not on
> >> disk? but where is it? in memory?)
> >> *) Hard-committed: all the same as soft-committed but it is now on disk
> >>
> >> Similarly, in the performance section of the Wiki, it says: "A commit
> >> (including a soft commit) will free up almost all heap memory" - why would
> >> a soft commit free up heap memory? I thought it was not flushed to disk.
> >>
> >> Also, with soft commits and the transaction log enabled, doesn't the
> >> transaction log allow replaying/recovering the latest state after a crash?
> >> I believe that's what a transaction log does for a database. If not, how
> >> does one recover, if at all?
> >>
> >> And where does openSearcher=false fit into that? Does it cause
> >> inconsistent results somehow?
> >>
> >> I am missing something, but I am not sure what or where. Any pointers in
> >> the right direction would be appreciated.
> >>
> >
> > Let's see if I can answer your questions without giving you incorrect
> > information.
> >
> > New indexed content is not searchable until you open a new searcher,
> > regardless of the type of commit that you do.
> >
> > A hard commit will close the current transaction log and start a new one.
> > It will also instruct the Directory implementation to flush to disk.  If
> > you specify openSearcher=false, then the content that has just been
> > committed will NOT be searchable, as discussed in the previous paragraph.
> > The existing searcher will remain open and continue to serve queries
> > against the same index data.
> >
> > A soft commit does not flush the new content to disk, but it does open a
> > new searcher.  I'm sure that the amount of memory available for caching
> > this content is not large, so it's possible that if you do a lot of
> > indexing with soft commits and your hard commits are too infrequent,
> > you'll end up flushing part of the cached data to disk anyway.  I'd love
> > to hear from a committer about this, because I could be wrong.
> >
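
To make the two paragraphs above concrete, here is a toy Python model of the
behaviour as Shawn describes it. It is only a sketch of the description in
this thread, not Solr code: the class ToyIndex and all of its method names are
made up for illustration, and it ignores the NRTCachingDirectoryFactory caveat
that follows.

```python
class ToyIndex:
    """Toy model of the commit semantics described in this thread.

    Hypothetical illustration only; none of these names exist in Solr.
    """

    def __init__(self):
        self.docs = []     # every document added so far, in order
        self.visible = 0   # how many docs the current searcher can see
        self.flushed = 0   # how many docs a hard commit has flushed to disk
        self.tlog = []     # current transaction log

    def add(self, doc):
        # A new document is logged but not yet searchable or flushed.
        self.docs.append(doc)
        self.tlog.append(doc)

    def soft_commit(self):
        # Opens a new searcher; does not flush to disk.
        self.visible = len(self.docs)

    def hard_commit(self, open_searcher=True):
        # Flushes to disk and starts a new transaction log.
        self.flushed = len(self.docs)
        self.tlog = []
        if open_searcher:
            self.visible = len(self.docs)

    def search(self):
        return self.docs[:self.visible]

    def on_disk(self):
        return self.docs[:self.flushed]


idx = ToyIndex()
idx.add("a")
idx.soft_commit()                      # "a" becomes searchable, not flushed
idx.add("b")
idx.hard_commit(open_searcher=False)   # both flushed, "b" still not searchable
print(idx.search(), idx.on_disk())
```

In this model, visibility and durability move independently, which is the
point of combining soft commits with openSearcher=false hard commits.
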
> > There's a caveat with that 'flush to disk' operation -- the default
> > Directory implementation in the Solr example config, which is
> > NRTCachingDirectoryFactory, will cache the last few megabytes of indexed
> > data and not flush it to disk even with a hard commit.  If your commits
> > are small, then the net result is similar to a soft commit.  If the server
> > or Solr were to crash, the transaction logs would be replayed on Solr
> > startup, recovering those last few megabytes.  The transaction log may
> > also recover documents that were soft committed, but I'm not 100% sure
> > about that.
> >
> > To take full advantage of NRT functionality, you can commit as often as
> > you like with soft commits.  On some reasonable interval, say every one
> > to fifteen minutes, you can issue a hard commit with openSearcher set to
> > false, to flush things to disk and cycle through transaction logs before
> > they get huge.  Solr will keep a few of the transaction logs around, and
> > if they are huge, it can take a long time to replay them.  You'll want to
> > choose a hard commit interval that doesn't create giant transaction logs.
> >
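
For anyone issuing these commits explicitly over HTTP rather than through
autoCommit settings, the strategy above might be sketched as follows. This
assumes the update handler is mounted at /update and accepts the commit,
softCommit and openSearcher request parameters; the base URL (host, port and
core name) and the helper name commit_url are placeholders of mine, so adjust
them for your install.

```python
from urllib.parse import urlencode


def commit_url(base_url, soft=False, open_searcher=True):
    """Build an update-handler URL that requests a commit.

    Hypothetical helper: it only builds the URL and does not contact Solr.
    """
    if soft:
        # Soft commit: opens a new searcher without flushing to disk.
        params = {"softCommit": "true"}
    else:
        # Hard commit: flushes to disk and rolls the transaction log.
        params = {"commit": "true"}
        if not open_searcher:
            # Leave the existing searcher in place.
            params["openSearcher"] = "false"
    return base_url.rstrip("/") + "/update?" + urlencode(params)


# Assumed base URL for a local example install.
BASE = "http://localhost:8983/solr/collection1"

frequent = commit_url(BASE, soft=True)             # issue as often as needed
periodic = commit_url(BASE, open_searcher=False)   # e.g. every 1 to 15 minutes
print(frequent)
print(periodic)
```

A client would then send a GET or POST to the first URL after indexing batches
and to the second on a timer.
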
> > If any of the info I've given here is wrong, someone should correct me!
> >
> > Thanks,
> > Shawn
> >
> >
>
>
> --
> Regards,
> Prakhar Birla
> +91 9739868086
>
