Great explanation, Shawn! BTW, soft-committed documents will not be
recovered on a JVM crash.

On 8 February 2013 13:27, Shawn Heisey <s...@elyograg.org> wrote:

> On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote:
>
>> Hello,
>>
>> What actually happens when using soft (as opposed to hard) commit?
>>
>> I somewhat understand the very high-level picture (documents become available
>> faster, but you may lose them on power loss).
>> I don't care about low-level implementation details.
>>
>> But I am trying to understand what is happening on the medium level of
>> details.
>>
>> For example, what are the stages of a document if we are using all the available
>> transaction log, soft commit, and hard commit options? It feels like there are
>> three stages:
>> *) Uncommitted (soft or hard): accessible only via direct real-time get?
>> *) Soft-committed: accessible through all search operations? (but not on
>> disk? but where is it? in memory?)
>> *) Hard-committed: all the same as soft-committed but it is now on disk
>>
>> Similarly, in the performance section of the Wiki, it says: "A commit (including
>> a soft commit) will free up almost all heap memory" - why would a soft commit
>> free up heap memory? I thought it was not flushed to disk.
>>
>> Also, with soft commits and the transaction log enabled, doesn't the transaction
>> log allow replaying/recovering the latest state after a crash? I believe
>> that's what a transaction log does for a database. If not, how does one recover,
>> if at all?
>>
>> And where does openSearcher=false fit into that? Does it cause
>> inconsistent results somehow?
>>
>> I am missing something, but I am not sure what or where. Any pointers in the
>> right direction would be appreciated.
>>
>
> Let's see if I can answer your questions without giving you incorrect
> information.
>
> New indexed content is not searchable until you open a new searcher,
> regardless of the type of commit that you do.
>
> A hard commit will close the current transaction log and start a new one.
> It will also instruct the Directory implementation to flush to disk. If
> you specify openSearcher=false, then the content that has just been
> committed will NOT be searchable, as discussed in the previous paragraph.
> The existing searcher will remain open and continue to serve queries
> against the same index data.
>
> A soft commit does not flush the new content to disk, but it does open a
> new searcher.  I'm sure that the amount of memory available for caching
> this content is not large, so it's possible that if you do a lot of
> indexing with soft commits and your hard commits are too infrequent, you'll
> end up flushing part of the cached data to disk anyway.  I'd love to hear
> from a committer about this, because I could be wrong.
>
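
To make the two commit types above concrete, here is a minimal sketch of
explicit commit messages. It assumes a Solr 4.x update handler that accepts
XML messages (typically at /update); the attribute names are my recollection
of that handler, so check them against your version. Each message is posted
on its own.

```xml
<!-- Soft commit: opens a new searcher so new documents become searchable,
     without forcing the index to be flushed to disk. -->
<commit softCommit="true"/>

<!-- Hard commit that keeps the existing searcher: flushes to disk and rolls
     the transaction log, but the new content stays invisible to searches
     until some later commit opens a searcher. -->
<commit openSearcher="false"/>
```
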
> There's a caveat with that 'flush to disk' operation -- the default
> Directory implementation in the Solr example config, which is
> NRTCachingDirectoryFactory, will cache the last few megabytes of indexed
> data and not flush it to disk even with a hard commit.  If your commits are
> small, then the net result is similar to a soft commit.  If the server or
> Solr were to crash, the transaction logs would be replayed on Solr startup,
> recovering those last few megabytes.  The transaction log may also recover
> documents that were soft committed, but I'm not 100% sure about that.
>
> To take full advantage of NRT functionality, you can commit as often as
> you like with soft commits.  On some reasonable interval, say every one to
> fifteen minutes, you can issue a hard commit with openSearcher set to
> false, to flush things to disk and cycle through transaction logs before
> they get huge.  Solr will keep a few of the transaction logs around, and if
> they are huge, it can take a long time to replay them.  You'll want to
> choose a hard commit interval that doesn't create giant transaction logs.
>
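
The strategy in the paragraph above can also be set up in solrconfig.xml
instead of issuing commits from the client. This is only a sketch: the two
intervals (a 5 minute hard commit, a 1 second soft commit) are values I picked
for illustration, not recommendations from this thread, and should be tuned
to your indexing load.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log, replayed on startup after a crash -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit: flush to disk and start a new transaction log,
       but keep serving queries from the existing searcher.
       300000 ms = 5 minutes (assumed value). -->
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: open a new searcher so recent documents become visible.
       1000 ms = 1 second (assumed value). -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With a setup like this, search visibility is driven by the soft commit
interval, while the hard commit interval bounds how large each transaction
log can grow.
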
> If any of the info I've given here is wrong, someone should correct me!
>
> Thanks,
> Shawn
>
>


-- 
Regards,
Prakhar Birla
+91 9739868086
