Shawn, what about 'flush to disk' behaviour on MMapDirectoryFactory?
On Fri, Feb 8, 2013 at 11:12 AM, Prakhar Birla <prakharbi...@gmail.com> wrote:

> Great explanation, Shawn! BTW, soft-committed documents will not be
> recovered on a JVM crash.
>
> On 8 February 2013 13:27, Shawn Heisey <s...@elyograg.org> wrote:
>
> > On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote:
> >
> >> Hello,
> >>
> >> What actually happens when using a soft (as opposed to hard) commit?
> >>
> >> I understand the very high-level picture somewhat (documents become
> >> available faster, but you may lose them on power loss). I don't care
> >> about low-level implementation details.
> >>
> >> But I am trying to understand what is happening at the medium level of
> >> detail.
> >>
> >> For example, what are the stages of a document if we are using all the
> >> available transaction log, soft commit, and hard commit options? It feels
> >> like there are three stages:
> >> *) Uncommitted (soft or hard): accessible only via direct real-time get?
> >> *) Soft-committed: accessible through all search operations? (But not on
> >>    disk? Then where is it? In memory?)
> >> *) Hard-committed: all the same as soft-committed, but now it is on disk.
> >>
> >> Similarly, the performance section of the Wiki says: "A commit (including
> >> a soft commit) will free up almost all heap memory". Why would a soft
> >> commit free up heap memory? I thought it was not flushed to disk.
> >>
> >> Also, with soft commits and the transaction log enabled, doesn't the
> >> transaction log allow you to replay/recover the latest state after a
> >> crash? I believe that's what a transaction log does for a database. If
> >> not, how does one recover, if at all?
> >>
> >> And where does openSearcher=false fit into that? Does it cause
> >> inconsistent results somehow?
> >>
> >> I am missing something, but I am not sure what or where. Any pointers in
> >> the right direction would be appreciated.
> >>
> >
> > Let's see if I can answer your questions without giving you incorrect
> > information.
> >
> > New indexed content is not searchable until you open a new searcher,
> > regardless of the type of commit that you do.
> >
> > A hard commit will close the current transaction log and start a new one.
> > It will also instruct the Directory implementation to flush to disk. If
> > you specify openSearcher=false, then the content that has just been
> > committed will NOT be searchable, as discussed in the previous paragraph.
> > The existing searcher will remain open and continue to serve queries
> > against the same index data.
> >
> > A soft commit does not flush the new content to disk, but it does open a
> > new searcher. I'm sure that the amount of memory available for caching
> > this content is not large, so it's possible that if you do a lot of
> > indexing with soft commits and your hard commits are too infrequent,
> > you'll end up flushing part of the cached data to disk anyway. I'd love to
> > hear from a committer about this, because I could be wrong.
> >
> > There's a caveat with that 'flush to disk' operation: the default
> > Directory implementation in the Solr example config,
> > NRTCachingDirectoryFactory, will cache the last few megabytes of indexed
> > data and not flush it to disk, even with a hard commit. If your commits
> > are small, then the net result is similar to a soft commit. If the server
> > or Solr were to crash, the transaction logs would be replayed on Solr
> > startup, recovering those last few megabytes. The transaction log may also
> > recover documents that were soft committed, but I'm not 100% sure about
> > that.
> >
> > To take full advantage of NRT functionality, you can commit as often as
> > you like with soft commits. At some reasonable interval, say every one to
> > fifteen minutes, you can issue a hard commit with openSearcher set to
> > false, to flush things to disk and cycle through transaction logs before
> > they get huge.
> > Solr will keep a few of the transaction logs around, and if they are
> > huge, it can take a long time to replay them. You'll want to choose a
> > hard commit interval that doesn't create giant transaction logs.
> >
> > If any of the info I've given here is wrong, someone should correct me!
> >
> > Thanks,
> > Shawn
> >
>
> --
> Regards,
> Prakhar Birla
> +91 9739868086
>
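To make the document stages discussed in this thread concrete, here is a toy Java model of them. It is a sketch under stated assumptions, not Solr's implementation: the class `CommitModel` and all of its methods are hypothetical. It assumes the transaction log (`<updateLog>`) is enabled. It also assumes that a soft commit does not rotate the transaction log (the thread only says that a hard commit does), and it ignores the RAM caching that NRTCachingDirectoryFactory adds on top of a hard commit.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the commit stages discussed in this thread.
// This is NOT Solr code; every name here is made up for illustration.
class CommitModel {
    private final Set<String> indexed = new HashSet<>(); // added to the index writer (RAM or disk)
    private final Set<String> durable = new HashSet<>(); // flushed to disk by a hard commit
    private Set<String> searchable = new HashSet<>();    // visible to the currently open searcher
    private List<String> tlog = new ArrayList<>();       // updates since the last hard commit (on disk)

    // Index a document; assumes the transaction log is enabled.
    void add(String id) {
        indexed.add(id);
        tlog.add(id);
    }

    // Real-time get: sees documents before any new searcher is opened.
    boolean realTimeGet(String id) {
        return indexed.contains(id);
    }

    // Normal search: only what the currently open searcher can see.
    boolean search(String id) {
        return searchable.contains(id);
    }

    // Soft commit: opens a new searcher and flushes nothing.
    // Assumption: it does not rotate the transaction log.
    void softCommit() {
        searchable = new HashSet<>(indexed);
    }

    // Hard commit: flush to disk and start a new transaction log.
    // With openSearcher=false the old searcher keeps serving queries.
    // Ignores NRTCachingDirectoryFactory keeping small segments in RAM.
    void hardCommit(boolean openSearcher) {
        durable.addAll(indexed);
        tlog = new ArrayList<>();
        if (openSearcher) {
            searchable = new HashSet<>(indexed);
        }
    }

    // Crash and restart: RAM state is lost, disk state (including the
    // transaction log) survives, then the current log is replayed.
    // Assumption: recovery ends by opening a searcher over the result.
    void crashAndRecover() {
        indexed.clear();
        indexed.addAll(durable);
        indexed.addAll(tlog);
        searchable = new HashSet<>(indexed);
    }

    public static void main(String[] args) {
        CommitModel m = new CommitModel();
        m.add("doc1");
        System.out.println(m.realTimeGet("doc1") + " " + m.search("doc1")); // true false
        m.softCommit();
        System.out.println(m.search("doc1"));                               // true
        m.add("doc2");
        m.hardCommit(false);
        System.out.println(m.search("doc2"));                               // false
        m.add("doc3");
        m.softCommit();
        m.crashAndRecover();
        System.out.println(m.search("doc2") + " " + m.search("doc3"));      // true true
    }
}
```

Under these assumptions doc3, which was only soft-committed, comes back after the simulated crash because it is still in the current transaction log. Whether real Solr behaves this way is exactly the point Shawn and Prakhar are unsure about or disagree on, so treat that last line as a consequence of the model's assumption rather than a verified result.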