Thank you so much for your response, Erick.

On Fri, Aug 26, 2011 at 8:30 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> See below
>
> On Thu, Aug 25, 2011 at 4:22 PM, zarni aung <zau...@gmail.com> wrote:
> > First, I would like to apologize if this is a repeat question, but I can't
> > seem to get the right answer anywhere.
> >
> > - What happens to pending documents when the server dies abruptly? I
> > understand that when the server shuts down gracefully, it will commit the
> > pending documents and close the IndexWriter. For the case where the server
> > just crashes, I am assuming that the pending documents are lost, but would
> > it also corrupt the index files? If so, when the server comes back online,
> > what is the state? I would think that a full re-indexing is in order.
> >
>
> This is generally not a problem; your pending updates are simply lost. A lot
> of work has gone into making sure that the indexes aren't corrupted in this
> situation. You can use the CheckIndex utility if you're worried.
>
> A brief outline here. Solr only writes new segments; it does NOT modify
> existing segments. There is a file that lets Solr know what the current
> valid segments are. During indexing (including merging, optimization, etc.),
> only NEW segments are written, and the file that tells Solr what's current
> is left alone during the new segment writes.
>
> The very last thing that's done is that the segments file (i.e. the file
> that tells Solr what's current) is updated, and it's very small. I suppose
> there's a vanishingly small chance that that file could be corrupted while
> being written, and it may even be that a temp file is written first and then
> files are renamed (but I don't know that for sure)...
>
> So, the point of this long digression is that if your server gets killed,
> upon restart it should see a consistent picture of the index as of the last
> completed commit; any interim docs will be lost.
>
> > - What are the dangers of having n-number of ReadOnly Solr instances
> > pointing to the same data directory (shared by a SAN)? Will there be
> > issues with locking? This is a scenario with replication. The Read-Only
> > instances are pointing to the same data directory on a SAN.
> >
>
> This is not a problem. You should have only one *writer* pointing to the
> index, but readers are OK. Applying the discussion above to readers, note
> that the segments available to any reader are never changed. So having N
> Solr instances reading from these unchanging files is no problem.
>
> That said, this will be slower than using Solr's replication (which is
> preferred) for two reasons:
> 1> any networked filesystem will have some inherent speed issues.
> 2> all these read requests will have to be queued somehow.
>
> But if your performance is acceptable with this setup, it'll work.
>
> Best
> Erick
>
> > Thank you very much.
> >
> > Z
> >
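For readers of this thread, here is a toy Python model of the commit idea Erick outlines. It is a sketch under stated assumptions, not Lucene's or Solr's actual code or file format: the "what's current" file is a hypothetical JSON list named segments.json, segment files are plain JSON lists of doc names, and the commit file is published with the write-temp-then-rename approach Erick guesses at but does not confirm for Solr. New segment files are written once and never modified, so a "crash" after writing a segment but before publishing the commit simply leaves readers on the last completed commit.

```python
import json
import os
import tempfile

# Toy model (hypothetical; NOT Lucene's real segments_N format or code) of
# "write new immutable segments, then publish a small commit file".

COMMIT_FILE = "segments.json"  # hypothetical name for the "what's current" file


def write_segment(index_dir, name, docs):
    """Write a new, immutable segment file; existing segments are never touched."""
    path = os.path.join(index_dir, name)
    with open(path, "x") as f:  # "x" mode fails if the file already exists
        json.dump(docs, f)
        f.flush()
        os.fsync(f.fileno())


def publish_commit(index_dir, segment_names):
    """Replace the commit file listing the current segments in one rename."""
    fd, tmp = tempfile.mkstemp(dir=index_dir)  # temp file on the same filesystem
    with os.fdopen(fd, "w") as f:
        json.dump(segment_names, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in with a single rename, so a reader sees
    # either the old commit file or the new one, never a half-written one.
    os.replace(tmp, os.path.join(index_dir, COMMIT_FILE))


def read_index(index_dir):
    """A reader only sees segments named in the last published commit file."""
    commit_path = os.path.join(index_dir, COMMIT_FILE)
    if not os.path.exists(commit_path):
        return []
    with open(commit_path) as f:
        names = json.load(f)
    docs = []
    for name in names:
        with open(os.path.join(index_dir, name)) as f:
            docs.extend(json.load(f))
    return docs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_segment(d, "seg_1", ["doc1", "doc2"])
        publish_commit(d, ["seg_1"])
        # Simulate a crash: seg_2 is written but the commit file is never updated.
        write_segment(d, "seg_2", ["doc3"])
        print(read_index(d))  # ['doc1', 'doc2']
```

Because readers in this toy only open files and never modify them, any number of them can call read_index against the same directory; only one process should call the write functions, which mirrors Erick's "one writer, many readers" advice.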