I have an index (different from the ones mentioned yesterday) that was
working fine with 3M docs or so, but when I added a bunch more docs,
bringing it closer to 4M docs, the index seemed to get corrupted. In
particular, now when I start Solr up, or when my indexing process
tries to add a document, I get a complaint about missing index files.

The error on startup looks like this:

<record>
  <date>2008-08-15T10:18:54</date>
  <millis>1218820734592</millis>
  <sequence>92</sequence>
  <logger>org.apache.solr.core.MultiCore</logger>
  <level>SEVERE</level>
  <class>org.apache.solr.common.SolrException</class>
  <method>log</method>
  <thread>10</thread>
  <message>java.lang.RuntimeException: java.io.FileNotFoundException:
/ssd/solr-9999/solr/exhibitcore/data/index/_p7.fdt (No such file or
directory)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:733)
        at org.apache.solr.core.SolrCore.&lt;init&gt;(SolrCore.java:387)
        at org.apache.solr.core.MultiCore.create(MultiCore.java:255)
        at org.apache.solr.core.MultiCore.load(MultiCore.java:139)
        at 
org.apache.solr.servlet.SolrDispatchFilter.initMultiCore(SolrDispatchFilter.java:147)
        at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
        at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
        at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
        at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
        at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
        at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
        at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
        at org.mortbay.jetty.Server.doStart(Server.java:210)
        at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.mortbay.start.Main.invokeMain(Main.java:183)
        at org.mortbay.start.Main.start(Main.java:497)
        at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.io.FileNotFoundException:
/ssd/solr-9999/solr/exhibitcore/data/index/_p7.fdt (No such file or
directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.&lt;init&gt;(RandomAccessFile.java:233)
        at 
org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.&lt;init&gt;(FSDirectory.java:506)
        at 
org.apache.lucene.store.FSDirectory$FSIndexInput.&lt;init&gt;(FSDirectory.java:536)
        at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
        at 
org.apache.lucene.index.FieldsReader.&lt;init&gt;(FieldsReader.java:75)
        at 
org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
        at 
org.apache.lucene.index.MultiSegmentReader.&lt;init&gt;(MultiSegmentReader.java:55)
        at 
org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
        at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
        at 
org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
        at 
org.apache.solr.search.SolrIndexSearcher.&lt;init&gt;(SolrIndexSearcher.java:93)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:724)
        ... 29 more
</message>
</record>

And the error on doc add looks like this:

<record>
  <date>2008-08-15T09:51:30</date>
  <millis>1218819090142</millis>
  <sequence>6571937</sequence>
  <logger>org.apache.solr.core.SolrCore</logger>
  <level>SEVERE</level>
  <class>org.apache.solr.common.SolrException</class>
  <method>log</method>
  <thread>14</thread>
  <message>java.io.FileNotFoundException:
/ssd/solr-9999/solr/exhibitcore/data/index/_p7.fdt (No such file or
directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.&lt;init&gt;(RandomAccessFile.java:233)
        at 
org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.&lt;init&gt;(FSDirectory.java:506)
        at 
org.apache.lucene.store.FSDirectory$FSIndexInput.&lt;init&gt;(FSDirectory.java:536)
        at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
        at 
org.apache.lucene.index.FieldsReader.&lt;init&gt;(FieldsReader.java:75)
        at 
org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
        at 
org.apache.lucene.index.MultiSegmentReader.&lt;init&gt;(MultiSegmentReader.java:55)
        at 
org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
        at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
        at 
org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
        at 
org.apache.solr.search.SolrIndexSearcher.&lt;init&gt;(SolrIndexSearcher.java:93)
        at org.apache.solr.core.SolrCore.newSearcher(SolrCore.java:213)
        at 
org.apache.solr.update.DirectUpdateHandler2.openSearcher(DirectUpdateHandler2.java:207)
        at 
org.apache.solr.update.DirectUpdateHandler2.doDeletions(DirectUpdateHandler2.java:466)
        at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:295)
        at 
org.apache.solr.handler.RichDocumentLoader.doAdd(RichDocumentRequestHandler.java:231)
        at 
org.apache.solr.handler.RichDocumentLoader.addDoc(RichDocumentRequestHandler.java:236)
        at 
org.apache.solr.handler.RichDocumentLoader.load(RichDocumentRequestHandler.java:278)
        at 
org.apache.solr.handler.RichDocumentRequestHandler.handleRequestBody(RichDocumentRequestHandler.java:80)
        at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
        at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:228)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
        at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
        at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
        at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
</message>
</record>

I just checked, and the files that Solr is complaining about are
indeed not in the index directory.
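
For anyone who wants to repeat that check against their own log, here is a minimal sketch in Python. It pulls the paths out of the FileNotFoundException messages and tests whether each one exists on disk. The function names are made up for illustration, and the regex assumes the message format shown in the records above (including the line wrapping).

```python
import os
import re

# Matches the path in messages such as
# "java.io.FileNotFoundException: /path/_p7.fdt (No such file or directory)",
# even when the logger or mail client has wrapped the line.
MISSING = re.compile(r"FileNotFoundException:\s*(\S+)\s+\(No such file")


def missing_files(log_text):
    """Return the distinct paths named in FileNotFoundException messages."""
    return sorted(set(MISSING.findall(log_text)))


def check(log_text):
    """Map each reported path to whether it currently exists on disk."""
    return {path: os.path.exists(path) for path in missing_files(log_text)}
```

Calling `check(open("solr.log").read())` (the log file name is hypothetical) would give a dictionary in which every `False` value is a file Solr asked for that really is absent.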

The earliest indication of trouble I found in my log was an error like this:

<record>
  <date>2008-08-15T09:47:48</date>
  <millis>1218818868528</millis>
  <sequence>6525387</sequence>
  <logger>org.apache.solr.update.UpdateHandler</logger>
  <level>SEVERE</level>
  <class>org.apache.solr.update.DirectUpdateHandler2$CommitTracker</class>
  <method>run</method>
  <thread>15</thread>
  <message>auto commit error...</message>
</record>

There may have been SEVERE errors before this, but my log doesn't go
back to the very beginning.
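
In case it helps anyone looking at a similar log, this is a rough sketch of how the earliest SEVERE record can be found. It assumes the log is a flat sequence of `<record>` elements like the ones quoted here, with properly escaped message text; the function names are mine and not part of Solr.

```python
import re
import xml.etree.ElementTree as ET

# Each log entry is a self-contained <record>...</record> element.
RECORD = re.compile(r"<record>.*?</record>", re.DOTALL)


def severe_records(log_text):
    """Return all SEVERE records, oldest first, as small dictionaries."""
    records = []
    for chunk in RECORD.findall(log_text):
        rec = ET.fromstring(chunk)
        if rec.findtext("level") == "SEVERE":
            records.append({
                "millis": int(rec.findtext("millis")),
                "date": rec.findtext("date"),
                "logger": rec.findtext("logger"),
                "message": (rec.findtext("message") or "").strip(),
            })
    # Sort on the millisecond timestamp rather than on file order.
    return sorted(records, key=lambda r: r["millis"])


def earliest_severe(log_text):
    """Return the oldest SEVERE record, or None if there is none."""
    records = severe_records(log_text)
    return records[0] if records else None
```

Sorting on `millis` instead of trusting file order is deliberate, since records from different threads are not guaranteed to appear in timestamp order.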

It's interesting that while adding documents usually seems to fail
now (yielding the "file not found" exception), I could add
documents successfully for some time before things started to go
wrong. What's more, some documents do seem to *still* get added
successfully. I'm using the rich document update handler, so the
successful log entries look like this:

<record>
  <date>2008-08-15T09:50:54</date>
  <millis>1218819054600</millis>
  <sequence>6561534</sequence>
  <logger>org.apache.solr.core.SolrCore</logger>
  <level>INFO</level>
  <class>org.apache.solr.core.SolrCore</class>
  <method>execute</method>
  <thread>14</thread>
  <message>[exhibitcore] webapp=/solr path=/update/rich
params={filenumber=333-112076-85&amp;formtype=S-4/A&amp;stream.fieldname=body&amp;exhibittype=EX-3.99&amp;date=2004-02-09T00:00:00Z&amp;companyname=PROGRESSIVE+VENTURE+CAPITAL+CORP&amp;exhibitdescription=EXHIBIT+3.99&amp;id=37684831&amp;cik=1275089&amp;stream.type=html&amp;filingkey=0001193125-04-017196/1275089/FILER&amp;stateofincorporation=WV&amp;fieldnames=key,filingkey,companyname,accessionnumber,cik,date,exhibitdescription,exhibittype,exhibittypeint,filenumber,filename,formtype,stateofheadquarters,stateofincorporation&amp;filename=dex399.htm&amp;exhibittypeint=3&amp;accessionnumber=0001193125-04-017196&amp;stateofheadquarters=~&amp;key=0001193125-04-017196/1275089/FILER/dex399.htm}
status=0 QTime=9 </message>
</record>

The deletes I'm seeing in my log also seem to be working fine; I get
log entries like

<record>
  <date>2008-08-15T09:50:54</date>
  <millis>1218819054602</millis>
  <sequence>6561535</sequence>
  <logger>org.apache.solr.update.processor.UpdateRequestProcessor</logger>
  <level>INFO</level>
  <class>org.apache.solr.update.processor.LogUpdateProcessor</class>
  <method>finish</method>
  <thread>14</thread>
  <message>{delete=[0001193125-04-017196/1275096/FILER/dex231.htm]} 0
1</message>
</record>

and

<record>
  <date>2008-08-15T09:51:30</date>
  <millis>1218819090153</millis>
  <sequence>6571944</sequence>
  <logger>org.apache.solr.update.UpdateHandler</logger>
  <level>INFO</level>
  <class>org.apache.solr.update.DirectUpdateHandler2</class>
  <method>doDeletions</method>
  <thread>13</thread>
  <message>DirectUpdateHandler2 deleting and removing dups for 100788
ids</message>
</record>

After I noticed this corruption, I thought I'd see if I could
get it to happen again, so I went back to the original 3M-ish doc
index, and tried adding the new documents again. (If it matters, the
new docs would have come into the index in a different permutation on
this retry.) This too resulted in an index with "file not found"
problems.

The following may or may not be relevant: I built the base 3M-ish doc
index on a Windows machine, and it's a compound (.cfs) format index.
(I actually created it not with Solr, but by using the index merging
tool that comes with Lucene in order to merge three different
non-compound format indexes that I'd previously made with Solr into a
single index.) Before I started adding documents, I moved the index to
a Linux machine running a newer version of Solr/Lucene than was on the
Windows machine. The stuff described above all happened on Linux.
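
Since the index mixes compound and non-compound segments, one way to see what is actually on disk is to group the index files by segment. The sketch below is only an illustration: it assumes the usual Lucene naming where per-segment files look like `_p7.fdt` or `_a.cfs`, and where generation-stamped files such as `_a_1.del` belong to segment `_a`. The helper names are hypothetical.

```python
from collections import defaultdict


def segments(file_names):
    """Group index file names by segment: '_p7.fdt' -> {'_p7': {'fdt'}}."""
    by_segment = defaultdict(set)
    for name in file_names:
        # Skip segments_N, segments.gen and anything else that is not
        # a per-segment file.
        if not name.startswith("_") or "." not in name:
            continue
        stem, ext = name.split(".", 1)
        # Assumption: '_a_1' is generation 1 of a file for segment '_a'.
        seg = "_" + stem[1:].split("_", 1)[0]
        by_segment[seg].add(ext)
    return dict(by_segment)


def missing_stored_fields(by_segment):
    """Non-compound segments that have no .fdt (stored fields) file."""
    return sorted(
        seg for seg, exts in by_segment.items()
        if "cfs" not in exts and "fdt" not in exts
    )
```

Feeding it `os.listdir()` of the index directory would list the non-compound segments that lack the `.fdt` file the exceptions above complain about. It says nothing about why the file is gone.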

Any thoughts?

Thanks a bunch,
Chris
