Hi,
We are using:
Solr Specification Version: 1.4.1
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
Lucene Specification Version: 2.9.3
Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55
# java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
We want to index 4M docs in one core (once that works well, we will add
other cores with 2M docs each on the same server). Each doc is ~1 kB.
We use Solr replication every 5 minutes to update the slave server
(queries are executed on the slave only).
Documents change very quickly; on a normal day we have approximately:
* 200,000 updated docs
* 1,000 new docs
* 200 deleted docs
I have attached the last good CheckIndex output (solr20110107.txt)
and the corrupted one (solr20110110.txt).
This is not the first time a segment has been corrupted on this server,
which is why I run CheckIndex frequently. (As you can see, the first
segment holds 1,800,000 docs and checks out fine!)
I can't find any "SEVERE", "FATAL", or exception entries in the Solr logs.
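For anyone reproducing these reports: the CheckIndex output attached here comes from Lucene's command-line tool (the same `CheckIndex.main` that appears in the stack trace below). A typical invocation looks like the sketch below; the jar path is an assumption, so adjust it to wherever your Solr installation keeps lucene-core:

```shell
# Run Lucene's CheckIndex against the core's index directory and save a report.
# Run it against a quiet index (or a snapshot) -- it opens the index files directly.
java -cp /path/to/lucene-core-2.9.3.jar \
  org.apache.lucene.index.CheckIndex \
  /solr/multicore/core1/data/index > solr$(date +%Y%m%d).txt

# CheckIndex also accepts -fix, which rewrites the segments file and DROPS any
# corrupt segment (every doc in that segment is lost). Only run it on a backup:
#   java -cp /path/to/lucene-core-2.9.3.jar \
#     org.apache.lucene.index.CheckIndex /solr/multicore/core1/data/index -fix
```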
I have also attached my schema.xml and solrconfig.xml.
Is there something wrong with what we are doing? Do you need any other information?
Thanks,
Opening index @ /solr/multicore/core1/data/index/
Segments file=segments_i7t numSegments=9 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
1 of 9: name=_ncc docCount=1841685
compound=false
hasProx=true
numFiles=9
size (MB)=6,683.447
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun
Microsystems Inc.}
has deletions [delFileName=_ncc_13m.del]
test: open reader.........OK [105940 deleted docs]
test: fields..............OK [51 fields]
test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs pairs;
248678841 tokens]
test: stored fields.......OK [51585300 total field count; avg 29.719 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
2 of 9: name=_nqt docCount=431889
compound=false
hasProx=true
numFiles=9
size (MB)=1,671.375
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun
Microsystems Inc.}
has deletions [delFileName=_nqt_gt.del]
test: open reader.........OK [10736 deleted docs]
test: fields..............OK [51 fields]
test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs;
67787288 tokens]
test: stored fields.......OK [12562924 total field count; avg 29.83 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
3 of 9: name=_ol7 docCount=913886
compound=false
hasProx=true
numFiles=9
size (MB)=3,567.63
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun
Microsystems Inc.}
has deletions [delFileName=_ol7_3.del]
test: open reader.........OK [11 deleted docs]
test: fields..............OK [51 fields]
test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [9825896 terms; 93954470 terms/docs pairs;
152947518 tokens]
test: stored fields.......OK [29587930 total field count; avg 32.376 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
4 of 9: name=_ol2 docCount=1011
compound=false
hasProx=true
numFiles=8
size (MB)=6.959
diagnostics = {os.version=2.6.26-2-amd64, os=Linux, lucene.version=2.9.3
951790 - 2010-06-06 01:30:55, source=flush, os.arch=amd64,
java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
no deletions
test: open reader.........OK
test: fields..............OK [38 fields]
test: field norms.........OK [38 fields]
test: terms, freq, prox...OK [54205 terms; 220705 terms/docs pairs; 389336
tokens]
test: stored fields.......OK [27402 total field count; avg 27.104 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
5 of 9: name=_ol3 docCount=1000
compound=false
hasProx=true
numFiles=8
size (MB)=6.944
diagnostics = {os.version=2.6.26-2-amd64, os=Linux, lucene.version=2.9.3
951790 - 2010-06-06 01:30:55, source=flush, os.arch=amd64,
java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
no deletions
test: open reader.........OK
test: fields..............OK [33 fields]
test: field norms.........OK [33 fields]
test: terms, freq, prox...OK [54825 terms; 221934 terms/docs pairs; 388998
tokens]
test: stored fields.......OK [26318 total field count; avg 26.318 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
6 of 9: name=_ol4 docCount=7191
compound=false
hasProx=true
numFiles=8
size (MB)=44.441
diagnostics = {os.version=2.6.26-2-amd64, os=Linux, lucene.version=2.9.3
951790 - 2010-06-06 01:30:55, source=flush, os.arch=amd64,
java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
no deletions
test: open reader.........OK
test: fields..............OK [50 fields]
test: field norms.........OK [50 fields]
test: terms, freq, prox...OK [236339 terms; 1056365 terms/docs pairs;
1755826 tokens]
test: stored fields.......OK [285974 total field count; avg 39.768 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
7 of 9: name=_oli docCount=10104
compound=false
hasProx=true
numFiles=8
size (MB)=17.072
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun
Microsystems Inc.}
no deletions
test: open reader.........OK
test: fields..............OK [44 fields]
test: field norms.........OK [44 fields]
test: terms, freq, prox...OK [108959 terms; 645138 terms/docs pairs; 765380
tokens]
test: stored fields.......OK [417692 total field count; avg 41.339 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
8 of 9: name=_olh docCount=762
compound=false
hasProx=true
numFiles=8
size (MB)=1.619
diagnostics = {os.version=2.6.26-2-amd64, os=Linux, lucene.version=2.9.3
951790 - 2010-06-06 01:30:55, source=flush, os.arch=amd64,
java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
no deletions
test: open reader.........OK
test: fields..............OK [36 fields]
test: field norms.........OK [36 fields]
test: terms, freq, prox...OK [13859 terms; 51106 terms/docs pairs; 65556
tokens]
test: stored fields.......OK [21564 total field count; avg 28.299 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
9 of 9: name=_olj docCount=232
compound=false
hasProx=true
numFiles=8
size (MB)=5.225
diagnostics = {os.version=2.6.26-2-amd64, os=Linux, lucene.version=2.9.3
951790 - 2010-06-06 01:30:55, source=flush, os.arch=amd64,
java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
no deletions
test: open reader.........OK
test: fields..............OK [34 fields]
test: field norms.........OK [34 fields]
test: terms, freq, prox...OK [12307 terms; 36070 terms/docs pairs; 60345
tokens]
test: stored fields.......OK [7817 total field count; avg 33.694 fields per
doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
No problems were detected with this index.
<?xml version="1.0" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<schema name="example core zero" version="1.1">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
<fieldtype name="binary" class="solr.BinaryField"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="pint" class="solr.IntField" omitNorms="true"/>
<fieldType name="plong" class="solr.LongField" omitNorms="true"/>
<fieldType name="pfloat" class="solr.FloatField" omitNorms="true"/>
<fieldType name="pdouble" class="solr.DoubleField" omitNorms="true"/>
<fieldType name="pdate" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_no_ws" class="solr.TextField" positionIncrementGap="100">
</fieldType>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>
</fieldType>
<fieldType name="textTight" class="solr.TextField" positionIncrementGap="100" >
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
possible with WordDelimiterFilter in conjunction with stemming. -->
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory"
pattern="([^a-z])" replacement="" replace="all"
/>
</analyzer>
</fieldType>
<fieldType name="charfilthtmlmap" class="solr.TextField">
<analyzer>
<tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
<fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField" >
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
</analyzer>
</fieldtype>
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="bucketFirstLetter" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.PatternTokenizerFactory" pattern="http:\/\/([a-z0-9]).*" group="1"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
</analyzer>
</fieldType>
<fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField"/>
</types>
<fields>
<!-- general -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="type" type="string" indexed="true" stored="true"/>
<field name="language" type="string" indexed="true" stored="true"/>
<field name="status" type="string" indexed="true" stored="true"/>
<field name="blog_title" type="string" indexed="true" stored="true"/>
<field name="source" type="text" indexed="true" stored="true"/>
<field name="post_tags" type="text_no_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="post_media_tags" type="text_no_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="flags" type="text_no_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="blog_topics" type="text_no_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="blog_keywords" type="text_no_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="blog_favorites" type="text_no_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="post_categories" type="text_no_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="blog_visits" type="pint" indexed="true" stored="true"/>
<field name="blog_visits_weekly" type="pint" indexed="true" stored="true"/>
<field name="blog_visits_daily" type="pint" indexed="true" stored="true"/>
<field name="post_visits" type="pint" indexed="true" stored="true"/>
<field name="post_visits_weekly" type="pint" indexed="true" stored="true"/>
<field name="post_visits_daily" type="pint" indexed="true" stored="true"/>
<field name="blog_comments" type="pint" indexed="true" stored="true"/>
<field name="blog_comments_weekly" type="pint" indexed="true" stored="true"/>
<field name="blog_comments_daily" type="pint" indexed="true" stored="true"/>
<field name="post_comments" type="pint" indexed="true" stored="true"/>
<field name="post_comments_weekly" type="pint" indexed="true" stored="true"/>
<field name="post_comments_daily" type="pint" indexed="true" stored="true"/>
<field name="blog_posts" type="pint" indexed="true" stored="true"/>
<field name="blog_url_facet" type="bucketFirstLetter" indexed="true" stored="false"/>
<field name="blog_id" type="pint" indexed="true" stored="true"/>
<field name="post_id" type="pint" indexed="true" stored="true"/>
<field name="user_id" type="pint" indexed="true" stored="true"/>
<copyField source="blog_url" dest="blog_url_facet"/>
<!--copyField source="post_source" dest="source"/-->
<field name="blog_creadate" type="date" indexed="true" stored="true"/>
<field name="post_creadate" type="date" indexed="true" stored="true"/>
<field name="post_pubdate" type="date" indexed="true" stored="true"/>
<field name="post_update" type="date" indexed="true" stored="true"/>
<field name="solr_adddate" type="date" indexed="true" stored="true" default="NOW"/>
<dynamicField name="blog_*" type="string" indexed="true" stored="true"/>
<dynamicField name="post_*" type="string" indexed="false" stored="true"/>
<dynamicField name="media_*" type="string" indexed="true" stored="true"/>
<dynamicField name="user_*" type="string" indexed="false" stored="true"/>
</fields>
<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>blog_title</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
</schema>
<?xml version="1.0" encoding="UTF-8" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!--
This is a stripped down config file used for a simple example...
It is *not* a good example to work from.
-->
<config>
<indexDefaults>
<!-- Values here affect all index writers and act as a default unless overridden. -->
<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<!-- Sets the amount of RAM that may be used by Lucene indexing
for buffering added documents and deletions before they are
flushed to the Directory. -->
<ramBufferSizeMB>32</ramBufferSizeMB>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>10000</commitLockTimeout>
<!--
This option specifies which Lucene LockFactory implementation to use.
single = SingleInstanceLockFactory - suggested for a read-only index
or when there is no possibility of another process trying
to modify the index.
native = NativeFSLockFactory - uses OS native file locking
simple = SimpleFSLockFactory - uses a plain file for locking
(For backwards compatibility with Solr 1.2, 'simple' is the default
if not specified.)
-->
<lockType>native</lockType>
</indexDefaults>
<mainIndex>
<!-- options specific to the main on-disk lucene index -->
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>32</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000</maxFieldLength>
<!-- inherit from indexDefaults <maxFieldLength>10000</maxFieldLength> -->
<!-- If true, unlock any held write or commit locks on startup.
This defeats the locking mechanism that allows multiple
processes to safely access a lucene index, and should be
used with care.
This is not needed if lock type is 'none' or 'single'
-->
<unlockOnStartup>false</unlockOnStartup>
<!-- If true, IndexReaders will be reopened (often more efficient) instead
of closed and then opened. -->
<reopenReaders>true</reopenReaders>
<!--
Custom deletion policies can specified here. The class must
implement org.apache.lucene.index.IndexDeletionPolicy.
http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/index/IndexDeletionPolicy.html
The standard Solr IndexDeletionPolicy implementation supports deleting
index commit points on number of commits, age of commit point and
optimized status.
The latest commit point should always be preserved regardless
of the criteria.
-->
<deletionPolicy class="solr.SolrDeletionPolicy">
<!-- The number of commit points to be kept -->
<str name="maxCommitsToKeep">1</str>
<!-- The number of optimized commit points to be kept -->
<str name="maxOptimizedCommitsToKeep">0</str>
<!--
Delete all commit points once they have reached the given age.
Supports DateMathParser syntax e.g.
<str name="maxCommitAge">30MINUTES</str>
<str name="maxCommitAge">1DAY</str>
-->
</deletionPolicy>
<!-- To aid in advanced debugging, you may turn on IndexWriter debug logging.
Setting to true will set the file that the underlying Lucene IndexWriter
will write its debug infostream to. -->
<infoStream file="INFOSTREAM.txt">false</infoStream>
</mainIndex>
<!-- Enables JMX if and only if an existing MBeanServer is found, use this
if you want to configure JMX through JVM parameters. Remove this to disable
exposing Solr configuration and statistics to JMX.
If you want to connect to a particular server, specify the agentId
e.g. <jmx agentId="myAgent" />
If you want to start a new MBeanServer, specify the serviceUrl
e.g <jmx serviceUrl="service:jmx:rmi:///jndi/rmi://localhost:9999/solr"/>
For more details see http://wiki.apache.org/solr/SolrJmx
-->
<jmx />
<updateHandler class="solr.DirectUpdateHandler2" />
<requestDispatcher handleSelect="true" >
<requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
</requestDispatcher>
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
<requestHandler name="bsportal" class="solr.SearchHandler" default="false">
<lst name="defaults">
<str name="echoParams">explicit</str>
</lst>
<lst name="appends">
<str name="fq">status:portal</str>
</lst>
</requestHandler>
<requestHandler name="/replication" class="solr.ReplicationHandler" >
<lst name="master">
<str name="enable">${enable.master:false}</str>
<!--Replicate on 'startup' and 'commit'. 'optimize' is also a valid value for replicateAfter. -->
<str name="replicateAfter">startup</str>
<str name="replicateAfter">commit</str>
<!--Create a backup after 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries of this config string. Note that this is just for backup, replication does not require this. -->
<!-- <str name="backupAfter">optimize</str> -->
<!--If configuration files need to be replicated give the names here, separated by comma -->
<str name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml,stopwords.txt,elevate.xml,protwords.txt,spellings.txt,synonyms.txt</str>
<!--The default reserve duration is 10 seconds. Normally you should not need to specify this. -->
<str name="commitReserveDuration">00:00:10</str>
</lst>
<lst name="slave">
<str name="enable">${enable.slave:false}</str>
<!--Fully qualified URL of the master's replication handler. This can also be passed as a request param to the fetchindex command.-->
<str name="masterUrl">http://solr001.bhpr.net:8983/solr/${solr.core.name}/replication</str>
<!--Interval at which the slave should poll the master. Format is HH:mm:ss. If absent, the slave does not poll automatically,
but a fetchindex can still be triggered from the admin UI or the HTTP API. -->
<str name="pollInterval">00:05:00</str>
<!-- THE FOLLOWING PARAMETERS ARE USUALLY NOT REQUIRED-->
<!--to use compression while transferring the index files. The possible values are internal|external
if the value is 'external' make sure that your master Solr has the settings to honour the accept-encoding header.
see here for details http://wiki.apache.org/solr/SolrHttpCompression
If it is 'internal' everything will be taken care of automatically.
Use this only if your bandwidth is low; it can actually slow down replication in a LAN.-->
<str name="compression">internal</str>
<!--The following values are used when the slave connects to the master to download the index files.
Default values implicitly set as 5000ms and 10000ms respectively. The user DOES NOT need to specify
these unless the bandwidth is extremely low or if there is an extremely high latency-->
<str name="httpConnTimeout">5000</str>
<str name="httpReadTimeout">10000</str>
<!-- If HTTP Basic authentication is enabled on the master, then the slave can be configured with the following -->
<!--<str name="httpBasicAuthUser">username</str>
<str name="httpBasicAuthPassword">password</str> -->
</lst>
</requestHandler>
<!-- config for the admin interface -->
<admin>
<defaultQuery>solr</defaultQuery>
</admin>
</config>
Opening index @ /solr/multicore/core1/data/index/
Segments file=segments_j1n numSegments=11 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
1 of 11: name=_ncc docCount=1841685
compound=false
hasProx=true
numFiles=9
size (MB)=6,683.447
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun
Microsystems Inc.}
has deletions [delFileName=_ncc_1vu.del]
test: open reader.........OK [214927 deleted docs]
test: fields..............OK [51 fields]
test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs pairs;
219620042 tokens]
test: stored fields.......OK [47668964 total field count; avg 29.303 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
2 of 11: name=_nqt docCount=431889
compound=false
hasProx=true
numFiles=9
size (MB)=1,671.375
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun
Microsystems Inc.}
has deletions [delFileName=_nqt_142.del]
test: open reader.........OK [21062 deleted docs]
test: fields..............OK [51 fields]
test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [5211271 terms; 39824029 terms/docs pairs;
64960271 tokens]
test: stored fields.......OK [12215322 total field count; avg 29.733 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
3 of 11: name=_ol7 docCount=913886
compound=false
hasProx=true
numFiles=9
size (MB)=3,567.739
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun
Microsystems Inc.}
has deletions [delFileName=_ol7_ox.del]
test: open reader.........OK [23340 deleted docs]
test: fields..............OK [51 fields]
test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [9825896 terms; 93954470 terms/docs pairs;
146434685 tokens]
test: stored fields.......OK [28752231 total field count; avg 32.286 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
4 of 11: name=_p40 docCount=470035
compound=false
hasProx=true
numFiles=9
size (MB)=1,946.747
diagnostics = {optimize=true, mergeFactor=6, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun
Microsystems Inc.}
has deletions [delFileName=_p40_bj.del]
test: open reader.........OK [9299 deleted docs]
test: fields..............OK [51 fields]
test: field norms.........OK [51 fields]
test: terms, freq, prox...ERROR [term source:margolisphil docFreq=1 != num
docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term source:margolisphil docFreq=1 != num docs seen
0 + num docs deleted 0
at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields.......OK [15454281 total field count; avg 33.543 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
5 of 11: name=_p9v docCount=319992
compound=false
hasProx=true
numFiles=9
size (MB)=1,255.792
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun
Microsystems Inc.}
has deletions [delFileName=_p9v_6r.del]
test: open reader.........OK [10132 deleted docs]
test: fields..............OK [51 fields]
test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [3947589 terms; 34468256 terms/docs pairs;
49975218 tokens]
test: stored fields.......OK [10717982 total field count; avg 34.59 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
6 of 11: name=_phe docCount=264148
compound=false
hasProx=true
numFiles=9
size (MB)=928.977
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun
Microsystems Inc.}
has deletions [delFileName=_phe_10.del]
test: open reader.........OK [6225 deleted docs]
test: fields..............OK [51 fields]
test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [3200899 terms; 26804334 terms/docs pairs;
38324745 tokens]
test: stored fields.......OK [8553750 total field count; avg 33.164 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
7 of 11: name=_phy docCount=24477
compound=false
hasProx=true
numFiles=9
size (MB)=112.431
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_phy_3.del]
test: open reader.........OK [3 deleted docs]
test: fields..............OK [51 fields]
test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [476073 terms; 3114433 terms/docs pairs;
5066955 tokens]
test: stored fields.......OK [874832 total field count; avg 35.745 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
8 of 11: name=_pi9 docCount=10309
compound=false
hasProx=true
numFiles=9
size (MB)=28.403
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_pi9_3.del]
test: open reader.........OK [4 deleted docs]
test: fields..............OK [44 fields]
test: field norms.........OK [44 fields]
test: terms, freq, prox...OK [145803 terms; 814384 terms/docs pairs;
1056258 tokens]
test: stored fields.......OK [347796 total field count; avg 33.75 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
9 of 11: name=_pik docCount=8822
compound=false
hasProx=true
numFiles=8
size (MB)=36.06
diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
no deletions
test: open reader.........OK
test: fields..............OK [42 fields]
test: field norms.........OK [42 fields]
test: terms, freq, prox...OK [171731 terms; 978687 terms/docs pairs;
1464724 tokens]
test: stored fields.......OK [322834 total field count; avg 36.594 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
10 of 11: name=_pil docCount=1026
compound=false
hasProx=true
numFiles=8
size (MB)=1.869
diagnostics = {os.version=2.6.26-2-amd64, os=Linux, lucene.version=2.9.3
951790 - 2010-06-06 01:30:55, source=flush, os.arch=amd64,
java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
no deletions
test: open reader.........OK
test: fields..............OK [41 fields]
test: field norms.........OK [41 fields]
test: terms, freq, prox...OK [18080 terms; 71965 terms/docs pairs; 90078
tokens]
test: stored fields.......OK [38005 total field count; avg 37.042 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
11 of 11: name=_pim docCount=1025
compound=false
hasProx=true
numFiles=9
size (MB)=2.741
diagnostics = {os.version=2.6.26-2-amd64, os=Linux, lucene.version=2.9.3
951790 - 2010-06-06 01:30:55, source=flush, os.arch=amd64,
java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_pim_1.del]
test: open reader.........OK [1 deleted docs]
test: fields..............OK [39 fields]
test: field norms.........OK [39 fields]
test: terms, freq, prox...OK [28656 terms; 84604 terms/docs pairs; 116912
tokens]
test: stored fields.......OK [30919 total field count; avg 30.194 fields
per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector
fields per doc]
WARNING: 1 broken segments (containing 460736 documents) detected
WARNING: 460736 documents will be lost
NOTE: will write new segments file in 5 seconds; this will remove 460736 docs
from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file "segments_j1o"
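For reference, output like the above comes from running Lucene's CheckIndex tool with the -fix option against the index directory. A minimal sketch of the invocation, assuming a Lucene 2.9.3 setup (the lucene-core jar name and location are assumptions; adjust them to your installation):

```shell
# Sketch of a CheckIndex -fix run (Lucene 2.9.x).
# The jar path below is an assumption -- point it at your lucene-core jar.
# WARNING: -fix permanently removes every document in broken segments
# (460736 docs in the run above), so stop indexing and back up the
# index directory before running it.
java -cp lucene-core-2.9.3.jar org.apache.lucene.index.CheckIndex \
  /solr/multicore/core1/data/index -fix
```

Run without -fix first to get a read-only report; only add -fix once the loss of the broken segments is acceptable.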