We currently index using DIH with the SortedMapBackedCache cache
implementation, which worked well until recently, when we needed to index a
much larger table. We were running into memory issues with
SortedMapBackedCache, so we tried switching to BerkleyBackedCache, but we
appear to have some configuration issues. I've included our basic setup
below.

The problem we're running into is that the Berkeley database appears to be
evicting database files (see the cleaning message below) before the cache
build has completed. When I watch the cache directory I only ever see two
database files at a time, each ~1GB in size (the file size appears to be
hard-coded). Is there some additional configuration I'm missing that would
prevent the process from "cleaning up" database files before the index has
finished? I suspect this cleanup keeps re-triggering the caching step, which
never completes; without caching, the indexing takes ~2 hours. Any help
would be greatly appreciated. Thanks.

Cleaning message: "Chose lowest utilized file for cleaning. fileChosen: 0x0
..."

<dataConfig>
  <dataSource type="JdbcDataSource" ... />

  <document>
    <entity name="parent"
               query="select ID, tp.* from TABLE_PARENT tp">

      <entity name="child"
                 query="select ID, NAME, VALUE from TABLE_CHILD"
                
cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
                 cacheKey="ID"
                 cacheLookup="parent.ID"
                 persistCacheName="CHILD"
                 persistCacheBaseDir="/some/cache/dir"
                 persistCacheFieldNames="ID,NAME,VALUE"
                 persistCacheFieldTypes="STRING,STRING,STRING"
                 berkleyInternalCacheSize="1000000"
                 berkleyInternalShared="true" />

    </entity>
  </document>
</dataConfig>
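
One thing I was considering trying, since JE reads a je.properties file from
the environment home directory and lets it override programmatic settings:
dropping the file below into the cache directory to disable the cleaner
without code changes. This assumes the cache opens its Environment directly
under persistCacheBaseDir, which I haven't verified.

# /some/cache/dir/je.properties
# Assumes the Environment home is persistCacheBaseDir -- unverified.
je.env.runCleaner=false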


