Hello Todd,

An "external merge" join helps to avoid boilerplate caching in simple cases like this one.
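
For readers unfamiliar with the term, the idea behind a merge ("zipper") join can be sketched in a few lines of Python. This is only an illustration of the general technique, not DIH's actual implementation. The function name `zipper_join` and the sample rows are hypothetical, and the sketch assumes that both inputs are already sorted by the join key and that parent keys are unique.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def zipper_join(
    parents: Iterable[Row], children: Iterable[Row], key: str = "ID"
) -> Iterator[Tuple[Row, List[Row]]]:
    """Yield (parent, matching_children) pairs from two key-sorted streams.

    Only one child row is held in memory at a time (plus the current run of
    matches), so no cache of the whole child table is needed.
    """
    child_iter = iter(children)
    child = next(child_iter, None)
    for parent in parents:
        # Skip children whose key sorts before the current parent's key.
        while child is not None and child[key] < parent[key]:
            child = next(child_iter, None)
        matched: List[Row] = []
        # Collect the consecutive children that share the parent's key.
        while child is not None and child[key] == parent[key]:
            matched.append(child)
            child = next(child_iter, None)
        yield parent, matched


if __name__ == "__main__":
    # Hypothetical sample rows, both lists sorted by ID.
    sample_parents = [{"ID": 1}, {"ID": 2}, {"ID": 3}]
    sample_children = [
        {"ID": 1, "NAME": "a"},
        {"ID": 1, "NAME": "b"},
        {"ID": 3, "NAME": "c"},
    ]
    for parent_row, child_rows in zipper_join(sample_parents, sample_children):
        print(parent_row["ID"], [c["NAME"] for c in child_rows])
```

Because each stream is read once, front to back, the sort order is what makes this work. That is why both queries in the suggested configuration carry an ORDER BY on the join key.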
The configuration should be something like this:

<entity name="parent"
        query="select ID, tp.* from TABLE_PARENT tp ORDER BY ID">
  <entity name="child"
          query="select ID, NAME, VALUE from TABLE_CHILD ORDER BY ID"
          cacheKey="ID"
          cacheLookup="parent.ID"
          join="zipper" />
</entity>

On Fri, Nov 13, 2015 at 10:54 PM, Todd Long <lon...@gmail.com> wrote:

> We currently index using DIH along with the SortedMapBackedCache cache
> implementation, which has worked well until recently when we needed to
> index a much larger table. We were running into memory issues using the
> SortedMapBackedCache, so we tried switching to the BerkleyBackedCache but
> appear to have some configuration issues. I've included our basic setup
> below. The issue we're running into is that it appears the Berkeley
> database is evicting database files (see message below) before they've
> completed. When I watch the cache directory I only ever see two database
> files at a time, with each one being ~1GB in size (this appears to be
> hard-coded). Is there some additional configuration I'm missing to prevent
> the process from "cleaning" up database files before the index has
> finished? I think this "cleanup" continues to kick off the caching, which
> never completes... without caching the indexing is ~2 hours. Any help
> would be greatly appreciated. Thanks.
>
> Cleaning message: "Chose lowest utilized file for cleaning. fileChosen: 0x0
> ..."
>
> <dataConfig>
>   <dataSource type="JdbcDataSource" ... />
>
>   <document>
>     <entity name="parent"
>             query="select ID, tp.* from TABLE_PARENT tp">
>
>       <entity name="child"
>               query="select ID, NAME, VALUE from TABLE_CHILD"
>
>               cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
>               cacheKey="ID"
>               cacheLookup="parent.ID"
>               persistCacheName="CHILD"
>               persistCacheBaseDir="/some/cache/dir"
>               persistCacheFieldNames="ID,NAME,VALUE"
>               persistCacheFieldTypes="STRING,STRING,STRING"
>               berkleyInternalCacheSize="1000000"
>               berkleyInternalShared="true" />
>
>     </entity>
>   </document>
> </dataConfig>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>