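Hello Todd,

An "external merge" join (join="zipper") helps to avoid boilerplate caching
in simple cases like this. Both queries have to be sorted by the join key, so
the child rows can be merged with the parent rows as they stream in, without
a child cache.

It should look something like this: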

  <entity name="parent"
               query="select ID, tp.* from TABLE_PARENT tp ORDER BY ID">

      <entity name="child"
                 query="select ID, NAME, VALUE from TABLE_CHILD ORDER BY ID"

                 cacheKey="ID"
                 cacheLookup="parent.ID"
                 join="zipper" />

    </entity>
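
To show why both queries need the ORDER BY, here is a rough sketch of the
merge idea in Python. It is only an illustration under my own assumptions, not
the actual DIH code: the function name zipper_join and the dict-shaped rows
are made up for the example.

```python
def zipper_join(parents, children, key="ID"):
    """Merge-join two row streams that are both sorted ascending by `key`.

    Yields (parent_row, [matching child rows]) and holds only the children
    of the current parent in memory, so no full child cache is needed.
    """
    child_iter = iter(children)
    child = next(child_iter, None)
    for parent in parents:
        # Skip child rows that sort before this parent (orphans).
        while child is not None and child[key] < parent[key]:
            child = next(child_iter, None)
        # Collect the run of child rows that share the parent's key.
        matched = []
        while child is not None and child[key] == parent[key]:
            matched.append(child)
            child = next(child_iter, None)
        yield parent, matched


parents = [{"ID": 1}, {"ID": 2}, {"ID": 3}]
children = [
    {"ID": 1, "NAME": "a", "VALUE": "x"},
    {"ID": 1, "NAME": "b", "VALUE": "y"},
    {"ID": 3, "NAME": "c", "VALUE": "z"},
]
for parent, kids in zipper_join(parents, children):
    print(parent["ID"], [k["NAME"] for k in kids])
```

Each stream is read only once and in order, which is why an unsorted query
on either side would silently drop matches.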


On Fri, Nov 13, 2015 at 10:54 PM, Todd Long <lon...@gmail.com> wrote:

> We currently index using DIH along with the SortedMapBackedCache cache
> implementation which has worked well until recently when we needed to index
> a much larger table. We were running into memory issues using the
> SortedMapBackedCache so we tried switching to the BerkleyBackedCache but
> appear to have some configuration issues. I've included our basic setup
> below. The issue we're running into is that the Berkeley database appears
> to be evicting database files (see message below) before they've completed.
> When I watch the cache directory I only ever see two database files at a
> time with each one being ~1GB in size (this appears to be hard coded). Is
> there some additional configuration I'm missing to prevent the process from
> "cleaning" up database files before the index has finished? I think this
> "cleanup" keeps kicking off the caching again, so it never completes. Without
> caching, the indexing takes ~2 hours. Any help would be greatly appreciated.
> Thanks.
>
> Cleaning message: "Chose lowest utilized file for cleaning. fileChosen: 0x0
> ..."
>
> <dataConfig>
>   <dataSource type="JdbcDataSource" ... />
>
>   <document>
>     <entity name="parent"
>                query="select ID, tp.* from TABLE_PARENT tp">
>
>       <entity name="child"
>                  query="select ID, NAME, VALUE from TABLE_CHILD"
>
>                  cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
>                  cacheKey="ID"
>                  cacheLookup="parent.ID"
>                  persistCacheName="CHILD"
>                  persistCacheBaseDir="/some/cache/dir"
>                  persistCacheFieldNames="ID,NAME,VALUE"
>                  persistCacheFieldTypes="STRING,STRING,STRING"
>                  berkleyInternalCacheSize="1000000"
>                  berkleyInternalShared="true" />
>
>     </entity>
>   </document>
> </dataConfig>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
