On 2/15/2018 2:00 AM, Srinivas Kashyap wrote:
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
> child entities in data-config.xml, and I'm using it for full-import only.
> When I started, I had written a delta-import query to index the modified
> records. But my requirements grew, and I now have 17 child entities for a
> single parent entity. When running delta-import on a large data set, the
> number of requests made to the datasource (database) increased, and CPU
> utilization hit 100% when concurrent users started modifying the data. So
> instead of calling delta-import, which imports based on last index time, I
> ran a full-import (with 'SortedMapBackedCache') based on last index time.
>
> Though the parent entity query returns only the records that are modified,
> the child entity queries pull all the data from the database, and the
> indexing happens in memory, which causes the JVM to run out of memory.

Can you provide your DIH config file (with passwords redacted) and the
precise URL you are using to initiate dataimport?  Also, I would like to
know what field you have defined as your uniqueKey.  I may have more
questions about the data in your system, depending on what I see.

That cache implementation should only cache entries from the database
that are actually requested.  If your query is correctly defined, it
should not pull all records from the DB table.

> Is there a way to specify in the child entity query that it should pull
> only the records related to the parent entity in full-import mode?

If I am understanding your question correctly, this is one of the fairly
basic things that DIH does.  Look at this config example in the
reference guide:

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#configuring-the-dih-configuration-file

In the entity named feature in that example config, the query string
uses ${item.ID} to reference the ID column from the parent entity, which
is item.
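
The relevant part of that example looks roughly like the sketch below.
It is trimmed to the parent/child pair, and the table and column names
belong to the reference guide example, not to any real schema.

```xml
<dataConfig>
  <!-- dataSource element omitted; see the reference guide for the full example -->
  <document>
    <!-- Parent entity: one row per item -->
    <entity name="item" query="select * from item">
      <field column="NAME" name="name" />
      <!-- Child entity: ${item.ID} is replaced with the ID column of the
           current parent row, so the query only asks for matching rows -->
      <entity name="feature"
              query="select DESCRIPTION from FEATURE where ITEM_ID='${item.ID}'">
        <field column="DESCRIPTION" name="features" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

The same pattern should apply to each of the 17 child entities: every
child query gets a WHERE clause that references a column of its parent
through the ${parent.COLUMN} syntax.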

I should warn you that a cached entity does not always improve
performance.  This is particularly true if the key used for the cache
lookup is the value that goes into your uniqueKey field.  When the lookup
is by uniqueKey, every row requested from the database will be used
exactly once, so there's not really any point in caching it.
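
For comparison, a cached child entity is usually declared with the
cacheImpl, cacheKey, and cacheLookup attributes, as in the sketch below.
This is an illustration only: the table and column names are assumptions
carried over from the reference guide example, and the right key columns
depend on the actual schema.

```xml
<!-- Nested inside the parent entity named "item" -->
<entity name="feature"
        processor="SqlEntityProcessor"
        cacheImpl="SortedMapBackedCache"
        cacheKey="ITEM_ID"
        cacheLookup="item.ID"
        query="select ITEM_ID, DESCRIPTION from FEATURE">
  <!-- cacheKey is the column of this entity used as the cache key;
       cacheLookup is the parent value used to look rows up in the cache -->
  <field column="DESCRIPTION" name="features" />
</entity>
```

A cache like this only pays off when the same cached rows are looked up
by more than one parent row.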

Thanks,
Shawn
