On 2/15/2018 2:00 AM, Srinivas Kashyap wrote:
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
> child entities in data-config.xml. And i'm using the same for full-import
> only. And in the beginning of my implementation, i had written delta-import
> query to index the modified changes. But my requirement grew and i have 17
> child entities for a single parent entity now. When doing delta-import for
> huge data, the number of requests being made to datasource(database) became
> more and CPU utilization was 100% when concurrent users started modifying the
> data. For this instead of calling delta-import which imports based on last
> index time, I did full-import('SortedMapBackedCache' ) based on last index
> time.
>
> Though the parent entity query would return only records that are modified,
> the child entity queries pull all the data from the database and the indexing
> happens 'in-memory' which is causing the JVM memory go out of memory.

Can you provide your DIH config file (with passwords redacted) and the
precise URL you are using to initiate dataimport? Also, I would like to
know what field you have defined as your uniqueKey. I may have more
questions about the data in your system, depending on what I see.

That cache implementation should only cache entries from the database
that are actually requested. If your query is correctly defined, it
should not pull all records from the DB table.

> Is there a way to specify in the child query entity to pull the record
> related to parent entity in the full-import mode.

If I am understanding your question correctly, this is one of the fairly
basic things that DIH does. Look at this config example in the
reference guide:

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#configuring-the-dih-configuration-file

In the entity named feature in that example config, the query string
uses ${item.ID} to reference the ID column from the parent entity, which
is item.

I should warn you that a cached entity does not always improve
performance. This is particularly true if the lookup into the cache is
the information that goes to your uniqueKey field. When the lookup is
by uniqueKey, every single row requested from the database will be used
exactly once, so there's not really any point to caching it.

Thanks,
Shawn
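
For reference, the parent/child pattern described above might look roughly
like the sketch below. This is only an illustration, not the poster's actual
config: the entity names item and feature and the ${item.ID} reference follow
the reference guide example, while the last_modified column, the item_id
column, and the exact SQL are assumptions made up for this sketch.

```xml
<!-- Hypothetical data-config.xml fragment; table and column names are assumed. -->
<document>
  <!-- Parent entity: returns only rows changed since the last import. -->
  <entity name="item"
          query="select * from item
                 where last_modified &gt; '${dataimporter.last_index_time}'">
    <!-- Child entity: ${item.ID} is replaced with the ID column of the
         current parent row, so only the related child rows are fetched. -->
    <entity name="feature"
            query="select description from feature
                   where item_id = '${item.ID}'">
      <field column="description" name="features"/>
    </entity>
  </entity>
</document>
```

With a child query written this way, DIH issues one child query per parent
row rather than reading the whole child table, which trades memory for more
round trips to the database.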