See https://issues.apache.org/jira/browse/SOLR-2943. You can set up 2 DIH handlers. The first would query CAT_TABLE and save it to a disk-backed cache using DIHCacheWriter. In the 2nd DIH handler, you would then change your 3 child entities to use DIHCacheProcessor to read back the cached data. This is a little complicated to set up, but it lets you cache the data just once, and because the cache is disk-backed, it will scale to whatever size CAT_TABLE is. (For some details, see this thread: http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tt4015514.html)

A simpler method is to specify cacheImpl="SortedMapBackedCache" on the 3 child entities. (This is the same as using CachedSqlEntityProcessor.) It would generate 3 in-memory caches, each with the same data. If CAT_TABLE is small, this would be adequate.

A middle ground between these two would be to create a disk-backed cache implementation (or use the ones at SOLR-2613 or SOLR-2948) and specify it in "cacheImpl". It would still create 3 identical caches, but they would be disk-backed and could scale beyond what an in-memory cache can handle.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: O. Olson [mailto:olson_...@yahoo.it]
Sent: Thursday, May 16, 2013 11:01 AM
To: solr-user@lucene.apache.org
Subject: Speed up import of Hierarchical Data

I am using the DataImportHandler to query a SQL Server database and populate Solr. Unfortunately, SQL does not have an understanding of hierarchical relationships, and hence I use table joins. The following is an outline of my table structure:

PROD_TABLE
  -> SKU (Primary Key)
  -> Title (varchar)
  -> Descr (varchar)

CAT_TABLE
  -> SKU (Foreign Key)
  -> CategoryLevel (int, i.e. 1, 2, 3 …)
  -> CategoryName (varchar)

I specify the SQL query in the db-data-config.xml file, a snippet of which looks like:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost\...."/>
  <document>
    <entity name="Product" query="SELECT SKU, Title, Descr FROM PROD_TABLE">
      <field column="SKU" name="SKU" />
      <field column="Title" name="Title" />
      <field column="Descr" name="Descr" />
      <entity name="Cat1" query="SELECT CategoryName from CAT_TABLE where SKU='${Product.SKU}' AND CategoryLevel=1">
        <field column="CategoryName" name="Category1" />
      </entity>
      <entity name="Cat2" query="SELECT CategoryName from CAT_TABLE where SKU='${Product.SKU}' AND CategoryLevel=2">
        <field column="CategoryName" name="Category2" />
      </entity>
      <entity name="Cat3" query="SELECT CategoryName from CAT_TABLE where SKU='${Product.SKU}' AND CategoryLevel=3">
        <field column="CategoryName" name="Category3" />
      </entity>
    </entity>
  </document>
</dataConfig>

It seems like the DataImportHandler sends out three or four queries for each Product. This results in a very slow import. Is there any way to speed this up? I would not mind an intermediate step of first extracting the data from SQL and then putting it into Solr.

Thank you for all your help.
O. O.

--
View this message in context: http://lucene.472066.n3.nabble.com/Speed-up-import-of-Hierarchical-Data-tp4063924.html
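
For reference, the simpler in-memory option from the reply (cacheImpl="SortedMapBackedCache" on the child entities) might look roughly like the sketch below. The cacheKey and cacheLookup attributes are not mentioned anywhere in this thread; they are assumed here from the DIH entity-caching options in Solr 3.6/4.x, so check them against your Solr version. Note that each child query no longer filters on ${Product.SKU}: the idea is that it runs once, its rows are cached by SKU, and each Product is then matched against the cache rather than triggering a new query.

```xml
<entity name="Product" query="SELECT SKU, Title, Descr FROM PROD_TABLE">
  <field column="SKU" name="SKU" />
  <field column="Title" name="Title" />
  <field column="Descr" name="Descr" />

  <!-- Assumption: cacheKey/cacheLookup are the DIH caching attributes
       (not taken from this thread). The query must also select SKU so
       that it can serve as the cache key. -->
  <entity name="Cat1"
          query="SELECT SKU, CategoryName FROM CAT_TABLE WHERE CategoryLevel=1"
          cacheImpl="SortedMapBackedCache"
          cacheKey="SKU"
          cacheLookup="Product.SKU">
    <field column="CategoryName" name="Category1" />
  </entity>

  <!-- Cat2 and Cat3 would follow the same pattern with
       CategoryLevel=2 and CategoryLevel=3. -->
</entity>
```

With this layout the import should issue one query per child entity in total instead of one per Product, at the cost of holding the category rows in memory. If the SKU column types differ between PROD_TABLE and CAT_TABLE, the cache lookups may not match, so that is worth checking first.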