We are working on importing a large number of records into Solr using DIH. We have one schema with ~2000 declared fields which map to several database schemas, so that each document will typically have ~500 fields in use. We are importing about 2 million "rows", and in test the import takes under 20 minutes across 14 different entities which really map to one virtual document. Then we added our multiValued fields and, well, it didn't work out nearly as well. :-)
We have several fields which are 1:M, so in our data-config.xml we might have something like this:

  <document name="allfund">
    <entity name="FundId" dataSource="getFundManager"
            query="{call dbo.getFundManager_Id()}">
      <field column="FundId" name="HS04C" />
      <entity name="FundData" dataSource="getFundManager"
              query="{call dbo.getFundManager_Data(${FundId.FundId})}">
        <field column="ManagerName" name="OF015" />
      </entity>
    </entity>
  </document>

Because the nested entity runs its query once per FundId, that is a lot of database queries, each returning a small result set, and it is really slowing things down for us.

My question is more to ask advice, so it's a multi-parter :-)

1) Is there a way to declare in DIH an in-memory lookup, where we can fetch the entire Many side in one database query and match it up on the PK? Then we can declare that field multiValued.

2) Assuming that isn't currently available, I thought about "denormalizing" the 1:M into a delimited list and then using http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory to tokenize it. That would allow us to search on the individual bits, and we could build something into the front-end to handle the display. It means we wouldn't use multiValued, we'd have to modify our db, and we'd lose out on some of the abilities of multiValued fields.

3) The third option is to open up DIH and try to add the first feature into it ourselves.

Am I approaching this the right way? Are there other ways I haven't considered or don't know about?

Thanks in advance,
Tim
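
P.S. To make option 1 a bit more concrete, here is a rough sketch of the kind of in-memory lookup I have in mind. It is plain Python rather than anything DIH actually provides: the function names and the sample rows are made up for illustration, and only the FundId, ManagerName and OF015 names come from our config. The idea is to run one query for the whole Many side, group the rows by key, and then attach the grouped values to each parent row as a multiValued field.

```python
from collections import defaultdict


def build_lookup(many_rows, key_column, value_column):
    """Group the rows of ONE query over the Many side by their parent key."""
    lookup = defaultdict(list)
    for row in many_rows:
        lookup[row[key_column]].append(row[value_column])
    return lookup


def attach_multivalued(parent_rows, lookup, key_column, target_field):
    """Copy each parent row and add the matching values as a list field."""
    docs = []
    for parent in parent_rows:
        doc = dict(parent)
        # .get() avoids creating empty entries for parents with no children
        doc[target_field] = list(lookup.get(parent[key_column], []))
        docs.append(doc)
    return docs


# Hypothetical sample data standing in for the two stored procedure results.
fund_rows = [{"FundId": 1}, {"FundId": 2}, {"FundId": 3}]
manager_rows = [
    {"FundId": 1, "ManagerName": "Alice"},
    {"FundId": 1, "ManagerName": "Bob"},
    {"FundId": 2, "ManagerName": "Carol"},
]

lookup = build_lookup(manager_rows, "FundId", "ManagerName")
docs = attach_multivalued(fund_rows, lookup, "FundId", "OF015")

for doc in docs:
    print(doc)
```

With this shape the database is hit twice in total (once per side) instead of once per FundId, at the cost of holding the whole Many side in memory during the import.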