How did I miss that? Thanks, I will try that, as it seems to be the "in-memory" lookup solution I needed.
Thanks Erick,

Tim

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, April 14, 2011 10:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Fast DIH with 1:M multValue entities

I'm not sure this applies, but have you looked at
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor

Best
Erick

On Thu, Apr 14, 2011 at 9:12 AM, Tim Gilbert <tim.gilb...@morningstar.com> wrote:

> We are working on importing a large number of records into Solr using
> DIH. We have one schema with ~2000 fields declared which map off to
> several database schemas so that typically each document will have ~500
> fields in use. We have about 2 million "rows" which we are importing,
> and we are seeing < 20 minutes in test across 14 different "entity's"
> which really map off to one virtual document. Then we added our
> multiValue stuff and, well, it didn't work out nearly as well. :-)
>
> We have several fields which are 1:M and so in our data-config.xml we
> might have something like this:
>
> <document name="allfund">
>   <entity name="FundId" dataSource="getFundManager"
>           query="{call dbo.getFundManager_Id()}">
>     <field column="FundId" name="HS04C" />
>     <entity name="FundData" dataSource="getFundManager"
>             query="{call dbo.getFundManager_Data(${FundId.FundId})}">
>       <field column="ManagerName" name="OF015" />
>     </entity>
>   </entity>
> </document>
>
> That is a lot of database queries for a small result set which is really
> slowing things down for us.
>
> My question is more to ask advice, so it's a multi-parter :-)
>
> 1) Is there a way to declare in DIH an in-memory lookup where we can
>    query for the entire Many side of the query in one database query,
>    and match up on the PK? Then we can declare that field multiValued.
>
> 2) Assuming that isn't currently available, I thought "denormalizing"
>    the 1:M into a delimited list and then using
>    http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>    to tokenize. That would allow us to search on individual bits, and
>    build something into the front-end to handle the display. That means
>    we wouldn't use multiValued and we'd have to modify our db but we'd
>    lose out on some of the abilities.
>
> 3) The third option was to open up DIH and try to add the first feature
>    into it ourselves.
>
> Am I approaching this the right way? Are there other ways I haven't
> considered or don't know about?
>
> Thanks in advance,
>
> Tim
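
For anyone finding this thread later, here is a rough sketch of how the nested entity from the original question might look with CachedSqlEntityProcessor. It is untested and rests on some assumptions: dbo.getFundManager_AllData is a hypothetical stored procedure (not one from this thread) that returns every FundId/ManagerName pair in a single result set, and the where="childColumn=parentEntity.parentColumn" form is the one described on the wiki page linked above, so check the syntax against the Solr version you are running.

```xml
<document name="allfund">
  <entity name="FundId" dataSource="getFundManager"
          query="{call dbo.getFundManager_Id()}">
    <field column="FundId" name="HS04C" />
    <!-- Assumption: dbo.getFundManager_AllData is a hypothetical procedure
         returning all (FundId, ManagerName) rows at once. The processor runs
         this query a single time, caches the rows in memory, and looks up the
         rows for each parent by matching the child's FundId column against
         FundId.FundId, instead of issuing one query per parent row. -->
    <entity name="FundData" dataSource="getFundManager"
            processor="CachedSqlEntityProcessor"
            query="{call dbo.getFundManager_AllData()}"
            where="FundId=FundId.FundId">
      <field column="ManagerName" name="OF015" />
    </entity>
  </entity>
</document>
```

Two things to keep in mind with this approach: the whole Many side is held in memory for the duration of the import, so it only suits child result sets that fit in the heap, and the target field (OF015 here) still has to be declared multiValued in schema.xml for more than one manager per fund to be kept.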