Fast DIH with 1:M multiValued entities

Tim Gilbert Thu, 14 Apr 2011 06:12:36 -0700

We are working on importing a large number of records into Solr using
DIH.  We have one schema with ~2000 fields declared, which map to
several database schemas, so that typically each document has ~500
fields in use.  We are importing about 2 million "rows", and in testing
the import takes < 20 minutes across 14 different entities, which
really map to one virtual document.  Then we added our multiValued
fields and, well, it didn't work out nearly as well. :-)


 

We have several fields which are 1:M and so in our data-config.xml we
might have something like this:

 

<document name="allfund">
  <entity name="FundId" dataSource="getFundManager"
          query="{call dbo.getFundManager_Id()}">
    <field column="FundId" name="HS04C" />
    <entity name="FundData" dataSource="getFundManager"
            query="{call dbo.getFundManager_Data(${FundId.FundId})}">
      <field column="ManagerName" name="OF015" />
    </entity>
  </entity>
</document>

 

Because the inner entity runs once for every row of the outer entity,
that is one database query per FundId, each returning only a small
result set, and it is really slowing things down for us.

 

I'm mostly looking for advice, so this is a multi-part question :-)

 

1) Is there a way to declare an in-memory lookup in DIH, so that we can
   fetch the entire Many side in one database query and match it up on
   the PK?  Then we could declare that field multiValued.

2) Assuming that isn't currently available, I thought about
   "denormalizing" the 1:M into a delimited list and then using
   http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
   to tokenize it.  That would allow us to search on individual bits,
   and we could build something into the front-end to handle the
   display.  It means we wouldn't use multiValued and we'd have to
   modify our db, and we'd lose out on some of what multiValued offers.

3) The third option is to open up DIH and try to add the first feature
   ourselves.
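
To make option 1 concrete, this is roughly the shape I have in mind. It
is only a sketch, not something we have running: it assumes DIH's
CachedSqlEntityProcessor can be applied to our stored-procedure setup,
and dbo.getFundManager_AllData() is a hypothetical procedure (it does
not exist in our database) that would return a FundId and ManagerName
for every fund in a single result set.

```xml
<!-- Sketch only. dbo.getFundManager_AllData() is HYPOTHETICAL: it is
     assumed to return (FundId, ManagerName) rows for all funds. -->
<document name="allfund">
  <entity name="FundId" dataSource="getFundManager"
          query="{call dbo.getFundManager_Id()}">
    <field column="FundId" name="HS04C" />
    <!-- The child query is meant to run once; its rows are cached in
         memory and looked up by FundId for each parent row. -->
    <entity name="FundData" dataSource="getFundManager"
            processor="CachedSqlEntityProcessor"
            query="{call dbo.getFundManager_AllData()}"
            where="FundId=FundId.FundId">
      <field column="ManagerName" name="OF015" />
    </entity>
  </entity>
</document>
```

For this to produce several values per document, OF015 would have to be
declared multiValued="true" in schema.xml, and the whole Many side would
have to fit in memory during the import.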

 

Am I approaching this the right way?  Are there other ways I haven't
considered or don't know about?

 

Thanks in advance,

 

Tim
