Segmentation of data imports (not just full or single record imports)
---------------------------------------------------------------------
Key: SOLR-1613
URL: https://issues.apache.org/jira/browse/SOLR-1613
Project: Solr
Issue Type: New Feature
Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Matt Inger
It is desirable to be able to segment imports by a particular field in the root
entity record, so that you can update a particular segment of your index when
bulk updates occur on the backend database. For instance, if a bulk update
occurs for a particular customer, it would be more efficient to update the full
segment of your index for that customer than to issue updates for every single
user of that customer in your index, or to rebuild the entire index. Either of
those alternatives wastes processing power.
Instead, it would be more efficient to specify that a particular document field
in the root entity is a segmentation field, and to define an additional query on
the root entity (I'm basing my example on a JDBC-based data source):
<entity name="user" pk="userid" segment="customerid" ...
        query="..."
        segmentQuery="select ... where customerid=${dataimporter.request.segment}" />
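To make the ${dataimporter.request.segment} placeholder concrete, here is a
minimal Java sketch of how a data source might expand
${dataimporter.request.<name>} references in segmentQuery from the HTTP request
parameters before running the query. The SegmentQueryResolver class and its
resolve method are hypothetical; they are not the DataImportHandler API and not
the code in the attached patch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Solr): expands ${dataimporter.request.<name>}
// placeholders in a segmentQuery template using the request parameters.
final class SegmentQueryResolver {
    // Matches ${dataimporter.request.<name>} and captures <name>.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{dataimporter\\.request\\.(\\w+)\\}");

    private SegmentQueryResolver() {}

    static String resolve(String template, Map<String, String> requestParams) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = requestParams.get(name);
            if (value == null) {
                // Refuse to run a segment import without the segment value.
                throw new IllegalArgumentException("missing request parameter: " + name);
            }
            // quoteReplacement keeps '$' or '\' in the value from being
            // interpreted as group references. This is plain text substitution;
            // binding the value as a JDBC parameter would guard against SQL injection.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, resolving the segmentQuery above with segment=1000 yields
"select ... where customerid=1000", and a request without a segment parameter
is rejected rather than silently producing an invalid query.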
Then, to request a segment update, you specify the segment as a parameter to
your request:
/solr/db/dataimport?command=segment-import&segment=1000
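As a client-side sketch, the request can be built with java.net.URI, URL-encoding
the segment value. The SegmentImportRequest class and the base URL
http://localhost:8983 are assumptions for illustration, and
command=segment-import is the command proposed in this issue, not one that
Solr 1.4 already supports.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client helper for the proposed segment-import command.
final class SegmentImportRequest {
    private SegmentImportRequest() {}

    // Builds <solrBaseUrl><handlerPath>?command=segment-import&segment=<segment>
    static URI build(String solrBaseUrl, String handlerPath, String segment) {
        // Encode the segment so values containing spaces or '&' cannot
        // break the query string.
        String query = "command=segment-import&segment="
                + URLEncoder.encode(segment, StandardCharsets.UTF_8);
        return URI.create(solrBaseUrl + handlerPath + "?" + query);
    }
}
```

Calling build("http://localhost:8983", "/solr/db/dataimport", "1000") produces
the request shown above against that assumed host; sending it (for example with
an HTTP GET) is left to whatever client the application already uses.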
I've worked out the code changes required to support this in JdbcDataSource,
though I'm not sure what additional changes other data source types would need.
I am attaching a patch that includes these changes.
--
This message is automatically generated by JIRA.