[
https://issues.apache.org/jira/browse/SOLR-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Noble Paul updated SOLR-1120:
-----------------------------
Attachment: SOLR-1120.patch
> Simplify EntityProcessor API
> ----------------------------
>
> Key: SOLR-1120
> URL: https://issues.apache.org/jira/browse/SOLR-1120
> Project: Solr
> Issue Type: Improvement
> Components: contrib - DataImportHandler
> Affects Versions: 1.3
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-1120.patch, SOLR-1120.patch, SOLR-1120.patch,
> SOLR-1120.patch, SOLR-1120.patch, SOLR-1120.patch
>
>
> Writing an EntityProcessor is deceptively complex. There are so many gotchas.
> I propose the following:
> # Move the Transformer application logic out of EntityProcessor and into
> DocBuilder. An EntityProcessor then does not need to call applyTransformer
> or know about rowIterator and getFromRowCache().
> # Change EntityProcessor#destroy so that it is called at the end of each
> parent row. Right now init is called once per parent row, but destroy
> actually means the end of the import. In fact, there is currently no correct
> way for an entity processor to clean up. Most clean up when returning null
> (end of data), but with the introduction of $skipDoc, a transformer can
> return $skipDoc and the entity processor never gets a chance to clean up
> for the current init.
> # EntityProcessor will use the EventListener API to listen for the end of
> the import, and should use that event to do its final cleanup.
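
The lifecycle proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: SketchProcessor, SketchTransformer, SketchDocBuilder, RecordingProcessor and LifecycleSketch are hypothetical stand-ins, not the real DataImportHandler classes or the contents of the attached patch, and the handling of $skipDoc is simplified to "stop reading rows for the current parent". The point is that transformers are applied by the builder, destroy runs once per parent row even when a row is skipped, and final cleanup happens on a separate import-end callback.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; NOT the real DataImportHandler API.
interface SketchProcessor {
    void init(Map<String, Object> parentRow); // called once per parent row
    Map<String, Object> nextRow();            // returns null at end of data
    void destroy();                           // proposed: called at end of each parent row
    void onImportEnd();                       // proposed: final cleanup on import end
}

interface SketchTransformer {
    Map<String, Object> transform(Map<String, Object> row); // assumed to return non-null
}

class SketchDocBuilder {
    static final String SKIP_DOC = "$skipDoc";

    // The builder, not the processor, applies transformers and drives the lifecycle.
    static List<Map<String, Object>> run(List<Map<String, Object>> parentRows,
                                         SketchProcessor processor,
                                         List<SketchTransformer> transformers) {
        List<Map<String, Object>> docs = new ArrayList<>();
        try {
            for (Map<String, Object> parent : parentRows) {
                processor.init(parent);
                try {
                    Map<String, Object> row;
                    while ((row = processor.nextRow()) != null) {
                        for (SketchTransformer t : transformers) {
                            row = t.transform(row);
                        }
                        // Simplification: a skipped row ends reading for this parent,
                        // so the processor never reaches its own end-of-data path.
                        if (Boolean.TRUE.equals(row.get(SKIP_DOC))) {
                            break;
                        }
                        docs.add(row);
                    }
                } finally {
                    processor.destroy(); // runs even when a row was skipped
                }
            }
        } finally {
            processor.onImportEnd(); // stands in for an import-end event listener
        }
        return docs;
    }
}

// Demo processor that emits two child rows per parent and records lifecycle calls.
class RecordingProcessor implements SketchProcessor {
    final List<String> events = new ArrayList<>();
    private Iterator<Map<String, Object>> rows;

    public void init(Map<String, Object> parentRow) {
        events.add("init:" + parentRow.get("id"));
        List<Map<String, Object>> children = new ArrayList<>();
        for (int i = 1; i <= 2; i++) {
            Map<String, Object> child = new HashMap<>();
            child.put("id", parentRow.get("id") + "-" + i);
            children.add(child);
        }
        rows = children.iterator();
    }

    public Map<String, Object> nextRow() {
        return rows.hasNext() ? rows.next() : null;
    }

    public void destroy() {
        events.add("destroy");
        rows = null; // per-parent cleanup
    }

    public void onImportEnd() {
        events.add("importEnd");
    }
}

class LifecycleSketch {
    static List<String> demo() {
        RecordingProcessor processor = new RecordingProcessor();
        List<Map<String, Object>> parents = new ArrayList<>();
        for (String id : new String[] {"a", "b"}) {
            Map<String, Object> parent = new HashMap<>();
            parent.put("id", id);
            parents.add(parent);
        }
        // Transformer that flags every child row of parent "b" to be skipped.
        SketchTransformer skipB = row -> {
            if (String.valueOf(row.get("id")).startsWith("b")) {
                row.put(SketchDocBuilder.SKIP_DOC, Boolean.TRUE);
            }
            return row;
        };
        List<Map<String, Object>> docs =
                SketchDocBuilder.run(parents, processor, Collections.singletonList(skipB));
        processor.events.add("docs:" + docs.size());
        return processor.events;
    }

    public static void main(String[] args) {
        // prints [init:a, destroy, init:b, destroy, importEnd, docs:2]
        System.out.println(demo());
    }
}
```

In this sketch the processor for parent "b" is abandoned after its first row is skipped, yet destroy is still recorded for it, which is the cleanup guarantee the second point asks for.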