Simplify EntityProcessor API
----------------------------

                 Key: SOLR-1120
                 URL: https://issues.apache.org/jira/browse/SOLR-1120
             Project: Solr
          Issue Type: Improvement
          Components: contrib - DataImportHandler
    Affects Versions: 1.3
            Reporter: Shalin Shekhar Mangar
            Assignee: Shalin Shekhar Mangar
             Fix For: 1.4


Writing an EntityProcessor is deceptively complex, and the current API has many gotchas.

I propose the following:
# Extract the Transformer application logic from EntityProcessor and move it 
to DocBuilder. An EntityProcessor then does not need to call applyTransformer or 
know about rowIterator and the getFromRowCache() method.
# Change the meaning of EntityProcessor#destroy so that it is called at the end 
of each parent row. Right now init is called once per parent row, but destroy 
actually means the end of the import. In fact, there is currently no correct way 
for an entity processor to clean up. Most clean up when returning null (end of 
data), but with the introduction of $skipDoc, a transformer can return $skipDoc 
and the entity processor never gets a chance to clean up for the current init.
# EntityProcessor will use the EventListener API to listen for import end. This 
should be used by EntityProcessor to do a final cleanup.
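
The lifecycle proposed in the three points above can be sketched roughly as 
follows. This is only an illustration, not the actual DataImportHandler code: 
SimpleEntityProcessor, LoggingProcessor, onImportEnd, transform and run are 
hypothetical names, rows are plain strings, and $skipDoc is modelled as a 
marker value returned by the transformer. The point is that the driver (the 
role DocBuilder would take) applies the transformer and calls destroy once per 
parent row, even when a row is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LifecycleSketch {
    /** Hypothetical, simplified stand-in for the proposed contract. */
    interface SimpleEntityProcessor {
        void init(String parentRow); // called once per parent row
        String nextRow();            // raw row, or null at end of data
        void destroy();              // proposed: called at the end of each parent row
        void onImportEnd();          // proposed: final cleanup on the import-end event
    }

    static final String SKIP_DOC = "$skipDoc";

    /** Records every lifecycle call so the order can be inspected. */
    static class LoggingProcessor implements SimpleEntityProcessor {
        private final List<String> log;
        private Iterator<String> rows;

        LoggingProcessor(List<String> log) { this.log = log; }

        public void init(String parentRow) {
            log.add("init " + parentRow);
            rows = Arrays.asList(parentRow + "-a", parentRow + "-b").iterator();
        }

        public String nextRow() { return rows.hasNext() ? rows.next() : null; }

        public void destroy() {
            log.add("destroy");
            rows = null; // per-parent-row resources are released here
        }

        public void onImportEnd() { log.add("importEnd"); }
    }

    /** Transformer logic owned by the driver, not by the processor. */
    static String transform(String row) {
        return row.equals("p1-a") ? SKIP_DOC : row.toUpperCase();
    }

    /** Plays the role of the driver loop for two parent rows. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        SimpleEntityProcessor processor = new LoggingProcessor(log);
        for (String parent : Arrays.asList("p1", "p2")) {
            processor.init(parent);
            try {
                String row;
                while ((row = processor.nextRow()) != null) {
                    String out = transform(row);
                    if (SKIP_DOC.equals(out)) {
                        log.add("skip " + parent);
                        break; // the processor never reaches end of data
                    }
                    log.add("emit " + out);
                }
            } finally {
                processor.destroy(); // still runs after a skip
            }
        }
        processor.onImportEnd();
        return log;
    }

    public static void main(String[] args) {
        for (String entry : run()) {
            System.out.println(entry);
        }
    }
}
```

In this sketch the processor for parent row p1 is abandoned before it returns 
null, yet destroy is still invoked, which is the case the current API cannot 
handle.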

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
