On 4/24/2014 9:24 AM, Yuval Dotan wrote:
I want to use the DIH component to import data from an old PostgreSQL
DB.
I want to be able to recover from errors and crashes.
If an error occurs I should be able to restart and continue indexing from
where it stopped.
Is the DIH good enough for my requirements?
If not, is it possible to extend one of its classes in order to support the
recovery?

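The entity in the DataImportHandler (DIH) config has an "onError" attribute. It accepts "abort" (the default), "skip" (drop the current document and keep going), or "continue" (carry on as if the error had not happened). That covers errors on individual rows, but not restarting after a crash.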

http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors

But honestly, if you want a really robust Java program that indexes to Solr and does precisely what you want, you may be better off writing it yourself using SolrJ and JDBC. DIH is powerful and efficient, but when you write the program yourself, you can do anything you want with your data.

Writing it yourself also makes it possible to resume an import after a Solr crash. DIH is embedded in Solr and doesn't save any kind of state data about an import in progress, so resuming is pretty much impossible with DIH. With a SolrJ program, you'd have to handle that yourself, but it would be *possible*.

https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
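
To give a rough idea of what "handle that yourself" could look like, here is a minimal sketch of a resumable import loop. Everything in it is hypothetical: the ResumableImport class, the Source and Sink interfaces, and the checkpoint file are names invented for illustration, not part of Solr or SolrJ. It assumes the table has a positive, ever-increasing numeric primary key, that Source would wrap a JDBC query along the lines of "WHERE id > ? ORDER BY id LIMIT ?", and that Sink would wrap the SolrJ calls that add a batch and commit it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical sketch of a resumable import loop; not part of Solr or SolrJ. */
public class ResumableImport {

    /** One source row: its primary key plus the content to index. */
    public static final class Row {
        public final long id;
        public final String body;
        public Row(long id, String body) { this.id = id; this.body = body; }
    }

    /** Stand-in for a JDBC query: rows with id > lastId, ordered by id, at most limit rows. */
    public interface Source {
        List<Row> fetchAfter(long lastId, int limit) throws Exception;
    }

    /** Stand-in for the SolrJ calls that send a batch to Solr and commit it. */
    public interface Sink {
        void indexAndCommit(List<Row> batch) throws Exception;
    }

    private final Path checkpointFile;

    public ResumableImport(Path checkpointFile) {
        this.checkpointFile = checkpointFile;
    }

    /** Returns the last committed id, or 0 if there is no checkpoint yet. */
    public long readCheckpoint() throws IOException {
        if (!Files.exists(checkpointFile)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(checkpointFile), StandardCharsets.UTF_8).trim();
        return text.isEmpty() ? 0L : Long.parseLong(text);
    }

    /** Writes to a temp file first, then swaps it in, so a crash mid-write keeps the old value. */
    private void writeCheckpoint(long lastId) throws IOException {
        Path tmp = checkpointFile.resolveSibling(checkpointFile.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(lastId).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, checkpointFile, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Imports everything after the saved checkpoint; returns the number of rows indexed by this run. */
    public long run(Source source, Sink sink, int batchSize) throws Exception {
        long lastId = readCheckpoint();
        long indexed = 0;
        while (true) {
            List<Row> batch = source.fetchAfter(lastId, batchSize);
            if (batch.isEmpty()) {
                return indexed;
            }
            sink.indexAndCommit(batch);               // may throw; checkpoint stays at the previous batch
            lastId = batch.get(batch.size() - 1).id;  // highest id in this batch
            writeCheckpoint(lastId);                  // only advanced after a successful commit
            indexed += batch.size();
        }
    }
}
```

The checkpoint only moves forward after a batch has been committed, so a restart picks up at the first batch that was not confirmed. If the program dies between the commit and the checkpoint write, that one batch gets sent again on restart, which is harmless as long as the ids map to the Solr uniqueKey, because re-adding a document with the same key replaces the old one.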

Thanks,
Shawn
