Hi All,

I have a Solr requirement to send all the documents imported by a
/dataimport query through another update chain, as a separate
background process.

Currently I have configured my custom update chain in the /dataimport
handler itself. But since my custom update processor needs to connect to an
external enhancement engine (Apache Stanbol) to enhance the documents with
some NLP fields, it has a negative impact on the /dataimport process.
The solution would be to have a separate update process running to enhance
the content of the documents imported from /dataimport.

Currently I have configured my custom Stanbol Processor as below in my
/dataimport handler.

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">stanbolInterceptor</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="stanbolInterceptor">
  <processor class="com.solr.stanbol.processor.StanbolContentProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>


What I need now is to separate the two processes of dataimport and
Stanbol enhancement.
So this is like running a separate re-indexing process periodically over the
documents imported from /dataimport, to populate the Stanbol fields.
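
To illustrate the separation I have in mind, here is a rough, untested
sketch of the configuration. The handler name "/update/stanbol" is only a
placeholder I made up for this example; the idea is that the update.chain
default moves off /dataimport and onto a separate update handler, while the
stanbolInterceptor chain definition itself stays as it is.

<!-- /dataimport no longer runs the Stanbol chain -->
<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

<!-- Hypothetical separate handler (name is an assumption) that sends
     documents through the existing stanbolInterceptor chain -->
<requestHandler name="/update/stanbol" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">stanbolInterceptor</str>
  </lst>
</requestHandler>

This only covers the wiring. It does not answer which documents to send to
the second handler, which is the part I am asking about.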

The question is how to trigger my Stanbol update process for the documents
imported from /dataimport.
In Solr, to trigger an /update request we need to know the id and the fields
of the document to be updated. In my case I need to run all the documents
imported by the previous /dataimport run through the Stanbol
update.chain.

Is there a way to keep track of the IDs of the documents imported from
/dataimport?
Any advice or pointers would be really helpful.

Thanks,
Dileepa
