Hi all,

Any ideas on how to run a reindex update process for all the documents imported by a /dataimport query? I'd appreciate your help.

Thanks,
Dileepa

On Thu, Jan 23, 2014 at 12:21 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:

> Hi All,
>
> I did some research on this and found some alternatives that fit my
> use case. Please share your ideas.
>
> Can I update all documents indexed after a /dataimport query using the
> last_index_time in dataimport.properties?
> If so, can anyone please give me some pointers?
> What I currently have in mind is something like the following:
>
> 1. Store the indexing timestamp of the document as a field, e.g.:
>    <field name="timestamp" type="date" indexed="true" stored="true"
>           default="NOW" multiValued="false"/>
>
> 2. Read the last_index_time from dataimport.properties.
>
> 3. Query all document ids indexed after the last_index_time and send them
>    through the Stanbol update processor.
>
> But I have a question here:
> Does the last_index_time refer to when the dataimport is started
> (onImportStart) or when it is finished (onImportEnd)?
> If it's the onImportEnd timestamp, then this solution won't work, because
> the timestamp indexed in the document field will satisfy:
> onImportStart < doc-index-timestamp < onImportEnd.
>
> Another alternative I can think of is to trigger an update chain via an
> EventListener configured to run after a dataimport is processed
> (onImportEnd).
> In this case, can the context in DIH give the list of document ids
> processed in the /dataimport request? If so, I can send those doc ids with
> an /update query to run the Stanbol update process.
>
> Please give me your ideas and suggestions.
>
> Thanks,
> Dileepa
>
> On Wed, Jan 22, 2014 at 6:14 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
>
>> Hi All,
>>
>> I have a Solr requirement to send all the documents imported from a
>> /dataimport query through another update chain as a separate
>> background process.
>>
>> Currently I have configured my custom update chain in the /dataimport
>> handler itself. But since my custom update process needs to connect to an
>> external enhancement engine (Apache Stanbol) to enhance the documents with
>> some NLP fields, it has a negative impact on the /dataimport process.
>> The solution would be to have a separate update process running to enhance
>> the content of the documents imported from /dataimport.
>>
>> Currently I have configured my custom Stanbol processor as below in my
>> /dataimport handler.
>>
>> <requestHandler name="/dataimport" class="solr.DataImportHandler">
>>   <lst name="defaults">
>>     <str name="config">data-config.xml</str>
>>     <str name="update.chain">stanbolInterceptor</str>
>>   </lst>
>> </requestHandler>
>>
>> <updateRequestProcessorChain name="stanbolInterceptor">
>>   <processor class="com.solr.stanbol.processor.StanbolContentProcessorFactory"/>
>>   <processor class="solr.RunUpdateProcessorFactory"/>
>> </updateRequestProcessorChain>
>>
>> What I need now is to separate the two processes of dataimport and
>> Stanbol enhancement.
>> So this is like running a separate re-indexing process periodically over
>> the documents imported from /dataimport to populate the Stanbol fields.
>>
>> The question is how to trigger my Stanbol update process for the documents
>> imported from /dataimport.
>> In Solr, to trigger an /update query we need to know the id and the fields
>> of the document to be updated. In my case I need to run all the documents
>> imported from the previous /dataimport process through a Stanbol
>> update.chain.
>>
>> Is there a way to keep track of the document ids imported from
>> /dataimport?
>> Any advice or pointers will be really helpful.
>>
>> Thanks,
>> Dileepa
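
For reference, the timestamp-based idea from the Jan 23 message (read last_index_time, then query for ids indexed after it) could look roughly like the sketch below. This is only an illustration under several assumptions, not a tested solution: the helper names read_last_index_time and build_select_params are made up, the field is assumed to be called "timestamp" as in the proposed schema line, the property value is assumed to use the "yyyy-MM-dd HH:mm:ss" pattern with escaped colons, and it is treated as UTC (adjust if the server writes local time). It does not settle whether last_index_time is the start or the end of the import.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def read_last_index_time(properties_text):
    """Return the global last_index_time from dataimport.properties text."""
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the leading date comment
        key, sep, value = line.partition("=")
        if sep and key.strip() == "last_index_time":
            # Java properties files escape ':' as '\:'.
            raw = value.strip().replace("\\:", ":")
            # Assumption: the value is UTC; convert first if it is local time.
            parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
            return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("last_index_time not found")


def build_select_params(since, rows=500):
    """Build /select parameters for ids of documents indexed at or after `since`."""
    stamp = since.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        # Hypothetical field name taken from the proposed schema line.
        "q": "timestamp:[%s TO *]" % stamp,
        "fl": "id",
        "rows": str(rows),
        "wt": "json",
    }


if __name__ == "__main__":
    sample = "#Thu Jan 23 12:21:00 UTC 2014\nlast_index_time=2014-01-23 12\\:21\\:00\n"
    since = read_last_index_time(sample)
    print("/select?" + urlencode(build_select_params(since)))
```

The ids returned by such a query would then still have to be re-submitted to /update with update.chain=stanbolInterceptor. That part is left out here because it depends on whether the full documents can be re-sent.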