Hi folks,

Please suggest a solution for importing and indexing PDF files *incrementally*. My requirement is to pull the PDF files remotely from a network folder path. This network folder receives a new set of PDF files at certain intervals (say every 20 seconds), and it is emptied each time just before the new set of PDF files is copied into it. I do not want to lose the previously saved index of the old files while doing the next incremental import.
Currently, I am using Solr 6.6 for this research. The data import handler config is currently as follows (I have removed a stray `-->` that had crept in after the fileSize field):

<!--Remote Access-->
<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <entity name="K2FileEntity" processor="FileListEntityProcessor" dataSource="null"
            recursive="true" baseDir="\\CLDSINGH02\RemoteFileDepot"
            fileName=".*pdf" rootEntity="false">
      <field column="file" name="id"/>
      <field column="fileSize" name="size"/>
      <field column="fileLastModified" name="lastmodified"/>
      <entity name="pdf" processor="TikaEntityProcessor" onError="skip"
              url="${K2FileEntity.fileAbsolutePath}" format="text">
        <field column="title" name="title" meta="true"/>
        <field column="dc:format" name="format" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Kind regards,
Karan Singh
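For context: I trigger the import with `/dataimport?command=full-import`, and my understanding is that the `clean` parameter defaults to true for a full-import, which would delete the existing documents on every run. A sketch of what I believe would pin `clean=false` as a handler default in solrconfig.xml is below; the handler name and the config file name here are my assumptions, not my current setup, so please correct me if this is the wrong approach:

```xml
<!-- Sketch only: DIH request handler with "clean" disabled by default,
     so repeated full-imports should not wipe the previously built index.
     "tika-data-config.xml" is a placeholder for my actual DIH config file. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">tika-data-config.xml</str>
    <str name="clean">false</str>
  </lst>
</requestHandler>
```

Equivalently, I could pass `clean=false` explicitly on each request, e.g. `/dataimport?command=full-import&clean=false&commit=true`, if overriding per request is preferable.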