The timestamp mechanism is not perfect. Instead, you can run a search against Solr and find the latest timestamp in the index. SOLR-1499 makes it possible to query Solr from within the DataImportHandler.
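Here is a minimal sketch of the "latest timestamp in the index" lookup. It rests on a few assumptions. Solr runs at a hypothetical http://localhost:8983/solr. The index holds a single-valued date field, here hypothetically named sys_time_stamp, that is both indexed (so it can be sorted) and stored (so it can be returned). The quoted query does not currently select sys_time_stamp, so it would have to be added to the indexed fields. The sketch uses the standard /select handler: it sorts descending on the field, asks for one row, and reads the field out of the JSON response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def latest_timestamp_query_url(solr_base_url, field):
    """Build a /select URL that returns only the newest value of `field`.

    Sorting descending on the field and asking for a single row yields the
    maximum value; `fl` limits the returned document to that one field.
    """
    params = {
        "q": "*:*",               # match every document in the index
        "sort": field + " desc",  # newest timestamp first
        "rows": 1,                # only the top document is needed
        "fl": field,              # return just the timestamp field
        "wt": "json",             # ask for a JSON response
    }
    return solr_base_url.rstrip("/") + "/select?" + urlencode(params)


def parse_latest_timestamp(response_body, field):
    """Return the newest timestamp from a JSON /select response, or None."""
    docs = json.loads(response_body)["response"]["docs"]
    if not docs or field not in docs[0]:
        return None  # empty index, or the field is not stored
    return docs[0][field]


def fetch_latest_timestamp(solr_base_url, field):
    """Query a running Solr instance for the newest indexed timestamp."""
    url = latest_timestamp_query_url(solr_base_url, field)
    with urlopen(url) as resp:
        return parse_latest_timestamp(resp.read().decode("utf-8"), field)
```

The returned value could stand in for ${dataimporter.last_index_time} in the deltaQuery. Because it comes from the data itself rather than from the clock at the start or end of an import, it should not depend on how long the import ran. This holds as long as sys_time_stamp values only grow as rows are committed. Solr returns dates in UTC ISO 8601 form (for example 2011-01-21T02:26:00Z), so the value may need converting before MySQL's SUBDATE sees it. Wiring the lookup into DIH itself (for example through SOLR-1499) is not shown.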
On Fri, Jan 21, 2011 at 2:27 AM, btucker <btuc...@mintel.com> wrote:
>
> Hello
>
> We've just started using Solr to provide search functionality for our
> application, with the DataImportHandler performing a delta-import every 1
> fired by crontab. This works well, but it occasionally misses records
> that are added to the database while the delta-import is running.
>
> Our data-config.xml has the following queries in its root entity:
>
> query="SELECT id, date_published, date_created, publish_flag FROM Item
>        WHERE id > 0 AND record_type_id=0 ORDER BY id DESC"
> preImportDeleteQuery="SELECT item_id AS Id FROM gnpd_production.item_deletions"
> deletedPkQuery="SELECT item_id AS id FROM gnpd_production.item_deletions
>        WHERE deletion_date >= SUBDATE('${dataimporter.last_index_time}', INTERVAL 5 MINUTE)"
> deltaImportQuery="SELECT id, date_published, date_created, publish_flag FROM Item
>        WHERE id > 0 AND record_type_id=0 AND id=${dataimporter.delta.id}
>        ORDER BY id DESC"
> deltaQuery="SELECT id, date_published, date_created, publish_flag FROM Item
>        WHERE id > 0 AND record_type_id=0
>        AND sys_time_stamp >= SUBDATE('${dataimporter.last_index_time}', INTERVAL 1 MINUTE)
>        ORDER BY id DESC">
>
> I think the problem I'm having comes from the way Solr stores
> last_index_time in conf/dataimport.properties. The wiki states:
>
> "When delta-import command is executed, it reads the start time stored in
> conf/dataimport.properties. It uses that timestamp to run delta queries and
> after completion, updates the timestamp in conf/dataimport.properties."
>
> To me this indicates that any record with a timestamp between when the
> import starts and when it ends will be missed, because last_index_time is
> set to when the import completes.
>
> This doesn't seem right to me. I would have expected last_index_time to
> refer to when the import was last STARTED, so that there are no gaps in
> the time range covered.
>
> I changed the deltaQuery in our config to subtract INTERVAL 1 MINUTE from
> last_index_time with SUBDATE to alleviate this, but that only helps when
> the delta-import takes less than a minute.
>
> Any ideas how this can be overcome, other than increasing the INTERVAL
> to something larger?
>
> Regards
>
> Barry Tucker
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Delta-Import-occasionally-missing-records-tp2300877p2300877.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--
Lance Norskog
goks...@gmail.com
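The gap described in the quoted message can be made concrete with a toy simulation. This is a simplified model, not DIH itself. It assumes each deltaQuery sees exactly the rows inserted up to the moment the import starts, and that timestamps are exact. The stored watermark is either the import's start time or its completion time; the completion-time case is the reading of the wiki text given in the quoted message. The `overlap` parameter plays the role of SUBDATE(..., INTERVAL n), and the function name never_indexed is hypothetical.

```python
def never_indexed(row_times, imports, overlap, watermark_at_start):
    """Return the rows (by insert time) that no delta-import ever picks up.

    row_times: times, in seconds, at which rows are inserted.
    imports: (start, end) time pairs for successive delta-imports, in order.
    overlap: seconds subtracted from the stored watermark, like
        SUBDATE('${dataimporter.last_index_time}', INTERVAL n ...).
    watermark_at_start: True to store each import's start time as the next
        watermark; False to store its completion time.
    """
    watermark = float("-inf")  # no previous import: everything counts as new
    indexed = set()
    for start, end in imports:
        lower = watermark - overlap
        # Model assumption: the deltaQuery runs at `start`, so it only sees
        # rows that were already inserted by then.
        for t in row_times:
            if lower <= t <= start:
                indexed.add(t)
        watermark = start if watermark_at_start else end
    return sorted(set(row_times) - indexed)
```

Consider imports running from 0 to 90, 300 to 390 and 600 to 690 seconds, with rows inserted at 20, 200 and 350 seconds. A completion-time watermark with a 60-second overlap misses the row at 20, which was inserted 70 seconds before that import finished. Without the overlap, it misses both 20 and 350. A start-time watermark misses nothing, even with no overlap. In this model the overlap only rescues rows inserted during the final `overlap` seconds of an import.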