Hi,
I'd like to crawl pages of chat logs that change whenever someone sends a message in our chat rooms, which happens every couple of seconds. The HTML log pages are updated instantly by the Prosody Jabber server and thus always have current timestamps.

Nutch now seems to reject them because they are too new:
> -shouldFetch rejected
> 'http://conference.nr:5290/muc_log/',
> fetchTime=1314950217363, curTime=1312358255779

I have two questions:

1. Which timestamp format is that? They don't seem to be Unix timestamps, because
> $ php -r 'echo date("Y-m-d H:i:s", 1312358255779);'
> 43556-12-23 16:56:19
gives the wrong year :)

2. What can I do to keep those URLs from being rejected? I already tried setting
> db.fetch.schedule.adaptive.sync_delta
to false and
> db.fetch.schedule.adaptive.inc_rate
> db.fetch.schedule.adaptive.dec_rate
to 0, but that does not help.

-- 
Best regards
Christian Weiske
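
On the first question, a quick check suggests the two values are milliseconds since the Unix epoch rather than seconds, which is the usual convention in Java code such as Nutch. The sketch below is only an illustration under that assumption; the helper name from_millis is made up for this example and is not part of Nutch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_millis(ms: int) -> datetime:
    """Interpret ms as milliseconds since the Unix epoch (assumption), in UTC."""
    return EPOCH + timedelta(milliseconds=ms)


# Values copied from the "-shouldFetch rejected" log line.
fetch_time = 1314950217363
cur_time = 1312358255779

print(from_millis(cur_time))    # 2011-08-03 07:57:35.779000+00:00
print(from_millis(fetch_time))  # 2011-09-02 07:56:57.363000+00:00

# Gap between the scheduled fetch time and the current time.
print(timedelta(milliseconds=fetch_time - cur_time))  # 29 days, 23:59:21.584000
```

Read this way, curTime falls in August 2011 and fetchTime about 30 days later. That gap looks like a regular re-fetch interval (it would match a 30-day default for db.fetch.interval.default, if that is what is configured) rather than a rejection because the page is "too new", but that interpretation is a guess from the numbers alone.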

