Hi Florian,

What you are seeing is "dynamic crawling" behavior. The time between refetches of a document is based on that document's fetch history. The recrawl interval is only the initial time between fetches of a document; if the document does not change, the interval for that document increases according to a formula.

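To make the idea concrete, here is a rough sketch in Python of how such a schedule could behave. It is not the actual ManifoldCF formula: the function name `next_interval`, the doubling `growth_factor`, the reset to the initial interval when a document changes, and the optional `max_interval` cap are all assumptions made purely for illustration.

```python
from typing import Optional


def next_interval(
    current: float,
    initial: float,
    changed: bool,
    growth_factor: float = 2.0,            # assumed; the real growth rule may differ
    max_interval: Optional[float] = None,  # hypothetical cap, not an existing setting
) -> float:
    """Return the time to wait before the next fetch of one document."""
    if changed:
        # Assumption: a changed document falls back to the configured recrawl interval.
        return initial
    # An unchanged document is checked less and less often.
    grown = current * growth_factor
    if max_interval is not None:
        grown = min(grown, max_interval)
    return grown


# Three unchanged fetches in a row, starting from a 60-minute recrawl interval.
interval = 60.0
history = []
for _ in range(3):
    interval = next_interval(interval, initial=60.0, changed=False)
    history.append(interval)
print(history)  # [120.0, 240.0, 480.0]
```

With a cap such as `max_interval=300.0`, the same three fetches would give 120, 240, and 300 minutes, which is the kind of upper limit a feature request could ask for.
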
I would need to look at the code to be able to give you the precise formula, but if you need a limit on the amount of time between document fetch attempts, I suggest you create a ticket and I will look into adding that as a feature.

Thanks,
Karl

On Sat, Jan 4, 2014 at 7:56 AM, Florian Schmedding < [email protected]> wrote:

> Hello,
>
> the parameters reseed interval and recrawl interval of a continuous
> crawling job are not quite clear to me. The documentation tells that the
> reseed interval is the time after which the seeds are checked again, and
> the recrawl interval is the time after which a document is checked for
> changes.
>
> However, we observed that the recrawl interval for a document increases
> after each check. On the other hand, the reseed interval seems to be set
> up correctly in the database metadata about the seed documents. Yet the
> web server does not receive requests at each time the interval elapses but
> only after several intervals have elapsed.
>
> We are using a web connector. The web server does not tell the client to
> cache the documents. Any help would be appreciated.
>
> Best regards,
> Florian
