Hi Florian,

I was unable to reproduce the behavior you described.
Could you view your job and post a screenshot of that page? I want to see what your schedule record(s) look like.

Thanks,
Karl

On Tue, Jan 14, 2014 at 6:09 AM, Karl Wright <[email protected]> wrote:

> Hi Florian,
>
> I've never noted this behavior before. I'll see if I can reproduce it
> here.
>
> Karl
>
> On Tue, Jan 14, 2014 at 5:36 AM, Florian Schmedding <[email protected]> wrote:
>
>> Hi Karl,
>>
>> the scheduled job seems to work as expected. However, it runs twice:
>> it starts at the beginning of the scheduled time, finishes, and
>> immediately starts again. After finishing the second run it waits for
>> the next scheduled time. Why does it run twice? The start method is
>> "Start at beginning of schedule window".
>>
>> Yes, you're right about the checking guarantee. Currently, our interval
>> is long enough for a complete crawler run.
>>
>> Best,
>> Florian
>>
>>> Hi Florian,
>>>
>>> It is impossible to *guarantee* that a document will be checked,
>>> because if load on the crawler is high enough, it will fall behind.
>>> But I will look into adding the feature you request.
>>>
>>> Karl
>>>
>>> On Sun, Jan 5, 2014 at 9:08 AM, Florian Schmedding <[email protected]> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> yes, in our case it is necessary to make sure that new documents are
>>>> discovered and indexed within a certain interval. I have created a
>>>> feature request on that. In the meantime we will try to use a
>>>> scheduled job instead.
>>>>
>>>> Thanks for your help,
>>>> Florian
>>>>
>>>>> Hi Florian,
>>>>>
>>>>> What you are seeing is "dynamic crawling" behavior. The time between
>>>>> refetches of a document is based on the history of fetches of that
>>>>> document. The recrawl interval is the initial time between document
>>>>> fetches, but if a document does not change, the interval for that
>>>>> document increases according to a formula.
>>>>>
>>>>> I would need to look at the code to be able to give you the precise
>>>>> formula, but if you need a limit on the amount of time between
>>>>> document fetch attempts, I suggest you create a ticket and I will
>>>>> look into adding that as a feature.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>> On Sat, Jan 4, 2014 at 7:56 AM, Florian Schmedding <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> the parameters reseed interval and recrawl interval of a continuous
>>>>>> crawling job are not quite clear to me. The documentation says that
>>>>>> the reseed interval is the time after which the seeds are checked
>>>>>> again, and the recrawl interval is the time after which a document
>>>>>> is checked for changes.
>>>>>>
>>>>>> However, we observed that the recrawl interval for a document
>>>>>> increases after each check. On the other hand, the reseed interval
>>>>>> seems to be set up correctly in the database metadata about the
>>>>>> seed documents. Yet the web server does not receive a request each
>>>>>> time the interval elapses, but only after several intervals have
>>>>>> elapsed.
>>>>>>
>>>>>> We are using a web connector. The web server does not tell the
>>>>>> client to cache the documents. Any help would be appreciated.
>>>>>>
>>>>>> Best regards,
>>>>>> Florian
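[Editor's note: the "dynamic crawling" behavior Karl describes above can be sketched as follows. This is only an illustrative model of an interval that stretches while a document stays unchanged; the growth factor, the cap, and all names here are assumptions, not ManifoldCF's actual formula, which Karl says he would need to check in the code.]

```python
from dataclasses import dataclass

@dataclass
class DocumentSchedule:
    """Hypothetical per-document recrawl state (not ManifoldCF code)."""
    interval_minutes: float                     # current time between fetch attempts
    growth_factor: float = 2.0                  # assumed multiplier when unchanged
    max_interval_minutes: float = 7 * 24 * 60   # assumed upper bound (one week)

    def record_fetch(self, changed: bool, base_interval_minutes: float) -> None:
        """Update the interval after a fetch attempt.

        If the document changed, reset to the configured recrawl interval;
        otherwise stretch the interval (here by a fixed multiplier, capped).
        """
        if changed:
            self.interval_minutes = base_interval_minutes
        else:
            self.interval_minutes = min(
                self.interval_minutes * self.growth_factor,
                self.max_interval_minutes,
            )

# A document that never changes drifts out to longer and longer intervals,
# which matches the observation that the web server is hit less and less often.
doc = DocumentSchedule(interval_minutes=60.0)
for _ in range(4):
    doc.record_fetch(changed=False, base_interval_minutes=60.0)
print(doc.interval_minutes)  # 60 -> 120 -> 240 -> 480 -> 960
```

Under this model, the feature Florian requested would amount to lowering `max_interval_minutes` so that fetch attempts can never be spaced further apart than a configured maximum.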
