Hi Florian, The only way this can happen is if the proper job termination state sequence does not take place. When MCF checks to see if a job should be started, if it determines that the answer is "no" it updates the job record immediately with a new "last checked" value. But if it starts the job, it waits for the job completion to take place before updating the job's "last checked" time. When a job aborts, at first glance it looks like it also does the right thing, but clearly that's not true, and there must be a bug somewhere in how this condition is handled.
I'll create a ticket to research this. In the interim, I suggest you figure out why your job is aborting in the first place. Thanks, Karl On Tue, Feb 4, 2014 at 11:49 AM, Karl Wright <[email protected]> wrote: > Hi Florian, > > I do not expect errors to appear in the tomcat log. > > But this is interesting: > > Good: > > >>>>>> > DEBUG 2014-02-03 16:00:02,153 (Job start thread) - Checking if job > 1385573203052 needs to be started; it was last checked at 1391439592120, > and now it is 1391439602151 > DEBUG 2014-02-03 16:00:02,153 (Job start thread) - Time match FOUND > within interval 1391439592120 to 1391439602151 > ... > > DEBUG 2014-02-03 16:13:47,105 (Job start thread) - Checking if job > 1385573203052 needs to be started; it was last checked at 1391440412615, > and now it is 1391440427102 > DEBUG 2014-02-03 16:13:47,105 (Job start thread) - No time match found > within interval 1391440412615 to 1391440427102 > <<<<<< > "last checked" time for job is updated. > > Bad: > > >>>>>> > DEBUG 2014-02-03 18:00:04,109 (Job start thread) - Checking if job > 1385573203052 needs to be started; it was last checked at 1391446794075, > and now it is 1391446804106 > DEBUG 2014-02-03 18:00:04,109 (Job start thread) - Time match FOUND > within interval 1391446794075 to 1391446804106 > ... > > DEBUG 2014-02-03 18:14:07,736 (Job start thread) - Checking if job > 1385573203052 needs to be started; it was last checked at 1391446794075, > and now it is 1391447647733 > DEBUG 2014-02-03 18:14:07,736 (Job start thread) - Time match FOUND > within interval 1391446794075 to 1391447647733 > <<<<<< > Note that the "last checked" time is NOT updated. > > I don't understand why, in one case, the "last checked" time is being > updated for the job, and is not in another case. I will look to see if > there is any way in the code that this can happen. > > Karl > > > > On Tue, Feb 4, 2014 at 10:45 AM, Florian Schmedding < > [email protected]> wrote: > >> Hi Karl, >> >> there are no errors in the Tomcat logs. Currently, the Manifold log >> contains only the job log messages (<property >> name="org.apache.manifoldcf.jobs" value="ALL"/>). I include two log >> snippets, one from a normal run, and one where the job got repeated two >> times. I noticed the thread sequence "Finisher - Job reset - Job >> notification" when the job finally terminates, and the thread sequence >> "Finisher - Job notification" when the job gets restarted again instead of >> terminating. >> >> >> DEBUG 2014-02-03 15:59:52,130 (Job start thread) - Checking if job >> 1385573203052 needs to be started; it was last checked at 1391439582108, >> and now it is 1391439592119 >> DEBUG 2014-02-03 15:59:52,131 (Job start thread) - No time match found >> within interval 1391439582108 to 1391439592119 >> DEBUG 2014-02-03 16:00:02,153 (Job start thread) - Checking if job >> 1385573203052 needs to be started; it was last checked at 1391439592120, >> and now it is 1391439602151 >> DEBUG 2014-02-03 16:00:02,153 (Job start thread) - Time match FOUND >> within interval 1391439592120 to 1391439602151 >> DEBUG 2014-02-03 16:00:02,153 (Job start thread) - Job '1385573203052' is >> within run window at 1391439602151 ms. (which starts at 1391439600000 ms.) >> DEBUG 2014-02-03 16:00:02,288 (Job start thread) - Signalled for job start >> for job 1385573203052 >> DEBUG 2014-02-03 16:00:11,319 (Startup thread) - Marked job 1385573203052 >> for startup >> DEBUG 2014-02-03 16:00:12,719 (Startup thread) - Job 1385573203052 is now >> started >> DEBUG 2014-02-03 16:13:30,234 (Finisher thread) - Marked job 1385573203052 >> for shutdown >> DEBUG 2014-02-03 16:13:32,995 (Job reset thread) - Job 1385573203052 now >> completed >> DEBUG 2014-02-03 16:13:37,541 (Job notification thread) - Found job >> 1385573203052 in need of notification >> DEBUG 2014-02-03 16:13:47,105 (Job start thread) - Checking if job >> 1385573203052 needs to be started; it was last checked at 1391440412615, >> and now it is 1391440427102 >> DEBUG 2014-02-03 16:13:47,105 (Job start thread) - No time match found >> within interval 1391440412615 to 1391440427102 >> >> >> DEBUG 2014-02-03 17:59:54,078 (Job start thread) - Checking if job >> 1385573203052 needs to be started; it was last checked at 1391446784053, >> and now it is 1391446794074 >> DEBUG 2014-02-03 17:59:54,078 (Job start thread) - No time match found >> within interval 1391446784053 to 1391446794074 >> DEBUG 2014-02-03 18:00:04,109 (Job start thread) - Checking if job >> 1385573203052 needs to be started; it was last checked at 1391446794075, >> and now it is 1391446804106 >> DEBUG 2014-02-03 18:00:04,109 (Job start thread) - Time match FOUND >> within interval 1391446794075 to 1391446804106 >> DEBUG 2014-02-03 18:00:04,110 (Job start thread) - Job '1385573203052' is >> within run window at 1391446804106 ms. (which starts at 1391446800000 ms.) >> DEBUG 2014-02-03 18:00:04,178 (Job start thread) - Signalled for job start >> for job 1385573203052 >> DEBUG 2014-02-03 18:00:11,710 (Startup thread) - Marked job 1385573203052 >> for startup >> DEBUG 2014-02-03 18:00:13,408 (Startup thread) - Job 1385573203052 is now >> started >> DEBUG 2014-02-03 18:14:04,286 (Finisher thread) - Marked job 1385573203052 >> for shutdown >> DEBUG 2014-02-03 18:14:06,777 (Job notification thread) - Found job >> 1385573203052 in need of notification >> DEBUG 2014-02-03 18:14:07,736 (Job start thread) - Checking if job >> 1385573203052 needs to be started; it was last checked at 1391446794075, >> and now it is 1391447647733 >> DEBUG 2014-02-03 18:14:07,736 (Job start thread) - Time match FOUND >> within interval 1391446794075 to 1391447647733 >> DEBUG 2014-02-03 18:14:07,736 (Job start thread) - Job '1385573203052' is >> within run window at 1391447647733 ms. (which starts at 1391446800000 ms.) >> DEBUG 2014-02-03 18:14:17,744 (Job start thread) - Checking if job >> 1385573203052 needs to be started; it was last checked at 1391446794075, >> and now it is 1391447657740 >> DEBUG 2014-02-03 18:14:17,744 (Job start thread) - Time match FOUND >> within interval 1391446794075 to 1391447657740 >> DEBUG 2014-02-03 18:14:17,744 (Job start thread) - Job '1385573203052' is >> within run window at 1391447657740 ms. (which starts at 1391446800000 ms.) >> DEBUG 2014-02-03 18:14:17,899 (Job start thread) - Signalled for job start >> for job 1385573203052 >> DEBUG 2014-02-03 18:14:26,787 (Startup thread) - Marked job 1385573203052 >> for startup >> DEBUG 2014-02-03 18:14:28,636 (Startup thread) - Job 1385573203052 is now >> started >> DEBUG 2014-02-03 18:27:45,387 (Finisher thread) - Marked job 1385573203052 >> for shutdown >> DEBUG 2014-02-03 18:27:52,737 (Job notification thread) - Found job >> 1385573203052 in need of notification >> DEBUG 2014-02-03 18:27:59,356 (Job start thread) - Checking if job >> 1385573203052 needs to be started; it was last checked at 1391446794075, >> and now it is 1391448479353 >> DEBUG 2014-02-03 18:27:59,358 (Job start thread) - Time match FOUND >> within interval 1391446794075 to 1391448479353 >> DEBUG 2014-02-03 18:27:59,358 (Job start thread) - Job '1385573203052' is >> within run window at 1391448479353 ms. (which starts at 1391446800000 ms.) >> DEBUG 2014-02-03 18:27:59,430 (Job start thread) - Signalled for job start >> for job 1385573203052 >> DEBUG 2014-02-03 18:28:09,309 (Startup thread) - Marked job 1385573203052 >> for startup >> DEBUG 2014-02-03 18:28:10,727 (Startup thread) - Job 1385573203052 is now >> started >> DEBUG 2014-02-03 18:41:18,202 (Finisher thread) - Marked job 1385573203052 >> for shutdown >> DEBUG 2014-02-03 18:41:23,636 (Job reset thread) - Job 1385573203052 now >> completed >> DEBUG 2014-02-03 18:41:25,368 (Job notification thread) - Found job >> 1385573203052 in need of notification >> DEBUG 2014-02-03 18:41:32,403 (Job start thread) - Checking if job >> 1385573203052 needs to be started; it was last checked at 1391449283114, >> and now it is 1391449292400 >> DEBUG 2014-02-03 18:41:32,403 (Job start thread) - No time match found >> within interval 1391449283114 to 1391449292400 >> >> >> Do you need another log output? >> >> Best, >> Florian >> >> > Also, what does the log have to say? If there is an error aborting the >> > job, there should be some record of it in the manifoldcf.log. >> > >> > Thanks, >> > Karl >> > >> > >> > On Tue, Feb 4, 2014 at 6:16 AM, Karl Wright <[email protected]> wrote: >> > >> >> Hi Florian, >> >> >> >> Please run the job manually, when outside the scheduling window or with >> >> the scheduling off. What is the reason for the job abort? >> >> >> >> Karl >> >> >> >> >> >> >> >> On Tue, Feb 4, 2014 at 3:30 AM, Florian Schmedding < >> >> [email protected]> wrote: >> >> >> >>> Hi Karl, >> >>> >> >>> yes, I've coincidentally seen "Aborted" in the end time column when I >> >>> refreshed the job status just after the number of active documents was >> >>> zero. At the next refresh the job was starting up. After looking in >> the >> >>> history I found out that it even started a third time. You can see the >> >>> history of a single day below (job continue, end, start, stop, unwait, >> >>> wait). The start method is "Start at beginning of schedule window". >> Job >> >>> invocation is "complete". Hop count mode is "Delete unreachable >> >>> documents". >> >>> >> >>> 02.03.2014 18:41 job end >> >>> 02.03.2014 18:28 job start >> >>> 02.03.2014 18:14 job start >> >>> 02.03.2014 18:00 job start >> >>> 02.03.2014 17:49 job end >> >>> 02.03.2014 17:27 job end >> >>> 02.03.2014 17:13 job start >> >>> 02.03.2014 17:00 job start >> >>> 02.03.2014 16:13 job end >> >>> 02.03.2014 16:00 job start >> >>> 02.03.2014 15:41 job end >> >>> 02.03.2014 15:27 job start >> >>> 02.03.2014 15:14 job start >> >>> 02.03.2014 15:00 job start >> >>> 02.03.2014 14:13 job end >> >>> 02.03.2014 14:00 job start >> >>> 02.03.2014 13:13 job end >> >>> 02.03.2014 13:00 job start >> >>> 02.03.2014 12:27 job end >> >>> 02.03.2014 12:14 job start >> >>> 02.03.2014 12:00 job start >> >>> 02.03.2014 11:13 job end >> >>> 02.03.2014 11:00 job start >> >>> 02.03.2014 10:13 job end >> >>> 02.03.2014 10:00 job start >> >>> 02.03.2014 09:29 job end >> >>> 02.03.2014 09:14 job start >> >>> 02.03.2014 09:00 job start >> >>> >> >>> Best, >> >>> Florian >> >>> >> >>> >> >>> > Hi Florian, >> >>> > >> >>> > Jobs don't just abort randomly. Are you sure that the job aborted? >> >>> Or >> >>> > did >> >>> > it just restart? >> >>> > >> >>> > As for "is this normal", it depends on how you have created your >> job. >> >>> If >> >>> > you selected the "Start within schedule window" selection, MCF will >> >>> > restart >> >>> > the job whenever it finishes and run it until the end of the >> >>> scheduling >> >>> > window. >> >>> > >> >>> > Karl >> >>> > >> >>> > >> >>> > >> >>> > On Mon, Feb 3, 2014 at 12:24 PM, Florian Schmedding < >> >>> > [email protected]> wrote: >> >>> > >> >>> >> Hi Karl, >> >>> >> >> >>> >> I've just observed that the job was started according to its >> >>> schedule >> >>> >> and >> >>> >> crawled all documents correctly (I've chosen to re-ingest all >> >>> documents >> >>> >> before the run). However, after finishing the last document (zero >> >>> active >> >>> >> documents) it was somehow aborted and restarted immediately. Is >> this >> >>> an >> >>> >> expected behavior? >> >>> >> >> >>> >> Best, >> >>> >> Florian >> >>> >> >> >>> >> >> >>> >> > Hi Florian, >> >>> >> > >> >>> >> > Based on this schedule, your crawls will be able to start >> whenever >> >>> the >> >>> >> > hour >> >>> >> > turns. So they can start every hour on the hour. If the last >> >>> crawl >> >>> >> > crossed an hour boundary, the next crawl will start immediately, >> I >> >>> >> > believe. >> >>> >> > >> >>> >> > Karl >> >>> >> > >> >>> >> > >> >>> >> > >> >>> >> > On Wed, Jan 15, 2014 at 1:04 PM, Florian Schmedding < >> >>> >> > [email protected]> wrote: >> >>> >> > >> >>> >> >> Hi Karl, >> >>> >> >> >> >>> >> >> these are the values: >> >>> >> >> Priority: 5 Start method: Start at beginning of >> >>> >> schedule >> >>> >> >> window >> >>> >> >> Schedule type: Scan every document once Minimum recrawl >> >>> >> >> interval: >> >>> >> >> Not >> >>> >> >> applicable >> >>> >> >> Expiration interval: Not applicable Reseed interval: >> >>> Not >> >>> >> >> applicable >> >>> >> >> Scheduled time: Any day of week at 12 am 1 am 2 am 3 am >> 4 >> >>> am >> >>> >> 5 >> >>> >> >> am >> >>> >> >> 6 am 7 >> >>> >> >> am 8 am 9 am 10 am 11 am 12 pm 1 pm 2 pm 3 pm 4 pm 5 pm 6 pm 7 >> pm >> >>> 8 >> >>> >> pm 9 >> >>> >> >> pm 10 pm 11 pm >> >>> >> >> Maximum run time: No limit Job invocation: >> >>> >> Complete >> >>> >> >> >> >>> >> >> Maybe it is because I've changed the job from continuous >> crawling >> >>> to >> >>> >> >> this >> >>> >> >> schedule. I started it a few times manually, too. I couldn't >> >>> notice >> >>> >> >> anything strange in the job setup or in the respective entries >> in >> >>> the >> >>> >> >> database. >> >>> >> >> >> >>> >> >> Regards, >> >>> >> >> Florian >> >>> >> >> >> >>> >> >> > Hi Florian, >> >>> >> >> > >> >>> >> >> > I was unable to reproduce the behavior you described. >> >>> >> >> > >> >>> >> >> > Could you view your job, and post a screen shot of that page? >> >>> I >> >>> >> want >> >>> >> >> to >> >>> >> >> > see what your schedule record(s) look like. >> >>> >> >> > >> >>> >> >> > Thanks, >> >>> >> >> > Karl >> >>> >> >> > >> >>> >> >> > >> >>> >> >> > >> >>> >> >> > On Tue, Jan 14, 2014 at 6:09 AM, Karl Wright >> >>> <[email protected]> >> >>> >> >> wrote: >> >>> >> >> > >> >>> >> >> >> Hi Florian, >> >>> >> >> >> >> >>> >> >> >> I've never noted this behavior before. I'll see if I can >> >>> >> reproduce >> >>> >> >> it >> >>> >> >> >> here. >> >>> >> >> >> >> >>> >> >> >> Karl >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> On Tue, Jan 14, 2014 at 5:36 AM, Florian Schmedding < >> >>> >> >> >> [email protected]> wrote: >> >>> >> >> >> >> >>> >> >> >>> Hi Karl, >> >>> >> >> >>> >> >>> >> >> >>> the scheduled job seems to work as expecetd. However, it >> runs >> >>> two >> >>> >> >> >>> times: >> >>> >> >> >>> It starts at the beginning of the scheduled time, finishes, >> >>> and >> >>> >> >> >>> immediately starts again. After finishing the second run it >> >>> waits >> >>> >> >> for >> >>> >> >> >>> the >> >>> >> >> >>> next scheduled time. Why does it run two times? The start >> >>> method >> >>> >> is >> >>> >> >> >>> "Start >> >>> >> >> >>> at beginning of schedule window". >> >>> >> >> >>> >> >>> >> >> >>> Yes, you're right about the checking guarantee. Currently, >> >>> our >> >>> >> >> interval >> >>> >> >> >>> is >> >>> >> >> >>> long enough for a complete crawler run. >> >>> >> >> >>> >> >>> >> >> >>> Best, >> >>> >> >> >>> Florian >> >>> >> >> >>> >> >>> >> >> >>> >> >>> >> >> >>> > Hi Florian, >> >>> >> >> >>> > >> >>> >> >> >>> > It is impossible to *guarantee* that a document will be >> >>> >> checked, >> >>> >> >> >>> because >> >>> >> >> >>> > if >> >>> >> >> >>> > load on the crawler is high enough, it will fall behind. >> >>> But >> >>> I >> >>> >> >> will >> >>> >> >> >>> look >> >>> >> >> >>> > into adding the feature you request. >> >>> >> >> >>> > >> >>> >> >> >>> > Karl >> >>> >> >> >>> > >> >>> >> >> >>> > >> >>> >> >> >>> > On Sun, Jan 5, 2014 at 9:08 AM, Florian Schmedding < >> >>> >> >> >>> > [email protected]> wrote: >> >>> >> >> >>> > >> >>> >> >> >>> >> Hi Karl, >> >>> >> >> >>> >> >> >>> >> >> >>> >> yes, in our case it is necessary to make sure that new >> >>> >> documents >> >>> >> >> are >> >>> >> >> >>> >> discovered and indexed within a certain interval. I have >> >>> >> created >> >>> >> >> a >> >>> >> >> >>> >> feature >> >>> >> >> >>> >> request on that. In the meantime we will try to use a >> >>> >> scheduled >> >>> >> >> job >> >>> >> >> >>> >> instead. >> >>> >> >> >>> >> >> >>> >> >> >>> >> Thanks for your help, >> >>> >> >> >>> >> Florian >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> > Hi Florian, >> >>> >> >> >>> >> > >> >>> >> >> >>> >> > What you are seeing is "dynamic crawling" behavior. >> The >> >>> >> time >> >>> >> >> >>> between >> >>> >> >> >>> >> > refetches of a document is based on the history of >> >>> fetches >> >>> >> of >> >>> >> >> that >> >>> >> >> >>> >> > document. The recrawl interval is the initial time >> >>> between >> >>> >> >> >>> document >> >>> >> >> >>> >> > fetches, but if a document does not change, the >> interval >> >>> for >> >>> >> >> the >> >>> >> >> >>> >> document >> >>> >> >> >>> >> > increases according to a formula. >> >>> >> >> >>> >> > >> >>> >> >> >>> >> > I would need to look at the code to be able to give you >> >>> the >> >>> >> >> >>> precise >> >>> >> >> >>> >> > formula, but if you need a limit on the amount of time >> >>> >> between >> >>> >> >> >>> >> document >> >>> >> >> >>> >> > fetch attempts, I suggest you create a ticket and I >> will >> >>> >> look >> >>> >> >> into >> >>> >> >> >>> >> adding >> >>> >> >> >>> >> > that as a feature. >> >>> >> >> >>> >> > >> >>> >> >> >>> >> > Thanks, >> >>> >> >> >>> >> > Karl >> >>> >> >> >>> >> > >> >>> >> >> >>> >> > >> >>> >> >> >>> >> > >> >>> >> >> >>> >> > On Sat, Jan 4, 2014 at 7:56 AM, Florian Schmedding < >> >>> >> >> >>> >> > [email protected]> wrote: >> >>> >> >> >>> >> > >> >>> >> >> >>> >> >> Hello, >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> the parameters reseed interval and recrawl interval of >> >>> a >> >>> >> >> >>> continuous >> >>> >> >> >>> >> >> crawling job are not quite clear to me. The >> >>> documentation >> >>> >> >> tells >> >>> >> >> >>> that >> >>> >> >> >>> >> the >> >>> >> >> >>> >> >> reseed interval is the time after which the seeds are >> >>> >> checked >> >>> >> >> >>> again, >> >>> >> >> >>> >> and >> >>> >> >> >>> >> >> the recrawl interval is the time after which a >> document >> >>> is >> >>> >> >> >>> checked >> >>> >> >> >>> >> for >> >>> >> >> >>> >> >> changes. >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> However, we observed that the recrawl interval for a >> >>> >> document >> >>> >> >> >>> >> increases >> >>> >> >> >>> >> >> after each check. On the other hand, the reseed >> >>> interval >> >>> >> seems >> >>> >> >> to >> >>> >> >> >>> be >> >>> >> >> >>> >> set >> >>> >> >> >>> >> >> up correctly in the database metadata about the seed >> >>> >> >> documents. >> >>> >> >> >>> Yet >> >>> >> >> >>> >> the >> >>> >> >> >>> >> >> web server does not receive requests at each time the >> >>> >> interval >> >>> >> >> >>> >> elapses >> >>> >> >> >>> >> >> but >> >>> >> >> >>> >> >> only after several intervals have elapsed. >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> We are using a web connector. The web server does not >> >>> tell >> >>> >> the >> >>> >> >> >>> client >> >>> >> >> >>> >> to >> >>> >> >> >>> >> >> cache the documents. Any help would be appreciated. >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> Best regards, >> >>> >> >> >>> >> >> Florian >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> >> >> >>> >> >> >>> >> > >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> > >> >>> >> >> >>> >> >>> >> >> >>> >> >>> >> >> >>> >> >>> >> >> >> >> >>> >> >> > >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> > >> >>> >> >> >>> >> >> >>> >> >> >>> > >> >>> >> >>> >> >>> >> >> >> > >> >> >> >
