[web2py] Re: web2py book and softcron

mdipierro Wed, 14 Apr 2010 08:39:35 -0700

I would just use the current cron folder if there are no objections.
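
A minimal sketch of how several crontab files could be picked up from the existing applications/<myapp>/cron folder in alphabetical order, as requested in the quoted thread below. The function names and the "crontab or *.crontab" naming rule are assumptions for illustration, not web2py's actual implementation.

```python
import os


def list_crontabs(cron_dir):
    """Return crontab file paths found in cron_dir, sorted alphabetically.

    Assumption: a file counts as a crontab if it is named 'crontab'
    or ends with '.crontab' (e.g. 'plugin_hi.crontab').
    """
    if not os.path.isdir(cron_dir):
        return []
    names = [name for name in os.listdir(cron_dir)
             if name == 'crontab' or name.endswith('.crontab')]
    return [os.path.join(cron_dir, name) for name in sorted(names)]


def read_cron_lines(cron_dir):
    """Concatenate the non-empty, non-comment lines of all crontab files."""
    lines = []
    for path in list_crontabs(cron_dir):
        with open(path) as crontab_file:
            for line in crontab_file:
                line = line.strip()
                if line and not line.startswith('#'):
                    lines.append(line)
    return lines
```

Keeping the plain 'crontab' name valid means existing applications would continue to work unchanged.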


On Apr 14, 3:16 am, AchipA <attila.cs...@gmail.com> wrote:
> Yeah, fairly easy. One question though about the placement of the
> scripts. Most *nixes do this by placing the split files in a separate
> cron.d directory... Should we commandeer the cron directory for this,
> or rename it to cron.d (while keeping backward compatibility), put a
> cron.d inside cron or... ?
>
> On Apr 13, 8:18 pm, mdipierro <mdipie...@cs.depaul.edu> wrote:
>
> > This is an easier one. @Achipa, want to do this?
>
> > On Apr 13, 10:49 am, Thadeus Burgess <thade...@thadeusb.com> wrote:
>
> > > If we are working on cron can I inject a feature request?
>
> > > Support for multiple crontab files in applications/<myapp>/cron
>
> > > crontab, or plugin_hi.crontab can be read. Alpha order ?
>
> > > --
> > > Thadeus
>
> > > On Tue, Apr 13, 2010 at 10:47 AM, AchipA <attila.cs...@gmail.com> wrote:
> > > > Technically, we don't need to nor are we required to skip *exactly*
> > > > one minute. The real question is not 'has 60 seconds passed', but
> > > > 'have we checked cron in this minute'. I would thus check not for a 60
> > > > sec offset, but rather if the minute of cron start matches.
>
> > > > Also in line 51:
>
> > > > s.enter(60 - now % 60, 1, self.launch, ())
>
> > > > > it would probably be better to use int(now) to avoid rounding/timing
> > > > > errors in the sleep/time functions potentially resulting in a 59.99...
>
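
A minimal sketch of the "have we checked cron in this minute" test and the int(now) rescheduling delay described above. The helper names are hypothetical and this is not the newcron code itself.

```python
import time


def minute_of(timestamp):
    """Index of the wall-clock minute that contains timestamp."""
    return int(timestamp) // 60


def should_run(last_start, now=None):
    """True if cron has not yet been started in the current minute.

    last_start is the time of the previous cron start, or None if
    cron has never run. No 59.99-style tolerance is needed because
    whole minutes are compared instead of elapsed seconds.
    """
    if now is None:
        now = time.time()
    return last_start is None or minute_of(now) != minute_of(last_start)


def seconds_to_next_minute(now=None):
    """Whole-second delay that always lands inside the next minute."""
    if now is None:
        now = time.time()
    return 60 - int(now) % 60
```

With int(now) the delay is always a whole number between 1 and 60, so a wake-up a few milliseconds early cannot fall back into the minute that was already handled.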
> > > > On Apr 13, 3:18 pm, mdipierro <mdipie...@cs.depaul.edu> wrote:
> > > >> I do not like very much the idea of setting ctime/mtime but I do not
> > > >> like the current mechanism either. It is not just the pickle overhead.
> > > >> There seem to be precision issues that make it difficult to skip
> > > >> exactly one minute. For example right now newcron contains:
>
> > > >> if startup or self.now - start > 59.99
>
> > > >> 59.99 should be 60 but that causes cron to skip some calls. 59.99 is
> > > > >> OK in most cases but it may still cause some false positives.
>
> > > > >> I do not have a simple solution to this problem. It seems to me
> > > > >> we need to store more info, not less, even start/stop times may not be
> > > > >> sufficient information. We may also need to id each scheduled call to
> > > >> cron and detect not only whether a cron task is running but which of
> > > >> past scheduled tasks are running.
>
> > > > >> Bottom line: the part of cron/newcron that runs every 1 minute needs to
> > > >> be redesigned completely in my view.
>
> > > >> Massimo
>
> > > >> On Apr 13, 6:23 am, AchipA <attila.cs...@gmail.com> wrote:
>
> > > >> > Hey, I don't have a problem with that, just saying the lock-and-read
> > > >> > mechanism can increase the overhead/latency significantly in certain
> > > >> > setups. If a two-file approach is a problem, would you consider
> > > >> > avoiding the cPickle read if I can find a way to do it via setting
> > > > >> > ctime/mtime (we set an mtime older than ctime, and when we finish, we
> > > >> > set now() as mtime) ?
>
> > > > >> > Alternatively, if it turns out that can't be done on all platforms
> > > > >> > and we only need to check when it started and whether it is running,
> > > > >> > we could simplify running checks by file size. When the cron starts,
> > > > >> > we create the file with zero size, and when it finishes we just write
> > > > >> > a space to it. That way we can check whether it's running or not just
> > > > >> > by looking at the file size (a lot cheaper than reading/pickling).
>
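
A minimal sketch of the file-size marker idea described above: an empty file means cron is running, a one-byte file means it has finished. The function names and the marker path are hypothetical, and behaviour on network shares is not verified here.

```python
import os


def mark_started(path):
    """Create (or truncate) the marker file so that its size is zero."""
    with open(path, 'w'):
        pass


def mark_finished(path):
    """Write a single space so that the marker size becomes non-zero."""
    with open(path, 'w') as marker:
        marker.write(' ')


def is_running(path):
    """Check the marker with one stat() call, no locking or unpickling."""
    try:
        return os.stat(path).st_size == 0
    except OSError:
        # Assumption: a missing marker means cron has never started.
        return False
```

The same stat() result also carries st_mtime, so the start time could be read without opening the file.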
> > > > >> > The bottom line is that I would like to avoid having to do locking
> > > > >> > for the read-checks (we write only once a minute, but read quite a bit
> > > > >> > more as we scale upwards - like on a shared volume or on a busy site
> > > > >> > with softcron).
>
> > > >> > On Apr 12, 11:27 pm, mdipierro <mdipie...@cs.depaul.edu> wrote:
>
> > > >> > > On Apr 12, 1:19 pm, AchipA <attila.cs...@gmail.com> wrote:
>
> > > >> > > > Why do we need the time range ? If the tasks are overlapping it's
> > > >> > > > their responsibility to handle that (I know this is arguable, but
> > > >> > > > that's how 'standard' cron works).
>
> > > > >> > > This is also as it works in newcron. The problem is that if for any
> > > > >> > > reason, the main process that loops and spawns the tasks gets stuck,
> > > > >> > > it may give rise to a proliferation of processes that may crash the
> > > > >> > > OS. The current mechanism is similar to the one you originally
> > > > >> > > implemented but you used an additional file to determine if the cron
> > > > >> > > was completed. I use the completion date.
>
> > > >> > > > Also, we can easily store two
> > > >> > > > timestamps (slightly hackish, but mtime and ctime can be set
> > > >> > > > separately), would have to check whether that is supported on all
> > > > >> > > > platforms. Of course there are many other ways of reading data
> > > > >> > > > without opening files, I'm just pondering about alternatives as the
> > > > >> > > > current locking mechanism causes some problems on my shared-volume
> > > > >> > > > based multi-server setup (that's why I used 'move' originally as
> > > > >> > > > it's atomic and works well with network shares).
>
> > > >> > > > On Apr 12, 5:12 pm, mdipierro <mdipie...@cs.depaul.edu> wrote:
>
> > > > >> > > > > Because the OS timestamp can only tell you when a task has
> > > > >> > > > > started (or stopped, depending on when it was created) it does
> > > > >> > > > > not contain enough information to give you a time range (start
> > > > >> > > > > and stop). Cron needs to know when the previous crondance started
> > > > >> > > > > and whether it was completed or not. The original implementation
> > > > >> > > > > was doing the check using locks and that resulted in a large
> > > > >> > > > > number of try... except... The current implementation removes
> > > > >> > > > > most of the try... except... (people complained about that) and
> > > > >> > > > > just stores start_time, stop_time explicitly in a pickle.
>
> > > >> > > > > On Apr 12, 8:00 am, AchipA <attila.cs...@gmail.com> wrote:
>
> > > > >> > > > > > To correct myself, it seems the cron in web2py no longer uses
> > > > >> > > > > > the filesystem timestamps, but cPickles timestamps from/to the
> > > > >> > > > > > lock file. I'm not sure why Massimo changed it, but this *is* a
> > > > >> > > > > > bigger overhead than it was previously (as it needs to do file
> > > > >> > > > > > locking and cPickle.load() on every single request - as opposed
> > > > >> > > > > > to a simple cached non-locking filesystem call).
>
> > > >> > > > > > On Apr 1, 8:20 pm, AchipA <attila.cs...@gmail.com> wrote:
>
> > > > >> > > > > > > Exactly, hardcron checks once a minute, softcron checks on
> > > > >> > > > > > > each page load. The 'check' is calling a function or two and
> > > > >> > > > > > > comparing a file's timestamp, so not *that* much more
> > > > >> > > > > > > expensive.
>
> > > > >> > > > > > > On Apr 1, 7:51 pm, Jonathan Lundell <jlund...@pobox.com> wrote:
>
> > > >> > > > > > > > On Apr 1, 2010, at 10:37 AM, AchipA wrote:
>
> > > > >> > > > > > > > > There is some overhead, but efficiency is a disputable term -
> > > > >> > > > > > > > > there is certainly more overhead than hardcron, but IMO not in
> > > > >> > > > > > > > > a way that would affect overall performance unless you're
> > > > >> > > > > > > > > running it on a site that has hundreds of thousands of hits
> > > > >> > > > > > > > > per day...
>
> > > > >> > > > > > > > Perhaps we could change (or eliminate) the wording. How about
> > > > >> > > > > > > > simply 'Using softcron'?
>
> > > > >> > > > > > > > I'm curious: what is the extra overhead of soft vs hardcron?
> > > > >> > > > > > > > Just that it does a test on each page access? I'm guessing
> > > > >> > > > > > > > that's pretty cheap.
>
> > > > >> > > > > > > > > On Apr 1, 5:40 pm, Jonathan Lundell <jlund...@pobox.com> wrote:
> > > > >> > > > > > > > >> Section 4.17 (cron) mentions hard vs softcron defaults, but
> > > > >> > > > > > > > >> doesn't say how to override them.
>
> > > > >> > > > > > > > >> Section 4.1 (cli) doesn't list --softcron
>
> > > > >> > > > > > > > >> The startup message for softcron says: 'Using softcron (but
> > > > >> > > > > > > > >> this is not very efficient)'
>
> > > > >> > > > > > > > >> In what sense "not efficient"? I understand that the timing
> > > > >> > > > > > > > >> is less consistent, but is there really more overhead?
> > > > >> > > > > > > > >> softcron seems like a pretty reasonable choice if all you're
> > > > >> > > > > > > > >> doing is deleting expired sessions.