Re: which sort of application bugs hang wotaskd?

o...@ocs.cz Mon, 24 Oct 2016 09:49:23 -0700

Thanks a lot both!

(Weird, I would have bet wotaskd is designed to survive, detect, and 
potentially even relaunch frozen apps, but well, my memory evidently plays 
ugly tricks on me.)


Anyway...

> On 24. 10. 2016, at 6:23 PM, Chuck Hill <ch...@gevityinc.com> wrote:
> This should explain the how and why (IIRC) 
> http://www.podcastchart.com/podcasts/webobjects-podcasts/episodes/wotaskd-internals-bringing-sanity-to-deployment

... I would not like to sound ungrateful, but might there perhaps be 
something similar somewhere in written form? The thing is, English is not my 
native language, and whilst I have learned to read it pretty well, my 
listening abilities are, alas, quite another story :(
 
> And offer some fixes and improvements.  And like most (all?) of my WOWODC 
> presentations, never fully finished. 
> As for the hang, the code in the logging does seem like a likely culprit.  
> Running “sudo jstack -F <process id>” should dump a trace of all threads.  
> Should… 

Thanks a lot, I'll ask the site admin to try it if the darned thing happens again.
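
In case jstack cannot attach, two other options might be worth a try. As far as I know, on a HotSpot JVM "kill -QUIT <process id>" makes the process print a dump of all its threads to its standard output without terminating it. Alternatively, since the other worker threads apparently keep running for a while, the application could produce the dump itself. Below is a minimal sketch of the latter using only the standard Thread.getAllStackTraces(); the class name ThreadDumper is made up for illustration, and how to trigger it (an admin-only request, a timer, ...) is left open.

```java
import java.util.Map;

// Hypothetical helper, not part of WebObjects or Wonder.
public class ThreadDumper {

    /** Builds a plain-text dump of the stack traces of all live threads. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
            Thread thread = entry.getKey();
            // Header line: thread name and its current state (RUNNABLE, BLOCKED, WAITING, ...)
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            // One line per stack frame, innermost frame first
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

The returned string could simply be passed to the application log; a thread stuck inside toString() would then show up with its state and the frame where it waits.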

All the best,
OC

> From: <webobjects-dev-bounces+chill=gevityinc....@lists.apple.com> on behalf 
> of OC <o...@ocs.cz>
> Date: Monday, October 24, 2016 at 8:41 AM
> To: WebObjects-Dev <webobjects-dev@lists.apple.com>
> Subject: which sort of application bugs hang wotaskd?
>  
> Hello there,
>  
> there seems to be one pretty rare, ugly and hard-to-find lock in my 
> application (I shall get back to it at the end, in the hope it might ring a 
> bell), but what's most weird: it seems that when it happens, it's _wotaskd_ 
> that primarily goes down?!?
>  
> Alas, the information is sparse: this is the deployment site, to which the 
> programming team has no access (and so far we have not been able to 
> reproduce the problem at the test site, whatever we try), but judging from 
> the site admin's reports and the logs, it looks like this:
>  
> (a) first, one of the worker threads hangs somehow, so far inexplicably (EC 
> locking problem possible but improbable, explained below)
> (b) for some time, other threads run without a glitch, new requests are 
> served, new R/R loop worker threads are spawned and logged (I log all R/R 
> loops)
> (c) shortly afterwards (within minutes), though, the adaptor begins to 
> redirect requests to the “Redirection URL”
> (d) now, the site admin is alerted; he runs JavaMonitor **which reports 
> “Failed to contact 127.0.0.1-1085”**!
> (e) he finds which process belongs to *the application instance* (*not* the 
> wotaskd!), and kills it from Terminal
> (f) which magically cures wotaskd: JavaMonitor starts working again, stops 
> showing the 1085 failure, and allows the instance to be relaunched; all is 
> well and swell.
>  
> Does this perhaps ring a bell? To me this behaviour does not make any sense :/
>  
> As for the hang itself, it's rather weird too. There is a loop which goes 
> through a list of EOs; each of them is logged out. Something like this:
>  
> ===
>         for (DBTimeChunk tch in session().currentMarket.orderedTimeChunks()) {
>             log.info(""+tch)
>             // the condition below happens to be true in our case
>             if (tch.someTimestamp>fixedTimestamp) continue
>             // ... therefore some irrelevant code here (it would log if it
>             // happened; it does not) ...
>         }
> ===
>  
> The problem is that
>  
> - this goes through some of the TimeChunks, and _then_ it hangs -- not at the 
> start of the R/R loop, where EC locking problems could be expected
> - in the same session, with the same EC, even in the same thread (for the 
> method which contains the loop happens to be used twice in the page template), 
> the loop had already run through all the TimeChunks, tested their 
> someTimestamp, and ended without a glitch (so no fault is fired when it hangs)
>  
> So far it has happened about three times, each time on a different TimeChunk.
>  
> About the only thing I guess _might_ cause the thread to hang is the 
> log.info(""+tch) call. TimeChunk's toString() is comparatively complex; it 
> might call, among more mundane things, also
> - this.changesFromCommittedSnapshot()
> - this.attributeKeys()
> - this.primaryKey() (of ERXGenericRecord which it inherits)
>  
> Might one of them hang the thread if another thread does the same, or 
> something else, at the wrong moment? (Presumably all of them had already been 
> called for the same EO in the same thread without problems shortly before.)
>  
> If it happens again, it would help if the site admin could, before killing 
> the application, somehow force it to log the stack traces of all its 
> threads. Is there some trick for that?
>  
> And of course, for any other advice on how to hunt down this bloody kind of 
> bug I'll be extremely grateful.
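
One more thing that might help with hunting a hang like this: the JVM can report threads that are deadlocked on each other. Below is a rough sketch of a background watchdog built on the standard java.lang.management.ThreadMXBean. The class name DeadlockWatchdog and the polling period are my own assumptions. It only sees true lock cycles on Java monitors and java.util.concurrent locks; whether it would notice an EC lock depends on how those are implemented, which I have not checked, and it will not notice a thread that merely waits forever on I/O.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of WebObjects or Wonder.
public class DeadlockWatchdog {

    /** Returns a description of all deadlocked threads, or "" if none are detected. */
    public static String describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();
        // Both calls return null when no deadlock is found.
        long[] ids = synchronizers ? mx.findDeadlockedThreads()
                                   : mx.findMonitorDeadlockedThreads();
        if (ids == null) return "";
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(ids, monitors, synchronizers)) {
            if (info == null) continue; // the thread ended in the meantime
            // Note: ThreadInfo.toString() prints only the first few stack frames.
            sb.append(info);
        }
        return sb.toString();
    }

    /** Starts a daemon thread that checks for deadlocks every periodMillis milliseconds. */
    public static Thread start(final long periodMillis) {
        Thread watchdog = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        String report = describeDeadlocks();
                        if (!report.isEmpty()) {
                            System.err.println("DEADLOCK DETECTED:\n" + report);
                        }
                        Thread.sleep(periodMillis);
                    }
                } catch (InterruptedException e) {
                    // interrupted: stop watching
                }
            }
        }, "deadlock-watchdog");
        watchdog.setDaemon(true); // must not keep the JVM alive on shutdown
        watchdog.start();
        return watchdog;
    }
}
```

Calling DeadlockWatchdog.start(30000) once at application startup would check twice a minute; an empty result does not prove there is no hang, only that no lock cycle was found.
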
>  
> Thanks a lot,
> OC
>  
>  
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Webobjects-dev mailing list      (Webobjects-dev@lists.apple.com)
> Help/Unsubscribe/Update your Subscription:
> https://lists.apple.com/mailman/options/webobjects-dev/chill%40gevityinc.com
>  
> This email sent to ch...@gevityinc.com


