Thanks a lot, both! (Weird, I would have bet wotaskd is designed to survive, detect, and potentially even relaunch frozen apps, but well, my memory evidently plays ugly tricks on me.)

Anyway...

> On 24. 10. 2016, at 6:23 PM, Chuck Hill <ch...@gevityinc.com> wrote:
> This should explain the how and why (IIRC)
> http://www.podcastchart.com/podcasts/webobjects-podcasts/episodes/wotaskd-internals-bringing-sanity-to-deployment

... I would not like to sound ungrateful, but is there perhaps something similar somewhere in written form? The thing is, English is not my native language, and whilst I have learned to read it pretty well, my listening abilities are, alas, quite another story :(

> And offer some fixes and improvements. And like most (all?) of my WOWODC
> presentations, never fully finished.
> As for the hang, the code in the logging does seem like a likely culprit.
> Running “sudo jstack -F <process id>” should dump a trace of all threads.
> Should…

Thanks a lot, I'll ask the site admin to try it if the darned thing happens again.

All the best,
OC

> From: <webobjects-dev-bounces+chill=gevityinc....@lists.apple.com> on behalf
> of OC <o...@ocs.cz>
> Date: Monday, October 24, 2016 at 8:41 AM
> To: WebObjects-Dev <webobjects-dev@lists.apple.com>
> Subject: which sort of application bugs hang wotaskd?
>
> Hello there,
>
> there seems to be one pretty rare, ugly and hard-to-find lock in my
> application (I shall get back to it at the end, in hope it might ring a
> bell), but what's most weird: it seems that when it happens, it's _wotaskd_
> what primarily goes down?!?
>
> Alas, the information is sparse: it is the deployment site, to which the
> programming team has no access (and so far we were not able to repeat the
> problem at the test site whatever we try), but according to the site admin
> and the logs, it looks like
>
> (a) first, one of the worker threads hangs somehow, so far inexplicably (EC
> locking problem possible but improbable, explained below)
> (b) for some time, other threads run without a glitch, new requests are
> served, new R/R loop worker threads are spawned and logged (I log out all R/R
> loops)
> (c) shortly (in minutes) though the adaptor begins to redirect requests to
> the “Redirection URL”
> (d) now, the site admin is alerted; he runs JavaMonitor **which reports
> “Failed to contact 127.0.0.1-1085”**!
> (e) he finds which process belongs to *the application instance* (*not* the
> wotaskd!), and kills it from Terminal
> (f) which causes wotaskd to magically cure and JavaMonitor starts working and
> stops showing the 1085 fail, allows to re-launch the instance, all is well
> and swell.
>
> Does this perhaps ring a bell? To me this behaviour does not make any sense :/
>
> As for the hang itself, it's rather weird too. There is a loop which goes
> through a list of EOs; each of them is logged out. Something like this:
>
> ===
> for (DBTimeChunk tch in session().currentMarket.orderedTimeChunks()) {
>   log.info(""+tch)
>   if (tch.someTimestamp>fixedTimestamp) continue // happens to be true in our case
>   ... therefore some irrelevant code here (it would log if it happened, does not) ...
> }
> ===
>
> The problem is that
>
> - this goes through some of the TimeChunks, and _then_ it hangs -- not at the
> start of R/R loop, where EC locking problems could be expected
> - in the same session, with the same EC, even in the same thread (for the
> method which contains the loop happens to be used twice in the page template)
> the loop has already run through all the TimeChunks and tested their
> someTimestamp and ended without a glitch (so, no fault is fired when it hangs)
>
> So far it happened about thrice; each time on a different TimeChunk.
>
> About the only thing I guess _might_ cause the hang of the thread is the "log
> tch". TimeChunk's toString() is comparatively complex, it might call, among
> more mundane things, also
> - this.changesFromCommittedSnapshot()
> - this.attributeKeys()
> - this.primaryKey() (of ERXGenericRecord which it inherits)
>
> Might one of them hang the thread, if another thread does the same/something
> other at the wrong moment? (Presumed all of them were already called for the
> same EO in the same thread all right shortly ago.)
>
> If it happens again, it would help if the site admin could, before killing
> the application, force it somehow to log the stack traces of all its
> threads. Is there some trick for that?
>
> And of course, for any other advice how to hunt for this bloody kind of bug
> I'll be extremely grateful.
>
> Thanks a lot,
> OC
>
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Webobjects-dev mailing list (Webobjects-dev@lists.apple.com)
> Help/Unsubscribe/Update your Subscription:
> https://lists.apple.com/mailman/options/webobjects-dev/chill%40gevityinc.com
>
> This email sent to ch...@gevityinc.com
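
On the question of making the application log the stack traces of all its threads: besides running jstack from outside, the JVM can produce the same information from inside the process. What follows is only a minimal sketch in plain Java, with nothing WebObjects-specific in it. The class name ThreadDumper and the method dumpAllThreads are hypothetical names chosen for illustration; Thread.getAllStackTraces() is standard JDK API. How to trigger it in a deployed instance (an admin-only action, a timer, whatever suits) is left open here.

```java
import java.util.Map;

public class ThreadDumper {

    /** Builds a plain-text dump of the stack trace of every live thread. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        // getAllStackTraces() returns a snapshot: one entry per live thread.
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            // Frames are ordered from the innermost call outwards.
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In an application this string would go to the log instead.
        System.out.println(dumpAllThreads());
    }
}
```

A hung worker thread should show up in such a dump with the frame it is stuck in, which would confirm or rule out the toString() suspicion. Note that this only helps if some other thread in the instance is still able to run the dump; if the whole JVM is unresponsive, an external tool such as the jstack command mentioned above remains the better bet. On HotSpot JVMs, sending the process SIGQUIT (kill -QUIT <process id>) also makes it print a thread dump to its standard output.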