Re: Deadlocks, editing context locking and network tasks
Mark, I discover some useful classes in the er.extensions.concurrency package inside ERExtensions. Bases on these my current pattern for background tasks is this: ERXApplication._startRequest(); ec = ERXEC.newEditingContext(parentObjectStore); try { do the job without worrying about locks, auto locking will handle then } finally { ec = null; ERXApplication._endRequest(); } > 1. For a background thread, it is appropriate to create a new editing context > (ERXEC.newEditingContext(Bosc)) using a dedicated and new object store > coordinator (created using osc = new ERXObjectStoreCoordinator()). The new editing context is mandatory, the new OSC depends on your case, as usual there are pros and cons... Pros: - You will use a new connection to the database and if your connections settings and use case allows it (do not create long lock in the database server), you will not block others threads of your app. Cons: - You will not uses the snapshot cache of the main OSC so everything will be fetched, this can represent a large memory duplication and will require more time if most or your data os already cached. - Your changes will NOT be propagated to others EOEditingcontexts, they only propagate inside an OSC. Unless you need to perform long fetch (or update), a separate OSC may is probably not be the most efficient solution. It is really dependant on the type of database access performed by the task. > > 2. For a background thread, all such editing contexts should be lock()’ed and > then unlock()’ed - unlocked in finally {} clause in case of uncaught > exceptions. Automatic locking is only for ECs used within the R-R loop? You can, see the beginning of the message. > 3. But what should one do if, either during a background thread, R-R loop > (direct action or component action), one locks an editing context, does some > processing of objects within that context, makes a network call, and then > does some more processing within that context. Should one simply lock() and > then hope for the best, or unlock, do the network process and then re-lock at > the end. Are there any issues running unlock() if the EC isn’t actually > locked? What happens if that network call never returns? That should not be a problem if your EOEditing context is private but you will not receive the change of the EO from others EOEditingcontexts when you are locked. As other said, you should have some timeout in place and handle them properly. I do not know about too many unlock, I do not expect it to cause problems but I suggest to try, this is easy. > 4. Is locking an EC from a newly created OSC completely independent from all > other OSC ECs? If that lock isn’t released for some time, does it matter? As any lock, all resources used will never be released. This will include the snapshot cache of everything fetched in this EC. Samuel ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: Deadlocks, editing context locking and network tasks
Dear René, Thank you. This is really helpful. I hadn’t spotted the screencast and will check it out. Mark > On 5 Sep 2016, at 09:34, René Bockwrote: > > Hi Mark > >> Am 03.09.2016 um 22:36 schrieb Mark Wardle : >> >> Dear all, >> >> I’m debugging a deadlock and realise that I probably need to re-design some >> of my code logic. >> >> Am I right in saying… >> >> 1. For a background thread, it is appropriate to create a new editing >> context (ERXEC.newEditingContext(osc)) using a dedicated and new object >> store coordinator (created using osc = new ERXObjectStoreCoordinator()). > > you should do that. > >> >> 2. For a background thread, all such editing contexts should be lock()’ed >> and then unlock()’ed - unlocked in finally {} clause in case of uncaught >> exceptions. Automatic locking is only for ECs used within the R-R loop? > > yes > >> >> 3. But what should one do if, either during a background thread, R-R loop >> (direct action or component action), one locks an editing context, does some >> processing of objects within that context, makes a network call, and then >> does some more processing within that context. Should one simply lock() and >> then hope for the best, or unlock, do the network process and then re-lock >> at the end. Are there any issues running unlock() if the EC isn’t actually >> locked? What happens if that network call never returns? > > You should handle network time-outs ;-) How long may the remote call may > take? Seconds, minutes or hours? If you have many background tasks waiting > network I/O, you may run out of OSCs or memory.. > > >> >> 4. Is locking an EC from a newly created OSC completely independent from all >> other OSC ECs? > > If you lock en EC, the other OSC (and theire ECs) are not affected > >> If that lock isn’t released for some time, does it matter? > > see above. > >> >> All advice appreciated, > > > By the way: there is a very helpful screencast on wocummunity: > > http://www.wocommunity.org/podcasts/wowodc/2011/BackgroundTasks.mov > > > Best regards > > René Bock > > -- > Phone: +49 69 650096 18 > > salient GmbH, Lindleystraße 12, 60314 Frankfurt > Main: +49 69 65 00 96 0 | http://www.salient-doremus.de > ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: Deadlocks, editing context locking and network tasks
Hi Mark > Am 03.09.2016 um 22:36 schrieb Mark Wardle: > > Dear all, > > I’m debugging a deadlock and realise that I probably need to re-design some > of my code logic. > > Am I right in saying… > > 1. For a background thread, it is appropriate to create a new editing context > (ERXEC.newEditingContext(osc)) using a dedicated and new object store > coordinator (created using osc = new ERXObjectStoreCoordinator()). you should do that. > > 2. For a background thread, all such editing contexts should be lock()’ed and > then unlock()’ed - unlocked in finally {} clause in case of uncaught > exceptions. Automatic locking is only for ECs used within the R-R loop? yes > > 3. But what should one do if, either during a background thread, R-R loop > (direct action or component action), one locks an editing context, does some > processing of objects within that context, makes a network call, and then > does some more processing within that context. Should one simply lock() and > then hope for the best, or unlock, do the network process and then re-lock at > the end. Are there any issues running unlock() if the EC isn’t actually > locked? What happens if that network call never returns? You should handle network time-outs ;-) How long may the remote call may take? Seconds, minutes or hours? If you have many background tasks waiting network I/O, you may run out of OSCs or memory.. > > 4. Is locking an EC from a newly created OSC completely independent from all > other OSC ECs? If you lock en EC, the other OSC (and theire ECs) are not affected > If that lock isn’t released for some time, does it matter? see above. > > All advice appreciated, By the way: there is a very helpful screencast on wocummunity: http://www.wocommunity.org/podcasts/wowodc/2011/BackgroundTasks.mov Best regards René Bock -- Phone: +49 69 650096 18 salient GmbH, Lindleystraße 12, 60314 Frankfurt Main: +49 69 65 00 96 0 | http://www.salient-doremus.de ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Deadlocks, editing context locking and network tasks
Dear all, I’m debugging a deadlock and realise that I probably need to re-design some of my code logic. Am I right in saying… 1. For a background thread, it is appropriate to create a new editing context (ERXEC.newEditingContext(osc)) using a dedicated and new object store coordinator (created using osc = new ERXObjectStoreCoordinator()). 2. For a background thread, all such editing contexts should be lock()’ed and then unlock()’ed - unlocked in finally {} clause in case of uncaught exceptions. Automatic locking is only for ECs used within the R-R loop? 3. But what should one do if, either during a background thread, R-R loop (direct action or component action), one locks an editing context, does some processing of objects within that context, makes a network call, and then does some more processing within that context. Should one simply lock() and then hope for the best, or unlock, do the network process and then re-lock at the end. Are there any issues running unlock() if the EC isn’t actually locked? What happens if that network call never returns? 4. Is locking an EC from a newly created OSC completely independent from all other OSC ECs? If that lock isn’t released for some time, does it matter? All advice appreciated, Mark PS. Using Wonder, using safeLock ERXEC flag in application properties. ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WebObjects application instances hanging - Deadlocks occurring
Hi Raghu, as far as I can see, you have a log of lock/unlock operations with the EditingContext IDs and your original deadlock log with blocked thread names. But it is not easy to correlate those two logs, as the deadlock log is missing object IDs and the lock/unlock log has no thread names. I would expect there to be one more lock than unlock in the log, but it is tedious to find the right one. There is the property er.extensions.ERXEC.markOpenLocks that may help, if you can get it to work. When the deadlock occurs the direct action ERXDirectAction/showOpenEditingContextLockTraces should show you a more complete picture of currently open locks and where the offending editing context was created. Kind regards, Ralf Am 20.11.2014 um 15:23 schrieb Raghavender Bokka raghavender.bo...@prithvisolutions.com: Hi Team, The following are the exceptions generating in the log files when we enable the ERX logging, and we do not have any code in the Session.sleep method. And some of our WebObjects application instances are hanging when some user load (around 1000 users) are testing, when we look into the java process thread dump there are deadlocks occurring. --- --- Exception at er.extensions.eof.ERXEC.unlock(ERXEC.java:501) at com.webobjects.eocontrol.EOEditingContext._sendOrEnqueueNotification(EOEditingContext.java:4721) at com.webobjects.eocontrol.EOEditingContext._objectsChangedInStore(EOEditingContext.java:3562) at er.extensions.eof.ERXEC._objectsChangedInStore(ERXEC.java:1285) ... skipped 7 stack elements at com.webobjects.eocontrol.EOObjectStoreCoordinator._objectsChangedInSubStore(EOObjectStoreCoordinator.java:693) ... skipped 16 stack elements at com.webobjects.eocontrol.EOObjectStoreCoordinator.saveChangesInEditingContext(EOObjectStoreCoordinator.java:386) at com.webobjects.eocontrol.EOEditingContext.saveChanges(EOEditingContext.java:3192) at er.extensions.eof.ERXEC._saveChanges(ERXEC.java:981) at er.extensions.eof.ERXEC.saveChanges(ERXEC.java:903) at TestTakingMode$StudentTestSessionMode.testSubmitted(TestTakingMode.java:648) at ReviewTestResponsePage.submitTest(ReviewTestResponsePage.java:99) ... skipped 4 stack elements at KeyValueCodingProtectedAccessor.methodValue(KeyValueCodingProtectedAccessor.java:60) ... skipped 46 stack elements at Application.dispatchRequest(Application.java:670) Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.appserver.ERXSession - Will terminate, sessionId is FkDsWpsOxKy1TDaligNLDg Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.appserver.ERXBrowserFactory - _incrementReferenceCounterForKey() - count = 26, key = IE.7.0.4.0.Windows.{cpu = Unknown CPU; geckoRevision = No Gecko; } Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.eof.ERXEC - After popping: [er.extensions.eof.ERXEC@dd151f] Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.ERXEC.LockLogger - unlocked er.extensions.eof.ERXEC@13cd5b5 Exception at er.extensions.eof.ERXEC.unlock(ERXEC.java:501) at com.webobjects.appserver.WOSession._sleepInContext(WOSession.java:849) at com.webobjects.appserver.WOApplication.saveSessionForContext(WOApplication.java:1883) at er.extensions.appserver.ERXApplication.saveSessionForContext(ERXApplication.java:2075) ... skipped 6 stack elements at Application.dispatchRequest(Application.java:670) ... skipped 3 stack elements Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.ERXEC.LockLogger - locked er.extensions.eof.ERXEC@13cd5b5 Exception at er.extensions.eof.ERXEC.lock(ERXEC.java:483) at com.webobjects.eocontrol.EOEditingContext._dispose(EOEditingContext.java:1116) at com.webobjects.eocontrol.EOEditingContext.dispose(EOEditingContext.java:) at er.extensions.eof.ERXEC.dispose(ERXEC.java:610) at com.webobjects.appserver.WOSession._sleepInContext(WOSession.java:854) at com.webobjects.appserver.WOApplication.saveSessionForContext(WOApplication.java:1883) at er.extensions.appserver.ERXApplication.saveSessionForContext(ERXApplication.java:2075) ... skipped 6 stack elements at Application.dispatchRequest(Application.java:670) ... skipped 3 stack elements Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.eof.ERXEC - After pushing: [er.extensions.eof.ERXEC@dd151f, er.extensions.eof.ERXEC@13cd5b5] Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.eof.ERXEC - After popping: [er.extensions.eof.ERXEC@dd151f] Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.ERXEC.LockLogger - unlocked er.extensions.eof.ERXEC@13cd5b5 Exception at er.extensions.eof.ERXEC.unlock(ERXEC.java:501) at com.webobjects.eocontrol.EOEditingContext._dispose(EOEditingContext.java:1218) at com.webobjects.eocontrol.EOEditingContext.dispose(EOEditingContext.java:) at er.extensions.eof.ERXEC.dispose(ERXEC.java:610) at com.webobjects.appserver.WOSession
WebObjects application instances hanging - Deadlocks occurring
Hi Team, The following are the exceptions generating in the log files when we enable the ERX logging, and we do not have any code in the Session.sleep method. And some of our WebObjects application instances are hanging when some user load (around 1000 users) are testing, when we look into the java process thread dump there are deadlocks occurring. --- --- Exception at er.extensions.eof.ERXEC.unlock(ERXEC.java:501) at com.webobjects.eocontrol.EOEditingContext._sendOrEnqueueNotification(EOEditingContext.java:4721) at com.webobjects.eocontrol.EOEditingContext._objectsChangedInStore(EOEditingContext.java:3562) at er.extensions.eof.ERXEC._objectsChangedInStore(ERXEC.java:1285) ... skipped 7 stack elements at com.webobjects.eocontrol.EOObjectStoreCoordinator._objectsChangedInSubStore(EOObjectStoreCoordinator.java:693) ... skipped 16 stack elements at com.webobjects.eocontrol.EOObjectStoreCoordinator.saveChangesInEditingContext(EOObjectStoreCoordinator.java:386) at com.webobjects.eocontrol.EOEditingContext.saveChanges(EOEditingContext.java:3192) at er.extensions.eof.ERXEC._saveChanges(ERXEC.java:981) at er.extensions.eof.ERXEC.saveChanges(ERXEC.java:903) at TestTakingMode$StudentTestSessionMode.testSubmitted(TestTakingMode.java:648) at ReviewTestResponsePage.submitTest(ReviewTestResponsePage.java:99) ... skipped 4 stack elements at KeyValueCodingProtectedAccessor.methodValue(KeyValueCodingProtectedAccessor.java:60) ... skipped 46 stack elements at Application.dispatchRequest(Application.java:670) Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.appserver.ERXSession - Will terminate, sessionId is FkDsWpsOxKy1TDaligNLDg Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.appserver.ERXBrowserFactory - _incrementReferenceCounterForKey() - count = 26, key = IE.7.0.4.0.Windows.{cpu = Unknown CPU; geckoRevision = No Gecko; } Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.eof.ERXEC - After popping: [er.extensions.eof.ERXEC@dd151f] Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.ERXEC.LockLogger - unlocked er.extensions.eof.ERXEC@13cd5b5 Exception at er.extensions.eof.ERXEC.unlock(ERXEC.java:501) at com.webobjects.appserver.WOSession._sleepInContext(WOSession.java:849) at com.webobjects.appserver.WOApplication.saveSessionForContext(WOApplication.java:1883) at er.extensions.appserver.ERXApplication.saveSessionForContext(ERXApplication.java:2075) ... skipped 6 stack elements at Application.dispatchRequest(Application.java:670) ... skipped 3 stack elements Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.ERXEC.LockLogger - locked er.extensions.eof.ERXEC@13cd5b5 Exception at er.extensions.eof.ERXEC.lock(ERXEC.java:483) at com.webobjects.eocontrol.EOEditingContext._dispose(EOEditingContext.java:1116) at com.webobjects.eocontrol.EOEditingContext.dispose(EOEditingContext.java:) at er.extensions.eof.ERXEC.dispose(ERXEC.java:610) at com.webobjects.appserver.WOSession._sleepInContext(WOSession.java:854) at com.webobjects.appserver.WOApplication.saveSessionForContext(WOApplication.java:1883) at er.extensions.appserver.ERXApplication.saveSessionForContext(ERXApplication.java:2075) ... skipped 6 stack elements at Application.dispatchRequest(Application.java:670) ... skipped 3 stack elements Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.eof.ERXEC - After pushing: [er.extensions.eof.ERXEC@dd151f, er.extensions.eof.ERXEC@13cd5b5] Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.eof.ERXEC - After popping: [er.extensions.eof.ERXEC@dd151f] Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.ERXEC.LockLogger - unlocked er.extensions.eof.ERXEC@13cd5b5 Exception at er.extensions.eof.ERXEC.unlock(ERXEC.java:501) at com.webobjects.eocontrol.EOEditingContext._dispose(EOEditingContext.java:1218) at com.webobjects.eocontrol.EOEditingContext.dispose(EOEditingContext.java:) at er.extensions.eof.ERXEC.dispose(ERXEC.java:610) at com.webobjects.appserver.WOSession._sleepInContext(WOSession.java:854) at com.webobjects.appserver.WOApplication.saveSessionForContext(WOApplication.java:1883) at er.extensions.appserver.ERXApplication.saveSessionForContext(ERXApplication.java:2075) ... skipped 6 stack elements at Application.dispatchRequest(Application.java:670) ... skipped 3 stack elements Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.eof.ERXEC - After pushing: [er.extensions.eof.ERXEC@dd151f, er.extensions.eof.ERXEC@dd151f] Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.eof.ERXEC - After popping: [er.extensions.eof.ERXEC@dd151f] Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.appserver.ERXBrowserFactory - _decrementReferenceCounterForKey() - count = 25, key = IE.7.0.4.0.Windows.{cpu = Unknown CPU; geckoRevision = No Gecko; } Nov 17 22:22:01 Solar[6009] DEBUG er.extensions.appserver.ERXBrowserFactory
Re: WebObjects application instances hanging - Deadlocks occurring
._private.WODynamicGroup.invokeChildrenAction(WODynamicGroup.java:105) at com.webobjects.appserver._private.WODynamicGroup.invokeAction(WODynamicGroup.java:115) at com.webobjects.appserver.WOComponent.invokeAction(WOComponent.java:1079) at er.extensions.components.ERXComponent.invokeAction(ERXComponent.java:92) at com.webobjects.appserver.WOSession.invokeAction(WOSession.java:1357) at Session.invokeAction(Session.java:191) at com.webobjects.appserver.WOApplication.invokeAction(WOApplication.java:1745) at er.extensions.appserver.ajax.ERXAjaxApplication.invokeAction(ERXAjaxApplication.java:50) at er.extensions.appserver.ERXApplication.invokeAction(ERXApplication.java:1687) at com.webobjects.appserver._private.WOComponentRequestHandler._dispatchWithPreparedPage(WOComponentRequestHandler.java:206) at com.webobjects.appserver._private.WOComponentRequestHandler._dispatchWithPreparedSession(WOComponentRequestHandler.java:298) at com.webobjects.appserver._private.WOComponentRequestHandler._dispatchWithPreparedApplication(WOComponentRequestHandler.java:332) at com.webobjects.appserver._private.WOComponentRequestHandler._handleRequest(WOComponentRequestHandler.java:369) at com.webobjects.appserver._private.WOComponentRequestHandler.handleRequest(WOComponentRequestHandler.java:442) at com.webobjects.appserver.WOApplication.dispatchRequest(WOApplication.java:1687) at er.extensions.appserver.ERXApplication.dispatchRequestImmediately(ERXApplication.java:1802) at er.extensions.appserver.ERXApplication.dispatchRequest(ERXApplication.java:1767) at Application.dispatchRequest(Application.java:653) at com.webobjects.appserver._private.WOWorkerThread.runOnce(WOWorkerThread.java:144) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:226) at java.lang.Thread.run(Thread.java:619) Nov 17 22:20:48 Solar[6009] DEBUG er.extensions.ERXEC.LockLogger - locked er.extensions.eof.ERXEC@13cd5b5 Exception at er.extensions.eof.ERXEC.lock(ERXEC.java:483) at er.extensions.eof.ERXEC$DefaultFactory._newEditingContext(ERXEC.java:1465) at er.extensions.eof.ERXEC$DefaultFactory._newEditingContext(ERXEC.java:1434) at er.extensions.eof.ERXEC.newEditingContext(ERXEC.java:1540) at er.extensions.appserver.ERXSession.defaultEditingContext(ERXSession.java:353) at Session.setLoginUser(Session.java:106) at Main.login(Main.java:185) at Main.login(Main.java:120) ... skipped 4 stack elements at KeyValueCodingProtectedAccessor.methodValue(KeyValueCodingProtectedAccessor.java:60) ... skipped 46 stack elements at Application.dispatchRequest(Application.java:653) ... skipped 3 stack elements Nov 17 22:20:48 Solar[6009] DEBUG er.extensions.eof.ERXEC - After pushing: [er.extensions.eof.ERXEC@5971c3, er.extensions.eof.ERXEC@13cd5b5] Nov 17 22:20:48 Solar[6009] DEBUG er.extensions.eof.ERXEC - After popping: [er.extensions.eof.ERXEC@5971c3] Nov 17 22:20:48 Solar[6009] DEBUG er.extensions.ERXEC.LockLogger - unlocked er.extensions.eof.ERXEC@13cd5b5 Exception at er.extensions.eof.ERXEC.unlock(ERXEC.java:501) at er.extensions.eof.ERXEC$DefaultFactory._newEditingContext(ERXEC.java:1467) at er.extensions.eof.ERXEC$DefaultFactory._newEditingContext(ERXEC.java:1434) at er.extensions.eof.ERXEC.newEditingContext(ERXEC.java:1540) at er.extensions.appserver.ERXSession.defaultEditingContext(ERXSession.java:353) at Session.setLoginUser(Session.java:106) at Main.login(Main.java:185) at Main.login(Main.java:120) ... skipped 4 stack elements at KeyValueCodingProtectedAccessor.methodValue(KeyValueCodingProtectedAccessor.java:60) ... skipped 46 stack elements at Application.dispatchRequest(Application.java:653) ... skipped 3 stack elements --- --- Any help would be appreciated. Regards, Raghu. On 17-Nov-2014, at 11:41 PM, webobjects-dev-requ...@lists.apple.com wrote: Send Webobjects-dev mailing list submissions to webobjects-dev@lists.apple.com To subscribe or unsubscribe via the World Wide Web, visit https://lists.apple.com/mailman/listinfo/webobjects-dev or, via email, send a message with subject or body 'help' to webobjects-dev-requ...@lists.apple.com You can reach the person managing the list at webobjects-dev-ow...@lists.apple.com When replying, please edit your Subject line so it is more specific than Re: Contents of Webobjects-dev digest... Today's Topics: 1. Re: WebObjects application instances hanging - Deadlocks occurring (Ralf Schuchardt) 2. Re: WOWODC 2015 - April 25, 26 and 27 2015 (CHRISTOPH WICK | i4innovation GmbH, Bonn) 3. Re: WOCommunity maven repository down? (Henrique Prange) 4. Re: WOWODC 2015
Re: WebObjects application instances hanging - Deadlocks occurring
Hi, Am 17.11.2014 um 13:33 schrieb Raghavender Bokka raghavender.bo...@prithvisolutions.com: Hi Team, Some of our WebObjects application instances are hanging when some user load (around 1000 users) are testing, when we look into the java process thread dump there are deadlocks occurring. The following is the thread dump: [...] WorkerThread24 prio=3 tid=0x00e42800 nid=0x31 waiting on condition [0xd49fe000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0xdc3837c8 (a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262) at com.webobjects.eocontrol.EOEditingContext.lock(EOEditingContext.java:4617) at er.extensions.eof.ERXEC.lock(ERXEC.java:480) at com.webobjects.appserver.WOSession._awakeInContext(WOSession.java:835) at com.webobjects.appserver.WOApplication.restoreSessionWithID(WOApplication.java:1917) at er.extensions.appserver.ERXApplication.restoreSessionWithID(ERXApplication.java:2093) at com.webobjects.appserver._private.WOComponentRequestHandler._dispatchWithPreparedApplication(WOComponentRequestHandler.java:324) at com.webobjects.appserver._private.WOComponentRequestHandler._handleRequest(WOComponentRequestHandler.java:369) at com.webobjects.appserver._private.WOComponentRequestHandler.handleRequest(WOComponentRequestHandler.java:442) - locked 0xdbc631d0 (a java.lang.Object) at com.webobjects.appserver.WOApplication.dispatchRequest(WOApplication.java:1687) at er.extensions.appserver.ERXApplication.dispatchRequestImmediately(ERXApplication.java:1802) at er.extensions.appserver.ERXApplication.dispatchRequest(ERXApplication.java:1767) at Application.dispatchRequest(Application.java:670) at com.webobjects.appserver._private.WOWorkerThread.runOnce(WOWorkerThread.java:144) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:226) at java.lang.Thread.run(Thread.java:619) This stack trace seems to indicate, that the defaultEditingContext was not unlocked in the previous request. Do you see an exception prior to the deadlock? If you have code in a Session.sleep() method, make sure to catch all exceptions there. Ralf ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
RE: WOWorkerThread deadlocks
Hi guys, I know I'm a little late on this, but I'm also seeing the same behavior. It's not a long running query I don't think because I'm logging long queries in postgres and nothing is running over 10 seconds. Can you explain why having a max of 256 worker threads is too high? Any other things I should look at? The customers are not happy! My last problem did turn out to be a bunch of deadlocks, which all now seem to be resolved. It had to with setting er.extensions.ERXObjectStoreCoordinatorPool.maxCoordinators=4 which should be seamless (you would think) but causes issues with fetch specs that have EOs crossing OSCs. I had to pull all EOs local, seems like something that should be handled inside wonder automatically (so I consider it a bug, whether it is or not could be argued I guess). Anyway, after those all got fixed, I'm now running into this. Much harder to figure out since I don't even know what the lock is held on. BTW Chuck and Quinton, I owe you guys a beer. Thanks for pointing me in the right direction on the last problem. Thanks for any help. -Mike -Original Message- From: webobjects-dev-bounces+mgargano=escholar@lists.apple.com [mailto:webobjects-dev-bounces+mgargano=escholar@lists.apple.com] On Behalf Of Chuck Hill Sent: Monday, September 10, 2012 1:24 PM To: Maik Musall Cc: webobjects-dev@lists.apple.com WebObjects Subject: Re: WOWorkerThread deadlocks Hi Maik, WorkerThread207 that many worker threads indicates two things to me: 1. Your app configuration is too high. I'd use a max of 6-10 and a listen queue size of around 4 (adjusted to suit your specific needs). A WO app is very, very unlikely to recover from a 200 worker thread backlog in any way that is useful to the users 2. You have a thread that is taking a long time to return a result. If you are dispatching requests concurrently, then this is most likely stuck in EOControl/EOAccess (e.g. waiting for a slow query result) or connecting to some external process. You could also have a deadlock. If you are not dispatching requests concurrently, then this delay could be in other code. The traces below do not show the problem. If you want to send a full dump, I am willing to look at it. It is possible that the problem had resolved by the time you took this dump. What you show below is normal for a lot of worker threads. WorkerThread206 is waiting for a new request, WorkerThread207 is idle waiting for something to do in the future. Chuck On 2012-09-10, at 8:03 AM, Maik Musall wrote: Hi, in an app with high concurrency, the app sometimes becomes unresponsive to everything but DirectActions at the time of day with the most concurrency. All users aren't seeing responses any more. In jstack I see hundreds of these: WorkerThread207 prio=5 tid=131e0a800 nid=0x151aa2000 waiting for monitor entry [151aa1000] java.lang.Thread.State: BLOCKED (on object monitor) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:406) - waiting to lock 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:210) at java.lang.Thread.run(Thread.java:680) all waiting on the same lock 20d3da450, and one thread holding that lock: WorkerThread206 prio=5 tid=131d79800 nid=0x15199f000 runnable [15199e000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408) - locked 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:210) at java.lang.Thread.run(Thread.java:680) Anyone familiar with this problem? Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/chill%40global-village.net This email sent to ch...@global-village.net -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/mgargano%40escholar.com This email sent to mgarg
Re: WOWorkerThread deadlocks
On Jan 15, 2013, at 2:54 PM, Chuck Hill wrote: On 2013-01-15, at 10:50 AM, Michael Gargano wrote: Hi guys, I know I'm a little late on this, but I'm also seeing the same behavior. It's not a long running query I don't think because I'm logging long queries in postgres and nothing is running over 10 seconds. Can you explain why having a max of 256 worker threads is too high? http://osdir.com/ml/web.webobjects.admin/2005-02/msg6.html Keep in mind that you have 256 threads all trying to do something that usually sooner or later needs a single threaded EOF lock. That is just not going to make for happy users. Thanks. I'll take a look at this. Any other things I should look at? The customers are not happy! Cut down the number of worker threads and the listen queue size. It won't fix the problem but at least (a) you will see it sooner and (b) the app can recover. My last problem did turn out to be a bunch of deadlocks, which all now seem to be resolved. It had to with setting er.extensions.ERXObjectStoreCoordinatorPool.maxCoordinators=4 which should be seamless (you would think) but causes issues with fetch specs that have EOs crossing OSCs. Why on earth would an EO ever cross an OSC? They don't even cross ECs. a page creates a new EC gets an EO... that EO is passed around, is on another (or the same) page where another EC is created, when a fetchSpec is run against the new EC, but the other EO is used as part of the fetchSpec those ECs can be associated with two different OSCs, the new EC just created and the EC associated with the EO we already have a reference to. once i called localInstance on every EO being used like that all the deadlocks went away. I had to pull all EOs local, seems like something that should be handled inside wonder automatically (so I consider it a bug, whether it is or not could be argued I guess). Anyway, after those all got fixed, I'm now running into this. Much harder to figure out since I don't even know what the lock is held on. sudo jstack -F process id will show you if it is a deadlock. Otherwise it is likely bad exception handling that results in your code doing a lock() and never doing an unlock() no deadlocks are being detected and i don't see any either. i see the same thing Maik saw, all the worker threads are waiting on a lock held by one worker thread which is in a run state and awaiting a socket accept. I did searches across all the code and there is no manual locking anywhere, everything is through the autolocking of wonder. Chuck BTW Chuck and Quinton, I owe you guys a beer. Thanks for pointing me in the right direction on the last problem. Thanks for any help. -Mike -Original Message- From: webobjects-dev-bounces+mgargano=escholar@lists.apple.com [mailto:webobjects-dev-bounces+mgargano=escholar@lists.apple.com] On Behalf Of Chuck Hill Sent: Monday, September 10, 2012 1:24 PM To: Maik Musall Cc: webobjects-dev@lists.apple.com WebObjects Subject: Re: WOWorkerThread deadlocks Hi Maik, WorkerThread207 that many worker threads indicates two things to me: 1. Your app configuration is too high. I'd use a max of 6-10 and a listen queue size of around 4 (adjusted to suit your specific needs). A WO app is very, very unlikely to recover from a 200 worker thread backlog in any way that is useful to the users 2. You have a thread that is taking a long time to return a result. If you are dispatching requests concurrently, then this is most likely stuck in EOControl/EOAccess (e.g. waiting for a slow query result) or connecting to some external process. You could also have a deadlock. If you are not dispatching requests concurrently, then this delay could be in other code. The traces below do not show the problem. If you want to send a full dump, I am willing to look at it. It is possible that the problem had resolved by the time you took this dump. What you show below is normal for a lot of worker threads. WorkerThread206 is waiting for a new request, WorkerThread207 is idle waiting for something to do in the future. Chuck On 2012-09-10, at 8:03 AM, Maik Musall wrote: Hi, in an app with high concurrency, the app sometimes becomes unresponsive to everything but DirectActions at the time of day with the most concurrency. All users aren't seeing responses any more. In jstack I see hundreds of these: WorkerThread207 prio=5 tid=131e0a800 nid=0x151aa2000 waiting for monitor entry [151aa1000] java.lang.Thread.State: BLOCKED (on object monitor) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:406) - waiting to lock 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run
Re: WOSessionStore deadlocks - SOLVED
Hi Chuck, a follow-up on this: Am 19.10.2012 um 20:05 schrieb Chuck Hill ch...@global-village.net: Hi Maik, This can also indicate some other things too: - session did not get checked in (app threw OutOfMemory, sleep() threw an exception) - previous request for this session is still running (deadlock, waiting, infinite loop) - 2+ requests for the same session in rapid sequence where the first terminates the session Looks like my answer that OutOfMemory would be OutOfTheQuestion was not true. I now discovered what lead to my application hanging every afternoon, after *once* it finally cared to log a proper message before hanging: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: PermGen space Doh, the PermGen. I totally forgot about that. I had the app at -Xmx24576m, but didn't adjust PermGen. Now with a PermGen limit of 512m (of which currently about 154m gets used max according to jvisualvm) everything is finally running smoothly. The app turns out to load about 12000 classes over a workday. I think I need to have a look at what those are sometime... Maik Chuck On 2012-10-19, at 4:00 AM, Maik Musall wrote: Hi, I recently discovered what may be responsible for frequent deadlocks of an application here. In the jstack -l output, I see almost all threads waiting on a single ReentrantLock, and this thread is what holds that lock: WorkerThread4 prio=5 tid=103bc9000 nid=0x132caf000 in Object.wait() [132cae000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 22711d098 (a com.webobjects.appserver.WOSessionStore$TimeoutEntry) at java.lang.Object.wait(Object.java:485) at com.webobjects.appserver.WOSessionStore.checkOutSessionWithID(WOSessionStore.java:191) - locked 22711d098 (a com.webobjects.appserver.WOSessionStore$TimeoutEntry) at com.webobjects.appserver.WOApplication.restoreSessionWithID(WOApplication.java:1913) at er.extensions.appserver.ERXApplication.restoreSessionWithID(ERXApplication.java:2440) at er.extensions.appserver.ERXComponentRequestHandler._dispatchWithPreparedApplication(ERXComponentRequestHandler.java:260) at er.extensions.appserver.ERXComponentRequestHandler._handleRequest(ERXComponentRequestHandler.java:302) at er.extensions.appserver.ERXComponentRequestHandler.handleRequest(ERXComponentRequestHandler.java:377) at com.webobjects.appserver.WOApplication.dispatchRequest(WOApplication.java:1687) at er.extensions.appserver.ERXApplication.dispatchRequestImmediately(ERXApplication.java:2139) at er.extensions.appserver.ERXApplication.dispatchRequest(ERXApplication.java:2104) at com.webobjects.appserver._private.WOWorkerThread.runOnce(WOWorkerThread.java:144) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:226) at java.lang.Thread.run(Thread.java:680) Locked ownable synchronizers: - 20ce7bbc0 (a java.util.concurrent.locks.ReentrantLock$NonfairSync) Now, ERXApplication.restoreSessionWithID contains an interesting call to useSessionStoreDeadlockDetection(), but this detection only works in single threaded mode. I'm afraid I can't afford to switch off concurrent requests even for a testing period in production. I'm looking for someone with experience regarding this problem. The doc for that method mentions that it could help to find cases when a session is checked out twice in a single RR-loop, which will lead to a session store lockup. Since I cannot switch on this detection, what in your experience could lead to that happening? Thanks Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/chill%40global-village.net This email sent to ch...@global-village.net -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects Global Village Consulting ranks 13th in 2012 in BIV's Top 100 Fastest Growing Companies in B.C! Global Village Consulting ranks 76th in 24th annual PROFIT 200 ranking of Canada’s Fastest-Growing Companies by PROFIT Magazine! ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOSessionStore deadlocks - SOLVED
Hi Maik, Use -XX:MaxPermSize=your-desired-size-in-mbm Farrukh On Nov 8, 2012, at 1:59 PM, Maik Musall m...@selbstdenker.ag wrote: Hi Chuck, a follow-up on this: Am 19.10.2012 um 20:05 schrieb Chuck Hill ch...@global-village.net: Hi Maik, This can also indicate some other things too: - session did not get checked in (app threw OutOfMemory, sleep() threw an exception) - previous request for this session is still running (deadlock, waiting, infinite loop) - 2+ requests for the same session in rapid sequence where the first terminates the session Looks like my answer that OutOfMemory would be OutOfTheQuestion was not true. I now discovered what lead to my application hanging every afternoon, after *once* it finally cared to log a proper message before hanging: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: PermGen space Doh, the PermGen. I totally forgot about that. I had the app at -Xmx24576m, but didn't adjust PermGen. Now with a PermGen limit of 512m (of which currently about 154m gets used max according to jvisualvm) everything is finally running smoothly. The app turns out to load about 12000 classes over a workday. I think I need to have a look at what those are sometime... Maik Chuck On 2012-10-19, at 4:00 AM, Maik Musall wrote: Hi, I recently discovered what may be responsible for frequent deadlocks of an application here. In the jstack -l output, I see almost all threads waiting on a single ReentrantLock, and this thread is what holds that lock: WorkerThread4 prio=5 tid=103bc9000 nid=0x132caf000 in Object.wait() [132cae000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 22711d098 (a com.webobjects.appserver.WOSessionStore$TimeoutEntry) at java.lang.Object.wait(Object.java:485) at com.webobjects.appserver.WOSessionStore.checkOutSessionWithID(WOSessionStore.java:191) - locked 22711d098 (a com.webobjects.appserver.WOSessionStore$TimeoutEntry) at com.webobjects.appserver.WOApplication.restoreSessionWithID(WOApplication.java:1913) at er.extensions.appserver.ERXApplication.restoreSessionWithID(ERXApplication.java:2440) at er.extensions.appserver.ERXComponentRequestHandler._dispatchWithPreparedApplication(ERXComponentRequestHandler.java:260) at er.extensions.appserver.ERXComponentRequestHandler._handleRequest(ERXComponentRequestHandler.java:302) at er.extensions.appserver.ERXComponentRequestHandler.handleRequest(ERXComponentRequestHandler.java:377) at com.webobjects.appserver.WOApplication.dispatchRequest(WOApplication.java:1687) at er.extensions.appserver.ERXApplication.dispatchRequestImmediately(ERXApplication.java:2139) at er.extensions.appserver.ERXApplication.dispatchRequest(ERXApplication.java:2104) at com.webobjects.appserver._private.WOWorkerThread.runOnce(WOWorkerThread.java:144) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:226) at java.lang.Thread.run(Thread.java:680) Locked ownable synchronizers: - 20ce7bbc0 (a java.util.concurrent.locks.ReentrantLock$NonfairSync) Now, ERXApplication.restoreSessionWithID contains an interesting call to useSessionStoreDeadlockDetection(), but this detection only works in single threaded mode. I'm afraid I can't afford to switch off concurrent requests even for a testing period in production. I'm looking for someone with experience regarding this problem. The doc for that method mentions that it could help to find cases when a session is checked out twice in a single RR-loop, which will lead to a session store lockup. Since I cannot switch on this detection, what in your experience could lead to that happening? Thanks Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/chill%40global-village.net This email sent to ch...@global-village.net -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects Global Village Consulting ranks 13th in 2012 in BIV's Top 100 Fastest Growing Companies in B.C! Global Village Consulting ranks 76th in 24th annual PROFIT 200 ranking of Canada’s Fastest-Growing Companies by PROFIT Magazine! ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https
Re: WOSessionStore deadlocks - SOLVED
Hi Farrukh, uh.. I think I described nothing else but my success doing that (although not mentioned the syntax)? Maik Am 08.11.2012 um 12:24 schrieb Farrukh Ijaz farrukh.i...@fuegodigitalmedia.com: Hi Maik, Use -XX:MaxPermSize=your-desired-size-in-mbm Farrukh On Nov 8, 2012, at 1:59 PM, Maik Musall m...@selbstdenker.ag wrote: Hi Chuck, a follow-up on this: Am 19.10.2012 um 20:05 schrieb Chuck Hill ch...@global-village.net: Hi Maik, This can also indicate some other things too: - session did not get checked in (app threw OutOfMemory, sleep() threw an exception) - previous request for this session is still running (deadlock, waiting, infinite loop) - 2+ requests for the same session in rapid sequence where the first terminates the session Looks like my answer that OutOfMemory would be OutOfTheQuestion was not true. I now discovered what lead to my application hanging every afternoon, after *once* it finally cared to log a proper message before hanging: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: PermGen space Doh, the PermGen. I totally forgot about that. I had the app at -Xmx24576m, but didn't adjust PermGen. Now with a PermGen limit of 512m (of which currently about 154m gets used max according to jvisualvm) everything is finally running smoothly. The app turns out to load about 12000 classes over a workday. I think I need to have a look at what those are sometime... Maik Chuck On 2012-10-19, at 4:00 AM, Maik Musall wrote: Hi, I recently discovered what may be responsible for frequent deadlocks of an application here. In the jstack -l output, I see almost all threads waiting on a single ReentrantLock, and this thread is what holds that lock: WorkerThread4 prio=5 tid=103bc9000 nid=0x132caf000 in Object.wait() [132cae000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 22711d098 (a com.webobjects.appserver.WOSessionStore$TimeoutEntry) at java.lang.Object.wait(Object.java:485) at com.webobjects.appserver.WOSessionStore.checkOutSessionWithID(WOSessionStore.java:191) - locked 22711d098 (a com.webobjects.appserver.WOSessionStore$TimeoutEntry) at com.webobjects.appserver.WOApplication.restoreSessionWithID(WOApplication.java:1913) at er.extensions.appserver.ERXApplication.restoreSessionWithID(ERXApplication.java:2440) at er.extensions.appserver.ERXComponentRequestHandler._dispatchWithPreparedApplication(ERXComponentRequestHandler.java:260) at er.extensions.appserver.ERXComponentRequestHandler._handleRequest(ERXComponentRequestHandler.java:302) at er.extensions.appserver.ERXComponentRequestHandler.handleRequest(ERXComponentRequestHandler.java:377) at com.webobjects.appserver.WOApplication.dispatchRequest(WOApplication.java:1687) at er.extensions.appserver.ERXApplication.dispatchRequestImmediately(ERXApplication.java:2139) at er.extensions.appserver.ERXApplication.dispatchRequest(ERXApplication.java:2104) at com.webobjects.appserver._private.WOWorkerThread.runOnce(WOWorkerThread.java:144) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:226) at java.lang.Thread.run(Thread.java:680) Locked ownable synchronizers: - 20ce7bbc0 (a java.util.concurrent.locks.ReentrantLock$NonfairSync) Now, ERXApplication.restoreSessionWithID contains an interesting call to useSessionStoreDeadlockDetection(), but this detection only works in single threaded mode. I'm afraid I can't afford to switch off concurrent requests even for a testing period in production. I'm looking for someone with experience regarding this problem. The doc for that method mentions that it could help to find cases when a session is checked out twice in a single RR-loop, which will lead to a session store lockup. Since I cannot switch on this detection, what in your experience could lead to that happening? Thanks Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/chill%40global-village.net This email sent to ch...@global-village.net -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects Global Village Consulting ranks 13th in 2012 in BIV's Top 100 Fastest Growing Companies in B.C! Global Village Consulting ranks 76th in 24th annual PROFIT 200 ranking of Canada’s Fastest-Growing Companies by PROFIT Magazine
WOSessionStore deadlocks
Hi, I recently discovered what may be responsible for frequent deadlocks of an application here. In the jstack -l output, I see almost all threads waiting on a single ReentrantLock, and this thread is what holds that lock: WorkerThread4 prio=5 tid=103bc9000 nid=0x132caf000 in Object.wait() [132cae000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 22711d098 (a com.webobjects.appserver.WOSessionStore$TimeoutEntry) at java.lang.Object.wait(Object.java:485) at com.webobjects.appserver.WOSessionStore.checkOutSessionWithID(WOSessionStore.java:191) - locked 22711d098 (a com.webobjects.appserver.WOSessionStore$TimeoutEntry) at com.webobjects.appserver.WOApplication.restoreSessionWithID(WOApplication.java:1913) at er.extensions.appserver.ERXApplication.restoreSessionWithID(ERXApplication.java:2440) at er.extensions.appserver.ERXComponentRequestHandler._dispatchWithPreparedApplication(ERXComponentRequestHandler.java:260) at er.extensions.appserver.ERXComponentRequestHandler._handleRequest(ERXComponentRequestHandler.java:302) at er.extensions.appserver.ERXComponentRequestHandler.handleRequest(ERXComponentRequestHandler.java:377) at com.webobjects.appserver.WOApplication.dispatchRequest(WOApplication.java:1687) at er.extensions.appserver.ERXApplication.dispatchRequestImmediately(ERXApplication.java:2139) at er.extensions.appserver.ERXApplication.dispatchRequest(ERXApplication.java:2104) at com.webobjects.appserver._private.WOWorkerThread.runOnce(WOWorkerThread.java:144) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:226) at java.lang.Thread.run(Thread.java:680) Locked ownable synchronizers: - 20ce7bbc0 (a java.util.concurrent.locks.ReentrantLock$NonfairSync) Now, ERXApplication.restoreSessionWithID contains an interesting call to useSessionStoreDeadlockDetection(), but this detection only works in single threaded mode. I'm afraid I can't afford to switch off concurrent requests even for a testing period in production. I'm looking for someone with experience regarding this problem. The doc for that method mentions that it could help to find cases when a session is checked out twice in a single RR-loop, which will lead to a session store lockup. Since I cannot switch on this detection, what in your experience could lead to that happening? Thanks Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOSessionStore deadlocks
Hi Maik, This can also indicate some other things too: - session did not get checked in (app threw OutOfMemory, sleep() threw an exception) - previous request for this session is still running (deadlock, waiting, infinite loop) - 2+ requests for the same session in rapid sequence where the first terminates the session Chuck On 2012-10-19, at 4:00 AM, Maik Musall wrote: Hi, I recently discovered what may be responsible for frequent deadlocks of an application here. In the jstack -l output, I see almost all threads waiting on a single ReentrantLock, and this thread is what holds that lock: WorkerThread4 prio=5 tid=103bc9000 nid=0x132caf000 in Object.wait() [132cae000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 22711d098 (a com.webobjects.appserver.WOSessionStore$TimeoutEntry) at java.lang.Object.wait(Object.java:485) at com.webobjects.appserver.WOSessionStore.checkOutSessionWithID(WOSessionStore.java:191) - locked 22711d098 (a com.webobjects.appserver.WOSessionStore$TimeoutEntry) at com.webobjects.appserver.WOApplication.restoreSessionWithID(WOApplication.java:1913) at er.extensions.appserver.ERXApplication.restoreSessionWithID(ERXApplication.java:2440) at er.extensions.appserver.ERXComponentRequestHandler._dispatchWithPreparedApplication(ERXComponentRequestHandler.java:260) at er.extensions.appserver.ERXComponentRequestHandler._handleRequest(ERXComponentRequestHandler.java:302) at er.extensions.appserver.ERXComponentRequestHandler.handleRequest(ERXComponentRequestHandler.java:377) at com.webobjects.appserver.WOApplication.dispatchRequest(WOApplication.java:1687) at er.extensions.appserver.ERXApplication.dispatchRequestImmediately(ERXApplication.java:2139) at er.extensions.appserver.ERXApplication.dispatchRequest(ERXApplication.java:2104) at com.webobjects.appserver._private.WOWorkerThread.runOnce(WOWorkerThread.java:144) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:226) at java.lang.Thread.run(Thread.java:680) Locked ownable synchronizers: - 20ce7bbc0 (a java.util.concurrent.locks.ReentrantLock$NonfairSync) Now, ERXApplication.restoreSessionWithID contains an interesting call to useSessionStoreDeadlockDetection(), but this detection only works in single threaded mode. I'm afraid I can't afford to switch off concurrent requests even for a testing period in production. I'm looking for someone with experience regarding this problem. The doc for that method mentions that it could help to find cases when a session is checked out twice in a single RR-loop, which will lead to a session store lockup. Since I cannot switch on this detection, what in your experience could lead to that happening? Thanks Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/chill%40global-village.net This email sent to ch...@global-village.net -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects Global Village Consulting ranks 13th in 2012 in BIV's Top 100 Fastest Growing Companies in B.C! Global Village Consulting ranks 76th in 24th annual PROFIT 200 ranking of Canada’s Fastest-Growing Companies by PROFIT Magazine! ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi Chuck, many thanks for your answer. Am 13.09.2012 20:06, schrieb Chuck Hill: Hi Susanne, On 2012-09-13, at 8:57 AM, Susanne Schneider wrote: Hi all, please allow me to add one question regarding this interesting topic. Alexis Tual (my mail client has problem with correct quoting) has suggested for EOF background handling: snip ec.lock(); try { // huge loop to compute stats for (i = 0; i 100; i++) { // doing stuff with ec... // cycling the ec if (i % 100 == 0) { ec.unlock(); ec.dispose(); ec = newEditingContextForMyWork(); ec.lock(); } } } finally { ec.unlock(); } /snip Now my question: is it correct to dispose the ec after unlock or would it be better to do this beforehand, like: ec.dipsose(); ec.unlock(); It is correct to unlock it before disposing it. Good to know, we will do it this way. If I turn on the ec-lock logging in my application, there are many remarks from the Finalizers like: *** EOEditingContext: access with no lock: _eoForGID()! Is this a real problem or can it be ignored? I am not sure, can you send the full stack trace? There is nor real exception, just the logging message. We have turned on debugging with NSLog.debug.setAllowedDebugLevel(NSLog.DebugLevelInformational); NSLog.allowDebugLoggingForGroups(NSLog.DebugGroupMultithreading); EOObjectStore._resetAssertLock(); in the application constructor because we were experiencing sporadic deadlocks and hoped to get some information of any EC locking problem that way. Besides other information (about real unlocked ec usage) this results in messages like [120726 18:54:07] DEBUG Finalizer com.webobjects - *** EOEditingContext: access with no lock: _eoForGID()! at random intervals (whenever the garbage collection is done). There seem to be nothing related to this message. Explicitly disposing any local ec seems to help regarding this special message. But because I am not so familiar with the EOF internals, I was not sure if this is a real problem or just too chatty logging. Best regards. Susanne -- Susanne Schneider Coordinator secuTrial Development iAS interActive Systems GmbH Dieffenbachstraße 33 c, D-10967 Berlin fon+49(0)30 22 50 50 - 498 fax+49(0)30 22 50 50 - 451 mail susanne.schnei...@interactive-systems.de webhttp://www.interActive-Systems.de Geschäftsführer: Dr. Marko Reschke, Thomas Fritzsche Sitz der Gesellschaft: Berlin Amtsgericht Berlin Charlottenburg, HRB 106103B ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
I think you can safely ignore warnings from the finalizer. On 2012-09-14, at 9:28 AM, Susanne Schneider wrote: Hi Chuck, many thanks for your answer. Am 13.09.2012 20:06, schrieb Chuck Hill: Hi Susanne, On 2012-09-13, at 8:57 AM, Susanne Schneider wrote: Hi all, please allow me to add one question regarding this interesting topic. Alexis Tual (my mail client has problem with correct quoting) has suggested for EOF background handling: snip ec.lock(); try { // huge loop to compute stats for (i = 0; i 100; i++) { // doing stuff with ec... // cycling the ec if (i % 100 == 0) { ec.unlock(); ec.dispose(); ec = newEditingContextForMyWork(); ec.lock(); } } } finally { ec.unlock(); } /snip Now my question: is it correct to dispose the ec after unlock or would it be better to do this beforehand, like: ec.dipsose(); ec.unlock(); It is correct to unlock it before disposing it. Good to know, we will do it this way. If I turn on the ec-lock logging in my application, there are many remarks from the Finalizers like: *** EOEditingContext: access with no lock: _eoForGID()! Is this a real problem or can it be ignored? I am not sure, can you send the full stack trace? There is nor real exception, just the logging message. We have turned on debugging with NSLog.debug.setAllowedDebugLevel(NSLog.DebugLevelInformational); NSLog.allowDebugLoggingForGroups(NSLog.DebugGroupMultithreading); EOObjectStore._resetAssertLock(); in the application constructor because we were experiencing sporadic deadlocks and hoped to get some information of any EC locking problem that way. Besides other information (about real unlocked ec usage) this results in messages like [120726 18:54:07] DEBUG Finalizer com.webobjects - *** EOEditingContext: access with no lock: _eoForGID()! at random intervals (whenever the garbage collection is done). There seem to be nothing related to this message. Explicitly disposing any local ec seems to help regarding this special message. But because I am not so familiar with the EOF internals, I was not sure if this is a real problem or just too chatty logging. Best regards. Susanne -- Susanne Schneider Coordinator secuTrial Development iAS interActive Systems GmbH Dieffenbachstraße 33 c, D-10967 Berlin fon+49(0)30 22 50 50 - 498 fax+49(0)30 22 50 50 - 451 mail susanne.schnei...@interactive-systems.de webhttp://www.interActive-Systems.de Geschäftsführer: Dr. Marko Reschke, Thomas Fritzsche Sitz der Gesellschaft: Berlin Amtsgericht Berlin Charlottenburg, HRB 106103B -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi Alex, Hi Chuck, Am 13.09.2012 um 02:28 schrieb Chuck Hill ch...@global-village.net: Never out of memory. The app is allowed to grow up to 24 GByte, stays in the 1-4 GByte range in normal use and occasionally grows up to 12 GByte with the most advanced statistics that tend to suck in the whole database. That's also the reason though that I can't use separate EOF stacks for the statistics, because as soon as there were more than one of them, I'd have multiple 10 GByte-ish snapshot caches. The server has 48 GByte and I don't really want to hit that limit... and with separate stacks, it also would be difficult to keep the stats reflect current changes in the other stacks. I am not sure about the background threads (depends on how long OSC locks are held), but using ECs sharing the same EOF stack with regular requests is likely to cause problems like you are seeing. Do you mean that the application would be unresponsive while the lock was held in the background thread, or that simply doing it that way will lead to unrecoverable deadlocks? If you do massive fetches in the background, that will block other requests as the only OSC is locked. Correct. That said, I think (and correct me if I'm wrong) if you lock the ec but do not fetch anything with this ec, other ecs can still access the db. Also correct. The lock contention is only when fetching or saving. It can also happen if your code (or Wonder code that you are using) locks something in EOControl or EOAccess. I'm very familiar with that stuff, and my users know how it feels to wait for that lock :-) Anyway, the best practice is to use a dedicated OSC to do background work. Maik, you should use a dedicated OSC for your stats, and try, if possible to clean memory, for example : ec.lock(); try { // huge loop to compute stats for (i = 0; i 100; i++) { // doing stuff with ec... // cycling the ec if (i % 100 == 0) { ec.unlock(); ec.dispose(); ec = newEditingContextForMyWork(); ec.lock(); } } } finally { ec.unlock(); } If practical (I recall that it is not in Maik's case) that can be a good way of limiting memory usage. Right, not practical for me. I even rely on those statistics to fill the snapshot cache with data that other users will need in a minute anyway to speed up overall response times. Those statistics are not strictly background processes, they are user interaction that happens to be implemented in a worker thread while the user is displayed a long response page. What I've done to improve concurrent response times while those stats fetch their 30 EOs: I fetch them in batches of a few 1000 and release the lock in between. This is the method I can call on my manual-locking editing context between batches: public void shortLockRelease() { unlock(); try { Thread.sleep( 50 ); } catch( InterruptedException e ) { e.printStackTrace(); } finally { lock(); } } This effectively gives other threads the opportunity to sneak in a few transactions before the stats worker resumes grabbing the OSC's resources, and is enough to keep response times within a reasonable limit. Users feel it when stats are running, but they don't have to really wait any more. I've even tuned those 50 ms. Less than that and don't get the desired effect. More than that and you needlessly increase the stats execution time. Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
2012/9/13 Maik Musall m...@selbstdenker.ag Hi Alex, Hi Chuck, Am 13.09.2012 um 02:28 schrieb Chuck Hill ch...@global-village.net: Never out of memory. The app is allowed to grow up to 24 GByte, stays in the 1-4 GByte range in normal use and occasionally grows up to 12 GByte with the most advanced statistics that tend to suck in the whole database. That's also the reason though that I can't use separate EOF stacks for the statistics, because as soon as there were more than one of them, I'd have multiple 10 GByte-ish snapshot caches. The server has 48 GByte and I don't really want to hit that limit... and with separate stacks, it also would be difficult to keep the stats reflect current changes in the other stacks. I am not sure about the background threads (depends on how long OSC locks are held), but using ECs sharing the same EOF stack with regular requests is likely to cause problems like you are seeing. Do you mean that the application would be unresponsive while the lock was held in the background thread, or that simply doing it that way will lead to unrecoverable deadlocks? If you do massive fetches in the background, that will block other requests as the only OSC is locked. Correct. That said, I think (and correct me if I'm wrong) if you lock the ec but do not fetch anything with this ec, other ecs can still access the db. Also correct. The lock contention is only when fetching or saving. It can also happen if your code (or Wonder code that you are using) locks something in EOControl or EOAccess. I'm very familiar with that stuff, and my users know how it feels to wait for that lock :-) Anyway, the best practice is to use a dedicated OSC to do background work. Maik, you should use a dedicated OSC for your stats, and try, if possible to clean memory, for example : ec.lock(); try { // huge loop to compute stats for (i = 0; i 100; i++) { // doing stuff with ec... // cycling the ec if (i % 100 == 0) { ec.unlock(); ec.dispose(); ec = newEditingContextForMyWork(); ec.lock(); } } } finally { ec.unlock(); } If practical (I recall that it is not in Maik's case) that can be a good way of limiting memory usage. Right, not practical for me. I even rely on those statistics to fill the snapshot cache with data that other users will need in a minute anyway to speed up overall response times. Those statistics are not strictly background processes, they are user interaction that happens to be implemented in a worker thread while the user is displayed a long response page. What I've done to improve concurrent response times while those stats fetch their 30 EOs: I fetch them in batches of a few 1000 and release the lock in between. This is the method I can call on my manual-locking editing context between batches: public void shortLockRelease() { unlock(); try { Thread.sleep( 50 ); } catch( InterruptedException e ) { e.printStackTrace(); } finally { lock(); } } This effectively gives other threads the opportunity to sneak in a few transactions before the stats worker resumes grabbing the OSC's resources, and is enough to keep response times within a reasonable limit. Users feel it when stats are running, but they don't have to really wait any more. I've even tuned those 50 ms. Less than that and don't get the desired effect. More than that and you needlessly increase the stats execution time. Interesting setup, thanks for sharing, looks like one giant VM (and EOF) can handle this amount of objects ! If the DB is touched by this app only, you could fetch all the stats at startup... but I imagine this is more complicated :) Alex ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi all, please allow me to add one question regarding this interesting topic. Alexis Tual (my mail client has problem with correct quoting) has suggested for EOF background handling: snip ec.lock(); try { // huge loop to compute stats for (i = 0; i 100; i++) { // doing stuff with ec... // cycling the ec if (i % 100 == 0) { ec.unlock(); ec.dispose(); ec = newEditingContextForMyWork(); ec.lock(); } } } finally { ec.unlock(); } /snip Now my question: is it correct to dispose the ec after unlock or would it be better to do this beforehand, like: ec.dipsose(); ec.unlock(); If I turn on the ec-lock logging in my application, there are many remarks from the Finalizers like: *** EOEditingContext: access with no lock: _eoForGID()! Is this a real problem or can it be ignored? Best regards, Susanne -- Susanne Schneider Coordinator secuTrial Development iAS interActive Systems GmbH Dieffenbachstraße 33 c, 10967 Berlin fon+49 30 22 50 50 - 498 fax+49 30 22 50 50 - 451 mail susanne.schnei...@interactive-systems.de webhttp://www.interActive-Systems.de Geschäftsführer: Dr. Marko Reschke, Thomas Fritzsche Sitz der Gesellschaft: Berlin Amtsgericht Berlin Charlottenburg, HRB 106103B ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi Susanne, On 2012-09-13, at 8:57 AM, Susanne Schneider wrote: Hi all, please allow me to add one question regarding this interesting topic. Alexis Tual (my mail client has problem with correct quoting) has suggested for EOF background handling: snip ec.lock(); try { // huge loop to compute stats for (i = 0; i 100; i++) { // doing stuff with ec... // cycling the ec if (i % 100 == 0) { ec.unlock(); ec.dispose(); ec = newEditingContextForMyWork(); ec.lock(); } } } finally { ec.unlock(); } /snip Now my question: is it correct to dispose the ec after unlock or would it be better to do this beforehand, like: ec.dipsose(); ec.unlock(); It is correct to unlock it before disposing it. If I turn on the ec-lock logging in my application, there are many remarks from the Finalizers like: *** EOEditingContext: access with no lock: _eoForGID()! Is this a real problem or can it be ignored? I am not sure, can you send the full stack trace? Chuck -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
The state the app was in when I took that jstack was that no login was possible and user's requests would not return, ultimately running into no instance responses after the timeout elapsed. Grep the app logs for OutOfMemory, that is one possibility. They look ready to accept connections. It could also be that they got so back logged that wotaskd gave up on them and decided they were dead. Having the lower numbers above should help in this respect - the app will be able to recover more quickly. Never out of memory. The app is allowed to grow up to 24 GByte, stays in the 1-4 GByte range in normal use and occasionally grows up to 12 GByte with the most advanced statistics that tend to suck in the whole database. That's also the reason though that I can't use separate EOF stacks for the statistics, because as soon as there were more than one of them, I'd have multiple 10 GByte-ish snapshot caches. The server has 48 GByte and I don't really want to hit that limit... and with separate stacks, it also would be difficult to keep the stats reflect current changes in the other stacks. I am not sure about the background threads (depends on how long OSC locks are held), but using ECs sharing the same EOF stack with regular requests is likely to cause problems like you are seeing. Do you mean that the application would be unresponsive while the lock was held in the background thread, or that simply doing it that way will lead to unrecoverable deadlocks? ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi, 2012/9/13 John Huss johnth...@gmail.com The state the app was in when I took that jstack was that no login was possible and user's requests would not return, ultimately running into no instance responses after the timeout elapsed. Grep the app logs for OutOfMemory, that is one possibility. They look ready to accept connections. It could also be that they got so back logged that wotaskd gave up on them and decided they were dead. Having the lower numbers above should help in this respect - the app will be able to recover more quickly. Never out of memory. The app is allowed to grow up to 24 GByte, stays in the 1-4 GByte range in normal use and occasionally grows up to 12 GByte with the most advanced statistics that tend to suck in the whole database. That's also the reason though that I can't use separate EOF stacks for the statistics, because as soon as there were more than one of them, I'd have multiple 10 GByte-ish snapshot caches. The server has 48 GByte and I don't really want to hit that limit... and with separate stacks, it also would be difficult to keep the stats reflect current changes in the other stacks. I am not sure about the background threads (depends on how long OSC locks are held), but using ECs sharing the same EOF stack with regular requests is likely to cause problems like you are seeing. Do you mean that the application would be unresponsive while the lock was held in the background thread, or that simply doing it that way will lead to unrecoverable deadlocks? If you do massive fetches in the background, that will block other requests as the only OSC is locked. That said, I think (and correct me if I'm wrong) if you lock the ec but do not fetch anything with this ec, other ecs can still access the db. Anyway, the best practice is to use a dedicated OSC to do background work. Maik, you should use a dedicated OSC for your stats, and try, if possible to clean memory, for example : ec.lock(); try { // huge loop to compute stats for (i = 0; i 100; i++) { // doing stuff with ec... // cycling the ec if (i % 100 == 0) { ec.unlock(); ec.dispose(); ec = newEditingContextForMyWork(); ec.lock(); } } } finally { ec.unlock(); } Alex ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi John, On 2012-09-12, at 7:13 AM, John Huss wrote: The state the app was in when I took that jstack was that no login was possible and user's requests would not return, ultimately running into no instance responses after the timeout elapsed. Grep the app logs for OutOfMemory, that is one possibility. They look ready to accept connections. It could also be that they got so back logged that wotaskd gave up on them and decided they were dead. Having the lower numbers above should help in this respect - the app will be able to recover more quickly. Never out of memory. The app is allowed to grow up to 24 GByte, stays in the 1-4 GByte range in normal use and occasionally grows up to 12 GByte with the most advanced statistics that tend to suck in the whole database. That's also the reason though that I can't use separate EOF stacks for the statistics, because as soon as there were more than one of them, I'd have multiple 10 GByte-ish snapshot caches. The server has 48 GByte and I don't really want to hit that limit... and with separate stacks, it also would be difficult to keep the stats reflect current changes in the other stacks. I am not sure about the background threads (depends on how long OSC locks are held), but using ECs sharing the same EOF stack with regular requests is likely to cause problems like you are seeing. Do you mean that the application would be unresponsive while the lock was held in the background thread, or that simply doing it that way will lead to unrecoverable deadlocks? I meant that when the EC locks the OSC (e.g during fetches and saves) it would block all other requests also needing to lock the OSC. If the background thread's locks of the OSC are very short in duration (and also not happening constantly) it would have little effect on the other request. However that is not what background processing is often used for. Chuck -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
On 2012-09-12, at 2:58 PM, Alexis Tual wrote: Hi, 2012/9/13 John Huss johnth...@gmail.com The state the app was in when I took that jstack was that no login was possible and user's requests would not return, ultimately running into no instance responses after the timeout elapsed. Grep the app logs for OutOfMemory, that is one possibility. They look ready to accept connections. It could also be that they got so back logged that wotaskd gave up on them and decided they were dead. Having the lower numbers above should help in this respect - the app will be able to recover more quickly. Never out of memory. The app is allowed to grow up to 24 GByte, stays in the 1-4 GByte range in normal use and occasionally grows up to 12 GByte with the most advanced statistics that tend to suck in the whole database. That's also the reason though that I can't use separate EOF stacks for the statistics, because as soon as there were more than one of them, I'd have multiple 10 GByte-ish snapshot caches. The server has 48 GByte and I don't really want to hit that limit... and with separate stacks, it also would be difficult to keep the stats reflect current changes in the other stacks. I am not sure about the background threads (depends on how long OSC locks are held), but using ECs sharing the same EOF stack with regular requests is likely to cause problems like you are seeing. Do you mean that the application would be unresponsive while the lock was held in the background thread, or that simply doing it that way will lead to unrecoverable deadlocks? If you do massive fetches in the background, that will block other requests as the only OSC is locked. Correct. That said, I think (and correct me if I'm wrong) if you lock the ec but do not fetch anything with this ec, other ecs can still access the db. Also correct. The lock contention is only when fetching or saving. It can also happen if your code (or Wonder code that you are using) locks something in EOControl or EOAccess. Anyway, the best practice is to use a dedicated OSC to do background work. Maik, you should use a dedicated OSC for your stats, and try, if possible to clean memory, for example : ec.lock(); try { // huge loop to compute stats for (i = 0; i 100; i++) { // doing stuff with ec... // cycling the ec if (i % 100 == 0) { ec.unlock(); ec.dispose(); ec = newEditingContextForMyWork(); ec.lock(); } } } finally { ec.unlock(); } If practical (I recall that it is not in Maik's case) that can be a good way of limiting memory usage. Chuck -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi Chuck, Am 10.09.2012 um 22:30 schrieb Chuck Hill ch...@global-village.net: The state the app was in when I took that jstack was that no login was possible and user's requests would not return, ultimately running into no instance responses after the timeout elapsed. Grep the app logs for OutOfMemory, that is one possibility. They look ready to accept connections. It could also be that they got so back logged that wotaskd gave up on them and decided they were dead. Having the lower numbers above should help in this respect - the app will be able to recover more quickly. Never out of memory. The app is allowed to grow up to 24 GByte, stays in the 1-4 GByte range in normal use and occasionally grows up to 12 GByte with the most advanced statistics that tend to suck in the whole database. That's also the reason though that I can't use separate EOF stacks for the statistics, because as soon as there were more than one of them, I'd have multiple 10 GByte-ish snapshot caches. The server has 48 GByte and I don't really want to hit that limit... and with separate stacks, it also would be difficult to keep the stats reflect current changes in the other stacks. Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi Alexis, Am 10.09.2012 um 23:19 schrieb Alexis Tual alexis.t...@gmail.com: Note that I recently switched to Wonder for this project (using all the Wonder base classes), and since I did, this problem occurred more frequently. It's now almost once a day, and was about once a week before. I switched from MultiECLockManager to ERXEC with autolocking in the process. I've seen you have long response pages, have you turned off autolocking for these special cases ? Good point. I just checked: those are simple WOLongResponsePages that don't hold anything regarding EOF, just wait for the background worker thread to notify when it's done. The background workers all use manual locking, but some of them don't explicitly use my manual locking EC factory but use an autolocking EC and do manual locking on top. I'll correct that, thanks. Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Am 11.09.2012 um 09:10 schrieb Maik Musall m...@selbstdenker.ag: Hi Alexis, Am 10.09.2012 um 23:19 schrieb Alexis Tual alexis.t...@gmail.com: Note that I recently switched to Wonder for this project (using all the Wonder base classes), and since I did, this problem occurred more frequently. It's now almost once a day, and was about once a week before. I switched from MultiECLockManager to ERXEC with autolocking in the process. I've seen you have long response pages, have you turned off autolocking for these special cases ? Good point. I just checked: those are simple WOLongResponsePages that don't hold anything regarding EOF, just wait for the background worker thread to notify when it's done. The background workers all use manual locking, but some of them don't explicitly use my manual locking EC factory but use an autolocking EC and do manual locking on top. I'll correct that, thanks. Hmm, seems I have the choice between * use manual locking only in those background worker threads * diss manual locks and rely on autolocking for them. Worker threads are all implemented like this: public void run() { localEC.lock(); try { // heavy duty fetches, batchfetches, filtering and stuff that can take a minute } finally { localEC.unlock(); } } What would you recommend? My ERXEC-subclass-factory can give me either type. Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Use manual locking for your background threads, your snippet is right, be sure your localEC has autolock set to false. Check out Kerian presentation at WOWODC 2011 : http://www.wocommunity.org/podcasts/wowodc/2011/BackgroundTasks.mov and the examples : https://github.com/projectwonder/wonder/tree/master/Examples/Misc/BackgroundTasks Good luck, Alex 2012/9/11 Maik Musall m...@selbstdenker.ag Am 11.09.2012 um 09:10 schrieb Maik Musall m...@selbstdenker.ag: Hi Alexis, Am 10.09.2012 um 23:19 schrieb Alexis Tual alexis.t...@gmail.com: Note that I recently switched to Wonder for this project (using all the Wonder base classes), and since I did, this problem occurred more frequently. It's now almost once a day, and was about once a week before. I switched from MultiECLockManager to ERXEC with autolocking in the process. I've seen you have long response pages, have you turned off autolocking for these special cases ? Good point. I just checked: those are simple WOLongResponsePages that don't hold anything regarding EOF, just wait for the background worker thread to notify when it's done. The background workers all use manual locking, but some of them don't explicitly use my manual locking EC factory but use an autolocking EC and do manual locking on top. I'll correct that, thanks. Hmm, seems I have the choice between * use manual locking only in those background worker threads * diss manual locks and rely on autolocking for them. Worker threads are all implemented like this: public void run() { localEC.lock(); try { // heavy duty fetches, batchfetches, filtering and stuff that can take a minute } finally { localEC.unlock(); } } What would you recommend? My ERXEC-subclass-factory can give me either type. Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
On 2012-09-10, at 11:15 PM, Maik Musall wrote: Hi Chuck, Am 10.09.2012 um 22:30 schrieb Chuck Hill ch...@global-village.net: The state the app was in when I took that jstack was that no login was possible and user's requests would not return, ultimately running into no instance responses after the timeout elapsed. Grep the app logs for OutOfMemory, that is one possibility. They look ready to accept connections. It could also be that they got so back logged that wotaskd gave up on them and decided they were dead. Having the lower numbers above should help in this respect - the app will be able to recover more quickly. Never out of memory. The app is allowed to grow up to 24 GByte, stays in the 1-4 GByte range in normal use and occasionally grows up to 12 GByte with the most advanced statistics that tend to suck in the whole database. That's also the reason though that I can't use separate EOF stacks for the statistics, because as soon as there were more than one of them, I'd have multiple 10 GByte-ish snapshot caches. The server has 48 GByte and I don't really want to hit that limit... and with separate stacks, it also would be difficult to keep the stats reflect current changes in the other stacks. I am not sure about the background threads (depends on how long OSC locks are held), but using ECs sharing the same EOF stack with regular requests is likely to cause problems like you are seeing. Chuck -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
WOWorkerThread deadlocks
Hi, in an app with high concurrency, the app sometimes becomes unresponsive to everything but DirectActions at the time of day with the most concurrency. All users aren't seeing responses any more. In jstack I see hundreds of these: WorkerThread207 prio=5 tid=131e0a800 nid=0x151aa2000 waiting for monitor entry [151aa1000] java.lang.Thread.State: BLOCKED (on object monitor) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:406) - waiting to lock 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:210) at java.lang.Thread.run(Thread.java:680) all waiting on the same lock 20d3da450, and one thread holding that lock: WorkerThread206 prio=5 tid=131d79800 nid=0x15199f000 runnable [15199e000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408) - locked 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:210) at java.lang.Thread.run(Thread.java:680) Anyone familiar with this problem? Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi, Isn't that normal? Only one thread can be accepting at any time, when it accepts, it releases the lock for the next one to enter the accept state. I think those are not the threads you are looking for… Regards, Miguel Arroz On 2012-09-10, at 8:03 AM, Maik Musall m...@selbstdenker.ag wrote: Hi, in an app with high concurrency, the app sometimes becomes unresponsive to everything but DirectActions at the time of day with the most concurrency. All users aren't seeing responses any more. In jstack I see hundreds of these: WorkerThread207 prio=5 tid=131e0a800 nid=0x151aa2000 waiting for monitor entry [151aa1000] java.lang.Thread.State: BLOCKED (on object monitor) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:406) - waiting to lock 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:210) at java.lang.Thread.run(Thread.java:680) all waiting on the same lock 20d3da450, and one thread holding that lock: WorkerThread206 prio=5 tid=131d79800 nid=0x15199f000 runnable [15199e000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408) - locked 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:210) at java.lang.Thread.run(Thread.java:680) Anyone familiar with this problem? Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/arroz%40guiamac.com This email sent to ar...@guiamac.com ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi Maik, WorkerThread207 that many worker threads indicates two things to me: 1. Your app configuration is too high. I'd use a max of 6-10 and a listen queue size of around 4 (adjusted to suit your specific needs). A WO app is very, very unlikely to recover from a 200 worker thread backlog in any way that is useful to the users 2. You have a thread that is taking a long time to return a result. If you are dispatching requests concurrently, then this is most likely stuck in EOControl/EOAccess (e.g. waiting for a slow query result) or connecting to some external process. You could also have a deadlock. If you are not dispatching requests concurrently, then this delay could be in other code. The traces below do not show the problem. If you want to send a full dump, I am willing to look at it. It is possible that the problem had resolved by the time you took this dump. What you show below is normal for a lot of worker threads. WorkerThread206 is waiting for a new request, WorkerThread207 is idle waiting for something to do in the future. Chuck On 2012-09-10, at 8:03 AM, Maik Musall wrote: Hi, in an app with high concurrency, the app sometimes becomes unresponsive to everything but DirectActions at the time of day with the most concurrency. All users aren't seeing responses any more. In jstack I see hundreds of these: WorkerThread207 prio=5 tid=131e0a800 nid=0x151aa2000 waiting for monitor entry [151aa1000] java.lang.Thread.State: BLOCKED (on object monitor) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:406) - waiting to lock 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:210) at java.lang.Thread.run(Thread.java:680) all waiting on the same lock 20d3da450, and one thread holding that lock: WorkerThread206 prio=5 tid=131d79800 nid=0x15199f000 runnable [15199e000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408) - locked 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:210) at java.lang.Thread.run(Thread.java:680) Anyone familiar with this problem? Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/chill%40global-village.net This email sent to ch...@global-village.net -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi Chuck, Am 10.09.2012 um 19:23 schrieb Chuck Hill ch...@global-village.net: Hi Maik, WorkerThread207 that many worker threads indicates two things to me: 1. Your app configuration is too high. I'd use a max of 6-10 and a listen queue size of around 4 (adjusted to suit your specific needs). A WO app is very, very unlikely to recover from a 200 worker thread backlog in any way that is useful to the users You may be right, they were at 16/512/8/128. I just set them to 4/8/8/6 and am eager to watch the behaviour tomorrow. There are up to 100 users concurrently (it's a backoffice app), although concurrently running requests are typically not more than 2-3, plus 1-2 DirectActions, plus possibly 1-2 long response pages running statistics stuff. 2. You have a thread that is taking a long time to return a result. If you are dispatching requests concurrently, then this is most likely stuck in EOControl/EOAccess (e.g. waiting for a slow query result) or connecting to some external process. You could also have a deadlock. If you are not dispatching requests concurrently, then this delay could be in other code. When that situation occurs, the app is not using CPU any more, neither is the database. It often doesn't respond to SIGTERM any more and needs SIGKILL to terminate so we can restart. The traces below do not show the problem. If you want to send a full dump, I am willing to look at it. It is possible that the problem had resolved by the time you took this dump. What you show below is normal for a lot of worker threads. WorkerThread206 is waiting for a new request, WorkerThread207 is idle waiting for something to do in the future. Thanks for the offer; here is the full jstack output: http://akaihi.selbstdenker.com/~maik/jstack_powerd_20120910.txt Maik On 2012-09-10, at 8:03 AM, Maik Musall wrote: Hi, in an app with high concurrency, the app sometimes becomes unresponsive to everything but DirectActions at the time of day with the most concurrency. All users aren't seeing responses any more. In jstack I see hundreds of these: WorkerThread207 prio=5 tid=131e0a800 nid=0x151aa2000 waiting for monitor entry [151aa1000] java.lang.Thread.State: BLOCKED (on object monitor) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:406) - waiting to lock 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:210) at java.lang.Thread.run(Thread.java:680) all waiting on the same lock 20d3da450, and one thread holding that lock: WorkerThread206 prio=5 tid=131d79800 nid=0x15199f000 runnable [15199e000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408) - locked 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:210) at java.lang.Thread.run(Thread.java:680) Anyone familiar with this problem? Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/chill%40global-village.net This email sent to ch...@global-village.net -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi Maik, On 2012-09-10, at 11:04 AM, Maik Musall wrote: Hi Chuck, Am 10.09.2012 um 19:23 schrieb Chuck Hill ch...@global-village.net: Hi Maik, WorkerThread207 that many worker threads indicates two things to me: 1. Your app configuration is too high. I'd use a max of 6-10 and a listen queue size of around 4 (adjusted to suit your specific needs). A WO app is very, very unlikely to recover from a 200 worker thread backlog in any way that is useful to the users You may be right, they were at 16/512/8/128. I just set them to 4/8/8/6 and am eager to watch the behaviour tomorrow. You should at least know when there is a problem sooner. Then as quickly as you can, get a thread dump with jstack. There are up to 100 users concurrently (it's a backoffice app), although concurrently running requests are typically not more than 2-3, plus 1-2 DirectActions, plus possibly 1-2 long response pages running statistics stuff. OK, the 4/8/8/6 numbers you have seem reasonable for that load. 2. You have a thread that is taking a long time to return a result. If you are dispatching requests concurrently, then this is most likely stuck in EOControl/EOAccess (e.g. waiting for a slow query result) or connecting to some external process. You could also have a deadlock. If you are not dispatching requests concurrently, then this delay could be in other code. When that situation occurs, the app is not using CPU any more, neither is the database. It often doesn't respond to SIGTERM any more and needs SIGKILL to terminate so we can restart. That sounds like what a blocked non-daemon thread would cause. The traces below do not show the problem. If you want to send a full dump, I am willing to look at it. It is possible that the problem had resolved by the time you took this dump. What you show below is normal for a lot of worker threads. WorkerThread206 is waiting for a new request, WorkerThread207 is idle waiting for something to do in the future. Thanks for the offer; here is the full jstack output: http://akaihi.selbstdenker.com/~maik/jstack_powerd_20120910.txt Other than having a large number of idle worker threads, there is nothing in that trace that indicates the problem. In my experience, that means that they problem has resolved itself and the application recovered. You will need to run jstack closer to the start of the problem even to capture what is going wrong. Chuck On 2012-09-10, at 8:03 AM, Maik Musall wrote: Hi, in an app with high concurrency, the app sometimes becomes unresponsive to everything but DirectActions at the time of day with the most concurrency. All users aren't seeing responses any more. In jstack I see hundreds of these: WorkerThread207 prio=5 tid=131e0a800 nid=0x151aa2000 waiting for monitor entry [151aa1000] java.lang.Thread.State: BLOCKED (on object monitor) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:406) - waiting to lock 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:210) at java.lang.Thread.run(Thread.java:680) all waiting on the same lock 20d3da450, and one thread holding that lock: WorkerThread206 prio=5 tid=131d79800 nid=0x15199f000 runnable [15199e000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408) - locked 20d3da450 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at com.webobjects.appserver._private.WOWorkerThread.run(WOWorkerThread.java:210) at java.lang.Thread.run(Thread.java:680) Anyone familiar with this problem? Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/chill%40global-village.net This email sent to ch...@global-village.net -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects ___ Do not post admin requests to the list. They will be
Re: WOWorkerThread deadlocks
Hi Chuck, Am 10.09.2012 um 21:35 schrieb Chuck Hill ch...@global-village.net: WorkerThread207 that many worker threads indicates two things to me: 1. Your app configuration is too high. I'd use a max of 6-10 and a listen queue size of around 4 (adjusted to suit your specific needs). A WO app is very, very unlikely to recover from a 200 worker thread backlog in any way that is useful to the users You may be right, they were at 16/512/8/128. I just set them to 4/8/8/6 and am eager to watch the behaviour tomorrow. You should at least know when there is a problem sooner. Then as quickly as you can, get a thread dump with jstack. There are up to 100 users concurrently (it's a backoffice app), although concurrently running requests are typically not more than 2-3, plus 1-2 DirectActions, plus possibly 1-2 long response pages running statistics stuff. OK, the 4/8/8/6 numbers you have seem reasonable for that load. 2. You have a thread that is taking a long time to return a result. If you are dispatching requests concurrently, then this is most likely stuck in EOControl/EOAccess (e.g. waiting for a slow query result) or connecting to some external process. You could also have a deadlock. If you are not dispatching requests concurrently, then this delay could be in other code. When that situation occurs, the app is not using CPU any more, neither is the database. It often doesn't respond to SIGTERM any more and needs SIGKILL to terminate so we can restart. That sounds like what a blocked non-daemon thread would cause. The traces below do not show the problem. If you want to send a full dump, I am willing to look at it. It is possible that the problem had resolved by the time you took this dump. What you show below is normal for a lot of worker threads. WorkerThread206 is waiting for a new request, WorkerThread207 is idle waiting for something to do in the future. Thanks for the offer; here is the full jstack output: http://akaihi.selbstdenker.com/~maik/jstack_powerd_20120910.txt Other than having a large number of idle worker threads, there is nothing in that trace that indicates the problem. In my experience, that means that they problem has resolved itself and the application recovered. You will need to run jstack closer to the start of the problem even to capture what is going wrong. The state the app was in when I took that jstack was that no login was possible and user's requests would not return, ultimately running into no instance responses after the timeout elapsed. If the problem persists, I think I'll set up a cronjob to record jstacks every couple of minutes or so. Note that I recently switched to Wonder for this project (using all the Wonder base classes), and since I did, this problem occurred more frequently. It's now almost once a day, and was about once a week before. I switched from MultiECLockManager to ERXEC with autolocking in the process. Maik ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi Maik, On 2012-09-10, at 1:07 PM, Maik Musall wrote: Hi Chuck, Am 10.09.2012 um 21:35 schrieb Chuck Hill ch...@global-village.net: WorkerThread207 that many worker threads indicates two things to me: 1. Your app configuration is too high. I'd use a max of 6-10 and a listen queue size of around 4 (adjusted to suit your specific needs). A WO app is very, very unlikely to recover from a 200 worker thread backlog in any way that is useful to the users You may be right, they were at 16/512/8/128. I just set them to 4/8/8/6 and am eager to watch the behaviour tomorrow. You should at least know when there is a problem sooner. Then as quickly as you can, get a thread dump with jstack. There are up to 100 users concurrently (it's a backoffice app), although concurrently running requests are typically not more than 2-3, plus 1-2 DirectActions, plus possibly 1-2 long response pages running statistics stuff. OK, the 4/8/8/6 numbers you have seem reasonable for that load. 2. You have a thread that is taking a long time to return a result. If you are dispatching requests concurrently, then this is most likely stuck in EOControl/EOAccess (e.g. waiting for a slow query result) or connecting to some external process. You could also have a deadlock. If you are not dispatching requests concurrently, then this delay could be in other code. When that situation occurs, the app is not using CPU any more, neither is the database. It often doesn't respond to SIGTERM any more and needs SIGKILL to terminate so we can restart. That sounds like what a blocked non-daemon thread would cause. The traces below do not show the problem. If you want to send a full dump, I am willing to look at it. It is possible that the problem had resolved by the time you took this dump. What you show below is normal for a lot of worker threads. WorkerThread206 is waiting for a new request, WorkerThread207 is idle waiting for something to do in the future. Thanks for the offer; here is the full jstack output: http://akaihi.selbstdenker.com/~maik/jstack_powerd_20120910.txt Other than having a large number of idle worker threads, there is nothing in that trace that indicates the problem. In my experience, that means that they problem has resolved itself and the application recovered. You will need to run jstack closer to the start of the problem even to capture what is going wrong. The state the app was in when I took that jstack was that no login was possible and user's requests would not return, ultimately running into no instance responses after the timeout elapsed. Grep the app logs for OutOfMemory, that is one possibility. They look ready to accept connections. It could also be that they got so back logged that wotaskd gave up on them and decided they were dead. Having the lower numbers above should help in this respect - the app will be able to recover more quickly. If the problem persists, I think I'll set up a cronjob to record jstacks every couple of minutes or so. That might be one way, unless can you babysit it and start grabbing them when the number of active worker threads goes up. Note that I recently switched to Wonder for this project (using all the Wonder base classes), and since I did, this problem occurred more frequently. It's now almost once a day, and was about once a week before. I switched from MultiECLockManager to ERXEC with autolocking in the process. I don't have any suggestions on how that change might cause this to happen more often. Chuck -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/gvc/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
Hi, 2012/9/11 Maik Musall m...@selbstdenker.ag Note that I recently switched to Wonder for this project (using all the Wonder base classes), and since I did, this problem occurred more frequently. It's now almost once a day, and was about once a week before. I switched from MultiECLockManager to ERXEC with autolocking in the process. I've seen you have long response pages, have you turned off autolocking for these special cases ? To help diagnose, you could make a little script to poll your app every 10 sec and if the response contains No instance available, you jstack the process... Alex ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: WOWorkerThread deadlocks
That is a good point! And also make sure that the long requests and background threads are not using the main EOF stack. Chuck On Sep 10, 2012, at 2:19 PM, Alexis Tual alexis.t...@gmail.com wrote: Hi, 2012/9/11 Maik Musall m...@selbstdenker.ag Note that I recently switched to Wonder for this project (using all the Wonder base classes), and since I did, this problem occurred more frequently. It's now almost once a day, and was about once a week before. I switched from MultiECLockManager to ERXEC with autolocking in the process. I've seen you have long response pages, have you turned off autolocking for these special cases ? To help diagnose, you could make a little script to poll your app every 10 sec and if the response contains No instance available, you jstack the process... Alex ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: Deadlocks in one of our apps
... - It's a (non public) online store. When people log in, we create a order in memory and customers add order items to the order. We don't store anything in the DB until the payment is made with PayFlow. When we get the response from PayFlow, we store a copy of the order (and the items) to our Oracle db. After that, we contact our SQL Server db (actually, a accounting system, and we send the data to a stored procedure), and we get the invoice number produced by the accounting system and store it in the order EO in Oracle. So in summary : - People login, we create a order EO, the EO is created in the session's editing context - People add items to the order - They start the order payment steps - Long response page kicks in - We contact PayFlow to make the payment - If the payment is succesful, we store the order in Oracle - We create a new EO, in a different EC, for SQL Server - We update the order EO to store the invoice number in Oracle - We generate (FOXML, generated in a separated JVM) the invoice in PDF - Long response page is done, pageForResult is called Everything is done in session.defaultEditingContext EXCEPT the SQL Server EOs, You are not using the session.defaultEditingContext in the long response page, are you? I am pretty sure that is an excellent source of deadlocks. Hum, yes we do use in the long response page... But since localInstanceOfObject won't let me have a copy in a new EC, what are the options except not using the session EC? Not using the session EC would be a good choice. Make a different EC. Pass it into the long response page. Be careful handing off locking. You could also save the order in an unpaid state, then fetch it in the long response page and update it if paid, or delete it if not. Ooh, yeah, you could do that too. So we end up doing : - before the long response page is called, we save the EO in a unpaid state - in the method called in performAction, we create a new editing context, we lock it and we insert a copy of the order EO into the new EC by using localInstanceOfObject - the bulk of the job is done in a try {} finally { ec.unlock() } - when the method that is called inside performAction have done his job, the order EO is sent back - we override the order EO that was stored in the session default editing context with the one that was inserted in the temporary EC : ((Session )session ()).setCommande ((Commande )EOUtilities.localInstanceOfObject(session().defaultEditingContext(), (Commande)copieCommande)); So far, so good. We will see later today when the load get higher if we get any deadlocks. Many thanks to all who helped out :-) Pascal Robert prob...@macti.ca AIM: MacTICanada Twitter : MacTICanada LinkedIn : http://www.linkedin.com/in/macti WO Community profile : http://wocommunity.org/page/member?name=probert ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: Deadlocks in one of our apps
On Jun 4, 2010, at 4:40 AM, Pascal Robert wrote: ... - It's a (non public) online store. When people log in, we create a order in memory and customers add order items to the order. We don't store anything in the DB until the payment is made with PayFlow. When we get the response from PayFlow, we store a copy of the order (and the items) to our Oracle db. After that, we contact our SQL Server db (actually, a accounting system, and we send the data to a stored procedure), and we get the invoice number produced by the accounting system and store it in the order EO in Oracle. So in summary : - People login, we create a order EO, the EO is created in the session's editing context - People add items to the order - They start the order payment steps - Long response page kicks in - We contact PayFlow to make the payment - If the payment is succesful, we store the order in Oracle - We create a new EO, in a different EC, for SQL Server - We update the order EO to store the invoice number in Oracle - We generate (FOXML, generated in a separated JVM) the invoice in PDF - Long response page is done, pageForResult is called Everything is done in session.defaultEditingContext EXCEPT the SQL Server EOs, You are not using the session.defaultEditingContext in the long response page, are you? I am pretty sure that is an excellent source of deadlocks. Hum, yes we do use in the long response page... But since localInstanceOfObject won't let me have a copy in a new EC, what are the options except not using the session EC? Not using the session EC would be a good choice. Make a different EC. Pass it into the long response page. Be careful handing off locking. You could also save the order in an unpaid state, then fetch it in the long response page and update it if paid, or delete it if not. Ooh, yeah, you could do that too. So we end up doing : - before the long response page is called, we save the EO in a unpaid state - in the method called in performAction, we create a new editing context, we lock it and we insert a copy of the order EO into the new EC by using localInstanceOfObject we fault (not insert) a copy of the order EO into the new EC... Just everyone is clear on what is happening. - the bulk of the job is done in a try {} finally { ec.unlock() } - when the method that is called inside performAction have done his job, the order EO is sent back - we override the order EO that was stored in the session default editing context with the one that was inserted in the temporary EC : ((Session )session ()).setCommande ((Commande )EOUtilities.localInstanceOfObject(session().defaultEditingContext(), (Commande)copieCommande)); I'd put that move between ECs in Session: public void setCommande(Commande c) { command = (Commande)EOUtilities.localInstanceOfObject(defaultEditingContext(),c); } and change the line above to ((Session)session()).setCommande(copieCommande); That way the session is protected and does not rely on the client code being correct. So far, so good. We will see later today when the load get higher if we get any deadlocks. Many thanks to all who helped out :-) Let us know how it works out! Chuck Pascal Robert prob...@macti.ca AIM: MacTICanada Twitter : MacTICanada LinkedIn : http://www.linkedin.com/in/macti WO Community profile : http://wocommunity.org/page/member?name=probert -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/products/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: Deadlocks in one of our apps
... And going back to the physical server didn't solve anything, I got the same deadlock this morning. Ok, so I will move back the DB to the physical server to see if the problem goes away. On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote: Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) ... We don't manual (eg , in code) locking at the EODatabaseContext level. It is possible that an odd exception in EOAccess or below is resulting in this not getting unlocked. Joe's reply below might be what is happening to you. Chuck Another thing to note if this is a long request to a database housed in an ESX vm. We had similar problems with long requests timing out between two systems, with one hosted by esx 4.x. Such long requests were caught by some low level interface muxing issue and my whole EOF stack was frozen when the underlying db connection was lost mid-transaction. I resolved it by moving this application off of a vm. On May 31, 2010, at 5:33 PM, Pascal Robert prob...@macti.ca wrote: Ok, will try with ERXWOLongResponsePage since it look like it's locking and unlocking all ECs in the thread. There's a bunch of stuff wrong here. First, the only actually locked thread is: - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel ._selectWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .selectObjectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseContext ._objectsWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539 (Interpreted frame) - com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114 (Interpreted frame) - er .extensions .eof .ERXEC .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211 (Interpreted frame) - com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification) @bci=3, line=4500 (Interpreted frame) - com .acaiq .fondation .acaiqCore ._Licence .fetchLicences(com.webobjects.eocontrol.EOEditingContext, com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray) @bci=19, line=1062 (Interpreted frame) - com .acaiq .fondation .acaiqCore ._Membre.licences(com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray, boolean) @bci=77, line=8920 (Interpreted frame) - com .acaiq .fondation .acaiqCore ._Membre.licences(com.webobjects.eocontrol.EOQualifier, boolean) @bci=4, line=8893 (Interpreted frame) - com .acaiq .fondation .acaiqCore .Membre .licencesParEtats(com.acaiq.fondation.acaiqCore.EtatMembre[]) @bci=100, line=980 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.licencesValides() @bci=11, line=996 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.estCourtier() @bci=5, line=1035 (Interpreted frame) -
Re: Deadlocks in one of our apps
That makes your code look guilty then. :-) Check your long response page implementation again. Are there any exceptions in the log that might be related? I'd also reduce the Maximum Adaptor threads (JavaMonitor - Application configuration - Application settings). 6 or 8 is probably more than enough for this app. That will at least reduce the size of the thread dumps. I'd also trim down the listen queue size to 2 or 4, might as well catch this as soon as possible. Chuck On Jun 2, 2010, at 5:19 AM, Pascal Robert wrote: ... And going back to the physical server didn't solve anything, I got the same deadlock this morning. Ok, so I will move back the DB to the physical server to see if the problem goes away. On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote: Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) ... We don't manual (eg , in code) locking at the EODatabaseContext level. It is possible that an odd exception in EOAccess or below is resulting in this not getting unlocked. Joe's reply below might be what is happening to you. Chuck Another thing to note if this is a long request to a database housed in an ESX vm. We had similar problems with long requests timing out between two systems, with one hosted by esx 4.x. Such long requests were caught by some low level interface muxing issue and my whole EOF stack was frozen when the underlying db connection was lost mid-transaction. I resolved it by moving this application off of a vm. On May 31, 2010, at 5:33 PM, Pascal Robert prob...@macti.ca wrote: Ok, will try with ERXWOLongResponsePage since it look like it's locking and unlocking all ECs in the thread. There's a bunch of stuff wrong here. First, the only actually locked thread is: - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel ._selectWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .selectObjectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseContext ._objectsWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539 (Interpreted frame) - com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114 (Interpreted frame) - er .extensions .eof .ERXEC .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211 (Interpreted frame) - com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification) @bci=3, line=4500 (Interpreted frame) - com .acaiq .fondation .acaiqCore ._Licence .fetchLicences(com.webobjects.eocontrol.EOEditingContext, com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray) @bci=19, line=1062 (Interpreted frame) - com .acaiq .fondation .acaiqCore ._Membre.licences(com.webobjects.eocontrol.EOQualifier,
Re: Deadlocks in one of our apps
doesn't addCooperatingObjectStore have a race condition in =5.4? i don't recall if wonder fixed that or not ... On Jun 2, 2010, at 8:19 AM, Pascal Robert wrote: ... And going back to the physical server didn't solve anything, I got the same deadlock this morning. Ok, so I will move back the DB to the physical server to see if the problem goes away. On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote: Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com.webobjects.eocontrol.EOObjectStoreCoordinator.addCooperatingObjectStore(com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel.setCurrentEditingContext(com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) ... We don't manual (eg , in code) locking at the EODatabaseContext level. It is possible that an odd exception in EOAccess or below is resulting in this not getting unlocked. Joe's reply below might be what is happening to you. Chuck Another thing to note if this is a long request to a database housed in an ESX vm. We had similar problems with long requests timing out between two systems, with one hosted by esx 4.x. Such long requests were caught by some low level interface muxing issue and my whole EOF stack was frozen when the underlying db connection was lost mid-transaction. I resolved it by moving this application off of a vm. On May 31, 2010, at 5:33 PM, Pascal Robert prob...@macti.ca wrote: Ok, will try with ERXWOLongResponsePage since it look like it's locking and unlocking all ECs in the thread. There's a bunch of stuff wrong here. First, the only actually locked thread is: - com.webobjects.eocontrol.EOObjectStoreCoordinator.addCooperatingObjectStore(com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel.setCurrentEditingContext(com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel._selectWithFetchSpecificationEditingContext(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel.selectObjectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext._objectsWithFetchSpecificationEditingContext(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346 (Interpreted frame) - com.webobjects.eocontrol.EOObjectStoreCoordinator.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539 (Interpreted frame) - com.webobjects.eocontrol.EOEditingContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114 (Interpreted frame) - er.extensions.eof.ERXEC.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211 (Interpreted frame) - com.webobjects.eocontrol.EOEditingContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification) @bci=3, line=4500 (Interpreted frame) - com.acaiq.fondation.acaiqCore._Licence.fetchLicences(com.webobjects.eocontrol.EOEditingContext, com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray) @bci=19, line=1062 (Interpreted frame) - com.acaiq.fondation.acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray, boolean) @bci=77, line=8920 (Interpreted frame) - com.acaiq.fondation.acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier, boolean) @bci=4, line=8893 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.licencesParEtats(com.acaiq.fondation.acaiqCore.EtatMembre[]) @bci=100, line=980 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.licencesValides() @bci=11, line=996 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.estCourtier()
Re: Deadlocks in one of our apps
Le 10-06-02 à 10:35, Mike Schrag a écrit : doesn't addCooperatingObjectStore have a race condition in =5.4? i don't recall if wonder fixed that or not ... I guess we could try with WO 5.4.3, but about Wonder, the app is extending from ERXApplication/ERXSession. Wonder download from two months ago. On Jun 2, 2010, at 8:19 AM, Pascal Robert wrote: ... And going back to the physical server didn't solve anything, I got the same deadlock this morning. Ok, so I will move back the DB to the physical server to see if the problem goes away. On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote: Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) ... We don't manual (eg , in code) locking at the EODatabaseContext level. It is possible that an odd exception in EOAccess or below is resulting in this not getting unlocked. Joe's reply below might be what is happening to you. Chuck Another thing to note if this is a long request to a database housed in an ESX vm. We had similar problems with long requests timing out between two systems, with one hosted by esx 4.x. Such long requests were caught by some low level interface muxing issue and my whole EOF stack was frozen when the underlying db connection was lost mid-transaction. I resolved it by moving this application off of a vm. On May 31, 2010, at 5:33 PM, Pascal Robert prob...@macti.ca wrote: Ok, will try with ERXWOLongResponsePage since it look like it's locking and unlocking all ECs in the thread. There's a bunch of stuff wrong here. First, the only actually locked thread is: - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel ._selectWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .selectObjectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseContext ._objectsWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539 (Interpreted frame) - com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114 (Interpreted frame) - er .extensions .eof .ERXEC .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211 (Interpreted frame) - com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification) @bci=3, line=4500 (Interpreted frame) - com .acaiq .fondation .acaiqCore ._Licence .fetchLicences(com.webobjects.eocontrol.EOEditingContext, com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray) @bci=19, line=1062 (Interpreted frame) - com .acaiq .fondation .acaiqCore ._Membre.licences(com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray, boolean) @bci=77, line=8920 (Interpreted frame) - com .acaiq .fondation .acaiqCore ._Membre.licences(com.webobjects.eocontrol.EOQualifier, boolean) @bci=4, line=8893
Re: Deadlocks in one of our apps
Le 10-06-02 à 10:30, Chuck Hill a écrit : That makes your code look guilty then. :-) Funny thing is that he not really my code (eg, I didn't write it) but this is code dated from WO 5.2. It's just that this app never had that much traffic. And I did try stress loading this app with JMeter, but since the URL is changed when the long response page is called (session ID is put back in the URL) and I don't know how to fix this, that part was not stress loaded. Check your long response page implementation again. Are there any exceptions in the log that might be related? Just to explain a bit more : - It's a (non public) online store. When people log in, we create a order in memory and customers add order items to the order. We don't store anything in the DB until the payment is made with PayFlow. When we get the response from PayFlow, we store a copy of the order (and the items) to our Oracle db. After that, we contact our SQL Server db (actually, a accounting system, and we send the data to a stored procedure), and we get the invoice number produced by the accounting system and store it in the order EO in Oracle. So in summary : - People login, we create a order EO, the EO is created in the session's editing context - People add items to the order - They start the order payment steps - Long response page kicks in - We contact PayFlow to make the payment - If the payment is succesful, we store the order in Oracle - We create a new EO, in a different EC, for SQL Server - We update the order EO to store the invoice number in Oracle - We generate (FOXML, generated in a separated JVM) the invoice in PDF - Long response page is done, pageForResult is called Everything is done in session.defaultEditingContext EXCEPT the SQL Server EOs, where we create a new EOObjectStore, create a new EOEditingContext inside the new object store, and EOObjectStore osc = new EOObjectStoreCoordinator(); EOEditingContext ec = new EOEditingContext(osc); ec.lock(); try { CommandesEcom commandeEcom = CommandesEcom.creerCommandesEcom(ec); ... ec.saveChanges(); finally { ec.unlock(); ec.dispose(); osc.dispose(); ec = null; osc = null; } A co-worker suggested that we create a new editing context in the long response page, and call EOUtilities.localInstanceOfObject to have a copy of the order EO in the new EC, but the resulting EO is null, even if the source is not. I'd also reduce the Maximum Adaptor threads (JavaMonitor - Application configuration - Application settings). 6 or 8 is probably more than enough for this app. That will at least reduce the size of the thread dumps. I'd also trim down the listen queue size to 2 or 4, might as well catch this as soon as possible. Chuck On Jun 2, 2010, at 5:19 AM, Pascal Robert wrote: ... And going back to the physical server didn't solve anything, I got the same deadlock this morning. Ok, so I will move back the DB to the physical server to see if the problem goes away. On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote: Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) ... We don't manual (eg , in code) locking at the EODatabaseContext level. It is possible that an odd exception in EOAccess or below is resulting in this not getting unlocked. Joe's reply below might be what is happening to you. Chuck Another thing to note if this is a long request to a database housed in an ESX vm. We had similar problems with long requests timing out between two systems, with one hosted by esx 4.x. Such long requests were caught by some low level interface muxing issue and my whole EOF stack was frozen when the underlying db connection was lost mid-transaction. I resolved it by moving this application off of a vm. On May 31, 2010, at 5:33 PM, Pascal Robert prob...@macti.ca wrote: Ok, will try with
Re: Deadlocks in one of our apps
FYI, I did a brief test of using WO 5.4.3 on a mature app a week or two ago, and running a barrage of Selenium tests (where each test generally created a new Session with a specific user on a specific page with a master EO) would deadlock some of the Sessions' defaultERXEC's every time. Switching back to WO 5.3.3 made the problem go away ... and yes, I built Wonder with the 54 patch too so that quickly killed my confidence in WO 5.4.3. Even though I am 99% sure that this is probably compatability between my code, Wonder and WO 5.4.3, I could not find the problem after 2 hours, so I had to park it, revert to WO 5.3.3 and get priority work done. -Kieran On Jun 2, 2010, at 10:52 AM, Pascal Robert wrote: Le 10-06-02 à 10:35, Mike Schrag a écrit : doesn't addCooperatingObjectStore have a race condition in =5.4? i don't recall if wonder fixed that or not ... I guess we could try with WO 5.4.3, but about Wonder, the app is extending from ERXApplication/ERXSession. Wonder download from two months ago. On Jun 2, 2010, at 8:19 AM, Pascal Robert wrote: ... And going back to the physical server didn't solve anything, I got the same deadlock this morning. Ok, so I will move back the DB to the physical server to see if the problem goes away. On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote: Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com.webobjects.eocontrol.EOObjectStoreCoordinator.addCooperatingObjectStore(com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel.setCurrentEditingContext(com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) ... We don't manual (eg , in code) locking at the EODatabaseContext level. It is possible that an odd exception in EOAccess or below is resulting in this not getting unlocked. Joe's reply below might be what is happening to you. Chuck Another thing to note if this is a long request to a database housed in an ESX vm. We had similar problems with long requests timing out between two systems, with one hosted by esx 4.x. Such long requests were caught by some low level interface muxing issue and my whole EOF stack was frozen when the underlying db connection was lost mid-transaction. I resolved it by moving this application off of a vm. On May 31, 2010, at 5:33 PM, Pascal Robert prob...@macti.ca wrote: Ok, will try with ERXWOLongResponsePage since it look like it's locking and unlocking all ECs in the thread. There's a bunch of stuff wrong here. First, the only actually locked thread is: - com.webobjects.eocontrol.EOObjectStoreCoordinator.addCooperatingObjectStore(com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel.setCurrentEditingContext(com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel._selectWithFetchSpecificationEditingContext(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel.selectObjectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext._objectsWithFetchSpecificationEditingContext(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346 (Interpreted frame) - com.webobjects.eocontrol.EOObjectStoreCoordinator.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539 (Interpreted frame) - com.webobjects.eocontrol.EOEditingContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114 (Interpreted frame) - er.extensions.eof.ERXEC.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211 (Interpreted frame) -
Re: Deadlocks in one of our apps
On Jun 2, 2010, at 8:16 AM, Pascal Robert wrote: Le 10-06-02 à 10:30, Chuck Hill a écrit : That makes your code look guilty then. :-) Funny thing is that he not really my code (eg, I didn't write it) but this is code dated from WO 5.2. It's just that this app never had that much traffic. And I did try stress loading this app with JMeter, but since the URL is changed when the long response page is called (session ID is put back in the URL) and I don't know how to fix this, that part was not stress loaded. Check your long response page implementation again. Are there any exceptions in the log that might be related? Just to explain a bit more : - It's a (non public) online store. When people log in, we create a order in memory and customers add order items to the order. We don't store anything in the DB until the payment is made with PayFlow. When we get the response from PayFlow, we store a copy of the order (and the items) to our Oracle db. After that, we contact our SQL Server db (actually, a accounting system, and we send the data to a stored procedure), and we get the invoice number produced by the accounting system and store it in the order EO in Oracle. So in summary : - People login, we create a order EO, the EO is created in the session's editing context - People add items to the order - They start the order payment steps - Long response page kicks in - We contact PayFlow to make the payment - If the payment is succesful, we store the order in Oracle - We create a new EO, in a different EC, for SQL Server - We update the order EO to store the invoice number in Oracle - We generate (FOXML, generated in a separated JVM) the invoice in PDF - Long response page is done, pageForResult is called Everything is done in session.defaultEditingContext EXCEPT the SQL Server EOs, You are not using the session.defaultEditingContext in the long response page, are you? I am pretty sure that is an excellent source of deadlocks. Chuck where we create a new EOObjectStore, create a new EOEditingContext inside the new object store, and EOObjectStore osc = new EOObjectStoreCoordinator(); EOEditingContext ec = new EOEditingContext(osc); ec.lock(); try { CommandesEcom commandeEcom = CommandesEcom.creerCommandesEcom(ec); ... ec.saveChanges(); finally { ec.unlock(); ec.dispose(); osc.dispose(); ec = null; osc = null; } A co-worker suggested that we create a new editing context in the long response page, and call EOUtilities.localInstanceOfObject to have a copy of the order EO in the new EC, but the resulting EO is null, even if the source is not. I'd also reduce the Maximum Adaptor threads (JavaMonitor - Application configuration - Application settings). 6 or 8 is probably more than enough for this app. That will at least reduce the size of the thread dumps. I'd also trim down the listen queue size to 2 or 4, might as well catch this as soon as possible. Chuck On Jun 2, 2010, at 5:19 AM, Pascal Robert wrote: ... And going back to the physical server didn't solve anything, I got the same deadlock this morning. Ok, so I will move back the DB to the physical server to see if the problem goes away. On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote: Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) ... We don't manual (eg , in code) locking at the EODatabaseContext level. It is possible that an odd exception in EOAccess or below is resulting in this not getting unlocked. Joe's reply below might be what is happening to you. Chuck Another thing to note if this is a long request to a database housed in an ESX vm. We had similar problems with long requests timing out between two systems, with one hosted by esx 4.x. Such long requests were caught by some low level interface muxing issue and my whole EOF stack was frozen when
Re: Deadlocks in one of our apps
Le 10-06-02 à 11:39, Chuck Hill a écrit : On Jun 2, 2010, at 8:16 AM, Pascal Robert wrote: Le 10-06-02 à 10:30, Chuck Hill a écrit : That makes your code look guilty then. :-) Funny thing is that he not really my code (eg, I didn't write it) but this is code dated from WO 5.2. It's just that this app never had that much traffic. And I did try stress loading this app with JMeter, but since the URL is changed when the long response page is called (session ID is put back in the URL) and I don't know how to fix this, that part was not stress loaded. Check your long response page implementation again. Are there any exceptions in the log that might be related? Just to explain a bit more : - It's a (non public) online store. When people log in, we create a order in memory and customers add order items to the order. We don't store anything in the DB until the payment is made with PayFlow. When we get the response from PayFlow, we store a copy of the order (and the items) to our Oracle db. After that, we contact our SQL Server db (actually, a accounting system, and we send the data to a stored procedure), and we get the invoice number produced by the accounting system and store it in the order EO in Oracle. So in summary : - People login, we create a order EO, the EO is created in the session's editing context - People add items to the order - They start the order payment steps - Long response page kicks in - We contact PayFlow to make the payment - If the payment is succesful, we store the order in Oracle - We create a new EO, in a different EC, for SQL Server - We update the order EO to store the invoice number in Oracle - We generate (FOXML, generated in a separated JVM) the invoice in PDF - Long response page is done, pageForResult is called Everything is done in session.defaultEditingContext EXCEPT the SQL Server EOs, You are not using the session.defaultEditingContext in the long response page, are you? I am pretty sure that is an excellent source of deadlocks. Hum, yes we do use in the long response page... But since localInstanceOfObject won't let me have a copy in a new EC, what are the options except not using the session EC? Chuck where we create a new EOObjectStore, create a new EOEditingContext inside the new object store, and EOObjectStore osc = new EOObjectStoreCoordinator(); EOEditingContext ec = new EOEditingContext(osc); ec.lock(); try { CommandesEcom commandeEcom = CommandesEcom.creerCommandesEcom(ec); ... ec.saveChanges(); finally { ec.unlock(); ec.dispose(); osc.dispose(); ec = null; osc = null; } A co-worker suggested that we create a new editing context in the long response page, and call EOUtilities.localInstanceOfObject to have a copy of the order EO in the new EC, but the resulting EO is null, even if the source is not. I'd also reduce the Maximum Adaptor threads (JavaMonitor - Application configuration - Application settings). 6 or 8 is probably more than enough for this app. That will at least reduce the size of the thread dumps. I'd also trim down the listen queue size to 2 or 4, might as well catch this as soon as possible. Chuck On Jun 2, 2010, at 5:19 AM, Pascal Robert wrote: ... And going back to the physical server didn't solve anything, I got the same deadlock this morning. Ok, so I will move back the DB to the physical server to see if the problem goes away. On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote: Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) ... We don't manual (eg , in code) locking at the EODatabaseContext level. It is possible that an odd exception in EOAccess or below is resulting in this not getting unlocked. Joe's reply below might be what is happening to you. Chuck Another thing to note if this is a long request to a database housed in an ESX vm
Re: Deadlocks in one of our apps
On Jun 2, 2010, at 8:51 AM, Pascal Robert wrote: Le 10-06-02 à 11:39, Chuck Hill a écrit : On Jun 2, 2010, at 8:16 AM, Pascal Robert wrote: Le 10-06-02 à 10:30, Chuck Hill a écrit : That makes your code look guilty then. :-) Funny thing is that he not really my code (eg, I didn't write it) but this is code dated from WO 5.2. It's just that this app never had that much traffic. And I did try stress loading this app with JMeter, but since the URL is changed when the long response page is called (session ID is put back in the URL) and I don't know how to fix this, that part was not stress loaded. Check your long response page implementation again. Are there any exceptions in the log that might be related? Just to explain a bit more : - It's a (non public) online store. When people log in, we create a order in memory and customers add order items to the order. We don't store anything in the DB until the payment is made with PayFlow. When we get the response from PayFlow, we store a copy of the order (and the items) to our Oracle db. After that, we contact our SQL Server db (actually, a accounting system, and we send the data to a stored procedure), and we get the invoice number produced by the accounting system and store it in the order EO in Oracle. So in summary : - People login, we create a order EO, the EO is created in the session's editing context - People add items to the order - They start the order payment steps - Long response page kicks in - We contact PayFlow to make the payment - If the payment is succesful, we store the order in Oracle - We create a new EO, in a different EC, for SQL Server - We update the order EO to store the invoice number in Oracle - We generate (FOXML, generated in a separated JVM) the invoice in PDF - Long response page is done, pageForResult is called Everything is done in session.defaultEditingContext EXCEPT the SQL Server EOs, You are not using the session.defaultEditingContext in the long response page, are you? I am pretty sure that is an excellent source of deadlocks. Hum, yes we do use in the long response page... But since localInstanceOfObject won't let me have a copy in a new EC, what are the options except not using the session EC? Not using the session EC would be a good choice. Make a different EC. Pass it into the long response page. Be careful handing off locking. You could also save the order in an unpaid state, then fetch it in the long response page and update it if paid, or delete it if not. Chuck Chuck where we create a new EOObjectStore, create a new EOEditingContext inside the new object store, and EOObjectStore osc = new EOObjectStoreCoordinator(); EOEditingContext ec = new EOEditingContext(osc); ec.lock(); try { CommandesEcom commandeEcom = CommandesEcom.creerCommandesEcom(ec); ... ec.saveChanges(); finally { ec.unlock(); ec.dispose(); osc.dispose(); ec = null; osc = null; } A co-worker suggested that we create a new editing context in the long response page, and call EOUtilities.localInstanceOfObject to have a copy of the order EO in the new EC, but the resulting EO is null, even if the source is not. I'd also reduce the Maximum Adaptor threads (JavaMonitor - Application configuration - Application settings). 6 or 8 is probably more than enough for this app. That will at least reduce the size of the thread dumps. I'd also trim down the listen queue size to 2 or 4, might as well catch this as soon as possible. Chuck On Jun 2, 2010, at 5:19 AM, Pascal Robert wrote: ... And going back to the physical server didn't solve anything, I got the same deadlock this morning. Ok, so I will move back the DB to the physical server to see if the problem goes away. On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote: Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame
Re: Deadlocks in one of our apps
On 2010-06-02, at 11:51 AM, Pascal Robert wrote: Le 10-06-02 à 11:39, Chuck Hill a écrit : On Jun 2, 2010, at 8:16 AM, Pascal Robert wrote: Le 10-06-02 à 10:30, Chuck Hill a écrit : That makes your code look guilty then. :-) Funny thing is that he not really my code (eg, I didn't write it) but this is code dated from WO 5.2. It's just that this app never had that much traffic. And I did try stress loading this app with JMeter, but since the URL is changed when the long response page is called (session ID is put back in the URL) and I don't know how to fix this, that part was not stress loaded. Check your long response page implementation again. Are there any exceptions in the log that might be related? Just to explain a bit more : - It's a (non public) online store. When people log in, we create a order in memory and customers add order items to the order. We don't store anything in the DB until the payment is made with PayFlow. When we get the response from PayFlow, we store a copy of the order (and the items) to our Oracle db. After that, we contact our SQL Server db (actually, a accounting system, and we send the data to a stored procedure), and we get the invoice number produced by the accounting system and store it in the order EO in Oracle. So in summary : - People login, we create a order EO, the EO is created in the session's editing context - People add items to the order - They start the order payment steps - Long response page kicks in - We contact PayFlow to make the payment - If the payment is succesful, we store the order in Oracle - We create a new EO, in a different EC, for SQL Server - We update the order EO to store the invoice number in Oracle - We generate (FOXML, generated in a separated JVM) the invoice in PDF - Long response page is done, pageForResult is called Everything is done in session.defaultEditingContext EXCEPT the SQL Server EOs, You are not using the session.defaultEditingContext in the long response page, are you? I am pretty sure that is an excellent source of deadlocks. Hum, yes we do use in the long response page... But since localInstanceOfObject won't let me have a copy in a new EC, what are the options except not using the session EC? Not sure about the vagaries of using the long response page, but you could create a new EC when you create the order - which is what I would do regardless of what other steps you needed to take. Alternately, you could clone the object graph into a new EC for the long response page. Chuck where we create a new EOObjectStore, create a new EOEditingContext inside the new object store, and EOObjectStore osc = new EOObjectStoreCoordinator(); EOEditingContext ec = new EOEditingContext(osc); ec.lock(); try { CommandesEcom commandeEcom = CommandesEcom.creerCommandesEcom(ec); ... ec.saveChanges(); finally { ec.unlock(); ec.dispose(); osc.dispose(); ec = null; osc = null; } A co-worker suggested that we create a new editing context in the long response page, and call EOUtilities.localInstanceOfObject to have a copy of the order EO in the new EC, but the resulting EO is null, even if the source is not. I'd also reduce the Maximum Adaptor threads (JavaMonitor - Application configuration - Application settings). 6 or 8 is probably more than enough for this app. That will at least reduce the size of the thread dumps. I'd also trim down the listen queue size to 2 or 4, might as well catch this as soon as possible. Chuck On Jun 2, 2010, at 5:19 AM, Pascal Robert wrote: ... And going back to the physical server didn't solve anything, I got the same deadlock this morning. Ok, so I will move back the DB to the physical server to see if the problem goes away. On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote: Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com.webobjects.eocontrol.EOObjectStoreCoordinator.addCooperatingObjectStore(com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel.setCurrentEditingContext(com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) ... We don't manual
Re: Deadlocks in one of our apps
On 2010-06-02, at 11:54 AM, Chuck Hill wrote: On Jun 2, 2010, at 8:51 AM, Pascal Robert wrote: Le 10-06-02 à 11:39, Chuck Hill a écrit : On Jun 2, 2010, at 8:16 AM, Pascal Robert wrote: Le 10-06-02 à 10:30, Chuck Hill a écrit : That makes your code look guilty then. :-) Funny thing is that he not really my code (eg, I didn't write it) but this is code dated from WO 5.2. It's just that this app never had that much traffic. And I did try stress loading this app with JMeter, but since the URL is changed when the long response page is called (session ID is put back in the URL) and I don't know how to fix this, that part was not stress loaded. Check your long response page implementation again. Are there any exceptions in the log that might be related? Just to explain a bit more : - It's a (non public) online store. When people log in, we create a order in memory and customers add order items to the order. We don't store anything in the DB until the payment is made with PayFlow. When we get the response from PayFlow, we store a copy of the order (and the items) to our Oracle db. After that, we contact our SQL Server db (actually, a accounting system, and we send the data to a stored procedure), and we get the invoice number produced by the accounting system and store it in the order EO in Oracle. So in summary : - People login, we create a order EO, the EO is created in the session's editing context - People add items to the order - They start the order payment steps - Long response page kicks in - We contact PayFlow to make the payment - If the payment is succesful, we store the order in Oracle - We create a new EO, in a different EC, for SQL Server - We update the order EO to store the invoice number in Oracle - We generate (FOXML, generated in a separated JVM) the invoice in PDF - Long response page is done, pageForResult is called Everything is done in session.defaultEditingContext EXCEPT the SQL Server EOs, You are not using the session.defaultEditingContext in the long response page, are you? I am pretty sure that is an excellent source of deadlocks. Hum, yes we do use in the long response page... But since localInstanceOfObject won't let me have a copy in a new EC, what are the options except not using the session EC? Not using the session EC would be a good choice. Make a different EC. Pass it into the long response page. Be careful handing off locking. You could also save the order in an unpaid state, then fetch it in the long response page and update it if paid, or delete it if not. Ooh, yeah, you could do that too. Chuck Chuck where we create a new EOObjectStore, create a new EOEditingContext inside the new object store, and EOObjectStore osc = new EOObjectStoreCoordinator(); EOEditingContext ec = new EOEditingContext(osc); ec.lock(); try { CommandesEcom commandeEcom = CommandesEcom.creerCommandesEcom(ec); ... ec.saveChanges(); finally { ec.unlock(); ec.dispose(); osc.dispose(); ec = null; osc = null; } A co-worker suggested that we create a new editing context in the long response page, and call EOUtilities.localInstanceOfObject to have a copy of the order EO in the new EC, but the resulting EO is null, even if the source is not. I'd also reduce the Maximum Adaptor threads (JavaMonitor - Application configuration - Application settings). 6 or 8 is probably more than enough for this app. That will at least reduce the size of the thread dumps. I'd also trim down the listen queue size to 2 or 4, might as well catch this as soon as possible. Chuck On Jun 2, 2010, at 5:19 AM, Pascal Robert wrote: ... And going back to the physical server didn't solve anything, I got the same deadlock this morning. Ok, so I will move back the DB to the physical server to see if the problem goes away. On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote: Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com.webobjects.eocontrol.EOObjectStoreCoordinator.addCooperatingObjectStore(com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel.setCurrentEditingContext
Re: Deadlocks in one of our apps
Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext(com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) ... We don't manual (eg , in code) locking at the EODatabaseContext level. Another thing to note if this is a long request to a database housed in an ESX vm. We had similar problems with long requests timing out between two systems, with one hosted by esx 4.x. Such long requests were caught by some low level interface muxing issue and my whole EOF stack was frozen when the underlying db connection was lost mid- transaction. I resolved it by moving this application off of a vm. On May 31, 2010, at 5:33 PM, Pascal Robert prob...@macti.ca wrote: Ok, will try with ERXWOLongResponsePage since it look like it's locking and unlocking all ECs in the thread. There's a bunch of stuff wrong here. First, the only actually locked thread is: - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel ._selectWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .selectObjectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseContext ._objectsWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539 (Interpreted frame) - com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114 (Interpreted frame) - er .extensions .eof .ERXEC .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211 (Interpreted frame) - com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification) @bci=3, line=4500 (Interpreted frame) - com .acaiq .fondation .acaiqCore ._Licence.fetchLicences(com.webobjects.eocontrol.EOEditingContext, com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray) @bci=19, line=1062 (Interpreted frame) - com .acaiq .fondation .acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray, boolean) @bci=77, line=8920 (Interpreted frame) - com .acaiq .fondation .acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier, boolean) @bci=4, line=8893 (Interpreted frame) - com .acaiq .fondation .acaiqCore .Membre .licencesParEtats(com.acaiq.fondation.acaiqCore.EtatMembre[]) @bci=100, line=980 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.licencesValides() @bci=11, line=996 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.estCourtier() @bci=5, line=1035 (Interpreted frame) - sun.reflect.GeneratedMethodAccessor87.invoke(java.lang.Object, java.lang.Object[]) @bci=40 (Interpreted frame) Which reminds me of an unlocked EC/OSC. Second: java.lang.IllegalArgumentException: Attribute noCommandeOracle can't receive a null parameter : at com .acaiq .fondation .depot .lbaArticle ._CommandesEcom.setNoCommandeOracle(_CommandesEcom.java:419) This is a *template* that throws on null?? You sure
Re: Deadlocks in one of our apps
Um. Just how did you switch to ERXWOLongResponsePage? If you overrode run() than nothing's gonna happen. Cheers, Anjo Am 01.06.2010 um 15:34 schrieb Pascal Robert: ERXWOLongResponsePage ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: Deadlocks in one of our apps
We have a component, OSLongResponseComponent, that was extending from WOLongResponsePage, and now it's extending from ERXWOLongResponsePage. The only thing we are overriding is valueForKeyPath and appendToResponse, run() is not overriden. Um. Just how did you switch to ERXWOLongResponsePage? If you overrode run() than nothing's gonna happen. Cheers, Anjo Am 01.06.2010 um 15:34 schrieb Pascal Robert: ERXWOLongResponsePage Pascal Robert prob...@macti.ca AIM: MacTICanada Twitter : MacTICanada LinkedIn : http://www.linkedin.com/in/macti WO Community profile : http://wocommunity.org/page/member?name=probert ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com
Re: Deadlocks in one of our apps
On Jun 1, 2010, at 6:34 AM, Pascal Robert wrote: Hum... And after I started using ERXWOLongResponsePage, I still got a deadlock, but this time, it says that it's a EODatabaseContext lock : Thread t...@92163: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame) - com.webobjects.foundation.NSRecursiveLock.lock() @bci=54, line=72 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.lock() @bci=56, line=1973 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext(com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) ... We don't manual (eg , in code) locking at the EODatabaseContext level. It is possible that an odd exception in EOAccess or below is resulting in this not getting unlocked. Joe's reply below might be what is happening to you. Chuck Another thing to note if this is a long request to a database housed in an ESX vm. We had similar problems with long requests timing out between two systems, with one hosted by esx 4.x. Such long requests were caught by some low level interface muxing issue and my whole EOF stack was frozen when the underlying db connection was lost mid- transaction. I resolved it by moving this application off of a vm. On May 31, 2010, at 5:33 PM, Pascal Robert prob...@macti.ca wrote: Ok, will try with ERXWOLongResponsePage since it look like it's locking and unlocking all ECs in the thread. There's a bunch of stuff wrong here. First, the only actually locked thread is: - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext (com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel ._selectWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .selectObjectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseContext ._objectsWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539 (Interpreted frame) - com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114 (Interpreted frame) - er .extensions .eof .ERXEC .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211 (Interpreted frame) - com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification) @bci=3, line=4500 (Interpreted frame) - com .acaiq .fondation .acaiqCore ._Licence .fetchLicences(com.webobjects.eocontrol.EOEditingContext, com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray) @bci=19, line=1062 (Interpreted frame) - com .acaiq .fondation .acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray, boolean) @bci=77, line=8920 (Interpreted frame) - com .acaiq .fondation .acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier, boolean) @bci=4, line=8893 (Interpreted frame) - com .acaiq .fondation .acaiqCore .Membre .licencesParEtats(com.acaiq.fondation.acaiqCore.EtatMembre[]) @bci=100, line=980 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.licencesValides() @bci=11, line=996 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.estCourtier() @bci=5, line=1035 (Interpreted frame) - sun.reflect.GeneratedMethodAccessor87.invoke(java.lang.Object, java.lang.Object[]) @bci=40 (Interpreted frame) Which reminds me of an unlocked EC/OSC. Second: java.lang.IllegalArgumentException: Attribute
Re: Deadlocks in one of our apps
There's a bunch of stuff wrong here. First, the only actually locked thread is: - com.webobjects.eocontrol.EOObjectStoreCoordinator.addCooperatingObjectStore(com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel.setCurrentEditingContext(com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel._selectWithFetchSpecificationEditingContext(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel.selectObjectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext._objectsWithFetchSpecificationEditingContext(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346 (Interpreted frame) - com.webobjects.eocontrol.EOObjectStoreCoordinator.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539 (Interpreted frame) - com.webobjects.eocontrol.EOEditingContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114 (Interpreted frame) - er.extensions.eof.ERXEC.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211 (Interpreted frame) - com.webobjects.eocontrol.EOEditingContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification) @bci=3, line=4500 (Interpreted frame) - com.acaiq.fondation.acaiqCore._Licence.fetchLicences(com.webobjects.eocontrol.EOEditingContext, com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray) @bci=19, line=1062 (Interpreted frame) - com.acaiq.fondation.acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray, boolean) @bci=77, line=8920 (Interpreted frame) - com.acaiq.fondation.acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier, boolean) @bci=4, line=8893 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.licencesParEtats(com.acaiq.fondation.acaiqCore.EtatMembre[]) @bci=100, line=980 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.licencesValides() @bci=11, line=996 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.estCourtier() @bci=5, line=1035 (Interpreted frame) - sun.reflect.GeneratedMethodAccessor87.invoke(java.lang.Object, java.lang.Object[]) @bci=40 (Interpreted frame) Which reminds me of an unlocked EC/OSC. Second: java.lang.IllegalArgumentException: Attribute noCommandeOracle can't receive a null parameter : at com.acaiq.fondation.depot.lbaArticle._CommandesEcom.setNoCommandeOracle(_CommandesEcom.java:419) This is a *template* that throws on null?? You sure that's such a bright idea? Isn't this what validation is for? And third: at com.acaiq.depot.component.TransactionAchat.performAction(TransactionAchat.java:63) at com.webobjects.woextensions.WOLongResponsePage.run(WOLongResponsePage.java:119) As you're throwing from inside a normal com.webobjects.woextensions.WOLongResponsePage, I seriously hope you're doing your part of try{} finally{} and EC unlocking. Cheers, Anjo Am 31.05.2010 um 20:02 schrieb Pascal Robert: One of our apps have deadlocked 5 times over 3 days, strangely enough it started when we moved our Oracle Database 10gR2 DB to our VMWare ESX 4.0 cluster. e didn't re-install Oracle, I simply did a P2V (Physical to VM) conversion, so it's the exact same version of Oracle DB as before. What's happenning is that we store some information on our Oracle database, save it, and we built a copy of some of the data to a new EO (different entity) in a SQL Server 2005 db so the accounting system take care of billing. The exception that cause the deadlock (or at least the last thing written to the log before the deadlock) : java.lang.IllegalArgumentException: Attribute noCommandeOracle can't receive a null parameter : at com.acaiq.fondation.depot.lbaArticle._CommandesEcom.setNoCommandeOracle(_CommandesEcom.java:419) at com.acaiq.fondation.depot.Caissier.copiePourLBA(Caissier.java:267) at com.acaiq.fondation.depot.Caissier.paye(Caissier.java:137) at com.acaiq.depot.component.TransactionAchat.performAction(TransactionAchat.java:63) at com.webobjects.woextensions.WOLongResponsePage.run(WOLongResponsePage.java:119)
Re: Deadlocks in one of our apps
Ok, will try with ERXWOLongResponsePage since it look like it's locking and unlocking all ECs in the thread. There's a bunch of stuff wrong here. First, the only actually locked thread is: - com .webobjects .eocontrol .EOObjectStoreCoordinator .addCooperatingObjectStore (com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .setCurrentEditingContext(com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel ._selectWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseChannel .selectObjectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseContext ._objectsWithFetchSpecificationEditingContext (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205 (Interpreted frame) - com .webobjects .eoaccess .EODatabaseContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346 (Interpreted frame) - com .webobjects .eocontrol .EOObjectStoreCoordinator .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539 (Interpreted frame) - com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114 (Interpreted frame) - er .extensions .eof .ERXEC .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211 (Interpreted frame) - com .webobjects .eocontrol .EOEditingContext .objectsWithFetchSpecification (com.webobjects.eocontrol.EOFetchSpecification) @bci=3, line=4500 (Interpreted frame) - com .acaiq .fondation .acaiqCore ._Licence.fetchLicences(com.webobjects.eocontrol.EOEditingContext, com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray) @bci=19, line=1062 (Interpreted frame) - com .acaiq .fondation .acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray, boolean) @bci=77, line=8920 (Interpreted frame) - com .acaiq .fondation .acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier, boolean) @bci=4, line=8893 (Interpreted frame) - com .acaiq .fondation .acaiqCore .Membre.licencesParEtats(com.acaiq.fondation.acaiqCore.EtatMembre[]) @bci=100, line=980 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.licencesValides() @bci=11, line=996 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.estCourtier() @bci=5, line=1035 (Interpreted frame) - sun.reflect.GeneratedMethodAccessor87.invoke(java.lang.Object, java.lang.Object[]) @bci=40 (Interpreted frame) Which reminds me of an unlocked EC/OSC. Second: java.lang.IllegalArgumentException: Attribute noCommandeOracle can't receive a null parameter : at com .acaiq .fondation .depot .lbaArticle._CommandesEcom.setNoCommandeOracle(_CommandesEcom.java: 419) This is a *template* that throws on null?? You sure that's such a bright idea? Isn't this what validation is for? And third: at com .acaiq .depot .component.TransactionAchat.performAction(TransactionAchat.java:63) at com .webobjects .woextensions.WOLongResponsePage.run(WOLongResponsePage.java:119) As you're throwing from inside a normal com.webobjects.woextensions.WOLongResponsePage, I seriously hope you're doing your part of try{} finally{} and EC unlocking. Cheers, Anjo Am 31.05.2010 um 20:02 schrieb Pascal Robert: One of our apps have deadlocked 5 times over 3 days, strangely enough it started when we moved our Oracle Database 10gR2 DB to our VMWare ESX 4.0 cluster. e didn't re-install Oracle, I simply did a P2V (Physical to VM) conversion, so it's the exact same version of Oracle DB as before. What's happenning is that we store some information on our Oracle database, save it, and we built a copy of some of the data to a new EO (different entity) in a SQL Server 2005 db so the accounting system take care of billing. The exception that cause the deadlock (or at least the last thing written to the log before the deadlock) : java.lang.IllegalArgumentException: Attribute noCommandeOracle can't receive a null parameter : at com .acaiq .fondation .depot .lbaArticle._CommandesEcom.setNoCommandeOracle(_CommandesEcom.java: 419) at
Re: Deadlocks in one of our apps
Another thing to note if this is a long request to a database housed in an ESX vm. We had similar problems with long requests timing out between two systems, with one hosted by esx 4.x. Such long requests were caught by some low level interface muxing issue and my whole EOF stack was frozen when the underlying db connection was lost mid-transaction. I resolved it by moving this application off of a vm. On May 31, 2010, at 5:33 PM, Pascal Robert prob...@macti.ca wrote: Ok, will try with ERXWOLongResponsePage since it look like it's locking and unlocking all ECs in the thread. There's a bunch of stuff wrong here. First, the only actually locked thread is: - com.webobjects.eocontrol.EOObjectStoreCoordinator.addCooperatingObjectStore(com.webobjects.eocontrol.EOCooperatingObjectStore) @bci=5, line=130 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel.setCurrentEditingContext(com.webobjects.eocontrol.EOEditingContext) @bci=34, line=166 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel._selectWithFetchSpecificationEditingContext(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseChannel.selectObjectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext._objectsWithFetchSpecificationEditingContext(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205 (Interpreted frame) - com.webobjects.eoaccess.EODatabaseContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346 (Interpreted frame) - com.webobjects.eocontrol.EOObjectStoreCoordinator.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539 (Interpreted frame) - com.webobjects.eocontrol.EOEditingContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114 (Interpreted frame) - er.extensions.eof.ERXEC.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification, com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211 (Interpreted frame) - com.webobjects.eocontrol.EOEditingContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification) @bci=3, line=4500 (Interpreted frame) - com.acaiq.fondation.acaiqCore._Licence.fetchLicences(com.webobjects.eocontrol.EOEditingContext, com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray) @bci=19, line=1062 (Interpreted frame) - com.acaiq.fondation.acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray, boolean) @bci=77, line=8920 (Interpreted frame) - com.acaiq.fondation.acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier, boolean) @bci=4, line=8893 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.licencesParEtats(com.acaiq.fondation.acaiqCore.EtatMembre[]) @bci=100, line=980 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.licencesValides() @bci=11, line=996 (Interpreted frame) - com.acaiq.fondation.acaiqCore.Membre.estCourtier() @bci=5, line=1035 (Interpreted frame) - sun.reflect.GeneratedMethodAccessor87.invoke(java.lang.Object, java.lang.Object[]) @bci=40 (Interpreted frame) Which reminds me of an unlocked EC/OSC. Second: java.lang.IllegalArgumentException: Attribute noCommandeOracle can't receive a null parameter : at com.acaiq.fondation.depot.lbaArticle._CommandesEcom.setNoCommandeOracle(_CommandesEcom.java:419) This is a *template* that throws on null?? You sure that's such a bright idea? Isn't this what validation is for? And third: at com.acaiq.depot.component.TransactionAchat.performAction(TransactionAchat.java:63) at com.webobjects.woextensions.WOLongResponsePage.run(WOLongResponsePage.java:119) As you're throwing from inside a normal com.webobjects.woextensions.WOLongResponsePage, I seriously hope you're doing your part of try{} finally{} and EC unlocking. Cheers, Anjo Am 31.05.2010 um 20:02 schrieb Pascal Robert: One of our apps have deadlocked 5 times over 3 days, strangely enough it started when we moved our Oracle Database 10gR2 DB to our VMWare ESX 4.0 cluster. e didn't re-install Oracle, I simply did a P2V (Physical to VM) conversion, so it's the exact same version of Oracle DB as before. What's happenning is that we store some information on our Oracle database, save it, and we built a copy of some of the data to a new EO (different entity) in a SQL Server 2005 db so the accounting system take care of billing.
Re: Deadlocks
Le 07-09-05 à 18:14, Guido Neitzer a écrit : On 05.09.2007, at 16:01, Simon McLean wrote: We're experiencing some pretty bad deadlock issues at the moment and I'm pretty convinced it's down to EC lock abuse. Get a stacktrace of your running application to verify that: http://tinyurl.com/3bpkkv BTW, no need to use tinyurl.com for links to the wiki, when you go on the Info tab for the page in the wiki, Confluence will display a shorter link, in that case : http://wiki.objectstyle.org/confluence/x/sAED ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Re: Deadlocks
Hi Guido - Many thanks for that URL - and thanks to everyone else that posted ideas. We had to let the app fall over quite a few times and grab a half dozen stack traces before we figured it out, but the app is humming along nicely once again now. We ended up finding 2 core issues: 1) the occasional use of new EOEditingContext() instead of ERXEC.newEditingContext() 2) use of the session's editing context inside a thread Both were well buried sins that once the app scaled up became rather ugly :-( Thanks again, Simon On 5 Sep 2007, at 23:14, Guido Neitzer wrote: On 05.09.2007, at 16:01, Simon McLean wrote: We're experiencing some pretty bad deadlock issues at the moment and I'm pretty convinced it's down to EC lock abuse. Get a stacktrace of your running application to verify that: http://tinyurl.com/3bpkkv ... we should never have to manually lock or unlock an EC ? That is true, yes - but you still might run into problems if you do bad things. Or put another way, when using these rules is there any situation that we would have to call ec.lock() or ec.unlock() in our code ? I normally lock and unlock manually on long response pages / tasks, as the unlocking of editing contexts relies on the request response loop. If you see problems in the stacktrace, when the session gets checked out from the session store, make sure you NEVER EVER touch something from the session's default editing context inside your performAction method on a long response page. This will autolock your session's default editing context, it will not get unlocked, because you are outside of the rr loop and the next checkout for that session will fail. The other thing I saw with deadlocks: if you run out of space on your server, log4j might deadlock. cug ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Re: Deadlocks
Pascal Robert [EMAIL PROTECTED] wrote: BTW, no need to use tinyurl.com for links to the wiki, when you go on the Info tab for the page in the wiki, Confluence will display a shorter link, in that case : http://wiki.objectstyle.org/confluence/x/sAED Ah, thanks for the hint. cug ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Re: Deadlocks
Am 06.09.2007 um 00:39 schrieb Mike Schrag: If you see problems in the stacktrace, when the session gets checked out from the session store, make sure you NEVER EVER touch something from the session's default editing context inside your performAction method on a long response page. This will autolock your session's default editing context, it will not get unlocked, because you are outside of the rr loop and the next checkout for that session will fail. This particular deadlock should be fixed as of a couple weeks ago after we talked, btw ... I think I rolled autolocking into long response, also. How is that supposed to work? The actual processing is done in the extra thread, and if it has locked the supplied EC, any page coming it with this session will not run - so if you run with concurrent request handling off, the app request handling lock is never returned until the task finished and your app is dead in the meantime. Otherwise only the long response page is frozen... Apart from that, when you use D2W it is ridiculously easy to deadlock your app when you don't use Wonder because of the pattern of locking/ unlocking ECs in awake() and sleep() which doesn't really work on reloads. Cheers, Anjo ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Re: Deadlocks
Am 06.09.2007 um 00:39 schrieb Mike Schrag: If you see problems in the stacktrace, when the session gets checked out from the session store, make sure you NEVER EVER touch something from the session's default editing context inside your performAction method on a long response page. This will autolock your session's default editing context, it will not get unlocked, because you are outside of the rr loop and the next checkout for that session will fail. This particular deadlock should be fixed as of a couple weeks ago after we talked, btw ... I think I rolled autolocking into long response, also. How is that supposed to work? The actual processing is done in the extra thread, and if it has locked the supplied EC, any page coming it with this session will not run - so if you run with concurrent request handling off, the app request handling lock is never returned until the task finished and your app is dead in the meantime. Otherwise only the long response page is frozen... There was a bug with coalesced autolocks where it would coalesce outside of the RR-loop, which meant that it would leave a lock open on purpose. This would explode like you are describing in the long response thread because it would keep the lock on. There is a fix for this whereby it reverts back to just plain-jane autolocking on each call vs locking for the entire thread. This is still wrong because you usually want a longer lock than just on each call. The proper way to do it is to local instance the object into another EC and lock THAT for the long response. This still requires manually locking/unlocking to get a lock span across multiple calls, but even if you don't, Wonder will at least autolock each individual call for you and prevent MOST terrible things. Then there's the variation of long response (I'd have to look up exactly what this is called -- i think there is the normal long response and then another one that does this style) that will act like the long response thread is in a RR loop and support coalesced locking also. I can't remember which of this stuff rolled into the main long response class offhand, but this also supports cleaning up left-open-locks just like in a RR loop. There's also a Thread/ Runnable variant that does this same thing, so you can use the ERX version of the runnable and it will always clean up for you ... Just some safety nets. ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Re: Deadlocks
On 06.09.2007, at 09:04, Mike Schrag wrote: Then there's the variation of long response (I'd have to look up exactly what this is called -- i think there is the normal long response and then another one that does this style) that will act like the long response thread is in a RR loop and support coalesced locking also. I can't remember which of this stuff rolled into the main long response class offhand, but this also supports cleaning up left-open-locks just like in a RR loop. There's also a Thread/ Runnable variant that does this same thing, so you can use the ERX version of the runnable and it will always clean up for you ... Just some safety nets. ERX**RUnnable, which class is that. It's time for a Wiki Page on Multithreaded EOF. I'm working through this now, when i get it sussed i'll try to write it up. ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Re: Deadlocks
On Sep 5, 2007, at 6:47 PM, Matthew W. Taylor wrote: If virtue can't be mine alone at least my faults can be my own. - Piet Hein :-) Deadlocks in WebObjects have always been my own fault. I'm thankful to Andrew Lindesay's HOWTO. It's a super easy process -- and saves my bacon from my own carelessness. From: Steven Mark McCraw [EMAIL PROTECTED] Date: Wed, 5 Sep 2007 20:22:42 -0400 To: Chuck Hill [EMAIL PROTECTED] Cc: WebObjects (Group) webobjects-dev@lists.apple.com Subject: Re: Deadlocks Is it easy? Or is that just the nature of the concurrency beast? I consider it easy because I've had to deal with it so many times. I think WebObjects seems worse than a normal multithreaded app because things you do that are totally unrelated to concurrency from the programmer's point of view can cause you deadlock. Deadlocks sure are frustrating. In my opinion WO locks are only more noticeable than other web app environments, because, in classical WO programming, you've only got one channel to the DB per application. Lock that up -- and the rest of the app is toast. That is usually more of a scarce resource contention problem than true deadlocking - unless you take out a lock on say a DB context and don't unlock it. But that is a good point and something that people stumble over. I don't know if there is a practical fix for this. Making EOF truly multi-threaded would be a duanting task. Other programming environments might suffer less by having more avenues to the data. You might be equally guilty of poor practice in those environments but possibly not even notice it. Instead you reboot your app when it slows to a crawl, or runs out of descriptors, blaming it on Java. Grin. It's nice that so much of the concurrency-handling misery you would ordinarily have to think about with multithreaded applications is hidden from you, but when it goes wrong, it is the height of confusion. Well when we see that copy of open-source WO, Real Soon Now (tm). we can all say what confusion? I trust you have not been holding your breath. ;-) Chuck -- Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/products/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Deadlocks
Hi - We're experiencing some pretty bad deadlock issues at the moment and I'm pretty convinced it's down to EC lock abuse. Can anyone confirm that if we follow these rules: ** If you do want all that Wonder magic and love: 1) extend ERXApplication 2) extend ERXSession 3) use ERXEC.newEditingContext() instead of new EOEditingContext() 4) Add to Properties: er.extensions.ERXApplication.useEditingContextUnlocker=true er.extensions.ERXEC.defaultAutomaticLockUnlock=true er.extensions.ERXEC.useSharedEditingContext=false er.extensions.ERXEC.defaultCoalesceAutoLocks=true ... we should never have to manually lock or unlock an EC ? Or put another way, when using these rules is there any situation that we would have to call ec.lock() or ec.unlock() in our code ? Thanks, Simon ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Re: Deadlocks
On 05.09.2007, at 16:01, Simon McLean wrote: We're experiencing some pretty bad deadlock issues at the moment and I'm pretty convinced it's down to EC lock abuse. Get a stacktrace of your running application to verify that: http://tinyurl.com/3bpkkv ... we should never have to manually lock or unlock an EC ? That is true, yes - but you still might run into problems if you do bad things. Or put another way, when using these rules is there any situation that we would have to call ec.lock() or ec.unlock() in our code ? I normally lock and unlock manually on long response pages / tasks, as the unlocking of editing contexts relies on the request response loop. If you see problems in the stacktrace, when the session gets checked out from the session store, make sure you NEVER EVER touch something from the session's default editing context inside your performAction method on a long response page. This will autolock your session's default editing context, it will not get unlocked, because you are outside of the rr loop and the next checkout for that session will fail. The other thing I saw with deadlocks: if you run out of space on your server, log4j might deadlock. cug ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Re: Deadlocks
If you see problems in the stacktrace, when the session gets checked out from the session store, make sure you NEVER EVER touch something from the session's default editing context inside your performAction method on a long response page. This will autolock your session's default editing context, it will not get unlocked, because you are outside of the rr loop and the next checkout for that session will fail. This particular deadlock should be fixed as of a couple weeks ago after we talked, btw ... I think I rolled autolocking into long response, also. But generally speaking, if you're in another thread, I would lock manually to be safe. Also, if you ever access an EODatabaseContext directly, you MUST lock that yourself. It will not autolock, and that will cause terrible problems. ms ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Re: Deadlocks
On Sep 5, 2007, at 3:01 PM, Simon McLean wrote: Hi - We're experiencing some pretty bad deadlock issues at the moment and I'm pretty convinced it's down to EC lock abuse. What makes you think that? As Guido indicated, if you don't have stack traces you are just guessing. Guessing is not an effective form of debugging. :-) Can anyone confirm that if we follow these rules: ** If you do want all that Wonder magic and love: 1) extend ERXApplication 2) extend ERXSession 3) use ERXEC.newEditingContext() instead of new EOEditingContext() 4) Add to Properties: er.extensions.ERXApplication.useEditingContextUnlocker=true er.extensions.ERXEC.defaultAutomaticLockUnlock=true er.extensions.ERXEC.useSharedEditingContext=false er.extensions.ERXEC.defaultCoalesceAutoLocks=true ... we should never have to manually lock or unlock an EC ? Or put another way, when using these rules is there any situation that we would have to call ec.lock() or ec.unlock() in our code ? You are probably safe for general EC usage there, but you can still do other bad things and end up deadlocked. Chuck -- Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/products/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Re: Deadlocks
You are probably safe for general EC usage there, but you can still do other bad things and end up deadlocked. There are many great things in the win column for WebObjects, but I believe one of the definite negatives of the technology is how ridiculously easy it is to deadlock a webobjects application. You have to take the bad with the good. This is miserably scary and nasty until you learn to dump the thread stack traces (see the URL Guido posted earlier: http://tinyurl.com/3bpkkv. Learning the tricks shown here cost me a week of sleep once, but now it's beautifully documented, so profit from the work people have done to write up these instructions). Once you have the stack traces in hand, it becomes pretty obvious where the problem is and you can fix it. Look for the thread which isn't stuck in a wait queue or sleeping while waiting for requests. Mark ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Re: Deadlocks
On Sep 5, 2007, at 4:12 PM, Steven Mark McCraw wrote: You are probably safe for general EC usage there, but you can still do other bad things and end up deadlocked. There are many great things in the win column for WebObjects, but I believe one of the definite negatives of the technology is how ridiculously easy it is to deadlock a webobjects application. Is it easy? Or is that just the nature of the concurrency beast? The only ways to cause deadlock that I can think of are (a) improper exception handling related to releasing locks and (b) unbalanced lock usage. I'd expect those to cause problems in any multi-threaded, concurrent environment. There _were_ some issues in this area in prior versions. AFAIK, these are fixed. The one thing I can think of that WO could have added is some try...catch or try...finally blocks in WOSession. These could, if present, handle when the developer does not properly handle the exceptions that happen in their code. Can you think of anything else that could be done? Chuck You have to take the bad with the good. This is miserably scary and nasty until you learn to dump the thread stack traces (see the URL Guido posted earlier: http://tinyurl.com/3bpkkv. Learning the tricks shown here cost me a week of sleep once, but now it's beautifully documented, so profit from the work people have done to write up these instructions). Once you have the stack traces in hand, it becomes pretty obvious where the problem is and you can fix it. Look for the thread which isn't stuck in a wait queue or sleeping while waiting for requests. Mark -- Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/products/practical_webobjects ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
Re: Deadlocks
If virtue can't be mine alone at least my faults can be my own. - Piet Hein Deadlocks in WebObjects have always been my own fault. I'm thankful to Andrew Lindesay's HOWTO. It's a super easy process -- and saves my bacon from my own carelessness. From: Steven Mark McCraw [EMAIL PROTECTED] Date: Wed, 5 Sep 2007 20:22:42 -0400 To: Chuck Hill [EMAIL PROTECTED] Cc: WebObjects (Group) webobjects-dev@lists.apple.com Subject: Re: Deadlocks Is it easy? Or is that just the nature of the concurrency beast? I consider it easy because I've had to deal with it so many times. I think WebObjects seems worse than a normal multithreaded app because things you do that are totally unrelated to concurrency from the programmer's point of view can cause you deadlock. Deadlocks sure are frustrating. In my opinion WO locks are only more noticeable than other web app environments, because, in classical WO programming, you've only got one channel to the DB per application. Lock that up -- and the rest of the app is toast. Other programming environments might suffer less by having more avenues to the data. You might be equally guilty of poor practice in those environments but possibly not even notice it. Instead you reboot your app when it slows to a crawl, or runs out of descriptors, blaming it on Java. It's nice that so much of the concurrency-handling misery you would ordinarily have to think about with multithreaded applications is hidden from you, but when it goes wrong, it is the height of confusion. Well when we see that copy of open-source WO, Real Soon Now (tm). we can all say what confusion? -=- matt Matthew Taylor Northwestern University smime.p7s Description: S/MIME cryptographic signature ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]
OPENBASE and Deadlocks
Hello; I'm not sure if I mentioned this before, but one of my projects was having deadlock problems with high-volume writes out of a WOA into OPENBASE. I developed a subclassed adaptor for this -- so if anybody is interested in this, I put something in the wiki about it... http://en.wikibooks.org/wiki/Programming:WebObjects/ Database_Compatibility_and_Comparisons/OpenBase#Deadlocks cheers. ___ Andrew Lindesay www.lindesay.co.nz ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to archive@mail-archive.com
Re: OPENBASE and Deadlocks
Hi Andrew, I think I understand now. So I think you are saying that OpenBase is detecting a deadlock and aborting a transaction, right? The solution you have posted on this page will work fine. You could also resave the transaction. I've made a small edit to the wiki to make it more clear that we are talking about the problem of aborting transactions to avoid deadlocks rather than a deadlocked server. Please let me know if I have misunderstood. Thanks. Best regards, Scott Keith OpenBase On Jan 17, 2007, at 11:31 PM, Andrew Lindesay wrote: Hello; I'm not sure if I mentioned this before, but one of my projects was having deadlock problems with high-volume writes out of a WOA into OPENBASE. I developed a subclassed adaptor for this -- so if anybody is interested in this, I put something in the wiki about it... http://en.wikibooks.org/wiki/Programming:WebObjects/ Database_Compatibility_and_Comparisons/OpenBase#Deadlocks cheers. ___ Andrew Lindesay www.lindesay.co.nz ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to archive@mail-archive.com
RE: OPENBASE and Deadlocks - follow-up
Hello again; I'm not sure if I mentioned this before, but one of my projects was having deadlock problems with high-volume writes out of a WOA into OPENBASE. I developed a subclassed adaptor for this -- so if anybody is interested in this, I put something in the wiki about it... I just want to follow up this post by adding that the behavior exhibited by the OPENBASE product is 100% correct -- this is not an issue with the database server. This is a database client-end solution that integrates with WebObjects applications for specifically solving this issue in the development of WebObjects applications. cheers. ___ Andrew Lindesay www.lindesay.co.nz ___ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to archive@mail-archive.com