Hi,
It looks like I’ve identified a bug in Felix that causes a deadlock at
startup – my Karaf fails to start :(.
As far as I can see, the conditions and code flow are as follows.
Conditions ...
- There is a WAR bundle that imports packages from 2 other bundles (I
suspect specifically java.annotation)
- The WAR is started successfully, but a few milliseconds later another
bundle which exports the same package (java.annotation) is resolved, and
this causes the WAR bundle to be re-resolved
(STOP->UNRESOLVED->RESOLVED->STARTED)
This causes …
- … the FelixStartLevel thread to hold the lock on the WAR bundle (while
firing the WAR’s bundle-started event)
- … the FelixPackageAdmin thread to hold the global lock and request the
bundle lock on the WAR bundle in order to stop it and refresh its packages –
this ends up in an endless waiting loop because the bundle is already locked
by the FelixStartLevel thread
- … the FelixStartLevel thread to try to re-acquire the bundle lock on the
WAR bundle in order to register a service listener from the bundle-started
event handled in PAX WEB for that WAR bundle – this too ends up in an endless
loop because the global lock is held by another thread.
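The blocking relations described above form a cycle between two threads. A minimal sketch of that wait-for relation follows. The class WaitForGraph and its methods are hypothetical helpers for illustration, not Felix code, and the edges are only my reading of the thread dump in this message:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not Felix code: records "thread X waits for thread Y".
class WaitForGraph {
    private final Map<String, String> waitsFor = new HashMap<>();

    void addWait(String waiter, String holder) {
        waitsFor.put(waiter, holder);
    }

    // Follows wait-for edges from 'start'. Returns the threads forming a
    // cycle if one is reached, or an empty list if the chain ends.
    List<String> findCycle(String start) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null && seen.add(current)) {
            path.add(current);
            current = waitsFor.get(current);
        }
        if (current == null) {
            return new ArrayList<>();
        }
        // Drop threads that only wait on the cycle without being part of it.
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }

    // Edges as read from the thread dump in this message (an assumption).
    static WaitForGraph fromThreadDump() {
        WaitForGraph g = new WaitForGraph();
        // Holds the bundle lock, blocked while another thread holds the global lock.
        g.addWait("FelixStartLevel", "FelixPackageAdmin");
        // Holds the global lock, blocked on the bundle lock.
        g.addWait("FelixPackageAdmin", "FelixStartLevel");
        // Waits for the global lock.
        g.addWait("FelixDispatchQueue", "FelixPackageAdmin");
        return g;
    }

    public static void main(String[] args) {
        WaitForGraph g = fromThreadDump();
        System.out.println("Cycle: " + g.findCycle("FelixDispatchQueue"));
    }
}
```

FelixDispatchQueue is not part of the cycle itself, it is simply stuck behind it.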
In short, an improper set of bundles (multiple bundles exporting the same
package) is being deployed to Karaf, but … there is definitely a bug in
Felix. I expect the FelixStartLevel thread to be able to re-acquire the
lock because it already holds it, and I think ‘||’ should be replaced
with ‘&&’ in the code block below, which effectively means – immediately
re-enter the lock (i.e. skip the while loop) if
- bundle.isLockable() == true – because the bundle is already locked by the
current thread
    synchronized boolean isLockable() {
        return (m_lockCount == 0) || (m_lockThread == Thread.currentThread());
    }
or
- the current thread already holds the global lock (and consequently all
bundles)
The correct code would be
    // Wait if the desired bundle is already locked by someone else
    // or if any thread has the global lock, unless the current thread
    // holds the global lock or the bundle lock already.
    while (!bundle.isLockable() &&
        ((m_globalLockThread != null)
        && (m_globalLockThread != Thread.currentThread())))
    {
or
    while (!(bundle.isLockable() ||
        ((m_globalLockThread != null)
        && (m_globalLockThread != Thread.currentThread()))))
    {
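To check how these conditions behave, the sketch below evaluates the existing loop condition and both proposed forms for the state reported in this message (lock count 1, bundle lock held by FelixStartLevel, global lock held by FelixPackageAdmin). It is a simplified model: LockConditionCheck is a hypothetical class, and thread names stand in for Thread objects.

```java
// Hypothetical model, not Felix code: thread names stand in for Thread objects.
class LockConditionCheck {
    // Mirrors isLockable() as quoted in this message.
    static boolean isLockable(int lockCount, String lockThread, String current) {
        return lockCount == 0 || current.equals(lockThread);
    }

    // True if some other thread holds the global lock.
    static boolean globalHeldByOther(String globalLockThread, String current) {
        return globalLockThread != null && !globalLockThread.equals(current);
    }

    // Existing loop condition ('||'): true means the caller waits.
    static boolean originalWaits(int lockCount, String lockThread,
                                 String globalLockThread, String current) {
        return !isLockable(lockCount, lockThread, current)
            || globalHeldByOther(globalLockThread, current);
    }

    // First proposed form ('&&').
    static boolean proposedAndWaits(int lockCount, String lockThread,
                                    String globalLockThread, String current) {
        return !isLockable(lockCount, lockThread, current)
            && globalHeldByOther(globalLockThread, current);
    }

    // Second proposed form (negated '||').
    static boolean proposedNegatedOrWaits(int lockCount, String lockThread,
                                          String globalLockThread, String current) {
        return !(isLockable(lockCount, lockThread, current)
            || globalHeldByOther(globalLockThread, current));
    }

    public static void main(String[] args) {
        String[] callers = {"FelixStartLevel", "FelixPackageAdmin"};
        for (String caller : callers) {
            System.out.println(caller
                + ": original=" + originalWaits(1, "FelixStartLevel", "FelixPackageAdmin", caller)
                + " and=" + proposedAndWaits(1, "FelixStartLevel", "FelixPackageAdmin", caller)
                + " negatedOr=" + proposedNegatedOrWaits(1, "FelixStartLevel", "FelixPackageAdmin", caller));
        }
    }
}
```

For the FelixStartLevel caller the existing condition says "wait" while both proposed forms let it proceed. For the FelixPackageAdmin caller in the same state the two proposed forms disagree (the ‘&&’ form no longer waits, the negated form still does), so they are not interchangeable and the exact form would need review by the Felix developers.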
Debug information for this problem:
FelixPackageAdmin thread – at acquireBundleLock(bundle #126)
    Tries to stop bundle my-webapp-0.1.0-SNAPSHOT.war
    because it is going to refresh its packages
FelixStartLevel thread – at acquireBundleLock(bundle #126)
    Tries to addServiceListener on s = (java.lang.String) "(objectClass=org.osgi.service.http.HttpService)"
    because the WAR bundle is being started
m_globalLockThread == FelixPackageAdmin
bundle.m_lockThread == FelixStartLevel
bundle.m_lockCount == 1
desiredStates == 40 == Bundle.STARTING | Bundle.ACTIVE
bundle.getState() == 32 == Bundle.ACTIVE
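These state values also explain why the waiting thread never gets an IllegalStateException: the state check inside the wait loop of acquireBundleLock() passes, so the thread just keeps waiting. A small sketch of that bit-mask check is below. DesiredStateCheck is a hypothetical class, and the constants are copied from org.osgi.framework.Bundle so that the example has no OSGi dependency.

```java
// Hypothetical sketch of the desired-state check, not Felix code.
class DesiredStateCheck {
    // Values copied from org.osgi.framework.Bundle.
    static final int RESOLVED = 0x00000004;
    static final int STARTING = 0x00000008;
    static final int ACTIVE = 0x00000020;

    // True if the bundle state is one of the states in the mask, i.e. the
    // wait loop keeps waiting instead of throwing IllegalStateException.
    static boolean inDesiredState(int desiredStates, int bundleState) {
        return (desiredStates & bundleState) != 0;
    }

    public static void main(String[] args) {
        int desiredStates = STARTING | ACTIVE;
        System.out.println("desiredStates = " + desiredStates);
        System.out.println("ACTIVE accepted: " + inDesiredState(desiredStates, ACTIVE));
        System.out.println("RESOLVED accepted: " + inDesiredState(desiredStates, RESOLVED));
    }
}
```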
void acquireBundleLock(BundleImpl bundle, int desiredStates)
    throws IllegalStateException
{
    synchronized (m_bundleLock)
    {
        // Wait if the desired bundle is already locked by someone else
        // or if any thread has the global lock, unless the current thread
        // holds the global lock or the bundle lock already.
        while (!bundle.isLockable() ||
            ((m_globalLockThread != null)
            && (m_globalLockThread != Thread.currentThread())))
        {
            // Check to make sure the bundle is in a desired state.
            // If so, keep waiting. If not, throw an exception.
            if ((desiredStates & bundle.getState()) == 0)
            {
                throw new IllegalStateException("Bundle in unexpected state.");
            }
            // If the calling thread already owns the global lock, then make
            // sure no other thread is trying to promote a bundle lock to a
            // global lock. If so, interrupt the other thread to avoid deadlock.
            else if (m_globalLockThread == Thread.currentThread()
                && (bundle.getLockingThread() != null)
                && m_globalLockWaitersList.contains(bundle.getLockingThread()))
            {
                bundle.getLockingThread().interrupt();
            }
            try
            {
                m_bundleLock.wait();
            }
            catch (InterruptedException ex)
            {
                throw new IllegalStateException(
                    "Unable to acquire bundle lock, thread interrupted.");
            }
        }
"FelixDispatchQueue"
m_globalLockWaitersList = (java.util.ArrayList) [Thread[Blueprint Extender: 1,5,main], Thread[FelixDispatchQueue,5,main]]
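With this waiters list, the deadlock-avoidance branch in acquireBundleLock() (the 'else if' that interrupts the bundle's locking thread) does not appear to fire for FelixPackageAdmin, because FelixStartLevel is blocked in acquireBundleLock() and is therefore not in the list. A sketch of that condition with the observed values is below. InterruptCheck is a hypothetical class, and thread names stand in for Thread objects.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the 'else if' condition, not Felix code.
class InterruptCheck {
    // True if the caller holds the global lock and the thread that holds
    // the bundle lock is itself waiting for the global lock.
    static boolean wouldInterrupt(String current, String globalLockThread,
                                  String bundleLockingThread,
                                  List<String> globalLockWaiters) {
        return current.equals(globalLockThread)
            && bundleLockingThread != null
            && globalLockWaiters.contains(bundleLockingThread);
    }

    public static void main(String[] args) {
        // Waiters as shown in the debug output of this message.
        List<String> waiters = Arrays.asList("Blueprint Extender: 1", "FelixDispatchQueue");
        boolean interrupt = wouldInterrupt(
            "FelixPackageAdmin", "FelixPackageAdmin", "FelixStartLevel", waiters);
        System.out.println("FelixPackageAdmin interrupts FelixStartLevel: " + interrupt);
    }
}
```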
private boolean acquireGlobalLock()
{
    synchronized (m_bundleLock)
    {
        // Wait as long as some other thread holds the global lock
        // and the current thread is not interrupted.
        boolean interrupted = false;
        while (!interrupted
            && (m_globalLockThread != null)
            && (m_globalLockThread != Thread.currentThread()))
        {
            // Add calling thread to global lock waiters list.
            m_globalLockWaitersList.add(Thread.currentThread());
            // We need to wake up all waiting threads so we can
            // recheck for potential deadlock in acquireBundleLock()
            // if this thread was holding a bundle lock and is now
            // trying to promote it to a global lock.
            m_bundleLock.notifyAll();
            // Now wait for the global lock.
            try
            {
                m_bundleLock.wait();
            }
            catch (InterruptedException ex)
            {
                interrupted = true;
            }
            // At this point we are either interrupted or will get the
            // global lock, so remove the thread from the waiters list.
            m_globalLockWaitersList.remove(Thread.currentThread());
        }
"FelixPackageAdmin"
java.lang.Object.wait(Object.java)
java.lang.Object.wait(Object.java:503)
org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4922)
org.apache.felix.framework.Felix.stopBundle(Felix.java:2197)
org.apache.felix.framework.Felix$RefreshHelper.stop(Felix.java:4668)
org.apache.felix.framework.Felix.refreshPackages(Felix.java:3699)
org.apache.felix.framework.PackageAdminImpl.run(PackageAdminImpl.java:365)
java.lang.Thread.run(Thread.java:722)
"FelixStartLevel"
java.lang.Object.wait(Object.java)
java.lang.Object.wait(Object.java:503)
org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4922)
org.apache.felix.framework.Felix.addServiceListener(Felix.java:2814)
org.apache.felix.framework.BundleContextImpl.addServiceListener(BundleContextImpl.java:246)
org.osgi.util.tracker.ServiceTracker.open(ServiceTracker.java:308)
org.osgi.util.tracker.ServiceTracker.open(ServiceTracker.java:273)
org.ops4j.pax.swissbox.tracker.ServiceCollection.onStart(ServiceCollection.java:139)
org.ops4j.pax.swissbox.lifecycle.AbstractLifecycle$Stopped.start(AbstractLifecycle.java:121)
org.ops4j.pax.swissbox.lifecycle.AbstractLifecycle.start(AbstractLifecycle.java:49)
org.ops4j.pax.swissbox.tracker.ReplaceableService.onStart(ReplaceableService.java:146)
org.ops4j.pax.swissbox.lifecycle.AbstractLifecycle$Stopped.start(AbstractLifecycle.java:121)
org.ops4j.pax.swissbox.lifecycle.AbstractLifecycle.start(AbstractLifecycle.java:49)
org.ops4j.pax.web.extender.war.internal.WebAppPublisher.publish(WebAppPublisher.java:81)
org.ops4j.pax.web.extender.war.internal.WebXmlObserver.doPublish(WebXmlObserver.java:304)
org.ops4j.pax.web.extender.war.internal.WebXmlObserver.addingEntries(WebXmlObserver.java:153)
org.ops4j.pax.swissbox.extender.BundleWatcher.register(BundleWatcher.java:186)
org.ops4j.pax.swissbox.extender.BundleWatcher.access$000(BundleWatcher.java:45)
org.ops4j.pax.swissbox.extender.BundleWatcher$1.bundleChanged(BundleWatcher.java:127)
org.apache.felix.framework.util.EventDispatcher.invokeBundleListenerCallback(EventDispatcher.java:807)
org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:729)
org.apache.felix.framework.util.EventDispatcher.fireBundleEvent(EventDispatcher.java:610)
org.apache.felix.framework.Felix.fireBundleEvent(Felix.java:3879)
org.apache.felix.framework.Felix.startBundle(Felix.java:1850)
org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1192)
org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:266)
java.lang.Thread.run(Thread.java:722)
"FelixDispatchQueue"
org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:4991)
org.apache.felix.framework.Felix.resolveBundles(Felix.java:3492)
org.apache.felix.framework.Felix.findBundleEntries(Felix.java:1563)
org.apache.felix.framework.BundleImpl.findEntries(BundleImpl.java:293)
org.apache.karaf.deployer.features.FeatureDeploymentListener.bundleChanged(FeatureDeploymentListener.java:126)
org.apache.felix.framework.util.EventDispatcher.invokeBundleListenerCallback(EventDispatcher.java:807)
org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:729)
org.apache.felix.framework.util.EventDispatcher.run(EventDispatcher.java:949)
org.apache.felix.framework.util.EventDispatcher.access$000(EventDispatcher.java:54)
org.apache.felix.framework.util.EventDispatcher$1.run(EventDispatcher.java:106)
java.lang.Thread.run(Thread.java:722)
Hopefully someone can add a fix for this to Felix.
--
View this message in context:
http://karaf.922171.n3.nabble.com/Deadlock-in-Karaf-2-2-9-Felix-3-2-2-tp4025936.html
Sent from the Karaf - User mailing list archive at Nabble.com.