Hi, I created https://issues.apache.org/jira/browse/KARAF-6866
Will try to come up with a PR for a fix. Stay tuned. /henning On Mon, Oct 5, 2020 at 5:00 PM Jean-Baptiste Onofre <[email protected]> wrote: > Hi, > > It would "force" configadmin to be startup in etc/startup.properties > before features service (which is the case by default but not mandatory). > > It’s certainly acceptable but I would like to evaluate the impact. > > Can you please create a Jira related to that (or I can do it for you) ? > > Thanks, > Regards > JB > > Le 5 oct. 2020 à 16:53, Henning Treu <[email protected]> a écrit : > > Hi JB, > > unfortunately there is no practical way to reproduce this. It happens > "occasionally" on our docker images with karaf packaged and in 10% of cases > when our customer installs the karaf distro on RHEL7. > > In the feature.core Activator the config is loaded specifically from file, > why not use the config already stored in ConfigAdmin (I guess provided by > felix FileInstall)? Since the timestamp on the file updates it looks like > it gets re-written and races with the Activator to open it for reading. > Does this make sense? > > Best > /henning > > On Fri, Oct 2, 2020 at 2:22 PM Jean-Baptiste Onofre <[email protected]> > wrote: > >> Hi, >> >> Does it really occur with Karaf 4.2.8 ? >> >> I never saw such issue. Any step to reproduce this (maybe changing time) ? >> >> Regards >> JB >> >> Le 2 oct. 2020 à 13:31, Henning Treu <[email protected]> a écrit : >> >> Hi list, >> >> we bundle our OSGi application with karaf, having our own feature >> repository put into etc/org.apache.karaf.features.cfg and also defined our >> boot feature with the featuresBoot property. >> >> When starting up on RHEL7 and re-starting on Docker the startup gets >> interrupted occasionally when >> org.apache.karaf.features.internal.osgi.Activator >> tries to read etc/org.apache.karaf.features.cfg. We can see in a heap dump >> from this state, that the features.core bundle is loaded, but the >> properties from the config file are not present. >> This leads to no further features/bundles being installed. >> >> When debugging this I can see the timestamps >> on etc/org.apache.karaf.features.cfg beeing later then the timestamps on >> the karaf pid file: >> >> /opt/instana/agent# stat agent.pid >> File: agent.pid >> Size: 5 Blocks: 8 IO Block: 4096 regular file >> Device: 41h/65d Inode: 19025079 Links: 1 >> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) >> Access: 2020-10-02 06:52:09.823302941 +0000 >> Modify: 2020-10-02 06:57:41.029279007 +0000 >> Change: 2020-10-02 06:57:41.029279007 +0000 >> Birth: - >> >> /opt/instana/agent# stat etc/org.apache.karaf.features.cfg >> File: etc/org.apache.karaf.features.cfg >> Size: 3393 Blocks: 8 IO Block: 4096 regular file >> Device: 41h/65d Inode: 39076582 Links: 1 >> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) >> Access: 2020-10-02 06:57:42.645969271 +0000 >> Modify: 2020-10-02 06:57:42.645969271 +0000 >> Change: 2020-10-02 06:57:42.645969271 +0000 >> Birth: - >> >> >> >> Following is a thread dump from the stuck startup: >> >> /opt/instana/agent# jvm/bin/jcmd 26665 Thread.print >> 26665: >> 2020-10-02 07:00:53 >> Full thread dump OpenJDK 64-Bit Server VM (25.232-b18 mixed mode):"Attach >> Listener" #18 daemon prio=9 os_prio=0 tid=0x0000000002a64000 nid=0x6cd4 >> waiting on condition [0x0000000000000000] >> java.lang.Thread.State: RUNNABLE"fileinstall-/opt/instana/agent/deploy" >> #17 daemon prio=5 os_prio=0 tid=0x0000000002de2000 nid=0x6842 in >> Object.wait() [0x00007faa5ef42000] >> java.lang.Thread.State: TIMED_WAITING (on object monitor) >> at java.lang.Object.wait(Native Method) >> - waiting on <0x00000000f8f7d1d0> (a >> org.apache.felix.fileinstall.internal.DirectoryWatcher) >> at >> org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:320) >> - locked <0x00000000f8f7d1d0> (a >> org.apache.felix.fileinstall.internal.DirectoryWatcher)"fileinstall-/opt/instana/agent/etc" >> #15 daemon prio=5 os_prio=0 tid=0x0000000002e2a800 nid=0x6841 in >> Object.wait() [0x00007faa5efa3000] >> java.lang.Thread.State: TIMED_WAITING (on object monitor) >> at java.lang.Object.wait(Native Method) >> - waiting on <0x00000000f8c87360> (a >> org.apache.felix.fileinstall.internal.DirectoryWatcher) >> at >> org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:320) >> - locked <0x00000000f8c87360> (a >> org.apache.felix.fileinstall.internal.DirectoryWatcher)"Thread-2" #16 daemon >> prio=5 os_prio=0 tid=0x0000000002ccc800 nid=0x683c runnable >> [0x00007faa5f880000] >> java.lang.Thread.State: RUNNABLE >> at sun.nio.fs.LinuxWatchService.poll(Native Method) >> at sun.nio.fs.LinuxWatchService.access$600(LinuxWatchService.java:47) >> at >> sun.nio.fs.LinuxWatchService$Poller.run(LinuxWatchService.java:314) >> at java.lang.Thread.run(Thread.java:748)"CM Event Dispatcher" #14 >> daemon prio=5 os_prio=0 tid=0x0000000002caa000 nid=0x683b in Object.wait() >> [0x00007faa5f8e1000] >> java.lang.Thread.State: WAITING (on object monitor) >> at java.lang.Object.wait(Native Method) >> - waiting on <0x00000000f8c87860> (a java.util.LinkedList) >> at java.lang.Object.wait(Object.java:502) >> at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:79) >> - locked <0x00000000f8c87860> (a java.util.LinkedList) >> at java.lang.Thread.run(Thread.java:748)"CM Configuration Updater" >> #13 daemon prio=5 os_prio=0 tid=0x0000000002ca1800 nid=0x683a in >> Object.wait() [0x00007faa5f942000] >> java.lang.Thread.State: WAITING (on object monitor) >> at java.lang.Object.wait(Native Method) >> - waiting on <0x00000000f8c87c78> (a java.util.LinkedList) >> at java.lang.Object.wait(Object.java:502) >> at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:79) >> - locked <0x00000000f8c87c78> (a java.util.LinkedList) >> at java.lang.Thread.run(Thread.java:748)"Coordination Timer" #12 >> daemon prio=5 os_prio=0 tid=0x0000000002c8d000 nid=0x6839 in Object.wait() >> [0x00007faa5fa4b000] >> java.lang.Thread.State: WAITING (on object monitor) >> at java.lang.Object.wait(Native Method) >> - waiting on <0x00000000f8c87e18> (a java.util.TaskQueue) >> at java.lang.Object.wait(Object.java:502) >> at java.util.TimerThread.mainLoop(Timer.java:526) >> - locked <0x00000000f8c87e18> (a java.util.TaskQueue) >> at java.util.TimerThread.run(Timer.java:505)"Karaf Lock Monitor >> Thread" #11 prio=5 os_prio=0 tid=0x0000000002863000 nid=0x6838 waiting on >> condition [0x00007faa5f804000] >> java.lang.Thread.State: TIMED_WAITING (sleeping) >> at java.lang.Thread.sleep(Native Method) >> at org.apache.karaf.main.Main.doMonitor(Main.java:410) >> at org.apache.karaf.main.Main.access$100(Main.java:74) >> at org.apache.karaf.main.Main$3.run(Main.java:377)"FelixStartLevel" >> #10 daemon prio=5 os_prio=0 tid=0x000000000250a000 nid=0x6837 in >> Object.wait() [0x00007faa5fb20000] >> java.lang.Thread.State: WAITING (on object monitor) >> at java.lang.Object.wait(Native Method) >> - waiting on <0x00000000f84d8b00> (a java.util.ArrayList) >> at java.lang.Object.wait(Object.java:502) >> at >> org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:283) >> - locked <0x00000000f84d8b00> (a java.util.ArrayList) >> at java.lang.Thread.run(Thread.java:748)"FelixFrameworkWiring" #9 >> daemon prio=5 os_prio=0 tid=0x0000000002508000 nid=0x6836 in Object.wait() >> [0x00007faa5fc62000] >> java.lang.Thread.State: WAITING (on object monitor) >> at java.lang.Object.wait(Native Method) >> - waiting on <0x00000000f84d8cb8> (a java.util.ArrayList) >> at java.lang.Object.wait(Object.java:502) >> at >> org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172) >> - locked <0x00000000f84d8cb8> (a java.util.ArrayList) >> at java.lang.Thread.run(Thread.java:748)"FelixDispatchQueue" #8 >> prio=5 os_prio=0 tid=0x0000000002577000 nid=0x6835 in Object.wait() >> [0x00007faa5fb81000] >> java.lang.Thread.State: WAITING (on object monitor) >> at java.lang.Object.wait(Native Method) >> - waiting on <0x00000000f84d8e70> (a java.util.ArrayList) >> at java.lang.Object.wait(Object.java:502) >> at >> org.apache.felix.framework.EventDispatcher.run(EventDispatcher.java:1122) >> - locked <0x00000000f84d8e70> (a java.util.ArrayList) >> at >> org.apache.felix.framework.EventDispatcher.access$000(EventDispatcher.java:54) >> at >> org.apache.felix.framework.EventDispatcher$1.run(EventDispatcher.java:102) >> at java.lang.Thread.run(Thread.java:748)"Service Thread" #6 daemon >> prio=9 os_prio=0 tid=0x0000000001f7d000 nid=0x6830 runnable >> [0x0000000000000000] >> java.lang.Thread.State: RUNNABLE"C2 CompilerThread0" #5 daemon prio=9 >> os_prio=0 tid=0x0000000001f73000 nid=0x682f waiting on condition >> [0x0000000000000000] >> java.lang.Thread.State: RUNNABLE"Signal Dispatcher" #4 daemon prio=9 >> os_prio=0 tid=0x0000000001f63000 nid=0x682e runnable [0x0000000000000000] >> java.lang.Thread.State: RUNNABLE"Finalizer" #3 daemon prio=8 os_prio=0 >> tid=0x0000000001f3e800 nid=0x682d in Object.wait() [0x00007faa63b2e000] >> java.lang.Thread.State: WAITING (on object monitor) >> at java.lang.Object.wait(Native Method) >> - waiting on <0x00000000f83a18a8> (a >> java.lang.ref.ReferenceQueue$Lock) >> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144) >> - locked <0x00000000f83a18a8> (a java.lang.ref.ReferenceQueue$Lock) >> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165) >> at >> java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)"Reference >> Handler" #2 daemon prio=10 os_prio=0 tid=0x0000000001f34000 nid=0x682c in >> Object.wait() [0x00007faa63b8f000] >> java.lang.Thread.State: WAITING (on object monitor) >> at java.lang.Object.wait(Native Method) >> - waiting on <0x00000000f83a1a60> (a java.lang.ref.Reference$Lock) >> at java.lang.Object.wait(Object.java:502) >> at java.lang.ref.Reference.tryHandlePending(Reference.java:191) >> - locked <0x00000000f83a1a60> (a java.lang.ref.Reference$Lock) >> at >> java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)"main" #1 >> prio=5 os_prio=0 tid=0x0000000001ec4000 nid=0x682a in Object.wait() >> [0x00007faa65f49000] >> java.lang.Thread.State: WAITING (on object monitor) >> at java.lang.Object.wait(Native Method) >> - waiting on <0x00000000f855e590> (a >> org.apache.felix.framework.util.ThreadGate) >> at >> org.apache.felix.framework.util.ThreadGate.await(ThreadGate.java:79) >> - locked <0x00000000f855e590> (a >> org.apache.felix.framework.util.ThreadGate) >> at org.apache.felix.framework.Felix.waitForStop(Felix.java:1075) >> at org.apache.karaf.main.Main.awaitShutdown(Main.java:668) >> at org.apache.karaf.main.Main.main(Main.java:188)"VM Thread" >> os_prio=0 tid=0x0000000001f2a000 nid=0x682b runnable"VM Periodic Task >> Thread" os_prio=0 tid=0x0000000001f80800 nid=0x6831 waiting on conditionJNI >> global references: 304 >> >> >> This is our startup.properties: >> >> # Bundles to be started on startup, with startlevel >> mvn\:org.apache.karaf.features/org.apache.karaf.features.extension/4.2.9 >> = 1 >> mvn\:org.ops4j.pax.url/pax-url-aether/2.6.2 = 5 >> mvn\:org.ops4j.pax.logging/pax-logging-api/1.11.6 = 8 >> mvn\:org.ops4j.pax.logging/pax-logging-log4j2/1.11.6 = 8 >> mvn\:org.fusesource.jansi/jansi/1.18 = 8 >> mvn\:org.apache.felix/org.apache.felix.coordinator/1.0.2 = 9 >> mvn\:org.apache.felix/org.apache.felix.configadmin/1.9.16 = 10 >> mvn\:org.apache.felix/org.apache.felix.fileinstall/3.6.6 = 11 >> mvn\:org.apache.karaf.features/org.apache.karaf.features.core/4.2.9 = 15 >> >> I suspect a race condition between felix fileinstall, felix config admin >> and the feature.core bundle to cause this when the file is re-written by >> fileinstall(?) and feature.core opening it for reading. Unfortunately I >> have no detailed experience in the combined workings of those components. >> >> Happy to hear any suggestion. >> >> ps: seen with karaf 4.2.8 on RHEL7 and 4.2.9 on docker. >> >> Cheers >> /henning >> -- >> *Henning Treu *| Senior Software Engineer >> Mobile: +49 160 1797564 >> >> *INSTANA Inc* | 222 South Riverside Plaza, 15th Floor | Chicago, IL 60606 | >> United States >> >> *INSTANA GMBH *| Dunkelnberger Str. 39 | 42697 Solingen | Deutschland >> Sitz der Gesellschaft: Solingen | HRB 26637 | Amtsgericht Wuppertal >> Geschäftsführer: Mirko Novakovic | Pavlo Baron >> >> >> > > -- > *Henning Treu *| Senior Software Engineer > Mobile: +49 160 1797564 > > *INSTANA Inc* | 222 South Riverside Plaza, 15th Floor | Chicago, IL 60606 | > United States > > *INSTANA GMBH *| Dunkelnberger Str. 39 | 42697 Solingen | Deutschland > Sitz der Gesellschaft: Solingen | HRB 26637 | Amtsgericht Wuppertal > Geschäftsführer: Mirko Novakovic | Pavlo Baron > > > -- *Henning Treu *| Senior Software Engineer Mobile: +49 160 1797564 *INSTANA Inc* | 222 South Riverside Plaza, 15th Floor | Chicago, IL 60606 | United States *INSTANA GMBH *| Dunkelnberger Str. 39 | 42697 Solingen | Deutschland Sitz der Gesellschaft: Solingen | HRB 26637 | Amtsgericht Wuppertal Geschäftsführer: Mirko Novakovic | Pavlo Baron
