Hi JB,

unfortunately there is no practical way to reproduce this. It happens
"occasionally" on our docker images with karaf packaged and in 10% of cases
when our customer installs the karaf distro on RHEL7.

In the feature.core Activator the config is loaded specifically from file,
why not use the config already stored in ConfigAdmin (I guess provided by
felix FileInstall)? Since the timestamp on the file updates it looks like
it gets re-written and races with the Activator to open it for reading.
Does this make sense?

Best
/henning

On Fri, Oct 2, 2020 at 2:22 PM Jean-Baptiste Onofre <[email protected]> wrote:

> Hi,
>
> Does it really occur with Karaf 4.2.8 ?
>
> I never saw such issue. Any step to reproduce this (maybe changing time) ?
>
> Regards
> JB
>
> Le 2 oct. 2020 à 13:31, Henning Treu <[email protected]> a écrit :
>
> Hi list,
>
> we bundle our OSGi application with karaf, having our own feature
> repository put into etc/org.apache.karaf.features.cfg and also defined our
> boot feature with the featuresBoot property.
>
> When starting up on RHEL7 and re-starting on Docker the startup gets
> interrupted occasionally when 
> org.apache.karaf.features.internal.osgi.Activator
> tries to read etc/org.apache.karaf.features.cfg. We can see in a heap dump
> from this state, that the features.core bundle is loaded, but the
> properties from the config file are not present.
> This leads to no further features/bundles being installed.
>
> When debugging this I can see the timestamps
> on etc/org.apache.karaf.features.cfg beeing later then the timestamps on
> the karaf pid file:
>
> /opt/instana/agent# stat agent.pid
>   File: agent.pid
>   Size: 5               Blocks: 8          IO Block: 4096   regular file
> Device: 41h/65d Inode: 19025079    Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2020-10-02 06:52:09.823302941 +0000
> Modify: 2020-10-02 06:57:41.029279007 +0000
> Change: 2020-10-02 06:57:41.029279007 +0000
>  Birth: -
>
> /opt/instana/agent# stat etc/org.apache.karaf.features.cfg
>   File: etc/org.apache.karaf.features.cfg
>   Size: 3393            Blocks: 8          IO Block: 4096   regular file
> Device: 41h/65d Inode: 39076582    Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2020-10-02 06:57:42.645969271 +0000
> Modify: 2020-10-02 06:57:42.645969271 +0000
> Change: 2020-10-02 06:57:42.645969271 +0000
>  Birth: -
>
>
>
> Following is a thread dump from the stuck startup:
>
> /opt/instana/agent# jvm/bin/jcmd 26665 Thread.print
> 26665:
> 2020-10-02 07:00:53
> Full thread dump OpenJDK 64-Bit Server VM (25.232-b18 mixed mode):"Attach 
> Listener" #18 daemon prio=9 os_prio=0 tid=0x0000000002a64000 nid=0x6cd4 
> waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE"fileinstall-/opt/instana/agent/deploy" 
> #17 daemon prio=5 os_prio=0 tid=0x0000000002de2000 nid=0x6842 in 
> Object.wait() [0x00007faa5ef42000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000f8f7d1d0> (a 
> org.apache.felix.fileinstall.internal.DirectoryWatcher)
>         at 
> org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:320)
>         - locked <0x00000000f8f7d1d0> (a 
> org.apache.felix.fileinstall.internal.DirectoryWatcher)"fileinstall-/opt/instana/agent/etc"
>  #15 daemon prio=5 os_prio=0 tid=0x0000000002e2a800 nid=0x6841 in 
> Object.wait() [0x00007faa5efa3000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000f8c87360> (a 
> org.apache.felix.fileinstall.internal.DirectoryWatcher)
>         at 
> org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:320)
>         - locked <0x00000000f8c87360> (a 
> org.apache.felix.fileinstall.internal.DirectoryWatcher)"Thread-2" #16 daemon 
> prio=5 os_prio=0 tid=0x0000000002ccc800 nid=0x683c runnable 
> [0x00007faa5f880000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.fs.LinuxWatchService.poll(Native Method)
>         at sun.nio.fs.LinuxWatchService.access$600(LinuxWatchService.java:47)
>         at sun.nio.fs.LinuxWatchService$Poller.run(LinuxWatchService.java:314)
>         at java.lang.Thread.run(Thread.java:748)"CM Event Dispatcher" #14 
> daemon prio=5 os_prio=0 tid=0x0000000002caa000 nid=0x683b in Object.wait() 
> [0x00007faa5f8e1000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000f8c87860> (a java.util.LinkedList)
>         at java.lang.Object.wait(Object.java:502)
>         at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:79)
>         - locked <0x00000000f8c87860> (a java.util.LinkedList)
>         at java.lang.Thread.run(Thread.java:748)"CM Configuration Updater" 
> #13 daemon prio=5 os_prio=0 tid=0x0000000002ca1800 nid=0x683a in 
> Object.wait() [0x00007faa5f942000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000f8c87c78> (a java.util.LinkedList)
>         at java.lang.Object.wait(Object.java:502)
>         at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:79)
>         - locked <0x00000000f8c87c78> (a java.util.LinkedList)
>         at java.lang.Thread.run(Thread.java:748)"Coordination Timer" #12 
> daemon prio=5 os_prio=0 tid=0x0000000002c8d000 nid=0x6839 in Object.wait() 
> [0x00007faa5fa4b000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000f8c87e18> (a java.util.TaskQueue)
>         at java.lang.Object.wait(Object.java:502)
>         at java.util.TimerThread.mainLoop(Timer.java:526)
>         - locked <0x00000000f8c87e18> (a java.util.TaskQueue)
>         at java.util.TimerThread.run(Timer.java:505)"Karaf Lock Monitor 
> Thread" #11 prio=5 os_prio=0 tid=0x0000000002863000 nid=0x6838 waiting on 
> condition [0x00007faa5f804000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.karaf.main.Main.doMonitor(Main.java:410)
>         at org.apache.karaf.main.Main.access$100(Main.java:74)
>         at org.apache.karaf.main.Main$3.run(Main.java:377)"FelixStartLevel" 
> #10 daemon prio=5 os_prio=0 tid=0x000000000250a000 nid=0x6837 in 
> Object.wait() [0x00007faa5fb20000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000f84d8b00> (a java.util.ArrayList)
>         at java.lang.Object.wait(Object.java:502)
>         at 
> org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:283)
>         - locked <0x00000000f84d8b00> (a java.util.ArrayList)
>         at java.lang.Thread.run(Thread.java:748)"FelixFrameworkWiring" #9 
> daemon prio=5 os_prio=0 tid=0x0000000002508000 nid=0x6836 in Object.wait() 
> [0x00007faa5fc62000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000f84d8cb8> (a java.util.ArrayList)
>         at java.lang.Object.wait(Object.java:502)
>         at 
> org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
>         - locked <0x00000000f84d8cb8> (a java.util.ArrayList)
>         at java.lang.Thread.run(Thread.java:748)"FelixDispatchQueue" #8 
> prio=5 os_prio=0 tid=0x0000000002577000 nid=0x6835 in Object.wait() 
> [0x00007faa5fb81000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000f84d8e70> (a java.util.ArrayList)
>         at java.lang.Object.wait(Object.java:502)
>         at 
> org.apache.felix.framework.EventDispatcher.run(EventDispatcher.java:1122)
>         - locked <0x00000000f84d8e70> (a java.util.ArrayList)
>         at 
> org.apache.felix.framework.EventDispatcher.access$000(EventDispatcher.java:54)
>         at 
> org.apache.felix.framework.EventDispatcher$1.run(EventDispatcher.java:102)
>         at java.lang.Thread.run(Thread.java:748)"Service Thread" #6 daemon 
> prio=9 os_prio=0 tid=0x0000000001f7d000 nid=0x6830 runnable 
> [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE"C2 CompilerThread0" #5 daemon prio=9 
> os_prio=0 tid=0x0000000001f73000 nid=0x682f waiting on condition 
> [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE"Signal Dispatcher" #4 daemon prio=9 
> os_prio=0 tid=0x0000000001f63000 nid=0x682e runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE"Finalizer" #3 daemon prio=8 os_prio=0 
> tid=0x0000000001f3e800 nid=0x682d in Object.wait() [0x00007faa63b2e000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000f83a18a8> (a 
> java.lang.ref.ReferenceQueue$Lock)
>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
>         - locked <0x00000000f83a18a8> (a java.lang.ref.ReferenceQueue$Lock)
>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
>         at 
> java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)"Reference 
> Handler" #2 daemon prio=10 os_prio=0 tid=0x0000000001f34000 nid=0x682c in 
> Object.wait() [0x00007faa63b8f000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000f83a1a60> (a java.lang.ref.Reference$Lock)
>         at java.lang.Object.wait(Object.java:502)
>         at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
>         - locked <0x00000000f83a1a60> (a java.lang.ref.Reference$Lock)
>         at 
> java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)"main" #1 
> prio=5 os_prio=0 tid=0x0000000001ec4000 nid=0x682a in Object.wait() 
> [0x00007faa65f49000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000f855e590> (a 
> org.apache.felix.framework.util.ThreadGate)
>         at 
> org.apache.felix.framework.util.ThreadGate.await(ThreadGate.java:79)
>         - locked <0x00000000f855e590> (a 
> org.apache.felix.framework.util.ThreadGate)
>         at org.apache.felix.framework.Felix.waitForStop(Felix.java:1075)
>         at org.apache.karaf.main.Main.awaitShutdown(Main.java:668)
>         at org.apache.karaf.main.Main.main(Main.java:188)"VM Thread" 
> os_prio=0 tid=0x0000000001f2a000 nid=0x682b runnable"VM Periodic Task Thread" 
> os_prio=0 tid=0x0000000001f80800 nid=0x6831 waiting on conditionJNI global 
> references: 304
>
>
> This is our startup.properties:
>
> # Bundles to be started on startup, with startlevel
> mvn\:org.apache.karaf.features/org.apache.karaf.features.extension/4.2.9 =
> 1
> mvn\:org.ops4j.pax.url/pax-url-aether/2.6.2 = 5
> mvn\:org.ops4j.pax.logging/pax-logging-api/1.11.6 = 8
> mvn\:org.ops4j.pax.logging/pax-logging-log4j2/1.11.6 = 8
> mvn\:org.fusesource.jansi/jansi/1.18 = 8
> mvn\:org.apache.felix/org.apache.felix.coordinator/1.0.2 = 9
> mvn\:org.apache.felix/org.apache.felix.configadmin/1.9.16 = 10
> mvn\:org.apache.felix/org.apache.felix.fileinstall/3.6.6 = 11
> mvn\:org.apache.karaf.features/org.apache.karaf.features.core/4.2.9 = 15
>
> I suspect a race condition between felix fileinstall, felix config admin
> and the feature.core bundle to cause this when the file is re-written by
> fileinstall(?) and feature.core opening it for reading. Unfortunately I
> have no detailed experience in the combined workings of those components.
>
> Happy to hear any suggestion.
>
> ps: seen with karaf 4.2.8 on RHEL7 and 4.2.9 on docker.
>
> Cheers
> /henning
> --
> *Henning Treu *| Senior Software Engineer
> Mobile: +49 160 1797564
>
> *INSTANA Inc* | 222 South Riverside Plaza, 15th Floor | Chicago, IL 60606 |
> United States
>
> *INSTANA GMBH *| Dunkelnberger Str. 39 | 42697 Solingen | Deutschland
> Sitz der Gesellschaft: Solingen | HRB 26637 | Amtsgericht Wuppertal
> Geschäftsführer: Mirko Novakovic | Pavlo Baron
>
>
>

-- 
*Henning Treu *| Senior Software Engineer
Mobile: +49 160 1797564

*INSTANA Inc* | 222 South Riverside Plaza, 15th Floor | Chicago, IL 60606 |
United States

*INSTANA GMBH *| Dunkelnberger Str. 39 | 42697 Solingen | Deutschland
Sitz der Gesellschaft: Solingen | HRB 26637 | Amtsgericht Wuppertal
Geschäftsführer: Mirko Novakovic | Pavlo Baron

Reply via email to