On 4/28/19 5:26 PM, Philippe Gerum via Xenomai wrote:
> On 4/27/19 12:20 AM, Steve Freyder wrote:
>> On 4/26/2019 4:18 PM, Lowell Gilbert via Xenomai wrote:
>>> Hi.
>>>
>>> I have an application working successfully with Xenomai 3.0.8 on a 4.14
>>> kernel. I use Yocto to build the system; when I tried to move to a newer
>>> version of Yocto, my application hung on trying to become a daemon. This
>>> is happening with the daemon() call (which is what I've used up to now)
>>> and with fork().
>>>
>>> I built a test application so that I could confirm that this problem
>>> only occurs when I link (and wrap) with Xenomai. However, Xenomai
>>> doesn't seem to do anything significant with fork, so I'm puzzled about
>>> why this might be happening. I am not using libdaemon.
>>>
>>> Here are the changes that I thought might be significant:
>>> | newer (nonworking setup)  | older (working) |
>>> | gcc-cross-arm-8.2.0       |           7.3.0 |
>>> | glibc-2.28                |            2.26 |
>>> | glib-2.0-1_2.58.0         |     1_2.52.3-r0 |
>>> | binutils-cross-arm-2.31.1 |          2.29.1 |
>>> | coreutils-8.30            |            8.27 |
>>>
>>> Does anything jump out as a candidate for causing problems with a fork()
>>> call? Is there anything else I should be considering?
>>>
>>> Thanks.
>>>
>>> Be well.
>>>
>> I can tell you that I have a hang issue due to fork() in a Xenomai
>> program if, after the fork(), I don't do an exec().  I believe
>> the hang is related to registry access, and the fact that the
>> Unix domain socket connecting to sysregd that is inherited by
>> the forked process (which has FD_CLOEXEC set) hasn't yet gotten
>> closed (no exec() yet so no action on FD_CLOEXEC flags yet).
>>
>> If you are running into the same problem, and you don't require
>> registry access, you should see the problem go away if you throw
>> the --no-registry switch on the command line that invokes your
>> program.  That's not a real fix, but it's perhaps a way to know
>> if you're seeing a related problem.
>>
>> In my case, the way I see the "hang" is via an attempt to list
>> the contents of /run/xenomai using find:
>>
>> root:~ # find /run/xenomai
>>
>> If I run a program XX that uses the registry, that does a fork() call
>> and then does not exec(), and while that program is running, I
>> execute the above find command, it will hang part way through the
>> listing.  If I kill program XX, the listing continues (un-hangs).
>>
>> If I run a program that uses the registry, that does a fork() and
>> then an exec(), no such hang occurs during the find command.
>>
>> Philippe originally fixed this by adding SOCK_CLOEXEC to the socket()
>> call in sysreg.c. That did fix it, but I realized much later that it
>> only helps if you actually call exec(). My code always does, but more
>> recently one of our developers had some code that didn't exec(), which
>> was the first time I saw this hang.
>>
>> Philippe, I had it on my list to ask you about this, but it hasn't
>> bitten me lately and I forgot until I saw this message about fork().
>>
>> I think daemonizing in its canonical form (fork(), let the forked
>> process take over, then exit() in the parent) is problematic when
>> you have a wrapped main() whose wrappers have already initialized the
>> sysreg mechanism: the process that was done for is now gone, and
>> the fork()'ed process has no idea it has a sysreg socket in hand.
>>
>> Perhaps the better answer when daemonizing is to use --no-init and then
>> have the forked process call xenomai_init() manually?
>>
> 
> I don't know yet, I'll follow up on this.
> 

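The FD_CLOEXEC point in the quoted message can be checked outside Xenomai.
The following is only a minimal sketch, not Xenomai code: it uses a plain
socketpair() in place of the sysregd connection, and the helper name
cloexec_fd_survives_fork() is made up for illustration. It shows that a
child which forks without calling exec() still holds the descriptor, with
FD_CLOEXEC set but not yet acted upon.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for SOCK_CLOEXEC on older C libraries */
#endif
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Xenomai): returns 1 if a forked
 * child that never calls exec() still holds an open close-on-exec
 * socket, 0 if it does not, -1 on error.
 */
int cloexec_fd_survives_fork(void)
{
	int sv[2], status;
	pid_t pid;

	/* SOCK_CLOEXEC sets FD_CLOEXEC on both ends at creation time. */
	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: no exec() happened, so the fd is still open. */
		int flags = fcntl(sv[0], F_GETFD);
		_exit(flags >= 0 && (flags & FD_CLOEXEC) ? 0 : 1);
	}

	close(sv[0]);
	close(sv[1]);
	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling cloexec_fd_survives_fork() from a test program and checking for a
return value of 1 confirms that the flag alone does not close anything
until exec() runs.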
Could you try the patch below? Ideally, we should have this in 3.0.9 if it
improves the situation.

Thanks,

diff --git a/lib/cobalt/init.c b/lib/cobalt/init.c
index abd990692..02a99c569 100644
--- a/lib/cobalt/init.c
+++ b/lib/cobalt/init.c
@@ -184,20 +184,26 @@ static void low_init(void)
        cobalt_ticks_init(f->clock_freq);
 }
 
+static int cobalt_init_2(void);
+
 static void cobalt_fork_handler(void)
 {
        cobalt_unmap_umm();
        cobalt_clear_tsd();
        cobalt_print_init_atfork();
-       if (cobalt_init())
+       if (cobalt_init_2())
                exit(EXIT_FAILURE);
 }
 
-static void __cobalt_init(void)
+static inline void commit_stack_memory(void)
 {
-       struct sigaction sa;
+       char stk[PTHREAD_STACK_MIN / 2];
+       cobalt_commit_memory(stk);
+}
 
-       low_init();
+static void cobalt_init_1(void)
+{
+       struct sigaction sa;
 
        sa.sa_sigaction = cobalt_sigdebug_handler;
        sigemptyset(&sa.sa_mask);
@@ -228,20 +234,9 @@ static void __cobalt_init(void)
                            " sizeof(cobalt_sem_shadow): %Zd!",
                            sizeof(sem_t),
                            sizeof(struct cobalt_sem_shadow));
-
-       cobalt_mutex_init();
-       cobalt_sched_init();
-       cobalt_thread_init();
-       cobalt_print_init();
 }
 
-static inline void commit_stack_memory(void)
-{
-       char stk[PTHREAD_STACK_MIN / 2];
-       cobalt_commit_memory(stk);
-}
-
-int cobalt_init(void)
+static int cobalt_init_2(void)
 {
        pthread_t ptid = pthread_self();
        struct sched_param parm;
@@ -249,7 +244,12 @@ int cobalt_init(void)
 
        commit_stack_memory();  /* We only need this for the main thread */
        cobalt_default_condattr_init();
-       __cobalt_init();
+
+       low_init();
+       cobalt_mutex_init();
+       cobalt_sched_init();
+       cobalt_thread_init();
+       cobalt_print_init();
 
        if (__cobalt_control_bind)
                return 0;
@@ -288,12 +288,19 @@ int cobalt_init(void)
        return 0;
 }
 
+int cobalt_init(void)
+{
+       cobalt_init_1();
+
+       return cobalt_init_2();
+}
+
 static int get_int_arg(const char *name, const char *arg,
                       int *valp, int min)
 {
        int value, ret;
        char *p;
-       
+
        errno = 0;
        value = (int)strtol(arg, &p, 10);
        if (errno || *p || value < min) {

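One way to read the diff above: cobalt_init() is split into a one-time
phase (cobalt_init_1(), signal handler setup and size checks) and a
per-process phase (cobalt_init_2()), and the fork handler now re-runs only
the latter. The general pattern can be sketched as follows. This is an
illustration under assumptions, not the Xenomai implementation: all names
(init_once, init_per_process, demo_init, demo_fork_check) are hypothetical,
and it assumes the handler is registered as a pthread_atfork() child
handler, which the diff itself does not show.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int once_runs;		/* runs of the one-time phase */
static int per_process_runs;	/* runs of the per-process phase */

/* Stands in for one-time setup, e.g. installing signal handlers. */
static void init_once(void)
{
	once_runs++;
}

/* Stands in for state that has to be rebuilt in every new process. */
static int init_per_process(void)
{
	per_process_runs++;
	return 0;
}

/* Child-side fork handler: redo only the per-process phase. */
static void child_fork_handler(void)
{
	if (init_per_process())
		_exit(1);
}

/* Full initialization, done once in the original process. */
int demo_init(void)
{
	init_once();
	if (pthread_atfork(NULL, NULL, child_fork_handler))
		return -1;
	return init_per_process();
}

/*
 * Forks and returns 1 if the child saw the one-time phase counted once
 * and the per-process phase counted twice (the inherited run plus the
 * re-run from the fork handler), 0 otherwise, -1 on error.
 */
int demo_fork_check(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(once_runs == 1 && per_process_runs == 2 ? 0 : 1);
	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Call demo_init() once, then demo_fork_check(); the child inherits the
parent's counters through fork(), so only the per-process counter advances
in the child.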

-- 
Philippe.
