Chen, Hongzhan <[email protected]> writes:

>>-----Original Message-----
>>From: Philippe Gerum <[email protected]> 
>>Sent: Friday, June 4, 2021 4:15 PM
>>To: Jan Kiszka <[email protected]>
>>Cc: Chen, Hongzhan <[email protected]>; [email protected]
>>Subject: Re: "rtdm/nrtsig: move inband work description off the stack"
>>
>>
>>Jan Kiszka <[email protected]> writes:
>>
>>> On 04.06.21 09:51, Philippe Gerum wrote:
>>>> 
>>>> Chen, Hongzhan <[email protected]> writes:
>>>> 
>>>>>> Jan Kiszka <[email protected]> writes:
>>>>>>
>>>>>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>> Jan Kiszka <[email protected]> writes:
>>>>>>>>
>>>>>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>>>>>
>>>>>>>>>> Jan Kiszka <[email protected]> writes:
>>>>>>>>>>
>>>>>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Jan Kiszka <[email protected]> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jan Kiszka <[email protected]> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jan Kiszka <[email protected]> writes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight
>>>>>>>>>>>>>>>>> service, compared to what it was (and even over I-pipe).
>>>>>>>>>>>>>>>>> xnmalloc is nothing you would expect from a "send a signal"
>>>>>>>>>>>>>>>>> service, specifically not from rtdm_nrtsig_pend, which does
>>>>>>>>>>>>>>>>> not even make use of sending extra payload.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (the fd and udd
>>>>>>>>>>>>>>>>> use cases are not time-sensitive anyway).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Nope, I cannot see any significantly better option which
>>>>>>>>>>>>>>>> would still allow a common implementation we can map on top
>>>>>>>>>>>>>>>> of Dovetail / I-pipe.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and
>>>>>>>>>>>>>>> use that - or do not send it while it is already in use, i.e.
>>>>>>>>>>>>>>> still pending.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>>>>>   user-provided memory block containing a request header.
>>>>>>>>>>>>>>   Current callers should either already take care not to resend
>>>>>>>>>>>>>>   a request uselessly while it is pending (e.g. [1]), or their
>>>>>>>>>>>>>>   internal logic should prevent that (e.g. [2]). This interface
>>>>>>>>>>>>>>   always conveys user data by reference.
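>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   For illustration, the typical calling pattern looks like this
>>>>>>>>>>>>>>   (a minimal sketch; struct and handler names are made up):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   struct my_state {
>>>>>>>>>>>>>>           rtdm_nrtsig_t relay_sig; /* caller-provided block */
>>>>>>>>>>>>>>           int events;
>>>>>>>>>>>>>>   };
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   static void relay_handler(rtdm_nrtsig_t *nrt_sig, void *arg)
>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>           struct my_state *st = arg;
>>>>>>>>>>>>>>           st->events++; /* runs in-band */
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   /* in-band init, st pointing at some allocated my_state: */
>>>>>>>>>>>>>>   rtdm_nrtsig_init(&st->relay_sig, relay_handler, st);
>>>>>>>>>>>>>>   /* oob path, e.g. a rt interrupt handler: */
>>>>>>>>>>>>>>   rtdm_nrtsig_pend(&st->relay_sig);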
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ah, I had assumed that rtdm_nrtsig_pend would call
>>>>>>>>>>>>> rtdm_schedule_nrt_work to do the work, but it's the other way
>>>>>>>>>>>>> around. False alarm here.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in
>>>>>>>>>>>>>>   order to convey a request block we cannot squeeze into a
>>>>>>>>>>>>>>   work_struct, since the latter is in itself an anchor type the
>>>>>>>>>>>>>>   user may embed into its own private argument block. I'm unsure
>>>>>>>>>>>>>>   whether ensuring that rtdm_schedule_nrt_work() calls do not
>>>>>>>>>>>>>>   pile up would be worth it, as this call is not supposed to be
>>>>>>>>>>>>>>   used frenetically on some hot path in the first place. But if
>>>>>>>>>>>>>>   we wanted to do that, then we would have to change the
>>>>>>>>>>>>>>   signature of rtdm_schedule_nrt_work(), so that we gain
>>>>>>>>>>>>>>   persistent context data we could use for determining whether a
>>>>>>>>>>>>>>   nrt work request is in flight. We could not probe the
>>>>>>>>>>>>>>   work_struct for that purpose, as that would be racy.
>>>>>>>>>>>>>>
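>>>>>>>>>>>>>> For reference, the caller side of the current interface looks
>>>>>>>>>>>>>> like this (sketch, handler name invented):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> static void nrt_work_handler(struct work_struct *work)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>         /* in-band context */
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> static struct work_struct nrt_work;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /* in-band init: */
>>>>>>>>>>>>>> INIT_WORK(&nrt_work, nrt_work_handler);
>>>>>>>>>>>>>> /* oob context, relays to a workqueue, allocating internally: */
>>>>>>>>>>>>>> rtdm_schedule_nrt_work(&nrt_work);
>>>>>>>>>>>>>>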
>>>>>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work()
>>>>>>>>>>>>>> without changing its signature?
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is no rule that prevents changing the signature if that is
>>>>>>>>>>>>> needed. However, the kernel is fine without allocation as well,
>>>>>>>>>>>>> using a very similar signature.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major
>>>>>>>>>>>> difference: the latter has to pass on the request from the oob to
>>>>>>>>>>>> the in-band context. This is what bugs us and creates the
>>>>>>>>>>>> requirement for additional mechanism and data.
>>>>>>>>>>>>
>>>>>>>>>>>>> I do not yet understand why we need that indirection via the
>>>>>>>>>>>>> rtdm_nrtsig on Dovetail. I thought we could trigger work directly
>>>>>>>>>>>>> from the oob domain there, can't we? How do you handle such a
>>>>>>>>>>>>> case in evl so far?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Dovetail can trigger in-band work from oob via the generic
>>>>>>>>>>>> irq_work() service, so we don't need the extra code involved in
>>>>>>>>>>>> the I-pipe for the same purpose. The gist of the matter is to
>>>>>>>>>>>> have a single implementation of rtdm_schedule_nrt_work() which
>>>>>>>>>>>> works in both Dovetail and I-pipe contexts: the way we do that is
>>>>>>>>>>>> by using rtdm_nrtsig_pend(), which already abstracts the base
>>>>>>>>>>>> mechanism for relaying oob -> in-band signals.
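>>>>>>>>>>>>
>>>>>>>>>>>> With Dovetail alone, something as simple as this would do (a
>>>>>>>>>>>> minimal sketch on top of the stock irq_work API):
>>>>>>>>>>>>
>>>>>>>>>>>> static void relay(struct irq_work *work)
>>>>>>>>>>>> {
>>>>>>>>>>>>         /* runs in-band */
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> static struct irq_work relay_work;
>>>>>>>>>>>>
>>>>>>>>>>>> init_irq_work(&relay_work, relay);
>>>>>>>>>>>> /* from oob: */
>>>>>>>>>>>> irq_work_queue(&relay_work);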
>>>>>>>>>>>>
>>>>>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service,
>>>>>>>>>>>> which combines the irq_work and work_struct anchors we need into
>>>>>>>>>>>> a single evl_work type, which in turn can be embedded into a user
>>>>>>>>>>>> request block. This is what rtdm_schedule_nrt_work() does not
>>>>>>>>>>>> expose, so it has to build one internally by combining a dynamic
>>>>>>>>>>>> allocation and the user-provided work_struct.
>>>>>>>>>>>>
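>>>>>>>>>>>> i.e. roughly this (a sketch from memory, see include/evl/work.h
>>>>>>>>>>>> for the actual interface):
>>>>>>>>>>>>
>>>>>>>>>>>> struct my_request {
>>>>>>>>>>>>         struct evl_work work; /* irq_work + work_struct combined */
>>>>>>>>>>>>         int payload;          /* user data, conveyed by embedding */
>>>>>>>>>>>> };
>>>>>>>>>>>>
>>>>>>>>>>>> static void handler(struct evl_work *work)
>>>>>>>>>>>> {
>>>>>>>>>>>>         struct my_request *req =
>>>>>>>>>>>>                 container_of(work, struct my_request, work);
>>>>>>>>>>>>         /* in-band context */
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> evl_init_work(&req->work, handler);
>>>>>>>>>>>> evl_call_inband(&req->work); /* works from oob, no allocation */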
>>>>>>>>>>>
>>>>>>>>>>> Then enhance the rtdm service to enable such a combination - and
>>>>>>>>>>> be even more ready for the switch to Xenomai 4, I would say. We
>>>>>>>>>>> can either add a new service and deprecate the old one (while
>>>>>>>>>>> implementing it as you suggest) or just break the interface to get
>>>>>>>>>>> the (presumably) few users quickly converted.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet
>>>>>>>>>> code into Xenomai 3 IIRC, with little documentation. Widespread
>>>>>>>>>> usage of this call is unlikely, so I'd vote for simplifying the
>>>>>>>>>> whole thing and replacing it entirely.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OK, then we have a plan for this piece.
>>>>>
>>>>> Do we need to implement a new API to replace rtdm_schedule_nrt_work()
>>>>> so as to avoid calling xnmalloc, or just keep it as it is in Xenomai
>>>>> 3.2?
>>>>>
>>>>>>>>>
>>>>>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM
>>>>>>>>>>>> signals for a thread, a rare event. Optimizing there would be
>>>>>>>>>>>> overkill, unless the application misbehaves in the first place.
>>>>>>>>>>>
>>>>>>>>>>> Some callers are under nklock, thus it makes potentially lengthy
>>>>>>>>>>> critical sections now a bit longer.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is what needs to be fixed in the first place: we should not
>>>>>>>>>> trigger a signal relay while holding the ugly big lock.
>>>>>
>>>>> Which functions do we need to fix or optimize so that we avoid calling
>>>>> xnthread_signal while holding the big lock, if possible?
>>>>> xnthread_suspend and xnthread_cancel?
>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Whatever is simpler.
>>>>>>>>>
>>>>>>>>> Another aspect: using xnmalloc for anything that had better succeed
>>>>>>>>> is possibly not good when we are OOM. That is actually no strict
>>>>>>>>> regression over I-pipe, where we had a limited queue as well IIRC,
>>>>>>>>> but it is a reason to avoid carrying that over - and possibly
>>>>>>>>> creating new error scenarios that are harder to debug.
>>>>>>>>>
>>>>>>>>
>>>>>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>>>>>> local xnheap privately attached to xnthread_signal() along with any
>>>>>>>> caller requiring dynamic allocation, so that none of them could
>>>>>>>> deplete the sysheap.
>>>>>>>>
>>>>>>>> Getting rid of dynamic allocation entirely on the other hand would
>>>>>>>> require us to add one or more "signal slots" per possible cause of
>>>>>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>>>>>> SIGDEBUG_MIGRATE_SIGNAL etc), passing the proper slot to
>>>>>>>> xnthread_signal(). SIGSHADOW(SIGWINCH) and SIGDEBUG(SIGXCPU) are
>>>>>>>> standard non-queueable signals, so we should only need a single slot
>>>>>>>> per signal cause.
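>>>>>>>>
>>>>>>>> e.g. (sketch only; pool size arbitrary, init details elided):
>>>>>>>>
>>>>>>>> static struct xnheap signal_heap;
>>>>>>>> static char signal_pool[8192];
>>>>>>>>
>>>>>>>> /* at init time: */
>>>>>>>> ret = xnheap_init(&signal_heap, signal_pool, sizeof(signal_pool));
>>>>>>>>
>>>>>>>> /* in xnthread_signal() and friends, instead of xnmalloc(): */
>>>>>>>> rq = xnheap_alloc(&signal_heap, sizeof(*rq));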
>>>>>>>>
>>>>>>>
>>>>>>> Can a thread actually trigger multiple causes per SIGDEBUG, for
>>>>>>> example? Likely, we only need to carry the single cause via a field in
>>>>>>> xnthread from the trigger point to the in-band handler. The same is
>>>>>>> probably the case for SIGSHADOW_ACTION_BACKTRACE.
>>>>>>>
>>>>>>
>>>>>> Multiple events triggering SIGSHADOW_ACTION_HOME and
>>>>>> SIGSHADOW_ACTION_HARDEN _might_ pile up, although unlikely.
>>>>>
>>>>> Does that mean that in struct xnthread there should be three slots of
>>>>> struct lostage_signal, one each for SIGDEBUG/SIGSHADOW/SIGTERM, as in
>>>>> patch [1] below?
>>>>>
>>>>> If there is no pile-up for each signal, we would just allocate a fixed
>>>>> position for the three signals in the array and then handle the
>>>>> corresponding lostage_signal by signo. Is that enough?
>>>>>
>>>>> [1]:
>>>>>
>>>>> +++ b/include/cobalt/kernel/thread.h
>>>>> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>>>>>  struct xnthread_personality;
>>>>>  struct completion;
>>>>>
>>>>> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
>>>>> +
>>>>> +struct lostage_signal {
>>>>> +       struct pipeline_inband_work inband_work; /* Must be first. */
>>>>> +       struct task_struct *task;
>>>>> +       int signo, sigval;
>>>>> +       struct lostage_signal *self; /* Revisit: I-pipe requirement */
>>>>> +};
>>>>> +
>>>>>  struct xnthread_init_attr {
>>>>>         struct xnthread_personality *personality;
>>>>>         cpumask_t affinity;
>>>>> @@ -205,6 +214,7 @@ struct xnthread {
>>>>>         const char *exe_path;   /* Executable path */
>>>>>         u32 proghash;           /* Hash value for exe_path */
>>>>>  #endif
>>>>> +       struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>>>>>  };
>>>>>
>>>> 
>>>> I would have as many entries as we have distinct reasons for notifying
>>>> the application, i.e. those in include/cobalt/uapi/signal.h:
>>>> 
>>>> #define SIGSHADOW_ACTION_HARDEN    1
>>>> #define SIGSHADOW_ACTION_BACKTRACE 2
>>>> #define SIGSHADOW_ACTION_HOME      3
>>>> #define SIGDEBUG_MIGRATE_SIGNAL    1
>>>> #define SIGDEBUG_MIGRATE_SYSCALL   2
>>>> #define SIGDEBUG_MIGRATE_FAULT     3
>>>> #define SIGDEBUG_MIGRATE_PRIOINV   4
>>>> #define SIGDEBUG_NOMLOCK           5
>>>> #define SIGDEBUG_WATCHDOG          6
>>>> #define SIGDEBUG_RESCNT_IMBALANCE  7
>>>> #define SIGDEBUG_LOCK_BREAK        8
>>>> #define SIGDEBUG_MUTEX_SLEEP       9
>>>
>>> For SIGDEBUG, I believe we only need the first reason, as I wrote in the
>>> other reply. We are interested in the *initial* reason for a thread to
>>> migrate. Once it is migrated, it will consume that and only generate a
>>> new reason after going back.
>>>
>>
>>We may have SIGDEBUG_WATCHDOG pending concurrently with any other cause,
>>as it is triggered asynchronously by the oob timer. So at the very least,
>>you want this one to have its own slot.
>>
>>So, to sum up: the following events are synchronous and mutually
>>exclusive due to the oob->in-band transition they cause:
>>SIGDEBUG_MIGRATE_SIGNAL, SIGDEBUG_MIGRATE_SYSCALL,
>>SIGDEBUG_MIGRATE_FAULT, SIGDEBUG_MIGRATE_PRIOINV, SIGDEBUG_MUTEX_SLEEP,
>>SIGDEBUG_LOCK_BREAK and SIGDEBUG_RESCNT_IMBALANCE. For these we may
>>share a single slot. SIGDEBUG_NOMLOCK can only happen at init, when no
>>Xenomai service is available yet, so we don't have to bother; this one
>>can join the above group too.
>>
>>SIGDEBUG_WATCHDOG needs its own slot.
>>
>>All SIGSHADOW_* events need their own slots: SIGSHADOW_ACTION_HARDEN and
>>SIGSHADOW_ACTION_HOME can be raised by remote threads asynchronously to
>>the target thread. SIGSHADOW_ACTION_BACKTRACE comes in addition to
>>SIGDEBUG_MIGRATE_*; for that very reason, SIGSHADOW_ACTION_BACKTRACE
>>cannot pile up though (since we have to transition from oob to in-band
>>context in between).
>>
>>That would make five distinct slots in struct xnthread, each representing
>>a group of events and each advertising its own pipeline_inband_work
>>struct.
>
> OK. SIGTERM (1) + SIGDEBUG (2) + SIGSHADOW (3) = 6 slots in struct xnthread.
>

Yes.
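
Something along these lines, i.e. patch [1]'s array would grow from 3 to
6 fixed slots (slot names below are illustrative only):

enum {
        XNSIG_SIGDEBUG,           /* synchronous SIGDEBUG causes + NOMLOCK */
        XNSIG_SIGDEBUG_WATCHDOG,  /* async, raised by the oob timer */
        XNSIG_SIGSHADOW_HARDEN,
        XNSIG_SIGSHADOW_HOME,
        XNSIG_SIGSHADOW_BACKTRACE,
        XNSIG_SIGTERM,
        XENO_NUM_SIGNALS,         /* = 6 */
};

struct xnthread {
        ...
        struct lostage_signal sigarray[XENO_NUM_SIGNALS];
};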

-- 
Philippe.
