Chen, Hongzhan <[email protected]> writes:

>>-----Original Message-----
>>From: Philippe Gerum <[email protected]>
>>Sent: Friday, June 4, 2021 4:15 PM
>>To: Jan Kiszka <[email protected]>
>>Cc: Chen, Hongzhan <[email protected]>; [email protected]
>>Subject: Re: "rtdm/nrtsig: move inband work description off the stack"
>>
>>
>>Jan Kiszka <[email protected]> writes:
>>
>>> On 04.06.21 09:51, Philippe Gerum wrote:
>>>>
>>>> Chen, Hongzhan <[email protected]> writes:
>>>>
>>>>>> Jan Kiszka <[email protected]> writes:
>>>>>>
>>>>>>> On 26.05.21 15:52, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>> Jan Kiszka <[email protected]> writes:
>>>>>>>>
>>>>>>>>> On 25.05.21 15:20, Philippe Gerum wrote:
>>>>>>>>>>
>>>>>>>>>> Jan Kiszka <[email protected]> writes:
>>>>>>>>>>
>>>>>>>>>>> On 25.05.21 14:37, Philippe Gerum wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Jan Kiszka <[email protected]> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 25.05.21 12:01, Philippe Gerum wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jan Kiszka <[email protected]> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 25.05.21 10:46, Philippe Gerum wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jan Kiszka <[email protected]> writes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Philippe,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1] makes rtdm_schedule_nrt_work a rather heavyweight service,
>>>>>>>>>>>>>>>>> compared to what it was (and even over I-pipe). xnmalloc is
>>>>>>>>>>>>>>>>> nothing you would expect from a "send a signal" service,
>>>>>>>>>>>>>>>>> specifically from rtdm_nrtsig_pend, which does not even make
>>>>>>>>>>>>>>>>> use of the extra payload.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can we do better? Also for xnthread_signal (the fd and udd use
>>>>>>>>>>>>>>>>> cases are not time-sensitive anyway).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Nope, I cannot see any significantly better option which would
>>>>>>>>>>>>>>>> still allow a common implementation we can map on top of
>>>>>>>>>>>>>>>> Dovetail / I-pipe.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> E.g. pre-allocate the work object for data-free signals and use
>>>>>>>>>>>>>>> that - or do not send it when it is already in use, thus pending.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, let's recap:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - rtdm_nrtsig_pend(): does not allocate anything, merely uses a
>>>>>>>>>>>>>>   user-provided memory block containing a request header. Current
>>>>>>>>>>>>>>   callers should either already take care not to resend a request
>>>>>>>>>>>>>>   uselessly while it is pending (e.g. [1]), or their internal
>>>>>>>>>>>>>>   logic should prevent that (e.g. [2]). This interface always
>>>>>>>>>>>>>>   conveys user data by reference.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ah, I fantasized that rtdm_nrtsig_pend would call
>>>>>>>>>>>>> rtdm_schedule_nrt_work to do the work, it's the other way around.
>>>>>>>>>>>>> False alarm here.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - rtdm_schedule_nrt_work(): does allocate a nrtsig_t struct in
>>>>>>>>>>>>>>   order to convey a request block we cannot squeeze into a
>>>>>>>>>>>>>>   work_struct, since the latter is in itself an anchor type the
>>>>>>>>>>>>>>   user may embed into its own private argument block.
>>>>>>>>>>>>>>   I'm unsure whether ensuring that rtdm_schedule_nrt_work()
>>>>>>>>>>>>>>   calls do not pile up would be worth it, as this call is not
>>>>>>>>>>>>>>   supposed to be used frenetically on some hot path in the first
>>>>>>>>>>>>>>   place. But if we did want that, then we would have to change
>>>>>>>>>>>>>>   the signature of rtdm_schedule_nrt_work(), so that we gain some
>>>>>>>>>>>>>>   persistent context data we could use for determining whether a
>>>>>>>>>>>>>>   nrt work request is in flight. We could not probe the
>>>>>>>>>>>>>>   work_struct for that purpose, that would be racy.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you see any way to have a smarter rtdm_schedule_nrt_work()
>>>>>>>>>>>>>> without changing its signature?
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is no rule that prevents changing the signature if that is
>>>>>>>>>>>>> needed. However, the kernel is fine without allocation as well,
>>>>>>>>>>>>> using a very similar signature.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> schedule_work() and rtdm_schedule_nrt_work() have a major
>>>>>>>>>>>> difference: the latter has to pass on the request from the oob to
>>>>>>>>>>>> the in-band context. This is what bugs us and creates the
>>>>>>>>>>>> requirement for additional mechanism and data.
>>>>>>>>>>>>
>>>>>>>>>>>>> I do not yet understand why we need that indirection via the
>>>>>>>>>>>>> rtdm_nrtsig on Dovetail. I thought we can trigger work directly
>>>>>>>>>>>>> from the oob domain there, can't we? How do you handle such a
>>>>>>>>>>>>> case in evl so far?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Dovetail can trigger in-band work from oob via the generic
>>>>>>>>>>>> irq_work() service, so we don't need the extra code the I-pipe
>>>>>>>>>>>> involves for the same purpose. The gist of the matter is having
>>>>>>>>>>>> the same implementation of rtdm_schedule_nrt_work() work in both
>>>>>>>>>>>> Dovetail and I-pipe contexts: the way we do that is by using
>>>>>>>>>>>> rtdm_nrtsig_pend(), which already abstracts the base mechanism for
>>>>>>>>>>>> relaying oob -> in-band signals.
>>>>>>>>>>>>
>>>>>>>>>>>> On the other hand, EVL has the evl_call_inband[_from]() service,
>>>>>>>>>>>> which combines the irq_work and work_struct anchors we need into a
>>>>>>>>>>>> single evl_work type, which in turn can be embedded into a user
>>>>>>>>>>>> request block. This is what rtdm_schedule_nrt_work() does not
>>>>>>>>>>>> expose, so it has to build one internally by combining a dynamic
>>>>>>>>>>>> allocation and the user-provided work_struct.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Then enhance the rtdm service to enable such a combination - and be
>>>>>>>>>>> even more ready for the switch to Xenomai 4, I would say. We can
>>>>>>>>>>> either add a new service and deprecate the old one (while
>>>>>>>>>>> implementing it as you suggest) or just break the interface to get
>>>>>>>>>>> the (presumably) few users quickly converted.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> rtdm_schedule_nrt_work() was added by Gilles when merging the RTnet
>>>>>>>>>> code into Xenomai 3 IIRC, with little documentation. Widespread
>>>>>>>>>> usage of this call is unlikely, so I'd vote for simplifying the
>>>>>>>>>> whole thing and replacing it entirely.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OK, then we have a plan for this piece.
>>>>>
>>>>> Do we need to implement a new API to replace rtdm_schedule_nrt_work()
>>>>> to avoid calling xnmalloc, or just keep it as it is in Xenomai 3.2?
>>>>>
>>>>>>>>>
>>>>>>>>>>>>> And the third case remains xnthread_signal, btw.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> xnthread_signal() is used to trigger SIGDEBUG/SIGSHADOW/SIGTERM
>>>>>>>>>>>> signals for a thread, a seldom event. Optimizing there would be
>>>>>>>>>>>> overkill, unless the application behaves wrongly in the first
>>>>>>>>>>>> place.
>>>>>>>>>>>
>>>>>>>>>>> Some callers are under nklock, thus it makes potentially lengthy
>>>>>>>>>>> critical sections now a bit longer.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is what needs to be fixed in the first place, we should not
>>>>>>>>>> trigger a signal relay while holding the ugly big lock.
>>>>>
>>>>> Which functions do we need to fix or optimize so that xnthread_signal
>>>>> is not called while holding the big lock, if possible? xnthread_suspend
>>>>> and xnthread_cancel?
>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Whatever is simpler.
>>>>>>>>>
>>>>>>>>> Another aspect: using xnmalloc for anything that should better
>>>>>>>>> succeed is possibly not good when we are OOM. That is actually no
>>>>>>>>> strict regression over I-pipe, where we had a limited queue as well
>>>>>>>>> IIRC, but a reason to avoid carrying that over - and possibly
>>>>>>>>> creating new error scenarios that are harder to debug.
>>>>>>>>>
>>>>>>>>
>>>>>>>> To mitigate the issue of generalized OOM, we could pull memory from a
>>>>>>>> local xnheap privately attached to xnthread_signal() along with any
>>>>>>>> caller requiring dynamic allocation, so that none of them could
>>>>>>>> deplete the sysheap.
>>>>>>>>
>>>>>>>> Getting rid of dynamic allocation entirely, on the other hand, would
>>>>>>>> require us to add one or more "signal slots" per possible cause of
>>>>>>>> in-band signal into struct xnthread (e.g. SIGSHADOW_ACTION_BACKTRACE,
>>>>>>>> SIGDEBUG_MIGRATE_SIGNAL etc.), passing the proper slot to
>>>>>>>> xnthread_signal(). SIGSHADOW (SIGWINCH) and SIGDEBUG (SIGXCPU) are
>>>>>>>> standard non-queueable signals, so we should only need a single slot
>>>>>>>> per signal cause.
>>>>>>>>
>>>>>>>
>>>>>>> Can a thread actually trigger multiple causes per SIGDEBUG, e.g.?
>>>>>>> Likely, we only need to carry the single cause via a field in xnthread
>>>>>>> from the trigger point to the in-band handler. The same is probably
>>>>>>> the case for SIGSHADOW_ACTION_BACKTRACE.
>>>>>>>
>>>>>>
>>>>>> Multiple events triggering SIGSHADOW_ACTION_HOME and
>>>>>> SIGSHADOW_ACTION_HARDEN _might_ pile up, although that is unlikely.
>>>>>
>>>>> Does that mean that struct xnthread should contain three slots of
>>>>> struct lostage_signal, one for each of SIGDEBUG/SIGSHADOW/SIGTERM,
>>>>> like patch [1]?
>>>>>
>>>>> If there is no pile-up for each signal, we would just allocate a fixed
>>>>> position for each of the three signals in the array and then handle the
>>>>> corresponding lostage_signal by signo. Is that enough?
>>>>>
>>>>> [1]:
>>>>>
>>>>> +++ b/include/cobalt/kernel/thread.h
>>>>> @@ -51,6 +51,15 @@ struct xnsched_tpslot;
>>>>>  struct xnthread_personality;
>>>>>  struct completion;
>>>>>
>>>>> +#define XENO_NUM_SIGNALS 3 /* SIGDEBUG/SIGSHADOW/SIGTERM */
>>>>> +
>>>>> +struct lostage_signal {
>>>>> +	struct pipeline_inband_work inband_work; /* Must be first. */
>>>>> +	struct task_struct *task;
>>>>> +	int signo, sigval;
>>>>> +	struct lostage_signal *self; /* Revisit: I-pipe requirement */
>>>>> +};
>>>>> +
>>>>>  struct xnthread_init_attr {
>>>>>  	struct xnthread_personality *personality;
>>>>>  	cpumask_t affinity;
>>>>> @@ -205,6 +214,7 @@ struct xnthread {
>>>>>  	const char *exe_path;	/* Executable path */
>>>>>  	u32 proghash;		/* Hash value for exe_path */
>>>>>  #endif
>>>>> +	struct lostage_signal sigarray[XENO_NUM_SIGNALS];
>>>>>  };
>>>>>
>>>>
>>>> I would have as many entries as we have distinct reasons for notifying
>>>> the application, i.e. from include/cobalt/uapi/signal.h:
>>>>
>>>> #define SIGSHADOW_ACTION_HARDEN    1
>>>> #define SIGSHADOW_ACTION_BACKTRACE 2
>>>> #define SIGSHADOW_ACTION_HOME      3
>>>> #define SIGDEBUG_MIGRATE_SIGNAL    1
>>>> #define SIGDEBUG_MIGRATE_SYSCALL   2
>>>> #define SIGDEBUG_MIGRATE_FAULT     3
>>>> #define SIGDEBUG_MIGRATE_PRIOINV   4
>>>> #define SIGDEBUG_NOMLOCK           5
>>>> #define SIGDEBUG_WATCHDOG          6
>>>> #define SIGDEBUG_RESCNT_IMBALANCE  7
>>>> #define SIGDEBUG_LOCK_BREAK        8
>>>> #define SIGDEBUG_MUTEX_SLEEP       9
>>>
>>> For SIGDEBUG, I believe we only need the first reason, as I wrote in the
>>> other reply. We are interested in the *initial* reason for a thread to
>>> migrate. Once it is migrated, it will consume that and only generate a
>>> new reason after going back.
>>>
>>
>>We may have SIGDEBUG_WATCHDOG pending concurrently with any other cause,
>>since it is triggered asynchronously by the oob timer. So at the very
>>least, you want this one to have its own slot.
>>
>>So, to sum up: the following events are synchronous and mutually
>>exclusive due to the oob->in-band transition they cause:
>>
>>SIGDEBUG_MIGRATE_SIGNAL, SIGDEBUG_MIGRATE_SYSCALL,
>>SIGDEBUG_MIGRATE_FAULT, SIGDEBUG_MIGRATE_PRIOINV, SIGDEBUG_MUTEX_SLEEP,
>>SIGDEBUG_LOCK_BREAK, SIGDEBUG_RESCNT_IMBALANCE. For these we may share a
>>single slot. SIGDEBUG_NOMLOCK can only happen at init, when no Xenomai
>>service is available yet, so we don't have to bother: this one can join
>>the above group too.
>>
>>SIGDEBUG_WATCHDOG needs its own slot.
>>
>>All SIGSHADOW_* events need their own slot: SIGSHADOW_ACTION_HARDEN and
>>SIGSHADOW_ACTION_HOME can be raised by remote threads asynchronously to
>>the target thread. SIGSHADOW_ACTION_BACKTRACE comes in addition to
>>SIGDEBUG_MIGRATE_*. For the latter reason, SIGSHADOW_ACTION_BACKTRACE
>>cannot pile up though (since we have to transition from oob to in-band
>>context in between).
>>
>>That would make five distinct slots in struct xnthread, each
>>representing a group of events and each advertising its own
>>pipeline_inband_work struct.
>
> OK. SIGTERM (1) + SIGDEBUG (2) + SIGSHADOW (3) = 6 slots in struct
> xnthread.
>

Yes.

--
Philippe.
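
For reference, a minimal sketch of the allocation-free direction discussed
above, loosely modeled on EVL's struct evl_work / evl_call_inband() pairing:
the caller embeds both relay anchors into a single descriptor, so scheduling
nrt work needs no xnmalloc. The rtdm_nrt_work type and its helpers are
hypothetical names made up for illustration; only rtdm_nrtsig_init(),
rtdm_nrtsig_pend() and the Linux workqueue calls are existing services.

#include <linux/workqueue.h>
#include <linux/atomic.h>
#include <rtdm/driver.h>

/* Hypothetical caller-provided descriptor, not the actual RTDM API. */
struct rtdm_nrt_work {
	rtdm_nrtsig_t nrtsig;		/* oob -> in-band relay anchor */
	struct work_struct work;	/* in-band work item */
	atomic_t pending;		/* collapses duplicate triggers */
};

static void rtdm_nrt_work_relay(rtdm_nrtsig_t *nrtsig, void *arg)
{
	struct rtdm_nrt_work *nwork = arg;

	/* In-band now: accept new oob triggers, then queue the work. */
	atomic_set(&nwork->pending, 0);
	schedule_work(&nwork->work);
}

static inline void rtdm_nrt_work_init(struct rtdm_nrt_work *nwork,
				      work_func_t handler)
{
	INIT_WORK(&nwork->work, handler);
	rtdm_nrtsig_init(&nwork->nrtsig, rtdm_nrt_work_relay, nwork);
	atomic_set(&nwork->pending, 0);
}

/* Callable from oob context; no dynamic allocation involved. */
static inline void rtdm_nrt_work_schedule(struct rtdm_nrt_work *nwork)
{
	/* A relay is already in flight: the handler will run anyway. */
	if (atomic_cmpxchg(&nwork->pending, 0, 1) != 0)
		return;

	rtdm_nrtsig_pend(&nwork->nrtsig);
}

With such a descriptor, rtdm_schedule_nrt_work() could either be
reimplemented on top of it or deprecated in its favor, matching the
"enhance or replace" options weighed in the thread.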

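Similarly, an illustrative sketch of the six-slot layout agreed on at the
end of the thread: one preallocated struct lostage_signal per event group in
struct xnthread, so xnthread_signal() never pulls memory from the system
heap. The slot enum, its names and the mapping helper are assumptions for
illustration, not merged code; SIGDEBUG/SIGSHADOW and the cause constants
are taken from include/cobalt/uapi/signal.h as quoted above, and struct
pipeline_inband_work is declared by the pipeline layer as in patch [1].

#include <linux/sched.h>
#include <cobalt/uapi/signal.h>

/* Hypothetical slot layout; names are made up for illustration. */
enum xnthread_sigslot {
	XNSIG_SLOT_TERM,		/* SIGTERM relay */
	XNSIG_SLOT_DEBUG,		/* synchronous SIGDEBUG causes (shared) */
	XNSIG_SLOT_DEBUG_WATCHDOG,	/* SIGDEBUG_WATCHDOG, async oob timer */
	XNSIG_SLOT_SHADOW_HARDEN,	/* SIGSHADOW_ACTION_HARDEN */
	XNSIG_SLOT_SHADOW_BACKTRACE,	/* SIGSHADOW_ACTION_BACKTRACE */
	XNSIG_SLOT_SHADOW_HOME,		/* SIGSHADOW_ACTION_HOME */
	XNSIG_SLOT_NR,			/* = 6 */
};

struct lostage_signal {
	struct pipeline_inband_work inband_work; /* Must be first. */
	struct task_struct *task;
	int signo, sigval;
};

struct xnthread {
	/* ... existing members ... */
	struct lostage_signal sigarray[XNSIG_SLOT_NR];
};

/* Hypothetical helper: map (signo, cause) to its dedicated slot. */
static inline int xnthread_sigslot_of(int signo, int cause)
{
	switch (signo) {
	case SIGTERM:
		return XNSIG_SLOT_TERM;
	case SIGDEBUG:	/* i.e. SIGXCPU */
		return cause == SIGDEBUG_WATCHDOG ?
			XNSIG_SLOT_DEBUG_WATCHDOG : XNSIG_SLOT_DEBUG;
	default:	/* SIGSHADOW, i.e. SIGWINCH */
		switch (cause) {
		case SIGSHADOW_ACTION_HARDEN:
			return XNSIG_SLOT_SHADOW_HARDEN;
		case SIGSHADOW_ACTION_BACKTRACE:
			return XNSIG_SLOT_SHADOW_BACKTRACE;
		default:
			return XNSIG_SLOT_SHADOW_HOME;
		}
	}
}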