Hi, On 26/10/2022 17:30, Stefano Stabellini wrote: > > > On Wed, 26 Oct 2022, Bertrand Marquis wrote: >>> On 26 Oct 2022, at 12:29, Michal Orzel <michal.or...@amd.com> wrote: >>> >>> Hi all, >>> >>> On 25/10/2022 10:20, Bertrand Marquis wrote: >>>> Caution: This message originated from an External Source. Use proper >>>> caution when opening attachments, clicking links, or responding. >>>> >>>> >>>> Hi Michal, >>>> >>>>> On 25 Oct 2022, at 09:07, Michal Orzel <michal.or...@amd.com> wrote: >>>>> >>>>> Hi Bertrand, >>>>> >>>>> On 25/10/2022 09:45, Bertrand Marquis wrote: >>>>>> >>>>>> >>>>>> Hi Michal, >>>>>> >>>>>>> On 25 Oct 2022, at 08:11, Michal Orzel <michal.or...@amd.com> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> On 25/10/2022 03:29, Stefano Stabellini wrote: >>>>>>>> >>>>>>>> >>>>>>>> On Mon, 24 Oct 2022, Julien Grall wrote: >>>>>>>>>> On 24/10/2022 12:51, Julien Grall wrote: >>>>>>>>>>> Caution: This message originated from an External Source. Use proper >>>>>>>>>>> caution when opening attachments, clicking links, or responding. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 24/10/2022 10:07, Michal Orzel wrote: >>>>>>>>>>>> Hello, >>>>>>>>>>> >>>>>>>>>>> Hi Michal, >>>>>>>>>>> >>>>>>>>>>>> Recently I came across a deadcode in Xen Arm arch timer code. >>>>>>>>>>>> Briefly >>>>>>>>>>>> speaking, we are routing >>>>>>>>>>>> the NS phys timer (CNTP) IRQ to Xen, even though Xen does not make >>>>>>>>>>>> use >>>>>>>>>>>> of it (as it uses the hypervisor timer CNTHP). >>>>>>>>>>>> This timer is fully emulated, which means that there is nothing >>>>>>>>>>>> that can >>>>>>>>>>>> trigger such IRQ. This code is >>>>>>>>>>>> a left over from early days, where the CNTHP was buggy on some >>>>>>>>>>>> models >>>>>>>>>>>> and we had to use the CNTP instead. >>>>>>>>>>>> >>>>>>>>>>>> As far as the problem itself is not really interesting, it raises a >>>>>>>>>>>> question of what to do with a deadcode, >>>>>>>>>>>> as there might be/are other deadcode places in Xen. >>>>>>>>>>> >>>>>>>>>>> There are multiple definition of deadcode. Depending on which one >>>>>>>>>>> you >>>>>>>>>>> chose, then this could cover IS_ENABLED() and possibly #ifdef. So >>>>>>>>>>> this >>>>>>>>>>> would result to a lot of places impacted with the decision. >>>>>>>>>>> >>>>>>>>>>> So can you clarify what you mean by deadcode? >>>>>>>>>> In the timer example, I think we have both a deadcode and >>>>>>>>>> unreachable code. >>>>>>>>>> For the purpose of this discussion, let's take the MISRA definition >>>>>>>>>> of a >>>>>>>>>> deadcode which is a "code that can be executed >>>>>>>>>> but has no effect on the functional behavior of the program". This >>>>>>>>>> differs >>>>>>>>>> from the unreachable code definition that is >>>>>>>>>> a "code that cannot be executed". Setting up the IRQ for Xen is an >>>>>>>>>> example >>>>>>>>>> of a deadcode. Code within IRQ handler is an unreachable code >>>>>>>>>> (there is nothing that can trigger this IRQ). >>>>>>>>>> >>>>>>>>>> What I mean by deadcode happens to be the sum of the two cases above >>>>>>>>>> i.e. >>>>>>>>>> the code that cannot be executed as well as the code that >>>>>>>>>> does not impact the functionality of the program. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> One may say that it is useful to keep it, because one day, >>>>>>>>>>>> someone might need it when dealing with yet another broken HW. Such >>>>>>>>>>>> person would still need to modify the other >>>>>>>>>>>> part of the code (e.g. reprogram_timer), but there would be less >>>>>>>>>>>> work >>>>>>>>>>>> required overall. Personally, I'm not in favor of >>>>>>>>>>>> such approach, because we should not really support possible >>>>>>>>>>>> scenarios >>>>>>>>>>>> with broken HW (except for erratas listing known issues). >>>>>>>>>>> >>>>>>>>>>> The difference between "broken HW" and "HW with known errata" is a >>>>>>>>>>> bit >>>>>>>>>>> unclear to me. Can you clarify how you would make the difference >>>>>>>>>>> here? >>>>>>>>>>> >>>>>>>>>>> In particular, at which point do you consider that the HW should >>>>>>>>>>> not be >>>>>>>>>>> supported by Xen? >>>>>>>>>> I'm not saying that HW should not be supported. The difference for me >>>>>>>>>> between broken HW and >>>>>>>>>> HW with known errata is that for the former, the incorrect behavior >>>>>>>>>> is often >>>>>>>>>> due to the early support stage, >>>>>>>>>> using emulators/models instead of real HW, whereas for the latter, >>>>>>>>>> the HW is >>>>>>>>>> already released and it happens to be that it is buggy >>>>>>>>>> (the HW vendor is aware of the issue and released erratas). >>>>>>>>> >>>>>>>>> Thanks for the clarification. What I would call broken is anything >>>>>>>>> that can't >>>>>>>>> be fixed in software. For a not too fictional example, an HW where >>>>>>>>> PCI devices >>>>>>>>> are using the same stream ID. So effectively, passthrough can't be >>>>>>>>> safely >>>>>>>>> supported. >>>>>>>>> >>>>>>>>> Regarding, not yet released HW, I don't think Xen should have >>>>>>>>> workaround for >>>>>>>>> them. I wouldn't even call it "broken" because they are not yet >>>>>>>>> released and >>>>>>>>> it is common to have bug in early revision. >>>>>>>>> >>>>>>>>>> Do we have any example in Xen for supporting broken HW, >>>>>>>>>> whose vendor is not aware of the issue or did not release any errata? >>>>>>>>> I will not cite any HW on the ML. But from my experience, the vendors >>>>>>>>> are not >>>>>>>>> very vocal about issues in public (some don't even seem to have >>>>>>>>> public doc). >>>>>>>>> The best way to find the issues is to look at Linux commit. >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Also, as part of the certification/FUSA process, there should be no >>>>>>>>>>>> deadcode and we should have explanation for every block of code. >>>>>>>>>>> >>>>>>>>>>> See above. What are you trying to cover by deadcode? Would >>>>>>>>>>> protecting >>>>>>>>>>> code with IS_ENABLED() (or #ifdef) ok? >>>>>>>>>> I think this would be ok from the certification point of view (this >>>>>>>>>> would at >>>>>>>>>> least means, that we are aware of the issue >>>>>>>>>> and we took some steps). Otherwise, such code is just an example of a >>>>>>>>>> deadcode/unreachable code. >>>>>>>>> >>>>>>>>> Thanks for the clarification. So the exact approach will depend on the >>>>>>>>> context.... >>>>>>>>> >>>>>>>>>>>> There are different ways to deal with a deadcode: > 1. Get rid of >>>>>>>>>>>> it >>>>>>>>>>>> completely >>>>>>>>>>>> 2. Leave it as it is >>>>>>>>> >>>>>>>>> ... this is my preference in the context of the timer. >>>>>>>> >>>>>>>> From a certification point of view, the fewer lines of code the better, >>>>>>>> and ideally all the lines of code used for the certified build should >>>>>>>> be >>>>>>>> testable and used. >>>>>>>> >>>>>>>> So I think 2. is the lest useful option from a certification >>>>>>>> perspective. For this reason, I'd prefer another alternative. >>>>>>>> >>>>>>>> >>>>>>>>> If the other don't like it, then 1 would be my preference. >>>>>>>>> >>>>>>>>> In general, my preference would be either 3.3 or 3.2 (see below). >>>>>>>> >>>>>>>> I also think that 3.2 and 3.3 are good options for the general case. >>>>>>>> For >>>>>>>> the timer, I can see why 1 is your (second) preference and I am fine >>>>>>>> with 1 as well. >>>>>>> Ok, sounds good to me. Let's still give Bertrand the chance to share >>>>>>> his opinion. >>>>>> >>>>>> We need to get rid of dead code and removing it is not always the best >>>>>> solution. >>>>>> >>>>>> If the code is or could be useful for someone some day, protecting it >>>>>> with ifdef is ok. >>>>>> >>>>>> In the mid term we will have to introduce a lot more ifdef or IS_ENABLED >>>>>> in the >>>>>> code so that we can compile out what we do not need and code not >>>>>> applying to >>>>>> some hardware is a case where we will do that (does not mean that by >>>>>> default >>>>>> we will not compile it in but we will make it easier to reduce the code >>>>>> size for a >>>>>> specific use case). >>>>>> >>>>>> So 3.2 and 3.3 are ok for me. >>>>> >>>>> So we all agree that the code in the current form is a no go from >>>>> certification purposes. >>>>> That is good :) >>>>> >>>>> The reason why I opt for solution 1 and not the others is that in the >>>>> latter case it would >>>>> mean introducing the Kconfig option to allow user to select the timer to >>>>> be used by Xen. >>>>> This is not really correct. Also in the current form, it would also >>>>> require adding more >>>>> code to time.c code because at the moment using CNTP for Xen would not >>>>> work out of the box. >>>>> The architecture defines the hypervisor timer for a purpose. If it does >>>>> not work, it means >>>>> that the HW is problematic. I agree that we would want to support such >>>>> use case but I'm not >>>>> really aware of any issue like that. Adding more code and Kconfig options >>>>> just because >>>>> one day someone may face issues with a new HW is something I am not a fan >>>>> of. >>>> >>>> I see 2 solutions here: >>>> - somehow push the code to a different file (not quite sure this is >>>> feasible here) >>>> - remove completely the code with a clean commit. Doing this it will be >>>> easy for someone needing this to later revert the patch >>>> >>>> It is definitely true here that adding more code to keep some unused code >>>> does not really make sense. >>>> And let’s be realistic here, if we need that one day, it will not take >>>> ages to support it somehow. >>>> >>>> As said, from a pure certification point of view: >>>> - we must not have deadcode >>>> - proper ifdef is acceptable >>>> - if 0 is not acceptable >>>> - commented code is not acceptable >>> >>> Given that we agree on that (+ IS_ENABLED option if possible), and the >>> option 1 seems >>> to be the best choice for the timer, I will create a patch removing the IRQ >>> path to get rid >>> of the deadcode/unreachable code. >>> >>> Do you think this is something we want for 4.17? >>> The risk is low as the code is already dead and the benefit is that we have >>> no deadcode. >>> What do you think? >>> >> >> We are very near from the release so from my point of view as it is not >> solving a bug, this should not go into 4.17. > > I agree This works for me and thank you all for the discussion! This thread and its outcome will certainly be useful for similar issues in the future.
~Michal