Hi,

On 26/10/2022 17:30, Stefano Stabellini wrote:
> 
> 
> On Wed, 26 Oct 2022, Bertrand Marquis wrote:
>>> On 26 Oct 2022, at 12:29, Michal Orzel <michal.or...@amd.com> wrote:
>>>
>>> Hi all,
>>>
>>> On 25/10/2022 10:20, Bertrand Marquis wrote:
>>>> Caution: This message originated from an External Source. Use proper 
>>>> caution when opening attachments, clicking links, or responding.
>>>>
>>>>
>>>> Hi Michal,
>>>>
>>>>> On 25 Oct 2022, at 09:07, Michal Orzel <michal.or...@amd.com> wrote:
>>>>>
>>>>> Hi Bertrand,
>>>>>
>>>>> On 25/10/2022 09:45, Bertrand Marquis wrote:
>>>>>>
>>>>>>
>>>>>> Hi Michal,
>>>>>>
>>>>>>> On 25 Oct 2022, at 08:11, Michal Orzel <michal.or...@amd.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 25/10/2022 03:29, Stefano Stabellini wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 24 Oct 2022, Julien Grall wrote:
>>>>>>>>>> On 24/10/2022 12:51, Julien Grall wrote:
>>>>>>>>>>> Caution: This message originated from an External Source. Use proper
>>>>>>>>>>> caution when opening attachments, clicking links, or responding.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 24/10/2022 10:07, Michal Orzel wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Hi Michal,
>>>>>>>>>>>
>>>>>>>>>>>> Recently I came across a deadcode in Xen Arm arch timer code. 
>>>>>>>>>>>> Briefly
>>>>>>>>>>>> speaking, we are routing
>>>>>>>>>>>> the NS phys timer (CNTP) IRQ to Xen, even though Xen does not make 
>>>>>>>>>>>> use
>>>>>>>>>>>> of it (as it uses the hypervisor timer CNTHP).
>>>>>>>>>>>> This timer is fully emulated, which means that there is nothing 
>>>>>>>>>>>> that can
>>>>>>>>>>>> trigger such IRQ. This code is
>>>>>>>>>>>> a left over from early days, where the CNTHP was buggy on some 
>>>>>>>>>>>> models
>>>>>>>>>>>> and we had to use the CNTP instead.
>>>>>>>>>>>>
>>>>>>>>>>>> As far as the problem itself is not really interesting, it raises a
>>>>>>>>>>>> question of what to do with a deadcode,
>>>>>>>>>>>> as there might be/are other deadcode places in Xen.
>>>>>>>>>>>
>>>>>>>>>>> There are multiple definition of deadcode. Depending on which one 
>>>>>>>>>>> you
>>>>>>>>>>> chose, then this could cover IS_ENABLED() and possibly #ifdef. So 
>>>>>>>>>>> this
>>>>>>>>>>> would result to a lot of places impacted with the decision.
>>>>>>>>>>>
>>>>>>>>>>> So can you clarify what you mean by deadcode?
>>>>>>>>>> In the timer example, I think we have both a deadcode and 
>>>>>>>>>> unreachable code.
>>>>>>>>>> For the purpose of this discussion, let's take the MISRA definition 
>>>>>>>>>> of a
>>>>>>>>>> deadcode which is a "code that can be executed
>>>>>>>>>> but has no effect on the functional behavior of the program". This 
>>>>>>>>>> differs
>>>>>>>>>> from the unreachable code definition that is
>>>>>>>>>> a "code that cannot be executed". Setting up the IRQ for Xen is an 
>>>>>>>>>> example
>>>>>>>>>> of a deadcode. Code within IRQ handler is an unreachable code
>>>>>>>>>> (there is nothing that can trigger this IRQ).
>>>>>>>>>>
>>>>>>>>>> What I mean by deadcode happens to be the sum of the two cases above 
>>>>>>>>>> i.e.
>>>>>>>>>> the code that cannot be executed as well as the code that
>>>>>>>>>> does not impact the functionality of the program.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> One may say that it is useful to keep it, because one day,
>>>>>>>>>>>> someone might need it when dealing with yet another broken HW. Such
>>>>>>>>>>>> person would still need to modify the other
>>>>>>>>>>>> part of the code (e.g. reprogram_timer), but there would be less 
>>>>>>>>>>>> work
>>>>>>>>>>>> required overall. Personally, I'm not in favor of
>>>>>>>>>>>> such approach, because we should not really support possible 
>>>>>>>>>>>> scenarios
>>>>>>>>>>>> with broken HW (except for erratas listing known issues).
>>>>>>>>>>>
>>>>>>>>>>> The difference between "broken HW" and "HW with known errata" is a 
>>>>>>>>>>> bit
>>>>>>>>>>> unclear to me. Can you clarify how you would make the difference 
>>>>>>>>>>> here?
>>>>>>>>>>>
>>>>>>>>>>> In particular, at which point do you consider that the HW should 
>>>>>>>>>>> not be
>>>>>>>>>>> supported by Xen?
>>>>>>>>>> I'm not saying that HW should not be supported. The difference for me
>>>>>>>>>> between broken HW and
>>>>>>>>>> HW with known errata is that for the former, the incorrect behavior 
>>>>>>>>>> is often
>>>>>>>>>> due to the early support stage,
>>>>>>>>>> using emulators/models instead of real HW, whereas for the latter, 
>>>>>>>>>> the HW is
>>>>>>>>>> already released and it happens to be that it is buggy
>>>>>>>>>> (the HW vendor is aware of the issue and released erratas).
>>>>>>>>>
>>>>>>>>> Thanks for the clarification. What I would call broken is anything 
>>>>>>>>> that can't
>>>>>>>>> be fixed in software. For a not too fictional example, an HW where 
>>>>>>>>> PCI devices
>>>>>>>>> are using the same stream ID. So effectively, passthrough can't be 
>>>>>>>>> safely
>>>>>>>>> supported.
>>>>>>>>>
>>>>>>>>> Regarding, not yet released HW, I don't think Xen should have 
>>>>>>>>> workaround for
>>>>>>>>> them. I wouldn't even call it "broken" because they are not yet 
>>>>>>>>> released and
>>>>>>>>> it is common to have bug in early revision.
>>>>>>>>>
>>>>>>>>>> Do we have any example in Xen for supporting broken HW,
>>>>>>>>>> whose vendor is not aware of the issue or did not release any errata?
>>>>>>>>> I will not cite any HW on the ML. But from my experience, the vendors 
>>>>>>>>> are not
>>>>>>>>> very vocal about issues in public (some don't even seem to have 
>>>>>>>>> public doc).
>>>>>>>>> The best way to find the issues is to look at Linux commit.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Also, as part of the certification/FUSA process, there should be no
>>>>>>>>>>>> deadcode and we should have explanation for every block of code.
>>>>>>>>>>>
>>>>>>>>>>> See above. What are you trying to cover by deadcode? Would 
>>>>>>>>>>> protecting
>>>>>>>>>>> code with IS_ENABLED() (or #ifdef) ok?
>>>>>>>>>> I think this would be ok from the certification point of view (this 
>>>>>>>>>> would at
>>>>>>>>>> least means, that we are aware of the issue
>>>>>>>>>> and we took some steps). Otherwise, such code is just an example of a
>>>>>>>>>> deadcode/unreachable code.
>>>>>>>>>
>>>>>>>>> Thanks for the clarification. So the exact approach will depend on the
>>>>>>>>> context....
>>>>>>>>>
>>>>>>>>>>>> There are different ways to deal with a deadcode: > 1. Get rid of 
>>>>>>>>>>>> it
>>>>>>>>>>>> completely
>>>>>>>>>>>> 2. Leave it as it is
>>>>>>>>>
>>>>>>>>> ... this is my preference in the context of the timer.
>>>>>>>>
>>>>>>>> From a certification point of view, the fewer lines of code the better,
>>>>>>>> and ideally all the lines of code used for the certified build should 
>>>>>>>> be
>>>>>>>> testable and used.
>>>>>>>>
>>>>>>>> So I think 2. is the lest useful option from a certification
>>>>>>>> perspective. For this reason, I'd prefer another alternative.
>>>>>>>>
>>>>>>>>
>>>>>>>>> If the other don't like it, then 1 would be my preference.
>>>>>>>>>
>>>>>>>>> In general, my preference would be either 3.3 or 3.2 (see below).
>>>>>>>>
>>>>>>>> I also think that 3.2 and 3.3 are good options for the general case. 
>>>>>>>> For
>>>>>>>> the timer, I can see why 1 is your (second) preference and I am fine
>>>>>>>> with 1 as well.
>>>>>>> Ok, sounds good to me. Let's still give Bertrand the chance to share 
>>>>>>> his opinion.
>>>>>>
>>>>>> We need to get rid of dead code and removing it is not always the best 
>>>>>> solution.
>>>>>>
>>>>>> If the code is or could be useful for someone some day, protecting it 
>>>>>> with ifdef is ok.
>>>>>>
>>>>>> In the mid term we will have to introduce a lot more ifdef or IS_ENABLED 
>>>>>> in the
>>>>>> code so that we can compile out what we do not need and code not 
>>>>>> applying to
>>>>>> some hardware is a case where we will do that (does not mean that by 
>>>>>> default
>>>>>> we will not compile it in but we will make it easier to reduce the code 
>>>>>> size for a
>>>>>> specific use case).
>>>>>>
>>>>>> So 3.2 and 3.3 are ok for me.
>>>>>
>>>>> So we all agree that the code in the current form is a no go from 
>>>>> certification purposes.
>>>>> That is good :)
>>>>>
>>>>> The reason why I opt for solution 1 and not the others is that in the 
>>>>> latter case it would
>>>>> mean introducing the Kconfig option to allow user to select the timer to 
>>>>> be used by Xen.
>>>>> This is not really correct. Also in the current form, it would also 
>>>>> require adding more
>>>>> code to time.c code because at the moment using CNTP for Xen would not 
>>>>> work out of the box.
>>>>> The architecture defines the hypervisor timer for a purpose. If it does 
>>>>> not work, it means
>>>>> that the HW is problematic. I agree that we would want to support such 
>>>>> use case but I'm not
>>>>> really aware of any issue like that. Adding more code and Kconfig options 
>>>>> just because
>>>>> one day someone may face issues with a new HW is something I am not a fan 
>>>>> of.
>>>>
>>>> I see 2 solutions here:
>>>> - somehow push the code to a different file (not quite sure this is 
>>>> feasible here)
>>>> - remove completely the code with a clean commit. Doing this it will be 
>>>> easy for someone needing this to later revert the patch
>>>>
>>>> It is definitely true here that adding more code to keep some unused code 
>>>> does not really make sense.
>>>> And let’s be realistic here, if we need that one day, it will not take 
>>>> ages to support it somehow.
>>>>
>>>> As said, from a pure certification point of view:
>>>> - we must not have deadcode
>>>> - proper ifdef is acceptable
>>>> - if 0 is not acceptable
>>>> - commented code is not acceptable
>>>
>>> Given that we agree on that (+ IS_ENABLED option if possible), and the 
>>> option 1 seems
>>> to be the best choice for the timer, I will create a patch removing the IRQ 
>>> path to get rid
>>> of the deadcode/unreachable code.
>>>
>>> Do you think this is something we want for 4.17?
>>> The risk is low as the code is already dead and the benefit is that we have 
>>> no deadcode.
>>> What do you think?
>>>
>>
>> We are very near from the release so from my point of view as it is not 
>> solving a bug, this should not go into 4.17.
> 
> I agree
This works for me and thank you all for the discussion!
This thread and its outcome will certainly be useful for similar issues in the 
future.

~Michal

Reply via email to