On 26/03/2019 13:39, Jan Beulich wrote:
>>>> On 26.03.19 at 13:43, <andrew.coop...@citrix.com> wrote:
>> On 26/03/2019 09:08, Jan Beulich wrote:
>>>>>> Leave the warning which identifies the problematic devices, but drop the
>>>>>> remaining logic.  This leaves the system in better overall state, and 
>>>>>> working
>>>>>> in the same way that it did in previous releases.
>>>>> I wonder whether you've taken the time to look at the description
>>>>> of the commit first introducing this logic (a8059ffced "VT-d: improve
>>>>> RMRR validity checking"). I find it worrying in particular to
>>>>> effectively revert a change which claims 'to avoid any security
>>>>> vulnerability with malicious s/s re-enabling "supposed disabled"
>>>>> devices' without any discussion of why that may have been a
>>>>> wrong perspective to take.
>>>> I had, and as a maintainer, I'd reject a patch like that were it
>>>> presented today.
>>> Understood. But whether you'd accept it with a better description
>>> is unknown, I assume.
>> I severely doubt I'd accept it at all, because it is entirely
>> unreasonable behaviour.
>>
>> At best, it is the equivalent of throwing your hands up in the air and
>> saying "I give up", and that is not good enough behaviour for Xen.
>>
>>>> There is a nebulous claim of security, but it is exactly that -
>>>> nebulous.  There isn't enough information to work out what the concern
>>>> was, and even if the concern was valid, disabling VT-d across the system
>>>> isn't an appropriate action to take.
>>> This heavily depends on the position the system's admin takes:
>>> Enabling VT-d in an incomplete fashion may as well be considered
>>> worse than not enabling it at all.
>> No - that's simply not true, or a reasonable position to take. 
> As is every way of thinking differently than you do?

No, but I do expect common sense to be used in the judgement of what is
appropriate and/or reasonable end user behaviour.

> I'm sorry to
> be putting it this way, but you continue to make claims about
> how people ought to think without giving any reason why that's
> the only valid way. I can't see anything wrong with someone
> putting themselves on the position that if they see an enabled
> IOMMU, they assume that pass-through is as safe as it can
> (currently) be.

Once again, we get back to a un-justified (and disputed) claim of
"security".

*What* is unsafe about having non-active devices behind an IOMMU?

How does this scenario differ from one of PCI hotplug where the device
really doesn't exist at boot time, and comes into existence later?

> Just to then be caught by surprise that there is
> a device not actually handled by any IOMMU? After all a non-
> existent device listed in a table may as well be a hint that it's
> just its SBDF which the firmware got wrong.

and where does Xen currently check this?  (this is a rhetorical question
- Xen cannot check this.)

There is absolutely nothing *at all* which guarantees that just because
a number of devices are identified to be behind specific IOMMUs, that
DMA wont start appearing from elsewhere in the system.

Security of the system when it comes to IOMMUs *is and always will be* a
mutually cooperative and trusting relationship between Xen and the firmware.

The notion of "I'm safe because there were no inconsistencies in a piece
of information I have to trust fully" is security theatre, not security.

>
>> Disabling the IOMMU prevents the system from booting with a PVH dom0.
> But doing what you did is not the only way of getting around this.
> Defaulting to workaround_bios_bug=1 in the PVH case would be
> another, as would be a mode in which the IOMMU exists for Dom0's
> purposes only (i.e. still disallowing any pass-through to DomU-s).

A discussion along these lines might be appropriate in the middle of a
dev cycle, and might even be valid for a discussion of future improvements.

It is not appropriate for resolving an issue identified as a 4.12
blocker by the RM, on a timescale which needs to fit into the 4.12
release plans.

>> I am not aware of a credible case where partially enabled VT-d is less
>> secure than no VT-d, and there is one headline case now where disabled
>> VT-d causes a failure to boot.
>>
>>> Furthermore, as much as the security related claim there is
>>> nebulous, your description - I'm sorry to say that - isn't much
>>> better, as you don't clarify why there's _no_ security aspect
>>> there. Stating that "this leaves the system in better overall
>>> state" without making clear why that is _for everyone_ is not
>>> helpful at all.
>> The nebulous security claim is not relevant to this patch.
>>
>> This code was not run previously.  An unexpected consequence of a change
>> in 4.12 caused it to run, and break booting on some (sadly rather
>> common) systems.
>>
>> This is a regression in 4.12 and needs resolving.  The choice is between
>> reverting dcf41790 or removing this code, and reverting dcf41790 is
>> obviously not a valid thing to do.
> As explained before, there was an earlier regression, which - if it
> had been noticed in time - would have made all versions from 4.2
> to 4.11 behave like 4.12 without your change.

And if you'd not broken the behaviour back in 4.2, this class of system
would have been discovered the first time someone tried booting Xen on
it, not at the point someone is trying to upgrade 4.11 to 4.12.

I also wouldn't be pushing this specific fix in this specific
situation.  (In reality, I'd have already pushed this fix when the bug
was originally reported, because its unreasonable behaviour no matter
the situation which calls for it to be modified.)

> This behavior was
> intended by the original author. Ripping the code out by convincing
> people to bypass normal review flow is, well, not very nice to put it
> mildly.

That again was explicitly called out, with an explanation of why I was
deviating from usual process, including an agreement from The Rest that
the exceptional circumstances were warranted.

>
>> Beyond that, I really don't care what the exact behaviour of 4.11 was. 
>> If there is a real security issue then it still needs fixing on all
>> versions of Xen, and this change doesn't alter that property.
>>
>> However, until someone can work out what the alleged issue is, we can't
>> really progress this argument, and we mustn't keep broken code simply
>> because it purports to "fix" an unspecified issue.
> You seem to forget that your change is to deal with one form of
> broken firmware. It is simply impossible to enumerate all ways in
> which firmware _might_ be broken. The original code allegedly
> tried to deal with some other form of firmware flaw.

There is no such thing as a non-buggy firmware, just like there is no
such thing as a bug-free Xen.

What matters is that Xen copes in a way which is not detrimental to a
user being able to reconfigure their system.

>
> Just like in the EFI case, where there's so much breakage, I do
> think that default behavior of software ought to be to assume
> sane firmware behavior, allowing for workarounds where
> needed. Unless positively identified to be needed on a system,
> and unless needed virually everywhere, such workarounds
> should not be enabled by default. That is, in the given case a
> DMI quirk could have been added enabling workaround_bios_bug
> by default for the R740.

I am not going to start this argument again.

Disagreements with all of these points have been raised by multiple
parties on xen-devel, and I find myself in a position where I don't have
anything civil to add.

>
>>>> I'm not sure what more you are looking for, but this is very clear cut
>>>> and safe from my point of view.
>>> Well, your claim regarding "4.11 and earlier" is clearly wrong
>> I have made a statement, backed up with specific reference to the code
>> which, to the best of my ability, demonstrates it to be true.
>>
>> If you believe contrary then clearly identify the fault in my reasoning.
> I did, by pointing out the earlier regression, which you elected to
> ignore altogether in your reply.

You identified why Xen 4.11 didn't behave in the way you expected.

In doing so, you also demonstrated why the commit message was, in fact,
correct.


Like other parts of this thread, it was deviating sufficiently
off-topic/relevance that I chose to trim it.  I will continue doing so
in an effort to reduce the amount of my time that this thread is
wasting.  I don't know for certain, but I expect you've also got better
things to do with your time than arguing over off-topic aspects of this
thread.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to