On 23.12.2020 17:29, Julien Grall wrote:
> On 23/12/2020 16:24, Jan Beulich wrote:
>> On 23.12.2020 17:16, Julien Grall wrote:
>>> On 23/12/2020 16:11, Jan Beulich wrote:
>>>> On 23.12.2020 16:16, Julien Grall wrote:
>>>>> On 23/12/2020 15:00, Jan Beulich wrote:
>>>>>> On 23.12.2020 15:56, Julien Grall wrote:
>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote:
>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote:
>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option 
>>>>>>>>> would
>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions?
>>
>> Please note this (in your original submission). I simply ...
>>
>>>>>>>> We already have
>>>>>>>>
>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d)
>>>>>>>> {
>>>>>>>>         dom_iommu(d)->arch.amd.root_table = NULL;
>>>>>>>> }
>>>>>>>>
>>>>>>>> and this function is AMD's teardown handler. Hence I suppose
>>>>>>>> doing the same for VT-d would be quite reasonable. And really
>>>>>>>> VT-d's iommu_domain_teardown() also already has
>>>>>>>>
>>>>>>>>         hd->arch.vtd.pgd_maddr = 0;
>>>>>>>
>>>>>>> Let me have a look if that works.
>>>>>>>
>>>>>>>>
>>>>>>>> I guess what's missing is prevention of the root table
>>>>>>>> getting re-setup.
>>>>>>>
>>>>>>> This is taken care in the follow-up patch by forbidding page-table
>>>>>>> allocation. I can mention it in the commit message.
>>>>>>
>>>>>> My expectation is that with that subsequent change the change here
>>>>>> (or any variant of it) would become unnecessary.
>>>>>
>>>>> I am not be sure. iommu_unmap() would still get called from put_page().
>>>>> Are you suggesting to gate the code if d->is_dying as well?
>>>>
>>>> Unmap shouldn't be allocating any memory right now, as in
>>>> non-shared-page-table mode we don't install any superpages
>>>> (unless I misremember).
>>>
>>> It doesn't allocate memory, but it will try to access the IOMMU
>>> page-tables (see more below).
>>>
>>>>
>>>>> Even if this patch is deemed to be unecessary to fix the issue.
>>>>> This issue was quite hard to chase/reproduce.
>>>>>
>>>>> I think it would still be good to harden the code by zeroing
>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the
>>>>> pointer was still "valid".
>>>>
>>>> But my point was that this zeroing already happens.
>>>> What I
>>>> suspect is that it gets re-populated after it was zeroed,
>>>> because of page table manipulation that shouldn't be
>>>> occurring anymore for a dying domain.
>>>
>>> AFAICT, the zeroing is happening in ->teardown() helper.
>>>
>>> It is only called when the domain is fully destroyed (see call in
>>> arch_domain_destroy()). This will happen much after relinquishing the code.
>>>
>>> Could you clarify why you think it is already zeroed and by who?
>>
>> ... trusted you on what you stated there. But perhaps I somehow
>> misunderstood that sentence to mean you want to put your addition
>> into the teardown functions, when apparently you meant to invoke
>> them earlier in the process. Without clearly identifying why this
>> would be a safe thing to do, I couldn't imagine that's what you
>> suggest as alternative. 
> 
> This was a wording issue. I meant moving ->teardown() before (or calling 
> from) iommu_free_pgtables().
> 
> Shall I introduce a new callback then?

Earlier zeroing won't help unless you prevent re-population, or
unless you make the code capable of telling "still zero" from
"already zero". But I have to admit I'd like to also have Paul's
opinion on the matter.

Jan

>> This is because the interdependencies of
>> the IOMMU code are pretty hard to follow at times, and hence any
>> such re-ordering has a fair risk of breaking something elsewhere.
> 
> Right, this is another reason to try to get most of my fix 
> self-contained rather relying on top-layer call to protect against a 
> domain dying.
> 
> Cheers,
> 


Reply via email to