Hi ,

> On 28 Jun 2022, at 6:23 pm, Mykyta Poturai <mykyta.potu...@gmail.com> wrote:
> 
>> Hi Mykyta,
>> 
>>> On 21 Jun 2022, at 10:38 am, Mykyta Poturai <mykyta.potu...@gmail.com> 
>>> wrote:
>>> 
>>>> Thanks for testing the patch.
>>>>> But not fixed the "Unexpected Global fault" that occasionally happens 
>>>>> when destroying
>>>>> the domain with an actively working GPU. Although, I am not sure if this 
>>>>> issue
>>>>> is relevant here.
>>>> 
>>>> Can you please if possible share the more details and logs so that I can 
>>>> look if this issue is relevant here ?
>>> 
>>> So in my setup I have a board with IMX8 chip and 2 core Vivante GPU. GPU is 
>>> split between domains.
>>> One core goes to Dom0 and one to DomU.
>>> 
>>> Steps to trigger this issue:
>>> 1. Start DomU
>>> 2. Start wayland and glmark2-es2-wayland inside DomU
>>> 3. Destroy DomU
>>> 
>>> Sometimes it destroys fine but roughly 1 of 8 times I get logs like this:
>>> 
>>> root@dom0:~# xl dest DomU
>>> [12725.412940] xenbr0: port 1(vif8.0) entered disabled state
>>> [12725.671033] xenbr0: port 1(vif8.0) entered disabled state
>>> [12725.689923] device vif8.0 left promiscuous mode
>>> [12725.696736] xenbr0: port 1(vif8.0) entered disabled state
>>> [12725.696989] audit: type=1700 audit(1616594240.068:39): dev=vif8.0 prom=0 
>>> old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295
>>> (XEN) smmu: /iommu@51400000: Unexpected global fault, this could be serious
>>> (XEN) smmu: /iommu@51400000:    GFSR 0x00000001, GFSYNR0 0x00000004, 
>>> GFSYNR1 0x00001055, GFSYNR2 0x00000000
>>> 
>>> My guess is that this happens because GPU continues to access memory when 
>>> the context is already invalidated,
>>> and therefore triggers the "Invalid context fault".
>> 
>> Yes you are right in this case GPU trying to do DMA operation after Xen 
>> destroyed the guest and configures
>> the S2CR type value to fault. Solution to this issue is the patch that I 
>> shared earlier.
>> 
>> You can try this patch and confirm.This patch will solve both the issues.
>> 
>> diff --git a/xen/drivers/passthrough/arm/smmu.c 
>> b/xen/drivers/passthrough/arm/smmu.c
>> index 5cacb2dd99..ff1b73d3d8 100644
>> --- a/xen/drivers/passthrough/arm/smmu.c
>> +++ b/xen/drivers/passthrough/arm/smmu.c
>> @@ -1680,6 +1680,10 @@ static int arm_smmu_attach_dev(struct iommu_domain 
>> *domain, struct device *dev)
>> if (!cfg)
>> return -ENODEV;
>> 
>> + ret = arm_smmu_master_alloc_smes(dev);
>> + if (ret)
>> + return ret;
>> +
>> return arm_smmu_domain_add_master(smmu_domain, cfg);
>> }
>> 
>> @@ -2075,7 +2079,7 @@ static int arm_smmu_add_device(struct device *dev)
>> iommu_group_add_device(group, dev);
>> iommu_group_put(group);
>> 
>> - return arm_smmu_master_alloc_smes(dev);
>> + return 0;
>> }
>> 
>> 
>> Regards,
>> Rahul
> 
> Hi Rahul,
> 
> With this patch I get the same results, here is the error message:
> 
> (XEN) smmu: /iommu@51400000: Unexpected global fault, this could be serious
> (XEN) smmu: /iommu@51400000:    GFSR 0x00000001, GFSYNR0 0x00000004, GFSYNR1 
> 0x00001055, GFSYNR2 0x00000000
> 

As you mentioned earlier, this fault is observed because GPU continues to 
access memory when the context is
already invalidated, and therefore triggers the "Invalid context fault".  This 
is a different issue and is not related to this patch.

@Julien are you okay if I will send the below patch for official review as this 
issue pending for a long time?

In this patch we don’t need to call arm_smmu_master_free_smes() in 
arm_smmu_detach_dev() but we will implement new
function arm_smmu_domain_remove_master() that will configure the s2cr value to 
type fault in detach function.
arm_smmu_domain_remove_master() will revert the changes done by 
arm_smmu_domain_add_master() in attach function.


diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c
index 69511683b4..da3adf8e7f 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -1598,21 +1598,6 @@ out_err:
return ret;
}

-static void arm_smmu_master_free_smes(struct arm_smmu_master_cfg *cfg)
-{
- struct arm_smmu_device *smmu = cfg->smmu;
- int i, idx;
- struct iommu_fwspec *fwspec = arm_smmu_get_fwspec(cfg);
-
- spin_lock(&smmu->stream_map_lock);
- for_each_cfg_sme(cfg, i, idx, fwspec->num_ids) {
- if (arm_smmu_free_sme(smmu, idx))
- arm_smmu_write_sme(smmu, idx);
- cfg->smendx[i] = INVALID_SMENDX;
- }
- spin_unlock(&smmu->stream_map_lock);
-}
-
static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
struct arm_smmu_master_cfg *cfg)
{
@@ -1635,6 +1620,20 @@ static int arm_smmu_domain_add_master(struct 
arm_smmu_domain *smmu_domain,
return 0;
}

+static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain,
+ struct arm_smmu_master_cfg *cfg)
+{
+ struct arm_smmu_device *smmu = smmu_domain->smmu;
+ struct arm_smmu_s2cr *s2cr = smmu->s2crs;
+ struct iommu_fwspec *fwspec = arm_smmu_get_fwspec(cfg);
+ int i, idx;
+
+ for_each_cfg_sme(cfg, i, idx, fwspec->num_ids) {
+ s2cr[idx] = s2cr_init_val;
+ arm_smmu_write_s2cr(smmu, idx);
+ }
+}
+
static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
{
int ret;
@@ -1684,10 +1683,11 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)

static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
{
+ struct arm_smmu_domain *smmu_domain = domain->priv;
struct arm_smmu_master_cfg *cfg = find_smmu_master_cfg(dev);

if (cfg)
- arm_smmu_master_free_smes(cfg);
+ return arm_smmu_domain_remove_master(smmu_domain, cfg);

}


Regards,
Rahul

Reply via email to