Public bug reported:

[Impact]
With SMMU page translation mode enabled, several inefficiencies have been
identified in the ARM SMMUv3 driver. As a result, IO bandwidth is severely
impacted.

[Test]
Running iperf between two Mellanox ConnectX-4 cards, each capable of 40 Gbps
and connected back to back, we get only about 15 Gbps.

iperf -c 192.168.5.5 --bind 192.168.5.10 -P 5 -w 3.1M
------------------------------------------------------------
Client connecting to 192.168.5.5, TCP port 5001
Binding to local address 192.168.5.10
TCP window size: 6.20 MByte (WARNING: requested 3.10 MByte)
------------------------------------------------------------
[ 7] local 192.168.5.10 port 42588 connected with 192.168.5.5 port 5001
[ 3] local 192.168.5.10 port 42582 connected with 192.168.5.5 port 5001
[ 5] local 192.168.5.10 port 42584 connected with 192.168.5.5 port 5001
[ 4] local 192.168.5.10 port 42586 connected with 192.168.5.5 port 5001
[ 6] local 192.168.5.10 port 42590 connected with 192.168.5.5 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 6.50 GBytes 5.58 Gbits/sec
[ 4] 0.0-10.0 sec 1.81 GBytes 1.56 Gbits/sec
[ 6] 0.0-10.0 sec 6.08 GBytes 5.22 Gbits/sec
[ 7] 0.0-10.0 sec 1.99 GBytes 1.71 Gbits/sec
[ 5] 0.0-10.0 sec 2.00 GBytes 1.72 Gbits/sec
[SUM] 0.0-10.0 sec 18.4 GBytes 15.8 Gbits/sec
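
As a sanity check on the numbers above, the per-stream rates can be summed and
compared against the nominal 40 Gbps link rate. The snippet below is only a
sketch: the per-stream lines are copied from this run, while the names
(IPERF_LINES, LINK_RATE_GBPS, stream_rates) are made up for illustration and
are not part of iperf or of the patches. The streams add up to 15.79
Gbits/sec, which agrees with the reported [SUM] of 15.8 after rounding and is
roughly 39% of the link rate.

```python
import re

# Per-stream result lines copied from the iperf run in this report.
IPERF_LINES = """\
[ 3] 0.0-10.0 sec 6.50 GBytes 5.58 Gbits/sec
[ 4] 0.0-10.0 sec 1.81 GBytes 1.56 Gbits/sec
[ 6] 0.0-10.0 sec 6.08 GBytes 5.22 Gbits/sec
[ 7] 0.0-10.0 sec 1.99 GBytes 1.71 Gbits/sec
[ 5] 0.0-10.0 sec 2.00 GBytes 1.72 Gbits/sec
"""

# Nominal link rate of the cards, as stated in the report (assumption:
# the whole 40 Gbps is usable by TCP payload, which is optimistic).
LINK_RATE_GBPS = 40.0

# Matches "[ id] interval sec transfer GBytes rate Gbits/sec".
STREAM_RE = re.compile(
    r"^\[\s*(\d+)\]\s+\S+\s+sec\s+\S+\s+GBytes\s+([\d.]+)\s+Gbits/sec$"
)


def stream_rates(text):
    """Return {stream_id: rate_in_gbits_per_sec} for each per-stream line."""
    rates = {}
    for line in text.splitlines():
        match = STREAM_RE.match(line.strip())
        if match:
            rates[int(match.group(1))] = float(match.group(2))
    return rates


rates = stream_rates(IPERF_LINES)
total = sum(rates.values())
utilization = total / LINK_RATE_GBPS

print(f"streams={len(rates)} total={total:.2f} Gbits/sec "
      f"({utilization:.0%} of {LINK_RATE_GBPS:.0f} Gbits/sec)")
```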

[Fix]
After applying the patches listed below, we were able to get throughput of
35+ Gbps on the same setup. These patches are currently in linux-next and in
line for 4.13-rc.

iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing table
iommu/arm-smmu-v3: Remove io-pgtable spinlock
iommu/arm-smmu: Remove io-pgtable spinlock
iommu/io-pgtable-arm-v7s: Support lockless operation
iommu/io-pgtable-arm: Support lockless operation
iommu/io-pgtable: Introduce explicit coherency
iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
iommu/io-pgtable-arm: Improve split_blk_unmap
iommu/io-pgtable-arm-v7s: Check table PTEs more precisely
iommu/io-pgtable-arm-v7s: constify dummy_tlb_ops.
iommu/io-pgtable-arm-v7s: Check for leaf entry before dereferencing it
iommu/io-pgtable-arm-v7s: Add support for the IOMMU_PRIV flag
iommu/io-pgtable-arm: Avoid shift overflow in block size
iommu/io-pgtable-arm: Check for leaf entry before dereferencing it
iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag
iommu: add IOMMU_PRIV attribute

[Regression Potential]
These patches cherry-pick cleanly on top of Zesty (4.10) and are limited to
ARM SMMU and page-table fixes. The patches were tested on a QDF2400 system
with Mellanox ConnectX-4 40G cards.

** Affects: linux (Ubuntu)
     Importance: Critical
     Assignee: Manoj Iyer (manjo)
         Status: Incomplete


** Tags: qdf2400 zesty

** Summary changed:

- [Zesty] Fixes to smmuv3 on ARM64
+ [Zesty] Fixes to smmuv3 and io-pgtable on ARM64

** Summary changed:

- [Zesty] Fixes to smmuv3 and io-pgtable on ARM64
+ [Zesty] Fixes to iommu on arm64 to improve IO performance

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1705123

Title:
  [Zesty] Fixes to iommu on arm64 to improve IO performance

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1705123/+subscriptions
