** Description changed:

+ SRU Justification:
+
+ [Impact]
+
+ * With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer"
+ (upstream since kernel v6.7-rc1) there was a move (on s390x only)
+ to a different dma-iommu implementation.
+
+ * And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters"
+ (again upstream since v6.7-rc1) the IOMMU_DEFAULT_DMA_LAZY kernel config
+ option should now be set to 'yes' by default for s390x.
+
+ * Since CONFIG_IOMMU_DEFAULT_DMA_STRICT and IOMMU_DEFAULT_DMA_LAZY
+ are related to each other, CONFIG_IOMMU_DEFAULT_DMA_STRICT needs to be
+ set to "no" by default, which was done upstream by b2b97a62f055
+ "Revert "s390: update defconfigs"".
+
+ * These changes are all upstream, but were not picked up by the Ubuntu
+ kernel config.
+
+ * Not having these config options set properly causes significant
+ PCI-related network throughput degradation (up to -72%).
+
+ * This shows for almost all workloads and numbers of connections,
+ deteriorating as the number of connections increases.
+
+ * Especially drastic is the drop for a high number of parallel connections
+ (50 and 250) and for small and medium-size transactional workloads.
+ However, the degradation is also clearly visible for streaming-type
+ workloads (up to 48% degradation).
+
+ [Fix]
+
+ * The (upstream) fix is to set
+ IOMMU_DEFAULT_DMA_STRICT=no
+ and
+ IOMMU_DEFAULT_DMA_LAZY=y
+ (which is needed for the changed DMA IOMMU implementation since v6.7).
+
+ [Test Case]
+
+ * Set up two Ubuntu Server 24.04 LPARs (with kernel 6.8)
+ (one acting as server, the other as client)
+ that have (PCIe-attached) RoCE Express devices attached
+ and that are connected to each other.
+
+ * Sample workload rr1c-200x1000-250 with rr1c-200x1000-250.xml:
+ <?xml version="1.0"?>
+ <profile name="TCP_RR">
+ <group nprocs="250">
+ <transaction iterations="1">
+ <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />
+ </transaction>
+ <transaction duration="300">
+ <flowop type="write" options="size=200"/>
+ <flowop type="read" options="size=1000"/>
+ </transaction>
+ <transaction iterations="1">
+ <flowop type="disconnect" />
+ </transaction>
+ </group>
+ </profile>
+
+ * Install uperf on both systems, client and server.
+
+ * Start uperf at server: uperf -s
+
+ * Start uperf at client: uperf -vai 5 -m uperf-profile.xml
+
+ * Switch from strict to lazy mode,
+ either by using the new kernel (or the test build below)
+ or by using the kernel command-line parameter iommu.strict=0.
+
+ * Restart uperf on server and client, as before.
+
+ * Verification will be performed by IBM.
+
+ [Regression Potential]
+
+ * There is a certain regression potential, since the behavior with
+ the two modified kernel config options changes significantly.
+
+ * This may solve the (network) throughput issue with PCI devices,
+ but may also come with side effects on other PCIe-based devices
+ (the old compression adapters or the new NVMe carrier cards).
+
+ [Other]
+
+ * CCW devices are not affected.
+
+ * This is s390x-specific, and hence will not affect any other
+ architecture.
+
+ __________
Symptom: Comparing Ubuntu 24.04 (kernel version 6.8.0-31-generic) against Ubuntu 22.04, all of our PCI-related network measurements on LPAR show massive throughput degradations (up to -72%). This shows for almost all workloads and numbers of connections, deteriorating as the number of connections increases. Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, the degradation is also clearly visible for streaming-type workloads (up to 48% degradation).
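For reference, the two config options named in the [Fix] section can be checked with a short script. This is a minimal sketch: the helper function `check_iommu_config` and the sample file under /tmp are illustrative, not existing Ubuntu tooling; on a live system the same greps would be run against /boot/config-$(uname -r).

```shell
# Sketch: verify that a kernel config file enables lazy (flush-queue)
# IOMMU DMA mode. check_iommu_config is a hypothetical helper.
check_iommu_config() {
    config="$1"
    grep -q '^CONFIG_IOMMU_DEFAULT_DMA_LAZY=y' "$config" &&
        ! grep -q '^CONFIG_IOMMU_DEFAULT_DMA_STRICT=y' "$config"
}

# Sample fragment mirroring the fixed configuration described above
cat > /tmp/config-sample <<'EOF'
# CONFIG_IOMMU_DEFAULT_DMA_STRICT is not set
CONFIG_IOMMU_DEFAULT_DMA_LAZY=y
EOF

if check_iommu_config /tmp/config-sample; then
    echo "lazy (flush-queue) IOMMU DMA mode configured"
else
    echo "strict IOMMU DMA mode configured"
fi
```

The effective mode should also be visible at runtime per IOMMU group in /sys/kernel/iommu_groups/<n>/type ("DMA-FQ" for lazy mode, "DMA" for strict), which is handy when the mode was overridden on the kernel command line rather than in the config.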
Problem: With the kernel config setting CONFIG_IOMMU_DEFAULT_DMA_STRICT=y, the IOMMU DMA mode changed from lazy to strict, causing these massive degradations. The behavior can also be changed with a kernel command-line parameter (iommu.strict) for easy verification. The issue is known and was quickly fixed upstream in December 2023, after being present for a little less than two weeks.
Upstream fix: https://github.com/torvalds/linux/commit/b2b97a62f055dd638f7f02087331a8380d8f139a

Repro: rr1c-200x1000-250 with rr1c-200x1000-250.xml:
<?xml version="1.0"?>
<profile name="TCP_RR">
- <group nprocs="250">
- <transaction iterations="1">
- <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />
- </transaction>
- <transaction duration="300">
- <flowop type="write" options="size=200"/>
- <flowop type="read" options="size=1000"/>
- </transaction>
- <transaction iterations="1">
- <flowop type="disconnect" />
- </transaction>
- </group>
+ <group nprocs="250">
+ <transaction iterations="1">
+ <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />
+ </transaction>
+ <transaction duration="300">
+ <flowop type="write" options="size=200"/>
+ <flowop type="read" options="size=1000"/>
+ </transaction>
+ <transaction iterations="1">
+ <flowop type="disconnect" />
+ </transaction>
+ </group>
</profile>

0) Install uperf on both systems, client and server.
1) Start uperf at server: uperf -s
2) Start uperf at client: uperf -vai 5 -m uperf-profile.xml
3) Switch from strict to lazy mode using the kernel command-line parameter iommu.strict=0.
4) Repeat steps 1) and 2).

Example: For the following example, we chose the workload named above (rr1c-200x1000-250):
iommu.strict=1 (strict): 233464.914 TPS
iommu.strict=0 (lazy): 835123.193 TPS
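As a sanity check, the "up to -72%" figure quoted above can be reproduced directly from these two TPS values with a one-liner:

```shell
# Reported uperf TPS for rr1c-200x1000-250, taken from the example above
strict=233464.914   # iommu.strict=1
lazy=835123.193     # iommu.strict=0

# Relative throughput degradation of strict vs. lazy mode
awk -v s="$strict" -v l="$lazy" \
    'BEGIN { printf "strict mode degradation: %.0f%%\n", (1 - s / l) * 100 }'
# prints "strict mode degradation: 72%"
```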
** Changed in: ubuntu-z-systems
       Status: New => In Progress

** Changed in: linux (Ubuntu)
       Status: New => In Progress

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2071471

Title:
  [UBUNTU 24.04] IOMMU DMA mode changed in kernel config causes massive
  throughput degradation for PCI-related network workloads

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/2071471/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs