Re: [Intel-wired-lan] [REGRESSION] Packet loss after hot-plugging ethernet cable on HP Zbook (Arrow Lake)

2025-07-04 Thread En-Wei WU
Thank you all for your quick response. Sorry for the delay.

I ran two independent tests:

1. The same experiment Timo suggested: when the packet-loss problem
occurs (after hot-plugging the Ethernet cable), running the following
command fixes it:
$ ethtool -r eno1   # restart autonegotiation

2. As Vitaly suggested: with flow control enabled, we no longer observe
any packet loss:
e1000e :00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
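Both tests can be scripted. Below is a minimal sketch that is not taken from the thread. It assumes the interface is eno1 and treats a stalled rx_packets counter as the symptom, in line with the TX-only tcpdump capture in the original report. It wraps three real ethtool options: -r (restart autonegotiation), -a (show pause settings) and -A (change pause settings). All function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for the two tests above; only the ethtool options are real.

# rx_stalled STATS_DIR SECONDS
# Exit 0 if rx_packets under STATS_DIR did not grow within SECONDS,
# 1 if it grew, 2 if the counter could not be read.
# STATS_DIR is normally /sys/class/net/<iface>/statistics.
rx_stalled() {
    before=$(cat "$1/rx_packets") || return 2
    sleep "$2"
    after=$(cat "$1/rx_packets") || return 2
    [ "$after" -le "$before" ]
}

# Test 1: restart autonegotiation if nothing was received for 5 seconds.
renegotiate_if_stalled() {
    iface=${1:-eno1}
    if rx_stalled "/sys/class/net/$iface/statistics" 5; then
        echo "no RX on $iface, restarting autonegotiation"
        ethtool -r "$iface"
    fi
}

# Test 2: show the pause settings, then advertise Rx and Tx pause.
show_pause()   { ethtool -a "${1:-eno1}"; }
enable_pause() { ethtool -A "${1:-eno1}" autoneg on rx on tx on; }

# Print the "Flow Control:" value from an e1000e "NIC Link is Up" message
# (for example "None" or "Rx/Tx"); print nothing for any other line.
link_flow_control() {
    printf '%s\n' "$1" | sed -n 's/.*NIC Link is Up.*Flow Control: \(.*\)$/\1/p'
}
```

Running `dmesg | grep 'NIC Link is Up'` lists each negotiated link. Passing one of those lines to link_flow_control prints its flow-control field. A quiet segment can legitimately go five seconds without receiving a frame, so the RX check is only a heuristic.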


From the power management perspective, I can confirm that the Ethernet
controller stays in D0 at all times. I’m not sure whether the same holds
for the PHY, though, as I’m not familiar with how to check a PHY’s power
state.
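For reference, here is a minimal way to read the controller's power state without lspci. It assumes the kernel exposes the PCI power_state attribute and the runtime PM power/runtime_status attribute in sysfs. Recent kernels do, but treat that as an assumption for a given build. /sys/class/net/eno1/device links to the NIC's PCI function, and nic_power is a hypothetical name. Both attributes describe the PCI function only, so this does not answer the PHY question:

```shell
#!/bin/sh
# Hypothetical helper: print the PCI power state and runtime-PM status of a NIC.
# DEV is a sysfs device directory; /sys/class/net/<iface>/device points at the
# PCI function, so the default follows the interface name from the report.
# Missing attributes are reported as "unknown" instead of failing.
nic_power() {
    dev=${1:-/sys/class/net/eno1/device}
    state=unknown
    rpm=unknown
    [ -r "$dev/power_state" ] && state=$(cat "$dev/power_state")
    [ -r "$dev/power/runtime_status" ] && rpm=$(cat "$dev/power/runtime_status")
    echo "power_state=$state runtime_status=$rpm"
}
```

Running it before and after plugging the cable in would show whether the function ever leaves D0 or gets runtime-suspended.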

Thanks,
En-Wei.

On Tue, 1 Jul 2025 at 20:44, Timo Teräs wrote:
>
> On Tue, 1 Jul 2025 14:46:18 +0300
> "Lifshits, Vitaly"  wrote:
>
> > On 7/1/2025 8:31 AM, En-Wei WU wrote:
> > > Hi,
> > >
> > > I'm seeing a regression on an HP ZBook using the e1000e driver
> > > (chipset PCI ID: [8086:57a0]) -- the system can't get an IP address
> > > after hot-plugging an Ethernet cable. In this case, the Ethernet
> > > cable was unplugged at boot. The network interface eno1 was present
> > > but stuck in the DHCP process. Using tcpdump, only TX packets were
> > > visible and never got any RX -- indicating a possible packet loss or
> > > link-layer issue.
> > >
> > > This is on the vanilla Linux 6.16-rc4 (commit
> > > 62f224733431dbd564c4fe800d4b67a0cf92ed10).
> > >
> > > Bisect says it's this commit:
> > >
> > > commit efaaf344bc2917cbfa5997633bc18a05d3aed27f
> > > Author: Vitaly Lifshits 
> > > Date:   Thu Mar 13 16:05:56 2025 +0200
> > >
> > >  e1000e: change k1 configuration on MTP and later platforms
> > >
> > >  Starting from Meteor Lake, the Kumeran interface between the
> > >  integrated MAC and the I219 PHY works at a different frequency.
> > >  This causes sporadic MDI errors when accessing the PHY, and in rare
> > >  circumstances could lead to packet corruption.
> > >
> > >  To overcome this, introduce minor changes to the Kumeran idle
> > >  state (K1) parameters during device initialization. Hardware
> > >  reset reverts this configuration, therefore it needs to be applied
> > >  in a few places.
> > >
> > >  Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
> > >  Signed-off-by: Vitaly Lifshits 
> > >  Tested-by: Avigail Dahan 
> > >  Signed-off-by: Tony Nguyen 
> > >
> > >   drivers/net/ethernet/intel/e1000e/defines.h |  3 +++
> > >   drivers/net/ethernet/intel/e1000e/ich8lan.c | 80 +-
> > >   drivers/net/ethernet/intel/e1000e/ich8lan.h |  4 
> > >   3 files changed, 82 insertions(+), 5 deletions(-)
> > >
> > > Reverting this patch resolves the issue.
> > >
> > > Based on the symptoms and the bisect result, this issue might be
> > > similar to
> > > https://lore.kernel.org/intel-wired-lan/[email protected]/
> > >
> > >
> > > Affected machine is:
> > > HP ZBook X G1i 16 inch Mobile Workstation PC, BIOS 01.02.03
> > > 05/27/2025 (see end of message for dmesg from boot)
> > >
> > > CPU model name:
> > > Intel(R) Core(TM) Ultra 7 265H (Arrow Lake)
> > >
> > > ethtool output:
> > > driver: e1000e
> > > version: 6.16.0-061600rc4-generic
> > > firmware-version: 0.1-4
> > > expansion-rom-version:
> > > bus-info: :00:1f.6
> > > supports-statistics: yes
> > > supports-test: yes
> > > supports-eeprom-access: yes
> > > supports-register-dump: yes
> > > supports-priv-flags: yes
> > >
> > > lspci output:
> > > 0:1f.6 Ethernet controller [0200]: Intel Corporation Device [8086:57a0]
> > >  DeviceName: Onboard Ethernet
> > >  Subsystem: Hewlett-Packard Company Device [103c:8e1d]
> > >  Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> > >  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
> > >  Latency: 0
> > >  Interrupt: pin D routed to IRQ 162
> > >  IOMMU group: 17
> > >  Region 0: Memory at 9228 (32-bit, non-prefetchable) [size=128K]
> > >  Capabilities: [c8] Power Management version 3
> > >  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > >  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> > >  Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > >  Address: fee00798  Data:
> > >  Kernel driver in use: e1000e
> > >  Kernel modules: e1000e
> > >
> > > The relevant dmesg:
> > > <<>>
> > >
> > > [0.927394] e1000e: Intel(R) PRO/1000 Network Driver
> > > [0.927398] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> > > [0.927933] e1000e :00:1f.6: enabling device ( -> 0002)
> > > [0.928249] e1000e :00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> > > [1.155716] e1000e :0

Re: [Intel-wired-lan] [REGRESSION] Packet loss after hot-plugging ethernet cable on HP Zbook (Arrow Lake)

2025-07-01 Thread Timo Teräs
On Tue, 1 Jul 2025 14:46:18 +0300
"Lifshits, Vitaly"  wrote:

> On 7/1/2025 8:31 AM, En-Wei WU wrote:
> > Hi,
> > 
> > I'm seeing a regression on an HP ZBook using the e1000e driver
> > (chipset PCI ID: [8086:57a0]) -- the system can't get an IP address
> > after hot-plugging an Ethernet cable. In this case, the Ethernet
> > cable was unplugged at boot. The network interface eno1 was present
> > but stuck in the DHCP process. Using tcpdump, only TX packets were
> > visible and never got any RX -- indicating a possible packet loss or
> > link-layer issue.
> > 
> > This is on the vanilla Linux 6.16-rc4 (commit
> > 62f224733431dbd564c4fe800d4b67a0cf92ed10).
> > 
> > Bisect says it's this commit:
> > 
> > commit efaaf344bc2917cbfa5997633bc18a05d3aed27f
> > Author: Vitaly Lifshits 
> > Date:   Thu Mar 13 16:05:56 2025 +0200
> > 
> >  e1000e: change k1 configuration on MTP and later platforms
> > 
> >  Starting from Meteor Lake, the Kumeran interface between the
> >  integrated MAC and the I219 PHY works at a different frequency.
> >  This causes sporadic MDI errors when accessing the PHY, and in rare
> >  circumstances could lead to packet corruption.
> > 
> >  To overcome this, introduce minor changes to the Kumeran idle
> >  state (K1) parameters during device initialization. Hardware
> >  reset reverts this configuration, therefore it needs to be applied
> >  in a few places.
> > 
> >  Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
> >  Signed-off-by: Vitaly Lifshits 
> >  Tested-by: Avigail Dahan 
> >  Signed-off-by: Tony Nguyen 
> > 
> >   drivers/net/ethernet/intel/e1000e/defines.h |  3 +++
> >   drivers/net/ethernet/intel/e1000e/ich8lan.c | 80 +-
> >   drivers/net/ethernet/intel/e1000e/ich8lan.h |  4 
> >   3 files changed, 82 insertions(+), 5 deletions(-)
> > 
> > Reverting this patch resolves the issue.
> > 
> > Based on the symptoms and the bisect result, this issue might be
> > similar to
> > https://lore.kernel.org/intel-wired-lan/[email protected]/
> > 
> > 
> > Affected machine is:
> > HP ZBook X G1i 16 inch Mobile Workstation PC, BIOS 01.02.03
> > 05/27/2025 (see end of message for dmesg from boot)
> > 
> > CPU model name:
> > Intel(R) Core(TM) Ultra 7 265H (Arrow Lake)
> > 
> > ethtool output:
> > driver: e1000e
> > version: 6.16.0-061600rc4-generic
> > firmware-version: 0.1-4
> > expansion-rom-version:
> > bus-info: :00:1f.6
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: yes
> > 
> > lspci output:
> > 0:1f.6 Ethernet controller [0200]: Intel Corporation Device [8086:57a0]
> >  DeviceName: Onboard Ethernet
> >  Subsystem: Hewlett-Packard Company Device [103c:8e1d]
> >  Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
> >  Latency: 0
> >  Interrupt: pin D routed to IRQ 162
> >  IOMMU group: 17
> >  Region 0: Memory at 9228 (32-bit, non-prefetchable) [size=128K]
> >  Capabilities: [c8] Power Management version 3
> >  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> >  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> >  Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> >  Address: fee00798  Data:
> >  Kernel driver in use: e1000e
> >  Kernel modules: e1000e
> > 
> > The relevant dmesg:
> > <<>>
> > 
> > [0.927394] e1000e: Intel(R) PRO/1000 Network Driver
> > [0.927398] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> > [0.927933] e1000e :00:1f.6: enabling device ( -> 0002)
> > [0.928249] e1000e :00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> > [1.155716] e1000e :00:1f.6 :00:1f.6 (uninitialized): registered PHC clock
> > [1.220694] e1000e :00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 24:fb:e3:bf:28:c6
> > [1.220721] e1000e :00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
> > [1.220903] e1000e :00:1f.6 eth0: MAC: 16, PHY: 12, PBA No: FF-0FF
> > [1.222632] e1000e :00:1f.6 eno1: renamed from eth0
> > 
> > <<>>
> > 
> > [  153.932626] e1000e :00:1f.6 eno1: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
> > [  153.934527] e1000e :00:1f.6 eno1: NIC Link is Down
> > [  157.622238] e1000e :00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> > 
> > No error message seen after hot-plugging the Ethernet cable.
> >   
> 
> Thank you for the report.
> 
> We did not encounter this issue during our patch testing. However, we 
> will attempt to reproduce it in our lab.
> 
> One detail 

Re: [Intel-wired-lan] [REGRESSION] Packet loss after hot-plugging ethernet cable on HP Zbook (Arrow Lake)

2025-07-01 Thread Timo Teräs
On Tue, 1 Jul 2025, 14.46 Lifshits, Vitaly, 
wrote:


Re: [Intel-wired-lan] [REGRESSION] Packet loss after hot-plugging ethernet cable on HP Zbook (Arrow Lake)

2025-07-01 Thread Lifshits, Vitaly

On 7/1/2025 8:31 AM, En-Wei WU wrote:

Hi,

I'm seeing a regression on an HP ZBook using the e1000e driver
(chipset PCI ID: [8086:57a0]) -- the system can't get an IP address
after hot-plugging an Ethernet cable. In this case, the Ethernet cable
was unplugged at boot. The network interface eno1 was present but
stuck in the DHCP process. Using tcpdump, only TX packets were visible
and never got any RX -- indicating a possible packet loss or
link-layer issue.

This is on the vanilla Linux 6.16-rc4 (commit
62f224733431dbd564c4fe800d4b67a0cf92ed10).

Bisect says it's this commit:

commit efaaf344bc2917cbfa5997633bc18a05d3aed27f
Author: Vitaly Lifshits 
Date:   Thu Mar 13 16:05:56 2025 +0200

 e1000e: change k1 configuration on MTP and later platforms

 Starting from Meteor Lake, the Kumeran interface between the integrated
 MAC and the I219 PHY works at a different frequency. This causes sporadic
 MDI errors when accessing the PHY, and in rare circumstances could lead
 to packet corruption.

 To overcome this, introduce minor changes to the Kumeran idle
 state (K1) parameters during device initialization. Hardware reset
 reverts this configuration, therefore it needs to be applied in a few
 places.

 Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
 Signed-off-by: Vitaly Lifshits 
 Tested-by: Avigail Dahan 
 Signed-off-by: Tony Nguyen 

  drivers/net/ethernet/intel/e1000e/defines.h |  3 +++
  drivers/net/ethernet/intel/e1000e/ich8lan.c | 80 +-
  drivers/net/ethernet/intel/e1000e/ich8lan.h |  4 
  3 files changed, 82 insertions(+), 5 deletions(-)

Reverting this patch resolves the issue.

Based on the symptoms and the bisect result, this issue might be
similar to 
https://lore.kernel.org/intel-wired-lan/[email protected]/


Affected machine is:
HP ZBook X G1i 16 inch Mobile Workstation PC, BIOS 01.02.03 05/27/2025
(see end of message for dmesg from boot)

CPU model name:
Intel(R) Core(TM) Ultra 7 265H (Arrow Lake)

ethtool output:
driver: e1000e
version: 6.16.0-061600rc4-generic
firmware-version: 0.1-4
expansion-rom-version:
bus-info: :00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

lspci output:
0:1f.6 Ethernet controller [0200]: Intel Corporation Device [8086:57a0]
 DeviceName: Onboard Ethernet
 Subsystem: Hewlett-Packard Company Device [103c:8e1d]
 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR- >>

[0.927394] e1000e: Intel(R) PRO/1000 Network Driver
[0.927398] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[0.927933] e1000e :00:1f.6: enabling device ( -> 0002)
[0.928249] e1000e :00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[1.155716] e1000e :00:1f.6 :00:1f.6 (uninitialized): registered PHC clock
[1.220694] e1000e :00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 24:fb:e3:bf:28:c6
[1.220721] e1000e :00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[1.220903] e1000e :00:1f.6 eth0: MAC: 16, PHY: 12, PBA No: FF-0FF
[1.222632] e1000e :00:1f.6 eno1: renamed from eth0

<<>>

[  153.932626] e1000e :00:1f.6 eno1: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
[  153.934527] e1000e :00:1f.6 eno1: NIC Link is Down
[  157.622238] e1000e :00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

No error message seen after hot-plugging the Ethernet cable.



Thank you for the report.

We did not encounter this issue during our patch testing. However, we 
will attempt to reproduce it in our lab.


One detail that caught my attention is that flow control is disabled in 
both scenarios. Could you please check whether the issue persists when 
flow control is enabled? This might require connecting to a link partner 
that supports flow control.
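Before repeating the test, one way to check whether a particular link partner supports flow control is to read the pause section of plain `ethtool eno1` output. Below is a minimal sketch. It assumes ethtool prints a "Link partner advertised pause frame use:" line, which it typically does when link-partner information is available, though the wording may vary between versions. partner_pause is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print what the link partner advertised for pause frames.
# Reads `ethtool IFACE` output on stdin, so it can also be used on saved output.
# ethtool typically prints one of: No, Symmetric, Transmit-only,
# Symmetric Receive-only.
partner_pause() {
    sed -n 's/^[[:space:]]*Link partner advertised pause frame use:[[:space:]]*//p'
}

# Live use (not executed here):
#   ethtool eno1 | partner_pause
```

If the partner reports "No", autonegotiated flow control would normally resolve to off regardless of what the local side advertises. That would be consistent with the "Flow Control: None" lines in the dmesg above.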