e1000e: Detected Hardware Unit Hang


Are you, like me, seeing Detected Hardware Unit Hang in the kernel log from an e1000e network card? The symptom is easy to notice: a transfer drops to zero bytes for around 10 seconds, resumes, drops to zero again, resumes, and so on.

Workaround

The workaround is to turn off TCP offloading on the network card, here by disabling the rx and tx offload features. This can be done with ethtool, or with NetworkManager for a more permanent solution. I found the workaround on Server Fault, in the question "e1000e Reset adapter unexpectedly / Detected Hardware Unit Hang".

ethtool

ethtool -K eth0 tx off rx off
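
To check that the change took effect, ethtool -k (lowercase k) lists the current feature states. Below is a minimal sketch, assuming a POSIX shell. The helper name show_csum_offload is hypothetical and eth0 is a placeholder for your own interface. Settings applied with ethtool -K do not survive a reboot.

```shell
# Hypothetical helper: read `ethtool -k` output on stdin and keep only the
# top-level rx/tx checksumming lines (indented sub-features are skipped).
show_csum_offload() {
    grep -E '^(rx|tx)-checksumming:'
}

# Usage (assumes the interface is called eth0):
#   ethtool -k eth0 | show_csum_offload
```

After the workaround, both lines should report off.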

NetworkManager

This was my permanent solution to the problem. See The bug. This way of configuring the network card is documented by Red Hat in Chapter 29. Configuring ethtool offload features using NetworkManager.

  1. Find the NetworkManager connection name for the network card. My network card is named eno1. See the log excerpt from The bug.
    nmcli dev show eno1 | grep -i general.conn
    
  2. Add permanent configuration to the connection. My connection is named bridge0 slave 1.
    nmcli con modify 'bridge0 slave 1' ethtool.feature-rx off ethtool.feature-tx off
    
  3. Activate.
    sudo nmcli con up 'bridge0 slave 1'
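
The three steps above can be combined into one small function. This is only a sketch: the name disable_offload_nm is hypothetical, it assumes the device has an active connection, and it assumes the calling shell already has the privileges nmcli needs (for example a root shell).

```shell
# Hypothetical helper: disable rx/tx offload on the connection that is
# currently active on the given device, following the three steps above.
disable_offload_nm() {
    dev="$1"

    # Step 1: look up the name of the active connection on the device.
    con="$(nmcli -g GENERAL.CONNECTION device show "$dev")" || return 1
    if [ -z "$con" ]; then
        echo "no active connection on $dev" >&2
        return 1
    fi

    # Step 2: store the offload settings in the connection profile.
    nmcli connection modify "$con" ethtool.feature-rx off ethtool.feature-tx off || return 1

    # Step 3: re-activate the connection so the settings are applied.
    nmcli connection up "$con"
}

# Usage (assumes the device is called eno1):
#   disable_offload_nm eno1
```

Quoting "$con" matters here because connection names such as bridge0 slave 1 contain spaces.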
    

The bug

You detect it in the kernel log. The messages show up in dmesg or journalctl -b -t kernel and look something like this:

[ 2044.821230] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                 TDH                  <3e>
                 TDT                  <6d>
                 next_to_use          <6d>
                 next_to_clean        <3d>
               buffer_info[next_to_clean]:
                 time_stamp           <1001a884a>
                 next_to_watch        <3e>
                 jiffies              <1001a9f40>
                 next_to_watch.status <0>
               MAC Status             <40080083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3c00>
               PHY Extended Status    <3000>
               PCI Status             <10>
[ 2046.036778] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
[ 2046.036873] bridge0: port 1(eno1) entered disabled state
[ 2051.705902] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 2051.706018] bridge0: port 1(eno1) entered blocking state
[ 2051.706023] bridge0: port 1(eno1) entered forwarding state

This was taken from my own dmesg.
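
To see how often the hang occurs, and whether the workaround helped, you can count the messages since boot. This is a rough sketch; count_unit_hangs is a hypothetical helper name, and it simply matches the message text shown in the excerpt above.

```shell
# Hypothetical helper: count "Detected Hardware Unit Hang" lines in a
# kernel log read from stdin. `|| true` keeps the exit status at zero
# when grep finds no matches (it still prints 0 in that case).
count_unit_hangs() {
    grep -c 'Detected Hardware Unit Hang' || true
}

# Usage:
#   journalctl -b -t kernel | count_unit_hangs
#   dmesg | count_unit_hangs
```

A count that stays at 0 after a few large transfers suggests the workaround is holding.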