e1000e Reset adapter unexpectedly / Detected Hardware Unit Hang

Solution 1:

OK, so after posting this question last night I continued to do some research, and the only real solution I came across seems to have taken care of the problem.

Disabling TSO, GSO and GRO using ethtool:

ethtool -K eth0 gso off gro off tso off

This is according to a post found here: http://ehc.ac/p/e1000/bugs/378/

From what I understand, this can reduce performance.
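To see which offloads are currently on before and after the change, `ethtool -k` (lowercase k) lists the feature states. A minimal sketch for pulling out a single feature; `offload_state` is a made-up helper name, and `eth0` is just the interface name used in this thread:

```shell
#!/bin/sh
# offload_state: hypothetical helper, not part of ethtool.
# Reads `ethtool -k <iface>` output on stdin and prints the state
# ("on" or "off") of the feature named in $1.
offload_state() {
    sed -n "s/^[[:space:]]*$1: \([a-z]*\).*/\1/p" | head -n 1
}

# Usage (needs ethtool and a real interface):
#   ethtool -k eth0 | offload_state tcp-segmentation-offload
#   ethtool -k eth0 | offload_state generic-segmentation-offload
#   ethtool -k eth0 | offload_state generic-receive-offload
```

Comparing the values before and after running the `ethtool -K` command should show whether the setting was actually applied.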

I also noticed that another suggested solution is to disable Active-State Power Management (ASPM) with the kernel boot parameter:

pcie_aspm=off

This is according to this post on Server Fault: Linux e1000e (Intel networking driver) problems galore, where do I start?

I haven’t tried this solution yet. I will try it and see if that makes a difference and post back my findings.
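Note that pcie_aspm=off only takes effect once it is on the kernel command line and the machine has been rebooted (on GRUB-based systems it is typically added to GRUB_CMDLINE_LINUX in /etc/default/grub). A minimal sketch for checking that the running kernel was really booted with it; `aspm_disabled` is a made-up helper name:

```shell
#!/bin/sh
# aspm_disabled: hypothetical helper.
# Succeeds if the kernel command line passed as $1 contains pcie_aspm=off.
aspm_disabled() {
    case " $1 " in
        *" pcie_aspm=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running system:
#   if aspm_disabled "$(cat /proc/cmdline)"; then
#       echo "booted with pcie_aspm=off"
#   else
#       echo "pcie_aspm=off is not on the kernel command line"
#   fi
```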

EDIT:

OK, so I have tried turning off Active-State Power Management with pcie_aspm=off, and it didn't have any effect. I continued to see errors in my log file.

Disabling ASPM may still work for some people, since some Intel NICs have problems under certain kernels with falling asleep when power management is enabled.
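To judge whether any of these changes helped, it is handy to count the hang and reset messages in the kernel log before and after. A rough sketch, assuming the messages contain the two strings from the question title; `count_hangs` is a made-up helper name:

```shell
#!/bin/sh
# count_hangs: hypothetical helper.
# Reads kernel log lines on stdin and prints how many of them contain
# one of the two e1000e messages from the question title.
count_hangs() {
    # grep -c exits non-zero when the count is 0, so do not treat that as failure
    grep -c -e 'Detected Hardware Unit Hang' -e 'Reset adapter unexpectedly' || true
}

# Usage:
#   dmesg | count_hangs
#   journalctl -k | count_hangs    # on systemd-based systems
```

A count that stops growing under normal network load is a reasonable sign that the workaround is holding.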

Solution 2:

Disabling Enhanced C1 (C1E) in the BIOS fixed it for me.

Not sure whether the lower power state of C1E is messing with the driver, or whether there's an oops in the driver when the processor is in this state.

Anyway, problem solved.


Solution 3:

Disabling only TCP Segmentation Offload (TSO) does the trick for me.

ethtool -K eth0 tso off

Note: It does not seem to be necessary to also disable Generic Receive Offload (GRO) and Generic Segmentation Offload (GSO), as various sources recommend. As far as I understand, these are implemented purely in software and should be safe. Don't sacrifice more performance than necessary.


Solution 4:

I had the same issue (triggering the same kernel error as you, plus userspace SSH errors like "Corrupted MAC on input").

Solution

What worked for me was disabling TCP checksum offloading:

# ethtool -K eth0 tx off rx off

A clean, long-term way to integrate this with a Debian-style /etc/network/interfaces setup is an ifupdown hook script:

#!/bin/bash
#
# Disables TCP offloading on all ifaces
#
# Inspired by: @Michelunik https://serverfault.com/a/422554/62953

RUN=true
case "${IF_NO_TOE,,}" in
    no|off|false|disable|disabled)
        RUN=false
    ;;
esac


# Other offloading options that could be disabled (not TCP related):
#  sg tso ufo gso gro lro rxvlan txvlan rxhash
# see man ethtool

if [ "$MODE" = start ] && [ "$RUN" = true ]; then
  TOE_OPTIONS="rx tx"
  for TOE_OPTION in $TOE_OPTIONS; do
    /sbin/ethtool --offload "$IFACE" "$TOE_OPTION" off &>/dev/null || true
  done
fi
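The opt-out check at the top of the script can be tried on its own. ifupdown exports each option of an interface stanza to hook scripts as an IF_<OPTION> variable, so IF_NO_TOE would come from a `no-toe` line in /etc/network/interfaces. The sketch below mirrors that case statement in plain sh (using tr instead of the bash-only `${VAR,,}`); `should_disable_offload` is a made-up name:

```shell
#!/bin/sh
# should_disable_offload: hypothetical helper mirroring the script's case statement.
# $1 is the value of IF_NO_TOE. Succeeds (offloads get disabled) unless the
# value is one of no/off/false/disable/disabled, compared case-insensitively.
should_disable_offload() {
    value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$value" in
        no|off|false|disable|disabled) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage:
#   should_disable_offload "$IF_NO_TOE" && echo "would run ethtool for $IFACE"
```

In other words, offloads are disabled by default on every interface, and a stanza with, for example, `no-toe off` keeps them enabled for that interface.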


Context

  • Debian Jessie
  • Kernel 4.7.0-0.bpo.1-amd64
  • lspci 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-V (rev 04)
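To compare your own setup against this context, `uname -r` prints the kernel version, `lspci` lists the controller, and `ethtool -i` reports the driver bound to the interface. A small sketch for the last one; `nic_driver` is a made-up helper name, and `eth0` is only the interface name used in this thread:

```shell
#!/bin/sh
# nic_driver: hypothetical helper.
# Reads `ethtool -i <iface>` output on stdin and prints the driver name.
nic_driver() {
    sed -n 's/^driver: //p'
}

# Usage:
#   uname -r
#   lspci | grep -i ethernet
#   ethtool -i eth0 | nic_driver    # the workarounds here apply when this prints e1000e
```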