What causes this? pcieport 0000:00:03.0: PCIe Bus Error: AER / Bad TLP

I can give at least a few details, even though I cannot fully explain what happens.

The CPU communicates with the PCIe bus controller by transaction layer packets (TLPs). The hardware detects faulty ones, and the Linux kernel reports them as messages in its log.
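To get a feel for how often this happens and on which port, you can count the messages per device. This is only a minimal sketch: it assumes the log lines contain a PCI address followed by the text "PCIe Bus Error", as in the question title. The function name `count_bus_errors` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# A PCI address looks like 0000:00:03.0 (domain:bus:device.function, in hex).
# We only look at lines where that address is followed by "PCIe Bus Error".
AER_LINE = re.compile(
    r"(?P<addr>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): PCIe Bus Error"
)


def count_bus_errors(log_text):
    """Return a Counter mapping PCI address -> number of 'PCIe Bus Error' lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = AER_LINE.search(line)
        if match:
            counts[match.group("addr")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice feed in the text of your kernel log.
    sample = "\n".join([
        "pcieport 0000:00:03.0: PCIe Bus Error: AER / Bad TLP",
        "some unrelated kernel message",
        "pcieport 0000:00:03.0: PCIe Bus Error: AER / Bad TLP",
    ])
    for addr, n in count_bus_errors(sample).items():
        print(addr, n)
```

If one address dominates the count, that is the port (and the device behind it) to look at first.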

The kernel option pci=nommconf disables Memory-Mapped PCI Configuration Space, which has been available in Linux since kernel 2.6. Very roughly, every PCI device has an area that describes the device (which you can see with lspci -vv). The original method of accessing this area goes through I/O ports, while PCIe allows this space to be mapped into memory for simpler access.
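To make the difference between the two access methods a bit more concrete, here is a rough sketch of how a register in the configuration space is addressed in each case. With the original method, a 32-bit value selecting bus, device, function and register is written to I/O port 0xCF8 and the data is then read from port 0xCFC. With the memory-mapped method, the same fields are simply an offset from a base address. The base address below is a made-up example (the real one comes from the firmware), and the function names are mine.

```python
def _check_bdf(bus, device, function):
    # 8 bits of bus, 5 bits of device, 3 bits of function.
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")


def legacy_config_address(bus, device, function, offset):
    """Value written to I/O port 0xCF8 to select a register (original method)."""
    _check_bdf(bus, device, function)
    if not 0 <= offset < 256:
        raise ValueError("the original method only reaches the first 256 bytes")
    # Bit 31 is the enable bit; the low two bits of the offset are dropped
    # because the access is always to an aligned 32-bit register.
    return 0x80000000 | (bus << 16) | (device << 11) | (function << 8) | (offset & 0xFC)


def mmconfig_address(base, bus, device, function, offset):
    """Physical memory address of a register with the memory-mapped method."""
    _check_bdf(bus, device, function)
    if not 0 <= offset < 4096:
        raise ValueError("each function has 4096 bytes of configuration space")
    return base + ((bus << 20) | (device << 15) | (function << 12) | offset)


if __name__ == "__main__":
    HYPOTHETICAL_BASE = 0xE0000000  # assumption, differs from board to board
    # The port from the question title, 0000:00:03.0 -> bus 0, device 3, function 0.
    print(hex(legacy_config_address(0, 3, 0, 0)))
    print(hex(mmconfig_address(HYPOTHETICAL_BASE, 0, 3, 0, 0)))
```

Note that the memory-mapped method can reach 4096 bytes per function while the original one only reaches 256, so with pci=nommconf the extended part of the configuration space is not accessible.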

That means that in this particular case, something goes wrong when the PCIe controller uses this method to access the configuration space of a particular device. It may be a hardware bug in the device, in the PCIe root controller on the motherboard, in the specific interaction of those two, or something else.

With pci=nommconf, the configuration space of all devices is accessed in the original way, and changing the access method works around this problem. So, if you like, it both resolves and suppresses it.
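After adding the option to your bootloader configuration and rebooting, it is worth checking that the kernel really was started with it. A small sketch, assuming you pass in the kernel command line as a string (on a running system it can be read from /proc/cmdline). The pci= parameter takes a comma-separated list, so the option may appear next to others. The function names and the sample command line are only illustrative.

```python
def pci_options(cmdline):
    """Collect all values given to pci= on a kernel command line."""
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            # pci= accepts a comma-separated list, e.g. pci=nommconf,noaer
            options.extend(opt for opt in token[len("pci="):].split(",") if opt)
    return options


def mmconfig_disabled(cmdline):
    """True if pci=nommconf was passed on the given command line."""
    return "nommconf" in pci_options(cmdline)


if __name__ == "__main__":
    # Hypothetical command line; on a real system use the contents of
    # /proc/cmdline instead.
    sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet pci=nommconf"
    print(mmconfig_disabled(sample))
```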


Adding the kernel command line option pci=nommconf resolved the issue for me. Therefore, I assume the issue is motherboard-related. It happens on all my computers with X99 motherboards. It does not happen on Z170 systems or any other hardware I own.


I get the same errors (Bad TLP associated with device 8086:6f08). I have an X99 Deluxe II, a Samsung 960 Pro, and an Nvidia 1080 Ti. These problems seem to be associated with the X99 chipset and an M.2 device such as the Samsung 960 Pro.

The X99 Deluxe II motherboard shares bandwidth between the PCIE16_3 slot and M.2/U.2. Following a comment from @Nic, in the BIOS I changed Onboard Devices Configuration | U.2_2 Bandwidth from Auto to U.2_2. This fixed the problem for me.
