APEI Generic Hardware Error

It looks like your RAM is either failing or producing errors that are being corrected. Depending on the severity, these errors may be affecting its ability to function, which would explain why the machine has to reboot afterwards.

Judging from this thread, the warning at the end of the message, that the error section length is too small, is likely the culprit.

excerpt - [PATCH 1/1] efi: cper: Support different length of Error Section

Some fields might be added to the Error Section in the newer UEFI spec. For example, the fields 'Reserved', 'Rank Number', 'Card Handle' and 'Module Handle' are added to the Memory Error Section started from UEFI spec 2.3. Unfortunately, there will have the following warning message if the memory corrected error is detected and the field 'revision' in struct acpi_generic_data is less then 0x203 (UEFI spec 2.3):

{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
{1}[Hardware Error]: It has been corrected by h/w and requires no further action
{1}[Hardware Error]: event severity: corrected
{1}[Hardware Error]:  Error 0, type: corrected
{1}[Hardware Error]:   section_type: memory error
[Firmware Warn]: error section length is too small

This behavior causes this corrected error cannot be displayed correctly. To solve the issue, this patch supports different length of the Error Section for different UEFI spec version.
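To make the idea in the excerpt concrete, here is a minimal sketch in Python (not the kernel's actual C code) of choosing the expected Memory Error Section length from the revision field. The names expected_mem_err_len and section_length_ok are hypothetical, and the two byte sizes are assumptions used only for illustration; the only value taken from the excerpt is the 0x203 revision threshold.

```python
# Hypothetical sketch of revision-dependent length checking.
# This is NOT the kernel implementation, only an illustration of the idea.

UEFI_2_3 = 0x203  # revision threshold named in the patch description

# Assumed section sizes in bytes (illustrative values, not taken from the excerpt).
OLD_MEM_ERR_LEN = 73  # layout without the fields added in UEFI 2.3
NEW_MEM_ERR_LEN = 80  # layout including the added fields


def expected_mem_err_len(revision: int) -> int:
    """Return the minimum section length expected for a given revision."""
    return NEW_MEM_ERR_LEN if revision >= UEFI_2_3 else OLD_MEM_ERR_LEN


def section_length_ok(revision: int, error_data_length: int) -> bool:
    """True if the reported section is long enough for its revision."""
    return error_data_length >= expected_mem_err_len(revision)
```

With a single fixed expected length, a shorter section from older firmware would fail the check and trigger the "too small" warning; making the expected length depend on the revision avoids that.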

And, this patch employs a pre-defined structure to clean up the duplicated codes in function cper_estatus_print_section.

With applying this patch, the memory corrected error could be displayed correctly after injecting the error.

Tested on v3.14-rc5 with Grantley platform and Intel RAStool.

So it would seem a patch for that particular error is in the works and might be available in a newer version of the kernel.
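If you want to check whether the warning still shows up after a kernel upgrade, a small script along these lines could summarize saved kernel log text. This is only a sketch: the function name summarize_apei_log is hypothetical, and it assumes the log lines look like the ones quoted above (optionally preceded by a timestamp).

```python
import re

# Matches lines such as "{1}[Hardware Error]: event severity: corrected".
# search() is used so that a leading dmesg timestamp does not matter.
HW_ERR = re.compile(r"\{(\d+)\}\[Hardware Error\]:\s*(.*)$")
LENGTH_WARNING = "[Firmware Warn]: error section length is too small"


def summarize_apei_log(log_text: str) -> dict:
    """Count error records, corrected events, and length warnings."""
    records = set()
    corrected = 0
    length_warnings = 0
    for line in log_text.splitlines():
        match = HW_ERR.search(line)
        if match:
            records.add(match.group(1))
            if match.group(2).strip() == "event severity: corrected":
                corrected += 1
        elif LENGTH_WARNING in line:
            length_warnings += 1
    return {
        "records": len(records),
        "corrected_events": corrected,
        "length_warnings": length_warnings,
    }
```

You could feed it the saved output of dmesg; a non-zero length_warnings count would suggest the running kernel still hits the issue described in the patch.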


FYI, I had what appeared to be a very similar issue.

As it turned out, the solution was to take the memory out and reseat it, after which everything was back to normal.
