What does a permanent ZFS error indicate?

Solution 1:

The wording of zpool status is a bit misleading. A permanent error (in this context) indicates that an I/O error has occurred and has been logged to the SPA (Storage Pool Allocator) error log for that pool. This does not necessarily mean there is irrecoverable data corruption.
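To see exactly which datasets or files those logged errors refer to, the verbose flag will list them (a sketch; tank is a placeholder pool name, substitute your own):

    # -v lists the files/objects that the logged errors refer to,
    # in addition to the usual per-device status columns.
    zpool status -v tank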

What you should do is run a zpool scrub on the pool. When the scrub completes, the SPA error log is rotated and no longer shows errors from before the scrub. If the scrub detects no errors, zpool status will no longer report any "permanent" errors.
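A minimal sketch of that workflow, again assuming a pool named tank:

    # Start a scrub; it runs in the background and re-reads and
    # verifies every allocated block in the pool.
    zpool scrub tank

    # Check on progress; run again until the scrub reports completion.
    zpool status tank

    # Once the underlying cause has been addressed, reset the
    # per-device READ/WRITE/CKSUM error counters.
    zpool clear tank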

Regarding the documentation, it is saying that only "fatal errors" are logged this way. A fatal error is an I/O error that ZFS could not automatically correct and that was therefore exposed to an application as a failed I/O. By contrast, if the I/O was immediately retried successfully, or if the logical I/O was satisfied from a redundant device, it would not be considered a fatal error and would not be logged as a data corruption error.

A fatal error does not necessarily mean permanent data loss; it just means that, at the time, the error could not be fixed before it propagated up to the application. For example, a loose cable or a bad controller could cause temporary fatal errors that ZFS would nevertheless describe as "permanent." Whether it truly is a problem depends on the nature of the I/O and on whether the application is capable of recovering from I/O errors.

EDIT: Fully agree with @bahamat that you should invest in redundancy as soon as possible.
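If the pool is built from single-disk vdevs, one way to add that redundancy after the fact (a sketch with placeholder pool and device names, not taken from the question) is to attach a second disk of at least the same size to each vdev, turning it into a two-way mirror:

    # Attach new disk ada2 to the vdev containing ada0; ZFS resilvers
    # the new disk and the vdev becomes a mirror.
    zpool attach tank ada0 ada2

    # Watch the resilver until it completes.
    zpool status tank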

Solution 2:

A permanent error means that there has been a checksum error in a file and there were not sufficient replicas to repair it. In other words, at least one read returned corrupted data due to an I/O error. If whatever issued that read then wrote the corrupted data back to the same file, you would now have truly irrecoverable data corruption.

Judging by your pool configuration, you have no redundancy. This is very dangerous: you don't get any of the self-healing benefits of ZFS. It can still tell you when there has been data corruption, but it cannot fix it. Ordinarily ZFS will automatically and silently correct corrupted reads, but in your case it can't. It also looks like you've already run zpool clear, because the CKSUM count is 0 for both drives.

Unfortunately, with no replicas there's really no way to know whether the affected data is actually intact.
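For contrast, here is roughly the difference in pool layout that determines whether ZFS can self-heal at all (a sketch with placeholder pool and device names; these are two alternative layouts, not commands to run back to back):

    # Non-redundant: two single-disk top-level vdevs (striped).
    # Corruption can only be detected, not repaired.
    zpool create tank ada0 ada1

    # Redundant: a single mirrored vdev. A bad block on one disk is
    # repaired automatically from the good copy on the other.
    zpool create tank mirror ada0 ada1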

Tags:

zfs