How to solve UTF8 invalid byte sequence copy errors on a restore, when the source database is encoded in UTF8?

Digging around the internet, I've seen that this is a pretty common problem. The common solution is to use the plain text format dump and feed it through iconv to correct the encoding.

Here is more information about that.

"I don't know how they got there in the first place"

It could have happened as described here - although this generates an error on 8.4:

If you create a table with any text type (i.e. text, varchar(10), etc.), then you can insert an invalid byte sequence into that field using octal escapes.

For instance, if you have a UTF8-encoded database, you can do:

=> CREATE TABLE foo(t TEXT);

=> INSERT INTO foo VALUES(E'\377');

Now, if you COPY the table out, you can't COPY the resulting file back in. That means your pg_dump backups won't be able to restore. The only way to get your data back in is to re-escape that value.

There is a good post on this excellent blog about the general issues and some ways to deal with them

I don't recommend blindly running iconv on the plain text dump because it may convert valid characters(eg: Chinese characters ) to some other characters. It is better to find the invalid UTF8 character by running below command.

grep -naxv '.*' plain_text_dump.sql

and then run iconv on the particular data. Check this doc for detailed step by step explanation.

How to solve UTF8 invalid byte sequence copy errors on a restore, when the source database is encoded in UTF8?

Tags:

Postgresql

Migration

Pgadmin

Related

Recent Posts