How to specify character encoding for 7z?

Depending on the encoding used to create the zip file, you might be able to prevent unwanted translations by temporarily setting the locale to "C":

LC_ALL=C 7z x "$archive"

(This helped for a zip created by IZArc on Win7, using two of your example filenames.)
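The "temporarily" part comes from the shell itself: a VAR=value prefix applies only to the environment of that one command. A minimal sketch of this, with a child sh standing in for 7z (the variable names before, inside and after are just for illustration):

```shell
#!/bin/sh
# A VAR=value prefix changes the environment of that single command only.
before=${LC_ALL-unset}

# The child process sees LC_ALL=C ...
inside=$(LC_ALL=C sh -c 'printf "%s" "$LC_ALL"')

# ... while the calling shell keeps whatever it had before.
after=${LC_ALL-unset}

printf 'before=%s inside=%s after=%s\n' "$before" "$inside" "$after"
```

The before and after values should be identical, so no locale cleanup is needed once 7z exits.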

However, for the archive in the question, the "filename" field contains the CP866 encoding of "ДКП.doc" (84 8a 8f 2e 64 6f 63). The "extra" field uses an Info-ZIP extension (see section 4.6.9 of the Zip Specification v6.3.4) to store the UTF-8 filename. unzip knows about this header and uses the UTF-8 name, ignoring the CP866 one.

7z doesn't do anything with this "extra" field and only uses the CP866 name. Depending on the current locale, it might create the file using that exact name (the raw bytes 84 8a 8f), or worse, treat the bytes as Unicode code points and expand them to UTF-8 first (c2 84 c2 8a c2 8f).
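Both interpretations of those bytes can be reproduced with iconv. This is only a sketch: it assumes an iconv that knows the CP866 and ISO-8859-1 charsets, and raw_name is a helper made up for the example. The bytes are written in octal because not every printf accepts \x escapes.

```shell
#!/bin/sh
# Filename bytes as stored in the archive: 0x84 0x8a 0x8f ".doc"
# (octal: 0x84 = \204, 0x8a = \212, 0x8f = \217).
raw_name() { printf '\204\212\217.doc'; }

# Intended reading: decode the bytes as CP866 (DOS Cyrillic).
decoded=$(raw_name | iconv -f CP866 -t UTF-8)

# Wrong reading: take each byte as a code point (i.e. Latin-1),
# then encode those code points as UTF-8, and dump the result as hex.
mangled_hex=$(printf '\204\212\217' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

printf '%s\n' "$decoded"
printf '%s\n' "$mangled_hex"
```

The first line printed should be ДКП.doc and the second c284c28ac28f, matching the byte sequences quoted above.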

One option is to use external utilities to change the zip first:

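#!/bin/bash

cp orig.zip renamed.zip

# zipinfo -1 prints one filename per line, in archive order,
# so the running line count doubles as the entry index for ziptool.
index=0
zipinfo -1 orig.zip | while IFS= read -r name ; do
        ziptool renamed.zip rename "$index" "$name"
        index=$((index+1))
done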

ziptool is from libzip. zipinfo ships with Info-ZIP's UnZip, so if it is installed you could just as well extract the archive with unzip.


I've found this discussion thread with the author of 7-Zip, Igor Pavlov, on p7zip's page: OEM charset issues in Linux. It's a twin of this Q&A. This post says it all:

Probably -mcp switch doesn't work in p7zip. But -mcp works in 7-zip (Windows version). So now I don't know how to make it working for p7zip. the function: UString MultiByteToUnicodeString(const AString &srcString, UINT codePage) in CPP\Common\StringConvert.cpp

That post is dated 2016-04-18. I checked the latest p7zip release, from July, and the switch is still missing, at least from the documentation; I didn't test it.
