Convert Text File Encoding

These Windows files with Persian text are encoded in Windows-1256. So it can be deciphered by command similar to OP tried, but with different charsets. Namely:

recode Windows-1256..UTF-8 <Windows_file.txt > UTF8_file.txt
(denounced upon original poster’s complaints; see comments)

iconv -f Windows-1256 Windows_file.txt > UTF8_file.txt

This one assumes that the LANG environment variable is set to a UTF-8 locale. To convert to any encoding (UTF-8 or otherwise), regardless of the current locale, one can say:

iconv -f Windows-1256 Windows_file.txt -t ${output_encoding} > ${output_file}

Original poster is also confused with semantic of text recoding tools (recode, iconv). For source encoding (source.. or -f) one must specify encoding with which the file is saved (by the program that created it). Not some (naïve) guesses based on mojibake characters in programs that try (but fail) to read it. Trying either ISO-8859-15 or WINDOWS-1252 for a Persian text was obviously an impasse: these encodings merely do not contain any Persian letter.


The working solution I found is using the Microsoft Visual Studio Code text editor which is Freeware and available for Linux.

Open the file you want to convert its encoding in VS-Code. At the bottom of the window, there are a few buttons. One of them is related to the file encoding, as shown below:

enter image description here

Clicking this button pops up an overhead menu which includes two items. From this menu select the "Reopen with Encoding" option, just like below:

enter image description here

This will open another menu which includes a list of different encoding, as shown below. Now select "Arabic (Windows 1256)":

enter image description here

This will fix the gibberish text like this:

enter image description here

Now click the encoding button again and this time select the "Save with Encoding" option, just as below:

enter image description here

And in the new menu select the "UTF-8" option:

enter image description here

This will save the corrected file using the UTF-8 encoding:

enter image description here

Done! :)


I don't know if this works with Farsi: I use Gedit, it gives a fault with wrong encoding, and I can chose what I want to translate to UTF-8, it was just text not lit format, but here is a screenshot!

enter image description here

Sorry I finally got through my text files, so now they are all converted.

I loved notepad++ too, miss it still.

Tags:

Encoding