Which character encoding is used by the DBF file in shapefiles?

The original DBF standard specifies ISO 8859-1, and only ISO 8859-1. So when you get a Shapefile that is truly standards-conformant, it should be ISO 8859-1. Of course, this (very old) restriction is not really practical nowadays.

ArcGIS, Geopublisher/AtlasStyler, and GeoServer started extending the standard so that the encoding can be declared. For ArcGIS, for example, just create a .cpg file (with the same basename as the other Shapefile components) and fill it with the name of the encoding.

E.g., create a myshape.cpg with a text editor, insert the five characters "UTF-8", and save it. If you then open the Shapefile in ArcGIS, it reads the textual contents of the DBF in that charset.
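A minimal sketch of the same thing done programmatically in Python (the basename "myshape" is just a hypothetical example):

    # Write a .cpg sidecar file that declares the DBF's character encoding.
    # The basename must match the .shp/.shx/.dbf files it belongs to.
    from pathlib import Path

    basename = "myshape"  # hypothetical basename of your Shapefile
    Path(basename + ".cpg").write_text("UTF-8", encoding="ascii")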

GeoServer: GeoServer's WFS can export any layer as a zipped Shapefile. When this is done, the zip contains a .cst file, which does exactly the same thing as the .cpg file.
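For illustration, a hedged sketch of triggering such an export with a WFS GetFeature request; the host and the layer name (topp:states) are placeholders for whatever your GeoServer actually serves:

    # Request a layer as a zipped Shapefile from a GeoServer WFS endpoint.
    import urllib.request

    url = (
        "http://localhost:8080/geoserver/wfs"
        "?service=WFS&version=1.0.0&request=GetFeature"
        "&typeName=topp:states"    # placeholder layer name
        "&outputFormat=SHAPE-ZIP"  # zipped Shapefile output
    )
    urllib.request.urlretrieve(url, "states.zip")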

Attention: all of this only applies to the data, not the column names. You should really use only ASCII in the column names of a DBF if you want the file to be openable in other programs.

Hint: to change the encoding of a DBF, open it with OpenOffice Calc, choose Save As..., tick the "Filter options" box in the bottom left, and press Save. You can then choose the encoding to convert the text contents into.
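If you prefer doing this from a script, a hedged alternative is GDAL's ogr2ogr, which can rewrite the DBF in a new encoding (this assumes GDAL is installed; both file names are hypothetical):

    # Copy a Shapefile, reading the source DBF as ISO-8859-1 and writing
    # the copy (plus a matching .cpg file) as UTF-8.
    import subprocess

    subprocess.run(
        [
            "ogr2ogr",
            "--config", "SHAPE_ENCODING", "ISO-8859-1",  # source encoding
            "-lco", "ENCODING=UTF-8",                    # target encoding
            "utf8_copy.shp",      # hypothetical output
            "latin1_source.shp",  # hypothetical input
        ],
        check=True,
    )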


I'm pretty sure that there is no single "right" encoding. A .dbf file can be in any encoding, and you'll be able to open the Shapefile and read the attributes correctly as long as you know which one it is.
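As a sketch, assuming the pyshp library and a Latin-1 encoded Shapefile with the hypothetical basename "myshape", reading with a known encoding looks like this:

    # Read the DBF attribute records, decoding text fields as ISO-8859-1.
    import shapefile  # the pyshp package

    reader = shapefile.Reader("myshape", encoding="latin1")
    for record in reader.records():
        print(record)
    reader.close()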

You can find the ESRI white paper here: http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf

I usually expect a shapefile to be either UTF-8 or in the locale of the covered country (often some Latin encoding).


Anytime I see a question on encoding I refer people to this article: http://www.joelonsoftware.com/articles/Unicode.html

As it says:

It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that "plain" text is ASCII.