Replacing non-English characters in attribute tables using ArcPy and Python?

I also deal quite often with special characters such as those in Swedish (ä, ö, å), as well as others found in languages such as Portuguese and Spanish (é, í, ú, ó, etc.). For instance, I have data where city names are written in plain Latin letters with all the accents removed, so "Göteborg" becomes "Goteborg" and "Åre" becomes "Are". To perform joins and match the data, I have to replace the accented characters with their unaccented Latin equivalents.

I used to do this the way you show in your own answer, but that logic soon became cumbersome to maintain. Now I use the unicodedata module, which ships with Python's standard library, together with arcpy for iterating over the features.

import unicodedata
import arcpy
import os

def strip_accents(s):
    # Decompose each character (NFD), then drop the combining marks ('Mn')
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

arcpy.env.workspace = r"C:\TempData_processed.gdb"
workspace = arcpy.env.workspace

in_fc = os.path.join(workspace,"FC")
fields = ["Adm_name","Adm_Latin"]
with arcpy.da.UpdateCursor(in_fc, fields) as upd_cursor:
    for row in upd_cursor:
        # Skip NULL names; formatting None would write the text "None"
        if row[0] is None:
            continue
        row[1] = strip_accents(u"{0}".format(row[0]))
        upd_cursor.updateRow(row)

For more information about using the unicodedata module, see the Stack Overflow question "What is the best way to remove accents in a python unicode string?"
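
As a quick check outside ArcGIS, here is a minimal sketch of what strip_accents does to a few sample names. The sample strings are my own illustrations, written with escapes so the file encoding does not matter. One thing to be aware of: letters with no canonical decomposition, such as ø (U+00F8), pass through unchanged and would still need a manual replacement.

```python
import unicodedata

def strip_accents(s):
    # NFD splits an accented letter into base letter + combining mark;
    # the combining marks (category 'Mn') are then filtered out.
    return u''.join(c for c in unicodedata.normalize('NFD', s)
                    if unicodedata.category(c) != 'Mn')

# Sample names (illustrative only)
print(strip_accents(u"G\u00f6teborg") == u"Goteborg")    # True
print(strip_accents(u"\u00c5re") == u"Are")              # True
print(strip_accents(u"S\u00e3o Jos\u00e9") == u"Sao Jose")  # True

# U+00F8 (o with stroke) has no decomposition, so it is left as-is
print(strip_accents(u"Troms\u00f8") == u"Troms\u00f8")   # True
```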

It turns out that iterating over ÅÄÖ wasn't that easy. Each character comes back as a unicode string, so the if-statements have to compare against the unicode escape (for example u'\xc4') instead of the literal ÅÄÖ. Once I figured that out, the rest was a piece of cake :)

Resulting code:

# -*- coding: cp1252 -*-
import arcpy

def code(infield):
    data = ''
    for i in infield:
        if i == u'\xc4': #Ä
            data = data + 'AE'
        elif i == u'\xe4': #ä
            data = data + 'ae'
        elif i == u'\xc5': #Å
            data = data + 'AA'
        elif i == u'\xe5': #å
            data = data + 'aa'
        elif i == u'\xd6': #Ö
            data = data + 'OE'
        elif i == u'\xf6': #ö
            data = data + 'oe'
        else:
            data = data + i
    return data


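shp = arcpy.GetParameterAsText(0)
field = arcpy.GetParameterAsText(1)
newfield = field + '_U'
arcpy.AddField_management(shp, newfield, 'TEXT')

prows = arcpy.UpdateCursor(shp)

for row in prows:
    # field and newfield hold field names, so use getValue/setValue;
    # row.field would look for a field literally named "field"
    row.setValue(newfield, code(row.getValue(field)))
    prows.updateRow(row)

# Release the cursor's lock on the data
del prows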
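
If the list of characters grows, the if/elif chain could also be written as a lookup table. This is only a sketch of an alternative, not part of the script above: the names REPLACEMENTS and code_table are hypothetical, the substitutions are the same six as in code(), and it assumes the input is a unicode string (a NULL value would need to be handled separately).

```python
# Code point -> replacement text, same substitutions as the if/elif chain
REPLACEMENTS = {
    0xc4: u'AE',  # A with diaeresis
    0xe4: u'ae',  # a with diaeresis
    0xc5: u'AA',  # A with ring
    0xe5: u'aa',  # a with ring
    0xd6: u'OE',  # O with diaeresis
    0xf6: u'oe',  # o with diaeresis
}

def code_table(infield):
    # translate() looks up each character's code point in the table;
    # characters that are not in the table pass through unchanged.
    return infield.translate(REPLACEMENTS)

print(code_table(u"G\xf6teborg") == u"Goeteborg")  # True
print(code_table(u"\xc5re") == u"AAre")            # True
```

Adding another character is then a one-line change to the table rather than another elif branch.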

Tags:

Arcpy

Arcgis 10.1

Field Calculator

Unicodeencodeerror
