Can not use `cut -c` (`--characters`) with UTF-8?

You haven't said which cut you're using, but since you've mentioned the GNU long option --characters I'll assume it's that one. In that case, note this passage from info coreutils 'cut invocation':

‘-c character-list’
‘--characters=character-list’

Select for printing only the characters in positions listed in character-list. The same as -b for now, but internationalization will change that.

(emphasis added)

For the moment, GNU cut always works in terms of single-byte "characters", so the behaviour you see is expected.


Supporting both the -b and -c options is required by POSIX — they weren't added to GNU cut because it had multi-byte support and they worked properly, but to avoid giving errors on POSIX-compliant input. The same -c has been done in some other cut implementations, although not FreeBSD's and OS X's at least.

This is the historic behaviour of -c. -b was newly added to take over the byte role so that -c can work with multi-byte characters. Maybe in a few years it will work as desired consistently, although progress hasn't exactly been quick (it's been over a decade already). GNU cut doesn't even implement the -n option yet, even though it is orthogonal and intended to help the transition. There are potential compatibility problems with old scripts, which may be a concern, although I don't know definitively what the reason is.


colrm (part of util-linux, should be already installed on most distributions) seems to handle internationalization much better :

$ echo 'αβγ' | colrm 3
αβ
$ echo 'αβγ' | colrm 2
α

Beware of the numbering : colrm N will remove columns from N, printing characters up to N-1.

(credits)


Since many grep implementations are multibyte-aware, you can also use grep -o to simulate some uses of cut -c.

First two characters:

$ echo Τηεοδ29 | grep -o '^..'
Τη

Last two characters:

$ echo Τηεοδ29 | grep -o '...$'
δ29

Second character:

$ echo Τηεοδ29 | grep -o '^..' | grep -o '.$'
η

Adjust the number of periods, or use {x,y} syntax, to simulate cut ranges.