How to remove invalid characters from filenames?

Solution 1:

One way would be with sed:

mv -- 'file' "$(printf '%s' 'file' | sed -e 's/[^A-Za-z0-9._-]/_/g')"

Replace file with your filename, of course. This will replace anything that isn't a letter, number, period, underscore, or dash with an underscore. You can add or remove characters to keep as you like, and/or change the replacement character to anything else, or nothing at all.
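To apply the same substitution to every file in a directory, the one-liner can be wrapped in a small function. This is only a sketch: sanitize_name and sanitize_dir are made-up names, LC_ALL=C is my addition so that sed works byte by byte (in a UTF-8 locale, GNU sed will not match bytes that are not valid UTF-8), and mv -n assumes a GNU or BSD mv.

```shell
#!/usr/bin/env bash

# Print a sanitized version of one filename (hypothetical helper).
# LC_ALL=C makes sed treat the name as raw bytes, so invalid UTF-8 is replaced too.
sanitize_name() {
  printf '%s' "$1" | LC_ALL=C sed -e 's/[^A-Za-z0-9._-]/_/g'
}

# Rename every entry directly inside the given directory (default: current one).
sanitize_dir() {
  local dir=${1:-.} f base new
  for f in "$dir"/*; do
    [ -e "$f" ] || continue            # skip the unexpanded glob in an empty directory
    base=${f##*/}
    new=$(sanitize_name "$base")
    if [ "$new" != "$base" ]; then
      mv -n -- "$f" "$dir/$new"        # -n: do not overwrite an existing file
    fi
  done
}

# Example: sanitize_dir /path/to/your/files
```

Because the function works on bytes, a multi-byte character becomes several underscores rather than one.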

Solution 2:

I assume you are on a Linux box and the files were created on a Windows box. Most Linux systems treat filenames as UTF-8, while files coming from Windows often arrive with names in a different encoding, such as a legacy code page. I think this mismatch is the cause of the problem.

I would use "convmv". This is a tool that can convert filenames from one character encoding to another. By default it only prints what it would do; add --notest to actually rename the files. For Western Europe one of these normally works:

convmv -r -f windows-1252 -t UTF-8 .
convmv -r -f ISO-8859-1 -t UTF-8 .
convmv -r -f cp-850 -t UTF-8 .

If you need to install it on a Debian-based Linux, you can do so by running:

sudo apt-get install convmv

It works for me every time and it does recover the original filename.
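To judge which of the source encodings above fits, it can help to preview how one broken name decodes under each candidate. A minimal sketch, assuming iconv is installed and knows these encoding names; preview_decodings and the sample bytes are made up for illustration:

```shell
#!/usr/bin/env bash

# Hypothetical helper: decode the raw bytes of a name under several
# candidate source encodings and print each result as UTF-8.
preview_decodings() {
  local name=$1 enc
  for enc in WINDOWS-1252 ISO-8859-1 CP850; do
    printf '%s: %s\n' "$enc" "$(printf '%s' "$name" | iconv -f "$enc" -t UTF-8)"
  done
}

# Example: the single byte 0xE9 after "caf"
# preview_decodings $'caf\xe9'
```

The candidate whose output looks like the intended name is the one to pass to convmv with -f.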

Source: LeaseWebLabs


Solution 3:

I had some Japanese files with broken filenames recovered from a broken USB stick, and the solutions above didn't work for me.

I recommend the detox package:

The detox utility renames files to make them easier to work with. It removes spaces and other such annoyances. It'll also translate or cleanup Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI escaped characters.

Example usage:

detox -r -v /path/to/your/files
-r Recurse into subdirectories
-v Be verbose about which files are being renamed 
-n Can be used for a dry run (only show what would be changed)

Solution 4:

I assume you mean you want to traverse the filesystem and fix all such files?

Here's the way I'd do it:

find /path/to/files -type f -print0 | \
perl -n0e 'chomp; $new = $_; if($new =~ s/[^[:ascii:]]/_/g) {
  print("Renaming $_ to $new\n"); rename($_, $new) or warn("rename failed: $!\n");
}'

That would find all files with non-ASCII characters and replace those characters with underscores (_). Use caution, though: if a file with the new name already exists, it will be overwritten. The script can be modified to check for such a case, but I didn't put that in, to keep it simple.
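The collision check mentioned above could look roughly like this in plain bash. It is a sketch rather than a drop-in replacement for the Perl one-liner: safe_ascii_rename is a made-up name, tr is used in place of the Perl regex, and only the last path component is changed so that non-ASCII directory names do not break the rename.

```shell
#!/usr/bin/env bash

# Hypothetical helper: replace every non-ASCII byte in a file's basename
# with an underscore, refusing to overwrite an existing file.
safe_ascii_rename() {
  local path=$1 dir base new
  dir=$(dirname -- "$path")
  base=$(basename -- "$path")
  # Bytes 0x80-0xFF (octal 200-377) are the non-ASCII ones.
  new=$(printf '%s' "$base" | LC_ALL=C tr '\200-\377' '_')
  if [ "$new" = "$base" ]; then
    return 0                                   # nothing to change
  fi
  if [ -e "$dir/$new" ]; then
    printf 'Skipping %s: %s already exists\n' "$path" "$dir/$new" >&2
    return 1
  fi
  printf 'Renaming %s to %s\n' "$path" "$dir/$new"
  mv -- "$path" "$dir/$new"
}

# Example:
# find /path/to/files -type f -print0 |
#   while IFS= read -r -d '' f; do safe_ascii_rename "$f"; done
```

There is still a small window between the existence check and the mv, so this guards against accidents, not against concurrent writers.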


Solution 5:

Following the answers at https://stackoverflow.com/questions/2124010/grep-regex-to-match-non-ascii-characters, you can use:

rename 's/[^\x00-\x7F]//g' *

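where * matches the files you want to rename. If you want to do it over multiple directories, you can do something like the following. Here -depth handles a directory's contents before the directory itself, and -execdir runs rename from each file's own directory, so only the last path component is changed:

find . -depth -execdir rename 's/[^\x00-\x7F]//g' "{}" \;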

You can pass -n to rename for a dry run, which shows what would be changed without changing anything. This assumes the Perl-based rename; the rename from util-linux takes different arguments.
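If the Perl-based rename is not available, a similar preview can be approximated in plain bash. This is a sketch under that assumption; strip_non_ascii and preview_strip are made-up names, and tr stands in for the regex above by deleting every byte outside the ASCII range.

```shell
#!/usr/bin/env bash

# Hypothetical helper: delete every non-ASCII byte (0x80-0xFF) from a name.
strip_non_ascii() {
  printf '%s' "$1" | LC_ALL=C tr -d '\200-\377'
}

# Print "old -> new" for each argument whose basename would change.
# Nothing is renamed; this only previews the result.
preview_strip() {
  local f base new
  for f in "$@"; do
    base=${f##*/}
    new=$(strip_non_ascii "$base")
    if [ "$new" != "$base" ]; then
      printf '%s -> %s\n' "$base" "$new"
    fi
  done
}

# Example: preview_strip ./*
```

Two different names can collapse to the same stripped name, so it is worth scanning the preview for duplicates before renaming anything.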

Tags:

Linux

Bash