What characters are safe in cross-platform file names for Linux, Windows and OS-X

While RedGrittyBrick's answer is technically correct, safety isn't the only issue: usability is also important. I think a better question is "what characters are good to use in a filename".

Some potential guidelines:

  • [0-9a-zA-Z_] - Alphanumeric characters and the underscore are always fine to use.
  • \/:*?"<>| and the null byte are problematic on at least one system, and should always be avoided.
  • Spaces are used as argument separators on many systems, so filenames with spaces should be avoided when possible. Other whitespace characters (e.g. tabs) should be avoided even more so.
  • Semicolons (;) are used to separate commands on many systems. Semicolons and commas (,) are used to separate command line arguments on (some versions of?) the Windows command line.
  • []()^ #%&!@:+={}'~ and [`] all have special meanings in many shells, and are annoying to work around, and so should be avoided. They also tend to look horrible in URLs.
  • Leading characters to avoid:
    • Many command line programs use the hyphen [-] to indicate special arguments.
    • *nix based systems use a full-stop [.] as a leading character for hidden files and directories.
  • Anything not in the ASCII set can cause problems on older or more basic systems (e.g. some embedded systems), and should be used with care.

That basically leaves you with:

[0-9a-zA-Z._-]

that are always safe and not annoying to use (as long as you start the filename with an alphanumeric character) :)
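
If you want to check names against that conservative set automatically, here is a minimal sketch in Python. The function name is_conservative_name is just made up for illustration, and the pattern only encodes the guideline above (alphanumeric first character, then letters, digits, dot, underscore or hyphen), nothing more.

```python
import re

# First character must be alphanumeric; the rest may also be '.', '_' or '-'.
# The hyphen is written last in the class so it is read as a literal character.
_SAFE_NAME = re.compile(r"[0-9A-Za-z][0-9A-Za-z._-]*")


def is_conservative_name(name: str) -> bool:
    """Return True if name uses only the conservative character set."""
    return _SAFE_NAME.fullmatch(name) is not None
```

With this, "report_2024-01.txt" passes, while "-rf", ".hidden" and "my file.txt" are rejected.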


Summary:

  • Windows: anything except ASCII's control characters and \/:*?"<>|
  • Linux, OS-X: anything except null or /

On all platforms it is best to avoid non-printable characters such as the ASCII control-characters.
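
As a rough sketch, the summary above could be expressed in Python like this. The function names are hypothetical, and I'm assuming "control characters" means code points 0 to 31. It only covers the character rules listed in the summary and does not try to model any other platform restrictions.

```python
# Printable characters the summary lists as forbidden on Windows.
WINDOWS_FORBIDDEN = set('\\/:*?"<>|')


def allowed_on_windows(name: str) -> bool:
    """Reject empty names, ASCII control characters (0-31) and \\/:*?"<>|."""
    return bool(name) and not any(
        ord(ch) < 32 or ch in WINDOWS_FORBIDDEN for ch in name
    )


def allowed_on_unix(name: str) -> bool:
    """Reject empty names and names containing '/' or the null byte."""
    return bool(name) and "/" not in name and "\0" not in name


def allowed_everywhere(name: str) -> bool:
    """True only if the name passes both sets of character rules."""
    return allowed_on_windows(name) and allowed_on_unix(name)
```

For example, "a:b" is accepted by allowed_on_unix but rejected by allowed_on_windows, so allowed_everywhere rejects it too.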

Windows

In Windows, Windows Explorer does not allow control characters or \/:*?"<>| in filenames. You can use spaces, but you will often have to quote the filename when it is used from the command line (GUI apps are unaffected so far as I know). Windows filesystems such as NTFS apparently store the encoding with the filename, but UTF-16 is standard.

Some parts of Windows are case-sensitive, other parts are case-insensitive. It is easy to create distinct filenames like "Ab" and "ab" on a Windows NTFS filesystem. These names refer to separate files which contain distinct separate content. However, although the Windows command-prompt will happily list both files using dir, you cannot easily access or manipulate one of them using commands such as type. See below.

Linux, OS-X

In Linux and OS-X, I believe / is the only character of the printable ASCII set that is prohibited. Some characters (shell metacharacters like *?!) will cause problems in command lines and will require the filename to be appropriately quoted or escaped.

Linux filesystems such as ext2, ext3 are character-set agnostic (I think they just treat it more or less as a byte stream - only nulls and / are prohibited). This means you can store filenames in UTF-8 encoding. I believe it is up to the shell or other application to know what encoding to use to properly convert the filename for display or processing.
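
To illustrate the byte-stream point, here is a small Python sketch. It does not touch a real filesystem; it just shows that the same name becomes different bytes under different encodings, and that a reader has to guess the encoding to get the name back. The helper decode_name is hypothetical.

```python
def decode_name(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw filename bytes for display.

    Bytes that are invalid in the chosen encoding are replaced with
    U+FFFD instead of raising an error.
    """
    return raw.decode(encoding, errors="replace")


name = "caf\u00e9.txt"                  # "café.txt"
utf8_bytes = name.encode("utf-8")       # the é becomes two bytes
latin1_bytes = name.encode("latin-1")   # the é becomes one byte
```

Decoding utf8_bytes as UTF-8 gives the original name back, decoding it as Latin-1 gives mojibake, and decoding latin1_bytes as UTF-8 yields a replacement character where the é was.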

Conclusion

So you could probably safely use something like (if it weren't so hard to type)


Case-(in)sensitivity in Windows

C> dir /B
Ab
aB
аB

C> type Ab
b
b

C> type aB
b
b

C> type аB
unicode homograph

Note that we cannot type the contents of the second file; the Windows type command just returns the contents of Ab instead. The third file would also be distinct from aB on Linux.

(Windows 10 NTFS).
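
If you want to spot these situations before they bite, a sketch along these lines might help. Both function names are made up, str.casefold() is only an approximation of how Windows compares names, and I'm assuming the third file name starts with a Cyrillic "а" (U+0430), which is what the listing appears to show.

```python
import unicodedata
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by case (approximated with casefold)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


def non_ascii_chars(name):
    """List (character, Unicode name) pairs for every non-ASCII character."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN")) for ch in name if ord(ch) > 127
    ]


# The three names from the listing; "\u0430" is the assumed Cyrillic letter.
listing = ["Ab", "aB", "\u0430B"]
```

Here case_collisions(listing) reports Ab and aB as one colliding group, and non_ascii_chars on the third name reveals the look-alike letter by its Unicode name.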


You could:

  1. Replace the current underscores with # (the proofreader's symbol for a space).
  2. Use an underscore to separate ("section") the date from the filename (or a second hyphen, which is easier to type).

Alt-1. Initial caps can replace spaces: YYMMDD-HHMM-FileName.ext or YYMMDD-HHMM_FileName.ext

This uses minimal characters for a clear display, and it auto-sorts thanks to the padded zeroes for January to September (and for the 1st to 9th of each month).
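
A minimal Python sketch of the YYMMDD-HHMM-FileName.ext scheme, assuming you build the name from a title and a timestamp. The function stamped_name and its parameters are invented for the example.

```python
from datetime import datetime


def stamped_name(title: str, ext: str, when: datetime) -> str:
    """Build a YYMMDD-HHMM-FileName.ext style name.

    Spaces in the title are dropped and each word gets an initial capital.
    strftime-style codes zero-pad month, day, hour and minute.
    """
    words = title.split()
    initial_caps = "".join(word[:1].upper() + word[1:] for word in words)
    return f"{when:%y%m%d-%H%M}-{initial_caps}.{ext}"
```

For example, stamped_name("file name", "ext", datetime(2024, 3, 7, 9, 5)) gives "240307-0905-FileName.ext". Because every field is zero-padded, a plain alphabetical sort of such names is also chronological within a century.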
