Why did my folder names end up like this, and how can I fix this using a script?

You can use the perl rename utility (aka prename or file-rename) to rename the directories.

NOTE: This is not to be confused with rename from util-linux, or any other version.

rename -n 's/([[:cntrl:]])/ord($1)/eg' run_*/

This uses perl's ord() function to replace each control-character in the filename with the ordinal number for that character, e.g. ^A becomes 1, ^B becomes 2, etc.

The -n option is for a dry-run to show what rename would do if you let it. Remove it (or replace it with -v for verbose output) to actually rename.

The e modifier in the s/LHS/RHS/eg operation causes perl to execute the RHS (the replacement) as perl code, and the $1 is the matched data (the control character) from the LHS.

If you want zero-padded numbers in the filenames, you could combine ord() with sprintf(). e.g.

$ rename -n 's/([[:cntrl:]])/sprintf("%02i",ord($1))/eg' run_*/ | sed -n l
rename(run_\001, run_01)$
rename(run_\002, run_02)$
rename(run_\003, run_03)$
rename(run_\004, run_04)$
rename(run_\005, run_05)$
rename(run_\006, run_06)$
rename(run_\a, run_07)$
rename(run_\b, run_08)$
rename(run_\t, run_09)$

The above examples work if and only if sp.run_number in your matlab script was in the range of 1..31 (or 127), so that it produced control-characters in the directory names.
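If the perl rename utility isn't available, roughly the same renaming can be sketched in plain shell. This is only a sketch under assumptions: the function name fix_run_dirs is made up for illustration, and it assumes every affected directory is named run_ followed by exactly one control character. It relies on printf's "'c" argument form, which prints the numeric value of the character c.

```bash
# Sketch: rename run_<control char> directories to run_<NN> without perl rename.
# fix_run_dirs is a hypothetical helper; $1 is the folder holding the run_* dirs.
fix_run_dirs() {
    (
        cd "$1" || exit 1
        for d in run_?/; do
            [ -d "$d" ] || continue        # the glob matched nothing
            old=${d%/}                     # drop the trailing slash
            byte=${old#run_}               # the single odd character
            case $byte in
                [[:cntrl:]]) ;;            # only touch control characters
                *) continue ;;
            esac
            # "'c" makes printf print the numeric value of character c
            new=$(LC_ALL=C printf 'run_%02d' "'$byte")
            if [ -e "$new" ]; then
                echo "skipping $new: already exists" >&2
                continue
            fi
            mv -- "$old" "$new"
        done
    )
}
```

Try it on a copy of the data first, e.g. fix_run_dirs /path/to/copy, since unlike rename -n there is no dry-run here.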

To deal with ANY 1-byte character (i.e. from 0..255), you'd use the following (the s modifier lets . match a newline, i.e. value 10, as well):

rename -n 's/run_(.)/sprintf("run_%03i",ord($1))/se' run_*/

If sp.run_number could be > 255, you'd have to use perl's unpack() function instead of ord(). I don't know exactly how matlab outputs an unconverted int in a string, so you'll have to experiment. See perldoc -f unpack for details.

e.g. the following will unpack both 8-bit and 16-bit unsigned values and zero-pad them to 5 digits wide:

 rename -n 's/run_(.*)/sprintf("run_%05i",unpack("SC",$1))/e' run_*/

And I guess for the sake of curiosity, how in the heck did this happen in the first place?

folder = [sp.saveLocation, 'run_', sp.run_number, '/'];

where sp.run_number was an integer. I forgot to convert it to a string, but for some reason running mkdir(folder); (in matlab) still succeeded.

So, it would appear that mkdir([...]) in Matlab concatenates the members of the array to build the filename as a string. But you gave it a number instead, and numbers are what the characters on a computer really are. So, when sp.run_number was 1, it gave you the character with value 1, and then the character with value 2, etc.

Those are control characters. They don't have printable symbols, and printing them on a terminal would have other consequences, so they're often represented by different sorts of escapes: \001 (octal), \x01 (hex) and ^A are all common representations for the character with value 1. The character with value zero is a bit different: it's the NUL byte that is used to mark the end of a string in C and in the Unix system calls.

If you went higher than 31, you'd start to see printable characters: 32 is space (not very visible though), 33 = !, 34 = ", etc.
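The number-to-character mapping can be reproduced in the shell without Matlab. The sketch below uses a made-up helper, raw_run_name, that prints run_ followed by the raw byte with the given value, which is roughly what the concatenation in the question produced. It assumes values 1..255; a zero byte cannot be stored in a shell string or a file name.

```bash
# Sketch: print "run_" followed by the raw byte with value $1 (1..255).
# raw_run_name is a hypothetical helper, not part of any tool.
raw_run_name() {
    local n=$1
    # the inner printf turns the number into three octal digits, e.g. 1 -> 001;
    # the outer printf then interprets \001 as "the byte with that value"
    printf "run_\\$(printf '%03o' "$n")"
}

raw_run_name 65; echo                  # value 65 is the letter A
raw_run_name 1 | LC_ALL=C sed -n l     # shows the control character as an escape
```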

So,

  • run_ run_^A/ run_^B/: The first run_ corresponds to the name with a zero byte; the string ends there. The others show that your shell likes to display control codes in caret notation, like ^A. The notation also hints at the fact that the character with numerical value 1 can be entered as Ctrl-A, though you need to tell the shell to interpret it not as a control character but as a literal; Ctrl-V Ctrl-A should do that, at least in Bash.

  • ls: run_ run_? run_? Here ls doesn't like to print unprintable characters on the terminal, so it replaces them with question marks.

  • rsync: run_\#003/ — that one's new to me, but the idea is the same, the backslash marks an escape, and the rest is the numerical value of the character. It seems to me that the number here is in octal, like in the more common \003.

  • using the command ls | LC_ALL=C sed -n l ... run_\006$ run_\a$ run_\b$ run_\t$: \a, \b and \t are C escapes for alarm (bell), backspace and tab, respectively. They have the numerical values 7, 8 and 9, so it should be clear why they come after \006. Using those C escapes is yet another way to mark the control characters. The trailing dollar signs mark the line ends.

As for cd, if my assumptions are right, cd run_ should go to the one directory without an odd trailing character, and cd run_? should give an error: the question mark is a glob character that matches any single character, so it expands to multiple matching filenames, but cd only expects one.

Which of these options is the correct representation of the folder?

All of them, in a sense...

In Bash, you can use the \000 and \x00 escapes inside $'...' quotes to represent the special characters, so $'run_\033' (octal) or $'run_\x1b' (hex) correspond to the directory with the character value 27 (which happens to be ESC). (I don't think Bash supports escapes with decimal numbers.)
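As a quick check that the octal and hex spellings really name the same directory, something like this can be run in Bash. It works in a scratch directory from mktemp, so nothing real is touched; the variable names are just for illustration.

```bash
# Two spellings of the same name: "run_" followed by the ESC byte (value 27).
octal=$'run_\033'
hex=$'run_\x1b'
[ "$octal" = "$hex" ] && echo "same name"

demo=$(mktemp -d)                           # scratch directory
mkdir "$demo/$octal"                        # create it using the octal form
(cd "$demo/$hex" && echo "cd worked")       # enter it using the hex form
rm -r "$demo"
```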

cas's answer has a script to rename those, so I won't go there.


Easiest would be to create the wrong filename and the correct filename in the same environment where the mishap happened, and then just move/rename the folders to the correct names.

To avoid collisions between existing names, it is better to use another destination folder.

./saveLocationA/wrongname1 -> ./saveLocationB/correctname1
./saveLocationA/wrongname2 -> ./saveLocationB/correctname2
./saveLocationA/wrongname3 -> ./saveLocationB/correctname3
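A shell version of that mapping might look like the sketch below. Everything here is an assumption made for illustration: the function name move_runs, its three arguments (source folder, destination folder, highest run number), and the premise that the wrong name is run_ followed by the single byte whose value is the run number. Value 47 is skipped because it is the / character, which cannot be part of a name.

```bash
# Sketch: move run_<raw byte n> from a source folder to run_<n> in another folder.
# move_runs and its arguments are hypothetical; adjust them to your layout.
move_runs() {
    local src=$1 dst=$2 max=$3 n=1 wrong
    mkdir -p "$dst" || return 1
    while [ "$n" -le "$max" ]; do
        if [ "$n" -ne 47 ]; then           # 47 is "/", never part of a name
            # rebuild the wrong name; the extra x keeps a trailing newline
            # (n=10) from being stripped by the command substitution
            wrong=$(printf "run_\\$(printf '%03o' "$n")"; printf x)
            wrong=${wrong%x}
            if [ -d "$src/$wrong" ] && [ ! -e "$dst/run_$n" ]; then
                mv -- "$src/$wrong" "$dst/run_$n"
            fi
        fi
        n=$((n + 1))
    done
}
```

With the folder names from the listing above it would be called as move_runs ./saveLocationA ./saveLocationB 31, ideally on a copy of the data first.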

If possible, I would prefer fixing the script and just running it again; fixing some weird bug post mortem probably costs more and can introduce new problems.

Good luck!