Why is this binary file transferred over "ssh -t" being changed?

TL;DR

Don't use -t. -t involves a pseudo-terminal on the remote host and should only be used to run visual applications from a terminal.

Explanation

The linefeed character (also known as newline or \n) is the one that, when sent to a terminal, tells the terminal to move its cursor down.

Yet, when you run seq 3 in a terminal, that is, when seq writes 1\n2\n3\n to something like /dev/pts/0, you don't see:

1
 2
  3

but

1
2
3

Why is that?

Actually, when seq 3 (or ssh host seq 3 for that matter) writes 1\n2\n3\n, the terminal sees 1\r\n2\r\n3\r\n. That is, the line-feeds have been translated to carriage-return (upon which terminals move their cursor back to the left of the screen) and line-feed.

That is done by the terminal device driver. More exactly, by the line-discipline of the terminal (or pseudo-terminal) device, a software module that resides in the kernel.
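The effect of that translation can be imitated in user space without touching any terminal settings. The following is only a sketch: simulate_onlcr is a hypothetical helper name, it uses awk rather than the kernel's line discipline, and it assumes newline-terminated text input.

```shell
# Hypothetical helper: imitate what the line discipline's onlcr setting
# does to output, by ending every line with CR LF instead of LF.
# Assumes text input whose last line is newline-terminated.
simulate_onlcr() {
    awk '{ printf "%s\r\n", $0 }'
}

# Dump the bytes before and after the simulated translation.
printf '1\n2\n3\n' | od -An -tx1
printf '1\n2\n3\n' | simulate_onlcr | od -An -tx1
```

The second dump should show a 0d byte in front of every 0a, which is what the terminal actually receives.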

You can control the behaviour of that line discipline with the stty command. The translation of LF -> CRLF is turned on with

stty onlcr

(which is generally enabled by default). You can turn it off with:

stty -onlcr

Or you can turn all output processing off with:

stty -opost

If you turn onlcr off and run seq 3, you'll then see:

$ stty -onlcr; seq 3
1
 2
  3

as expected.

Now, when you do:

seq 3 > some-file

seq is no longer writing to a terminal device; it's writing into a regular file, so no translation is done and some-file does contain 1\n2\n3\n. The translation is only done when writing to a terminal device, and it's only done for display.

Similarly, when you do:

ssh host seq 3

ssh is writing 1\n2\n3\n regardless of where ssh's output goes.

What actually happens is that the seq 3 command is run on host with its stdout redirected to a pipe. The ssh server on host reads the other end of the pipe and sends it over the encrypted channel to your ssh client and the ssh client writes it onto its stdout, in your case a pseudo-terminal device, where LFs are translated to CRLF for display.

Many interactive applications behave differently when their stdout is not a terminal. For instance, if you run:

ssh host vi

vi doesn't like it: it doesn't like its output going to a pipe. It thinks it's not talking to a device that can understand cursor positioning escape sequences, for instance.
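The same kind of check is available to shell scripts. As a minimal sketch (the function name describe_stdout is made up for illustration), the POSIX [ -t 1 ] test reports whether file descriptor 1 is a terminal:

```shell
# Hypothetical helper: report what kind of thing stdout is connected to.
# [ -t 1 ] succeeds only when file descriptor 1 (stdout) is a terminal.
describe_stdout() {
    if [ -t 1 ]; then
        echo "terminal"
    else
        echo "not a terminal"
    fi
}

describe_stdout          # answer depends on where the script's stdout goes
describe_stdout | cat    # stdout is a pipe here: prints "not a terminal"
```

The second call models the ssh host vi case, where the remote command's stdout is a pipe rather than a terminal device.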

So ssh has the -t option for that. With that option, the ssh server on host creates a pseudo-terminal device and makes that the stdout (and stdin, and stderr) of vi. What vi writes on that terminal device goes through that remote pseudo-terminal line discipline and is read by the ssh server and sent over the encrypted channel to the ssh client. It's the same as before except that instead of using a pipe, the ssh server uses a pseudo-terminal.

The other difference is that on the client side, the ssh client sets the terminal in raw mode. That means that no translation is done there (opost is disabled, along with other input-side processing). For instance, when you type Ctrl-C, instead of interrupting ssh, that ^C character is sent to the remote side, where the line discipline of the remote pseudo-terminal sends the interrupt to the remote command.

When you do:

ssh -t host seq 3

seq 3 writes 1\n2\n3\n to its stdout, which is a pseudo-terminal device. Because of onlcr, that gets translated on host to 1\r\n2\r\n3\r\n and sent to you over the encrypted channel. On your side there is no translation (onlcr disabled), so 1\r\n2\r\n3\r\n is displayed untouched (because of the raw mode) and correctly on the screen of your terminal emulator.

Now, if you do:

ssh -t host seq 3 > some-file

There's no difference from above. ssh will write the same thing: 1\r\n2\r\n3\r\n, but this time into some-file.

So basically all the LFs in the output of seq have been translated to CRLF in some-file.

It's the same if you do:

ssh -t host cat remote-file > local-file

All the LF characters (0x0a bytes) are being translated into CRLF (0x0d 0x0a).

That's probably the reason for the corruption in your file. In the case of the second smaller file, it just so happens that the file doesn't contain 0x0a bytes, so there is no corruption.
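If onlcr is the only translation at play, the damaged copy should be larger than the original by exactly one byte per 0x0a byte. A rough sketch for checking that hypothesis on a file follows; the function names count_lf and size_after_onlcr are hypothetical, and only tr and wc are used:

```shell
# Hypothetical helpers for estimating what LF -> CRLF translation does
# to a file's size. Both take a file name as their only argument.

# Number of LF (0x0a) bytes in the file. LC_ALL=C makes tr work on raw
# bytes, which matters for binary files in multibyte locales.
count_lf() {
    LC_ALL=C tr -cd '\n' < "$1" | wc -c
}

# Size the file would have if every LF became CR LF.
size_after_onlcr() {
    echo $(( $(wc -c < "$1") + $(count_lf "$1") ))
}

# Small demonstration on a throwaway file: 5 bytes, 2 of them LF.
tmp=$(mktemp)
printf 'a\nb\nc' > "$tmp"
size_after_onlcr "$tmp"    # prints 7
rm -f "$tmp"
```

A file for which count_lf reports 0 would come through unchanged, which matches the behaviour described for the smaller file.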

Note that you could get different types of corruption with different tty settings. Another potential type of corruption associated with -t is if your startup files on host (~/.bashrc, ~/.ssh/rc...) write things to their stderr, because with -t the stdout and stderr of the remote shell end up being merged into ssh's stdout (they both go to the pseudo-terminal device).

You don't want the remote cat to output to a terminal device there.

You want:

ssh host cat remote-file > local-file

You could do:

ssh -t host 'stty -opost; cat remote-file' > local-file

That would work (except in the writing to stderr corruption case discussed above), but even that would be sub-optimal as you'd have that unnecessary pseudo-terminal layer running on host.


Some more fun:

$ ssh localhost echo | od -tx1
0000000 0a
0000001

OK.

$ ssh -t localhost echo | od -tx1
0000000 0d 0a
0000002

LF translated to CRLF

$ ssh -t localhost 'stty -opost; echo' | od -tx1
0000000 0a
0000001

OK again.

$ ssh -t localhost 'stty olcuc; echo x'
X

That's another form of output post-processing that can be done by the terminal line discipline.

$ echo x | ssh -t localhost 'stty -opost; echo' | od -tx1
Pseudo-terminal will not be allocated because stdin is not a terminal.
stty: standard input: Inappropriate ioctl for device
0000000 0a
0000001

ssh refuses to tell the server to use a pseudo-terminal when its own input is not a terminal. You can force it with -tt though:

$ echo x | ssh -tt localhost 'stty -opost; echo' | od -tc
0000000   x  \r  \n  \n
0000004

The line discipline does a lot more on the input side.

Here, echo neither reads its input nor was asked to output that x\r\n\n, so where does that come from? That's the local echo of the remote pseudo-terminal (stty echo). The ssh server is feeding the x\n it read from the client to the master side of the remote pseudo-terminal, and that pseudo-terminal's line discipline echoes it back (before stty -opost is run, which is why we see a CRLF and not LF). That's independent of whether the remote application reads anything from stdin or not.

$ (sleep 1; printf '\03') | ssh -tt localhost 'trap "echo ouch" INT; sleep 2'
^Couch

The 0x3 character is echoed back as ^C (^ and C) because of stty echoctl, and the shell and sleep receive a SIGINT because of stty isig.

So while:

ssh -t host cat remote-file > local-file

is bad enough,

ssh -tt host 'cat > remote-file' < local-file

to transfer files the other way across is a lot worse. You'll get some CR -> LF translation, but also problems with all the special characters (^C, ^Z, ^D, ^?, ^S...), and the remote cat will not see EOF when the end of local-file is reached, only when ^D is sent after a \r, \n or another ^D, like when doing cat > file in your terminal.
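To get a feel for how likely a given file is to be affected, one can count the bytes that the input side of the line discipline commonly treats specially. This sketch assumes typical default assignments (^C, ^D, CR, ^Q, ^S, ^Z, ^?); the list is not exhaustive, the real set depends on the remote tty settings, and count_tty_special is a hypothetical name:

```shell
# Hypothetical helper: count bytes in a file that a tty's input
# processing commonly acts on instead of passing through unchanged.
# Octal codes: \003 ^C, \004 ^D, \015 CR, \021 ^Q, \023 ^S, \032 ^Z, \177 ^?
# This is an assumed, non-exhaustive set based on common defaults.
count_tty_special() {
    LC_ALL=C tr -cd '\003\004\015\021\023\032\177' < "$1" | wc -c
}

# Demonstration on a throwaway file containing one ^C and one ^D.
tmp=$(mktemp)
printf 'a\003b\004c\n' > "$tmp"
count_tty_special "$tmp"    # prints 2
rm -f "$tmp"
```

A nonzero count means the upload would almost certainly be mangled. A count of zero is no guarantee either, because of the EOF problem described above.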


When using that method to copy the file, the remote and local copies appear to be different.

Remote server

ls -l | grep vim_cfg
-rw-rw-r--.  1 slm slm 9783257 Aug  5 16:51 vim_cfg.tgz

Local server

Running your ssh ... cat command:

$ ssh dufresne -t 'cat ~/vim_cfg.tgz' > vim_cfg.tgz

Results in this file on the local server:

$ ls -l | grep vim_cfg.tgz 
-rw-rw-r--. 1 saml saml 9820481 Aug 24 12:13 vim_cfg.tgz

Investigating why?

Investigating the resulting file on the local side shows that it's been corrupted. If you take the -t switch out of your ssh command then it works as expected.

$ ssh dufresne 'cat ~/vim_cfg.tgz' > vim_cfg.tgz

$ ls -l | grep vim_cfg.tgz
-rw-rw-r--. 1 saml saml 9783257 Aug 24 12:17 vim_cfg.tgz

Checksums now work too:

# remote server
$ ssh dufresne "md5sum ~/vim_cfg.tgz"
9e70b036836dfdf2871e76b3636a72c6  /home/slm/vim_cfg.tgz

# local server
$ md5sum vim_cfg.tgz 
9e70b036836dfdf2871e76b3636a72c6  vim_cfg.tgz
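A byte-for-byte comparison is another way to confirm the copy. A minimal sketch, assuming POSIX cmp; stream_matches_file is a hypothetical helper that compares its standard input with a local file:

```shell
# Hypothetical helper: succeed only if stdin is byte-identical to the
# file named in $1. cmp -s is silent and reports the result through its
# exit status; "-" makes cmp read the first input from stdin.
stream_matches_file() {
    cmp -s - "$1"
}

# Demonstration on a throwaway file.
tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"
printf 'a\nb\n'     | stream_matches_file "$tmp" && echo "identical"
printf 'a\r\nb\r\n' | stream_matches_file "$tmp" || echo "differs"
rm -f "$tmp"
```

With the names used above, ssh dufresne 'cat ~/vim_cfg.tgz' | stream_matches_file vim_cfg.tgz would run the check against the remote file without needing a checksum tool on either side.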
