How do ssh remote command line arguments get parsed?

There is always a remote shell. In the SSH protocol, the client sends the server a string to execute. The SSH command line client takes its command line arguments and concatenates them with a space between the arguments. The server takes that string, runs the user's login shell and passes it that string. (More precisely: the server runs the program that is registered as the user's shell in the user database, passing it two command line arguments: -c and the string sent by the client. The shell is not invoked as a login shell: the server does not set the zeroth argument to a string beginning with -.)
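
The joining step can be reproduced locally without any SSH connection. What follows is only a minimal sketch, not the real client or server code: simulate_ssh is a hypothetical helper that joins its arguments with single spaces, as the client does, and then passes the resulting string to sh -c, which stands in for the server running the user's shell.

```shell
# simulate_ssh is a hypothetical helper for illustration; it is not part of OpenSSH.
simulate_ssh() {
    # "$*" joins all arguments with a space (the first character of the default IFS),
    # so the boundaries between the original arguments are lost.
    remote_string="$*"
    # Stand-in for the server side: run a shell with -c and that single string.
    sh -c "$remote_string"
}

simulate_ssh echo 'a   b'     # the shell receives: echo a   b     and prints: a b
simulate_ssh 'echo "a   b"'   # the shell receives: echo "a   b"   and prints: a   b
```

The spaces in the first call survive the local shell but are split again by the second shell, which is why anything that must reach the remote program intact needs a second level of quoting.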

It is impossible to bypass the remote shell. The protocol has no way to send an array of strings that the server could use directly as an argv array. And the SSH server will not bypass the remote shell, because the shell may itself be a security restriction: using a restricted program as the user's shell is a way to provide a restricted account that is only allowed to run certain commands (e.g. an rsync-only account or a git-only account).

You may not see the shell in pstree because it may already be gone. Many shells have an optimization: when they detect that they are about to “run this external command, wait for it to complete, and exit with the command's status”, they “execve this external command” instead. This is what's happening in your first example. Contrast the following three commands:

ssh otherhost pstree -a -p
ssh otherhost 'pstree -a -p'
ssh otherhost 'pstree -a -p; true'

The first two are identical: the client sends exactly the same data to the server. The third one sends a shell command which defeats the shell's exec optimization.
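
The same optimization can be observed locally. This is a rough sketch that assumes a Linux system with bash and a mounted /proc, where /proc/PID/comm holds the name of the program currently running as PID; other shells, or other bash versions, may behave differently.

```shell
# Linux-only sketch: bash expands $$ to its own PID before running cat,
# so cat reports the name of whichever program owns that PID at that moment.

# Single command: bash typically replaces itself with cat via exec,
# so the PID that bash saw as $$ now belongs to cat.
bash -c 'cat /proc/$$/comm'

# Two commands: bash must stay alive to run "true" afterwards,
# so it forks cat as a child and the PID still belongs to bash.
bash -c 'cat /proc/$$/comm; true'
```

With a bash that performs the optimization, the first command reports cat and the second reports bash.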


I think I figured it out:

$ ssh otherhost pstree -a -p -s '$$'
init,1         
  `-sshd,3736
      `-sshd,11998
          `-sshd,12000
              `-pstree,12001 -a -p -s 12001

The pstree options are: -a shows command line arguments, -p shows PIDs, and -s shows just the parent processes of the given PID. The '$$' is a special shell parameter that bash replaces with its own PID when it evaluates the command line. It is quoted once so that my local shell does not expand it, but it is not quoted or escaped a second time, so the remote shell still expands it.

As we can see, it is replaced with 12001, so that's the PID of the shell. But the output line pstree,12001 also shows that the process with PID 12001 is pstree itself. So pstree is the shell?

What seems to be going on is that bash is invoked and parses the command line, but then it calls exec to replace itself with the command being run.

It seems that it only does this in the case of a single remote command:

$ ssh otherhost pstree -a -p -s '$$' \; echo hi
init,1         
  `-sshd,3736
      `-sshd,17687
          `-sshd,17690
              `-bash,17691 -c pstree -a -p -s $$ ; echo hi
                  `-pstree,17692 -a -p -s 17691
hi

In this case, I'm requesting two commands be run: pstree followed by echo. And we can see here that bash does in fact show up in the process tree as a parent of pstree.
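
The quoting used in the two commands above can be checked locally before involving a remote host. This is a small sketch under the assumption that the client simply joins its arguments with spaces; show_remote_string is a hypothetical helper that prints the string the remote shell would receive.

```shell
# show_remote_string is a hypothetical helper: it prints the single string
# that the ssh client would send, i.e. its arguments joined by spaces.
show_remote_string() {
    printf '%s\n' "$*"
}

# Quoted once: the local shell leaves $$ alone, the remote shell expands it.
show_remote_string pstree -a -p -s '$$'               # pstree -a -p -s $$

# \; reaches the remote shell as a plain ; and separates two commands there.
show_remote_string pstree -a -p -s '$$' \; echo hi    # pstree -a -p -s $$ ; echo hi

# Quoted twice: the remote shell sees '$$' and would pass a literal $$ to pstree.
show_remote_string pstree -a -p -s "'\$\$'"           # pstree -a -p -s '$$'
```

The second line of output matches the string that appears after bash -c in the process tree above.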