Why is setting a variable before a command legal in bash?

Relevant information can be found on the man page provided by the Bash maintainer (last checked August 2020). The section "Shell Grammar", under "Simple Commands", states (emphasis added):

A simple command is a sequence of optional variable assignments followed by blank-separated words and redirections, and terminated by a control operator. The first word specifies the command to be executed, and is passed as argument zero. The remaining words are passed as arguments to the invoked command.

So you can pass any variable you'd like. Your echo example does not work because the variables are passed to the command, not set in the shell. The shell expands $x and $y before invoking the command. This works, for example:

$ x="once upon" y="a time" bash -c 'echo $x $y'
once upon a time

Variables defined this way become environment variables of the forked process; they are not set in the calling shell.
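As a minimal sketch of that scoping (the variable name greeting is made up for illustration), the following compares what the child process sees with what the calling shell sees afterwards:

```shell
# Start from a clean slate so an earlier definition cannot interfere.
unset greeting

# The assignment prefix places greeting in the child's environment only.
child_sees=$(greeting="hello" bash -c 'printf "%s" "$greeting"')

# Back in the calling shell, greeting was never set.
parent_sees=${greeting-unset}

echo "child:  $child_sees"    # child:  hello
echo "parent: $parent_sees"   # parent: unset
```

The ${greeting-unset} expansion yields the word unset only when the variable does not exist at all, which is what makes the difference visible.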

If you run

A="b" echo $A

then bash first expands $A into "" and then runs

A="b" echo

Here is the correct way:

x="once upon" y="a time" bash -c 'echo $x $y'

Notice the single quotes around the bash -c argument. With double quotes, the calling shell would expand $x and $y itself, and you would have the same problem as above.
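The effect of the quoting can be checked side by side. This is only an illustrative sketch; the value "set" and the brackets are there to make an empty expansion visible:

```shell
unset x   # make sure the calling shell has no x of its own

# Single quotes: the child bash receives the literal text $x and expands it.
single=$(x="set" bash -c 'echo "[$x]"')

# Double quotes: the calling shell expands $x (to nothing) before bash starts.
double=$(x="set" bash -c "echo \"[$x]\"")

echo "single quotes: $single"   # single quotes: [set]
echo "double quotes: $double"   # double quotes: []
```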

So your loop example is legal because the bash builtin read looks for IFS in its environment and finds the comma. The same rule explains why

for i in `TEST=test bash -c 'echo $TEST'`; do
  echo "TEST is $TEST and I is $i"
done

prints "TEST is  and I is test": TEST exists only in the environment of the inner bash, so it is empty in the loop body.

Lastly, as for syntax: a for loop expects a list of words after in, so I had to use backticks (command substitution) to turn a command's output into words. A while loop, by contrast, expects a command, such as IFS=, read xx yy zz.
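A small sketch of the while form, using made-up sample data in a here-document (the names xx, yy and zz come from the example above; rows and saved_ifs are just for illustration):

```shell
saved_ifs=$IFS   # remember the shell's own IFS for comparison
rows=""

# IFS=, applies only to each read invocation, not to the loop body.
while IFS=, read -r xx yy zz; do
  rows="${rows}${xx}-${yy}-${zz} "
done <<'EOF'
a,b,c
d,e,f
EOF

echo "$rows"
[ "$IFS" = "$saved_ifs" ] && echo "IFS unchanged in the shell"
```

Because the loop reads from a redirection rather than a pipe, it runs in the current shell, so rows is still set after the loop ends.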

From man bash:

[...] The environment for any simple command or function may be augmented temporarily by prefixing it with parameter assignments, as described above in PARAMETERS. These assignment statements affect only the environment seen by that command.

The variables are expanded before the variable assignment takes place. The reason is simple: var=x would work in either order, but var=$othervar would not, because the right-hand side has to be expanded before it can be assigned. In other words, your $x is needed before it is available. But that is not the main problem. The main problem is that the command line can be modified only by the shell's own environment, and the assignment never becomes part of the shell's environment.

You are mixing up two features: you want a command line replacement, but you put the variable definition into the command's environment. Command line replacements have to be made by the shell. The environment, by contrast, must be used explicitly by the called command. Whether and how this is done depends on the command.
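To illustrate the difference, here is a sketch with one command that ignores its environment and one that reads it explicitly. awk and its ENVIRON array are my choice for the example, not something taken from the question:

```shell
unset x   # the calling shell has no x of its own

# echo only prints its arguments, and the calling shell has already
# expanded $x to nothing before echo starts.
from_args=$(x="once upon" echo "[$x]")

# awk looks the variable up in its own environment via ENVIRON.
from_env=$(x="once upon" awk 'BEGIN { print "[" ENVIRON["x"] "]" }')

echo "echo saw: $from_args"   # echo saw: []
echo "awk saw:  $from_env"    # awk saw:  [once upon]
```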

The advantage of this usage is that you can set the environment for a subprocess without affecting the shell environment.

x="once upon" y="a time" bash -c 'echo $x $y'

works as you expect because in that case both features are combined: The command line replacement is not done by the calling shell but by the subprocess shell.