What is the difference between authorized_keys and known_hosts file for SSH?

The known_hosts file lets the client authenticate the server, to check that it isn't connecting to an impersonator. The authorized_keys file lets the server authenticate the user.

Server authentication

One of the first things that happens when the SSH connection is being established is that the server sends its public key to the client, and proves (thanks to public-key cryptography) to the client that it knows the associated private key. This authenticates the server: if this part of the protocol is successful, the client knows that the server is who it claims it is.

The client may check that the server is a known one, and not some rogue server trying to pass itself off as the right one. SSH provides only a simple mechanism to verify the server's legitimacy: it remembers servers you've already connected to, in the ~/.ssh/known_hosts file on the client machine (there's also a system-wide file /etc/ssh/ssh_known_hosts). The first time you connect to a server, you need to check by some other means that the public key presented by the server is really the public key of the server you wanted to connect to. If you have the public key of the server you're about to connect to, you can add it to ~/.ssh/known_hosts on the client manually.
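Checking the key "by some other means" usually comes down to comparing a short fingerprint of the key against one obtained through a trusted channel. Here is a minimal sketch, assuming the OpenSSH-style SHA256 fingerprint (the SHA-256 digest of the decoded key blob, base64-encoded with the padding stripped). The function name and the key blob are made up for illustration; the blob is a placeholder, not a real host key.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_b64: str) -> str:
    """Return an OpenSSH-style SHA256 fingerprint for a base64 key blob."""
    blob = base64.b64decode(public_key_b64)
    digest = hashlib.sha256(blob).digest()
    # The digest is shown in base64 with the trailing '=' padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


# Hypothetical placeholder blob standing in for the second field of a public key line.
example_blob = base64.b64encode(b"example-host-key-bytes").decode("ascii")
print(sha256_fingerprint(example_blob))
```

If the fingerprint computed from the key the server presents equals the one the administrator gave you, the key can reasonably be added to known_hosts.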

By the way, known_hosts can contain any type of public key supported by the SSH implementation, for example RSA, ECDSA, or Ed25519, not just DSA.

Authenticating the server has to be done before you send any confidential data to it. In particular, if the user authentication involves a password, the password must not be sent to an unauthenticated server.

User authentication

The server only lets a remote user log in if that user can prove that they have the right to access that account. Depending on the server's configuration and the user's choice, the user may present one of several forms of credentials (the list below is not exhaustive).

The user may present the password for the account that they are trying to log into; the server then verifies that the password is correct.
The user may present a public key and prove that they possess the private key associated with that public key. This is exactly the same method that is used to authenticate the server, but now the user is trying to prove their identity and the server is verifying it. The login attempt is accepted if the user proves that they know the private key and the public key is in the account's authorization list (~/.ssh/authorized_keys on the server).
Another type of method involves delegating part of the work of authenticating the user to the client machine. This happens in controlled environments such as enterprises, when many machines share the same accounts. The server authenticates the client machine by the same mechanism that is used the other way round, then relies on the client to authenticate the user.

Those two files are both used by SSH but for completely different purposes, which could easily explain your confusion.

Authorized Keys

By default SSH uses user accounts and passwords that are managed by the host OS. (Well, actually managed by PAM but that distinction probably isn't too useful here.) What this means is that when you attempt to connect to SSH with the username 'bob' and some password the SSH server program will ask the OS "I got this guy named 'bob' who's telling me his password is 'wonka'. Can I let him in?" If the answer is yes, then SSH allows you to authenticate and you go on your merry way.

In addition to passwords SSH will also let you use what's called public-key cryptography to identify you. The specific encryption algorithm can vary, but is usually RSA or DSA, or more recently ECDSA. In any case when you set up your keys, using the ssh-keygen program, you create two files. One that is your private key and one that is your public key. The names are fairly self-explanatory. By design the public key can be strewn about like dandelion seeds in the wind without compromising you. The private key should always be kept in the strictest of confidence.

So what you do is place your public key in the authorized_keys file. Then when you attempt to connect to SSH with username 'bob' and your key pair, it will ask the OS "I got this guy named 'bob', can he be here?" If the answer is yes, then the server checks whether the public key you offered is listed in the authorized_keys file, and your client proves that it holds the matching private key by producing a signature with it. The private key itself never leaves your machine. If both checks pass, then you are allowed in.
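The lookup half of that check can be sketched as follows. This is only an illustration, assuming the basic line form "key-type base64-key comment"; lines that begin with options are not handled, and the function name and sample keys are invented. A real server must also verify a signature made with the matching private key before letting anyone in.

```python
def is_key_authorized(authorized_keys_text: str, key_type: str, key_b64: str) -> bool:
    """Return True if the offered public key appears in the authorized_keys text."""
    for line in authorized_keys_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Basic form only: "<key-type> <base64-key> [comment]".
        if len(fields) >= 2 and fields[0] == key_type and fields[1] == key_b64:
            return True
    return False


# Hypothetical file contents; the key strings are placeholders, not real keys.
sample = """# keys for bob
ssh-rsa AAAAplaceholder1 bob@laptop
ssh-ed25519 AAAAplaceholder2 bob@desktop
"""
print(is_key_authorized(sample, "ssh-ed25519", "AAAAplaceholder2"))
print(is_key_authorized(sample, "ssh-rsa", "AAAAunknown"))
```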

Known Hosts

Much like how the authorized_keys file is used to authenticate users, the known_hosts file is used to authenticate servers. Whenever SSH is configured on a new server it generates a public and private key for the server, just like you did for your user. Every time you connect to an SSH server, it shows you its public key, together with a proof that it possesses the corresponding private key. If you do not yet have its public key, your SSH client asks whether you want to trust it and, if you accept, adds it to the known_hosts file. If you have the key, and it matches, then you connect straight away. If the keys do not match, then you get a big nasty warning. This is where things get interesting. The three situations in which a key mismatch typically happens are:

The key changed on the server. This could be from reinstalling the OS or, on some OSes, from the key being regenerated when SSH is updated.
The hostname or IP address you are connecting to used to belong to a different server. This could be address reassignment, DHCP, or something similar.
A malicious man-in-the-middle attack is happening. This is the main threat that key checking is trying to protect you from.
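The client's decision boils down to three outcomes: unknown host, matching key, or mismatching key. Here is a rough sketch of that logic, assuming the known keys have already been loaded into a dictionary keyed by host name and key type. The names and key strings are hypothetical, and a real client handles more cases, such as hashed host names and several keys per host.

```python
from enum import Enum


class HostKeyStatus(Enum):
    UNKNOWN = "unknown"    # host not recorded yet: ask the user
    MATCH = "match"        # recorded key equals the presented key: proceed
    MISMATCH = "mismatch"  # recorded key differs: show the big warning


def check_host_key(known: dict, host: str, key_type: str, key_b64: str) -> HostKeyStatus:
    """Compare the key a server presents with the one recorded for that host."""
    recorded = known.get((host, key_type))
    if recorded is None:
        return HostKeyStatus.UNKNOWN
    return HostKeyStatus.MATCH if recorded == key_b64 else HostKeyStatus.MISMATCH


# Placeholder data, not real keys.
known = {("server.example.com", "ssh-ed25519"): "AAAAplaceholderkey"}
print(check_host_key(known, "server.example.com", "ssh-ed25519", "AAAAplaceholderkey"))
print(check_host_key(known, "server.example.com", "ssh-ed25519", "AAAAsomethingelse"))
print(check_host_key(known, "new.example.com", "ssh-ed25519", "AAAAplaceholderkey"))
```

Note that the code cannot tell which of the three mismatch causes applies; that is why a mismatch has to be investigated by a human.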

In both cases, known_hosts and authorized_keys, the SSH program is using public key cryptography in order to identify either the client or the server.

About Secure Files Containing Public Keys

To help you understand how "known_hosts" and "authorized_keys" are different, here is some context explaining how those files fit into "ssh". This is an over-simplification; there are lots more capabilities and complications to "ssh" than are mentioned here.

Associations are in Trusted Sources

While it has been said that public-key values "can be safely strewn about like seeds in the wind," keep in mind that it's the gardener, not the seed-pod, who decides which seeds get established in the garden. Although a public-key is not secret, fierce protection is required to preserve the trusted association of the key with the thing that the key is authenticating. The places entrusted to make this association include "known_hosts", "authorized_keys", and "Certificate Authority" listings.

The Trusted Sources Used by "ssh"

For a public-key to be relevant to "ssh," the key must be registered ahead of time, and stored in the appropriate secure file. (This general truth has one important exception, which will be discussed later.) The server and client each have their own, securely stored list of public-keys; a login will succeed only if each side is registered with the other.

"known_hosts" resides on the client
"authorized_keys" resides on the server

The client's secure file is called "known_hosts", and the server's secure file is called "authorized_keys". These files are similar in that each has text with one public-key per line, but they have subtle differences in format and usage.

Key-pairs are Used for Authentication

A public-private key pair is used to perform "asymmetric cryptography." The "ssh" program can use asymmetric cryptography for authentication, where an entity has to answer a challenge to prove its identity. The challenge is answered with the private key (in the current protocol, by producing a digital signature over the challenge data), and the answer is checked with the public key. (Note that asymmetric cryptography is used only during connection setup and login; then "ssh" switches to symmetric encryption to handle the data stream. SSH is its own protocol and does not run over TLS/SSL.)
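To make the idea of answering a challenge with one key and checking it with the other concrete, here is a toy illustration using textbook RSA with tiny numbers. It is deliberately insecure and is not how SSH computes anything in practice: real keys are far larger and real signatures involve hashing and padding. The parameters and function names are chosen purely for the example.

```python
# Textbook toy parameters: p = 61, q = 53, so the modulus is n = p * q = 3233.
N = 3233
E = 17    # public exponent, known to everyone
D = 2753  # private exponent, known only to the key owner; (E * D) % 780 == 1


def sign(challenge: int) -> int:
    """Answer a challenge using the private key (toy RSA, no padding)."""
    return pow(challenge, D, N)


def verify(challenge: int, signature: int) -> bool:
    """Check the answer using only the public key."""
    return pow(signature, E, N) == challenge


challenge = 1234  # any integer in the range 0 .. N-1
answer = sign(challenge)
print(verify(challenge, answer))      # the genuine key holder passes
print(verify(challenge + 1, answer))  # an answer to a different challenge fails
```

The verifier never needs D, which is why the public half can be handed out freely.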

One Key-pair for Server, Another for Client

In "ssh", both sides (client and server) are suspicious of the other; this is an improvement over the predecessor to "ssh," which was "telnet". With "telnet", the client was required to provide a password, but the server was not vetted. The lack of vetting allowed "man-in-the-middle" attacks to occur, with catastrophic consequences to security. By contrast, in the "ssh" process, the client surrenders no information until the server first answers a challenge.

The Steps in "ssh" Authentication

Before sharing any login information, the "ssh" client first eliminates the opportunity for a man-in-the-middle attack by challenging the server to prove "Are you really who I think you are?" To make this challenge, the client needs to know the public-key that is associated with the target server. The client must find the server's name in the "known_hosts" file; the associated public-key is on the same line, after the server name. The association between server-name and public-key must be kept inviolate; therefore the "known_hosts" file must be writable only by its owner (for example mode 600 or 644). Since it holds only public keys, keeping others from reading it is optional.
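The requirement that nobody else can write to the file is easy to check programmatically. Here is a small sketch, assuming a POSIX system with traditional permission bits; the function name is made up, and ACLs or the permissions of the enclosing directory are not considered.

```python
import os
import stat


def writable_by_others(path: str) -> bool:
    """Return True if group or other users have write permission on the file."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    known_hosts = os.path.expanduser("~/.ssh/known_hosts")
    if os.path.exists(known_hosts):
        print(known_hosts, "writable by others:", writable_by_others(known_hosts))
```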

Once the server has been authenticated, it gets a chance to challenge the client. The authentication will involve one of the public-keys found in the "authorized_keys". (When none of those keys works, the "sshd" process may fall back on password-style authentication, if the server's configuration allows it.)

The File Formats

So for "ssh", as with any login process, there are lists of "friends", and only those on the list are allowed to attempt to pass a challenge. For the client, the "known_hosts" file is a list of friends who can act as servers (hosts); these are listed by name. For the server, the equivalent list of friends is the "authorized_keys" file; but there are no names in that file, since the public-keys themselves act like identifiers. (The server doesn't care where the login is coming from, but only where it's going. The client is attempting to access a particular account; the account name was specified as a parameter when "ssh" was invoked. Remember that the "authorized_keys" file is specific to that account, since the file is under that account's home directory.)

Although there are many capabilities that can be expressed in a configuration entry, the basic, most common usage has the following parameters. Note that parameters are separated by space characters.

For "known_hosts":

{server-id} ssh-rsa {public-key-string} {comment}

For "authorized_keys":

ssh-rsa {public-key-string} {comment}

Note that the token ssh-rsa indicates that the key algorithm is RSA. Other key types have their own tokens, for example ssh-dss for DSA, ecdsa-sha2-nistp256 for ECDSA, and ssh-ed25519 for Ed25519. Therefore, a different token might take the place of the ssh-rsa shown here.
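The two basic line formats can be parsed with a few lines of code. This is a sketch under stated assumptions: it covers only the simple space-separated form shown above, not hashed host names, marker prefixes, or option lists in "authorized_keys". All names and sample values are hypothetical.

```python
from typing import List, NamedTuple


class KnownHostsEntry(NamedTuple):
    hosts: List[str]  # the server-id field may list several comma-separated names
    key_type: str
    key_b64: str
    comment: str


class AuthorizedKeysEntry(NamedTuple):
    key_type: str
    key_b64: str
    comment: str


def parse_known_hosts_line(line: str) -> KnownHostsEntry:
    """Parse '{server-id} {key-type} {public-key-string} {comment}'."""
    fields = line.strip().split(None, 3)
    if len(fields) < 3:
        raise ValueError("expected at least server-id, key type and key")
    comment = fields[3] if len(fields) > 3 else ""
    return KnownHostsEntry(fields[0].split(","), fields[1], fields[2], comment)


def parse_authorized_keys_line(line: str) -> AuthorizedKeysEntry:
    """Parse '{key-type} {public-key-string} {comment}' (no options)."""
    fields = line.strip().split(None, 2)
    if len(fields) < 2:
        raise ValueError("expected at least key type and key")
    comment = fields[2] if len(fields) > 2 else ""
    return AuthorizedKeysEntry(fields[0], fields[1], comment)


# Placeholder lines; the key strings are not real keys.
print(parse_known_hosts_line("server.example.com,192.0.2.10 ssh-rsa AAAAplaceholder host key"))
print(parse_authorized_keys_line("ssh-rsa AAAAplaceholder bob@laptop"))
```

The only structural difference is the leading server-id field, which matches the point made earlier: "known_hosts" ties a key to a name, while "authorized_keys" lists keys on their own.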

Let "ssh" Auto-Configure the "known_hosts" Entry

In both cases, if the public key is not found within a secure file, then asymmetric authentication does not happen. As mentioned earlier, there is one exception to this rule. A user is allowed to knowingly choose to risk the possibility of a man-in-the-middle attack by logging into a server that is not listed in the user's "known_hosts" file. The "ssh" program warns the user, but if the user chooses to go forward, the "ssh" client allows it "just this once." To assure it happens just once, the "ssh" process automatically configures the "known_hosts" file with the required information by asking the server for the public-key, and then writing that into the "known_hosts" file. This exception totally subverts security by allowing the adversary to provide the association of a server-name with a public-key. This security risk is allowed because it makes things so much easier for so many people. Of course, the correct and secure method would have been for the user to manually insert a line with server-name and public-key into the "known_hosts" file before ever attempting to log in to the server. (But for low-risk situations, the extra work might be pointless.)

The One-to-Many Relationships

An entry in the client's "known_hosts" file has the name of a server and a public-key that is applicable to the server machine. The server has a single private-key (one per key algorithm) that is used to answer all challenges, and the client's "known_hosts" entry must have the matching public-key. Therefore, all clients that ever access a particular server will have the identical public-key entry in their "known_hosts" file. The 1:N relation is that a server's public-key can appear in many clients' "known_hosts" files.

An entry in the "authorized_keys" file identifies that a friendly client is allowed to access the account. The friend might use the same public-private key pair to access multiple, different servers. This allows a single key-pair to authenticate to all servers ever contacted. Each of the targeted server accounts would have the identical public-key entry in their "authorized_keys" files. The 1:N relation is that one client's public-key can appear in the "authorized_keys" files for multiple accounts on multiple servers.

Sometimes, users who work from multiple client machines will replicate the same key pair; typically this is done when a user works on a desktop and a laptop. Because the client machines authenticate with identical keys, they will match the same entry in the server's "authorized_keys".

Location of Private Keys

For the server side, a system process, or daemon, handles all incoming "ssh" login requests. The daemon is named "sshd". The location of the server's private key (the host key) depends upon the SSH installation, not on OpenSSL; with OpenSSH it is typically under /etc/ssh, in files with names like ssh_host_rsa_key, with the matching public key alongside in ssh_host_rsa_key.pub.

For the client side, you invoke "ssh" (or "scp") when you need it. Your command line will include various parameters, one of which may optionally specify which private key to use. By default, the client side key-pair is often stored in $HOME/.ssh/id_rsa and $HOME/.ssh/id_rsa.pub.

Summary

Bottom line is that both "known_hosts" and "authorized_keys" contain public keys, but ...

known_hosts -- the client checks if host is genuine
authorized_keys -- the host checks whether client login is allowed