Why is it a bad idea to use plain oauth2 for authentication?

Note: If you are looking for something like OAuth2, but for authentication, you should use OpenID Connect instead.


OAuth2 is meant for a user to authorize an application to load the user's resources from some resource provider. In other words: OAuth2 is a mechanism for delegation of authorization. The protocol does not support authentication (although it is commonly misused for exactly that).

The security hole is in the assumption you make in the 5th bullet point.

You say:

Site A asks site B for Sally's user_id and logs her in with that ID

While in reality it should read:

Site A asks site B for the user_id from the user-data that the access_token grants access to.

Figure 1: OAuth flow for (confidential) clients.

If all goes as planned, the access_token is, indeed, from the user you redirected to B for authentication. But: there is no guarantee that is the case. In fact, any (malicious) website that the user has previously granted the right to access the user's data (using OAuth2 with B), can get a valid authorization_code from B and send it to you, in bullet point 3.

In other words, if I run a website which asks users for their permission to access their resources at B using OAuth2, I can impersonate all those users at all websites which misuse OAuth2 (with B as OAuth2 Authorization server) for authentication.

The 'problem' with OAuth2 is that the authorization_code is not generated for a specific client_id. So if you receive an authorization_code, you cannot be sure whether B issued it to you or to some other service. That is deemed acceptable for authorization, but it is absolutely unacceptable for authentication.

Update:
As to your comment:

(and I restrict A to only accept one from users it previously redirected to B, but this is still unauthenticated)

I believe that you are here adding an extra precaution, which is not mandatory in the OAuth protocol. As such, it cannot be relied upon.


As explained by Jacco, a naive implementation of authentication on top of oauth2 has several vulnerabilities, the most common of which is CSRF.

Given that there's a perfectly good authentication protocol available without all these pitfalls, it's not a good idea to roll your own.

OTOH, there's a lot to learn by doing it and understanding and fixing these issues.

TL;DR: don't use oauth2 for authentication unless you're doing it to learn why you shouldn't do it. Use OpenID Connect.

OAuth 2.0 Threat Model and Security Considerations

First and foremost, there's an extensive analysis of the threat model for oauth2 in RFC6819.

There are several possible "flows" in oauth2. The one I focused on for my project was the authorization_code flow.

Authorization "code"

Here's what RFC6819 has to say about it:

An authorization "code" represents the intermediate result of a successful end-user authorization process and is used by the client to obtain access and refresh tokens. Authorization "codes" are sent to the client's redirect URI instead of tokens for two purposes:

  1. Browser-based flows expose protocol parameters to potential attackers via URI query parameters (HTTP referrer), the browser cache, or log file entries, and could be replayed. In order to reduce this threat, short-lived authorization "codes" are passed instead of tokens and exchanged for tokens over a more secure direct connection between the client and the authorization server.

  2. It is much simpler to authenticate clients during the direct request between the client and the authorization server than in the context of the indirect authorization request. The latter would require digital signatures.

So authorization codes are more secure, yay!

authorization_code flow vulnerabilities are analyzed in section 4.4.1 of RFC6819.

This section covers a lot of ground. I'll just focus on a few of the threats.

CSRF

From section 4.4.1.8:

An attacker could authorize an authorization "code" to their own protected resources on an authorization server. He then aborts the redirect flow back to the client on his device and tricks the victim into executing the redirect back to the client. The client receives the redirect, fetches the token(s) from the authorization server, and associates the victim's client session with the resources accessible using the token.

Impact: The user accesses resources on behalf of the attacker. [...] For example, the user may upload private items to an attacker's resources

This is also covered in section 10.12 of RFC6749:

The client MUST implement CSRF protection for its redirection URI. This is typically accomplished by requiring any request sent to the redirection URI endpoint to include a value that binds the request to the user-agent's authenticated state (e.g., a hash of the session cookie used to authenticate the user-agent). The client SHOULD utilize the "state" request parameter to deliver this value to the authorization server when making an authorization request.

So in your redirect to the oauth2 provider you add a state parameter, which is simply a CSRF token (it should be unguessable, stored in a secure cookie, etc.). The oauth2 provider sends this token back along with the authorization_code when it redirects the user back, and you must check that it matches the value you stored before you exchange the code.
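As a rough illustration of the state check, here is a minimal Python sketch. The `session` dict stands in for whatever server-side session your framework provides, and the function names, endpoint URL and client id are hypothetical, not part of any particular library.

```python
import hmac
import secrets
from typing import Optional
from urllib.parse import urlencode


def build_authorization_url(session: dict, authorize_endpoint: str,
                            client_id: str, redirect_uri: str) -> str:
    """Create a fresh state value, remember it in the session, build the redirect URL."""
    state = secrets.token_urlsafe(32)  # 32 random bytes, unguessable
    session["oauth_state"] = state
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}"


def state_is_valid(session: dict, returned_state: Optional[str]) -> bool:
    """Compare the state echoed back on the redirect URI with the session's copy."""
    expected = session.pop("oauth_state", None)  # single use: removed on first check
    if expected is None or returned_state is None:
        return False
    # Constant-time comparison; encode so non-ASCII input cannot raise.
    return hmac.compare_digest(expected.encode(), returned_state.encode())
```

On the callback you would call `state_is_valid(session, request_state)` first and refuse to exchange the authorization_code if it returns `False`.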

The countermeasure for this attack has to be implemented by both the client and the authorization server, and can also be enforced by the authorization server.

The state parameter is also covered in this sec.SE question.

Code Substitution (OAuth Login)

This one (covered in section 4.4.1.13 of RFC6819) is specifically aimed at the authentication over oauth2 scenario.

Basically, an attacker obtains an authorization_code for the user through a malicious site (let's call it site C) and sends it to the legitimate site (which we're still calling site A), which exchanges it for an access_token that is then used to assert the user's identity through the resource server. This effectively lets the attacker log in as the user on site A.

This is the one mentioned by Jacco in his answer.

The countermeasure for this attack has to be implemented by the authorization server:

All clients must indicate their client ids with every request to exchange an authorization "code" for an access token. The authorization server must validate whether the particular authorization "code" has been issued to the particular client. If possible, the client shall be authenticated beforehand.
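To make that countermeasure concrete, here is a toy Python sketch of what the authorization server's check might look like. `CodeStore` and its methods are hypothetical names, the store is in-memory only, and client authentication is assumed to have happened before `redeem` is called.

```python
from dataclasses import dataclass


@dataclass
class IssuedCode:
    client_id: str  # the client the code was issued to
    user_id: str    # the end user who approved the request


class CodeStore:
    """Toy in-memory stand-in for the authorization server's code table."""

    def __init__(self) -> None:
        self._codes = {}  # maps code string -> IssuedCode

    def issue(self, code: str, client_id: str, user_id: str) -> None:
        self._codes[code] = IssuedCode(client_id=client_id, user_id=user_id)

    def redeem(self, code: str, client_id: str) -> str:
        """Return the user_id only if the code was issued to this client_id."""
        issued = self._codes.pop(code, None)  # a code is invalidated on first use
        if issued is None:
            raise PermissionError("unknown or already used code")
        if issued.client_id != client_id:
            raise PermissionError("code was issued to a different client")
        return issued.user_id
```

With this check in place, a code that site C obtained for Sally is rejected when site A presents it, which is exactly the substitution described above.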

Others

Believe it or not, the previous attacks and their countermeasures cover most of the threats to authentication when using the code flow.

There are lots of other threats and countermeasures, many of which should always be implemented:

From section 4.4.1.3:

  - Handle-based tokens must use high entropy
  - Authenticate the client; this adds another value that the attacker has to guess
  - Bind the authorization "code" to the redirect URI; this adds another value that the attacker has to guess
  - Use short expiry time for tokens

These should all be implemented by the authorization server.
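A minimal Python sketch of three of these measures on the authorization server side (high entropy, redirect URI binding, short expiry) could look like the following. The function names and the 60 second lifetime are assumptions chosen for illustration; client authentication is left out here.

```python
import secrets
import time
from typing import Optional

CODE_LIFETIME_SECONDS = 60  # assumption: pick a lifetime that fits your policy


def new_authorization_code(redirect_uri: str, now: Optional[float] = None) -> dict:
    """Issue a high-entropy code bound to the redirect URI, with a short expiry."""
    issued_at = time.time() if now is None else now
    return {
        "code": secrets.token_urlsafe(32),  # 32 random bytes from the OS CSPRNG
        "redirect_uri": redirect_uri,
        "expires_at": issued_at + CODE_LIFETIME_SECONDS,
    }


def code_is_redeemable(record: dict, redirect_uri: str,
                       now: Optional[float] = None) -> bool:
    """Accept the code only before expiry and for the redirect URI it was bound to."""
    current = time.time() if now is None else now
    return current < record["expires_at"] and record["redirect_uri"] == redirect_uri
```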

From section 4.4.1.4:

  - The authorization server should authenticate the client
  - The authorization server should validate the client's redirect URI against the pre-registered redirect URI

These should also be implemented by the authorization server.
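The redirect URI validation can be sketched in a few lines of Python. The registry contents and names below are made up for illustration, and exact string matching is one conservative way to do the comparison, not the only one a server might use.

```python
# Hypothetical registry: client_id -> set of redirect URIs registered in advance.
REGISTERED_REDIRECT_URIS = {
    "site-a": {"https://a.example/oauth/callback"},
}


def redirect_uri_is_registered(client_id: str, redirect_uri: str) -> bool:
    """Exact-match the requested redirect URI against the client's registration."""
    return redirect_uri in REGISTERED_REDIRECT_URIS.get(client_id, set())
```

Exact matching avoids the surprises that prefix or pattern matching can introduce, such as an attacker appending path segments to a registered URI.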

From sections 4.4.1.5, 4.4.1.6 and others:

the redirect URI of the client should point to an HTTPS protected endpoint

This one should be implemented by the client, and probably enforced by the authorization server.
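If the authorization server wants to enforce this at registration time, the check might look roughly like the Python sketch below. The function name is hypothetical, and a real server would likely apply further rules (for example for native apps or localhost during development).

```python
from urllib.parse import urlsplit


def is_https_redirect_uri(redirect_uri: str) -> bool:
    """Return True only for absolute https:// URIs that name a host."""
    try:
        parts = urlsplit(redirect_uri)
    except ValueError:  # raised for malformed input such as a bad IPv6 literal
        return False
    return parts.scheme == "https" and bool(parts.hostname)
```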

Then it's okay to use oauth2 for login?

Nope. Don't do it. Use OpenID Connect.

Remember the countermeasures from section 4.4.1.13? Well there was another one I didn't quote:

Clients should use an appropriate protocol, such as OpenID (cf. [OPENID]) or SAML (cf. [OASIS.sstc-saml-bindings-1.1]) to implement user login. Both support audience restrictions on clients.

There you go. Use that instead.
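The audience restriction is the important part: an OpenID Connect ID token names the client it was issued for in its `aud` claim, so a token minted for site C is useless at site A. Here is a minimal Python sketch of that check, assuming the token's signature has already been verified by a proper JOSE/JWT library and the claims are available as a dict. The function name is hypothetical, and a complete validation has more steps (nonce, for instance) than shown here.

```python
import time
from typing import Optional


def id_token_claims_are_for_us(claims: dict, expected_issuer: str,
                               our_client_id: str,
                               now: Optional[float] = None) -> bool:
    """Check issuer, audience and expiry on already signature-verified claims."""
    current = time.time() if now is None else now
    audience = claims.get("aud")
    # "aud" may be a single string or a list of strings.
    audiences = [audience] if isinstance(audience, str) else list(audience or [])
    if claims.get("iss") != expected_issuer:
        return False
    if our_client_id not in audiences:
        return False  # token was issued to some other client
    if "azp" in claims and claims["azp"] != our_client_id:
        return False
    exp = claims.get("exp")
    return isinstance(exp, (int, float)) and current < exp
```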

If you still want/need to authenticate against an oauth2 provider, first make sure your provider implements all the countermeasures mentioned above.

If it does then you may be able to pull it off. Test extensively and hire a security team to perform a full analysis of your solution.

Also, make sure all the provider's features that you rely on for security are documented in your provider's API, otherwise they might be removed without prior notice and you'll end up with a Very Broken™ product.

In my case:

- I was lucky enough that my provider implemented all of these countermeasures on their side.
- I'm not relying on this for authentication beyond an initial testing period of the app (it's not a required feature of my app, just a convenient placeholder pre-launch).

Also, I learned enough about oauth2 throughout this implementation to make it well worth it.

If you want to know more, read both RFC6819 and RFC6749. I also found this site very useful.