Nginx - What does the nodelay option do when limiting requests?

Solution 1:

TL;DR: The nodelay option is useful if you want to impose a rate limit without constraining the allowed spacing between requests.

I had a hard time digesting the other answers, and then I discovered new documentation from Nginx, with examples, that answers this.

Here's the pertinent part. Given:

limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

location /login/ {
  limit_req zone=mylimit burst=20;
}

The burst parameter defines how many requests a client can make in excess of the rate specified by the zone (with our sample mylimit zone, the rate limit is 10 requests per second, or 1 every 100 milliseconds). A request that arrives sooner than 100 milliseconds after the previous one is put in a queue, and here we are setting the queue size to 20.

That means if 21 requests arrive from a given IP address simultaneously, NGINX forwards the first one to the upstream server group immediately and puts the remaining 20 in the queue. It then forwards a queued request every 100 milliseconds, and returns 503 to the client only if an incoming request makes the number of queued requests go over 20.
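To make the timing in that example concrete, here is a minimal Python sketch of the queueing behaviour. It is a simplified model under my own assumptions, not NGINX's actual code: the function name forwarding_schedule and its parameters are made up for illustration, and times are whole milliseconds.

```python
def forwarding_schedule(arrivals_ms, rate_per_s, burst):
    """Return one entry per arrival: the forwarding time in ms, or None if rejected.

    Simplified model (an assumption, not NGINX's code): forwarded requests are
    spaced at least 1000/rate ms apart, and at most `burst` requests may wait.
    """
    interval_ms = 1000 // rate_per_s
    next_slot_ms = 0  # earliest time the next request may be forwarded
    schedule = []
    for now in sorted(arrivals_ms):
        forward_at = max(now, next_slot_ms)
        # Number of rate intervals this request has to wait, rounded up.
        waiting = -(-(forward_at - now) // interval_ms)
        if waiting > burst:
            schedule.append(None)  # queue is full: rejected (503 by default)
            continue
        schedule.append(forward_at)
        next_slot_ms = forward_at + interval_ms
    return schedule


# 22 simultaneous requests against rate=10r/s burst=20
schedule = forwarding_schedule([0] * 22, rate_per_s=10, burst=20)
print(schedule[:3], schedule[20], schedule[21])  # prints: [0, 100, 200] 2000 None
```

In this model the first request is forwarded at once, the next 20 leave the queue 100 ms apart (the last one after 2 seconds), and the 22nd is rejected.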

If you add nodelay:

location /login/ {
  limit_req zone=mylimit burst=20 nodelay;
}

With the nodelay parameter, NGINX still allocates slots in the queue according to the burst parameter and imposes the configured rate limit, but not by spacing out the forwarding of queued requests. Instead, when a request arrives “too soon”, NGINX forwards it immediately as long as there is a slot available for it in the queue. It marks that slot as “taken” and does not free it for use by another request until the appropriate time has passed (in our example, after 100 milliseconds).
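The slot bookkeeping described there can be sketched in Python as follows. Again this is only an illustrative model under my own assumptions, not NGINX's implementation; nodelay_decisions and its variable names are hypothetical.

```python
def nodelay_decisions(arrivals_ms, rate_per_s, burst):
    """Return True (forwarded immediately) or False (rejected) for each arrival.

    Simplified model of burst + nodelay (an assumption, not NGINX's code).
    """
    interval_ms = 1000 // rate_per_s
    rate_spent_until = 0  # the rate budget is used up until this time
    taken = []            # release times of the burst slots marked "taken"
    decisions = []
    for now in sorted(arrivals_ms):
        taken = [t for t in taken if t > now]  # free slots whose time has passed
        if now >= rate_spent_until:
            # The request is on time: no slot is needed.
            rate_spent_until = now + interval_ms
            decisions.append(True)
        elif len(taken) < burst:
            # "Too soon", but a slot is available: forward now, hold the slot.
            taken.append(rate_spent_until)
            rate_spent_until += interval_ms
            decisions.append(True)
        else:
            decisions.append(False)  # no free slot: rejected (503 by default)
    return decisions


# 22 requests at t=0, two at t=100 ms, one at t=200 ms
decisions = nodelay_decisions([0] * 22 + [100, 100, 200], rate_per_s=10, burst=20)
print(decisions[:21].count(True), decisions[21:])
# prints: 21 [False, True, False, True]
```

All 21 requests that fit (one on time plus 20 slots) are forwarded immediately, and after that only one extra request gets through per 100 ms, because that is how fast the slots free up.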

Solution 2:

The documentation here has an explanation that sounds like what you want to know:

The directive specifies the zone (zone) and the maximum possible bursts of requests (burst). If the rate exceeds the demands outlined in the zone, the request is delayed, so that queries are processed at a given speed

From what I understand, requests over the rate (up to the burst size) are delayed: they take more time and wait until they can be served. With the nodelay option the delay is not applied: requests within the burst are served immediately, and requests beyond the burst are denied with a 503 error.

This blog post gives a good explanation of how rate limiting works in nginx:

If you’re like me, you’re probably wondering what the heck burst really means. Here is the trick: replace the word ‘burst’ with ‘bucket’, and assume that every user is given a bucket with 5 tokens. Every time that they exceed the rate of 1 request per second, they have to pay a token. Once they’ve spent all of their tokens, they are given an HTTP 503 error message, which has essentially become the standard for ‘back off, man!’.

Solution 3:

The way I see it is as follows:

  1. Requests will be served as fast as possible until the zone rate is exceeded. The zone rate is "on average", so with a rate of 1r/s and a burst of 10 you can have 10 requests in a 10-second window.

  2. After the zone rate is exceeded:

    a. Without nodelay, further requests up to burst will be delayed.

    b. With nodelay, further requests up to burst will be served as fast as possible.

  3. After the burst is exceeded, the server returns an error response until a burst slot frees up. E.g. for rate 1r/s and burst 10, the client has to wait about 1 second (1/rate) for the next accepted request, and up to 10 seconds for the whole burst allowance to become available again.
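The recovery behaviour in step 3 can be checked with a small Python model. This is a sketch loosely based on leaky-bucket accounting (counting the "excess" in thousandths of a request); make_limiter and accept are names I made up, and the real module may differ in details.

```python
def make_limiter(rate_per_s, burst):
    """Leaky-bucket accounting in thousandths of a request (a simplified model)."""
    state = {"excess": 0, "last_ms": None}

    def accept(now_ms):
        if state["last_ms"] is None:
            excess = 0  # the very first request starts with an empty bucket
        else:
            elapsed_ms = now_ms - state["last_ms"]
            # Drain at the configured rate, then add this request.
            excess = max(0, state["excess"] - rate_per_s * elapsed_ms + 1000)
        if excess > burst * 1000:
            return False  # rejected; the bucket state is left untouched
        state["excess"], state["last_ms"] = excess, now_ms
        return True

    return accept


accept = make_limiter(rate_per_s=1, burst=10)
served_at_zero = sum(accept(0) for _ in range(15))            # 15 requests at t=0
next_ok_ms = next(t for t in range(250, 20001, 250) if accept(t))  # probe every 250 ms
served_after_idle = sum(accept(12000) for _ in range(15))     # after ~11 s of silence
print(served_at_zero, next_ok_ms, served_after_idle)  # prints: 11 1000 11
```

In this model a single request is accepted again after 1/rate seconds, while the full burst only becomes available again after roughly burst/rate seconds without traffic. Whether the accepted requests are delayed or served at once depends on nodelay; the accept/reject accounting is the same.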

Solution 4:

The setting defines whether requests are delayed so that they conform to the desired rate or are simply rejected. Put another way, it decides whether the rate limiting is managed by the server or the responsibility is passed to the client.

nodelay present

Requests will be handled as quickly as possible; any requests beyond the specified limit (rate plus burst) will be rejected with the code set by limit_req_status.

nodelay absent (aka delayed)

Requests will be handled at a rate that conforms to the specified limit. For example, if the rate is set to 10 req/s, requests are handled at least 0.1 seconds (1/rate) apart, so the rate is never exceeded but requests can back up. If enough requests back up to overflow the bucket (which a concurrent connection limit would also prevent), they are rejected with the code set by limit_req_status.

The gory details are in the source of the limit_req module, where that logic kicks in when the limit has not yet been passed and the delay is optionally going to be applied to the request. The nodelay parameter from the directive comes into play by causing the value of delay to be 0, which triggers the handler to immediately return NGX_DECLINED, passing the request to the next handler (rather than NGX_AGAIN, which effectively requeues the request to be processed again).
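As a rough paraphrase of that decision in Python (not the actual C source): the function limit_req_outcome and its parameters are hypothetical, and the NGX_* constants are plain strings here rather than NGINX's real integer codes.

```python
NGX_DECLINED = "NGX_DECLINED"  # pass the request on to the next handler now
NGX_AGAIN = "NGX_AGAIN"        # hold the request and revisit it after the delay


def limit_req_outcome(excess, rate_per_s, burst, nodelay, limit_req_status=503):
    """Return (result, delay_ms) for a request that is `excess` requests over the rate.

    Illustrative paraphrase only; the real handler works on internal state.
    """
    if excess > burst:
        return limit_req_status, 0  # over the burst: rejected outright
    # nodelay forces the delay to 0; otherwise wait for the rate to catch up.
    delay_ms = 0 if nodelay else excess * 1000 // rate_per_s
    if delay_ms == 0:
        return NGX_DECLINED, 0
    return NGX_AGAIN, delay_ms


print(limit_req_outcome(3, rate_per_s=10, burst=20, nodelay=False))  # ('NGX_AGAIN', 300)
print(limit_req_outcome(3, rate_per_s=10, burst=20, nodelay=True))   # ('NGX_DECLINED', 0)
print(limit_req_outcome(21, rate_per_s=10, burst=20, nodelay=True))  # (503, 0)
```

The only difference nodelay makes in this sketch is the delay value; the rejection check against burst is the same in both modes.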