Apache Tomcat chokes after 300 connections

Solution 1:

Have you increased maxThreads in the AJP 1.3 Connector on port 8009?

Solution 2:

Consider putting an asynchronous proxying web server such as nginx or lighttpd in front of Apache. Apache serves content synchronously, so each worker stays blocked until the client has downloaded the generated content in full. An asynchronous (non-blocking) proxy usually improves the situation dramatically: with nginx as a frontend proxy, I have been able to lower the number of concurrently running Apache workers from 30 to 3-5.


Solution 3:

Judging from the logs you have shown, I suspect your problem is in Tomcat, not Apache. Getting 'error 110' (ETIMEDOUT, "connection timed out", on Linux) when connecting back into Tomcat indicates that the queue of connections waiting to be served is full: no more can fit into the backlog configured for Tomcat's listening socket.

From the listen manpage:
   The  backlog  parameter defines the maximum length the queue of pending 
   connections may grow to.  If a connection request arrives with
   the queue full the client may receive an error with an indication
   of ECONNREFUSED or, if the underlying protocol supports  
   retransmission, the request may be ignored so that retries succeed.
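
To see the effect the manpage describes, here is a minimal sketch in Python (not Tomcat itself). It opens a listening socket that never calls accept() and counts how many client connects complete. The function name `count_connects` and its parameters are made up for illustration, and the exact numbers depend on the operating system's backlog handling.

```python
import socket


def count_connects(backlog, attempts, timeout=1.0):
    """Connect `attempts` clients to a listener that never accepts.

    Returns (completed, failed). Once the pending-connection queue is
    full, further connects fail or time out instead of completing.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(backlog)         # same backlog parameter as listen(2)
    port = server.getsockname()[1]

    clients = []
    completed = 0
    failed = 0
    try:
        for _ in range(attempts):
            client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            client.settimeout(timeout)
            clients.append(client)  # keep it open so the queue stays occupied
            try:
                client.connect(("127.0.0.1", port))
                completed += 1
            except OSError:  # covers timeouts and refused connections
                failed += 1
    finally:
        for client in clients:
            client.close()
        server.close()
    return completed, failed
```

With a small backlog, I would expect the first few connects to complete and the later ones to time out rather than be refused on Linux, which would match what Apache reports as error 110.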

If I had to guess, I would suspect that the vast majority of HTTP requests are blocked waiting for something to come back from Tomcat while the server is "choking". I bet that if you fetched some static content that is served directly by Apache (rather than being proxied to Tomcat), it would work even while the server is 'choking'.

Unfortunately I am not familiar with Tomcat, but is there a way to adjust its concurrency settings instead?

Oh, and you should also consider the possibility that the external web service is what limits you to 300 connections. If so, no amount of concurrency tuning on your front side will make a difference, since practically every connection you serve relies on a response from that external service.

In one of your comments you mentioned that the data goes stale after 2 minutes. I'd suggest caching the response you get from this service for two minutes to reduce the number of concurrent connections you are driving to the external web service.
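
The idea can be sketched as a small time-based cache. This is in Python for brevity, not the asker's actual code: `TTLCache`, `get_or_fetch` and the `fetch` callback are hypothetical names, and a real Tomcat application would do the equivalent in Java.

```python
import threading
import time


class TTLCache:
    """Cache values for a fixed time so repeated requests reuse one response."""

    def __init__(self, ttl_seconds=120.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so expiry can be tested
        self._entries = {}     # key -> (expires_at, value)
        self._lock = threading.Lock()

    def get_or_fetch(self, key, fetch):
        """Return the cached value for `key`, calling `fetch()` only if stale."""
        # Simplification: the lock is held during fetch(), so only one
        # call to the external service is in flight at any time.
        with self._lock:
            now = self._clock()
            entry = self._entries.get(key)
            if entry is not None and entry[0] > now:
                return entry[1]
            value = fetch()
            self._entries[key] = (now + self._ttl, value)
            return value
```

Here `fetch` stands for whatever function calls the external web service. With a two-minute TTL, each distinct key causes at most one outbound call per two minutes, regardless of how many clients ask for it.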


Solution 4:

The first step in troubleshooting this is to enable Apache's mod_status and study its report. Until you've done that, you're walking blind. That's not righteous. ;-)
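
As a rough aid for reading that report, here is a sketch that parses the machine-readable output mod_status serves at `/server-status?auto` and tallies the scoreboard by worker state. It assumes you have already fetched the text yourself. The function names and the `SAMPLE` string are made up for illustration, and the sample is not output from a real server.

```python
from collections import Counter

# Scoreboard key, as listed on the mod_status page.
SCOREBOARD_KEY = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def parse_auto_status(text):
    """Turn 'Key: value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize_scoreboard(scoreboard):
    """Count how many worker slots are in each state."""
    return Counter(SCOREBOARD_KEY.get(ch, "unknown") for ch in scoreboard)


# Made-up sample for illustration only.
SAMPLE = "BusyWorkers: 3\nIdleWorkers: 2\nScoreboard: WWK__..\n"
```

Calling `summarize_scoreboard(parse_auto_status(SAMPLE)["Scoreboard"])` gives a per-state count. A scoreboard dominated by "sending reply" would be consistent with workers stuck waiting on the backend, though that is only one possible reading.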

The second thing to mention (I myself dislike being given answers to questions I wasn't asking, but ...) is using a more efficient, specialized front-end server like nginx.

Also, did you actually restart Apache, or just gracefully reload it? :)