How to Handle Sudden Burst in New HTTPS Connections?


Solution 1:

Thank you @MichaelHampton for your help.

I found a solution for my problem, and hopefully it may help others (particularly if you are using Java).

I have heard many suggestions to simply increase nofile limits to allow more connections, but I'd like to start by reiterating that the problem is not that the server can't hold more connections; it's that it can't establish connections quickly enough, and is dropping them.

My first attempt to solve this problem was to increase the connection queue through net.ipv4.tcp_max_syn_backlog, net.core.somaxconn, and again in the application server's config where appropriate. For Vertx this is server.setAcceptBacklog(...);. This resulted in more connections being accepted into the queue, but it didn't make establishing the connections any faster. From a connecting client's point of view, connections were no longer reset due to queue overflow; establishing them just took much longer. For this reason, increasing the connection queue wasn't a real solution and just traded one problem for another.
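For reference, the kernel-side backlog tuning described above can be sketched like this (the values are illustrative assumptions, not tuned recommendations):

```shell
# Raise the kernel's SYN backlog and accept-queue limits (example values)
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
sysctl -w net.core.somaxconn=4096

# Persist the settings across reboots
cat >> /etc/sysctl.conf <<'EOF'
net.ipv4.tcp_max_syn_backlog = 4096
net.core.somaxconn = 4096
EOF
```

Note that net.core.somaxconn caps the backlog an application can request, so the application-side setting (setAcceptBacklog for Vertx) has to be raised as well.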

Trying to narrow down where in the connection process the bottleneck was, I ran the same benchmarks with HTTP instead of HTTPS and found that the problem went away completely. My particular problem was with the TLS handshake itself and the server's ability to keep up with it.

With some more digging into my own application, I found that replacing Java's default SSL handler with a native one (OpenSSL) greatly increased the speed of connecting via HTTPS.

Here were the changes I made for my specific application (using Vertx 3.9.1).

  1. Add the netty-tcnative dependencies:

The first dependency is for osx to test at runtime. The second is for centos linux when compiled. linux-x86_64 is also available for other flavors. I tried to use boringssl because openssl doesn't support ALPN but after many hours I couldn't get it to work so i've decided to live without http2 for now. With most connections only sending 1-2 small requests before disconnecting this really isn't an issue for me anyway. If you could use boringssl instead, that's probably preferred.
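The original dependency snippets did not survive here. As a sketch of what they likely looked like, assuming Maven coordinates: netty-tcnative ships dynamically linked builds under per-platform classifiers, with a fedora-family classifier for CentOS; the version below is an assumption, not taken from the original.

```xml
<!-- Assumed coordinates; pin the version that matches your Netty release. -->
<!-- First dependency: osx, for testing at runtime -->
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty-tcnative</artifactId>
  <version>2.0.31.Final</version>
  <classifier>osx-x86_64</classifier>
  <scope>runtime</scope>
</dependency>
<!-- Second dependency: CentOS (fedora-family build); plain linux-x86_64
     exists for other flavors -->
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty-tcnative</artifactId>
  <version>2.0.31.Final</version>
  <classifier>linux-x86_64-fedora</classifier>
</dependency>
```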

  2. Because I am not using an uber-jar version of the dependency, I needed to install the OS dependencies for CentOS. This was added to the Dockerfile:
RUN yum -y install openssl
RUN yum -y install apr
  3. To tell the Vertx server to use OpenSSL instead of the Java implementation, set the OpenSSL options on the server (even if just the default object):
httpServerOptions.setOpenSslEngineOptions(new OpenSSLEngineOptions());
  4. Finally, in my run script, I added the io.netty.handler.ssl.openssl.useTasks=true option to Java. This tells the SSL handler to use tasks when handling requests so that it is non-blocking.
java -Dio.netty.handler.ssl.openssl.useTasks=true -jar /app/application.jar
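Putting steps 3 and 4 together, a minimal server sketch might look like the following, assuming Vert.x 3.9.x APIs; the certificate paths, port, and backlog value are placeholder assumptions:

```java
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpServerOptions;
import io.vertx.core.net.OpenSSLEngineOptions;
import io.vertx.core.net.PemKeyCertOptions;

public class NativeTlsServer {
  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();

    HttpServerOptions options = new HttpServerOptions()
        .setSsl(true)
        // Step 3: switch from the JDK SSL engine to the native
        // netty-tcnative/OpenSSL engine
        .setOpenSslEngineOptions(new OpenSSLEngineOptions())
        .setKeyCertOptions(new PemKeyCertOptions()
            .setCertPath("/path/to/cert.pem")   // placeholder
            .setKeyPath("/path/to/key.pem"))    // placeholder
        // Larger accept backlog, as in step 1 (example value)
        .setAcceptBacklog(4096);

    vertx.createHttpServer(options)
        .requestHandler(req -> req.response().end("ok"))
        .listen(443);
  }
}
```

Step 4 is then applied at launch: java -Dio.netty.handler.ssl.openssl.useTasks=true -jar app.jar, so the native SSL handler runs handshake work as tasks instead of blocking the event loop.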

After these changes, I am able to establish connections much quicker with less overhead. What took tens of seconds before and resulted in frequent connection resets now takes 1-2 seconds with no resets. Could be better, but a big improvement from where I was.

Solution 2:

Nice fix!

So it seems to be the SSL layer; it certainly has to do a lot more processing, in terms of network handshakes and crypto transformations, which take resources. Unless your SSL stack can offload some of the processing onto hardware, SSL can certainly increase load on your servers, and as you found out, not all SSL libraries are created equal!

These problems are a great candidate for a front-end reverse proxy. Ideally it is placed before your application, handles all SSL connections from clients, and then speaks plain HTTP to your back end.

Your original application then has a little less to do, as the front-end reverse proxy soaks up all the SSL work and TCP connection management.

Apache and NGINX can do this, and have quite a few options for load balancing those connections to the least-loaded backend server.

You will find that NGINX can do SSL termination a lot faster than Java can, and even if Java could keep up, you're distributing the connection-management processing across machines, thus reducing load (memory/CPU/disk I/O) on your back-end server. You get the side effect of making the configuration of the back end simpler.
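As a sketch, a minimal NGINX TLS-termination setup along these lines might look like this (the hostnames, ports, and certificate paths are placeholder assumptions):

```nginx
# TLS terminates at the proxy; the backends see plain HTTP.
upstream app_backend {
    least_conn;                 # route new connections to the least-loaded backend
    server 10.0.0.11:8080;      # placeholder backend addresses
    server 10.0.0.12:8080;
}

server {
    listen 443 ssl;
    server_name example.com;    # placeholder

    ssl_certificate     /etc/nginx/tls/fullchain.pem;   # placeholder paths
    ssl_certificate_key /etc/nginx/tls/privkey.pem;
    ssl_session_cache   shared:SSL:10m;  # session reuse cuts repeat-handshake cost

    location / {
        proxy_pass http://app_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```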

The downside is that you're using HTTP between your proxy and your applications, which in some ultra-secure environments is not desirable.

Good Luck!