Network latency: 100Mbit vs. 1Gbit

Solution 1:

The only way latency would drop appreciably is if the current 100Mbit link is saturated. If it is not saturated, you will likely not notice any change.

Additionally, your assumption that the 1Gbit link will be able to support larger packets is incorrect. The maximum packet size is determined by the MTU of the various devices along the path the packet takes - starting with the NIC on your server, all the way through to the MTU of your customer's computer. In internal LAN applications (where you have control over all the devices along the path), it is sometimes possible to increase the MTU, but in this situation you are pretty much stuck with the default MTU of 1500. If you send packets larger than that, they will end up getting fragmented, actually decreasing performance.
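If you want to check what your own path actually supports, you can probe it by pinging with the don't-fragment (DF) bit set: anything larger than the path MTU is rejected instead of silently fragmented. A minimal Python sketch, assuming Linux's ping(8) and a placeholder target host:

#!/usr/bin/env python3
# Probe the path MTU by pinging with the don't-fragment bit set.
# Sketch only: HOST is a placeholder; Linux ping(8) is assumed.
import subprocess

HOST = "example.com"  # placeholder target

def fits(payload_size: int) -> bool:
    # -M do: set DF and fail rather than fragment; -s: ICMP payload bytes
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", "-M", "do", "-s", str(payload_size), HOST],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

# A 1500-byte MTU leaves 1472 bytes of ICMP payload (20 IP + 8 ICMP header)
for size in (1472, 1473, 8972):
    print(f"payload {size:5d} bytes:", "fits" if fits(size) else "exceeds path MTU")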

Solution 2:

YES, gbit has lower latency, since:

  • the same number of bytes can be transferred in less time

BUT the improvement is only appreciable if the packet(s) have a certain size:

  • 56-byte packet => virtually no faster transfer
  • 1000-byte packet => 20% faster transfer
  • 20000-byte packet(s) => 80% faster transfer

So if you have an application that is very sensitive to latency (4ms vs. 0.8ms round trip for 20kB) and requires larger packets to be transferred, then switching from 100Mbit to gbit can give you a latency reduction, even though you use much less than 100Mbit/s on average (i.e. the link is not permanently saturated).
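The size dependence falls out of a simple model: each packet pays a roughly fixed overhead (NIC, stack, switch) plus a serialization delay of size/bitrate per hop, and only the serialization part shrinks on the faster link. A back-of-the-envelope sketch; the 0.15 ms fixed overhead is an assumed ballpark, not a measured value:

# RTT model: fixed per-packet overhead + serialization delay both ways.
# The fixed overhead (0.15 ms) is an assumed ballpark, not a measurement.
FIXED_MS = 0.15

def rtt_ms(size_bytes, bits_per_second):
    serialization_ms = size_bytes * 8 / bits_per_second * 1000  # one way
    return FIXED_MS + 2 * serialization_ms

for size in (56, 1000, 20000):
    slow, fast = rtt_ms(size, 100e6), rtt_ms(size, 1e9)
    print(f"{size:6d} B: {slow:6.3f} ms @100M  {fast:6.3f} ms @1G  "
          f"-> {(1 - fast/slow)*100:4.1f}% faster")

For 56 bytes the fixed overhead dominates and the gain is a few percent; for 20000 bytes serialization dominates and the gain approaches the bandwidth ratio, which matches the measurements below.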

Server (100mbit) -> Switch (gbit) -> Server (100mbit):

size: 56 :: rtt min/avg/max/mdev = 0.124/0.176/0.627/0.052 ms
size: 100 :: rtt min/avg/max/mdev = 0.131/0.380/1.165/0.073 ms
size: 300 :: rtt min/avg/max/mdev = 0.311/0.463/2.387/0.115 ms
size: 800 :: rtt min/avg/max/mdev = 0.511/0.665/1.012/0.055 ms
size: 1000 :: rtt min/avg/max/mdev = 0.560/0.747/1.393/0.058 ms
size: 1200 :: rtt min/avg/max/mdev = 0.640/0.830/2.478/0.104 ms
size: 1492 :: rtt min/avg/max/mdev = 0.717/0.782/1.514/0.055 ms
size: 1800 :: rtt min/avg/max/mdev = 0.831/0.953/1.363/0.055 ms
size: 5000 :: rtt min/avg/max/mdev = 1.352/1.458/2.269/0.073 ms
size: 20000 :: rtt min/avg/max/mdev = 3.856/3.974/5.058/0.123 ms

Server (gbit) -> Switch (gbit) -> Server (gbit):

size: 56 :: rtt min/avg/max/mdev = 0.073/0.144/0.267/0.038 ms
size: 100 :: rtt min/avg/max/mdev = 0.129/0.501/0.630/0.074 ms
size: 300 :: rtt min/avg/max/mdev = 0.185/0.514/0.650/0.072 ms
size: 800 :: rtt min/avg/max/mdev = 0.201/0.583/0.792/0.079 ms
size: 1000 :: rtt min/avg/max/mdev = 0.204/0.609/0.748/0.078 ms
size: 1200 :: rtt min/avg/max/mdev = 0.220/0.621/0.746/0.080 ms
size: 1492 :: rtt min/avg/max/mdev = 0.256/0.343/0.487/0.043 ms
size: 1800 :: rtt min/avg/max/mdev = 0.311/0.672/0.815/0.079 ms
size: 5000 :: rtt min/avg/max/mdev = 0.347/0.556/0.803/0.048 ms
size: 20000 :: rtt min/avg/max/mdev = 0.620/0.813/1.222/0.122 ms

= on average, over multiple servers, an 80% latency reduction for a 20kB ping

(If only one of the links is gbit, you will still get a 5% latency reduction for a 20kB ping.)
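For reference, a table in the format above can be generated by looping ping over payload sizes and echoing its summary line; a rough sketch, with a placeholder peer address:

# Produce a "size: N :: rtt min/avg/max/mdev" table by looping ping
# over payload sizes. Sketch only: HOST is a placeholder peer, and the
# Linux ping(8) "-q" summary format is assumed.
import subprocess

HOST = "192.0.2.10"  # placeholder peer

for size in (56, 100, 300, 800, 1000, 1200, 1492, 1800, 5000, 20000):
    out = subprocess.run(
        ["ping", "-c", "10", "-q", "-s", str(size), HOST],
        capture_output=True, text=True,
    ).stdout
    print(f"size: {size} :: {out.strip().splitlines()[-1]}")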


Solution 3:

I think you have a fundamental misconception about bandwidth, latency, and "speed". Speed is a function of both bandwidth and latency. For instance, consider a shipment of data on DVDs shipped across the country, taking 3 days to arrive. Compare that to sending the data across the internet. The internet connection has much lower latency, but to match the "speed" of the shipment you would have to receive at 9.6MB a second (reference example from this source).
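To make the comparison concrete, here is the arithmetic behind a shipment's "bandwidth"; the disc count and capacity are illustrative assumptions, not figures from the source:

# Effective bandwidth of a physical shipment (illustrative assumptions).
dvds = 500
bytes_per_dvd = 4.7e9            # single-layer DVD
transit_seconds = 3 * 24 * 3600  # 3 days in transit

print(f"{dvds * bytes_per_dvd / transit_seconds / 1e6:.1f} MB/s")  # ~9.1 MB/s
# Enormous bandwidth, terrible latency: the first byte arrives after 3 days.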

In your case, upgrading to higher bandwidth would allow you to serve more concurrent users, but it would not improve the latency to any individual user.


Solution 4:

You're looking at the world through a pinhole. A valid test of latency differences at different speeds would be between two identical NICs connected with a cross-connect cable. Set the NICs to matching speeds of 10Mb, 100Mb, and 1000Mb. This will show that there is virtually no difference in latency at the different speeds. All packets travel at the same wire speed regardless of the maximum bandwidth in use. Once you add switches with store-and-forward caching, everything changes. Testing latency through a switch must be done with only two connections to the switch; any other traffic may affect the latency of your test. Even then, the switch may roll over logs, adjust packet-type counters, update its internal clock, etc. Everything may affect latency.
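On Linux, pinning both NICs to a matching speed for such a test can be done with ethtool; a sketch with a placeholder interface name (note that 1000BASE-T copper requires autonegotiation, so the forced-speed form below applies to 10/100 only):

# Pin a NIC to a fixed speed before a cross-connect latency test.
# Sketch only: IFACE is a placeholder; requires root and ethtool.
import subprocess

IFACE = "eth0"  # placeholder interface

def set_speed(mbit):
    subprocess.run(
        ["ethtool", "-s", IFACE,
         "speed", str(mbit), "duplex", "full", "autoneg", "off"],
        check=True,
    )

set_speed(100)  # repeat on the peer, then run the ping tests at each speed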

Yes, switching from 100Mb to 1Gb might be faster (lower latency) due to hardware changes: a different NIC, a different switch, a different driver. I have seen larger changes in ping latency from driver differences than from any other change: bandwidth, switches, offloading NICs, etc.

The switch would be the next biggest change, with cut-through significantly faster than store-and-forward for single-transmit tests. However, a well-designed store-and-forward switch may overtake the cut-through switch in overall performance under high load. In the early days of gigabit, I've seen 10Mb high-performance backplane switches with lower latency than cheap gigabit switches.

Ping tests are practically irrelevant for performance analysis when using the Internet. They are quick tests to get a ballpark idea of what's happening on the transport at the moment of the test. Production performance testing is much more complicated than just a ping. High-performance switches are computers, and under high load they behave differently, with changing latency.

Having a slower NIC, or a NIC set to a slower speed, could actually help a server with concurrent bursts by throttling the input to the server using the switch's cache. A single retransmit may negate any decrease in latency. Usually medium- to high-load traffic levels are what matter, not single ping tests. For example, an old, slow Sun UltraSPARC (higher latency for a single ping) outperforms a new, cheap gigabit desktop used as a dev server when under 70% load on 100Mb bandwidth. The desktop has a faster Gb NIC, a faster Gb-Gb connection, faster memory, more memory, a faster disk, and a faster processor, but it doesn't perform as well as tuned server-class hardware/software. This is not to say that a current, tuned server running Gb-Gb isn't faster than old hardware, and even able to handle larger throughput loads. There is just more complexity to the question of "higher performance" than you seem to be asking.

Find out if your provider is using different switches for the 100Mb vs. 1Gb connections. If they use the same switch backplane, then I would only pay for the increase if the traffic levels exceeded the lower bandwidth. Otherwise, you may find that in a short time many other users will switch over to gigabit, and the few users left on the old switch now have higher performance - lower latency - during high loads on the switch (overall switch load, not just to your servers).

Apples-and-oranges example: a local ISP provided a new switch for bundled services, DSL and phone. Initially, users saw an increase in performance. Then the system was oversold. Now users that remain on the old switch have higher, more consistent performance. During late night, users on the new system are faster. In the evening, under high load, the old switch's clients clearly outperform the new, overloaded system.

Lower latency doesn't always correlate to faster delivery. You mention MySQL in the 20 requests needed to serve a single page. That traffic shouldn't be on the same NIC as the page requests: moving all internal traffic to a separate internal network will reduce collisions and total packet counts on the outgoing NIC and provide larger gains than the 0.04ms latency gain of a single packet. Reduce the number of requests per page to reduce page load latency, and compress the pages (HTML, CSS, JavaScript, images) to decrease page load times. These three changes will give larger ongoing gains than paying for bandwidth you aren't using to get a 0.04ms latency reduction.

The ping needs to run for 24 hours and be averaged to see the real latency change. Smart switches now do adaptive RTSP-type throttling, with small transfers getting an initial bandwidth boost and large transfers throttled. Depending on your page sizes (graphics, large HTML/CSS/JavaScript), you may see initial connection latencies and bandwidth much lower or higher than for a large page or full-page transfer. If part of your page is streamed, you may see drastically different performance between the page and the stream.
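As a sanity check on where page time actually goes, compare the cost of the request count against the per-packet gain; the 20 requests and the 0.04ms figure come from above, while the internal round-trip time is an assumed ballpark:

# Request count vs. per-packet latency gain (illustrative numbers).
requests_per_page = 20     # from the question
internal_rtt_ms = 0.2      # assumed LAN round trip per DB request
per_packet_gain_ms = 0.04  # the per-packet saving discussed above

print(f"20 sequential DB round trips: {requests_per_page * internal_rtt_ms:.1f} ms")
print(f"Halving the request count saves: {requests_per_page * internal_rtt_ms / 2:.1f} ms")
print(f"Gigabit saving on a single packet: {per_packet_gain_ms:.2f} ms")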