How to configure UPS to restart servers in the right sequence?

Solution 1:

The standard answer is "not at all". Fix the software so it handles restarts in any order. If you really need SOME servers to start first (for example, Active Directory), put them on UPSes that will last a LOT longer. A low-power Atom-based server is good enough as an Active Directory domain controller and will survive a day on a small UPS.

Do high-end UPSes give other options to fix the restart sequence?

No. I would say it is generally assumed programmers are competent enough to work around the issue properly.

What you COULD do is:

  • Have servers start "randomly". Except for DHCP / Active Directory, nothing really demands an order that cannot be fixed in software.
  • Have a control server wait some time (5 minutes) and then start the services on the various machines in the correct order.

I would say that this type of setup is a lot more common. I would call any software that REQUIRES servers to start in a particular order (outside of pure infrastructure) broken and unfit for business.
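The control-server idea from the list above could look roughly like the following sketch. The host names, ports and commands are made up for illustration, and it assumes key-based SSH from the control server to each machine; adapt it to whatever remote-execution mechanism you actually use.

```python
import socket
import subprocess
import time

# Hypothetical hosts, ports and commands; replace with your own.
START_PLAN = [
    ("db01.example.internal", 22, "systemctl start postgresql"),
    ("app01.example.internal", 22, "systemctl start myapp"),
]

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def run_over_ssh(host, command):
    """Run a command on a remote host (assumes key-based SSH is set up)."""
    return subprocess.run(["ssh", host, command], check=False).returncode == 0

def start_in_order(plan, is_up=port_open, run=run_over_ssh,
                   sleep=time.sleep, retries=30, wait=10):
    """Start each service in plan order, waiting for its host first.

    Returns the hosts whose service was started. Stops at the first host
    that never comes up or whose command fails, so that dependants are
    not started without their dependency.
    """
    started = []
    for host, port, command in plan:
        for _ in range(retries):
            if is_up(host, port):
                break
            sleep(wait)
        else:
            return started  # host never became reachable
        if not run(host, command):
            return started  # start command failed
        started.append(host)
    return started

# Usage on the control server, after the grace period from the text:
#   time.sleep(300)
#   print(start_in_order(START_PLAN))
```

The services themselves would be set to manual start on each machine, so the machines can boot in any order and only the control server decides when each service comes up.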

Just as a note: our own setup is a low-cost 20 kVA UPS (low cost because we got one used) for the servers, with a slaved 2000 VA UPS for a machine serving as the "root" of the network (and backup machine). Slaved means that the small UPS sits behind the big one, so it only switches to battery when the large one (which lasts between half an hour and 8 hours depending on how much of our computing grid is online) is going into terminal shutdown.

Solution 2:

Managed Power Distribution Units (PDUs), rather than the UPS, often support customised delays in enabling individual outlets after power is restored.

Typically that is to prevent circuit breakers from tripping when a cabinet full of systems powers up at the same time immediately after power is restored, but it can also be used to preserve the boot order of your system dependencies.
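As a rough illustration of planning such delays, the sketch below turns a list of dependency tiers into a per-outlet power-on delay. The outlet names and the 90-second gap are assumptions for the example; how the resulting values are entered into the PDU is vendor-specific and not shown.

```python
def outlet_delays(tiers, gap_seconds=60):
    """Map each outlet to a power-on delay in seconds.

    tiers is a list of lists of outlet names: every outlet in tier 0 is
    enabled immediately, tier 1 after gap_seconds, tier 2 after twice
    that, and so on.
    """
    delays = {}
    for index, outlets in enumerate(tiers):
        for outlet in outlets:
            delays[outlet] = index * gap_seconds
    return delays

# Hypothetical cabinet layout, ordered by dependency.
tiers = [
    ["outlet1-switch"],
    ["outlet2-dc", "outlet3-dhcp"],
    ["outlet4-db"],
    ["outlet5-app", "outlet6-app"],
]

for outlet, delay in outlet_delays(tiers, gap_seconds=90).items():
    print(f"{outlet}: {delay}s")
```

The gap should be at least as long as the slowest machine in a tier needs to boot and start its services, otherwise the next tier may come up before its dependencies are ready.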


Solution 3:

I had this exact issue. The only difference is that we invested in sturdy rack-mounted APC power units (for example, the APC Smart-UPS 3000). With APC's PowerChute Network Shutdown software, I'm able to shut down and bring up servers in a specific order. Another handy feature of the software is setting the servers to shut down at the very last minute, i.e. calculating how much battery power the APC units have left and shutting down the servers with just enough time for them to shut down properly instead of just powering off.

The software is... not user friendly, but it's not difficult once you take some time to figure it out. If you're interested in investing more in your infrastructure, this is definitely the route to go.
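The "shut down at the very last minute" idea boils down to comparing the remaining battery runtime with the time the servers need to shut down. The following is only a sketch of that arithmetic, not PowerChute's actual logic; the function name, parameters and the safety margin are assumptions.

```python
def should_shut_down(runtime_remaining_s, shutdown_duration_s, margin_s=120):
    """Return True once the battery can only just cover a clean shutdown.

    runtime_remaining_s: estimated battery runtime reported by the UPS.
    shutdown_duration_s: how long the servers need to shut down cleanly.
    margin_s: safety buffer, since runtime estimates are not exact.
    """
    return runtime_remaining_s <= shutdown_duration_s + margin_s

# With 20 minutes left and a 5-minute shutdown, keep running.
print(should_shut_down(1200, 300))
# With 6 minutes left, it is time to start shutting down.
print(should_shut_down(360, 300))
```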


Solution 4:

It sounds like the UPS units are low-cost and cannot be configured with a specific output-on wait time after power is restored (some higher-end units can). To get the same functionality, pick one host that always powers on right away (for example, whichever system is allowed to boot at any time) and leave all the other servers powered off: configure them in the BIOS to stay off when AC power returns and to honor the Wake-on-LAN magic packet. Then, on the main host that does boot, run a script or utility that sends the WOL magic packet to each host at the right time.
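A minimal sketch of such a script follows. The magic packet format (6 bytes of 0xFF followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast to port 9) is standard; the MAC addresses and wait times in the schedule are placeholders.

```python
import socket
import time

def build_magic_packet(mac):
    """Build a WOL magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"invalid MAC address: {mac!r}")
    return b"\xff" * 6 + bytes.fromhex(hex_digits) * 16

def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))

def wake_in_order(schedule, send=send_magic_packet, sleep=time.sleep):
    """Wake hosts one by one.

    schedule is a list of (mac, seconds_to_wait_afterwards) pairs, in the
    order the hosts should power on.
    """
    for mac, wait_after in schedule:
        send(mac)
        sleep(wait_after)

# Placeholder MAC addresses and delays; replace with your own.
SCHEDULE = [
    ("00:11:22:33:44:55", 120),  # e.g. database server, give it 2 minutes
    ("00:11:22:33:44:66", 60),   # e.g. application server
]

# Run on the host that boots first:
#   wake_in_order(SCHEDULE)
```

Fixed waits are the simplest option; a more careful version would poll each host until it answers before waking the next one.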

Tags:

Ups