Why don't servers always run at max?
Latency is one reason. The lag between the CPU saying "disk, give me this data I need before I can do anything else" and the data actually coming back leaves the CPU idle for that time.
Resources probably do run at 100%, but for very brief periods. An operating system booting will follow the general pattern of "process or decide something, fetch something from disk, do something in memory, do something with a device", repeating many times per second. So when you see a disk at 25% in a 2 second period that probably means it was running at 100% for 0.5 seconds then idle the rest of the time.
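To make the arithmetic concrete, here is a minimal sketch of how a monitoring tool might average busy time over a sample window. The function name and the idea that the tool reports a simple busy/total ratio are assumptions for illustration; real tools differ in how they sample.

```python
def average_utilization(busy_seconds: float, window_seconds: float) -> float:
    """Return the utilization (in percent) reported for one sample window.

    Hypothetical helper: assumes the tool reports busy time / window length.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    if not 0 <= busy_seconds <= window_seconds:
        raise ValueError("busy_seconds must be between 0 and window_seconds")
    return 100.0 * busy_seconds / window_seconds


# Disk is flat out for 0.5 s, then idle for the remaining 1.5 s of a 2 s window.
print(average_utilization(0.5, 2.0))  # 25.0
```

The device really was saturated for half a second; the averaging just hides it.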
As EEAA pointed out, multicore systems make this a bit more complex. A single-threaded piece of software on a CPU that can execute four threads can only hit 25%, even when running at full speed. Even multithreaded software can rarely hit 100%, because data has to flow (usually) from hard drive, to RAM, to cache, to CPU. Keeping that pipeline full is difficult, and tends to happen mostly with predictable workloads like video encoding. In that case the operating system can observe read patterns and retrieve data before it's required, putting it into appropriate caches, such as the disk cache in RAM.
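The 25% ceiling for a single thread can be worked out like this. It's only a sketch: the function name is made up, and it assumes the tool reports total CPU as an average across all hardware threads and that each busy software thread can saturate at most one of them.

```python
def max_reported_cpu(busy_threads: int, hardware_threads: int) -> float:
    """Upper bound (in percent) on total CPU usage a tool could report.

    Hypothetical helper: assumes usage is averaged over all hardware threads.
    """
    if hardware_threads <= 0:
        raise ValueError("hardware_threads must be positive")
    if busy_threads < 0:
        raise ValueError("busy_threads cannot be negative")
    # Extra software threads beyond the hardware thread count cannot add usage.
    runnable = min(busy_threads, hardware_threads)
    return 100.0 * runnable / hardware_threads


print(max_reported_cpu(1, 4))  # 25.0
print(max_reported_cpu(8, 4))  # 100.0
```

In practice the second number is an upper bound only, since threads waiting on disk or memory are not busy.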
You're thinking about this in a very simplistic way, which is leading you to some incorrect assumptions that I'll try to clear up.
First, and potentially most simply, on a multicore system, in order to understand CPU usage you have to take into account whether or not the process load is multithreaded, and designed to take advantage of multiple cores. If this is not the case, depending on the mix of processes running, you may not ever see 100% usage. Ever.
Second, you need to consider IO device performance. How does your system know, for instance, how many IOps your devices are capable of? It doesn't. A more meaningful metric for you to watch is your iowait value during boot (which may be difficult to obtain during the boot process) or the disk queues/latency during boot (which should be easier to obtain from your hypervisor). If you see queues or latency spike, it's likely that your IO devices are a contributing factor to your performance issues.
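If the guest happens to be Linux (an assumption, the question doesn't say), iowait can be estimated from two samples of the aggregate "cpu" line in /proc/stat. The sketch below only does the arithmetic on two such lines; the function names and sample values are made up for illustration.

```python
def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line from /proc/stat into its first 8 counters.

    Order: user, nice, system, idle, iowait, irq, softirq, steal.
    The trailing guest fields are skipped because they are already
    counted inside user and nice.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:9]]


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


# Made-up sample values, taken some interval apart.
first = "cpu 100 0 50 800 50 0 0 0 0 0"
second = "cpu 150 0 70 900 130 0 0 0 0 0"
print(iowait_percent(first, second))  # 32.0
```

On a real box you would read the first line of /proc/stat twice, a second or so apart, and pass both lines in. A high value here points at the CPU sitting idle while IO is outstanding.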
I have been working with servers for about 20 years now. It's usually not a good thing when a component is running at 100% all the time.
For instance, let's say you have a SQL database that you do not want to swap to disk but instead want to run entirely out of memory.
If your database is 24GB and the OS needs 8GB, you wouldn't want to allocate only 32GB of RAM for the machine. There are a lot of "things" that can go wrong: bad code, DDoS, heavy application usage, who knows. Without any headroom, how would you know the server is in trouble?
We have about 2000 servers in our data center, and we like to see them all running at about 75% CPU and RAM. HDD space is on our SAN, so that is a completely different ball of wax. We also have alerts to tell us when they hit 85% CPU or RAM, and alarms to tell us when they hit 90%.
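As a rough sketch of how those numbers fit together: the thresholds below are the ones mentioned above, but the function names, and the idea of sizing RAM so that expected usage lands on the 75% target, are my own illustration rather than a description of any particular monitoring system.

```python
TARGET_PERCENT = 75.0  # where we like servers to sit
ALERT_PERCENT = 85.0   # alert threshold
ALARM_PERCENT = 90.0   # alarm threshold


def classify(usage_percent: float) -> str:
    """Map a CPU or RAM usage reading to 'ok', 'alert', or 'alarm'."""
    if usage_percent >= ALARM_PERCENT:
        return "alarm"
    if usage_percent >= ALERT_PERCENT:
        return "alert"
    return "ok"


def required_capacity_gb(expected_use_gb: float, target_percent: float = TARGET_PERCENT) -> float:
    """Capacity needed so that expected use sits at the target utilization."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    return expected_use_gb * 100.0 / target_percent


print(classify(75.0))  # ok
print(classify(87.5))  # alert
print(classify(93.0))  # alarm

# 24GB database plus 8GB for the OS, sized to sit at 75% rather than 100%.
print(round(required_capacity_gb(24 + 8), 1))  # 42.7
```

So the 32GB machine from the example would be running at 100% on day one, well past both thresholds, with nothing left to tell you when something goes wrong.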