Monitor DELL hardware on VMware ESXi 5.5 server

Solution 1:

Yes, you can monitor the standalone ESXi Host using any SNMP monitoring software but some items may only be visible using a monitoring tool that supports the CIM protocol.
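SNMP polling only works once the host's own SNMP agent is switched on. A minimal sketch of doing that from the ESXi Shell, assuming ESXi 5.1 or later (where the `esxcli system snmp` namespace exists); the function name `enable_snmp` and the community string you pass in are placeholders of mine, not something from your environment:

```shell
# Sketch only: enable the ESXi SNMP agent with one read community.
# Assumption: run as root in the ESXi Shell or over SSH.
# enable_snmp is a hypothetical helper name used for illustration.
enable_snmp() {
    # $1 = community string (placeholder; choose your own)
    esxcli system snmp set --communities "$1" &&
    esxcli system snmp set --enable true &&
    # Print the resulting agent configuration so you can confirm it.
    esxcli system snmp get
}
```

You would call it as `enable_snmp mycommunity` and then point the monitoring tool at the Host's management IP address.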

All of my ESXi Hosts are part of vCenter, but we monitor them directly (using the VMkernel Host IP address) with SolarWinds NPM. There are 5 or 6 CIM modules built into ESXi 5.5 that give you hardware health, but RAID card health is not one of them. You will need to add the Dell OMSA VIB, which adds the additional CIM agents, including the one for the RAID array. Brian Atkinson's post is still the best I have found that describes the process:

https://communities.vmware.com/people/vmroyale/blog/2012/07/26/how-to-use-dell-dset-with-esxi

If you are going to use a third-party monitoring tool that keeps historical information and does alerting, you only need to follow the instructions for installing the OMSA ESXi VIB. If you wish to use the Dell OMSA Server, you can install it remotely on a bare-bones server, remotely in a VM, or locally as a VM.

You can use the OMSA Server to connect to DRAC and iDRAC out-of-band (OOB/IPMI/iLO) management cards, or to the ESXi Host after you install the OMSA VIB on the ESXi Host. You will not see the RAID health information in the DRAC or iDRAC though - only when connecting the OMSA Server to an ESXi Host. I repeat the Server keyword so there is no confusion between the OMSA Server, which acts as a client, and the OMSA VIB that is installed on the ESXi Host.

Some useful resources:

Show the current CIM providers on an ESXi Host: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2053715

Show the currently installed VIBs on the ESXi Host from the Host's CLI: esxcli software vib list
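The VIB list gets long, so it helps to filter it when you just want to confirm the OMSA VIB is there. A rough sketch, assuming you are in the ESXi Shell or connected over SSH; `find_vib` is a hypothetical helper name, and the pattern you search for (for example `openmanage` or `dell`) is a guess at how the VIB is labelled, so check it against your own output:

```shell
# Sketch only: case-insensitive search of the installed VIB list.
# find_vib is a hypothetical helper; it returns non-zero when nothing matches.
find_vib() {
    # $1 = text to look for in the VIB name or vendor columns
    esxcli software vib list | grep -i -- "$1"
}
```

For example, `find_vib openmanage || echo "no matching VIB installed"` prints the matching row if there is one and the message otherwise.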

You do see some minor additional hardware health details when you connect to a vCenter server versus the ESXi Host directly, but generally, if you do not see the hardware health you are looking for in the Configuration/Health Status panel, then you are missing a CIM provider and you need to locate and install the VIB on the ESXi Host. When you add the Dell OMSA VIB to the ESXi Host, you will see a Storage sensor added to the Health Status page, which shows the RAID volumes, drives, controller and battery health for your storage controller. You may need to reset the sensors for it to show up, and sometimes it takes 15 to 20 minutes to appear the first time after the VIB install and reboot of the ESXi Host.

If you do not see a sensor on the ESXi Host's Health Status page when you connect with the vSphere Client then you are most likely not going to see it when you are remotely polling the sensors with monitoring software.

Also, you should note that not all servers have the same sensors, and you may not be able to get the same health status from all of them, depending on the Server hardware, the RAID card and the version of the CIM provider available for that combination. You may also need to upgrade or change the VIBs for the RAID card in order for the health status to work. The CIM provider (which is the OMSA VIB in this case) talks to the hardware through the device VIB (the real device driver) and passes this information to the CIM Broker on the ESXi Host - also known as the Small Footprint CIM Broker Daemon (sfcbd). When you poll the ESXi Host for hardware health using robust monitoring software, it will get some information using SNMP queries, some using CIM and some using the ESXi API (which uses SOAP requests). The CIM client talks to the sfcbd process on the ESXi Host.

Sometimes the CIM process just stops working. When that happens, you will need to restart the sfcbd-watchdog process on the ESXi Host. This restarts the sfcbd service, after which CIM polling should work again. From the CLI of the Host: /etc/init.d/sfcbd-watchdog restart
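It is worth checking the service state right after the restart rather than waiting for the monitoring tool to notice. A minimal sketch, assuming the init script also accepts a `status` argument (it does on the hosts I have used, but verify on yours); `restart_cim` and the `SFCBD_INIT` variable are hypothetical names used only for this example:

```shell
# Sketch only: restart the CIM broker watchdog, then report its status.
# SFCBD_INIT is a hypothetical override; it defaults to the stock init script.
SFCBD_INIT="${SFCBD_INIT:-/etc/init.d/sfcbd-watchdog}"

restart_cim() {
    # Only ask for status if the restart itself succeeded.
    "$SFCBD_INIT" restart && "$SFCBD_INIT" status
}
```

Run `restart_cim` from the Host's CLI; if the status line does not report the service as running, look at the CIM provider VIBs before blaming the monitoring software.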

I think that covers most of the items you need to get you running.

Solution 2:

I understand what you're looking for: specific notes on how to manage and monitor the health of a standalone VMware ESXi host.

In practice, the approach should be slightly different. I'll explain how I manage hosts.

In a situation where you have multiple ESXi hosts under vCenter management, the assumption is that you leverage the vCenter for monitoring and health status, versus querying the individual hosts. There's a catch-all alarm that I configure in vCenter to alert on "Host Hardware Health". I typically don't care if it's a power supply, RAM, disk or any other component, but rather that the host is unhealthy.

Monitoring a standalone ESXi host isn't going to be very helpful, as the point of the Dell/HP drivers is to expose information to vCenter. And I don't believe it's the best practice to query individual hosts in this manner. Granted, that's because you ideally want your VM hosts centrally managed.

If you run vCenter with a single host, you DO get this ability, so maybe that's an option for your environment.

If you need some sort of out-of-band monitoring, couldn't you query the DRAC instead?