Nagios server best practices?

Hostgroups and templates.

Templates let you define classes of hosts and services, e.g. "normal service", "critical service", "low-priority host". They're also a useful way to split ownership when you've got multiple teams with different responsibilities: you can have a "linux host" template and a "windows host" template, each defining the appropriate contact info.

A single object can use multiple templates, so you can compose templates that each cover an orthogonal concern. For example, you can have:

define host {
    host_name   foo
    use         windows-host,normal-priority-host
    ...
}

which would pull in the contact info (and escalations) for the Windows team and the polling rates and thresholds for a "normal" host.
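As a rough sketch, the two templates used above might look something like this. The contact group name windows-admins, the 24x7 timeperiod, and all of the interval values are assumptions for illustration, not recommendations; register 0 is what marks a definition as a template rather than a real host.

```
# Template owned by the Windows team: contact info only.
define host {
    name                    windows-host
    contact_groups          windows-admins   ; hypothetical contact group
    register                0                ; template, not a real host
}

# Template describing how a "normal" host gets polled.
define host {
    name                    normal-priority-host
    check_period            24x7             ; assumes a 24x7 timeperiod exists
    check_interval          5                ; illustrative values
    retry_interval          1
    max_check_attempts      3
    notification_period     24x7
    notification_interval   60
    register                0
}
```

When two templates set the same directive, the one listed first in the use line wins, so the order of the list matters once templates start to overlap.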

Hostgroups let you group together all of the checks for a subset of your hosts. Have groups like "baseline-linux-hosts" that check load, disk space, SSH reachability, and whatever else should be on every host you monitor. Add groups like "https-servers" with checks for HTTP connectivity, HTTPS connectivity, and SSL certificate expiration dates; "fileservers" with checks for NFS and SMB accessibility and maybe more aggressive disk checks; or "virtual-machines" with checks for whether the VM accessibility tools are running properly.

Put each host and hostgroup in its own file. That file should contain the host or hostgroup definition first, followed by the definitions of the services that apply to it.
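For illustration, a file such as hostgroups.d/https-servers.cfg could be laid out roughly as follows: the hostgroup first, then the services attached to it. The normal-service template and the check_* command names are assumed to be defined elsewhere, and the alias and service descriptions are made up.

```
# hostgroups.d/https-servers.cfg (hypothetical file name)

define hostgroup {
    hostgroup_name          https-servers
    alias                   HTTPS servers
}

# Every host in the group gets each of the services below.
define service {
    use                     normal-service   ; assumed service template
    hostgroup_name          https-servers
    service_description     HTTP
    check_command           check_http
}

define service {
    use                     normal-service
    hostgroup_name          https-servers
    service_description     HTTPS
    check_command           check_https
}

define service {
    use                     normal-service
    hostgroup_name          https-servers
    service_description     SSL certificate expiry
    check_command           check_https_cert
}
```

Hosts then join the group either through a hostgroups line in their own definition or through a members line in the hostgroup.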

If you use the cfg_dir directive in your nagios.cfg file, Nagios will search recursively through that directory and load every file ending in .cfg. Make use of that. For a setting of cfg_dir=/etc/nagios/conf.d, you can have a directory tree like the following:

  • /etc/nagios/conf.d/
    • commands.d/
      • http.cfg
      • nrpe.cfg
      • smtp.cfg
      • ssh.cfg
    • hosts.d/
      • host1.cfg
      • host2.cfg
      • host3.cfg
    • hostgroups.d/
      • hostgroup1.cfg
      • hostgroup2.cfg

I tend to make a directory for each resource type (commands, contactgroups, contacts, escalations, hostgroups, hosts, servicegroups, timeperiods) except for services, which get grouped in with the hosts or hostgroups that use them.

The precise structure can vary according to your organizational needs. At a past job, I used subdirectories under hosts.d for each different site. At my current job, most of the Nagios host definitions are managed by Puppet, so there's one directory for Puppet-managed hosts and a separate one for hand-managed hosts.

Note that the above also breaks out commands into multiple files, generally by protocol. Thus, the nrpe.cfg file would have the commands check_nrpe and check_nrpe_1arg, while http.cfg could have check_http, check_http_port, check_https, check_https_port, and check_https_cert [1].
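A sketch of what commands.d/nrpe.cfg might contain. It assumes $USER1$ points at your plugin directory (the usual convention in resource.cfg) and that the NRPE plugin is installed there; adjust the path if your distribution puts it elsewhere.

```
# commands.d/nrpe.cfg

# Run a remote NRPE command and pass it arguments.
define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
}

# Run a remote NRPE command that takes no arguments.
define command {
    command_name    check_nrpe_1arg
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
```

A service would then reference these as, for example, check_nrpe_1arg!check_load, where check_load is whatever command name the remote NRPE daemon has configured.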

I don't typically have a tremendous number of templates, so I usually just have a hosts.d/templates.cfg file and a services.d/templates.cfg file. If you use them more heavily, they can go into appropriately-named files in a templates.d directory.

[1] I also like to have a check_http_blindly command, which is basically check_http -H $HOSTADDRESS$ -I $HOSTADDRESS$ -e HTTP/1.; it returns OK even if it gets a 403 response code.
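Written out as a command definition, that would look roughly like this (again assuming $USER1$ is the plugin directory):

```
# Passes as long as the server answers with any HTTP/1.x status line,
# because -e only requires the string "HTTP/1." in the first line of the response.
define command {
    command_name    check_http_blindly
    command_line    $USER1$/check_http -H $HOSTADDRESS$ -I $HOSTADDRESS$ -e HTTP/1.
}
```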


Make extensive use of hostgroups, servicegroups, and templates. Create hostgroups, and assign services to the hostgroups. Use servicegroups for dependencies, escalations, and logical grouping in the web UI.

If you have groups for everything, adding a new host is just 3 or 4 lines: name, address, template(s), and (optionally) hostgroups. Everything can be templated.
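With that groundwork in place, a new host definition can be as short as the sketch below. The host name, address, template names, and hostgroup names are all hypothetical and assume matching templates and hostgroups already exist.

```
define host {
    host_name       web01
    address         192.0.2.10
    use             linux-host,normal-priority-host
    hostgroups      baseline-linux-hosts,https-servers
}
```

Contacts, check intervals, and every service check come from the templates and hostgroups, so nothing else needs to be written per host.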

Be sure to read the docs on inheritance, and also the time-saving tricks page. Multiple inheritance can get tricky, but when used correctly it's a huge time-saver.