Nagios配置学习手记
|
2005.05.17
chnl@163.com
本文只对nagios的简单安装、配置作了描述。具体对command(plugins)的配置,对本地及远程host、service的监控,与SNMP管理的集成,将在后续文档中继续介绍。
测试主机的操作系统版本是Redhat 9.0。
本文主要参考Nagios的官方文档Nagios_2_0_Docs.pdf而成。
本文档的所有权归chenl(chnl@163.com)所有,使用GPL发布,可以自由拷贝,转载,转载时请保持文档的完整性,严禁用于任何商业用途。
之前一段时间,一直在一家专业做IT网管的公司做项目,所以对相关的产品都比较留意。刚好最近手头有一些Linux的Server要管理,因为规模不大,就上网google了一下相关的Open Source的东西,找到Nagios之后,自己配置使用之后,发现还不错。Nagios可以提供对主机、网络服务等强大的监控、处理能力。能够以多样的灵活的方式处理IT设施的一些问题。以plugins的方式提供用户个性化需求的接口。
以下是简单的配置过程,希望能对感兴趣的人有所帮助。
Nagios的官方文档给出的安装步骤,都是从源码开始的。具体的安装过程,pdf文档中非常德相近。因为想偷懒,我这里找了Nagios的rpm二进制包来安装。
安装的文件列表:
gd-2.0.33.tar.gz
nagios-2.2-1.rh9.rf.i386.rpm
nagios-devel-2.2-1.rh9.rf.i386.rpm
nagios-plugins-1.4.2.tar.gz
Nagios的运行,是一定需要gd来支持的,因此要先安装gd-2.0.33.tar.gz。
具体的安装过程,请参阅解压后软件包中附带的INSTALL文档,如果没有特殊要求,只要进入解压缩后的软件目录中执行一下命令即可:
# ./configurer || ./make || ./make install
Nagios的安装,此处采用rpm方式。只要安装以下两个软件包就可以:
nagios-2.2-1.rh9.rf.i386.rpm
nagios-devel-2.2-1.rh9.rf.i386.rpm (非必须)
常规的安装大致为执行
# rpm -ivh nagios*.rpm
可以在Nagios的官方网站上下载到最新的nagios-plugins的程序包,安装也非常简单,与gd的安装类似。记得要做的是,在安装完成后,你需要把libexec中生成的文件,全部copy到/usr/lib/nagios/plugins(rpm安装后,nagios缺省的plugins所在目录)下。
Nagios安装后缺省配置文件所在的目录为/etc/nagios。需要修改的文件列表如下。
在这个文件中,为了便于调试,我们选择使用minimal.cfg,其他的都注释掉,如下所示:
cfg_file=/etc/nagios/minimal.cfg
#cfg_file=/etc/nagios/contactgroups.cfg
#cfg_file=/etc/nagios/contacts.cfg
#cfg_file=/etc/nagios/dependencies.cfg
#cfg_file=/etc/nagios/escalations.cfg
#cfg_file=/etc/nagios/hostgroups.cfg
#cfg_file=/etc/nagios/hosts.cfg
#cfg_file=/etc/nagios/services.cfg
#cfg_file=/etc/nagios/timeperiods.cfg
另外,对checkcommands.cfg 及 misccommands.cfg的引用,也要注释掉。
详细配置请见附录。
这是主要的配置文件,文件中定义了相应的command,host,service等。大多采用了模板的形式,只要参照原有的配置,很容易做一些个性化配置的添加。
详细配置请见附录。
这是nagios web页面及cgi的认证文件,需要有htpasswd来生成。
可以采用以下的命令:
htpasswd -c /etc/nagios/htpasswd.users nagiosadmin
输入nagiosadmin的密码就可以了。
首先要修改/etc/httpd/conf/httpd.conf 这个文件,确认有以下的配置在:
Include /etc/httpd/conf.d/nagios.conf
其次,修改 /etc/httpd/conf.d/nagios.conf ,其中各个参数项,可以根据自己的情况修改。确认以下的配置:
ScriptAlias /nagios/cgi-bin "/usr/lib/nagios/cgi"
<Directory "/usr/lib/nagios/cgi">
# SSLRequireSSL
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
# Order deny,allow
# Deny from all
# Allow from 127.0.0.1
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /etc/nagios/htpasswd.users
Require valid-user
</Directory>
Alias /nagios "/usr/share/nagios"
<Directory "/usr/share/nagios">
# SSLRequireSSL
Options None
AllowOverride None
Order allow,deny
Allow from all
# Order deny,allow
# Deny from all
# Allow from 127.0.0.1
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /etc/nagios/htpasswd.users
Require valid-user
</Directory>
全部配置完成之后,就可以启动nagios,可以使用以下的命令启动:
/etc/init.d/nagios start
并使用以下的命令来追踪日志文件:
tail -f /var/log/nagios /nagios.log
一般出现的错误,大多都和错误的配置有关,查一下官方文档,一般都可以解决。
Nagios需要httpd支持,因此需要启动httpd。
/etc/init.d/httpd start
打开浏览器之后,输入主机的ip地加上nagios的路径即可,如果你的ip是192.168.200.172,那么输入http://192.168.200.172/nagios就可以了。
如下图所示:
只要输入正确的用户名/密码,即可登陆。
以下是一些页面的截图:
配置文档列表:
##############################################################################
#
# NAGIOS.CFG - Sample Main Config File for Nagios
#
# Read the documentation for more information on this configuration
# file. I've provided some comments here, but things may not be so
# clear without further explanation.
#
# Last Modified: 11-23-2005
#
##############################################################################
# LOG FILE
# This is the main log file where service and host events are logged
# for historical purposes. This should be the first option specified
# in the config file!!!
log_file=/var/log/nagios/nagios.log
# OBJECT CONFIGURATION FILE(S)
# This is the configuration file in which you define hosts, host
# groups, contacts, contact groups, services, etc. I guess it would
# be better called an object definition file, but for historical
# reasons it isn't. You can split object definitions into several
# different config files by using multiple cfg_file statements here.
# Nagios will read and process all the config files you define.
# This can be very useful if you want to keep command definitions
# separate from host and contact definitions...
# Plugin commands (service and host check commands)
# Arguments are likely to change between different releases of the
# plugins, so you should use the same config file provided with the
# plugin release rather than the one provided with Nagios.
#cfg_file=/etc/nagios/checkcommands.cfg
# Misc commands (notification and event handler commands, etc)
#cfg_file=/etc/nagios/misccommands.cfg
# You can split other types of object definitions across several
# config files if you wish (as done here), or keep them all in a
# single config file.
cfg_file=/etc/nagios/minimal.cfg
#cfg_file=/etc/nagios/contactgroups.cfg
#cfg_file=/etc/nagios/contacts.cfg
#cfg_file=/etc/nagios/dependencies.cfg
#cfg_file=/etc/nagios/escalations.cfg
#cfg_file=/etc/nagios/hostgroups.cfg
#cfg_file=/etc/nagios/hosts.cfg
#cfg_file=/etc/nagios/services.cfg
#cfg_file=/etc/nagios/timeperiods.cfg
# Extended host/service info definitions are now stored along with
# other object definitions:
#cfg_file=/etc/nagios/hostextinfo.cfg
#cfg_file=/etc/nagios/serviceextinfo.cfg
# You can also tell Nagios to process all config files (with a .cfg
# extension) in a particular directory by using the cfg_dir
# directive as shown below:
#cfg_dir=/etc/nagios/servers
#cfg_dir=/etc/nagios/printers
#cfg_dir=/etc/nagios/switches
#cfg_dir=/etc/nagios/routers
# OBJECT CACHE FILE
# This option determines where object definitions are cached when
# Nagios starts/restarts. The CGIs read object definitions from
# this cache file (rather than looking at the object config files
# directly) in order to prevent inconsistencies that can occur
# when the config files are modified after Nagios starts.
object_cache_file=/var/log/nagios/objects.cache
# RESOURCE FILE
# This is an optional resource file that contains $USERx$ macro
# definitions. Multiple resource files can be specified by using
# multiple resource_file definitions. The CGIs will not attempt to
# read the contents of resource files, so information that is
# considered to be sensitive (usernames, passwords, etc) can be
# defined as macros in this file and restrictive permissions (600)
# can be placed on this file.
resource_file=/etc/nagios/resource.cfg
# STATUS FILE
# This is where the current status of all monitored services and
# hosts is stored. Its contents are read and processed by the CGIs.
# The contents of the status file are deleted every time Nagios
# restarts.
status_file=/var/log/nagios/status.dat
# NAGIOS USER
# This determines the effective user that Nagios should run as.
# You can either supply a username or a UID.
nagios_user=nagios
# NAGIOS GROUP
# This determines the effective group that Nagios should run as.
# You can either supply a group name or a GID.
nagios_group=nagios
# EXTERNAL COMMAND OPTION
# This option allows you to specify whether or not Nagios should check
# for external commands (in the command file defined below). By default
# Nagios will *not* check for external commands, just to be on the
# cautious side. If you want to be able to use the CGI command interface
# you will have to enable this. Setting this value to 0 disables command
# checking (the default), other values enable it.
check_external_commands=0
# EXTERNAL COMMAND CHECK INTERVAL
# This is the interval at which Nagios should check for external commands.
# This value works of the interval_length you specify later. If you leave
# that at its default value of 60 (seconds), a value of 1 here will cause
# Nagios to check for external commands every minute. If you specify a
# number followed by an "s" (i.e. 15s), this will be interpreted to mean
# actual seconds rather than a multiple of the interval_length variable.
# Note: In addition to reading the external command file at regularly
# scheduled intervals, Nagios will also check for external commands after
# event handlers are executed.
# NOTE: Setting this value to -1 causes Nagios to check the external
# command file as often as possible.
#command_check_interval=1
#command_check_interval=15s
command_check_interval=-1
# EXTERNAL COMMAND FILE
# This is the file that Nagios checks for external command requests.
# It is also where the command CGI will write commands that are submitted
# by users, so it must be writeable by the user that the web server
# is running as (usually 'nobody'). Permissions should be set at the
# directory level instead of on the file, as the file is deleted every
# time its contents are processed.
command_file=/var/log/nagios/rw/nagios.cmd
# COMMENT FILE
# This is the file that Nagios will use for storing host and service
# comments.
comment_file=/var/log/nagios/comments.dat
# DOWNTIME FILE
# This is the file that Nagios will use for storing host and service
# downtime data.
downtime_file=/var/log/nagios/downtime.dat
# LOCK FILE
# This is the lockfile that Nagios will use to store its PID number
# in when it is running in daemon mode.
lock_file=/var/run/nagios.pid
# TEMP FILE
# This is a temporary file that is used as scratch space when Nagios
# updates the status log, cleans the comment file, etc. This file
# is created, used, and deleted throughout the time that Nagios is
# running.
temp_file=/var/log/nagios/nagios.tmp
# EVENT BROKER OPTIONS
# Controls what (if any) data gets sent to the event broker.
# Values: 0 = Broker nothing
# -1 = Broker everything
# <other> = See documentation
event_broker_options=-1
# EVENT BROKER MODULE(S)
# This directive is used to specify an event broker module that should
# by loaded by Nagios at startup. Use multiple directives if you want
# to load more than one module. Arguments that should be passed to
# the module at startup are seperated from the module path by a space.
#
# Example:
#
# broker_module=<modulepath> [moduleargs]
#broker_module=/somewhere/module1.o
#broker_module=/somewhere/module2.o arg1 arg2=3 debug=0
# LOG ROTATION METHOD
# This is the log rotation method that Nagios should use to rotate
# the main log file. Values are as follows..
# n = None - don't rotate the log
# h = Hourly rotation (top of the hour)
# d = Daily rotation (midnight every day)
# w = Weekly rotation (midnight on Saturday evening)
# m = Monthly rotation (midnight last day of month)
log_rotation_method=d
# LOG ARCHIVE PATH
# This is the directory where archived (rotated) log files should be
# placed (assuming you've chosen to do log rotation).
log_archive_path=/var/log/nagios/archives
# LOGGING OPTIONS
# If you want messages logged to the syslog facility, as well as the
# NetAlarm log file set this option to 1. If not, set it to 0.
use_syslog=1
# NOTIFICATION LOGGING OPTION
# If you don't want notifications to be logged, set this value to 0.
# If notifications should be logged, set the value to 1.
log_notifications=1
# SERVICE RETRY LOGGING OPTION
# If you don't want service check retries to be logged, set this value
# to 0. If retries should be logged, set the value to 1.
log_service_retries=1
# HOST RETRY LOGGING OPTION
# If you don't want host check retries to be logged, set this value to
# 0. If retries should be logged, set the value to 1.
log_host_retries=1
# EVENT HANDLER LOGGING OPTION
# If you don't want host and service event handlers to be logged, set
# this value to 0. If event handlers should be logged, set the value
# to 1.
log_event_handlers=1
# INITIAL STATES LOGGING OPTION
# If you want Nagios to log all initial host and service states to
# the main log file (the first time the service or host is checked)
# you can enable this option by setting this value to 1. If you
# are not using an external application that does long term state
# statistics reporting, you do not need to enable this option. In
# this case, set the value to 0.
log_initial_states=0
# EXTERNAL COMMANDS LOGGING OPTION
# If you don't want Nagios to log external commands, set this value
# to 0. If external commands should be logged, set this value to 1.
# Note: This option does not include logging of passive service
# checks - see the option below for controlling whether or not
# passive checks are logged.
log_external_commands=1
# PASSIVE CHECKS LOGGING OPTION
# If you don't want Nagios to log passive host and service checks, set
# this value to 0. If passive checks should be logged, set
# this value to 1.
log_passive_checks=1
# GLOBAL HOST AND SERVICE EVENT HANDLERS
# These options allow you to specify a host and service event handler
# command that is to be run for every host or service state change.
# The global event handler is executed immediately prior to the event
# handler that you have optionally specified in each host or
# service definition. The command argument is the short name of a
# command definition that you define in your host configuration file.
# Read the HTML docs for more information.
#global_host_event_handler=somecommand
#global_service_event_handler=somecommand
# SERVICE INTER-CHECK DELAY METHOD
# This is the method that Nagios should use when initially
# "spreading out" service checks when it starts monitoring. The
# default is to use smart delay calculation, which will try to
# space all service checks out evenly to minimize CPU load.
# Using the dumb setting will cause all checks to be scheduled
# at the same time (with no delay between them)! This is not a
# good thing for production, but is useful when testing the
# parallelization functionality.
# n = None - don't use any delay between checks
# d = Use a "dumb" delay of 1 second between checks
# s = Use "smart" inter-check delay calculation
# x.xx = Use an inter-check delay of x.xx seconds
service_inter_check_delay_method=s
# MAXIMUM SERVICE CHECK SPREAD
# This variable determines the timeframe (in minutes) from the
# program start time that an initial check of all services should
# be completed. Default is 30 minutes.
max_service_check_spread=30
# SERVICE CHECK INTERLEAVE FACTOR
# This variable determines how service checks are interleaved.
# Interleaving the service checks allows for a more even
# distribution of service checks and reduced load on remote
# hosts. Setting this value to 1 is equivalent to how versions
# of Nagios previous to 0.0.5 did service checks. Set this
# value to s (smart) for automatic calculation of the interleave
# factor unless you have a specific reason to change it.
# s = Use "smart" interleave factor calculation
# x = Use an interleave factor of x, where x is a
# number greater than or equal to 1.
service_interleave_factor=s
# HOST INTER-CHECK DELAY METHOD
# This is the method that Nagios should use when initially
# "spreading out" host checks when it starts monitoring. The
# default is to use smart delay calculation, which will try to
# space all host checks out evenly to minimize CPU load.
# Using the dumb setting will cause all checks to be scheduled
# at the same time (with no delay between them)!
# n = None - don't use any delay between checks
# d = Use a "dumb" delay of 1 second between checks
# s = Use "smart" inter-check delay calculation
# x.xx = Use an inter-check delay of x.xx seconds
host_inter_check_delay_method=s
# MAXIMUM HOST CHECK SPREAD
# This variable determines the timeframe (in minutes) from the
# program start time that an initial check of all hosts should
# be completed. Default is 30 minutes.
max_host_check_spread=30
# MAXIMUM CONCURRENT SERVICE CHECKS
# This option allows you to specify the maximum number of
# service checks that can be run in parallel at any given time.
# Specifying a value of 1 for this variable essentially prevents
# any service checks from being parallelized. A value of 0
# will not restrict the number of concurrent checks that are
# being executed.
max_concurrent_checks=0
# SERVICE CHECK REAPER FREQUENCY
# This is the frequency (in seconds!) that Nagios will process
# the results of services that have been checked.
service_reaper_frequency=10
# AUTO-RESCHEDULING OPTION
# This option determines whether or not Nagios will attempt to
# automatically reschedule active host and service checks to
# "smooth" them out over time. This can help balance the load on
# the monitoring server.
# WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
# PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
auto_reschedule_checks=0
# AUTO-RESCHEDULING INTERVAL
# This option determines how often (in seconds) Nagios will
# attempt to automatically reschedule checks. This option only
# has an effect if the auto_reschedule_checks option is enabled.
# Default is 30 seconds.
# WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
# PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
auto_rescheduling_interval=30
# AUTO-RESCHEDULING WINDOW
# This option determines the "window" of time (in seconds) that
# Nagios will look at when automatically rescheduling checks.
# Only host and service checks that occur in the next X seconds
# (determined by this variable) will be rescheduled. This option
# only has an effect if the auto_reschedule_checks option is
# enabled. Default is 180 seconds (3 minutes).
# WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
# PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
auto_rescheduling_window=180
# SLEEP TIME
# This is the number of seconds to sleep between checking for system
# events and service checks that need to be run.
sleep_time=0.25
# TIMEOUT VALUES
# These options control how much time Nagios will allow various
# types of commands to execute before killing them off. Options
# are available for controlling maximum time allotted for
# service checks, host checks, event handlers, notifications, the
# ocsp command, and performance data commands. All values are in
# seconds.
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
# RETAIN STATE INFORMATION
# This setting determines whether or not Nagios will save state
# information for services and hosts before it shuts down. Upon
# startup Nagios will reload all saved service and host state
# information before starting to monitor. This is useful for
# maintaining long-term data on state statistics, etc, but will
# slow Nagios down a bit when it (re)starts. Since its only
# a one-time penalty, I think its well worth the additional
# startup delay.
retain_state_information=1
# STATE RETENTION FILE
# This is the file that Nagios should use to store host and
# service state information before it shuts down. The state
# information in this file is also read immediately prior to
# starting to monitor the network when Nagios is restarted.
# This file is used only if the preserve_state_information
# variable is set to 1.
state_retention_file=/var/log/nagios/retention.dat
# RETENTION DATA UPDATE INTERVAL
# This setting determines how often (in minutes) that Nagios
# will automatically save retention data during normal operation.
# If you set this value to 0, Nagios will not save retention
# data at regular interval, but it will still save retention
# data before shutting down or restarting. If you have disabled
# state retention, this option has no effect.
retention_update_interval=60
# USE RETAINED PROGRAM STATE
# This setting determines whether or not Nagios will set
# program status variables based on the values saved in the
# retention file. If you want to use retained program status
# information, set this value to 1. If not, set this value
# to 0.
use_retained_program_state=1
# USE RETAINED SCHEDULING INFO
# This setting determines whether or not Nagios will retain
# the scheduling info (next check time) for hosts and services
# based on the values saved in the retention file. If you
# If you want to use retained scheduling info, set this
# value to 1. If not, set this value to 0.
use_retained_scheduling_info=0
# INTERVAL LENGTH
# This is the seconds per unit interval as used in the
# host/contact/service configuration files. Setting this to 60 means
# that each interval is one minute long (60 seconds). Other settings
# have not been tested much, so your mileage is likely to vary...
interval_length=60
# AGGRESSIVE HOST CHECKING OPTION
# If you don't want to turn on aggressive host checking features, set
# this value to 0 (the default). Otherwise set this value to 1 to
# enable the aggressive check option. Read the docs for more info
# on what aggressive host check is or check out the source code in
# base/checks.c
use_aggressive_host_checking=0
# SERVICE CHECK EXECUTION OPTION
# This determines whether or not Nagios will actively execute
# service checks when it initially starts. If this option is
# disabled, checks are not actively made, but Nagios can still
# receive and process passive check results that come in. Unless
# you're implementing redundant hosts or have a special need for
# disabling the execution of service checks, leave this enabled!
# Values: 1 = enable checks, 0 = disable checks
execute_service_checks=1
# PASSIVE SERVICE CHECK ACCEPTANCE OPTION
# This determines whether or not Nagios will accept passive
# service checks results when it initially (re)starts.
# Values: 1 = accept passive checks, 0 = reject passive checks
accept_passive_service_checks=1
# HOST CHECK EXECUTION OPTION
# This determines whether or not Nagios will actively execute
# host checks when it initially starts. If this option is
# disabled, checks are not actively made, but Nagios can still
# receive and process passive check results that come in. Unless
# you're implementing redundant hosts or have a special need for
# disabling the execution of host checks, leave this enabled!
# Values: 1 = enable checks, 0 = disable checks
execute_host_checks=1
# PASSIVE HOST CHECK ACCEPTANCE OPTION
# This determines whether or not Nagios will accept passive
# host checks results when it initially (re)starts.
# Values: 1 = accept passive checks, 0 = reject passive checks
accept_passive_host_checks=1
# NOTIFICATIONS OPTION
# This determines whether or not Nagios will sent out any host or
# service notifications when it is initially (re)started.
# Values: 1 = enable notifications, 0 = disable notifications
enable_notifications=1
# EVENT HANDLER USE OPTION
# This determines whether or not Nagios will run any host or
# service event handlers when it is initially (re)started. Unless
# you're implementing redundant hosts, leave this option enabled.
# Values: 1 = enable event handlers, 0 = disable event handlers
enable_event_handlers=1
# PROCESS PERFORMANCE DATA OPTION
# This determines whether or not Nagios will process performance
# data returned from service and host checks. If this option is
# enabled, host performance data will be processed using the
# host_perfdata_command (defined below) and service performance
# data will be processed using the service_perfdata_command (also
# defined below). Read the HTML docs for more information on
# performance data.
# Values: 1 = process performance data, 0 = do not process performance data
process_performance_data=0
# HOST AND SERVICE PERFORMANCE DATA PROCESSING COMMANDS
# These commands are run after every host and service check is
# performed. These commands are executed only if the
# enable_performance_data option (above) is set to 1. The command
# argument is the short name of a command definition that you
# define in your host configuration file. Read the HTML docs for
# more information on performance data.
#host_perfdata_command=process-host-perfdata
#service_perfdata_command=process-service-perfdata
# HOST AND SERVICE PERFORMANCE DATA FILES
# These files are used to store host and service performance data.
# Performance data is only written to these files if the
# enable_performance_data option (above) is set to 1.
<