Thursday, 31 March 2011

Check Path

echo $PATH  (python is in /usr/bin/python)

SSH setup - between 2 Linux boxes without passwords

On Nagios Server

I created /home/nagios/.ssh/

root@Nagi:/home/nagios# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa): /home/nagios/.ssh/id-dsa
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/nagios/.ssh/id-dsa.
Your public key has been saved in /home/nagios/.ssh/id-dsa.pub.
The key fingerprint is:
d2:f1:3b:2e:8a:8c:3a:db:33:81:ec:57:7e:0a:88:37 root@Nagi
The key's randomart image is:
+--[ DSA 1024]----+
|                 |
|                 |
|        .        |
|       . o       |
|..    . S .      |
|o.o  . .   .     |
|o.Eoo     o      |
|.+o=.o ... .     |
|o++oo.+. ..      |
+-----------------+
root@Nagi:/home/nagios#

Nagios Error: Check command not defined anywhere!


Error: Service check command 'check_disk_remote' specified in service 'check_disk_remote' for host 'Nagios-CPT' not defined anywhere!

Location check:

** Plugin located in /usr/local/nagios/libexec/
** commands.cfg --> is the command defined?
** services.cfg --> is the service defined to be used with a host?
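
The second check above can be scripted. This is only a rough sketch in Python, not part of Nagios: it scans config text for "define command" blocks and collects the command_name values, so you can see whether the command a service refers to exists. The sample command_line is my own assumption of what a check_disk_remote definition might look like.

```python
def defined_commands(cfg_text):
    """Return the set of command_name values found in 'define command' blocks."""
    names = set()
    in_command = False
    for raw in cfg_text.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop trailing ';' comments
        if not line or line.startswith("#"):
            continue
        if line.startswith("define"):
            # Accept both "define command{" and "define command {"
            kind = line[len("define"):].strip().rstrip("{").strip()
            in_command = (kind == "command")
        elif line.startswith("}"):
            in_command = False
        elif in_command:
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0] == "command_name":
                names.add(parts[1].strip())
    return names


def command_of(check_command):
    """A check_command looks like 'name!arg1!arg2'; return just the name."""
    return check_command.split("!", 1)[0].strip()


# Hypothetical commands.cfg content (the command_line is an assumption).
SAMPLE_COMMANDS_CFG = """
define command{
        command_name    check_disk_remote
        command_line    $USER1$/check_disk_remote -e ssh -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$
        }
"""

known = defined_commands(SAMPLE_COMMANDS_CFG)
for wanted in ("check_disk_remote", "check_nrpe2!check_smart_ad0"):
    name = command_of(wanted)
    print(name, "defined" if name in known else "not defined anywhere")
```

To run it against a real file, read the file into a string first and pass that to defined_commands().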

Nagios - check_disk_remote example

superman@Nagi:/usr/local/nagios/libexec$ ./check_disk_remote -e ssh -H 10.0.0.110 -w 90 -c 95 -v
superman@10.0.0.110's password:
superman@10.0.0.110's password:
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/sda1             73986264   1480204  68747724       3% /
percent = 3% warn=90 crit=95
none                    504728       236    504492       1% /dev
none                    508936         0    508936       0% /dev/shm
none                    508936       280    508656       1% /var/run
none                    508936         0    508936       0% /var/lock
none                    508936         0    508936       0% /lib/init/rw
none                  73986264   1480204  68747724       3% /var/lib/ureadahead/debugfs
OK: All Filesystems are below threshold (90/95%) | /=3%;;;0;100
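
To make the warn/crit logic in that output easier to follow, here is a small Python sketch of the comparison. It is not the plugin's own code, just my assumption of how the thresholds apply: it reads the Capacity column of df-style output and returns the usual Nagios status code (0 OK, 1 WARNING, 2 CRITICAL).

```python
def check_df(df_output, warn, crit):
    """Classify df-style output against warn/crit percentages.

    Returns (status, usage): status is 0 (OK), 1 (WARNING) or 2 (CRITICAL),
    usage maps each mount point to its used percentage.
    """
    worst = 0
    usage = {}
    for line in df_output.splitlines()[1:]:      # skip the header row
        fields = line.split()
        # Expect: filesystem, blocks, used, available, capacity%, mount point
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        percent = int(fields[4].rstrip("%"))
        usage[fields[5]] = percent
        if percent >= crit:
            worst = max(worst, 2)
        elif percent >= warn:
            worst = max(worst, 1)
    return worst, usage


# A few lines taken from the session above.
SAMPLE_DF = """\
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/sda1             73986264   1480204  68747724       3% /
none                    504728       236    504492       1% /dev
none                    508936         0    508936       0% /dev/shm
"""

status, usage = check_df(SAMPLE_DF, warn=90, crit=95)
print(status, usage)
```

Mount points containing spaces would confuse this simple split, so treat it as an illustration only.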

Nagios - Smartmon monitoring

CHECK_SMARTMON

Define services

/usr/local/etc/nagios/objects/services.cfg

# SMART ad0
define service {
        use                             generic-service
        host_name                       host1,host2,host3
        service_description             nrpe_check_smart_ad0
        check_command                   check_nrpe2!check_smart_ad0
}

# SMART ad1
define service {
        use                             generic-service
        host_name                       host2
        service_description             nrpe_check_smart_ad1
        check_command                   check_nrpe2!check_smart_ad1
}

Edit sudoers
I add the following to /usr/local/etc/sudoers on the servers being monitored:
nagios          ALL=(ALL) NOPASSWD: /usr/local/libexec/nagios/check_smartmon -d /dev/ad*
nagios          ALL=(ALL) NOPASSWD: /usr/local/libexec/nagios/check_smartmon -d /dev/da*
Add to nrpe.cfg on the servers being monitored
command[check_smart_ad0]=/usr/local/bin/sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ad0
command[check_smart_ad1]=/usr/local/bin/sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ad1
 
ERROR:
superman@Nagi:/usr/local/nagios/libexec$ ./check_smartmon ?
-bash: ./check_smartmon: /usr/local/bin/python: bad interpreter: No such file or directory
If check_smartmon doesn't work, change the Python path in your check_smartmon file: replace the top
line, which points to /usr/local/bin/python, with one that points to /usr/bin/python.
= Installation =
Adjust the first line to your Python binary (e.g. /usr/local/bin/python or
/usr/bin/python) and the path to your smartctl binary (e.g.
/usr/local/sbin/smartctl or /usr/sbin/smartctl).
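
If you have to do this on several boxes, the first-line fix can be scripted. A minimal sketch in Python, assuming the script really starts with a "#!" line; fix_shebang is a name I made up, and it drops any arguments that followed the old interpreter path.

```python
def fix_shebang(script_text, interpreter="/usr/bin/python"):
    """Return script_text with its '#!' first line pointing at interpreter.

    Text that does not start with '#!' is returned unchanged.
    """
    lines = script_text.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        newline = "\n" if lines[0].endswith("\n") else ""
        lines[0] = "#!" + interpreter + newline
    return "".join(lines)


# Stand-in for the contents of check_smartmon (not the real file).
old_script = "#!/usr/local/bin/python\nimport os\n"
print(fix_shebang(old_script), end="")
```

To apply it for real, read the check_smartmon file into a string, pass it through fix_shebang() and write the result back (keep a backup copy first).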
 
Install Smartmon
sudo apt-get install smartmontools 
 
Check_smartmon Examples: 

Tuesday, 29 March 2011

Nagios Error: Host has no default contacts or contactgroups defined

Warning: Host 'BLALBA' has no default contacts or contactgroups defined!

host definition:


define host{
        use             generic-host
        host_name       Scopserve
        alias           Scopserve
        address         172.18.0.30
        }

Error - generic-host doesn't have a contact or contactgroup defined.

Set up a correct template/hostgroup definition and "use" that definition, i.e.:



templates.cfg

define host{
        name                    CPT-PBX ; The name of this host template
        use                     generic-host    ; Inherit default values from the generic-host template
        check_period            24x7            ; By default, switches are monitored round the clock
        check_interval          5               ; Switches are checked every 5 minutes
        retry_interval          1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts      10              ; Check each switch 10 times (max)
        check_command           check-host-alive        ; Default command to check if routers are "alive"
        notification_period     24x7            ; Send notifications at any time
        notification_interval   30              ; Resend notifications every 30 minutes
        notification_options    d,r             ; Only send notifications for specific host states
        contact_groups          admins          ; Notifications get sent to the admins by default
        register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
        }

New host definition using "CPT-PBX"

define host{
        use             CPT-PBX
        host_name       Scopserve
        alias           Scopserve
        address         172.18.0.30
        }
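
Why this fixes the warning: the host picks up contact_groups through the "use" chain. Below is a rough Python model of that inheritance, not Nagios code. It assumes a single parent per "use" (Nagios also accepts a comma-separated list) and simply drops the name/use/register bookkeeping directives when copying from a parent.

```python
NOT_COPIED = {"name", "use", "register"}


def effective(obj, templates):
    """Merge an object with everything it inherits through its 'use' chain."""
    merged = {}
    parent = obj.get("use")
    if parent is not None:
        inherited = effective(templates[parent], templates)
        merged.update({k: v for k, v in inherited.items() if k not in NOT_COPIED})
    merged.update(obj)          # local values override inherited ones
    return merged


def has_contacts(obj, templates):
    """True if the object ends up with contacts or contact_groups."""
    merged = effective(obj, templates)
    return "contacts" in merged or "contact_groups" in merged


# Cut-down versions of the definitions in this post.
TEMPLATES = {
    "generic-host": {"name": "generic-host", "register": "0"},
    "CPT-PBX": {"name": "CPT-PBX", "use": "generic-host",
                "contact_groups": "admins", "register": "0"},
}
OLD_HOST = {"use": "generic-host", "host_name": "Scopserve"}
NEW_HOST = {"use": "CPT-PBX", "host_name": "Scopserve"}

print(has_contacts(OLD_HOST, TEMPLATES))   # the definition that triggered the warning
print(has_contacts(NEW_HOST, TEMPLATES))   # the definition using the new template
```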

Nagios error: "Invalid_Max_Check_Attempts"

Error: Invalid max_check_attempts value for host 'CBD-DC'
Error: Could not register host (config file '/usr/local/nagios/etc/objects/windows.cfg', starting on line 2)
   Error processing object config files!



Check:
*  Does your generic-host template specify a valid max_check_attempts
value? If not, you'll need to add it there or to the host definition
itself.
* Host defined in windows.cfg
* Check hostgroups.cfg defined in /usr/local/nagios/etc/nagios.cfg
* Check hostgroup defined in /usr/local/nagios/etc/objects/templates.cfg

Friday, 25 March 2011

Nagios - Monitoring Eventlogs on Windows Servers (My Comprehensive Guide)

Monitor DNS events on Windows Servers

  • Copy eventlog_agent files to c:\
  • Create folder on c:\ called "programme" 
  • Create subfolder "eventlog_agent"
  • Copy the eventlog_agent files (.exe, .bat, .reg) to c:\programme\eventlog_agent\
  • Run eventlog_agent.exe (if doing it manually)
http://naplax.sourceforge.net/install_check_win_eventlog.txt
  • Create /usr/local/nagios/etc/objects/eventlogs.cfg
  • Add "eventlogs.cfg" to nagios.cfg
  • Add hosts to eventlogs.cfg
Contents - eventlogs.cfg

define service{
       service_description    System Eventlog
       use                    generic-service
       check_command          check_win_eventlog!a!System!.*:+1
       max_check_attempts     1
       host_name              Recruit
       contact_groups         admins
       is_volatile            1
}

define service{
       service_description    DNS Eventlog
       use                    generic-service
       check_command          check_win_eventlog!a!DNS!.*:+1
       max_check_attempts     1
       host_name              Recruit
       contact_groups         admins
       is_volatile            1
}

define service{
       service_description    Directory Service Eventlog
       use                    generic-service
       check_command          check_win_eventlog!a!Directory Service!.*:+1
       max_check_attempts     1
       host_name              Recruit
       contact_groups         admins
       is_volatile            1
}

define service{
       service_description    File Replication Service Eventlog
       use                    generic-service
       check_command          check_win_eventlog!a!File Replication Service!.*:+1
       max_check_attempts     1
       host_name              Recruit
       contact_groups         admins
       is_volatile            1
}
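
The four blocks only differ in the event log name, so they can be generated instead of typed, which also avoids forgetting a closing brace. A small Python sketch; SERVICE_TEMPLATE and eventlog_services are names I made up, and the host and log names are the ones used above.

```python
# One service block; doubled braces are literal braces for str.format().
SERVICE_TEMPLATE = """define service{{
       service_description    {log} Eventlog
       use                    generic-service
       check_command          check_win_eventlog!a!{log}!.*:+1
       max_check_attempts     1
       host_name              {host}
       contact_groups         admins
       is_volatile            1
}}
"""


def eventlog_services(host, logs):
    """Return one 'define service' block per event log name."""
    return "\n".join(SERVICE_TEMPLATE.format(log=log, host=host) for log in logs)


LOGS = ["System", "DNS", "Directory Service", "File Replication Service"]
print(eventlog_services("Recruit", LOGS))
```

Paste the printed text into eventlogs.cfg, or redirect the output to the file.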

The bits in red need to be filled in correctly.

Errors

If eventlog_agent.exe is not running you'll get this error message:


Current Status: CRITICAL (for 0d 0h 1m 57s)
Status Information: An Error occured before state could be read: Connection refused at /usr/local/nagios/libexec/check_win_eventlog.pl line 145.


If errors continue - restart the .exe running on the host

To automate & install the .exe as a service

You will need 'instsrv.exe' and 'srvany.exe' from Microsoft Resource Kit.
Just copy those files together with 'eventlog_agent.exe', 'eventlog_agent.bat' and
'eventlog_agent.reg' into the folder 'c:\programme\eventlog_agent' and run the
batch file. If you want to use a different folder, then you will need to modify
the path in 'eventlog_agent.bat' and 'eventlog_agent.reg'
Autostart

You may put the exe into your system's Autostart folder, but this requires that
someone is logged in.

Uninstall the eventlog_agent
If you used installation method a) or c) (running the .exe manually or from the Autostart folder), you can just delete the files.
If you used installation method b) (installed as a service), go into the installation directory
and call "eventlog_agent.bat stop" on the console.

Thursday, 24 March 2011

Using Nagios to monitor Zimbra Servers

Monitoring Zimbra mail queues with Nagios

edit
vi /usr/local/nagios/libexec/utils.pm

remove
$PATH_TO_MAILQ   = "/usr/bin/mailq";

Add
$PATH_TO_MAILQ  ="/opt/zimbra/postfix/sbin/mailq";

Test
/usr/local/nagios/libexec# /usr/local/nagios/libexec/check_mailq xxx.xxx.xxx.xxx 1 -w 100 -c 150

Error
root@Nagi:/usr/local/nagios/libexec# /usr/local/nagios/libexec/check_mailq 10.0.0.251 -w 100 -c 150
ERROR: /opt/zimbra/postfix/sbin/mailq is not executable by (uid 0:gid(0 0))


Fix Error


edit
vi /etc/sudoers
nagios ALL=(zimbra) NOPASSWD: /usr/local/nagios/libexec/check_clamav.pl
nagios ALL=(zimbra) NOPASSWD: /usr/local/nagios/libexec/check_mail




NRPE Checks

command[check_zimbra_route_lookup_handler]=/usr/lib/nagios/plugins/check_http -H localhost -p 7072
command[check_zimbra_spell_checker]=/usr/lib/nagios/plugins/check_http -H localhost -p 7780
command[check_zimbra_pop3_real]=/usr/lib/nagios/plugins/check_pop -H localhost -p 7110
command[check_zimbra_pop3s_real]=/usr/lib/nagios/plugins/check_pop -H localhost -p 7995 -S
command[check_zimbra_imap_real]=/usr/lib/nagios/plugins/check_imap -H localhost -p 7143
command[check_zimbra_imaps_real]=/usr/lib/nagios/plugins/check_imap -H localhost -p 7993 -S
command[check_zimbra_mailq]=/usr/lib/nagios/plugins/check_mailq -w 100 -c 150 -M postfix
command[check_zimbra_clamd]=/usr/lib/nagios/plugins/check_clamd -H localhost
command[check_zimbra_mysql]=/usr/lib/nagios/plugins/check_mysql -s /opt/zimbra/db/mysql.sock
command[check_zimbra_mysql_logger]=/usr/lib/nagios/plugins/check_mysql -s /opt/zimbra/logger/db/mysql.sock
command[check_zimbra_amavisd]=/usr/lib/nagios/plugins/check_smtp -H localhost -p 10024 -e '220 [127.0.0.1] ESMTP amavisd-new service ready'
command[check_zimbra_lmtp]=/usr/lib/nagios/plugins/check_smtp -H localhost -p 7025 -e '220 zimbra.example.com Zimbra LMTP ready'
command[check_zimbra_postfix_amavis]=/usr/lib/nagios/plugins/check_smtp -H localhost -p 10025 -e '220 zimbra.example.com ESMTP Postfix'
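
With this many entries it is easy to call a check name from the Nagios server that does not exist on the Zimbra box. A rough Python sketch that reads lines in the command[name]=command line format above into a dictionary, so the names can be listed or compared; parse_nrpe_commands is my own helper name, not part of NRPE.

```python
import re

# command[<name>]=<command line>; the name ends at the first ']'.
NRPE_COMMAND = re.compile(r"^command\[([^\]]+)\]\s*=\s*(.+)$")


def parse_nrpe_commands(cfg_text):
    """Map each NRPE command name to its command line."""
    commands = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = NRPE_COMMAND.match(line)
        if match:
            commands[match.group(1)] = match.group(2)
    return commands


# Three of the entries listed above.
SAMPLE_NRPE = """\
command[check_zimbra_mailq]=/usr/lib/nagios/plugins/check_mailq -w 100 -c 150 -M postfix
command[check_zimbra_clamd]=/usr/lib/nagios/plugins/check_clamd -H localhost
command[check_zimbra_amavisd]=/usr/lib/nagios/plugins/check_smtp -H localhost -p 10024 -e '220 [127.0.0.1] ESMTP amavisd-new service ready'
"""

for name in sorted(parse_nrpe_commands(SAMPLE_NRPE)):
    print(name)
```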

check_clamav.pl
command[check_zimbra_clamd_sig]=sudo -u zimbra /usr/lib/nagios/plugins/contrib/check_clamav.pl -w 3 -c 5

/etc/sudoers
nagios ALL=(zimbra)     NOPASSWD: /usr/lib/nagios/plugins/contrib/check_clamav.pl

Validate SSL Cert
service_description     Zimbra SSL Certificate
command_line    $USER1$/check_http -S -H zimbra.example.com -C 10

Check LDAP
service_description     Zimbra LDAP
check_command   check_ldap_with_HOST!zimbra.example.com!dc=de

Monday, 21 March 2011

Event ID 13508, Source: NtFrs

The File Replication Service is having trouble enabling replication from BELL-AD-PRIMARY to SERVER1 for c:\windows\sysvol\domain using the DNS name Bell-AD-Primary.ambition24.com. FRS will keep retrying.
 Following are some of the reasons you would see this warning.

 [1] FRS can not correctly resolve the DNS name Bell-AD-Primary.ambition24.com from this computer.
 [2] FRS is not running on Bell-AD-Primary.ambition24.com.
 [3] The topology information in the Active Directory for this replica has not yet replicated to all the Domain Controllers.

Troubleshooting:
* Check if SYSVOL is shared by using the "net share" command on all servers.
* Is the FRS running on all servers?
* Run dcdiag and netdiag on both servers.
* To check replication, run repadmin /showreps >rep.txt.

Sunday, 6 March 2011

Cisco Router Startup

On startup the System Bootstrap (BootStrap) process:

  1. Runs the POST
  2. Finds the IOS image (by default the router loads it from flash memory)
  3. Loads the IOS, which looks for a valid configuration ("startup-config") stored in NVRAM
  4. Once the IOS is loaded, the POST information is displayed
  5. If no startup-config is found in NVRAM, the router goes into "setup mode"

Acronyms

POST - Power on Self Test
IOS - Internetwork Operating System
EEPROM - Electrically Erasable Programmable Read-Only Memory
NVRAM - Non-Volatile Random Access Memory

Cisco Router - Online Simulator

Cisco Online Router Sim
http://www.techexams.net/testsim/techsim.php#