We maintain a wide variety of hardware and systems, not only for our nonprofit clients but also for ourselves. One challenge of that is connecting disparate systems to a single monitoring platform. I am migrating all of our monitoring infrastructure from Icinga1 (https://icinga.com/docs/icinga1/latest/en/) to Icinga2 (https://icinga.com/), which has built-in support for the Debian and Windows systems that we administer and for most of the services that run on them. There are, however, some aspects of these systems that need to be monitored and have no existing check. For those, I write monitoring scripts that run commands, parse the output, and report back to the monitoring system.
The most recent instance of this was setting up monitoring of Windows Server Azure backups using PowerShell, the Windows scripting language. As in most situations, this work was built on generously shared code from others. I was already using a PowerShell script (https://github.com/juangranados/nagios-plugins/blob/master/check_wsb.ps1), developed by GitHub user juangranados (https://github.com/juangranados), to check the Windows Server local backups. I took code from that script and from a script to check Azure Backup Status (https://github.com/hkarthik7/Common-PowerShell-Scripts/blob/master/Get-A...) by Harish Karthic (https://github.com/hkarthik7), and developed my own script (https://github.com/mxroo/nagios-plugins/blob/master/check_cloudbackup.ps1). For this client server, we want to make sure the Windows backups to Azure have run within the last 48 hours and that they completed successfully.
<#
.SYNOPSIS
Check Windows Azure Backup last scheduled job status.
.DESCRIPTION
Check Windows Azure Backup and returns Nagios output and code.
.PARAMETER Hours
Number of hours since now to check for backup jobs. Default 48.
.OUTPUTS
OK: All last backups jobs within $Hours successful.
CRITICAL: Backup job failed.
.EXAMPLE
.\check_cloudbackup.ps1 -Hours 96

Based on check_wsb.ps1 by Juan Granados https://github.com/juangranados/nagios-plugins
#>
Param(
    [Parameter(Mandatory=$false,Position=0)]
    [ValidateNotNullOrEmpty()]
    [int]$Hours=48
)

# Set specific subscription
Select-AzSubscription -Subscription "XXXX" | Out-Null

# Get Vault
$Vault = Get-AzRecoveryServicesVault

# Get backup status
try{
    $BackupStatus = Get-AzRecoveryServicesBackupJob -VaultId $Vault.ID -ErrorAction Stop
}catch{
    Write-Output "UNKNOWN: Could not get Windows Azure Backup"
    $host.SetShouldExit(3)
}

# If backup status exists
if ($BackupStatus){
    # Go through each backup retrieved
    Foreach($Status in $BackupStatus) {
        # Check last backup
        $LastSuccessfulBackupTime = ($Status.EndTime).Date
        # If there is a last backup
        If ($LastSuccessfulBackupTime){
            # If last backup has been performed in time and its result is ok, exit OK with last backup date
            If ( (($Status.EndTime).Date -ge (get-date).AddHours(-$($Hours))) -and $Status.Status -eq 'Completed'){
                Write-Output "OK: last backup date $($Status.EndTime)."
                $host.SetShouldExit(0)
            }
            # If last backup was not performed in time, exit WARNING with last backup date
            ElseIf ( ($Status.EndTime).Date -le (get-date).AddHours(-$($Hours)) ){
                Write-Output "WARNING: last backup date $($Status.EndTime)."
                $host.SetShouldExit(1)
                break
            }
            # If last backup failed, exit CRITICAL with last backup date
            ElseIf ( ($Status.Status) -eq 'Failed' ){
                Write-Output "CRITICAL: Last backup ending $($Status.EndTime) failed."
                $host.SetShouldExit(2)
                break
            }
            # If none of the above tests work, exit UNKNOWN
            Else{
                Write-Output "UNKNOWN: Unknown status"
                $host.SetShouldExit(3)
            }
        }
        # If there isn't a backup time, exit CRITICAL
        else{
            Write-Output "CRITICAL: There is not any successful backup yet."
            $host.SetShouldExit(2)
        }
    }
}
# If no backup information, exit UNKNOWN
Else{
    Write-Output "UNKNOWN: Could not get Windows Server Backup information."
    $host.SetShouldExit(3)
}
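The decision the script makes for each job (OK, WARNING, CRITICAL, or UNKNOWN) can also be written as a small function that is easy to try without an Azure connection. The following is only a sketch, not part of check_cloudbackup.ps1: the function name Get-BackupCheckResult and its -Now parameter are hypothetical, and it assumes the job objects expose the same EndTime and Status properties the script reads. Unlike the script, it looks only at the most recently finished job and compares full timestamps rather than dates.

```powershell
# Hypothetical helper, not part of check_cloudbackup.ps1: maps the most
# recently finished backup job to a Nagios exit code and message.
function Get-BackupCheckResult {
    param(
        [object[]]$Jobs,               # objects with EndTime and Status properties
        [int]$Hours = 48,              # maximum accepted age of the last backup
        [datetime]$Now = (Get-Date)    # injectable clock, so the logic can be tested
    )

    # Jobs that have not finished have no EndTime, so they are skipped here
    $Latest = $Jobs |
        Where-Object { $_.EndTime } |
        Sort-Object -Property EndTime -Descending |
        Select-Object -First 1

    if (-not $Latest) {
        return [pscustomobject]@{ Code = 2; Message = 'CRITICAL: There is not any finished backup yet.' }
    }

    $Cutoff = $Now.AddHours(-$Hours)

    # Same order of tests as the script: OK, then WARNING, then CRITICAL
    if ($Latest.EndTime -ge $Cutoff -and $Latest.Status -eq 'Completed') {
        return [pscustomobject]@{ Code = 0; Message = "OK: last backup date $($Latest.EndTime)." }
    }
    if ($Latest.EndTime -lt $Cutoff) {
        return [pscustomobject]@{ Code = 1; Message = "WARNING: last backup date $($Latest.EndTime)." }
    }
    if ($Latest.Status -eq 'Failed') {
        return [pscustomobject]@{ Code = 2; Message = "CRITICAL: Last backup ending $($Latest.EndTime) failed." }
    }
    return [pscustomobject]@{ Code = 3; Message = 'UNKNOWN: Unknown status' }
}
```

With a function like this, the end of the script could shrink to `$Result = Get-BackupCheckResult -Jobs $BackupStatus -Hours $Hours`, followed by `Write-Output $Result.Message` and `exit $Result.Code`. Whatever clock is passed as -Now should use the same time zone as the job timestamps.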
This script must be modified to use the appropriate Azure Subscription ID. It only works when run as a user that has the Azure PowerShell modules installed (https://docs.microsoft.com/en-us/powershell/azure/install-az-ps), and when there is only one vault in that subscription, which is our current use case. In the future, if we need to check more than one subscription or vault, I will copy the Foreach loops (https://github.com/hkarthik7/Common-PowerShell-Scripts/blob/c100c11342ab...) from Harish Karthic's script.
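For reference, looping over several subscriptions and vaults might look roughly like the sketch below. This is my own illustration of the idea, not the loops from Harish Karthic's script, and it assumes the Az modules are installed and the session is already signed in. It only prints the most recently finished job for each vault; turning that into a single Nagios status is left out.

```powershell
# Sketch only: assumes the Az modules are installed and the session is
# already authenticated (for example with Connect-AzAccount).
foreach ($Subscription in Get-AzSubscription) {
    # Switch context so the vault and job cmdlets query this subscription
    Select-AzSubscription -Subscription $Subscription.Id | Out-Null

    foreach ($Vault in Get-AzRecoveryServicesVault) {
        $Jobs = Get-AzRecoveryServicesBackupJob -VaultId $Vault.ID -ErrorAction Stop

        # Pick the most recently finished job in this vault
        $Latest = $Jobs |
            Where-Object { $_.EndTime } |
            Sort-Object -Property EndTime -Descending |
            Select-Object -First 1

        Write-Output "$($Subscription.Name) / $($Vault.Name): $($Latest.Status) at $($Latest.EndTime)"
    }
}
```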
Once I had the script, I created a new command in Icinga2 that tells the Icinga2 agent on my Windows servers to use PowerShell to run that ps1 file, which I copied to all necessary servers.
object CheckCommand "Windows Cloud Backups" {
  import "plugin-check-command"
  command = [ "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" ]
  arguments += {
    "-File" = "C:\\Program Files\\ICINGA2\\sbin\\check_cloudbackup.ps1"
  }
}
After 8 years of writing monitoring check scripts using bash, this was my first time successfully writing a script using PowerShell to do something new. I didn't find anyone else who had implemented this specific check, so it felt important to put it back out there, continuing to expand the list of services Icinga2 can check.