Practical Automation - Netwatch
By Brandon James on 2019-09-12
Small data gathering utilities are one of my favorite forms of automation. They provide immediate value and have no risk of failure. Netwatch runs a set of commands periodically against a group of network devices. It features concurrency, error handling and just the right amount of customization. Netwatch was inspired by a post on Reddit and the *nix utility watch
.
If you want to skip the fluff and get right to the code, you can clone my repo on github.
Implementation Goals
- Execution should take place on each device simultaneously to get a full picture of the network at a single moment in time.
- The script should be reusable without any modifications to the script itself.
- The output should include accurate timestamps so we can correlate events across multiple devices.
- Output should be written to a file each time the main loop executes, that way if the script is stopped or dies we don't lose the data up to that point.
- The script should be able to complete it's task without using task scheduler or Cron. As these require configuration to take place outside of the script.
Installation
Requires python 3+. I strongly suggest creating a new virtual environment for use with this script. I broadly cover this here.
install virtualenv if it's not already installed
python -m pip install virtualenv
create a new virtualenv
python -m virtualenv venv
clone the repository
git clone https://github.com/bjames/netwatch
- activate the virtualenv
- Linux/OS X
source ./venv/bin/activate
- Windows
venv\Scripts\activate
- Linux/OS X
install the required libraries `python -m pip install -r netwatch/requirements.txt'
Configuration
In the YAML file, anything defined under the default section can be overridden on a per-device basis. The YAML file below demonstrates overriding the device_type and command list.
# Number of devices to run commands on in parallel
threads: 4
# Time to sleep in between iterations (seconds)
sleep_time: 300
# maximum iterations (set to 0 to repeat forever)
max_iter: 288
# If true, the results are output to one file per device
file_output: true
# If true, the results are printed to the screen
print_output: true
# default settings that may be overridden on a per device basis
default:
# commands to run
commands:
- show int status
- show cdp neighbors
# cisco_ios or cisco_nxos
device_type: cisco_ios
port: 22
# list devices and device specific config
device_list:
- hostname: IOS-SWITCH-1A
- hostname: NXOS-SWITCH-1A
device_type: cisco_nxos
commands:
- show vlan brief
- show ip route
This configuration runs
show int status
followed byshow cdp neighbors
on IOS-SWITCH-1A andshow vlan brief
followed byshow ip route
on NXOS-SWITCH-1A. The script runs every 300 seconds and repeats 288 times or once every 5 minutes for 24 hours.The script uses python's multiprocessing library's Pool.map() function. As long as you give the script enough threads (one per device), the command execution will start on each device at roughly the same time. In the above configuration, we've told the script it can spawn up to 4 threads, but in this case only 2 threads will actually be used. I've used 16 threads on Linux VM with 2 cores and 4GB of RAM without any performance degradation, but your mileage my vary.
You can read more about how I handle concurrency here and how I handle script configuration here.
Usage
- Run the script
python ./netwatch.py
. You can also specify a configuration filepython ./netwatch.py myconfig.yaml
.- The script will validate your credentials against the first device in the device list before execution begins. This is done to prevent account lockouts.
- The script will run until it is either stopped or the
max_iter
value is reached. Iffile_output
is set totrue
, the script creates one log file per device and appends the output to the log files following each iteration.
Example Output
!!!!!!! show int status on sbx-nxos-mgmt.cisco.com at 2019-09-12 22:35:32.990650
--------------------------------------------------------------------------------
Port Name Status Vlan Duplex Speed Type
--------------------------------------------------------------------------------
mgmt0 DO NOT TOUCH CONFI connected routed full 1000 --
--------------------------------------------------------------------------------
Port Name Status Vlan Duplex Speed Type
--------------------------------------------------------------------------------
Eth1/1 -- connected trunk full 1000 10g
Eth1/2 -- connected trunk full 1000 10g
Eth1/3 -- connected 1 full auto 10g
Eth1/4 -- connected 1 full auto 10g
Eth1/5 L3 Link connected routed full 1000 10g
< REDACTED FOR BREVITY>
Po11 -- connected trunk full 1000 --
Lo1 -- connected routed auto auto --
Lo100 -- connected routed auto auto --
Vlan1 -- down routed auto auto --
Vlan100 mgmt svi - DEMO PL connected routed auto auto --
Vlan101 prod svi - DEMO PL connected routed auto auto --
Vlan102 dev svi - DEMO PLE connected routed auto auto --
Vlan103 test svi - DEMO PL connected routed auto auto --
Vlan104 security svi - DEM connected routed auto auto --
Vlan105 iot svi - DEMO PLE connected routed auto auto --
!!!!!!! show cdp neighbors on sbx-nxos-mgmt.cisco.com at 2019-09-12 22:35:33.895258
Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
S - Switch, H - Host, I - IGMP, r - Repeater,
V - VoIP-Phone, D - Remotely-Managed-Device,
s - Supports-STP-Dispute
Device-ID Local Intrfce Hldtme Capability Platform Port ID
Total entries displayed: 0
!!!!!!! show int status on sbx-nxos-mgmt.cisco.com at 2019-09-12 22:35:56.499806
--------------------------------------------------------------------------------
Port Name Status Vlan Duplex Speed Type
--------------------------------------------------------------------------------
mgmt0 DO NOT TOUCH CONFI connected routed full 1000 --
--------------------------------------------------------------------------------
Port Name Status Vlan Duplex Speed Type
--------------------------------------------------------------------------------
Eth1/1 -- connected trunk full 1000 10g
Eth1/2 -- connected trunk full 1000 10g
Eth1/3 -- connected 1 full auto 10g
Eth1/4 -- connected 1 full auto 10g
Eth1/5 L3 Link connected routed full 1000 10g
< REDACTED FOR BREVITY>
Po11 -- connected trunk full 1000 --
Lo1 -- connected routed auto auto --
Lo100 -- connected routed auto auto --
Vlan1 -- down routed auto auto --
Vlan100 mgmt svi - DEMO PL connected routed auto auto --
Vlan101 prod svi - DEMO PL connected routed auto auto --
Vlan102 dev svi - DEMO PLE connected routed auto auto --
Vlan103 test svi - DEMO PL connected routed auto auto --
Vlan104 security svi - DEM connected routed auto auto --
Vlan105 iot svi - DEMO PLE connected routed auto auto --
!!!!!!! show cdp neighbors on sbx-nxos-mgmt.cisco.com at 2019-09-12 22:35:57.305606
Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
S - Switch, H - Host, I - IGMP, r - Repeater,
V - VoIP-Phone, D - Remotely-Managed-Device,
s - Supports-STP-Dispute
Device-ID Local Intrfce Hldtme Capability Platform Port ID
Total entries displayed: 0
!!!!!!! show int status on sbx-nxos-mgmt.cisco.com at 2019-09-12 22:36:18.511021
--------------------------------------------------------------------------------
Port Name Status Vlan Duplex Speed Type
--------------------------------------------------------------------------------
mgmt0 DO NOT TOUCH CONFI connected routed full 1000 --
--------------------------------------------------------------------------------
Port Name Status Vlan Duplex Speed Type
--------------------------------------------------------------------------------
Eth1/1 -- connected trunk full 1000 10g
Eth1/2 -- connected trunk full 1000 10g
Eth1/3 -- connected 1 full auto 10g
Eth1/4 -- connected 1 full auto 10g
Eth1/5 L3 Link connected routed full 1000 10g
< REDACTED FOR BREVITY>
Po11 -- connected trunk full 1000 --
Lo1 -- connected routed auto auto --
Lo100 -- connected routed auto auto --
Vlan1 -- down routed auto auto --
Vlan100 mgmt svi - DEMO PL connected routed auto auto --
Vlan101 prod svi - DEMO PL connected routed auto auto --
Vlan102 dev svi - DEMO PLE connected routed auto auto --
Vlan103 test svi - DEMO PL connected routed auto auto --
Vlan104 security svi - DEM connected routed auto auto --
Vlan105 iot svi - DEMO PLE connected routed auto auto --
!!!!!!! show cdp neighbors on sbx-nxos-mgmt.cisco.com at 2019-09-12 22:36:19.516013
Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
S - Switch, H - Host, I - IGMP, r - Repeater,
V - VoIP-Phone, D - Remotely-Managed-Device,
s - Supports-STP-Dispute
Device-ID Local Intrfce Hldtme Capability Platform Port ID
Total entries displayed: 0
!!!!!!! show int status on sbx-nxos-mgmt.cisco.com at 2019-09-12 22:36:35.806752
--------------------------------------------------------------------------------
Port Name Status Vlan Duplex Speed Type
--------------------------------------------------------------------------------
mgmt0 DO NOT TOUCH CONFI connected routed full 1000 --
--------------------------------------------------------------------------------
Port Name Status Vlan Duplex Speed Type
--------------------------------------------------------------------------------
Eth1/1 -- connected trunk full 1000 10g
Eth1/2 -- connected trunk full 1000 10g
Eth1/3 -- connected 1 full auto 10g
Eth1/4 -- connected 1 full auto 10g
Eth1/5 L3 Link connected routed full 1000 10g
< REDACTED FOR BREVITY>
Po11 -- connected trunk full 1000 --
Lo1 -- connected routed auto auto --
Lo100 -- connected routed auto auto --
Vlan1 -- down routed auto auto --
Vlan100 mgmt svi - DEMO PL connected routed auto auto --
Vlan101 prod svi - DEMO PL connected routed auto auto --
Vlan102 dev svi - DEMO PLE connected routed auto auto --
Vlan103 test svi - DEMO PL connected routed auto auto --
Vlan104 security svi - DEM connected routed auto auto --
Vlan105 iot svi - DEMO PLE connected routed auto auto --
!!!!!!! show cdp neighbors on sbx-nxos-mgmt.cisco.com at 2019-09-12 22:36:19.516013
Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
S - Switch, H - Host, I - IGMP, r - Repeater,
V - VoIP-Phone, D - Remotely-Managed-Device,
s - Supports-STP-Dispute
Device-ID Local Intrfce Hldtme Capability Platform Port ID
Total entries displayed: 0
!!!!!!! show int status on sbx-nxos-mgmt.cisco.com at 2019-09-12 22:36:35.806752
--------------------------------------------------------------------------------
Port Name Status Vlan Duplex Speed Type
--------------------------------------------------------------------------------
mgmt0 DO NOT TOUCH CONFI connected routed full 1000 --
--------------------------------------------------------------------------------
Port Name Status Vlan Duplex Speed Type
--------------------------------------------------------------------------------
Eth1/1 -- connected trunk full 1000 10g
Eth1/2 -- connected trunk full 1000 10g
Eth1/3 -- connected 1 full auto 10g
Eth1/4 -- connected 1 full auto 10g
Eth1/5 L3 Link connected routed full 1000 10g
< REDACTED FOR BREVITY>
Po11 -- connected trunk full 1000 --
Lo1 -- connected routed auto auto --
Lo100 -- connected routed auto auto --
Vlan1 -- down routed auto auto --
Vlan100 mgmt svi - DEMO PL connected routed auto auto --
Vlan101 prod svi - DEMO PL connected routed auto auto --
Vlan102 dev svi - DEMO PLE connected routed auto auto --
Vlan103 test svi - DEMO PL connected routed auto auto --
Vlan104 security svi - DEM connected routed auto auto --
Vlan105 iot svi - DEMO PLE connected routed auto auto --
!!!!!!! show cdp neighbors on sbx-nxos-mgmt.cisco.com at 2019-09-12 22:36:36.812270
Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
S - Switch, H - Host, I - IGMP, r - Repeater,
V - VoIP-Phone, D - Remotely-Managed-Device,
s - Supports-STP-Dispute
Device-ID Local Intrfce Hldtme Capability Platform Port ID
Total entries displayed: 0
Full Source
from netmiko import ConnectHandler, NetMikoAuthenticationException, NetMikoTimeoutException, logging
from time import sleep
from datetime import datetime
from sys import argv
from functools import partial
from multiprocessing import Pool
from yaml import safe_load
import getpass
def print_output(device_result):
for i in range(len(device_result['results'])):
print('\n!!!!!!! {} on {} at {}\n\n{}'.format(device_result['commands'][i],
device_result['hostname'],
device_result['timestamps'][i],
device_result['results'][i]))
def write_output(device_result):
with open('netwatch_output_{}.log'.format(device_result['hostname']), 'a') as outfile:
for i in range(len(device_result['results'])):
outfile.write('\n!!!!!!! {} on {} at {}\n\n{}'.format(device_result['commands'][i],
device_result['hostname'],
device_result['timestamps'][i],
device_result['results'][i]))
def execute_commands(device_settings, username, password):
"""
Runs commands on a device and returns a dictionary object with the device name and the results
"""
output = []
timestamps = []
try:
ssh_session = ssh_connect(device_settings['hostname'], device_settings['device_type'],
device_settings['port'], username, password)
# write any exceptions to output so they are recorded in the results
except Exception as e:
output = ['Exception occured before SSH connection: {}'.format(e)] * len(device_settings['commands'])
timestamps = [datetime.now()] * len(device_settings['commands'])
# only executes if no exceptions occur
else:
for command in device_settings['commands']:
try:
output.append(ssh_session.send_command(command))
except IOError:
output.append('COMMAND TIMED OUT')
except Exception as e:
output.append('Exception after SSH connection: {}'.format(e))
finally:
timestamps.append(datetime.now())
device_result = {
'hostname': device_settings['hostname'],
'commands': device_settings['commands'],
'results': output,
'timestamps': timestamps
}
return device_result
def ssh_connect(hostname, device_type, port, username, password):
"""
connects to a device and returns a netmiko ssh object
"""
device = {
'device_type': device_type,
'ip': hostname,
'username': username,
'password': password,
'port': port
}
return ConnectHandler(**device)
def get_validate_credentials(hostname, device_type, port):
"""
gets username and password, opens an ssh session to verify the credentials
then closes the ssh session
returns username and password
Doing this prevents multiple threads from locking out an account due to mistyped creds
"""
# attempts to get the username, prompts if needed
username = input('Username: ')
# prompts user for password
password = getpass.getpass()
authenticated = False
while not authenticated:
try:
test_ssh_session = ssh_connect(hostname, device_type, port, username, password)
test_ssh_session.disconnect()
except NetMikoAuthenticationException:
print('authentication failed on ' + hostname + ' (CTRL + C to quit)')
username = input('Username: ')
password = getpass.getpass()
except NetMikoTimeoutException:
print('SSH timed out on ' + hostname)
raise
else:
# if there is no exception set authenticated to true
authenticated = True
return username, password
def merge_settings(device, script_config):
''' merges the default and device specific dictionaries '''
device_settings = script_config['default'].copy()
device_settings.update(device)
return device_settings
def set_device_settings(script_config):
'''
creates an array of devices containing device specific
script configuration
'''
device_settings = []
for device in script_config['device_list']:
device_setting = merge_settings(device, script_config)
device_settings.append(device_setting)
return device_settings
def main():
try:
config_file_name = argv[1]
except IndexError:
config_file_name = 'netwatch.yml'
# pull data from config file
script_config = safe_load(open(config_file_name))
# create the device settings list
device_settings = set_device_settings(script_config)
# get valid credentials from the user
username, password = get_validate_credentials(device_settings[0]['hostname'], device_settings[0]['device_type'], device_settings[0]['port'])
iter = 0
while script_config['max_iter'] == 0 or script_config['max_iter'] > iter:
print('running commands')
# execute on each device in the device list in parallel
with Pool(script_config['threads']) as pool:
# create a partial function, then use pool.map passing in the device_settings list
device_results = pool.map(partial(execute_commands,
username=username,
password=password),
device_settings)
iter += 1
# Iterate over our device_results and write it to a file/print to screen
for device_result in device_results:
if script_config['file_output']:
try:
write_output(device_result)
except Exception as e:
print('Exception occured while writing to file for {}'.format(device_result['hostname']))
print(e)
if script_config['print_output']:
print_output(device_result)
print('loop has excuted {} time(s), last execution completed at {}'.format(iter, datetime.now()))
print('maxium iterations is {} (when set to 0 loop executes forever)'.format(script_config['max_iter']))
if script_config['max_iter'] > iter:
print('sleeping for {} seconds (CTRL+C to quit)'.format(script_config['sleep_time']))
sleep(script_config['sleep_time'])
main()