Monitoring Interface errors – Linux & Zabbix

Initially I created a simple script that took the stats from netstat -i every minute and manipulated them so that it left a file per interface per error counter with two figures:

the total number of errors
difference since the last reading
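
That original script isn't reproduced here, but the idea can be sketched roughly as below. The function name record_counter and the STATE_DIR directory are made up for illustration, and the first reading simply reports the whole total as the difference.

```shell
#!/bin/sh
# Sketch only: not the original script, just the idea behind it.
# STATE_DIR and record_counter are hypothetical names.
STATE_DIR="${STATE_DIR:-/tmp/netstat_counters}"

# record_counter IFACE COUNTER TOTAL
# Writes "TOTAL DIFFERENCE" to one file per interface per counter.
record_counter() {
    file="$STATE_DIR/$1.$2"
    prev=0
    # The first field of an existing file is the previous total.
    if [ -f "$file" ]; then
        prev=$(cut -d' ' -f1 "$file")
    fi
    mkdir -p "$STATE_DIR"
    echo "$3 $(( $3 - prev ))" > "$file"
}

# Possible use, once a minute, for the RX-ERR column:
# netstat -i | awk 'NR > 2 { print $1, $5 }' |
#     while read -r iface total; do record_counter "$iface" RX-ERR "$total"; done
```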

After talking with my colleague, it turned out this was not needed for Zabbix, which can work out the change between readings itself. All we do now is write the entire output of the command to a file every minute, as before. So the crontab entry is simply:

* * * * * netstat -i > /tmp/netstat_errors.log

The file output looks like:

Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0 1500 0 63986618864 32276 1285 1285 151641153147 0 0 0 BMmRU
eth0 1500 0 17181548766 5167 237 237 36784243675 0 0 0 BMsRU
eth1 1500 0 14907173536 4744 423 423 39635710320 0 0 0 BMsRU
eth2 1500 0 16998738705 9614 343 343 38152005955 0 0 0 BMsRU
eth3 1500 0 14899157857 12751 282 282 37069193197 0 0 0 BMsRU
eth4 1500 0 0 0 0 0 0 0 0 0 BMsRU
eth5 1500 0 0 0 0 0 0 0 0 0 BMsRU
eth6 1500 0 676620084 0 0 0 677705447 0 0 0 BMRU
lo 16436 0 81929389 0 0 0 81929389 0 0 0 LRU

Then we added the following to the /usr/local/zabbix/zabbix_agentd.conf file in the user defined section at the bottom:

UserParameter=netstat_errors[*],cat /tmp/netstat_errors.log | grep $1 | awk '{print $'$2'}'

The first section, grep, picks out a specific line by interface name, for example eth1. Because the Zabbix server supplies the name, it is set as the variable $1.

The second section, awk, selects a specific field of that line. The ‘double’ variable in the print section is there because awk refers to a field as $<number>, but the Zabbix server supplies the number. So we write a literal $ followed by the client config's own variable, $2. This is hopefully made clearer below.
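
To make the substitution concrete, here is a rough sketch that mimics it with a shell function. The function and the temporary sample file are stand-ins of mine, not part of the Zabbix setup. In the real thing the agent swaps in $1 and $2 from the item key before the shell ever sees the command.

```shell
#!/bin/sh
# Two sample lines copied from the netstat -i output shown earlier.
log=$(mktemp)
cat > "$log" <<'EOF'
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 17181548766 5167 237 237 36784243675 0 0 0 BMsRU
eth1 1500 0 14907173536 4744 423 423 39635710320 0 0 0 BMsRU
EOF

# Hypothetical stand-in for the agent: $1 = interface, $2 = field number.
netstat_errors() {
    # '{print $' + the value of $2 + '}' becomes e.g. {print $5}
    cat "$log" | grep "$1" | awk '{print $'$2'}'
}

netstat_errors eth0 5   # RX-ERR for eth0
netstat_errors eth1 6   # RX-DRP for eth1
```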

We have a template for our GPFS servers, so within Zabbix go to Configuration > Templates, locate the required template and click on Items > Create Item.

As we want to monitor the 3 main receive errors:

RX-ERR
RX-DRP
RX-OVR

On 4 interfaces:

eth0
eth1
eth2
eth3

We need 12 items (3 errors × 4 interfaces), so we create one and then clone it enough times to cover all the errors and interfaces. To work out the field number for each counter, use this table:

Iface	MTU	Met	RX-OK	RX-ERR	RX-DRP	RX-OVR	TX-OK	TX-ERR	TX-DRP	TX-OVR	Flg
eth0	1500	0	17181548766	5167	237	237	36784243675	0	0	0	BMsRU
$1	$2	$3	$4	$5	$6	$7	$8	$9	$10	$11	$12
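
If counting columns by eye feels error prone, a small awk helper can number them from the header line. This is only a sketch and number_columns is a name I've made up. It assumes the header is the second line, as in the netstat -i output above.

```shell
#!/bin/sh
# Print every column name from the header (line 2) with its awk field number.
# Reads the files given as arguments, or standard input if there are none.
number_columns() {
    awk 'NR == 2 { for (i = 1; i <= NF; i++) print "$" i, $i }' "$@"
}

# Example using the two header lines from the output above:
printf '%s\n' \
    'Kernel Interface table' \
    'Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg' |
    number_columns
```

On a live server the same thing would be number_columns /tmp/netstat_errors.log.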

So for the first error of eth0 we’d run manually:

cat /tmp/netstat_errors.log | grep eth0 | awk '{print $5}'

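So in the Item configuration we’d set the following. Note that the key passes just the field number, because the $ is already in the UserParameter line:

name	network_errors_eth0_RX-ERR
Type	Zabbix agent
Key	netstat_errors[eth0,5]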
Type of Information	Numeric
New Flexible interval	50
Store Value	Delta (simple change)
Show Value	As is

When done we just clone that configuration and change the name and the key to reflect the different error, or interface or both.
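
To keep track of the clones, the full set of names and keys can be listed with a short loop. This is only a sketch: list_items is a made-up helper, and it assumes the field number goes into the key as a bare number (5, 6 or 7), since the UserParameter line already supplies the $ for awk.

```shell
#!/bin/sh
# Print "item name<TAB>item key" for each of the 12 interface/counter pairs.
list_items() {
    for iface in eth0 eth1 eth2 eth3; do
        col=5   # RX-ERR is field 5, RX-DRP is 6, RX-OVR is 7
        for counter in RX-ERR RX-DRP RX-OVR; do
            printf 'network_errors_%s_%s\tnetstat_errors[%s,%s]\n' \
                "$iface" "$counter" "$iface" "$col"
            col=$((col + 1))
        done
    done
}

list_items
```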

Because this was added to the GPFS template, Zabbix will automagically start gathering stats for every server and its interfaces, as long as the crontab and client config file have been updated on all of them.

Next, create a graph: go to Configuration > Templates, locate the required template and click on Graph > Create Graph.

name	interface errors
Y axis MIN val	Fixed
Y axis MAX val	Calculated
Items	Add all the interface error items

Done!
