Monitoring Interface errors – Linux & Zabbix

Initially I wrote a simple script that took the stats from netstat -i every minute and produced one file per interface per error counter, each holding two figures:

  • the total number of errors
  • difference since the last reading

After talking with my colleague it turned out this wasn’t needed for Zabbix; all we had to do was write the entire output of the command to a file every minute, as before. So the crontab entry is simply:

* * * * * netstat -i > /tmp/netstat_errors.log
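
One thing to be aware of: the > redirect truncates the file before netstat writes to it, so an agent check landing at that instant could read an empty or partial file. A minimal sketch of one way around this, assuming a small wrapper function (the name write_snapshot and the temp-file naming are made up for illustration, not part of our setup), is to write to a temporary file and rename it into place:

```shell
#!/bin/sh
# write_snapshot TARGET COMMAND [ARGS...]
# Hypothetical helper: run COMMAND, capture its output in a temporary file
# next to TARGET, and rename it over TARGET only if the command succeeded.
# A rename within one directory is atomic, so readers never see a partial file.
write_snapshot() {
    target="$1"
    shift
    tmp="${target}.$$.tmp"
    if "$@" > "$tmp"; then
        mv -f "$tmp" "$target"
    else
        # Keep the previous snapshot if the command failed
        rm -f "$tmp"
        return 1
    fi
}

# Example call (what the cron job would run):
#   write_snapshot /tmp/netstat_errors.log netstat -i
```

If you go this way, the crontab entry would call a script containing the function instead of running netstat directly.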

The file output looks like:

Kernel Interface table
Iface   MTU Met       RX-OK RX-ERR RX-DRP RX-OVR        TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0  1500   0 63986618864  32276   1285   1285 151641153147      0      0      0 BMmRU
eth0   1500   0 17181548766   5167    237    237  36784243675      0      0      0 BMsRU
eth1   1500   0 14907173536   4744    423    423  39635710320      0      0      0 BMsRU
eth2   1500   0 16998738705   9614    343    343  38152005955      0      0      0 BMsRU
eth3   1500   0 14899157857  12751    282    282  37069193197      0      0      0 BMsRU
eth4   1500   0           0      0      0      0            0      0      0      0 BMsRU
eth5   1500   0           0      0      0      0            0      0      0      0 BMsRU
eth6   1500   0   676620084      0      0      0    677705447      0      0      0 BMRU
lo    16436   0    81929389      0      0      0     81929389      0      0      0 LRU

Then we added the following to the /usr/local/zabbix/zabbix_agentd.conf file in the user defined section at the bottom:

UserParameter=netstat_errors[*], cat /tmp/netstat_errors.log | grep $1 | awk '{print $'$2'}'

The first part, grep, picks out the line for a specific interface, for example eth1. The interface name is supplied by the Zabbix server, so it is set as a variable ($1).

The second part, awk, selects a specific field from that line. The ‘double’ variable in the print statement is there because awk refers to a field as $<number>, but as the Zabbix server supplies the number we set it as a second variable ($2) in the client config. The agent substitutes $1 and $2 with the item key parameters before the command runs. This is hopefully made clearer below.
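
To make the substitution concrete, here is a rough shell equivalent of what the agent ends up running. The function name netstat_field and the optional file argument are my own for illustration. Note that it swaps grep for an exact match on the first field in awk, since a plain grep eth1 would also match a line for an interface such as eth10:

```shell
#!/bin/sh
# netstat_field IFACE COLUMN [FILE]
# Hypothetical stand-in for the UserParameter. For the key
# netstat_errors[eth1,5] the agent replaces $1 with eth1 and $2 with 5, giving:
#   cat /tmp/netstat_errors.log | grep eth1 | awk '{print $'5'}'
# This sketch does the same lookup, but matches the interface name exactly.
netstat_field() {
    iface="$1"
    col="$2"
    file="${3:-/tmp/netstat_errors.log}"
    # $1 == iface selects the interface line; $col is the requested field
    awk -v iface="$iface" -v col="$col" '$1 == iface { print $col }' "$file"
}

# Example: netstat_field eth1 5    (RX-ERR for eth1)
```

With the sample output above, asking for field 5 of eth1 would give 4744.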

We have a template for our GPFS servers, so within Zabbix go to Configuration > Templates, locate the required template and click on Items > Create Item.

As we want to monitor the 3 main receive errors:

RX-ERR
RX-DRP
RX-OVR

On 4 interfaces:

eth0
eth1
eth2
eth3

We need 12 items, so we create one and then clone it enough times to cover all the errors and interfaces. To work out the field numbers, use the table:

Iface   MTU Met       RX-OK RX-ERR RX-DRP RX-OVR        TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0 17181548766   5167    237    237  36784243675      0      0      0 BMsRU
$1       $2  $3          $4     $5     $6     $7           $8     $9    $10    $11 $12

So for the first error of eth0 we’d run manually:

cat /tmp/netstat_errors.log | grep eth0 | awk '{print $5}'
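
Rather than counting columns by hand, the header line can be numbered straight from the log file. This is just a convenience sketch; the function name list_columns is made up, and it assumes the header line starts with Iface as in the output shown earlier:

```shell
#!/bin/sh
# list_columns [FILE]
# Hypothetical helper: print each header name from the netstat -i output
# next to the awk field number you would use for it.
list_columns() {
    file="${1:-/tmp/netstat_errors.log}"
    awk '$1 == "Iface" { for (i = 1; i <= NF; i++) printf "$%d %s\n", i, $i }' "$file"
}

# Example: list_columns | grep RX-
```

Each output line pairs a field reference with its column name, so RX-ERR shows up next to $5.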

So in the Item configuration we’d set:

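Name                   network_errors_eth0_RX-ERR
Type                   Zabbix agent
Key                    netstat_errors[eth0,5]
Type of Information    Numeric
New Flexible interval  50
Store Value            Delta (simple change)
Show Value             As is

Note that the second key parameter is the bare field number: the $ in front of it is already supplied by the UserParameter line in the client config.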

When done we just clone that configuration and change the name and the key to reflect the different error, or interface or both.
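
To keep track of the clones, the twelve name and key pairs can be listed with a short loop. This is only a sketch: the function name item_keys is made up, and it assumes the second key parameter is the bare field number (5, 6 and 7 for RX-ERR, RX-DRP and RX-OVR), which is what the $'$2' construct in the UserParameter expects:

```shell
#!/bin/sh
# item_keys
# Hypothetical helper: print "name key" for every interface/counter pair,
# using fields 5, 6 and 7 of the netstat -i output for the RX counters.
item_keys() {
    for iface in eth0 eth1 eth2 eth3; do
        col=5
        for counter in RX-ERR RX-DRP RX-OVR; do
            printf 'network_errors_%s_%s netstat_errors[%s,%d]\n' \
                "$iface" "$counter" "$iface" "$col"
            col=$((col + 1))
        done
    done
}

# Example: item_keys
```

The first line printed is network_errors_eth0_RX-ERR netstat_errors[eth0,5], matching the item created above.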

As this was added to the GPFS template, Zabbix will automagically start gathering stats on all the servers and their interfaces, as long as the crontab and client config file have been updated on each server.

Next create a graph: go to Configuration > Templates, locate the required template and click on Graph > Create Graph.

Name            interface errors
Y axis MIN val  Fixed
Y axis MAX val  Calculated
Items           Add all the interface error items

Done!
