Initially I wrote a simple script that took the stats from netstat -i every minute and wrote one file per interface per error counter, each holding two figures:
- the total number of errors
- difference since the last reading
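For illustration, that kind of per-counter bookkeeping could be sketched roughly as follows. This is not the original script: the record_counter function, the STATE_DIR location and the file naming are all assumptions made up for this example.

```shell
#!/bin/sh
# Hypothetical sketch: keep "<total> <difference>" per interface per counter.
# STATE_DIR and record_counter are illustrative names, not from the original script.
STATE_DIR=$(mktemp -d)

# record_counter IFACE COUNTER TOTAL
# Writes "<total> <difference since last reading>" to $STATE_DIR/IFACE.COUNTER
record_counter() {
    file="$STATE_DIR/$1.$2"
    prev=$3                          # first reading: difference is 0
    [ -f "$file" ] && read -r prev rest < "$file"
    echo "$3 $(( $3 - prev ))" > "$file"
}

record_counter eth0 RX-ERR 5167      # first reading
record_counter eth0 RX-ERR 5170      # a minute later
cat "$STATE_DIR/eth0.RX-ERR"         # prints: 5170 3
```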
After talking with my colleague it turned out this per-counter bookkeeping wasn’t needed for Zabbix, which can work out the difference between readings itself. All we had to do was write the entire output of the command to a file every minute. So the crontab entry is simply:
* * * * * netstat -i > /tmp/netstat_errors.log
The file output looks like:
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0 1500 0 63986618864 32276 1285 1285 151641153147 0 0 0 BMmRU
eth0 1500 0 17181548766 5167 237 237 36784243675 0 0 0 BMsRU
eth1 1500 0 14907173536 4744 423 423 39635710320 0 0 0 BMsRU
eth2 1500 0 16998738705 9614 343 343 38152005955 0 0 0 BMsRU
eth3 1500 0 14899157857 12751 282 282 37069193197 0 0 0 BMsRU
eth4 1500 0 0 0 0 0 0 0 0 0 BMsRU
eth5 1500 0 0 0 0 0 0 0 0 0 BMsRU
eth6 1500 0 676620084 0 0 0 677705447 0 0 0 BMRU
lo 16436 0 81929389 0 0 0 81929389 0 0 0 LRU
Then we added the following to the /usr/local/zabbix/zabbix_agentd.conf file in the user defined section at the bottom:
UserParameter=netstat_errors[*], cat /tmp/netstat_errors.log | grep $1 | awk '{print $'$2'}'
The first part, grep, picks out a specific line by interface name, for example eth1. Because the interface name is supplied by the Zabbix server, it is set as a variable ($1).
The second part, awk, selects a specific field of that line. The ‘double’ variable in the print statement is there because awk refers to fields in $<number> format, but the Zabbix server supplies this number, so we set it as another variable ($2) within the client config. This is hopefully made clearer below.
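To see what the agent ends up running, the substitution can be imitated with a small shell function. This is only a sketch: the netstat_errors function and the temporary LOG file are stand-ins invented for this example (the real data lives in /tmp/netstat_errors.log), and the second argument is assumed to be the bare awk field number.

```shell
#!/bin/sh
# Sample data standing in for /tmp/netstat_errors.log (lines from the output above).
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 17181548766 5167 237 237 36784243675 0 0 0 BMsRU
eth1 1500 0 14907173536 4744 423 423 39635710320 0 0 0 BMsRU
EOF

# netstat_errors IFACE FIELD: hypothetical stand-in for the agent's substitution.
# $1 is the interface to grep for, $2 the awk field number to print.
netstat_errors() {
    cat "$LOG" | grep "$1" | awk '{print $'"$2"'}'
}

netstat_errors eth1 5   # prints 4744 (RX-ERR for eth1)
netstat_errors eth0 6   # prints 237  (RX-DRP for eth0)
```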
We have a template for our GPFS servers, so within Zabbix go to Configuration > Templates, locate the required template and click on Items > Create Item.
As we want to monitor the 3 main receive errors:
RX-ERR
RX-DRP
RX-OVR
On 4 interfaces:
eth0
eth1
eth2
eth3
We need 12 items, so we create one and then clone it enough times to cover all the errors and interfaces. To work out the variable numbers, use the table:
Iface | MTU | Met | RX-OK | RX-ERR | RX-DRP | RX-OVR | TX-OK | TX-ERR | TX-DRP | TX-OVR | Flg |
---|---|---|---|---|---|---|---|---|---|---|---|
eth0 | 1500 | 0 | 17181548766 | 5167 | 237 | 237 | 36784243675 | 0 | 0 | 0 | BMsRU |
$1 | $2 | $3 | $4 | $5 | $6 | $7 | $8 | $9 | $10 | $11 | $12 |
So for the first error of eth0 we’d run manually:
cat /tmp/netstat_errors.log | grep eth0 | awk '{print $5}'
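If counting columns by hand gets tedious, the field number can also be looked up from the header line. A minimal sketch, assuming the header matches the netstat -i output shown earlier; field_number is a hypothetical helper name, not part of netstat or Zabbix.

```shell
#!/bin/sh
# Header line copied from the netstat -i output shown earlier.
HEADER='Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg'

# field_number NAME: print the awk field number of the named column.
# Hypothetical helper for illustration only.
field_number() {
    echo "$HEADER" | awk -v name="$1" '{ for (i = 1; i <= NF; i++) if ($i == name) print i }'
}

field_number RX-ERR   # prints 5
field_number RX-OVR   # prints 7
```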
So in the Item configuration we’d set:
name | network_errors_eth0_RX-ERR |
---|---|
Type | Zabbix agent |
Key | netstat_errors[eth0,5] |
Type of Information | Numeric |
New Flexible interval | 50 |
Store Value | Delta (simple Change) |
Show Value | As is |
When done we just clone that configuration and change the name and the key to reflect the different error, or interface or both.
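To keep track of the clones, a short loop can list every item name alongside its interface and awk field number. This is just a checklist generator, not something Zabbix needs; list_items is a hypothetical helper, and the naming pattern is assumed to follow the network_errors_eth0_RX-ERR example above.

```shell
#!/bin/sh
# list_items: print "<item name> <interface> <awk field number>" for every
# interface/counter pair. Hypothetical helper for checking the cloned items.
list_items() {
    for iface in eth0 eth1 eth2 eth3; do
        field=5                      # RX-ERR is field 5, RX-DRP 6, RX-OVR 7
        for counter in RX-ERR RX-DRP RX-OVR; do
            echo "network_errors_${iface}_${counter} $iface $field"
            field=$((field + 1))
        done
    done
}

list_items   # prints 12 lines, one per item to create
```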
Because this was added to the GPFS template, Zabbix will automagically start gathering stats on all the servers and their interfaces, as long as the crontab and client config file have been updated on each server.
Next create a graph: go to Configuration > Templates, locate the required template and click on Graph > Create Graph.
name | interface errors |
---|---|
Y axis MIN val | Fixed |
Y axis MAX val | Calculated |
Items | Add all the interface error items |
Done!