Every fledgling network administrator eventually comes face-to-face with the question “how do I monitor my network?” But “monitoring a network” means different things to different people, and if you ask a crowd of experienced network managers for tool suggestions, you will likely get a few misleading recommendations simply because what they measure does not align with your objectives.

Depending on who you are talking to, the task of network monitoring can mean bandwidth, traffic analysis, packet inspection, performance, or uptime.

What follows is by no means a complete listing of possible applications, but an explanation of which sample tools could be considered appropriate for the task at hand.


Bandwidth

Sample MRTG Graph

Bandwidth is the throughput that a particular network connection can support. For example, a bonded 2xT1 Internet connection has a maximum bandwidth of approximately 3 Mbps (2 x 1.544 Mbps), whereas a laptop NIC might support 100 Mbps and a switch interlink could clock in at 10 Gbps.

So bandwidth monitoring is knowing how much of the available allotment any one particular port is using. This measure is typically represented as a strip chart with peaks and valleys. Tools such as MRTG (open-source), Cacti (open-source), PRTG (trialware), and Dell OpenManage (OEM bundle) use SNMP to query the built-in port counters of switches and routers to provide a graphical view (and running history) of your port utilizations.
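To make the arithmetic behind those strip charts concrete, here is a minimal sketch of how a poller might turn two readings of a port's octet counter into a utilization percentage. The function name, the sample values, and the 300-second polling interval are assumptions for illustration; a real tool would fetch the counters over SNMP rather than take them as arguments.

```python
def utilization_percent(octets_start, octets_end, interval_seconds,
                        link_bps, counter_bits=32):
    """Return average link utilization (0-100) between two counter samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Octet counters wrap back to zero at 2**counter_bits, so take the
    # difference modulo the counter size to survive a single wrap.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    bits_per_second = delta_octets * 8 / interval_seconds
    return 100.0 * bits_per_second / link_bps


# Hypothetical samples taken 300 seconds apart on a bonded 2xT1 link.
T1_BPS = 1_544_000
print(utilization_percent(1_000_000, 58_900_000, 300, 2 * T1_BPS))
```

The modulo handles only one wrap per interval, which is why fast links are usually polled with 64-bit counters or at shorter intervals.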

For simple port-to-device mapping, you would be measuring the traffic that a single host is using. Yet for a firewall’s connection to the Internet, you would be measuring the aggregate usage for all your online users, possibly even including inbound remote VPN.

It is the connection to the Internet that is often the first target of measurement. If you oversee a 3 Mbps pipe and your monitoring system shows that it is consistently saturated, then you can use that as justification to management for increasing the size of your connection.

But in the above graph, what is the make-up of the traffic? Who is doing what? SNMP port utilizations take a simple-minded all-bits-are-the-same view. You know that a lot of something is happening, but nothing specific. If the port of a desktop is showing high activity, how do you tell if it is a CIFS file copy vs a BitTorrent download? And if too many users are visiting inappropriate web sites or streaming audio, maybe identifying and curtailing that activity would be more cost-effective than simply upgrading the Internet speed. But how would you find out?

Traffic Analysis

Sample PRTG TopList

An analysis of the specific traffic goes deeper than SNMP’s all-bits-equal view. It provides a more detailed breakdown of port activity based on protocol and source/destination information, presented as a bar chart, pie chart, or percentage table.

To obtain a detailed traffic analysis, you need a device on your network segment that is listening to all activity, not just the packets addressed to it. A NIC configured in this manner is said to be in “promiscuous mode”. A Unix/Linux box with such an eavesdropping NIC can use the open-source NTOP program to show top network activity. (NTOP does for network traffic what the TOP process monitoring tool does for a single host.)

Another method of capturing this aggregate data involves a switch that is configured to duplicate switch traffic and send it to the port where the monitoring node resides. Cisco calls this “SPAN” for “Switched Port ANalyzer”; other manufacturers simply call it port-mirroring. A network sniffer sensor, such as the one provided by PRTG, can be used to analyze this cacophony of traffic and produce easy-to-understand lists and graphs that show how bandwidth is being consumed.

Lastly, sFlow or NetFlow records can be used to deliver traffic-flow measurements to a remote monitoring device. Because flow records travel as ordinary routed traffic, the collector does not have to sit on the same switch or subnet as the network it is monitoring. PRTG (trialware) and SolarWinds Orion NTA (commercial) offer NetFlow collectors that can produce wonderful eye-candy.
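Whichever capture method you use, the "top list" itself is just an aggregation. The sketch below shows the idea under stated assumptions: the record layout (`src`, `dst`, `proto`, `bytes`) and the sample values are invented for illustration and are not the NetFlow or sFlow wire format.

```python
from collections import Counter


def top_list(flows, key, limit=3):
    """Sum bytes per value of `key`; return (value, bytes, percent) rows."""
    totals = Counter()
    for flow in flows:
        totals[flow[key]] += flow["bytes"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [(value, count, round(100.0 * count / grand_total, 1))
            for value, count in totals.most_common(limit)]


# Hypothetical flow records; a real collector sees many more fields.
FLOWS = [
    {"src": "10.0.0.5", "dst": "192.0.2.10", "proto": "HTTP", "bytes": 6000},
    {"src": "10.0.0.7", "dst": "192.0.2.20", "proto": "BitTorrent", "bytes": 3000},
    {"src": "10.0.0.5", "dst": "10.0.0.2", "proto": "CIFS", "bytes": 1000},
]

for row in top_list(FLOWS, "proto"):
    print(row)
```

Calling `top_list(FLOWS, "src")` instead answers the "who is doing what?" question by host rather than by protocol.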

Packet Inspection

Sample Wireshark Output

Packet inspection is the hardcore version of traffic analysis. A “network protocol analyzer”, also known as a “sniffer”, uses a NIC in promiscuous mode to capture every packet on the wire. It reassembles the frames and lets an administrator view them in the context of sessions between source and destination. Not for the faint of heart, sniffers expose all the gory bits-and-bytes inner workings of TCP/UDP packets and Ethernet frames.

Wireshark, formerly Ethereal, is the classic open-source tool for deep inspection. While it is not known for management-friendly colored graphs, it does give the network administrator an in-depth view of exactly what is going on with the network, from the Application layer (HTTP, IMAP, etc.) all the way down to the Data Link layer (MAC addressing). It is the gearhead’s ultimate debugging tool.
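For a taste of what a sniffer does under the hood, here is a rough sketch that decodes the Ethernet header and the fixed 20-byte part of an IPv4 header from raw bytes. It is not how any particular analyzer is implemented. The function name and the sample frame are made up for illustration, and the sketch ignores VLAN tags, IP options, and checksum verification.

```python
import socket
import struct


def parse_frame(frame):
    """Decode the Ethernet header and, if present, the fixed IPv4 header."""
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    info = {
        "dst_mac": ":".join(f"{b:02x}" for b in dst_mac),
        "src_mac": ":".join(f"{b:02x}" for b in src_mac),
        "ethertype": ethertype,
    }
    if ethertype == 0x0800 and len(frame) >= 34:  # 0x0800 means IPv4
        (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto,
         _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", frame[14:34])
        info.update({
            "ip_version": ver_ihl >> 4,
            "total_length": total_len,
            "ttl": ttl,
            "protocol": proto,  # 6 = TCP, 17 = UDP
            "src_ip": socket.inet_ntoa(src),
            "dst_ip": socket.inet_ntoa(dst),
        })
    return info


# Hypothetical frame: Ethernet header followed by a 20-byte IPv4 header.
SAMPLE_FRAME = bytes.fromhex(
    "001122334455" "66778899aabb" "0800"
    "45000028" "00004000" "40060000" "c0000201" "c6336402"
)
print(parse_frame(SAMPLE_FRAME))
```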


Performance

Sample SmokePing Graph

Performance of the network is measured in terms of packet loss (how many packets were lost in transit), latency (the typical time for a packet to make the round trip), and jitter (the standard deviation across all latency measurements). For applications such as VoIP, details like the packet-loss profile of the network are every bit as critical as pure bandwidth.
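Those three definitions translate almost directly into code. The sketch below assumes the probe results arrive as a list of round-trip times in milliseconds, with `None` marking a lost probe. That layout and the function name are my own invention, and I use the mean as the "typical" latency where a real tool may prefer the median.

```python
import statistics


def summarize_probes(rtts_ms):
    """Summarize probe results: RTTs in ms, with None for a lost probe."""
    if not rtts_ms:
        raise ValueError("need at least one probe result")
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    lost = len(rtts_ms) - len(replies)
    return {
        "loss_percent": 100.0 * lost / len(rtts_ms),
        # Latency and jitter are undefined when every probe was lost.
        "latency_ms": statistics.mean(replies) if replies else None,
        "jitter_ms": statistics.pstdev(replies) if replies else None,
    }


# Hypothetical run of five probes, one of which never came back.
print(summarize_probes([20.0, 22.0, None, 18.0, 20.0]))
```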

SmokePing is an example of an open-source tool that provides such metrics. It is in some sense a cousin of MRTG, which is appropriate considering their shared RRDtool lineage. Yet instead of total bandwidth, the spikes of the strip chart represent latency (round-trip time) against a baseline of packet loss, with the gradient wisps of ‘smoke’ indicating the amount of jitter.


Uptime

Sample Nagios Status Screen

Uptime monitoring of a network allows the administrator to see – and more importantly, be notified – when a network-connected device goes offline. These tools typically support not just ICMP “ping” detection, but port-specific connectivity for measuring HTTP, SMTP, and other application/service responsiveness.
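A port-specific check is simpler than it sounds: try to open a TCP connection and see whether it succeeds within a timeout. The sketch below is a bare-bones illustration with a function name of my own choosing. It only proves that something is listening, not that the service behind the port is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A call such as `port_is_open("mail.example.com", 25)` would test SMTP reachability (the hostname is a placeholder). A fuller check would go on to speak the protocol and time the response.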

Nagios (open-source), Zenoss (open-source), and SolarWinds ipMonitor (commercial) are great examples of this kind of tool. You can define service groups, contacts, and escalation rules. You can also monitor the performance of system internals (such as CPU load, RAM, and disk space) in addition to whether the port or protocol is “pingable”.

These systems can provide a map-based geographic view of your enterprise. They also understand dependencies, so that if a switch goes down, you will be alerted only about the switch, not overwhelmed by the screaming horde of unreachable hosts that might be hiding behind it.
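The dependency idea can be sketched in a few lines: alert only on a down device if nothing upstream of it is also down. The topology, device names, and function name below are hypothetical, and the sketch assumes the parent relationships form a tree with no loops. Real tools model this in their own configuration.

```python
def alerts_to_send(down, parent_of):
    """Return down devices that have no down ancestor (the root causes)."""
    root_causes = []
    for device in sorted(down):
        parent = parent_of.get(device)
        masked = False
        # Walk up the chain; any down ancestor explains this outage.
        while parent is not None:
            if parent in down:
                masked = True
                break
            parent = parent_of.get(parent)
        if not masked:
            root_causes.append(device)
    return root_causes


# Hypothetical topology: child -> parent (the router has no parent).
PARENT_OF = {
    "switch1": "router",
    "host-a": "switch1",
    "host-b": "switch1",
    "host-c": "router",
}

print(alerts_to_send({"switch1", "host-a", "host-b"}, PARENT_OF))
```

With the switch and both of its hosts down, only the switch is reported. The two hosts are treated as unreachable rather than as separate failures.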

Which Is Right For You?

With all the choices available, there is no one correct selection. Yet I would never suggest using SmokePing for deep packet inspection, and that was the main point of this overview: the right “network monitoring” tool for the right “network monitoring” task. But within the scope of, say, bandwidth monitoring, the choice of Cacti vs MRTG vs something else becomes a personal preference.

You could elect to go best-of-breed and have multiple small tools, or you could try for the kitchen-sink approach and get one tool that does many things. But there are trade-offs with the latter, and you need to be aware of the capabilities before getting too far down the garden path. For example, Nagios is famous both for how complex it is to set up and for how far you can extend it via plug-ins. You could use Nagios to monitor bandwidth, but doing so requires MRTG and the check_mrtgtraf plug-in, so you are not really gaining a kitchen-sink solution after all.

Zenoss is a bit more of a one-stop-shop in that it has built-in MRTG-like graphs and histories, as well as WMI host performance monitoring, a rudimentary CMDB, and event monitoring via SNMP traps and Syslog centralization.

And PRTG is very much like the all-encompassing Zenoss, except running on Windows. It likewise supports SNMP traps & polling, centralized event logging, and WMI queries, plus adds WBEM, SOAP/REST, and NetFlow/sniffer sensors.

I have used all of the above apps and would not fault the choice of any. But I must admit that I am a very big fan of the kitchen-sinky PRTG. The AJAX web interface is among the most impressive UIs that I have ever seen. They suck you in with 10 free sensors, and once you find out everything the product is capable of, you are going to start wanting to spring for more…