August, 2000  -  Vol. 1 (2)

For more information:

WareOnEarth Communications, Inc. (WCI)

TCP/IP and Network Performance Tuning slides,
(includes more examples of mping and what it can reveal)

Tools page, (with links to treno and testrig)

phil or cindyd (at)
wareonearth.com

Throughput Tests and Path Diagnostics

Researchers: Phil Dykstra and Cindy Dykstra, WareOnEarth Communications, Inc. (WCI)

The San Diego office of WareOnEarth Communications, Inc. (WCI) continues to enhance and expand its Active Measurement Program (AMP) for the Department of Defense's Defense Research and Engineering Network (DREN). We now have 15 deployed AMP boxes, covering the continental United States, Hawaii, and Japan. WCI is working closely with the NLANR Measurement and Network Analysis Group in San Diego and has shared some of its developments, such as geographic information (used for mapping functions and best-case latency calculations), downtime analysis, ASN paths, throughput tests, and improved loss measurements.

One of WCI's recent efforts has been the development of throughput tests and network diagnostics using the AMP constellation. Every DREN AMP box (referred to as damp-xxx) runs netperf and testrig servers and can thus be used as a target for user tests. Testrig is configured to offer 750 KB windows, so it should be suitable for high-speed TCP flows over reasonable delays. Periodic treno and mping tests are performed, usually early on Saturday or Sunday mornings. Summaries are built showing the n^2 matrix of results between seven DREN sites. Mping, in particular, has proven useful for revealing problems over wide area paths.
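To put the 750 KB window in context, a single TCP flow can deliver at most one window of data per round trip. The sketch below works through that arithmetic. It is only an illustration: the function name and the round trip times are hypothetical, and KB is assumed to mean 1000 bytes.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-flow TCP throughput: one window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


# Assumption: "750 KB" is taken as 750 * 1000 bytes.
WINDOW_BYTES = 750 * 1000

# Hypothetical round trip times, chosen only to show the trend.
for rtt_ms in (10, 50, 100):
    mbps = max_tcp_throughput_bps(WINDOW_BYTES, rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms:3d} ms: at most {mbps:.0f} Mbit/s")
```

The bound falls as delay grows, which is why a window of this size suits high-speed flows only over reasonable delays.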

Figure 1 Figure 2

Figures 1 and 2 show selected mping "thumbnails" from the weekly diagnostic tests. In all cases 1000-byte packets were used, and the number of packets in flight was varied from 1 to 500. Each graph plots packets per second (pps) against the number of packets in flight (window size); green shows the pps transmitted and red the pps received. The graphs are named src-dst, i.e., by source and destination node names, and are normally displayed in a full n x n matrix so that anomalies are quickly visible. (More recently, we have switched to using 1250-byte packets, because at 10,000 bits per packet, it is easy to read both packets per second and bits per second from the Y axis.)
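The convenience of 1250-byte packets comes down to simple arithmetic, sketched here with a hypothetical helper function (the name is ours, not part of mping):

```python
def pps_to_bps(pps: float, packet_bytes: int) -> float:
    """Convert a packet rate to a bit rate for a fixed packet size."""
    return pps * packet_bytes * 8


# 1250 bytes is 10,000 bits, so the Y axis scales by a round factor.
print(pps_to_bps(1000, 1250) / 1e6, "Mbit/s at 1000 pps with 1250-byte packets")
print(pps_to_bps(1000, 1000) / 1e6, "Mbit/s at 1000 pps with 1000-byte packets")
```

With 1250-byte packets, each 100 pps on the axis corresponds to 1 Mbit/s.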

Figure 1 shows fairly normal (or desired) behavior with a well-defined knee and a long stable queuing region, while Figure 2, the reverse path, shows increasing round trip time (1/slope) with load. Together these two figures show that the packet forwarding behavior in opposite directions of the same path may be quite different. As load increases to the right (more packets in flight), eventually some element in the network becomes overloaded and packets are discarded (visible as separation between the green and red lines). TCP throughput tests such as treno or testrig usually report data rates near, or slightly below, the point where significant loss begins.
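The two readings described above, round trip time as 1/slope and loss as the gap between transmitted and received rates, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how mping itself produces its graphs: the sample data are invented, and the function names, the number of points treated as the linear region, and the 5% loss threshold are all hypothetical choices.

```python
from typing import List, Optional, Tuple

# One sample per test step: (packets in flight, pps sent, pps received).
Sample = Tuple[int, float, float]


def estimate_rtt(samples: List[Sample], linear_points: int = 3) -> float:
    """Fit pps_received = slope * window through the origin over the first
    few samples (assumed to lie below the knee) and return RTT = 1 / slope."""
    region = samples[:linear_points]
    num = sum(w * recv for w, _, recv in region)
    den = sum(w * w for w, _, _ in region)
    if num <= 0 or den == 0:
        raise ValueError("need samples with positive window and receive rate")
    return den / num  # 1 / (num / den)


def loss_onset(samples: List[Sample], threshold: float = 0.05) -> Optional[int]:
    """Return the first window size whose loss fraction exceeds threshold."""
    for window, sent, recv in samples:
        if sent > 0 and (sent - recv) / sent > threshold:
            return window
    return None


# Invented data: 50 ms path, knee near 100 packets, discard beyond 200.
data = [(1, 20, 20), (10, 200, 200), (50, 1000, 1000), (100, 2000, 2000),
        (200, 2000, 2000), (300, 2400, 2000), (500, 3000, 2000)]

print(f"estimated RTT: {estimate_rtt(data) * 1000:.1f} ms")
print(f"significant loss begins at window: {loss_onset(data)}")
```

Fitting through the origin reflects the idea that, below the knee, the rate is simply the number of packets in flight divided by the round trip time.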

Figure 3 Figure 4

Figure 3 shows a relatively common situation. There is little stable queuing (something has insufficient buffering) and some drop-off in performance during discard (the red line does not remain flat). Some device is taking time to throw packets away (perhaps logging a message). The periodic spikes indicate some regular event, such as a Cisco route-cache cleaner, which might be avoided by using Cisco Express Forwarding (CEF).

Figure 4 shows the pattern that occurs whenever a path crosses from the continental U.S. to Hawaii (or in the reverse direction). On that path, OC3 ATM traffic meets a DS3 ATM circuit and thus has to be throttled down. The throughput oscillates in a repeatable way with very little packet loss. We believe that this is the effect of an ATM rate shaper beating against the changing offered load, but have yet to pinpoint the proper course of action to improve it.

Figure 5 Figure 6

Figures 5 and 6 show high-loss situations. The first (Figure 5), where loss is independent of offered load, could indicate low-level bit errors on a link. The second (Figure 6) resulted from an Ethernet duplex problem (auto-negotiation was not working properly). Graphs of duplex problems often show a characteristic hump rather than a knee. Our experience indicates that duplex problems may be far more common than people realize. Since the network still "works" and low-rate pings report no loss, the problems often go unnoticed. It is only under load that the problem becomes clear.
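The distinction drawn above, loss that is present at every load versus loss that appears only under load, can be expressed as a rough heuristic. The sketch below is an assumption-laden illustration, not a WCI tool: the function names, thresholds, and sample data are all hypothetical, and a real diagnosis would still require inspecting the graphs.

```python
from typing import List, Tuple

# One sample per test step: (packets in flight, pps sent, pps received).
Sample = Tuple[int, float, float]


def loss_fractions(samples: List[Sample]) -> List[float]:
    """Fraction of transmitted packets that were not received, per sample."""
    return [(sent - recv) / sent if sent > 0 else 0.0
            for _, sent, recv in samples]


def classify_loss(samples: List[Sample],
                  floor: float = 0.01, spread: float = 0.02) -> str:
    """Label loss as absent, roughly constant across loads, or load-dependent.
    The floor and spread thresholds are arbitrary illustrative values."""
    losses = loss_fractions(samples)
    if max(losses) < floor:
        return "no significant loss"
    if min(losses) >= floor and max(losses) - min(losses) <= spread:
        return "load-independent"  # e.g. possible bit errors on a link
    return "load-dependent"        # e.g. overload or a duplex mismatch


# Invented data: steady 3% loss at every load vs. loss only at high load.
steady = [(1, 100, 97), (10, 1000, 970), (100, 2000, 1940)]
loaded = [(1, 20, 20), (100, 2000, 2000), (300, 2400, 1200)]

print(classify_loss(steady))
print(classify_loss(loaded))
```

A low-rate ping corresponds to the first sample only, which is why it reports no loss on a path like the second one.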

[Note: WCI is hiring. We are looking for Unix Programmers, Network Engineers, and Security Specialists. If you are interested in working in San Diego, as well as other areas, please contact the author.]

To subscribe, or if you have comments and/or questions, please contact us.
© 2000    The NLANR Measurement and Network Analysis Group,  located at the San Diego Supercomputer Center (SDSC), University of California, San Diego (UCSD).   This work is supported by the National Science Foundation (NSF) (cooperative agreement no. ANI-9807479). Any opinions, findings and conclusions or recommendations expressed in this publication are those of the author(s), and do not necessarily reflect the views of the NSF.