Quarterly Report on Research Activities
3Q 2003 (July - September)
Development of real-time analysis and new metrics ~
We spent quite some time discussing data link layers, their differences and their merits before shaping the real-time application we want to design. One way to look at the Internet is to think about how you drive your car. You listen to the sound of the engine, the noise of the road, the feeling the steering wheel provides for grip on the road, the forces you experience in going around corners, the light conditions, rain, ice, etc. With the Internet, there are no such signals. The users, or network administrators, are flying blind: no sounds, no windows, no radar screen. The first time you notice a problem is when something "doesn't work." The change is rather sudden, like the car falling apart in the middle of the highway at 70 miles an hour. No warnings whatsoever. The engine we want to design should provide the kind of "noise" that characterizes normal behavior, and put the user in a position to detect abnormal behavior early, labeled neither good nor bad, just different. In time, this will be important. We are pursuing three different approaches to this effort and making good progress.
- We have put together something of a remote sensor. Presently there are three different layers. The first layer will look basically at the link and IP headers and produce a total of 7 counters, broken down into packets and bytes, inbound and outbound. All counters are 32 bits wide. The second layer looks at applications (TCP/UDP port numbers) and records the top 10 active applications (the limit of 10 is rather arbitrary at this point). The third layer looks at a number of flow (connection) properties. We are still carving those out, but they look a great deal at TCP/IP connection intrinsics, connection symmetry, packet loss, retransmission, reordering, SYN/SYN/ACK, FIN/FIN handshakes, and much more. These parameters would be sampled once per second and fed back to a central collection point (pma.nlanr.net). First estimates indicate we will have a data flow of 1024 Bytes/sec (8 Kbits/sec). Imagine a hundred sensors in the field - this still makes for less than 1 MBit/sec at the central collection point, quite a reasonable design. We would store those parameters in a database for each collection point and age them on an hourly, daily, weekly and monthly basis, using RRD. The RRD tool suite would also be used to display the sampled parameters on a Web page.
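The load estimate above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the wrap-safe counter difference is an assumption about how 32-bit counters sampled once per second might be turned into per-second rates, not a description of the sensor's code.

```python
# Figures quoted in the report.
BYTES_PER_SECOND_PER_SENSOR = 1024   # first estimate of the per-sensor data flow
SENSORS = 100                        # "a hundred sensors in the field"
COUNTER_MODULUS = 2 ** 32            # counters are 32 bits wide


def aggregate_bits_per_second(sensors: int, bytes_per_second: int) -> int:
    """Total load arriving at the central collection point, in bits per second."""
    return sensors * bytes_per_second * 8


def counter_delta(previous: int, current: int) -> int:
    """Difference between two samples of a 32-bit counter.

    Assumption: the counter wraps at most once between samples, so the
    modular difference gives the true increment.
    """
    return (current - previous) % COUNTER_MODULUS


total = aggregate_bits_per_second(SENSORS, BYTES_PER_SECOND_PER_SENSOR)
print(f"{total} bit/s = {total / 1e6:.3f} Mbit/s")  # 819200 bit/s = 0.819 Mbit/s
```

With the quoted figures the aggregate stays under 1 Mbit/sec, in line with the estimate in the text.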
Early releases were completed, the first implementing packet length statistics. Early performance testing (using Endace's SmartBits tester) showed that the utilization of CPU for the real-time measurements is minimal (0.1% for the current version). The development of the design had been focused solely on Ethernet in the beginning. Therefore, using the SDA point (SDSC Abilene link monitoring ATM OC12c) gave an opportunity to test the design on a different link layer technology (ATM/AAL5 at OC12 speeds). This resulted in the next evolution of the program which works well with ATM and Ethernet, legacy or otherwise. Further structuring the program allowed for the addition of a second stream of data, either from another interface on the same card, or a second physical card. Much of the code to merge two streams of traffic for bidirectional statistics has been finished. Currently there are two ideas on how to account for the traffic on each link, which will be resolved shortly.
We used the SmartBits GigE tester (at Endace) to generate worst-case traffic with the smallest packet size of 64 Bytes, approximately 1.4 million packets/sec. Initially we ran into an issue, still present in the current API, that makes the application poll 100% of the time and hence use 100% of the CPU. This is a known issue that is currently being addressed. We manually applied a patch that waits for at least 8 KByte of data to arrive before processing packets, and the CPU load subsequently dropped to 2-3%. This is on a Dell 2650 (P4 Xeon at 1.8GHz). Based on these results, we expect that this application will be able to perform at 10 Gigabits/sec in the future.
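For reference, the worst-case packet rate follows from the Ethernet frame format. The sketch below assumes the standard per-frame overhead of 8 bytes of preamble/start-of-frame delimiter plus a 12-byte minimum inter-frame gap. The function name is hypothetical, and the 10 Gbit/sec figure is only the corresponding arithmetic, not a measurement.

```python
# Per-frame overhead on the wire that is not part of the 64-byte frame itself:
# 8 bytes preamble + start-of-frame delimiter, 12 bytes minimum inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 8 + 12


def max_frames_per_second(link_bps: float, frame_bytes: int) -> float:
    """Upper bound on frames per second for back-to-back frames of one size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


for name, rate in (("1 Gbit/s", 1e9), ("10 Gbit/s", 1e10)):
    fps = max_frames_per_second(rate, 64)
    budget_ns = 1e9 / fps  # time available to handle each packet
    print(f"{name}: {fps:,.0f} frames/s, {budget_ns:.1f} ns per frame")
```

At 1 Gbit/sec this gives roughly 1.49 million frames per second, consistent with the approximate figure quoted above. At ten times the link rate the per-packet time budget shrinks by the same factor.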
Towards the end of the quarter, we tried to figure out why testing of this real-time tool yielded seemingly distorted packet length numbers on the SDSC Abilene link we are monitoring. We captured some short traces with different driver versions and did the analysis with different tools (dagtools as well as Chris' real-time tool) to make sure we are not looking at an artifact caused by a bug in the program. As it turned out, the problem really seems to lie in the network, though we have not yet come up with a satisfying explanation of what we are seeing (about 50% MTU-sized packets in one direction and nearly none --between 0.4 and 2%-- in the other direction). More effort will be devoted to this analysis.
- A prototype of a real-time link utilization analysis tool was developed. This instantaneous bandwidth tool uses CDF (cumulative distribution functions) histograms to characterize the distribution of the gaps between packets. This approach helps in understanding the utilization of the very link that is being monitored. For a more systematic approach, support for ns2 trace files was added; this enabled the generation of synthetic traffic with a specific profile. This helped in understanding what these sequences can actually tell. Preliminary tests were quite promising in that they revealed some properties of the traffic which were not visible by just looking at bandwidth averages.
Evaluation of the output of this histogram sequence (instantaneous bandwidth) tool began. Sample sequences can be viewed at: http://pma.nlanr.net/~klaus/histo/. These sequences illustrate some very interesting features. Comparing the different directions of the same trace shows that this packet gap histogram approach yields some insights into the traffic behavior which are not visible by just looking at average utilization figures. In some cases, they actually provide quite opposite ratings for bandwidth availability when compared to what the average utilization indicated. As this link level packet analysis effort has progressed, there is a feeling that we may well be working on a single measurement point notion of IP packet congestion.
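To illustrate why gap distributions can disagree with average utilization, here is a minimal sketch that computes an empirical CDF of inter-packet gaps for one measurement interval. It is not the tool's code: the function names, the bin edges and the two synthetic timestamp series (in microseconds) are assumptions made for the example.

```python
from bisect import bisect_right


def inter_packet_gaps(timestamps):
    """Gaps between consecutive packet arrival times."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def empirical_cdf(gaps, bin_edges):
    """Fraction of gaps that are less than or equal to each bin edge."""
    ordered = sorted(gaps)
    if not ordered:
        return [0.0 for _ in bin_edges]
    return [bisect_right(ordered, edge) / len(ordered) for edge in bin_edges]


# Two synthetic streams: five packets each over the same 4000 microseconds,
# so the average packet rate is identical.
evenly_spaced = [0, 1000, 2000, 3000, 4000]
bursty = [0, 100, 200, 300, 4000]
edges = [100, 1000, 4000]

print(empirical_cdf(inter_packet_gaps(evenly_spaced), edges))  # [0.0, 1.0, 1.0]
print(empirical_cdf(inter_packet_gaps(bursty), edges))         # [0.75, 0.75, 1.0]
```

Both streams have the same average rate, yet the bursty one has three quarters of its gaps at 100 microseconds or less, a difference that an average utilization figure would hide.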
- A third approach utilizing TCP slow start synchronization to identify bottlenecks by examining packet traces is also under consideration and development. A literature review was performed on TCP congestion control and avoidance, especially on large scale slow start synchronization of TCP connections passing a congested link. All studies report the existence of this phenomenon, though it becomes less pronounced if NewReno or RED is used. But even then, there is still evidence of congestion (TCP has to know it from somewhere). Therefore, it was decided to pursue an approach which utilizes this effect to hunt down congested links of an Internet path. There are currently two paths being pursued for the TCP flow engine; preliminary versions are nearly completed. The end product will combine the best design aspects of each with regard to performance. Discussions were held regarding the different metrics to examine.
Experiments using tcptrace (a TCP flow engine with graphical output for various statistics) were performed. It provides one particular statistic about outstanding (meaning unacknowledged) packets as an approximation for the sender window. Though there is no way for tcptrace to detect simultaneous congestion events in TCP connections, this output helped in understanding the sending behavior of an application. This is important when it comes to identifying simultaneous congestion events. The challenge is to distinguish sudden drops in supply (what we are looking for) from drops in demand. Review of additional tcptrace graphics of the Leipzig-I trace data led to the belief that this approach will work. A couple of occurrences of simultaneous congestion events could be visually identified.
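The outstanding-data statistic can be approximated from one direction of a connection as the highest sequence number sent minus the highest acknowledgment seen. The sketch below is a simplified illustration, not tcptrace's implementation. It counts bytes rather than packets, ignores sequence number wraparound and SACK, and uses a hypothetical event format of ("data", end_sequence) and ("ack", ack_number) tuples in time order.

```python
def outstanding_bytes(events):
    """Return the unacknowledged byte count after each event.

    events: time-ordered tuples of ("data", end_sequence) for segments sent
    and ("ack", ack_number) for acknowledgments received (assumed format).
    """
    highest_sent = 0
    highest_acked = 0
    series = []
    for kind, value in events:
        if kind == "data":
            highest_sent = max(highest_sent, value)   # retransmissions do not add data
        elif kind == "ack":
            highest_acked = max(highest_acked, value)  # duplicate ACKs change nothing
        series.append(highest_sent - highest_acked)
    return series


events = [
    ("data", 1460), ("data", 2920),   # two segments sent back to back
    ("ack", 1460),                    # first segment acknowledged
    ("data", 4380),
    ("ack", 4380),                    # everything acknowledged
]
print(outstanding_bytes(events))  # [1460, 2920, 1460, 2920, 0]
```

A sudden, simultaneous collapse of this series across many connections would be the signature of interest; a drop in a single connection may only mean the application had nothing more to send.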
New (and developing) strategically important passive measurements & deployments ~
Work continued towards the deployment of an additional OC192MON (to be located at SDSC). This has been much more difficult than expected.
- The new Linux operating system and additional applications were installed. A number of significant issues, including some SDSC equipment problems, had to be resolved. We are making attempts to connect the OC192MON to a data source, artificial or real, with no luck in this reporting period. The plan is to take measurements from the TeraGrid Juniper T640 router. We are working with Kevin Walsh of SDSC NetOps and Stephen Donnelly of Endace on the issues with the OC192MON. We now understand that the Adtech supports all three PHY modes and we have configured it to OC192c PoS (as opposed to the other 10GigE LAN and WAN modes). We are still seeing RXF errors at the PHY chip (Khatanga). Stephen Donnelly believes that there may be an issue with having two cards in the system; the Endace folks are working on it.
- More testing/debugging of the OC192MON took place during Jörg's visit to SDSC. Jim arranged access to the Adtech equipment which was integral in the testing. Good progress was made. We confirmed that it is a problem between the Dell and the cards. However, we still do not have a configuration which works for a pair of cards in a system, which is what we need for the Abilene backbone instrumentation (and an installation at SC2003). Continued OC192MON testing following Jörg's visit resulted in some not so good news. It appears quite firm now that the boards need to be reconditioned by human attention. Endace (Knox Street) is still working out the exact patch needed. The good news is that the PCI-X 133 MHz upgrade could be performed at the same time. An open question is whether to recall the boards to New Zealand or send someone to the US for the upgrade. Expecting decisions to that effect shortly. The availability of the boards is now gating the IPLS and SC2003 installations.
Changes were made to the anonymization tool to make it work for the GigE traces captured in Leipzig (thanks to Jesper Peterson of Endace for his help with this). These are part of Leipzig-II, a two-point trace taken at the Internet access router. The Leipzig-II traces have now been anonymized.
The PMA monitors for the AMPATH connection at Florida International University (new) and the University of Florida at Gainesville (rebuild) were built, configured, and shipped; we expect that they will be in production soon.
Two new PMA monitors which were recently completed and shipped to the sites - the KISTI site in Korea and the AMPATH GigaPop in Miami, Fla. - have been installed and connected. They are expected to be coming online soon as connection issues are being resolved. Work continues getting the MAX GigaPop PMA site online.
Site nai-p-fla is now starting to mount the new passive monitor at that site. However they are asking for some additional mounting hardware. I will be shipping the mounting hardware as soon as possible.
Continued to work with the AMPATH (nai-p-fiu) site to resolve the optical connection problem. Eric Johnson, the network engineer there, is now working with the technician Ernie Rubi. Eric is trying to gain the use of an optical power meter locally in order to measure the signal levels at the signal taps. If he is not successful I will ship a power meter from here.
The PMA machine at NCSA was disconnected when the OC3 connection went away. There is a desire to now connect it to an OC12 connection. That discussion was carried to Jörg who has made a plan to create an OC12 Dag3.2 monitor at that site.
A PMA monitor was assembled, configured, and shipped to Internet2 in Ann Arbor, Michigan, and is expected to be connected soon. Working with Matt Zekauskas on this. There had been a possibility of Jörg visiting Ann Arbor, but his travel plans will not allow time to do this.
The new passive monitor shipped to Internet2 in Ann Arbor is on site and is expected to be connected soon. A report from Matt Zekauskas indicates he is back in Ann Arbor and will be pursuing the installation on the Internet2 connection. However, he will need to do some machine reconfiguring due to the fact that the IP addresses on the machine were reassigned.
Upgrades, troubleshooting, and maintenance on the PMA infrastructure ~
A new PMA server was installed and migration of data completed. HPSS was used extensively in the process, including recovering some files that had been previously lost, and improving robustness. The data array was quite full, so a 40 GB IDE disk was added at the time. Four additional 120 GB disks were installed when they became available. This greatly improved the capacity of the server. In addition to other installations and configurations, we did another rsync from the /traces directory on the old machine to the new, which puts the new subscriber system in place for the Trace User Community.
Student researcher Chris Gross spent the summer in New Zealand working closely with PMA project lead Jörg Micheel. His activities also included working directly with the Dag card designers. This has already proved beneficial to other members of the PMA team upon his return to SDSC.
A total of 13 problem sites in the PMA infrastructure received attention this period. Only five were still being investigated, or pending site action, at the end of the quarter. Note: These statistics cover only August and September, because this tally began with the month of August.
More detail on these activities can be found in the monthly reports for this reporting period, available at:
http://moat.nlanr.net/Reports/MNA/200307July.html
http://moat.nlanr.net/Reports/MNA/200308August.html
http://moat.nlanr.net/Reports/MNA/200308Sept.html
Good progress was made on the reimplementation of AMP and development of a new testing architecture ~
- Completed the testing code for the xfer system. The test starts up the two parts of the transfer system then sends test data while killing one or another or both of the components, checking that the data gets through exactly once.
- Code to send different sized ICMP test packets (and record the packet sizes with the results) was added.
- Issues with the transfer code (occasionally seeing a bad sequence number in the disk save file) were resolved.
- Added a b-tree IP address to amp-name translation routine (v4 or v6 addresses) so that filenames can all be done by amp-name, not address. Wrote checking code for that and included it in the test suite. Also added exactly-once semantics to the ICMP save code; wrote test code for it and added it to the test suite.
- Added random sized packets to the ICMP test for the new amplet.
- Worked on the traceroute transfer format.
- Ran through all the tests and combinations of flags with the last couple of months of code working on getting everything to test correctly on Linux and FreeBSD.
- Performed soak tests and corrected a few bugs. This included a bug with the buffer overflow that was happening as a result of the thread pool overflowing and another bug that also related to abnormal exits. Pthreads has a nice mechanism to deal with a stack of exit routines, but the Linux macro implementation of the push includes an unmatched open brace that is matched by the pop. This is problematic because it severely limits how you can use the routines; worked around this. (Good to find these problems now, rather than after the release of the code, particularly because the plan is to provide only some assistance/tech support.)
- Rounded off the current revision of the amplet code, updated the CVS tree, did a series of test builds with all the different options and documented the process of adding a new test (including actually adding one).
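The exactly-once property checked by the transfer tests can be illustrated with a sequence-numbered receiver that ignores duplicates and out-of-order arrivals and relies on the sender to retransmit. This is a minimal sketch under stated assumptions, not the xfer system's design: the class and method names are hypothetical, and persistence across process kills (which the real tests exercise) is left out.

```python
class ExactlyOnceReceiver:
    """Accept each sequence-numbered record exactly once, in order."""

    def __init__(self):
        self.next_expected = 0   # sequence number of the next record to save
        self.saved = []          # stands in for the disk save file

    def receive(self, sequence, payload):
        """Process one arrival and return the cumulative acknowledgment.

        Duplicates (sequence < next_expected) and gaps (sequence >
        next_expected) are dropped; the sender is assumed to retransmit
        from the acknowledged position.
        """
        if sequence == self.next_expected:
            self.saved.append(payload)
            self.next_expected += 1
        return self.next_expected


records = ["a", "b", "c"]
receiver = ExactlyOnceReceiver()
# Simulated unreliable delivery: record 0 duplicated, record 2 arrives early,
# record 1 duplicated, then record 2 is retransmitted.
for sequence in [0, 0, 2, 1, 1, 2]:
    receiver.receive(sequence, records[sequence])
print(receiver.saved)  # ['a', 'b', 'c']
```

In a real implementation the save and the update of the expected sequence number would have to survive a crash together; otherwise a restart could duplicate or lose a record.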
IP Measurement Protocol (IPMP) ~
The ability to identify the bottleneck capacity and the position in a path where a bottleneck occurs is helpful for network operators in understanding the performance seen on a path. Current capacity estimation techniques measure network paths with varying levels of accuracy, speed, and robustness. Tools like pathrate have become progressively more sophisticated and accurate at measuring the capacity of a tight link, but do not provide any indication of where the tight link is located in the path. pathchar-like techniques attempt to measure the capacity of each hop by exploiting the Time To Live (TTL) value, but it is difficult to isolate the delay contributed by a specific link when a noisy link precedes the link measured.
We have developed a technique to segment a path into isolated links for capacity estimation purposes, by time stamping the probe packets at the ingress interface of each router in the path. The technique uses IPMP to obtain timestamps at each point in the network, but could use any other tracing protocol that provides a timestamp of sufficient resolution. The technique can be used to estimate the capacity of each hop by a method that isolates each hop in the path in an end-to-end measurement. This technique allows for forward and reverse path measurement and does not require the timestamp sources at each router to be synchronized.
IPMP Bandwidth estimation experiments and result analysis performed this period:
- Worked with Gnuplot to get it to plot a 3-D surface, and then used it to produce useful graphs based on the IPMP data collected last period. The experiment involved sending IPMP packet pairs and taking the minimum separation seen over 200 pairs for each combination of first and second packet sizes. The idea is to send a large first packet that causes a smaller second packet to queue immediately behind it the whole way through, then work out the capacity of each hop on the path by taking the size of the second packet and dividing it by the separation between the end of the first packet and the end of the second.
- The IPMP packet pair method has a number of advantages over other methods assuming deployment on a network:
- We do not need to filter out post-narrow congestion modes (PNCMs) because we isolate each hop with timestamps (not true if the layer 2 is composed of switches with cross traffic).
- We can see the capacity of every hop and identify the capacity limiting link, rather than just know the one-way capacity.
- A basic finding to date is that if the second packet is larger than the first, then the first will get further and further in front of the second the more store-and-forward devices (routers / switches) it traverses. This is because the first is received and transmitted in full before the second packet is received in its entirety. Graphs reflecting this finding are at: http://voodoo.cs.waikato.ac.nz/~mjl12/1.ps (and /2.ps /3.ps /4.ps). In these graphs the dark black frame represents what the separation of the two packets should have been, based on the size of the second packet. The back half (basically a diagonal line through x=0 to y=1460 in each of these graphs) reflects the basic finding.
- There are additional interesting things in each graph. It is possible that the flat surface before y=600 in 1.ps is due to a physical limitation of the particular network interface which is used to send the packets. This hypothesis was investigated by running the IPMP packet pair technique on the WAND emulation network. Data from these experiments were used to produce graphs which were then compared with the results/graphs discussed above. It is clear that the shelves below first packet size of 600 bytes are confined to either the FreeBSD operating system or the RealTek interface used.
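The capacity calculation and the store-and-forward finding described in this list can be sketched with a simple model. The model is an assumption made for illustration: every hop has the same capacity, there is no propagation delay or cross traffic, and both packets leave the source back to back. The function names are hypothetical, and this is not the IPMP analysis code.

```python
def hop_capacity_mbps(second_packet_bytes, separations_us):
    """Capacity estimate: second packet size over the minimum separation seen.

    Bits divided by microseconds gives Mbit/s directly.
    """
    return second_packet_bytes * 8 / min(separations_us)


def pair_separations_us(first_bytes, second_bytes, capacity_mbps, hops):
    """Separation (end of first to end of second packet) after each hop."""
    tx_first = first_bytes * 8 / capacity_mbps    # transmission time in microseconds
    tx_second = second_bytes * 8 / capacity_mbps
    done_first = done_second = 0.0
    separations = []
    for _ in range(hops):
        done_first += tx_first
        # The second packet is forwarded once it has been fully received and
        # the first packet has finished transmitting on the outgoing link.
        done_second = max(done_second, done_first) + tx_second
        separations.append(done_second - done_first)
    return separations


# Large first packet: the second packet stays queued directly behind it.
print(pair_separations_us(1500, 500, 100, 3))   # [40.0, 40.0, 40.0]
# Larger second packet: the first packet pulls further ahead at every hop.
print(pair_separations_us(500, 1500, 100, 3))   # [120.0, 200.0, 280.0]
print(hop_capacity_mbps(500, [40.0, 55.0]))     # 100.0
```

In this model a large first packet keeps the separation fixed at the transmission time of the second packet, so size divided by separation recovers the 100 Mbit/s hop capacity. With a larger second packet the separation grows hop by hop and the same division would underestimate the capacity.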
A program was written to generate cross traffic (CT) based on packet sizes and packet inter-arrival times seen in a particular trace. It does not replay the trace. The point of this is to have a cross-traffic-from-trace generator/application that creates traffic based on that seen in the real world on the emulation network. The first application for this will be for CT experiments using the IPMP bandwidth estimation techniques.
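One way to generate traffic "based on" a trace without replaying it is to sample packet sizes and inter-arrival times from the distributions observed in the trace. The sketch below does this with independent sampling, which is an assumption; the report does not say how the generator models the trace. The function name and the (timestamp, size) input format are hypothetical.

```python
import random


def synthesize_cross_traffic(trace, count, seed=None):
    """Build a send schedule whose sizes and gaps are drawn from a trace.

    trace: list of (timestamp_seconds, size_bytes) tuples (assumed format).
    Returns a list of (send_time_seconds, size_bytes) with `count` entries.
    """
    ordered = sorted(trace)
    sizes = [size for _, size in ordered]
    gaps = [later[0] - earlier[0] for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("trace needs at least two packets")

    rng = random.Random(seed)
    schedule = []
    send_time = 0.0
    for _ in range(count):
        # Sizes and gaps are sampled independently, so the original packet
        # order is not reproduced; only the two distributions are.
        schedule.append((send_time, rng.choice(sizes)))
        send_time += rng.choice(gaps)
    return schedule


trace = [(0.000, 40), (0.001, 1500), (0.004, 40), (0.005, 576)]
for send_time, size in synthesize_cross_traffic(trace, 5, seed=1):
    print(f"{send_time:.3f} s  {size} bytes")
```

Independent sampling discards any correlation between packet size and spacing; a generator that needs to preserve burst structure would have to model that explicitly.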
The AMP IPv6 mesh continues to grow. One of the new international AMP sites, amp-surf (SURFnet GigaPop in Amsterdam, Holland), was among those who requested that they be added. There are now 13 sites in the AMP IPv6 mesh. http://amp.nlanr.net//IPv6/
We have written a topology mapping tool, IPv6 Scamper, to traceroute through a large address list in an efficient manner by conducting traceroutes in parallel - as the need dictates - to fill a packets-per-second rate. Scamper is quite similar to CAIDA's Skitter, except that Scamper can traceroute to IPv6 addresses, as well as IPv4 addresses. The focus of the current research is in providing insight into the behavior and growth patterns of the IPv6 Internet. We have composed an address list, which has been Scampered from several locations at various times. This period Scamper-0.9.4b2 was sent to Wim Biemolt at surfnet.nl and to Ronald van der Pol at NLnet.nl, in order to obtain views of the IPv6 Internet from machines in the European Union. Henk Uijterwaal (RIPE-NCC) and Tim Chown were also sent Scamper and the address list to run from their networks. Also, the Scamper source was modified to make it compile on MacOSX.
Efforts to profile the length of loops discovered during the Scamper IPv6 runs took place. This is a relatively important statistic given that 25% of all addresses in the list fail with a loop detected. http://voodoo.cs.waikato.ac.nz/~mjl12/ipv6-scamper/
A C API was developed for accessing Scamper output files - both the ASCII text style and a binary file format previously designed. Work continued on the Scamper library, which was used to write a series of programs to perform rudimentary analysis on the data collected from the six Scamper runs that have been taken from various points in the IPv6 Internet. http://voodoo.cs.waikato.ac.nz/~mjl12/ipv6-scamper/
- The first is a simple program to test the API by plotting a simple histogram of hop lengths seen for a given Scamper file.
- Wrote a program that, given a prefix, will extract all addresses that match the prefix and then draw a directed graph of all ingress and egress hops for that prefix. A diagram of ipv6.aorta.net based on data collected with Scamper is at http://voodoo.cs.waikato.ac.nz/~mjl12/ipv6-scamper/127prefix.aorta.png. The red bits are examples of what could be /127 routing; the green bits indicate the addresses that fit within the prefix.
- Wrote a program that, given a Scamper file, will produce a graph of the RTT against the hop. The bits that are interesting are the RTTs that are seconds above the mean RTT seen, which might indicate the use of a tunnel.
- Also wrote a program to generate basic stats of the hit-rate of the various address sources used to compose the IPv6 address list.
- Wrote a program to fetch out the EUI64 addresses seen in the Scamper traces and obtain counts of the Organizationally Unique Identifiers (OUIs) from them, and convert that to OUI name.
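The OUI extraction in the last item can be illustrated as follows. An interface identifier built from a MAC address in modified EUI-64 form has the bytes ff:fe in its middle and the universal/local bit of the first byte inverted. The sketch uses Python's ipaddress module rather than the Scamper C API, the function name is hypothetical, and the addresses are made-up examples in the 2001:db8::/32 documentation prefix.

```python
import ipaddress
from collections import Counter


def oui_from_ipv6(address):
    """Return the OUI as 'xx:xx:xx' if the address looks EUI-64 derived, else None."""
    interface_id = ipaddress.IPv6Address(address).packed[8:]  # low 64 bits
    if interface_id[3:5] != b"\xff\xfe":
        return None  # no ff:fe marker, so not derived from a 48-bit MAC
    first_byte = interface_id[0] ^ 0x02  # undo the universal/local bit inversion
    return "%02x:%02x:%02x" % (first_byte, interface_id[1], interface_id[2])


hops = [
    "2001:db8::20c:29ff:feab:cdef",   # EUI-64 style interface identifier
    "2001:db8:1::20c:29ff:fe00:1",    # same OUI on another subnet
    "2001:db8::1",                    # manually assigned, no OUI
]
counts = Counter(oui for oui in map(oui_from_ipv6, hops) if oui is not None)
print(counts)  # Counter({'00:0c:29': 2})
```

Converting an OUI to an organization name would additionally need a copy of the IEEE OUI registry, which is outside this sketch.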
An investigation of the prevalence of the /127 routing prefix length between routers was begun (although there is some question whether this makes sense, given the traceroute method of data collection). This investigation was prompted by Pekka Savola's Internet-Draft, since published as RFC 3627, which suggests that the use of a /127 routing prefix should be avoided. We have run Scamper across a target address list of 4241 IPv6 addresses and found hops that might indicate the use of a /127 routing prefix. http://voodoo.cs.waikato.ac.nz/~mjl12/127prefix.html
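One possible heuristic for spotting candidate /127 links is to look for pairs of observed hop addresses that differ only in the final bit. This is a sketch of that heuristic, not necessarily the method used in the investigation. The function name is hypothetical and the addresses are made-up examples in the 2001:db8::/32 documentation prefix.

```python
import ipaddress
from collections import defaultdict


def candidate_127_pairs(addresses):
    """Return pairs of distinct addresses that fall inside the same /127."""
    groups = defaultdict(set)
    for text in addresses:
        address = ipaddress.IPv6Address(text)
        # Dropping the lowest bit leaves the 127-bit prefix.
        groups[int(address) >> 1].add(address)
    return sorted(tuple(sorted(group)) for group in groups.values() if len(group) == 2)


observed_hops = [
    "2001:db8::a",     # ...1010
    "2001:db8::b",     # ...1011, same /127 as ::a
    "2001:db8::10",    # partner ::11 never observed
    "2001:db8::b",     # duplicates are harmless
]
for low, high in candidate_127_pairs(observed_hops):
    print(f"{low} and {high} share a /127")
```

A match only shows that both addresses of a two-address block were seen. It cannot prove the configured prefix length, since the same pair also fits inside any shorter prefix, which is part of why the traceroute-based evidence needs careful interpretation.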
New (and developing) strategically important active measurement deployments ~
We have been in discussions with Greg Cole (NaukaNet) about monitoring to about 30 or 40 Russian sites (not a full mesh). In addition to providing interesting data, this will also provide a stepping stone to a Russian AMP deployment and possible full mesh.
A request for an AMP monitor was received from NaukaNet this period. The monitor point on the NaukaNet network is Northwestern University in Chicago, Ill. The machine was prepared and shipped, installed and initiated, and is now collecting and transferring data.
The machine for the amp-cudi site in Mexico, Corporacion Universitaria para el Desarrollo de Internet (CUDI), was shipped this period and, after some work with Mexican Customs, the machine arrived at CUDI and was installed. It has been connected and is now collecting data.
Three newly installed AMP monitors were brought online this period: amp-rnpb (RNPnet GigaPop in Brazil), amp-ampath-mia (AMPATH GigaPop in Miami, Fla.), and amp-surf (SURFnet GigaPop in Amsterdam, Holland).
The long-awaited startup of the AMP monitor at the Great Plains Network GigaPop, amp-gpng, took place (connected and started) this period.
Upgrades, troubleshooting, and maintenance on the AMP infrastructure ~
The archiving process performed on the AMP data collection servers (AMP and VOLT) moves the active measurement data older than six months to the HPSS (SDSC's high performance storage system). When this method was started some time ago, this process would take the data disk fill status down to approximately 68 percent. However, the active measurement infrastructure has continued to grow. The archive process just run took the disk fill down to the low seventies (approximately 73 to 74 percent). This indicates that with the growth of the AMP meshes, the accumulation of data over a six month period has grown considerably. Another factor to note is that the archiving process is becoming necessary every four to six weeks. This may be something to consider with continued development of the AMP network.
The amp server experienced a disk crash in the concatenated disk array; the failed disk was located and replaced, and the amp server restored from the volt disk array. Because recovery is potentially easier on a non-concatenated disk structure, we decided to change the disk concatenation scheme of the AMP server to individual data disks, as is the case on the VOLT server. This change was performed, which proved fortunate when the AMP replacement disk (which had been fully tested) failed the following week. The data restore proceeded quite rapidly compared to the time that would have been needed if the concatenated array scheme had been retained. The disk configuration on both AMP and VOLT will now be maintained identically in order to facilitate restores between the two servers. Note that this does not change the fact that during the restoration process the am_slave process (which collects the data from the remote site AMPlets) is down on both servers; however, since the process is faster, complete data collection is resumed more quickly.
As this was the first time in quite a while that we have lost a disk (a couple of years), we discussed the fact that in the not too distant future we should upgrade the amp and volt servers. There has not been any major work on them for quite some time. A RAID controller would remove many hours of work when we lose a drive and new drives could hugely increase the capacity of the array and lengthen the time between backup runs.
Even though the newly discovered vulnerability on OpenSSH did not represent a threat to AMPlets at remote AMP sites, we decided to move ahead with the update to the 3.7 version to ensure that remote site people do not perceive the monitors as vulnerable. Data collection and transfer were not affected by this.
Computer component obsolescence is an issue with which we are continuously faced. The component that most often becomes obsolete and goes out of manufacture is the system board (mother board). It has become necessary again to test and upgrade to new system boards as the current board (GigaByte 6VML board) will go out of manufacture and is becoming unavailable. Investigation revealed that only a few manufacturers are continuing to produce boards supporting the Pentium III processors. Boards supporting Pentium IVs are much more available now. This may necessitate a switch of processors to the P-4.
While the number of site outages has remained low, it has increased to some degree. Some sites remain in a disconnected state. This is due in part to skeleton crews during the summer months. However, some sites report that they disconnected on purpose to guard some subnets from attacks, and some site crews report that the deluge of viruses and worms has kept them too busy to lend a hand with the NLANR AMP boxes. This issue improved towards the end of the quarter.
A total of 27 problem sites in the AMP infrastructure received attention this period. Only seven were still being investigated, or pending site action, at the end of the quarter. Note: These statistics cover only August and September, because this tally began with the month of August.
More detail on these activities can be found in the monthly reports for this reporting period, available at:
http://moat.nlanr.net/Reports/MNA/200307July.html
http://moat.nlanr.net/Reports/MNA/200308August.html
http://moat.nlanr.net/Reports/MNA/200308Sept.html
Work was performed on the HPWREN utilization reports that summarize the usage of HPWREN at several key locations - including the Palomar and Mount Laguna Observatories, the Santa Margarita Ecological Reserve, and the Tribal Digital Village Network. These reports focus on how each location uses HPWREN to its benefit and how effective HPWREN has been in providing high-speed internet access to areas where it would otherwise be unavailable. Each report also traces the use of the network from the time it was first installed up through this past August, and demonstrates to what extent usage of HPWREN has increased in that time span. http://stat.hpwren.ucsd.edu/HPWREN/Test/Reports
This work will provide a baseline and comparison point by which we can measure the changes in network utilization from the beginning of HPWREN to the present. This will hopefully lead to an understanding of how small user communities change in utilization after the network goes from new to truly useful.
The measurement infrastructure of HPWREN (High Performance Wireless Research and Education Network) was further developed. Tools to visualize network measurements with graphs were written and existing tools and Web pages significantly improved.
Wim Biemolt, surfnet.nl, Ronald van der Pol, NLnet.nl, Henk Uijterwaal (RIPE-NCC), and Tim Chown ~
AMP: Sent them Scamper to run, which will provide views of the IPv6 Internet from machines in the European Union. Awaiting output from them.
Joe Abley and Paul Vixie ~
AMP and PMA: Have been in touch with them regarding support of measurements for the new F.root mirrors and DNS server configs that are being built around the world, and how we can help. We also sent Joe Abley the source to Scamper (he had asked for it at NZNOG).
Peter Arzberger, PRAGMA ~
AMP & PMA: Attended his PRAGMA meeting; attendees included some colleagues from Taiwan who were quite interesting, as well as some of the LTER folks. Ronn gave a presentation with him to Dr. Yang from Korea. Also working with him about the upcoming CUDI meeting in Mexico in early October. He will show a couple of NLANR/MNA slides for us when presenting there. Met with him about PRAGMA funding and future plans. The brochure that PRAGMA is developing on their project will include a number of references to our work and possibly one or two images. We are providing them with text and images.
Eric Boyd, PIPES ~
AMP: He is interested in using the AMP data as a fundamental building block of the PIPES project.
Nevil Brownlee ~
PMA: Helped him debug some work on the MFN OC48MON in San Jose, CA, without much luck; we could not make it work. The hope is that we can get these issues resolved by replacing the outdated Dag4.1 cards with the next generation 4.2 cards. Discussed network instrumentation in Auckland (reviving our measurement point there) and other issues related to Dag software and API support. E-mail exchanges with him about the NeTraMet based analysis on DagMONs.
Bill Cleveland, Bell Labs ~
PMA: Worked to help him with his measurements at Global Crossing.
Greg Cole, NaukaNet ~
AMP: Spoke with him about possible measurement infrastructures in Russia and China. Additional talks will be held about participation in the upcoming Gloriad proposal.
Steve Corbato, Internet2 ~
PMA: He is keen for us to get one of the OC192 systems installed into IPLS.
Jon Crowcroft, Cambridge ~
PMA: Briefly communicated with him regarding the NREDS workshop. Jon was very supportive but could not help in this case.
Yuri Demchenko, NLnet Labs ~
AMP: Sent him a copy of our slides from the RELARN meeting, as requested by Greg Cole of NaukaNet.
Christophe Diot, Intel, Cambridge UK ~
PMA: Continued dialog with him on his program to donate PC equipment for passive measurements; we continue to look for ways to install more measurement points and to support activities by other research groups.
Stephen Donnelly, Endace ~
PMA: Working with him to make sure all the Dag tool options and statistics are proper for the new OC192 deployment and other issues related to the deployment of another OC192MON.
Anja Feldman, TU Munich ~
PMA: Continuing discussions regarding passive measurements. She is the General Chair for SIGCOMM this year.
Koryn Grant, Endace Performance Test Engineer ~
PMA: Worked with him to test Chris' Internet performance engine on a Gigabit Ethernet DAG card at Endace Knox Street.
John Hicks, TransPAC, Indiana University ~
PMA: Spoke with him about the return of the OC48MONs from IPLS, which have now arrived at SDSC.
Warren Matthews, SLAC ~
AMP: Sent him the am_slave code and wrote up some installation instructions so he can get the data from the SLAC monitor in real time.
Margaret Murray, CAIDA ~
PMA: Met with her about the ENDACE card donations and collaboration with CAIDA.
Evi Nemeth, University of Colorado at Boulder ~
PMA: Several interesting conversations about various passive measurement research topics.
Ian Pratt, University of Cambridge ~
AMP & PMA: Conversations with him regarding PAM2004. Have reserved the pam2004.org domain name. Helped him with preparations for the announcement of PAM2004. The Web page is now in place, as are the necessary e-mail lists.
Jesper Peterson, Endace ~
PMA: Helped make changes to the anonymization tool to make it work for the GigE Leipzig traces.
Pekka Savola, Netcore ~
AMP: Made contact with him; his internet-draft, which is now RFC 3627, prompted work on finding possible examples of /127 routing prefixes. He provided a few pointers to other tricks we could try to find routing prefix lengths. This might be an interesting small project to try with Scamper.
Paul Schopis, OARnet GigaPop ~
AMP: Met with him; he is supportive of hosting an AMP at the OARnet GigaPop.
Rick Summerhill ~
AMP: Met with him about putting AMP machines at each of the 11 Abilene backbone nodes as part of the Abilene Observatory project. http://abilene.internet2.edu/observatory/
Research colleagues at the University of Catalonia, Barcelona, Spain ~
PMA: Met research colleagues there, and the discussion turned out to be extremely interesting. We have possibly found another strong collaborator for the PMA project. We are in discussions with Pere and Josep to see how we can strengthen the relationship in a variety of ways.
Kevin Walsh, SDSC NetOps ~
PMA: Arranged to obtain direct access to the Spirent Adtech test equipment which we are using to test the OC192 monitor. This will give us control over the test values and will be invaluable to the project in the future. He has been enthusiastic about assisting us with this project. Also working with him regarding the SC2003 Bandwidth Challenge.
Matt Zekauskas, Internet2 ~
PMA: This period, we worked with him regarding the move of the ADV PMA monitor to Ann Arbor.
More detail on these activities can be found in the monthly reports for this reporting period, available at:
http://moat.nlanr.net/Reports/MNA/200307July.html
http://moat.nlanr.net/Reports/MNA/200308August.html
http://moat.nlanr.net/Reports/MNA/200308Sept.html
Presentations and Conference/Meeting Participation ~
Netforum 2003 conference (NZNOG 2003)
Mt Wellington, Auckland, July 9-12, 2003 (NZNOG, the New Zealand Network Operators, NZ equivalent of NANOG)
- Jörg gave a 60 minute presentation about the state of the art in passive Internet measurements. Attended the other sessions and was positively impressed by the quality of the presentations. Had the impression that the issues raised were much closer to people's needs on a day-to-day basis.
- Tony presented on the simulation work we did a year or two back, which created more interest than expected.
- Matthew gave a talk about IPMP and the IPv6 mapping project. He created a large poster with the IPv6 map for NZNOG 2003, a map consisting of 56 pages of A4 with flags of the major points on each AS. It attracted some attention: Joe Abley, who is an ex-pat NZer now working at ISI, offered a box to run Scamper from.
Jörg presented his paper, "Writing Applications for Real-time Network Monitoring" at the AUUG2003 Conference (Sydney, Australia, September 3rd-5th).
Jörg attended ACM SIGCOMM 2003 in Karlsruhe. It was not only a data communications festival, but a prime opportunity to talk to a lot of people in the network arena.
Jörg attended the Internet2 Security at Line Speed (SALS) NSF Sponsored Workshop (Invitation Only, total of 30 attendees), in Chicago, IL, August 12-13, 2003. The outcome was not unexpected, but educational. Most people will treat security as a matter of restricting IP communications by moving "wide open" to "tailored to needs" via firewalls and similar "middleboxen". More interesting was the fact that tools are all there, no surprises expected in the near future, but changes are expected in the way that those mechanisms are operated. Multi-tiered policies, better ways to communicate, multi-campus arrangements and similar communications and cultural changes promise some form of "improvement" on the subject. Overall, reassuring to see that there are no strict technology changes expected. http://apps.internet2.edu/sals/2003aug/agenda.html
Tony visited Canterbury and Massey Universities presenting material about what NLANR/MNA is currently doing, in the hope of finding more collaborators. He also attended a talk about NeTraMet++, given by Nevil Brownlee.
Tony attended a seminar given by a visitor from Cardiff University about network metrics and QoS for grid computing which raised some interesting issues.
Ronn attended the Kansas Joint Techs TEV meeting in Lawrence, Kansas; participated in the PIPES meeting; and attended the JET meeting.
Ronn participated in the SDSC Measurement Working Group meeting. There are plans to have some kind of measurement meeting at SDSC in December.
Lana Kennedy and Ben Reesman each prepared and presented a poster at the SDSC sponsored Student Symposium for student researchers on their current work and activities.
Cichlid 3-D Visualization Tool ~
Good progress on the reimplementation continued with the following:
- Developed new OpenGL rendering code for a pie-chart graph type. The images produced look very good and the code works very well. In addition to its use in Cichlid, this code will generate images for publications, such as those created for the Citings/Proceedings Web page. http://moat.nlanr.net/Data-users/citings_proceedings.html
- Worked with Jeff Brown's camera and transformation code, then created a class that contains Jeff Brown's camera math, and integrated it into the OpenGL GUI components with the concurrency-related code.
- Began to implement the bar chart graph from the original Cichlid code. Fortunately most of this code will move right over. Did some reworking of the way that the client-side protocol threading works. Worked on routines that pack and unpack bar charts and took some advice from Klaus on how to speed up the partial data updates for bar chart graphs. Developed a new serialization method for similar graphs, extending my previous implementation to include a more sophisticated interface which allows multiple units of color per cell. Also wrote a more elaborate serialization method using these unit data structures.
- Developed routines to do the drawing for vertex-edge graphs as well as routines to serialize and deserialize datasets.
- Began developing a more sophisticated interface for graph objects to publish their unique options (NURBS, render quality, interpolation) to the windowing interface in a graph-independent way. This consists of an interface infrastructure that can enumerate a list of graph characteristics and the available options (multiple choice) for each one, which then gets built into a menu. All of this happens without the window class knowing anything except a dynamically allocated data structure which holds character strings.
- Spent time working with the windowing system, regarding the generation of timer events to keep the graph animating properly.
- In preparation for correcting several mistakes in the design of the reimplementation of Cichlid, spent a lot of time studying network and systems programming fundamentals. The two most important regard portability: byte ordering of network traffic and the portability of threading. Wrote 1500 lines of supplementary code for Cichlid which solves some of the problems that I have raised with the work to date. This includes: a small thread library, byte packing code, new network utility code, and a rigorous network specification. Todd Hansen and I also wrote several key routines in the client rendering setup.
- Wrote more code, including memory management for the byte stream, a more advanced FrameSocket abstraction, and a server setup using the new threads. Worked on the algorithm that handles the animation and on a class that abstracts playback in a more modular fashion. Reviewed and debugged the code that manages storage and submission of datasets into the data stream, as well as optimizing and debugging other parts of the code.
- Worked on a number of technical issues. Wrote parameterized types, using C++ templates, to generalize the idea of animating a dataset. Also reorganized Cichlid's drawing library into a more C++ friendly form, and added the wedge-drawing routines from the pie charts so that the library forms a more cohesive whole.
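The byte-ordering concern raised in the list above can be illustrated with a minimal Python sketch. The report does not describe Cichlid's actual wire format, so everything here is an assumption made for illustration: the layout (a header holding row and column counts, followed by one 32-bit float per bar) and the names `pack_bar_chart` and `unpack_bar_chart` are hypothetical. The only point is that fixing the byte order explicitly makes the stream independent of the host's native endianness.

```python
import struct

# Hypothetical layout: two unsigned 16-bit counts (rows, cols), then
# rows*cols 32-bit floats. "!" selects network (big-endian) byte order
# with standard sizes, regardless of the host's native order.
HEADER = struct.Struct("!HH")


def pack_bar_chart(rows, cols, heights):
    """Serialize a rows x cols grid of bar heights into bytes."""
    if len(heights) != rows * cols:
        raise ValueError("expected rows * cols height values")
    return HEADER.pack(rows, cols) + struct.pack(f"!{rows * cols}f", *heights)


def unpack_bar_chart(data):
    """Inverse of pack_bar_chart; returns (rows, cols, heights)."""
    rows, cols = HEADER.unpack_from(data, 0)
    heights = struct.unpack_from(f"!{rows * cols}f", data, HEADER.size)
    return rows, cols, list(heights)
```

A sender and receiver that both go through routines like these agree on the byte layout even when one machine is little-endian and the other big-endian, which is the kind of problem a rigorous network specification is meant to rule out.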
Publications, Networked Data, Documentation ~
Micheel, J. Writing Applications for Real-time Network Monitoring. Proceedings of the AUUG 2003 Conference: Open Standards, Open Source, Open Computing, Sydney, Australia, Aug.-Sep. 2003.
An article is in progress on our international deployments and collaborations, for publication on the SDSC Web site (and our own). The focus will be on technology transfer to groups in Korea, Australia and Taiwan - as well as the hosting of NLANR/MNA monitors in other countries.
Several extensive Web projects involving the development of many new pages, updating of current ones, and in some cases, realignment of the underlying tree structure, were begun.
-
One of the primary building blocks for this total revamping of the MNA Web site, the design and development of a new Web page format and style template which will be consistent throughout the Web site, was completed. An important aspect of this is the navigation bar (navbar), which will have three versions: a general NLANR/MNA one, and one each for AMP and PMA. An important element of the navbar is that it incorporates dynamic content, with two featured "teaser" images (just 100 pixels in width), each with a "more info" link. The current featured activities are the new Citings pages and the OC48 instrumentation and measurements taken on the Abilene backbone (our first OC48c Packet-over-SONET data set). The underlying design of the navbar is such that these featured activities can be changed at any time without having to change the html on all pages. The plan is to rotate them every two weeks or so, and of course add new ones as we highlight different activities.
During the process of creating multiple draft page samples, flaws in the design were identified and resolved. In addition, the nature of our pages which contain both text and illustrated data of varying sizes (some quite large) necessitated development of two page templates: one for regular pages with images that fit easily into the template size constraints and another for "data" pages, such that the researcher is not confined to a specific size when displaying illustrated data (graphs and/or images), while maintaining continuity and readability. Extended trial and testing across browsers resulted in additional refinements. A mini version of the navbar (just two links vs. the full sidebar one) for use with the data pages (where the complete sidebar version is not appropriate) was designed.
- Citings: Data Users pages ~
The aim of these pages is to know who is doing what with our data, and approximately how many folks are using our publicly available measurements, analyses, and tools for their own research efforts. This suite of pages will also include pages on our collaborations. The Citings/DataUsers pages completed and posted this quarter are: published papers referencing our work, sorted by meeting/conference and sorted by year; and a "By the Numbers" page with numerical information regarding the proceedings papers in which we are cited. The results of this work, which spanned many weeks of compilation efforts, form an excellent foundation which will be extended as new information is added. The pages are posted at: http://moat.nlanr.net/Data-users/.
- Example: Of 29 total accepted papers at IMW 2001 (San Francisco, CA, USA November 1-2, 2001), 11 papers referenced our work (37.9% of the total number of papers accepted).
- Work began on the redesign and restructuring of all of the PMA Web pages, the primary aim of which is to increase readability and accessibility. Nine primary pages that will form the core of the new PMA Web site were identified. A PMA navbar was developed based on these primary pages. A preliminary draft of a totally revamped directory tree structure based on the primary pages was decided upon. We are now ready to create a final full detail draft of the complete tree structure, after which, final decisions will be made and all of the new pages will be taken live, with redirects to the old pages (using a single file format).
- Coordinate information was added to the AMP database to make the automatic map generation scripts work for the AMP international sites (part of new AMP site indices pages).
The AMP and PMA posters were updated; they will be displayed at SC2003 in Phoenix and at other meetings as appropriate.
More detail on these activities can be found in the monthly reports for this reporting period, available at:
http://moat.nlanr.net/Reports/MNA/200307July.html
http://moat.nlanr.net/Reports/MNA/200308August.html
http://moat.nlanr.net/Reports/MNA/200308Sept.html
Chris Gross had the opportunity to visit New Zealand this summer and worked closely with the PMA project lead, Jörg Micheel, on a real-time passive measurement and analysis tool. This tool is intended to be deployed through the NLANR PMA infrastructure to, among other things, collect and process packet-level statistics and information at the data-link, network, and transport layers, and eventually to detect network "events" so that network administrators can be informed of problems on their link. The tool works with DAG cards from Endace Measurement Systems to collect packets in real time.
When Chris finished up his summer's work and returned to the U.S., Jörg (PMA lead) wrote the following in his weekly report.
"Overall, I am very pleased that we have managed to cover all the topics we had planned for Chris' visit to Hamilton [NZ]. He learned how to configure, access and control the DAG cards via the API, how to interpret the record formats, how to parse the packet data information, ARP, IP, UDP, TCP. He finished an application which delivers one second statistics on a variety of parameters that we had determined earlier as important. He managed to feed the stream of data into a small RRD database, and learned how to plot such data as a GIF file for Web access. These are all the steps needed from the lowest level all the way to the user interface for any serious real-time network analysis application. Well done, Chris."
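The one-second statistics step described above can be sketched roughly as follows. This is a minimal Python illustration and not the tool itself: the record shape `(timestamp, direction, length)`, the function name `per_second_counters`, and the particular counters are assumptions, and no DAG or RRD interface is used.

```python
from collections import defaultdict


def per_second_counters(records):
    """Aggregate packet records into one-second buckets.

    records: iterable of (timestamp_seconds, direction, length_bytes),
    where direction is "in" or "out" (a hypothetical record shape).
    Returns {second: {"in_pkts", "in_bytes", "out_pkts", "out_bytes"}}.
    """
    buckets = defaultdict(lambda: {"in_pkts": 0, "in_bytes": 0,
                                   "out_pkts": 0, "out_bytes": 0})
    for timestamp, direction, length in records:
        if direction not in ("in", "out"):
            raise ValueError(f"unknown direction: {direction!r}")
        bucket = buckets[int(timestamp)]  # truncate to the whole second
        bucket[direction + "_pkts"] += 1
        bucket[direction + "_bytes"] += length
    return dict(buckets)
```

Each bucket could then be written as one sample into a round-robin database, one update per second, which is the hand-off between packet parsing and plotting that the quoted report mentions.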
Lana Kennedy joined the NLANR/MNA staff in July as the Student Writer. She began by assisting Maureen Curran in bringing the old monthly reports up to date, as well as in staying on top of the new ones. Her skills with the format and organization of the reports have grown quickly (despite a number of major changes reflecting changing priorities), allowing her to take on an increasing share of the work and free Maureen up for other responsibilities. She assisted Maureen in the creation of several Web pages, including the Citings: Data Users pages, which have been posted, and the Collaborations pages, which are still under development. She is also helping with the redesign of the PMA pages. She had no prior experience with html or Web design, but she has adapted to it well and is comfortable populating, editing, and even creating html documents.
She also began a coding project, under the supervision of Tony McGregor, in which she will add the Iperf bandwidth test to the new amplet reimplementation software. She will work with Ben Reesman, who has agreed to assist her with technical issues as she builds her understanding of the UNIX operating system, network functionality, and C programming techniques. She has had some experience with C++ and Java programming but had no background in UNIX, and has been progressively learning the fundamentals of the operating system in order to complete the Iperf project and extend her skills.
Matthew Luckie continues to take the lead on two significant AMP projects, IPMP and AMP IPv6.
He is also developing bandwidth estimation techniques using IPMP. These are novel enhancements to current techniques, and are being tested for accuracy in the lab. By quarter's end, he had submitted a paper on this work, which has been accepted. As a result, he will be attending the invitation-only Bandwidth Estimation Workshop next quarter.
He continued work on Scamper, the topology mapping tool he developed (which is similar to Skitter), by writing an API to access the files that Scamper generates. He also analyzed the data that Scamper has been generating. http://voodoo.cs.waikato.ac.nz/~mjl12/ipv6-scamper/
He also prepared and delivered a "programming Java sockets" lecture for a 3rd year networking course, which went very well.
Klaus Mochalski is a visiting Ph.D. student from the University of Leipzig, Germany. His research focuses on real-time applications of passive measurements. During his three-month visit with NLANR/MNA he worked on two different approaches. First, he has developed a new metric to characterize network link utilization at small time scales. He has written a tool for measuring this metric in real time and has used it to analyze some of the existing NLANR/MNA passive trace data sets. He is currently working on a paper to publish this work.
Second, he has worked on an approach to identify bottlenecks along an Internet path by watching the traffic at only one end of the path using passive measurements. For this purpose, trace data is searched for signs of global TCP synchronization. This information, complemented by routing information, can yield a subpath containing the congested node. To achieve this, he has started to develop a tool to detect TCP flows and synchronization effects through real-time measurements.
Apart from his research, Klaus has also contributed some time to the improvement of the PMA Web site (and has been very helpful in this regard).
This quarter Ben Reesman did extensive implementation work for the Cichlid project. He designed and wrote most of the underlying Cichlid infrastructure, including networking and GUI elements. However, many of the design decisions that were made early in the project were revised when several serious shortcomings of the original design became apparent. In light of this, most of the code was rewritten in a more extensible style, and many improvements that were conceptualized (by him and others) during the first run through were implemented. At the end of the quarter, Cichlid was a large and well thought out body of code providing the basic infrastructure for new graph types. During this period most of what he learned had to do with network traffic fundamentals and with writing sound networking and threading code.
- 30 -