NLANR/MNA

      Summary of Research Activities - July 2003


Development and distribution of measurement and analysis tools:

Good progress is being made on the continuing development of new metrics and real-time analysis for PMA.

  • Completed the first release of the real-time program; this first cut implements packet length statistics. In the process, learned a lot about networking in general and about byte swapping, encapsulation, de-multiplexing, etc. Performed measurements and experiments on the local wireless network (CRCnet), testing to ensure that all the data records in a specific window of memory on the cards are accounted for and that the code is as fast as possible. Added a few more metrics and debugged some problems against the steady stream of traffic from CRCnet. Since the plan is to make the program portable across all Dag cards, also looked into the other framing and encapsulation technologies used with the other Dag cards. [Chris Gross w/ Jörg Micheel]
  • Also did some performance testing using Endace's SmartBits tester, to see how the code performs under adverse conditions. Once it was clear that the code did not break under heavy load, the next step was to measure how much of the CPU the processing used. Surprisingly, the real-time measurements used only about 0.1% of the CPU. [Chris Gross, Jörg Micheel]
  • Met with the ITS department at Waikato (equivalent to SDSC ENS) to revive our PMA monitor at the university access link, so that Chris could see some real traffic to analyze. Concerns about the impact of the Dag3.5E (a card with an onboard tap for 10/100 Ethernet) on the network link led them to decline our request. We can, however, have the old measurement point at the hub back; the university uses it to collect traffic statistics for accounting purposes. [Jörg Micheel]
  • Discussed my idea for a real-time link utilization analysis tool with Jörg and started working on it. It is already fairly usable, though still very much a prototype. Because the tool requires X for its graphical output, I first had to solve some network configuration issues on my machine (a static IP address instead of DHCP, and an NFS mount of /traces from pma.nlanr.net) before I could evaluate it. This instantaneous bandwidth/histogram sequence tool now works quite well, displaying its output under X. To characterize the distribution of the gaps between packets I use histograms, more specifically CDFs (cumulative distribution functions). This approach should help in understanding the utilization of the very link being monitored. Have begun to evaluate the visual differences it produces for various trace files. For a more systematic approach, I added support for ns2 trace files, so that I can generate synthetic traffic with a given profile. This really helps in understanding what these sequences can actually tell us. [Klaus Mochalski]
  • Confirmed Klaus's results on measuring link utilization with him, and discussed how to change the analysis to provide more meaningful results. [Jörg Micheel]
  • Did a literature search on another idea: using TCP slow-start synchronization to identify bottlenecks somewhere in the Internet purely by looking at packet traces. Read a dozen or so papers on TCP congestion control and avoidance, especially on large-scale slow-start synchronization of TCP connections passing through a congested link. All studies report this phenomenon, though it becomes less pronounced when NewReno or RED is used. Even then, there is still evidence of congestion (TCP has to learn about it somehow), so I will pursue this second approach, which uses the effect to hunt down congested links along an Internet path. [Klaus Mochalski]
  • Discussed working on a flow engine with Jörg; he is interested and wants a generic flow engine anyway. Jörg had independently suggested to both Klaus and Chris that they start using flow analysis, for different reasons. We agreed to find some time to discuss possible alignment between the two real-time applications. [Klaus Mochalski, Jörg Micheel]
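
The first item mentions packet length statistics and byte swapping. The Python sketch below illustrates both under stated assumptions: the 16-byte record layout (a little-endian timestamp followed by big-endian length fields) is hypothetical and is not the actual Dag record format, and `make_record`, `iter_wire_lengths`, and `LengthStats` are illustrative names, not the real-time program's code.

```python
import struct
from collections import Counter

# Hypothetical record header (NOT the real Dag format): an 8-byte
# little-endian timestamp, 1-byte type, 1-byte flags, then three 16-bit
# fields in network (big-endian) byte order: record length, loss counter,
# and wire length. struct's "<" and ">" prefixes handle the byte swapping.
HEADER = struct.Struct("<QBB")
LENGTHS = struct.Struct(">HHH")
HEADER_SIZE = HEADER.size + LENGTHS.size  # 10 + 6 = 16 bytes


def make_record(ts, wire_len, payload=b""):
    """Build one record in the hypothetical layout (useful for testing)."""
    rec_len = HEADER_SIZE + len(payload)
    return HEADER.pack(ts, 2, 0) + LENGTHS.pack(rec_len, 0, wire_len) + payload


def iter_wire_lengths(buf):
    """Yield the wire length of every complete record in buf."""
    offset = 0
    while offset + HEADER_SIZE <= len(buf):
        rec_len, _loss, wire_len = LENGTHS.unpack_from(buf, offset + HEADER.size)
        if rec_len < HEADER_SIZE:
            raise ValueError(f"corrupt record length at offset {offset}")
        if offset + rec_len > len(buf):
            break  # truncated final record; wait for more data
        yield wire_len
        offset += rec_len  # records may carry a variable-length payload


class LengthStats:
    """Running packet length statistics plus a fixed-width histogram."""

    def __init__(self, bin_width=64):
        self.bin_width = bin_width
        self.count = 0
        self.total = 0
        self.min = None
        self.max = None
        self.histogram = Counter()

    def add(self, length):
        self.count += 1
        self.total += length
        self.min = length if self.min is None else min(self.min, length)
        self.max = length if self.max is None else max(self.max, length)
        # Key each bin by its lower edge, e.g. 1500 -> 1472 for 64-byte bins.
        self.histogram[(length // self.bin_width) * self.bin_width] += 1

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


# Example: three records with wire lengths of 60, 1500 and 40 bytes.
buf = make_record(1, 60) + make_record(2, 1500, b"\x00" * 8) + make_record(3, 40)
stats = LengthStats()
for n in iter_wire_lengths(buf):
    stats.add(n)
```

Keeping only running totals and a histogram means each packet costs a few additions, with no per-packet storage.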

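Klaus's gap analysis rests on the empirical CDF of inter-packet gaps. A minimal sketch of that computation follows, assuming packet timestamps have already been extracted and sorted (integer microseconds here); `inter_packet_gaps`, `empirical_cdf`, and `cdf_points` are hypothetical names, not the tool's actual code.

```python
from bisect import bisect_right


def inter_packet_gaps(timestamps):
    """Gaps between consecutive packet timestamps (same unit as the input)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def empirical_cdf(samples):
    """Return F, where F(x) is the fraction of samples that are <= x."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one sample")
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many sorted samples are <= x.
        return bisect_right(ordered, x) / n

    return cdf


def cdf_points(samples):
    """(value, cumulative fraction) pairs, e.g. for plotting a step curve."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


# Example: timestamps in microseconds; three short gaps and one long one.
timestamps_us = [0, 100, 200, 1000, 1100]
gaps = inter_packet_gaps(timestamps_us)
F = empirical_cdf(gaps)
```

Roughly speaking, a CDF that rises steeply at small gaps indicates many closely spaced packets, while a long tail indicates idle periods on the link.
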
More progress was made on the reimplementation of AMP and the development of a new testing architecture.

  • Completed the testing code for the xfer system this period, which was more complex and involved than I expected. The test starts the two parts of the transfer system, then sends test data while killing one or the other or both of the components, and checks that the data gets through exactly once. Struggled to figure out why the code was dying unexpectedly towards the end of the test. It turns out that when a process at one end of a file system (Unix domain) socket dies, not only do subsequent read or write operations at the other end of the connection fail, but a signal (SIGPIPE) is also sent. Once discovered, this was straightforward to handle, but it took quite a while to track down. [Tony McGregor]
  • Added code to send ICMP test packets of different sizes (and record the packet sizes with the results). Resolved issues with the transfer code (occasionally seeing a bad sequence number in the disk save file). Also added a b-tree IP address to amp-name translation routine (for v4 or v6 addresses) so that filenames can all be based on amp-name rather than address; wrote checking code for it and included that in the test suite. Also added exactly-once semantics to the ICMP save code, wrote test code for it, and added that to the test suite. Added random-sized packets to the ICMP test for the new amplet. Ran all the tests, with all combinations of flags, against the last couple of months of code, working to get everything to pass on both Linux and FreeBSD. [Tony McGregor]
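
The xfer test checks that data gets through exactly once even when components are killed. One common way to get that property is sketched below: sequence numbers, cumulative acknowledgements, retransmission of anything unacknowledged, and a receiver that discards duplicates (a go-back-N style scheme). This is an assumption for illustration, not the actual AMP xfer design; the class and function names are hypothetical, and a real implementation would also have to persist `next_seq` and `acked` (for example in a save file on disk) so that a restarted process resumes correctly.

```python
class ExactlyOnceReceiver:
    """Delivers each sequence number at most once and strictly in order."""

    def __init__(self):
        self.next_seq = 0      # lowest sequence number not yet delivered
        self.delivered = []

    def receive(self, seq, data):
        """Accept seq if it is the next expected one; return a cumulative ack."""
        if seq == self.next_seq:
            self.delivered.append(data)
            self.next_seq += 1
        # Duplicates (seq < next_seq) and gaps (seq > next_seq) are dropped;
        # the sender retransmits anything the ack does not cover.
        return self.next_seq


class RetransmittingSender:
    """Resends everything from the first unacknowledged item onwards."""

    def __init__(self, items):
        self.items = list(items)
        self.acked = 0         # items[:acked] are confirmed delivered

    def pending(self):
        return [(seq, self.items[seq]) for seq in range(self.acked, len(self.items))]

    def on_ack(self, ack):
        self.acked = max(self.acked, ack)  # acks are cumulative; ignore stale ones


def transfer(items, lost=frozenset()):
    """Simulate rounds of (re)transmission. `lost` holds (seq, round) pairs
    to drop, standing in for a lost message or a killed component."""
    sender = RetransmittingSender(items)
    receiver = ExactlyOnceReceiver()
    rnd = 0
    while sender.acked < len(sender.items):
        for seq, data in sender.pending():
            if (seq, rnd) in lost:
                continue
            ack = receiver.receive(seq, data)
            receiver.receive(seq, data)  # deliberate duplicate; must be ignored
            sender.on_ack(ack)
        rnd += 1
    return receiver.delivered
```

For example, `transfer(list("abcd"), {(1, 0), (2, 1)})` loses item 1 in the first round and item 2 in the second, yet still delivers `a`, `b`, `c`, `d` once each.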

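The b-tree translation from IPv4 or IPv6 address to amp-name can be approximated, for a table that changes rarely, by a sorted array searched with binary search. That substitution is ours, as are the class name and the example entries, which use documentation address ranges and made-up names. Normalizing through Python's `ipaddress` module means different spellings of the same IPv6 address map to the same key.

```python
import bisect
import ipaddress


class AmpNameTable:
    """Address -> amp-name lookup for IPv4 and IPv6 via binary search.

    A sorted list stands in for the b-tree here; both give O(log n) lookups.
    """

    def __init__(self, mapping):
        entries = sorted((self._key(addr), name) for addr, name in mapping.items())
        self._keys = [key for key, _ in entries]
        self._names = [name for _, name in entries]

    @staticmethod
    def _key(addr):
        ip = ipaddress.ip_address(addr)
        # Sort IPv4 before IPv6 and compare addresses numerically.
        return (ip.version, int(ip))

    def lookup(self, addr):
        key = self._key(addr)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._names[i]
        return None


# Hypothetical entries using documentation address ranges.
table = AmpNameTable({
    "192.0.2.10": "amp-example-a",
    "2001:db8::10": "amp-example-a",
    "198.51.100.7": "amp-example-b",
})
```

A b-tree keeps its advantage when the table lives on disk or changes often; for a small in-memory table the sorted array is simpler.
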
IPMP experiments and result analysis performed this period:  

  • Worked on getting gnuplot to plot a 3-D surface, and managed to have it print useful graphs based on the IPMP data collected last period. The experiment involved sending IPMP packet pairs with varying first- and second-packet sizes and taking the minimum separation seen over 200 pairs for each size combination. The idea was to send a large first packet that would cause a smaller second packet to queue immediately behind it the whole way through, and then work out the capacity of each hop on the path by dividing the size of the second packet by the separation between the end of the first packet and the end of the second. [Matthew Luckie]
  • The IPMP packet pair method has a number of advantages over other methods, assuming IPMP is deployed on the network:
    • we do not need to filter out post-narrow congestion modes (PNCMs), because we isolate each hop with timestamps (this does not hold if layer 2 consists of switches carrying cross traffic)
    • we can see the capacity of every hop and identify the capacity-limiting link, rather than only knowing the one-way capacity.
  • A basic finding to date is that if the second packet is larger than the first, the first gets further and further ahead of the second the more store-and-forward devices (routers/switches) it traverses. This is because at each hop the first packet is received and transmitted in full before the second packet has been received in its entirety. Graphs reflecting this finding are at http://voodoo.cs.waikato.ac.nz/~mjl12/1.ps (and /2.ps, /3.ps, /4.ps). In these graphs the dark black frame represents what the separation of the two packets should have been, based on the size of the second packet. The back half (basically a diagonal line from x=0 to y=1460 in each graph) reflects the basic finding. There are additional interesting features in each graph. The flat surface below y=600 in 1.ps may be due to a physical limitation of the particular network interface used to send the packets, but that is a guess that needs to be confirmed experimentally by trying other machines/NICs. [Matthew Luckie]
  • Used the IPMP packet pair technique to run an experiment on the WAND emulation network, then produced some graphs from the data gathered. Compared with the results and graphs discussed above, it becomes clear that the shelves below a first-packet size of 600 bytes are due either to the FreeBSD operating system or to the Realtek interface used (the guess is the Realtek). [Matthew Luckie]
  • Spent some time writing a program that generates cross traffic (CT) based on the packet sizes and packet inter-arrival times seen in a particular trace. It does not replay the trace. The point is to have a CT generator for the emulation network that creates traffic resembling that seen in the real world. Its first application will be generating CT for testing my IPMP bandwidth estimation techniques. [Matthew Luckie]
  • Worked on an API for accessing Scamper output files, in both the ASCII text style and a binary file format that I designed. It is mostly complete: it is a C API, with pointers to functions in each struct for those who prefer an object-oriented style. Wrote code to test all of it. Am working on moving some of the utility functions that Scamper itself uses into the library, and will then write a man page documenting the library. [Matthew Luckie]
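
The per-hop capacity calculation and the store-and-forward effect described in the IPMP items can be reproduced with a simple model. The sketch below assumes idealized store-and-forward hops with no propagation delay and no cross traffic, and that per-hop timestamps are available (as IPMP provides); the function names are hypothetical. With a large first packet, the separation at each hop equals the transmission time of the second packet on that hop, so size divided by separation recovers each hop's capacity; when the second packet is the larger one, the separation grows at every hop, as in the basic finding reported in that list.

```python
def finish_times(sizes_bytes, capacities_bps):
    """Time at which each packet has been fully transmitted on each hop.

    Packets are handed to the first hop back to back at t = 0. A packet
    can start on a hop only once it has been completely received from the
    previous hop (store-and-forward) and the hop's link is free.
    """
    arrivals = [0.0] * len(sizes_bytes)
    per_hop = []
    for capacity in capacities_bps:
        link_free = 0.0
        finishes = []
        for arrival, size in zip(arrivals, sizes_bytes):
            start = max(arrival, link_free)
            link_free = start + size * 8 / capacity
            finishes.append(link_free)
        per_hop.append(finishes)
        arrivals = finishes
    return per_hop


def pair_separations(per_hop):
    """Gap between the end of the first and the end of the second packet.

    per_hop must come from a two-packet run of finish_times.
    """
    return [second - first for first, second in per_hop]


def capacity_estimates(second_size_bytes, separations):
    """Packet-pair estimate per hop: second packet size / separation, in bit/s."""
    return [second_size_bytes * 8 / gap for gap in separations]


# Large packet first: the small packet queues behind it on every hop,
# so each hop's capacity (100, 10, 100 Mbit/s here) can be recovered.
big_first = finish_times([1460, 100], [100e6, 10e6, 100e6])
estimates = capacity_estimates(100, pair_separations(big_first))

# Small packet first: the separation grows by the size difference each hop.
small_first = finish_times([100, 1460], [10e6] * 3)
growing = pair_separations(small_first)
```

With real measurements, taking the minimum separation over many pairs (the experiment uses 200) is a way to filter out pairs disturbed by cross traffic, which this model leaves out.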

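The cross-traffic generator is described as drawing on the packet sizes and inter-arrival times of a trace without replaying it. One simple way to do that, assumed here rather than taken from the actual program, is to resample each quantity independently from its observed values; `EmpiricalCrossTraffic` and the sample values in the example are hypothetical.

```python
import random


class EmpiricalCrossTraffic:
    """Generate (send_time, size) pairs by resampling a trace's distributions.

    Packet sizes and inter-arrival gaps are drawn independently from the
    values observed in the trace, so the output matches each marginal
    distribution without replaying the trace itself. Correlations between
    consecutive packets are deliberately not preserved in this sketch.
    """

    def __init__(self, sizes, gaps, seed=None):
        self.sizes = list(sizes)
        self.gaps = list(gaps)
        if not self.sizes or not self.gaps:
            raise ValueError("need observed sizes and gaps")
        if min(self.gaps) <= 0:
            raise ValueError("inter-arrival gaps must be positive")
        self.rng = random.Random(seed)

    def schedule(self, duration_s):
        """Return a list of (time_s, size_bytes) with time_s <= duration_s."""
        t = 0.0
        packets = []
        while True:
            t += self.rng.choice(self.gaps)
            if t > duration_s:
                return packets
            packets.append((t, self.rng.choice(self.sizes)))


# Hypothetical observed values; a seed makes runs reproducible.
gen = EmpiricalCrossTraffic([40, 576, 1500], [0.001, 0.002, 0.005], seed=1)
pkts = gen.schedule(0.1)
```

Sampling sizes and gaps independently is the simplest choice; preserving burst structure would need something richer, such as resampling runs of consecutive packets.
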
Work on the reimplementation of the Cichlid 3-D Visualization System also continued. Developed new OpenGL rendering code for a pie-chart graph type; the images produced look very good and the code works well. Besides its use in Cichlid, this code will generate images for publications. Wrote more threading code, and the protocols are all working nicely now that those bugs are gone. Worked with Jeff Brown's camera and transformation code, wrapped his camera math in a new class, and integrated it into the OpenGL GUI components along with the concurrency-related code that I wrote. Began implementing the bar chart graph from the original Cichlid code; fortunately, most of this code will move right over. Did some reworking of the client-side protocol threading. Met with Todd to discuss some potential problems with IP fragmentation and to talk about relocating one of the frame queues to a different class. Finished what will hopefully be the final draft of the Cichlid GUI. Also wrote a good deal of code that makes applying dataset-diff frames to the graph much more sophisticated, as well as routines to draw vertex-edge graphs and to serialize and deserialize datasets. [Ben Reesman]
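
The pie-chart rendering code itself is not shown in this report. As a hedged illustration of the geometry such code typically needs, the sketch below turns data values into slice angles and then into vertex lists suitable for drawing each slice as a triangle fan (for instance with OpenGL's GL_TRIANGLE_FAN primitive). The function names and the segment count are assumptions, and no OpenGL calls are made.

```python
import math


def pie_slices(values):
    """Return (start, end) angles in radians, one slice per value."""
    if any(v < 0 for v in values):
        raise ValueError("pie chart values must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("need a positive total")
    slices = []
    start = 0.0
    for v in values:
        end = start + 2 * math.pi * v / total
        slices.append((start, end))
        start = end
    return slices


def fan_vertices(start, end, radius=1.0, segments_per_circle=64):
    """Vertices for one slice as a triangle fan: the centre, then arc points.

    Short arcs get proportionally fewer segments, but always at least one.
    """
    n = max(1, math.ceil(segments_per_circle * (end - start) / (2 * math.pi)))
    verts = [(0.0, 0.0)]
    for i in range(n + 1):
        theta = start + (end - start) * i / n
        verts.append((radius * math.cos(theta), radius * math.sin(theta)))
    return verts
```

For values `[1, 1, 2]`, the slices span a quarter, a quarter, and half of the circle; each slice's vertex list starts at the centre, so consecutive arc points form the fan's triangles.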

Activities extending the Network Analysis Infrastructure (NAI) in support of new and developing HPC needs:

The lucky 13th IPv6 AMP site was identified to join the AMP IPv6 mesh: Bud relayed that the site administrators at Surfnet in Amsterdam asked for their AMP monitor to be added to the IPv6 mesh. Their monitor is part of the international AMP mesh only, not the HPC mesh, so it is not entirely obvious how this should be done, since every other IPv6 monitor is part of the HPC mesh. A quick survey of AMP monitors that have autoconfigured IPv6 addresses but are not part of the HPCv6 mesh found the following sites: amp-hean, amp-unch-ch, amp-vt, and amp-wisn. [Matthew Luckie]

Sent Scamper-0.9.4b2 to Wim Biemolt at surfnet.nl and to Ronald van der Pol at NLnet.nl, which will give me views of the IPv6 Internet from machines in the European Union. Awaiting output from them. [Matthew Luckie]

Sent Scamper and the address list to Henk Uijterwaal (RIPE-NCC) and Tim Chown to see if they would run it from their networks for me. I have not heard back at this time. [Matthew Luckie]

Work continued towards the deployment of an additional OC192MON (located at SDSC). The new Linux operating system and additional applications sent by Jörg were installed on the machine. A number of significant issues, including some SDSC equipment problems, had to be resolved before the OC192 machine could get going. We attempted to connect the OC192MON to a data source, artificial or real, but had no luck this reporting period; this appears to be more difficult than expected, and testing continues. The plan is to take measurements from the TeraGrid Juniper T640 router. We are working with a very busy SDSC NetOps throughout this process, and with Stephen Donnelly (of Endace) to make sure all the Dag tool options and statistics are correct. [Jim Hale, Jörg Micheel]

New deployments and updates on new (and developing) strategically important measurement sites:

A request for an AMP monitor was received from NaukaNet this period. The monitor point on the NaukaNet network is Northwestern University in Chicago, Ill. The machine was prepared and shipped, and is expected to be installed and started early next period. [Bud Hale]

The machine for the Mexico site, amp-cudi (Corporacion Universitaria para el Desarrollo de Internet), is ready and will be shipped as soon as a valid shipping address is received. [Bud Hale, Jim Hale]

Two newly installed AMP monitors were brought online this period: amp-rnpb (RNPnet GigaPop in Brazil) and amp-ampath-mia (AMPATH GigaPop in Miami, Fla.). For the AMPATH site, however, the system_manager system failed to distribute the new HPC.list that would allow all other HPC sites to address it; this is being investigated. Later in the period, amp-rnpb (RNPNet, Brazil) experienced a brief outage. The site staff we contacted acted quickly and restored the machine with a power cycle. The machine logs show no cause for the crash; we will continue to monitor it. [Bud Hale]

The PMA monitors for the AMPATH connection at Florida International University (new) and the University of Florida at Gainesville (rebuild) were built, configured, and shipped; we expect them to be in production soon. [Jim Hale, Bud Hale]

The PMA machine at NCSA was disconnected when the OC3 connection went away, and there is now a desire to connect it to an OC12 link. This was raised with Jörg, who has planned an OC12 Dag3.2 monitor for that site. [Bud Hale]

Discussions with Jörg produced a plan to move the OC12 monitor from AdvancedNet to the Internet2 connection in Ann Arbor, Michigan. Working with Matt Zekauskas to complete that move (he has given the go-ahead). [Bud Hale]

Several of the international mesh sites needed attention this period. We expect to get the sites back collecting and transferring data soon. [Bud Hale]

Worked with the site technician at the newly installed amp-surf (SURFnet GigaPop in Amsterdam, Holland) to get port 22 unblocked at the router, which was needed before the machine could start up. The block was removed and, after some hiccups with our system_manager system, the machine started collecting and transferring data. It will also be added to the IPv6 list to perform IPv6 measurements. [Bud Hale]

Analysis and coordination with the amp-unin (UNINet GigaPop in Thailand) site staff revealed that the monitor there is apparently suffering a disk failure. After discussion with the site staff, we decided to ship a replacement machine, which is now in transit. [Bud Hale]

The other international mesh machine still down is amp-hutf (Helsinki Univ. of Technology in Finland). The problem appears to be with the network configuration. Network operations there appear to be running with a skeleton crew, and it has been difficult to contact site staff; however, I will continue to follow up on this site. [Bud Hale]

After some delay, the amp-gpnp (Great Plains Network GigaPop) site completed their planning and announced that the AMP monitor there will be installed soon. [Bud Hale]

Outreach, application support, utilization improvement, and documentation activities:

Worked with Bill Cleveland at Bell Labs to help him with his measurements at Global Crossing. [Jörg Micheel]

Have been in touch with Joe Abley and Paul Vixie regarding measurement support for the new F.root mirrors and DNS server configurations being built around the world, and how we can help. [Jörg Micheel]

Discussions with Christophe Diot at Intel Cambridge UK about his program to donate PC equipment for passive measurements, and email exchanges with him regarding upcoming visits and the installation of the PMA monitor there. [Jörg Micheel]

At NZNOG 2003 made some new contacts with a few ISP people interested in measurement. [Tony McGregor]

Met up with Evi Nemeth (University of Colorado at Boulder) at NZNOG 2003. She came in from Portugal, leaving her boat alone in the Azores for a week, and we had several interesting conversations about various research topics. [Jörg Micheel]

Received a message from Matt Zekauskas about the move of the ADV PMA monitor to Ann Arbor. We are discussing a possible visit by me in conjunction with the SALS workshop in August, and have exchanged some email regarding upcoming visits and the installation of the PMA monitor there. [Jörg Micheel]

Met with Dennis Su and his manager, Simon Travaglia, of ITS (the computer services department at the University of Waikato). They greeted me with the bad news that my request to insert the Dag3.5E into the university access link had been declined: simply too much risk, even if the card proved reliable, and too much equipment to watch in the critical path. That said, we can still have back the "old" measurement point at the hub, which the university also uses to collect traffic statistics for accounting purposes (also known as the Mr Bean and Mrs Bean computers). I was asked to re-initiate the old agreement that had been in place for the NZIX data sets, which I did, and it appears to have been received positively. Spent the rest of the week figuring out how to use the 3.5Es with tap in such a scenario, eventually deciding against board changes and instead using an old 3.2E as a "dongle" to terminate the connection between the hub and the Dag collector. [Jörg Micheel]

Helped Nevil Brownlee debug some problems with the MFN OC48MON (San Jose, CA), without much luck; we could not make it work. We hope to resolve these problems by replacing the outdated Dag4.1 cards with next-generation Dag4.2 cards. Also discussed network instrumentation in Auckland (reviving our measurement point there) and other issues related to Dag software and API support. [Jörg Micheel]

Sent a copy of my slides from the Relarn meeting to Yuri Demchenko, as requested by Greg Cole (NaukaNet). [Tony McGregor]

Spoke with Greg Monaco by phone about the Nauka-Net press release. [Ronn Ritke]

Met with Kevin Walsh to review NLANR/MNA measurement plans for the SC2003 Bandwidth Challenge. [Ronn Ritke]

Met with Peter Arzberger about PRAGMA funding and future plans, and wrote a paragraph on the international mesh in the PRAGMA region. [Ronn Ritke]

Spoke to John Hicks about return of the OC48MONs from IPLS, which have now arrived at SDSC. [Jörg Micheel]

Briefly communicated with Jon Crowcroft in Cambridge regarding the NREDS workshop; Jon was very supportive but could not help in this case. [Jörg Micheel]

Brief conversation with Ian Pratt regarding PAM2004 (will talk more in the future). Have reserved the pam2004.org domain name. [Jörg Micheel]

Netforum 2003 conference (NZNOG 2003)
Mt Wellington, Auckland, July 9-12, 2003 (NZNOG, the New Zealand Network Operators' Group, is the NZ equivalent of NANOG)

  • Gave a 60-minute presentation on the state of the art in passive Internet measurements. Attended the other sessions and was impressed by the quality of the presentations; the issues raised seemed much closer to people's day-to-day needs. [Jörg Micheel]
  • Presented the simulation work we did a year or two ago, which generated more interest than I expected. [Tony McGregor]
  • Gave a talk about IPMP and the IPv6 mapping project, and attended the conference. Created a large poster of the IPv6 map for NZNOG 2003, made up of 56 A4 pages with flags marking the major points in each AS. It attracted some attention; Joe Abley, an ex-pat New Zealander now working at ISI, offered a box to run Scamper from. [Matthew Luckie]

Attended a seminar given by a visitor from Cardiff University about network metrics and QoS for grid computing which raised some interesting issues. [Tony McGregor]

Prepared and delivered a "programming Java sockets" lecture for a 3rd-year networking course, which went well. [Matthew Luckie]

The Security at Line Speeds (SALS) workshop in August is nearly confirmed; I would use the opportunity for other visits on the East and West Coasts, as appropriate. Have also registered for SIGCOMM 2003 in Karlsruhe. [Jörg Micheel]

An extensive Web project was begun to develop several new pages, related to our collaborations and data users, and to update existing ones. The design and potential layouts of these Citings pages were developed earlier in the year and are being refined now that work on the pages themselves has begun. Information gathering and compilation began for the collaborations, citings (where our work is referenced), and "numbers of note" (tallies and percentages of papers referencing our work) Web pages. Preliminary observations suggest that several of our papers are referenced frequently and that many other researchers use our data and tools in their work. [Maureen Curran, Lana Kennedy]

Fixed a long-standing problem with the AMP Web status page that Bud uses. Bud and I also fixed a problem with the STAR TAP data directory. [Tony McGregor, Bud Hale]

The AMP and PMA posters were updated; will be displayed at SC2003 in Phoenix and other meetings as appropriate. [Ronn Ritke, Mike Gannis]

Ongoing measurement and analysis, networked data, and infrastructure support:

We had a paper accepted by the AUUG2003 Conference (Sydney, Australia, September 3rd-5th), on writing real-time applications for passive measurements. Final paper has been submitted. [Jörg Micheel]

Started mirroring the Leipzig-I data set, which Klaus collected in November of last year and published this May; the mirror is nearly complete. [Jörg Micheel]

Klaus Mochalski, a Ph.D. student at the University of Leipzig with whom we have had a number of collaborative projects, is spending a three-month sabbatical with us this summer at SDSC. He will be conducting PMA research while here, working closely with Jörg. Ronn worked with several people at SDSC on arrangements and preparations for his arrival.

The archive process performed on the AMP data collection servers (AMP and VOLT) moves active measurement data older than six months to the HPSS (SDSC's high performance storage system). When this method was introduced some time ago, the process would bring disk usage down to approximately 68 percent. However, the active measurement infrastructure has continued to grow, and the archive run just completed brought disk usage down only to the low seventies (approximately 73 to 74 percent). This indicates that, with the growth of the AMP meshes, the amount of data accumulated over a six-month period has grown considerably. Note also that archiving is now needed every four to six weeks. This may need to be considered as the AMP network continues to develop. [Bud Hale]
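
The age-based archive selection can be sketched as follows. This is an illustration under stated assumptions, not the actual AMP archive script: the 183-day cutoff, the function names, and the use of file modification times are all assumptions, and the transfer to HPSS itself is left out because its client interface is not described here.

```python
import os
import shutil
import time

# "Six months" approximated as 183 days; the real cutoff rule is an assumption.
SIX_MONTHS_S = 183 * 24 * 3600


def scan(root):
    """Yield (path, mtime, size) for each file os.walk reports under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            yield path, st.st_mtime, st.st_size


def select_for_archive(entries, now=None, max_age_s=SIX_MONTHS_S):
    """Pick entries older than max_age_s; return (paths, total bytes)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_s
    chosen = [(path, size) for path, mtime, size in entries if mtime < cutoff]
    return [path for path, _ in chosen], sum(size for _, size in chosen)


def fill_percent(path):
    """Disk fill status of the file system holding path, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

Comparing `fill_percent` before and after each archive run would track the trend described above, from roughly 68 percent to 73 to 74 percent after archiving.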

Worked very hard to complete the migration from the old to the new pma.nlanr.net server system. Used the HPSS extensively to recover some previously lost files, and improved robustness. Also cleared some 2000 files from the HPSS that had accumulated over the last 18 months as regular nightly disk dumps. Configured the various services on the new pma server (sendmail, ftp, http), and installed and configured mailman to handle the PMA-related mailing lists. Finally, did another rsync of the /traces directory from the old machine to the new one, which puts the new subscriber system in place for the Trace User Community. [Jörg Micheel]

Once the names and IP addresses of the two machines had been swapped and the machines rebooted, the new pma server went live; fixed a few more details and let the PMA monitors continue their regular downloading of the 8x90-second trace files. At this stage, started paging in from the HPSS all the long trace files that had been lost on the old machine due to the failure of its RAID0 array. Final polishing took place on the server (including copying the old http and ftp log files, and completing the mirroring of the long trace files from the HPSS). The current data array is quite full; it appears we will have to go for the full 1 TByte version as soon as we are able. The pma.nlanr.net data appears to have settled, with users again able to retrieve the daily and long traces. [Jörg Micheel]

The new pma.nlanr.net data server suddenly showed some disk problems, initially only during backup runs, now also in server logs. Began testing and investigation. [Jim Hale, Jörg Micheel]

Per Jörg's request for additional hard drive space for trace collection, I added a 40 GB IDE disk to the new PMA server. Four more 120 GB disks are queued for installation as soon as possible. [Jim Hale]

As previously discussed, component obsolescence is an issue we continually face. The component that most often becomes obsolete and goes out of manufacture is the system board (motherboard). It has again become necessary to test and upgrade to new system boards, as the current board (GigaByte 6VML) will go out of manufacture and is becoming unavailable. [Bud Hale]

AMP and PMA site outages continued to be at low levels. [Bud Hale]

Lana Kennedy, currently an EE major at UCSD, began work with us as the Student Writer; she will work closely with Maureen on several important projects.

- 30 -


last modified: August 15, 2003