NLANR/MNA

      Summary of Research Activities - Sept. 2003


Development and distribution of measurement and analysis tools:

Good progress continues on the development of new metrics and real-time analysis for PMA.

  • Overall, I am very pleased that we have managed to cover all the topics we had planned for Chris' visit to Hamilton. He learned how to configure, access and control the DAG cards via the API, how to interpret the record formats, and how to parse the packet data (ARP, IP, UDP, TCP). He finished an application which delivers one-second statistics on a variety of parameters that we had determined earlier as important. He managed to feed the stream of data into a small RRD database, and learned how to plot such data as a GIF file for web access. These are all the steps needed from the lowest level all the way to the user interface for any serious real-time network analysis application. Well done, Chris.  [Jörg Micheel]
  • Working with Koryn Grant, Endace Performance Test Engineer, we tested Chris' Internet performance engine on a Gigabit Ethernet DAG card at Endace Knox Street. We used the SmartBits GigE tester to generate worst-case traffic with minimum-size packets of 64 bytes each, approximately 1.4 million packets/sec. Initially we ran into an issue, still present in the current API, that makes the application poll continuously and hence use 100% of CPU time. This is a known issue currently being addressed. We manually applied a patch that waits for at least 8 KByte of data to arrive before processing packets, and the CPU load subsequently dropped to 2-3%. This is on a Dell 2650 (P4 Xeon at 1.8GHz). We are safe to claim this application will perform at 10 Gigabits/sec in the future. A small hiccup occurred: the reported numbers were consistently wrong. The average packet size would show north of 1200 bytes per packet, and the packet and byte rates would never match what the SmartBits generated. We will have to retest and see if those fixes would render the results more legible.  [Jörg Micheel, Chris Gross]
  • We tried to figure out why Chris' tools yield such seemingly distorted packet length numbers on the SDSC Abilene link we are monitoring. We captured some short traces with different driver versions and did the analysis with different tools (dagtools and Chris' real-time tool) to make sure we are not looking at an artifact caused by a bug in his program. As it turned out, the problem really seems to lie in the network, though we could not yet come up with a satisfying explanation of what we are seeing (about 50% MTU-sized packets in one direction and nearly none--between 0.4 and 2%--in the other direction). Chris is going to put some more effort into this analysis and I will assist him as needed.  [Klaus Mochalski, Chris Gross, Jim Hale]
  • Worked on and tested the merging code. I also refined some more of my real-time code. I am starting to add more and more TCP flow capability to the program as well. I have been trying to use the time stamps on the DAG records to mark one-second intervals for more accurate time keeping, but the time stamps are not quite right on sda. [Chris Gross]
  • Learning and playing around with rrdtool; produced a sample graph and sent it to Jörg. Learning rrdtool is advancing the visualization part of the real-time infrastructure somewhat. I also made some updates to a few Perl scripts I have that take care of the rrdtool functionality. [Chris Gross]
  • For the TCP flow engine, I wrote the code dealing with different trace formats and recycled some old pieces of code from dagtools/dagflow. Apart from the coding I did some experiments with tcptrace (a TCP flow engine with graphical output for various statistics) to get an idea of what I am looking for in traces. It provides one particular statistic about outstanding (meaning unacknowledged) packets as an approximation for the sender window. Though there is no way for tcptrace to detect simultaneous congestion events in TCP connections, this output really helps in getting an idea about the sending behavior of an application. This is important when it comes to identifying simultaneous congestion events. The challenge is to distinguish sudden drops in supply (what we are looking for) from drops in demand. By looking at more tcptrace graphics of the Leipzig-I trace data I gained some confidence and motivation that this approach is going to work. I could visually identify a couple of occurrences of simultaneous congestion events. Looking at the tcptrace source code also provided some inspiration for my flow engine. Unfortunately this also made me discard some of the code I had already written.  [Klaus Mochalski]
  • While in San Diego, spent quite some time working with Klaus on the various ideas to look at link level traffic characterization.  [Jörg Micheel, Klaus Mochalski]
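The one-second statistics and the SmartBits comparison described above can be illustrated with a minimal Python sketch. This is not Chris' application: the function names are hypothetical, the input is assumed to be a list of (timestamp, wire length) pairs already extracted from the capture records, and the timestamp is assumed to be a 64-bit fixed-point value with whole seconds in the upper 32 bits and a binary fraction in the lower 32 bits. The line-rate helper assumes standard Ethernet framing overhead (8 bytes of preamble plus 12 bytes of inter-frame gap per frame).

```python
from collections import defaultdict


def timestamp_to_seconds(ts):
    """Convert an assumed 32.32 fixed-point timestamp to float seconds."""
    return (ts >> 32) + (ts & 0xFFFFFFFF) / 2**32


def per_second_stats(records):
    """Aggregate (timestamp, wire_length) pairs into one-second bins.

    Returns a dict mapping each whole second to its packet count,
    byte count and mean packet size.
    """
    bins = defaultdict(lambda: [0, 0])  # second -> [packets, bytes]
    for ts, length in records:
        entry = bins[ts >> 32]  # the upper 32 bits select the bin
        entry[0] += 1
        entry[1] += length
    return {
        second: {"packets": pkts, "bytes": nbytes, "mean_size": nbytes / pkts}
        for second, (pkts, nbytes) in sorted(bins.items())
    }


def ethernet_max_pps(link_bps, frame_bytes):
    """Theoretical maximum frame rate on an Ethernet link.

    Each frame occupies frame_bytes plus 20 bytes of preamble and
    inter-frame gap on the wire.
    """
    return link_bps / ((frame_bytes + 20) * 8)
```

With these assumptions, `ethernet_max_pps(1e9, 64)` gives about 1.49 million packets/sec, in line with the approximately 1.4 million quoted for the worst-case test. A per-second `mean_size` far above 64 bytes for such traffic would point to a counting or parsing error rather than to the generator.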

Progress continues on the reimplementation of AMP and development of a new testing architecture.

  • Worked on the traceroute transfer format (mostly done), including updating the test code for it. Rounded off the current revision of the amplet code, updated the CVS tree, did a series of test builds with all the different options and documented the process of adding a new test (including actually adding one to make sure I have the process right). Then checked out a clean copy from CVS and sent it to Lana for her to look at adding iperf to it. [Tony McGregor]
  • Experimented with Iperf a bit more. Todd gave me a machine that I could install FreeBSD on, so that I could operate an actual window system and not have to worry about using SecureCRT to do my Iperf project, which would be difficult and time-consuming. Set up, configured, did security measures. Tried connecting computers together with Iperf and making a mini-network to see exactly how the program functions. After Ben explained to me about the self-looping IP, it was easy to make the two terminals connect to each other and see the output of the run. Tony sent me the amplet code, will be compiling it shortly. [Lana Kennedy]

Spent quite a bit of time writing the cross-traffic-from-trace generator/application, which is beginning to look quite good. This is part of the IPMP Bandwidth Estimation work.  [Matthew Luckie]

Contacted Pekka Savola, whose internet-draft prompted my work on finding possible examples of /127 routing prefixes, as his internet-draft is now RFC 3627. He gave me a few pointers to other tricks I could try to find routing prefix lengths. Might be an interesting small project to try with Scamper. Also fixed some Mac OS X issues in Scamper. [Matthew Luckie]

Progress on the reimplementation of the Cichlid 3-D Visualization System continued.  

  • Wrote more code, including memory management for the byte stream, a more advanced FrameSocket abstraction, and a server setup using the new threads. I worked on the algorithm that handles the animation and on a class that abstracts playback in a more modular fashion. Reviewed and debugged the code that manages storage and submission of datasets into the data stream, as well as optimizing and debugging other parts of the code.  [Ben Reesman]
  • Worked on a number of technical issues. I wrote parameterized types, using C++ templates, to generalize the idea of animating datasets. I spent an enormous amount of time trying to force together a design based on templates and inheritance, but it looks like g++'s level of standards compliance is not quite up to the task, so I eventually caved in and used a composition approach. Also shuffled Cichlid's drawing library into a more C++ friendly form, and added the wedge drawing code from the pie charts so that it is more of a cohesive whole. [Ben Reesman]

Activities extending the Network Analysis Infrastructure (NAI) in support of new and developing HPC needs:

~ New (and developing) strategically important measurement sites

Talked to Greg Cole about monitoring (not as a full mesh) about 30 or 40 Russian sites as a stepping stone to a Russian AMP. [Tony McGregor]

The new passive monitor shipped to Internet2 in Ann Arbor is on site and is expected to be connected soon. A report from Matt Zekauskas indicates he is back in Ann Arbor and will be pursuing the installation on the Internet2 connection. However, he will need to do some machine reconfiguring due to the fact that the IP addresses on the machine were reassigned. [Bud Hale]

Site nai-p-fla is now starting to mount the new passive monitor at that site. However they are asking for some additional mounting hardware. I will be shipping the mounting hardware as soon as possible. [Bud Hale]

Continued to work with the AMPATH (nai-p-fiu) site to resolve the optical connection problem. Eric Johnson, the network engineer there, is now working with the technician Ernie Rubi. Eric is trying to gain the use of an optical power meter locally in order to measure the signal levels at the signal taps. If he is not successful I will ship a power meter from here. [Bud Hale, Jim Hale]

Continuing work on the OC192 monitor:

  • Testing/debugging of the OC192MON took place during Jörg's visit to SDSC. Jim arranged access to the Adtech equipment, which was integral to the testing. Good progress was made. Confirmed that it is a problem between the Dell and the cards. Not happy though, since we still do not have a configuration which works for a pair of cards in a system, which is what we need for the Abilene backbone instrumentation.  [Jörg Micheel, Jim Hale]
  • After Jörg's visit, installed the Adtech interface on both a laptop and a desktop machine, enabling access to the test equipment as needed. Maintained the Adtech 10 Gbit/s signal generation until the network reassignment was complete. [Jim Hale]
  • Endace also has a new set of Xilinx images ready to try on the OC192c cards, which we should use to see if the problems we have been experiencing on the San Diego OC192MON will go away. [Jörg Micheel]
  • Some not so good news resulting from continued OC192MON testing. It appears quite firm now that the boards need to be reconditioned by human attention. Knox Street is still working out the exact patch needed. The good news is that the PCI-X 133 MHz upgrade could be performed at the same time. An open question is whether to recall the boards to New Zealand or send someone to the US for the upgrade. I am expecting decisions to that effect shortly. The availability of the boards is now gating the IPLS and SC2003 installations.  [Jörg Micheel]

Outreach, application support, utilization improvement, and documentation activities:

Visited Canterbury and Massey Universities this week and talked about network research in the hope of finding more collaborators. Presented material about what NLANR/MNA is currently doing.  [Tony McGregor]

Attended a talk about NeTraMet++ by Nevil Brownlee.  [Tony McGregor]

Sent Joe Abley the source to Scamper, as he asked for it at NZNOG. I also had a request from someone at the American University of Rome for the source of Scamper.  [Matthew Luckie]

Met research colleagues at the University of Catalonia in Barcelona, and the discussion turned out to be extremely interesting. I am confident we have found another strong collaborator for the PMA project here. I am in discussion with Pere and Josep to see how we can strengthen the relationship in a variety of ways.  [Jörg Micheel]

Spoke with Greg Cole about possible measurement infrastructures in Russia and China. Additional talks will be held about participation in the upcoming Gloriad proposal. [Ronn Ritke]

We received good support from Stephen Donnelly (Endace) on the OC192MON testing from Hamilton.  [Jörg Micheel]

Am working with Peter Arzberger of PRAGMA about the upcoming CUDI meeting in Mexico in early October. He will show a couple of NLANR/MNA slides for us when presenting there. [Ronn Ritke]

Helped Ian Pratt at the University of Cambridge Computer Lab with preparations for the announcement of PAM2004. The Web page is now in place, as are the necessary email lists. Endace is supporting us by hosting some of the DNS and email related services. [Jörg Micheel]

Jay Dombrowski expressed a desire for his department (SDSC ENS) to meet with Jörg during his visit to work out the needs of the PMA relative to SDSC; unfortunately, this invitation came after Jörg's departure.

Started to read the OGSI global grid network measurement documents.  [Tony McGregor]

Article in progress on our international deployments and collaborations, for publication on the SDSC Web site (and our own). The focus will be on technology transfer to groups in Korea, Australia and Taiwan - while mentioning the hosting of NLANR/MNA monitors in other countries. Dave Hart (NSF PR) will contact Tom Greene about possible PR on the article from NSF. [Mike Gannis, Ronn Ritke, Maureen Curran (rev.)]

Work continued on the redesign of the PMA Web pages, including the development of a new directory infrastructure. Worked out an html infrastructure problem with the page template(s) which occurred only on certain browsers. Also, discussions were held regarding the AMP Web pages (and template style), and the next phase of Data Users/Citings pages (in particular the design of the Collaborations in-depth page). [Maureen Curran, Klaus Mochalski, Tony McGregor, Lana Kennedy]

There was discussion of creating Web pages to provide some tutoring in the connection of passive monitors. These pages would provide online diagrams similar to the diagrams I include with monitors when shipping, but would go into much more detail and cover many more of the options and variations that a technician may encounter. [Bud Hale, Maureen Curran]

The brochure that PRAGMA is developing on their project will include a number of references to our work and possibly one or two images. We are providing them with text and images. The problem with the international sites' coordinates for the AMP maps (input manually by Ben a month or so ago, but now missing) was resolved (bug found and fixed by Todd, who had written the original code way back when). A JPG of the new AMP site map could then be created for PRAGMA.  [Maureen Curran, Ben Reesman (images)]

For use in current slide presentations (as well as other venues) wrote an overview of AMP IPv6, a brief outline/overview of IPMP, and brief paragraphs on each of the three approaches to real-time measurements and analysis for PMA that we are doing. [Maureen Curran]

Ongoing measurement and analysis, networked data, and infrastructure support:

Even though the newly discovered vulnerability on OpenSSH did not represent a threat to AMPlets at remote AMP sites, Tony decided to move ahead with the update to the 3.7 version to be sure that remote site people do not perceive the monitors as vulnerable. Matthew compiled the sshd binaries on amp-sdsc and Tony performed two rounds of ssh fixes. A timing issue caused sshd (daemon) to fail to start on about 18 sites. Bud worked with the site admins for each of the problem monitors, as only a reboot was needed to restart the sshd. Data collection and transfer were not affected by this.  [Bud Hale, Tony McGregor, Matthew Luckie]

HPSS failures caused stoppages to the extent that the archiving from VOLT had to be abandoned. The HPSS problem was corrected by SDSC, but it left the data disk fill on VOLT at the mid-eighty percent level. During preparations to restart the archiving on VOLT, the AMP server experienced a disk crash in the concatenated disk array. A decision was made to change the disk concatenation scheme of the AMP server to individual data disks, as is the case on the VOLT server. This was accomplished; the failed disk was located and replaced. Then the AMP server was restored from the VOLT disk array. During the restoration process the am_slave process (which collects the data from the remote site AMPlets) is down on both servers. In the future the disk configuration on both AMP and VOLT will be maintained identically in order to facilitate restores between the two servers (but am_slave will still stop during the process). [Bud Hale]

The AMP replacement disk had been tested using the Hitachi/IBM disk fitness test. This test has been very valuable in finding incipient failures in the past. However, the newly installed disk failed on Monday of the following week. It was replaced and the data was restored. It is worth noting here that the data restore proceeded quite rapidly compared to the time that would have been needed if the concatenated array scheme had been retained. However, several days later, I discovered the second replacement had failed. Both of these disks had come from a server that was taken out of service; however, both had tested as good with the disk fitness test. We have again recovered from the failure, and the replacement disk is a new disk, never before used. The failed disks will be returned under warranty.  [Bud Hale]

Assembled the disk chassis for the addition of the four new 120 GB disks to the PMA server. The assembly is a little unorthodox, but will work for now. This will greatly improve the capacity of the machine. Also, the new disk installed on the PMA server last period is working fine. Worked with Klaus and Chris Gross on nai-p-sda regarding some issues Chris was having. It worked out to be a great opportunity. Just being involved in the discussion explained a lot of details I had questions on.  [Jim Hale]

Existing measurement sites maintenance and troubleshooting:

Site outages remain low. However outages have increased to some degree. Some sites remain in a disconnected state. Some of this has been due to skeleton crews during the summer months. However, some sites are reporting disconnects on purpose to guard some subnets from attacks. Some site crews are reporting the deluge of viruses and worms has kept them too busy to lend a hand on NLANR boxes. That issue has improved recently. [Bud Hale]

A total of 22 remote sites in the NAI infrastructure received attention during this period: 10 have been resolved and the monitors are again collecting data. 12 were still being investigated, or pending site action, at the end of the period. (Outages are considered "open" until the monitor is again collecting data.)

AMP -  16 problem sites:  9 resolved, 7 open
PMA -   6 problem sites:  1 resolved, 5 open

~ AMP machines

Sites amp-mit (Mass. Inst. of Tech.) and amp-ncsu (North Carolina State U.) were out due to site network and machine room reworking. amp-ncsu is back online, while amp-mit is still disconnected. amp-ncsu is on a new IP and all preparations have been completed to restart it on the new subnet. Waiting to confer with Tony at this time to coordinate the use of the system manager to ensure that the sshd update will not cause some disconnects as happened previously. [Bud Hale]

amp-uic (U. of Ill., Chicago) installed a new power supply in the AMP monitor there but it proved to have additional failures. A replacement machine was prepared and sent, as well as another replacement to amp-ufl (U. of Florida, Gainesville). Both have monitors on site to replace the failed ones, but they have not been installed. Also reported was that amp-ufl is under equipment rearrangement. That site is still offline but promised to be back online soon. [Bud Hale]

Sites amp-jhu (Johns Hopkins U.) and amp-umbc (U. of Md., Baltimore County) also had outages. Both were investigated. amp-jhu appeared to have a network problem which was resolved by rebooting, and the site is back online. After a reboot, amp-umbc remained blocked, but the site eventually resolved the network issue that had it down and is back online. [Bud Hale]

amp-hutf (Helsinki U. of Tech, Finland) is back up this week after an outage. Site technicians installed the replacement disk and the system manager system re-initiated the machine. [Bud Hale]

A block was applied to the subnet to which the amp-sdsu (San Diego State U.) is connected. That net was highly infected with new viruses and an extensive clean-up is in progress. The site technician at SDSU promises to have the issues resolved and be back online shortly. [Bud Hale]

Worked with the NaukaNet site to get a router corrected to bring the amp-naukanetnwu back online. [Bud Hale]

Site amp-unm (U. of New Mexico) had an outage which was resolved by correction of a switch. It also had a short outage due to a network block, which has since been removed. [Bud Hale]

The amp-cudi (Univ. Corp. for Internet Dev. in Mexico) monitor showed an outage on the data status page; however, it was collecting and transferring properly. Looking into the cause of the error on the data status page. [Bud Hale]

Site amp-odu (Old Dominion U.) reports they have been swamped and unable to correct problems with the AMP box there. It remains out due to a shutdown to guard against worms and viruses. Also, Old Dominion U. suffered much damage from Hurricane Isabel. The IT department has a skeleton crew while restoration is going on. [Bud Hale]

Site amp-jpl (Jet Propulsion Lab) needed a shutdown last weekend for UPS change over. However, it appears that it was not totally successful. Claudia De Luna, the site technician there, asked for another shutdown this weekend. She reported she fully expects the power rework to be completed soon. [Bud Hale]

The Holland site, amp-surf (SURFNet), had intermittent outages which were corrected by machine reboots. Am attempting to learn why the machine is prone to hanging and needing a reboot.

Site amp-mem (U. of Memphis) blocked the AMP monitor but I was able to get the blockage removed in a short time. [Bud Hale]

amp-dartmouth (Dartmouth U) had an outage due to a switch problem. [Bud Hale]

~ PMA machines  

The OC3mon at Old Dominion U. is still disconnected due to security issues. The site technician promised to have those issues resolved by next week. It may be necessary to move the Ethernet connection to another network, however. [Bud Hale]

I am working with the KISTI site, the AMPATH site and the MAX site to make or correct connections to the on-site PMA monitors. [Bud Hale]

One other site, pma-p-mem (U. of Memphis) was blocked temporarily for security. Discussion with the site technician got the block removed. [Bud Hale]

Chris Gross is back from the summer in New Zealand, which included working directly with the DAG card designers. We have already learned much from him since his return. [Bud Hale]

- 30 -

last modified: 10/16/03