Summary of Research Activities - July 2004
Passive Measurement and Analysis (PMA) Project
~ Continuing development of new metrics and real-time analysis for PMA
Klaus Mochalski, from the University of Leipzig, arrived for a two-month visit to work with us on real-time passive measurements, in particular a library of routines we intend to reuse in a number of different applications. We are also looking into extending the work we started last year on packet delays and intend to gather some data from the IPLS instrumentation for this purpose. Before Klaus's arrival, Jörg had a long email dialog with him and Klaus Degner (also in Leipzig) regarding tool and library work.
~ New (and developing) strategically important measurements and deployments
At the Joint Techs Meeting, Jörg had a very long and fruitful technical talk with Bill Owens from NYSERNet, in which they discussed our plans for putting an OC3MON back into Buffalo, NY, and the GIGEMON into the access link to Abilene. Jörg made arrangements to have an OC12 POS to Abilene installed for (NYSERNet Buffalo - CHINng) as well as a GigE monitor to Abilene (NYSERNet NYC - NYCMng). We are awaiting IP addresses, as well as physical addresses for shipping the equipment. We expect this to be completed shortly.
We have finished the description of the installation process at the Abilene IPLS location and have shared it with a number of people for proofreading and comments. The PMA team discussed and began preparations for the upgrade of the IPLS installation. The sole source justification for the router clamp equipment was completed (this will enable the purchase).
Also at the Joint Techs Meeting, Jörg had several discussions with Debbie Montano of National LambdaRail (NLR) and her colleagues from various organizations, including CENIC and Indiana University, regarding the optical network infrastructures and the actual implementation of NLR to date. We have been working on questions of how to passively instrument emerging DWDM-based optical networks.
At the JET meeting, Jörg approached Bill Johnston from ESnet to propose using some of the spare NLANR OC48MONs to instrument the ESnet/Abilene peering points. His response sounded positive, as did those from Matt Z and Guy Almes, with whom the idea was shared in a follow-up.
~ Upgrades, troubleshooting, and maintenance on the PMA servers and infrastructure
We are close to resolving the IPMI remote control issue with the SuperMicro servers after an email exchange with the company. It appears that LAN1 in the BIOS is actually eth0 in Linux, which explains the errors that we have been seeing. We ran some experiments on the IPMI on the nai-p-sda machine and resolved the IPMI issue with that monitor. We now have a prototype that will work for other SuperMicro systems, which will be used for the Purdue monitor and the NYSERNet GigE monitor.
The new PMA server was made operational. As requested by Victor Hazelwood, the SDSC security chief, we have been in contact with local SDSC folks to resolve the problem of secure access to the HPSS from the new pma.nlanr.net machine, which runs FreeBSD on AMD64, a platform for which no HSI client is presently available. After discussions, Mike Gleicher, the IBM consultant assigned to SDSC (and the developer of the HSI HPSS interface), volunteered to compile the code on the new server to provide HSI support on the AMD64 platform. We are waiting until HPSS resolves the connection issue to finish testing the new executable file. We made arrangements to retain access to our backup infrastructure until we have transitioned fully and the HSI issue is resolved.
SDSC asked us to relocate the NLANR hosts at SDSC to a different address space. This included IP address migration for the PMA server, the SDA monitor, and all the clients in the field.
We have been sorting and testing SCSI drives made surplus by various upgrades. We have a large quantity of 9 GB SCSI disks for which we do not expect any future application.
Adaptec controllers have not become surplus.
Support and troubleshooting, existing PMA measurement sites: A total of three remote sites in the PMA infrastructure received attention during this period. (Outages are considered "open" until the monitor is again collecting data.) Details follow; the numbers include the SDA monitor discussed above.
3 problem sites: 1 resolved, 2 open - at period end.
nai-p-amp ~ Brief snag in replacing the AMPATH (Florida International University) unit when the site tech was unavailable; the work was rescheduled, at which time the machine was installed.
nai-p-pur ~ Further dialog with Scott Ballew at Purdue regarding the GIGEMON there. We are still blind in one direction and Scott has finally considered attacking the problem with a light meter. He has various ideas as to what the problem might be.
Active Measurement Project (AMP)
~ Progress on the reimplementation of AMP and the development of a new testing architecture
Jeremy Kallstrom, the new AMP Software Engineer, began work with us mid-month. In addition to the usual settling-in activities of a new job, he looked over the source code on which he will be working. He also spent some time becoming familiar with FreeBSD, including a reinstall, and checked out the graphics library used in the code. He and Tony met several times when Tony was in town and discussed AMP in general, as well as details of the dgraf program on which he was to begin work. After this he was able to delve into the code and get a much better idea of how it works. He restructured some of the program options and partially added a new graph type, a histogram.
~ New (and developing) strategically important measurements and deployments
With some assistance from us, AMP was deployed on the CRCnet (NZ wireless network).
10GigE AMP ~
The four remaining AMP monitors for the Internet2 backbone locations were prepared and shipped. These were for the Internet2 sites at Houston, Kansas City, Seattle, and Los Angeles. At this time, Chris Small has a total of seven monitors to be installed on the Internet2 backbone.
We communicated with Brian Court of CENIC. Brian is preparing to submit requests for seven monitors for the CENIC backbone nodes. He has assigned the task to Darrell Newcomb. We will be working with Darrell to implement this installation.
Chen-en from the HPNC and TAWREN in Taiwan has plans to deploy 11 AMPs as part of a local mesh.
~ IPMP and IPMP cross-traffic-from-trace (ctft) generator
It appeared that the IPMP code for the Linux kernel was causing the kernel to panic when using the new AMP software on CRCnet. It is likely that the flaw lies in the sanity checking of the packets. Work continued toward a resolution.
~ IPv6 and IPv6 Scamper
A new set of file-writing routines was created that records much more detail on what Scamper saw. Part of the problem is how to deal with PMTU data. Briefly, Scamper remembers the MTU data it has seen to specific points on a given path and does not re-probe them, so a file format is needed that reflects this. Also, Scamper will try to discover the PMTU when a router in the path does not send a fragmentation required message, so the file format has to record which MTUs it tried and how often. This becomes very complex when paths are load balanced. Integrating the new file format and trace structure into Scamper involved untangling the old structure, which kept both data and state, into a structure holding just the data; the state is now kept elsewhere.
Time was spent optimising the code a bit. Instead of being stored in a single linked list, all trace objects are now stored in a splay tree (a self-adjusting binary search tree that moves recently referenced objects to the top of the tree) for fast searches, and in the applicable probe/wait/pause/done queues for scheduling.
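The splay-tree lookup mentioned above can be illustrated with a minimal sketch. This is not Scamper's code: the `Node` class, the use of a destination address string as the key, and the function names are all assumptions made for the example. It only shows the general technique, in which each lookup rotates the accessed node up to the root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: str                          # hypothetical key: a trace's destination address
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None


def rotate_right(x: Node) -> Node:
    y = x.left
    x.left, y.right = y.right, x
    return y


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    return y


def splay(root: Optional[Node], key: str) -> Optional[Node]:
    """Move the node holding `key` (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:       # zig-zig: target is in the left-left subtree
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:     # zig-zag: target is in the left-right subtree
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:          # zig-zig: target is in the right-right subtree
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:        # zig-zag: target is in the right-left subtree
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)


def insert(root: Optional[Node], key: str) -> Node:
    """Insert `key` and return the new root (the inserted or existing node)."""
    if root is None:
        return Node(key)
    root = splay(root, key)
    if root.key == key:
        return root
    node = Node(key)
    if key < root.key:
        node.right, node.left, root.left = root, root.left, None
    else:
        node.left, node.right, root.right = root, root.right, None
    return node


def inorder(n: Optional[Node]) -> list:
    """Return the keys in sorted order (used here only to check the tree)."""
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


root = None
for addr in ("10.0.0.5", "10.0.0.1", "10.0.0.9", "10.0.0.3"):
    root = insert(root, addr)
root = splay(root, "10.0.0.9")        # a lookup moves the matching node to the root
print(root.key)                       # 10.0.0.9
```

Because a looked-up node ends at the root, repeated searches for traces that are currently being probed stay cheap, which is the property the paragraph above relies on. The scheduling queues are a separate structure and are not modelled here.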
Scamper can now solicit more than one response per hop, which allows it to collect more RTT measurements to each hop. Scamper also collects the source address used to transmit probes towards a destination, and the TTL received in a response packet, which can be used to guess at the reverse path's length. All address list details were moved into a separate file, and some code was added to allow for per-trace parameters, such as the number of attempts to be made for each hop and the trace method (UDP, TCP, etc.). The new Scamper_file API was imported for writing traces, and it was decided to use an API to record PMTU discovery alongside a traceroute. We also have a solution to the problem with Linux netlink returning the PMTU of the path rather than the MTU of the outgoing interface. A bug was found in Scamper's logic that caused it to send probes at about half the specified rate. Large source files were broken down into units and the code was generally tidied up, so it is now easier to navigate and reuse.
~ Upgrades, troubleshooting, and maintenance on the AMP servers and infrastructure
The failed disk, disk7 in the AMP server disk array, which had to be replaced last month, failed again. The data restoration to the replaced disk was completed and data collection was restored on both AMP and VOLT. A disconcerting element of this latest disk failure on AMP is that the am_slave process for collecting data did not halt. That event normally triggers an "am_slave down" warning message. Investigation began on this anomaly. Shortly afterward, however, the same disk failed in the AMP server disk array (for the second time in two weeks). It was decided that, instead of installing another 36GB disk, a return to the 18GB disks might have a positive effect on this chronic problem. We had recently shifted to using 36GB drives for the added capacity.
The HPSS ftp interface used for archiving the AMP data still proves to be very unreliable at times, causing the archiving process to fail and sometimes requiring multiple restarts.
IP address migration ~ SDSC asked us to relocate the NLANR hosts at SDSC to a different address space. For AMP, this involved IP address migration for several servers and updating the 150 or so active monitors. Additional network interfaces were installed in the AMP, VOLT and calorie machines to facilitate the network transition. The transition has reached the stage where the machines are on both the .123 and .74 networks. At this time we believe the target to complete the transition is August 1. Following discussions with Jay Dombrowski and Lyle Carlson, we will be able to keep one of the machines on both networks through October, or until all the AMP monitors that have been shipped but not installed are connected. This will allow us to connect to those machines from the .74 network and edit the /etc/hosts.allow file to allow connections from the new .123 network.
The network transition of AMP and VOLT caused some anomaly in the connectivity to the HPSS. It appeared to be a TCP-wrapper issue within the HPSS. We worked with the HPSS people on this.
Following his attendance at the Joint Techs meeting, Tony visited SDSC. During his visit much progress was made in planning transitions, especially the transition to the new AMP and VOLT servers. In light of the recent data disk failures there is an even greater need to phase out the old AMP and VOLT servers. Tony discussed with the AMP team his plan to make use of the new servers' RAID array disks by mounting them as NFS file servers. Also during his visit, Tony worked closely with Jeremy, helping him understand the structure of the AMP network and the reimplementation (his primary reason for visiting at this time).
Support and troubleshooting, existing AMP measurement sites: A total of 10 remote sites in the AMP infrastructure received attention during this period. (Outages are considered "open" until the monitor is again collecting data.) Details follow.
10 problem sites: 7 resolved, 3 open - at period end.
amp-emory (Emory U. in Atlanta, GA) ~ experiencing an outage; the site technician is working on the diagnosis; not collecting data.
amp-fsu (Florida State U.) ~ moved to a new network outside their firewall; the system manager distributed the new IP address; collecting data.
amp-gatech (Georgia Tech) ~ short outage due to power connection problems; collecting data.
amp-korea (KROENet2 in Korea) ~ short outage, was corrected quickly; collecting data.
amp-ncsa-dca (NCSA Access Center in Arlington) ~ short outage, which a power-cycle reboot cleared; collecting data.
amp-ou (Oklahoma U.) ~ short outage while switch port cables were re-arranged at the site; collecting data.
amp-rnpb (RNPNet in Brazil) ~ worked with the site technician and concluded a replacement monitor is necessary. The replacement was prepared and shipped under new, more complicated UCSD foreign shipping procedures; not collecting data.
amp-surf (SURFNet in Amsterdam) ~ lost network connectivity three times; the first two losses were corrected through the "out-of-band" connection. However, the third time the connection was lost completely. There are plans to replace that monitor as soon as possible, and we are working to restore out-of-band capability; not collecting data.
amp-uah (U. of Alabama, Huntsville) ~ taken offline briefly because it was interfering with a ping diagnostic; collecting data. (Site technicians requested a plan to move the AMP monitor outside of the existing firewall.)
amp-yale (Yale University) ~ was suffering a strange anomaly such that it could not create echo requests to other hosts. We worked closely with site technician Jeremy George to diagnose the problem. A replacement monitor was needed, so one was prepared, shipped, installed on site, and initialized; after some weeks of outage, it is now collecting data. (We will analyze the failed machine upon its return to us.)
Outreach, Collaborations, and Activities supporting Network Research
~ Papers, Presentations, and Conference/Meeting Participation
July Joint Techs Meeting ~
Initial planning and arrangements were made for Jörg to give a presentation on 10GigE application software at the Alliance SC2004 booth (November). There was a brief dialog with Paul Love (I2), who was pleased to receive our proposals for presentations and tutorials at the next Joint Techs (scheduled for the end of January).
~ Collaborations and activities supporting network research
At the July Joint Techs Meeting, in connection with I2's HOPI project and various updates on regional optical networks (RON), Jörg spoke with Luke Fowler (Indiana University) and with Dave Reese and his colleagues at CENIC for details on the CISCO 15808 and 15454 used in NLR, with a view to future projects. He also spoke with Nicolas Simar on possible pan-European passive measurement activities and encouraged him to promote joint US-EU measurements, such as instrumenting the GEANT/Abilene peering points.
John Hicks, Chris Robb, and Caroline Carver, Indiana University ~
Taiwanese R&D network ~
Margaret Murray, CAIDA ~
kc claffy, CAIDA ~
Brian Court, CENIC ~
Bob Aiken, CISCO ~
Chris Bruja, CISCO ~
CRCnet (NZ wireless network) ~
Jason Hurd, Endace ~
Greg Cole, NCSA, GLORIAD ~
Ivan Koga ~
Paul Love ~
Warren Matthews ~
Eric Boyd, Pipes Project ~
Bill Cleveland and Scott Ballew, Purdue ~
Teri Simas ~
Rick Summerhill ~
University of Auckland ~
Matt Zekauskas ~ Followed up with Matt on the status of the Surveyor active measurement project. [Ronn Ritke]
Documentation, Web work, Utilization Improvement, Publications
Work began on a database to store and sort information regarding our Citings and Collaborations/Partners. This will serve as a backend content source for the related Web pages and for print needs as well. In-depth planning meetings were held regarding the approach. Originally, we explored a combination flat file, Perl, and m4 solution. After a conference call with Tony to get his input, we decided to drop the flat file and instead use PostgreSQL to manage all our information in one database (a big improvement). We are focusing first on the Citings, as they are less complex than the Collaborations. We developed a list of parameters that each citing/citation will have (in order to create the several different Web pages). Lana began learning about PostgreSQL databases and Perl query scripts. One of the benefits that we expect from this project is that updating these pages (Citings and Collaborations) will become quite easy.
The Reports index page was updated and the layout redesigned: http://moat.nlanr.net/Reports/. A minor link error in the AMP template was corrected and the pages updated. This process served as a great reminder of how well the m4/Makefile system is working; we are very happy with it.
A PMA monitor installation Web page was begun. While it is intended for internal use, we think that it could be useful for PMA installations by facilities remotely hosting passive monitors.
We created a "media pitch" briefing for use by Greg Lund (SDSC External Relations) on our AMP and PMA, 10-gigabit, and international activities.
Jay Dombrowski (SDSC NetOps) asked that NLANR switch to a new subnet to allow for the expansion of additional SDSC assets. A strategy for the IP address migration was developed and implemented for the MNA servers (a new VLAN). There was quite a bit of manual configuration necessary.
An updated Windows OS PC was built and configured for increased speed, for use with more complex graphics and layout programs.
Activities of each individual on the project
AMP team
PMA team
MNA, AMP, and PMA Outreach, Documentation, Web work
Management and Administrative Hans-Werner Braun, Ronn Ritke, Tony McGregor, Jörg Micheel ~
- 30 -