NLANR/MNA Summary of Research Activities - July 2004



Passive Measurement and Analysis (PMA) Project

~ Continuing development of new metrics and real-time analysis for PMA

Klaus Mochalski, from the University of Leipzig, arrived for a two-month visit to work with us on real-time passive measurements, in particular a library of routines we intend to reuse in a number of different applications. We are also looking into extending the work we started last year on packet delays, and intend to gather some data from the IPLS instrumentation for this purpose.

Before Klaus's arrival, Jörg had a long email dialog with him and Klaus Degner (also in Leipzig) regarding tool and library work.

~ New (and developing) strategically important measurements and deployments

At the Joint Techs Meeting, Jörg had a very long and fruitful technical discussion with Bill Owens from NYSERNet about our plans for putting an OC3MON back into Buffalo, NY, and the GIGEMON into the access link to Abilene. Jörg made arrangements to have an OC12 POS monitor installed on the link to Abilene (NYSERNet Buffalo - CHINng), as well as a GigE monitor on the link to Abilene (NYSERNet NYC - NYCMng). We are awaiting IP addresses, as well as physical addresses for shipping the equipment, and expect this to be completed shortly.

We have finished the description of the installation process at the Abilene IPLS location and have shared it with a number of people for proofreading and comments. The PMA team discussed and began preparations for the upgrade of the IPLS installation. The sole-source justification for the router clamp equipment was completed, which will enable the purchase.

Also at the Joint Techs Meeting, Jörg had several discussions with Debbie Montano of National LambdaRail (NLR) and her colleagues from various organizations, including CENIC and Indiana University, regarding the optical network infrastructures and the actual implementation of NLR to date. We have been working on questions of how to passively instrument emerging DWDM-based optical networks.

At the JET meeting, Jörg approached Bill Johnston from ESnet to propose using some of the spare NLANR OC48MONs to instrument the ESnet/Abilene peering points. His response sounded positive, as did those of Matt Zekauskas and Guy Almes, with whom the idea was shared in a follow-up.

~ Upgrades, troubleshooting, and maintenance on the PMA servers and infrastructure

We are close to resolving the IPMI remote control issue with the SuperMicro servers after an email exchange with the company. It appears that LAN1 in the BIOS is actually eth0 in Linux, which explains the errors we have been seeing. We ran some experiments with the IPMI on the nai-p-sda machine and resolved the IPMI issue with that monitor. We now have a prototype that will work for other SuperMicro systems, which will be used for the Purdue monitor and the NYSERNet GigE monitor.

The new PMA server was made operational. As requested by Victor Hazelwood, the SDSC security chief, we have been in contact with local SDSC folks to resolve the problem of secure access to the HPSS from the new pma.nlanr.net machine, which runs FreeBSD on AMD64, a platform for which no HSI client is presently available. After discussions with us, Mike Gleicher, the IBM consultant assigned to SDSC (and the developer of the HSI HPSS interface), volunteered to compile the code on the new server to add HSI support for the AMD64 platform. We are waiting for HPSS to resolve the connection issue before we finish testing the new executable. We made arrangements to retain access to our backup infrastructure until we transition fully and the HSI issue is resolved.

SDSC asked us to relocate the NLANR hosts at SDSC to a different address space. This included IP address migration for the PMA server, the SDA monitor, and all the clients in the field.

We have been sorting and testing SCSI drives made surplus by various upgrades. We have a large quantity of 9 GB SCSI disks for which we do not expect any future use. Adaptec controllers have not become surplus.

Support and troubleshooting, existing PMA measurement sites:  

A total of three remote sites in the PMA infrastructure received attention during this period. (Outages are considered "open" until the monitor is again collecting data.) Details follow; the numbers include the SDA monitor discussed above.

3 problem sites:  1 resolved, 2 open - at period end.

nai-p-amp ~ Brief snag in replacing the AMPATH (Florida International University) unit because the site technician was unavailable; the installation was rescheduled, and the machine was installed at that time.

nai-p-pur ~ Further dialog with Scott Ballew at Purdue regarding the GIGEMON there.  We are still blind in one direction and Scott has finally considered attacking the problem with a light meter. He has various ideas as to what the problem might be.

Active Measurement Project (AMP)

~ Progress on the reimplementation of AMP and the development of a new testing architecture

  • Worked on the core of the weekly page and a structure that supports adding display modules for different test types.
  • Good progress was made on adding most of the front page and destination selection page.  
  • The core of a generic map system was completed that accepts arbitrary maps (anything from the world to a campus). It was tested with a couple of NZ maps and the existing AMP world map.
  • Work also continued on the central code.
  • AMP was deployed on CRCnet (including the AMPlet repackage).
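
The report does not say how the generic map system places monitors on an arbitrary map. A minimal sketch, assuming each map is an image whose latitude/longitude bounding box is known and which is drawn in a plain latitude/longitude (equirectangular) projection; `MapSpec`, `to_pixel`, and `place_monitors` are hypothetical names, not the AMP code. Maps in other projections, or boxes that cross the 180° meridian, would need a different transform.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MapSpec:
    """A background image plus the geographic box it covers (hypothetical format)."""
    name: str
    width_px: int
    height_px: int
    west: float   # longitude of the left edge
    east: float   # longitude of the right edge
    south: float  # latitude of the bottom edge
    north: float  # latitude of the top edge

    def to_pixel(self, lat: float, lon: float) -> Optional[Tuple[int, int]]:
        """Linearly map (lat, lon) to pixel (x, y); None if the point is off this map."""
        if not (self.south <= lat <= self.north and self.west <= lon <= self.east):
            return None
        x = (lon - self.west) / (self.east - self.west) * (self.width_px - 1)
        # Image rows grow downwards, so measure latitude from the top edge.
        y = (self.north - lat) / (self.north - self.south) * (self.height_px - 1)
        return round(x), round(y)


def place_monitors(spec: MapSpec,
                   monitors: Dict[str, Tuple[float, float]]) -> Dict[str, Tuple[int, int]]:
    """Return {monitor name: (x, y)} for the monitors that fall inside the map."""
    placed = {}
    for name, (lat, lon) in monitors.items():
        pos = spec.to_pixel(lat, lon)
        if pos is not None:
            placed[name] = pos
    return placed
```

Because a map is just data (image size plus bounding box), a world map and a campus map go through the same code path; only the `MapSpec` differs.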

Jeremy Kallstrom, the new AMP Software Engineer, began work with us mid-month. In addition to the usual activities of settling into a new job, he looked over the source code he will be working on. He also spent some time becoming familiar with FreeBSD, including a reinstall, and examined the graphics library used in the code.

Jeremy and Tony met several times while Tony was in town, discussing AMP in general and, in more detail, the dgraf program on which Jeremy was to begin work. Afterwards, Jeremy was able to really delve into the code and get a much better idea of how it works. He restructured some of the program options and partially added a new graph type, a histogram.
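
The new histogram graph type in dgraf is not described further, and dgraf's own language and data format are not given here. As a hedged illustration only, the core of such a graph is fixed-width binning of samples (for example RTTs in milliseconds); `histogram` and `render` are hypothetical names, and the text rendering stands in for real graphics output.

```python
import math
from typing import List, Sequence, Tuple


def histogram(samples: Sequence[float], bin_width: float) -> List[Tuple[float, int]]:
    """Count samples in fixed-width bins; returns (bin_start, count) for every bin, empty ones included."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not samples:
        return []
    # Align the first bin to a multiple of the bin width.
    lo = math.floor(min(samples) / bin_width) * bin_width
    nbins = int((max(samples) - lo) // bin_width) + 1
    counts = [0] * nbins
    for s in samples:
        # Clamp to guard against floating-point rounding at the edges.
        idx = min(max(int((s - lo) // bin_width), 0), nbins - 1)
        counts[idx] += 1
    return [(lo + i * bin_width, c) for i, c in enumerate(counts)]


def render(hist: List[Tuple[float, int]], bar: str = "#") -> List[str]:
    """Very small text rendering: one line per bin, one bar character per sample."""
    return [f"{start:8.1f} | {bar * count}" for start, count in hist]
```

Keeping empty bins in the result matters for a graph: gaps in a latency distribution are part of what the histogram should show.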

~ New (and developing) strategically important measurements and deployments

With some assistance from us, AMP was deployed on the CRCnet (NZ wireless network).

10GigE AMP ~

  • Progress in the development of a 10GigE AMP continued, albeit a bit slowly. There appear to be only three manufacturers of 10GigE NICs: Intel, S2io, and Nortel. We are working and sharing experiences with long time collaborator Phil Dykstra. He updated us, as he is also finding it slow going. Phil's update:
    • "The short answer on S2io is they have not been able to deliver LR cards because of problems with the optics modules.  They hope to have that resolved by August.  Meanwhile Intel has brought out a new version of their 10GigE NIC.  The old one is no longer available, and the new one in LR is also not available yet!  SR is.   Hmm, I wonder if they are using the same LR optics module as S2io? ... I am in a pickle right now because I need to buy some LR NICs and there are none on the market! I guess you call that the bleeding edge."
  • The AMP team decided to evaluate the S2io 10GigE network interface cards. We are working with Dan Hatton from S2io on obtaining the 10GigE xFrame interface cards. S2io is willing to loan cards, so far for a 30-day trial period. We are in talks with them, including about price, and have explained our plans and the goals we wish to accomplish. We were told that all available cards are the Short Range (850nm) versions. Tony believes these will be fine for testing. We hope to have at least one set of these cards soon.

The four remaining AMP monitors for the Internet2 backbone locations were prepared and shipped. These were for the Internet2 sites at Houston, Kansas City, Seattle, and Los Angeles. Chris Small now has a total of seven monitors to be installed on the Internet2 backbone.

We communicated with Brian Court of CENIC. Brian is preparing to submit requests for seven monitors for the CENIC backbone nodes. He has assigned the task to Darrell Newcomb. We will be working with Darrell to implement this installation.

Chen-en from the HPNC and TAWREN in Taiwan has plans to deploy 11 AMPs as part of a local mesh.

~ IPMP and IPMP cross-traffic-from-trace (ctft) generator

The IPMP code for the Linux kernel appeared to be causing kernel panics when the new AMP software was used on crc.net. The flaw most likely lies in the sanity checking of packets. Work continued toward a resolution.

~ IPv6 and IPv6 Scamper

A new set of file-writing routines was created that records much more detail about what Scamper saw. Part of the problem is how to deal with PMTU data. Briefly, Scamper remembers the MTU data it has already seen to specific points along a path and does not re-probe them, so the file format needs to reflect this. Also, Scamper will try to discover the PMTU itself when a router in the path does not send a fragmentation required message, so the file format has to record which MTUs it tried and how often. This becomes very complex when paths are load balanced.
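
To make the record-keeping requirement concrete, here is a hedged sketch of the information such a file format has to capture: whether a PMTU was reused from earlier results rather than re-probed, and which sizes were tried and how many times. This is not Scamper's code or file format (Scamper is written in C); `PmtuRecord`, `PmtuCache`, `discover_pmtu`, the candidate size list, and the `probe` callable (standing in for sending a packet of a given size with fragmentation disallowed) are all hypothetical. Load-balanced paths are not modelled.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical candidate sizes, largest first, tried when no
# "fragmentation required" message comes back.
CANDIDATE_MTUS = (9000, 4470, 1500, 1280, 576)


@dataclass
class PmtuRecord:
    """What a trace file might store about PMTU discovery toward one destination."""
    dst: str
    pmtu: Optional[int] = None
    from_cache: bool = False                            # True if an earlier result was reused
    attempts: Counter = field(default_factory=Counter)  # size tried -> probes sent at that size


class PmtuCache:
    """Remembers MTUs already learned up to a given point on a path."""

    def __init__(self) -> None:
        self._known: Dict[str, int] = {}

    def get(self, point: str) -> Optional[int]:
        return self._known.get(point)

    def put(self, point: str, mtu: int) -> None:
        self._known[point] = mtu


def discover_pmtu(dst: str, point: str, probe: Callable[[int], bool],
                  cache: PmtuCache, retries: int = 2) -> PmtuRecord:
    """Find the largest candidate size that gets through, reusing cached results.

    `point` identifies where on the path the MTU applies (e.g. the last hop that
    answered); `probe(size)` returns True if a packet of that size got a reply.
    """
    rec = PmtuRecord(dst)
    cached = cache.get(point)
    if cached is not None:              # seen before: do not re-probe
        rec.pmtu, rec.from_cache = cached, True
        return rec
    for size in CANDIDATE_MTUS:
        for _ in range(retries):        # one lost probe is not proof the size is too big
            rec.attempts[size] += 1
            if probe(size):
                rec.pmtu = size
                cache.put(point, size)
                return rec
    return rec                          # nothing got through; pmtu stays None
```

A second destination whose path shares the cached point gets a record with `from_cache=True` and no attempts, which is exactly the case a file reader must be able to distinguish from a fresh measurement.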

Integrating the new file format and trace structure into Scamper involved splitting the old structure, which held both data and state, into a structure holding just the data; the state is kept elsewhere. Some time was also spent optimising the code. Instead of storing all trace objects in a single linked list, they are now stored in a splay tree (a self-adjusting binary search tree that brings recently accessed objects to the top of the tree) for fast searches, and placed into the applicable probe/wait/pause/done queues for scheduling.
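
A splay tree keeps a binary search tree ordered by key and, on every access, rotates the accessed node to the root, so objects that are looked up repeatedly stay cheap to find. The following is a minimal Python sketch of that general technique (recursive splaying with zig-zig and zig-zag rotations), not Scamper's C implementation; using a destination address as the key and a trace object as the value is an assumption for illustration, and the scheduling queues are not shown.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Node:
    key: Any                        # e.g. a destination address
    value: Any                      # e.g. the trace object for that destination
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def _rotate_right(x: Node) -> Node:
    y = x.left
    x.left = y.right
    y.right = x
    return y


def _rotate_left(x: Node) -> Node:
    y = x.right
    x.right = y.left
    y.left = x
    return y


def _splay(root: Optional[Node], key: Any) -> Optional[Node]:
    """Move the node holding `key` (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                      # zig-zig
            root.left.left = _splay(root.left.left, key)
            root = _rotate_right(root)
        elif key > root.left.key:                    # zig-zag
            root.left.right = _splay(root.left.right, key)
            if root.left.right is not None:
                root.left = _rotate_left(root.left)
        return root if root.left is None else _rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:                         # zag-zag
        root.right.right = _splay(root.right.right, key)
        root = _rotate_left(root)
    elif key < root.right.key:                       # zag-zig
        root.right.left = _splay(root.right.left, key)
        if root.right.left is not None:
            root.right = _rotate_right(root.right)
    return root if root.right is None else _rotate_left(root)


class SplayTree:
    """Map from keys to values; every lookup moves the accessed key to the root."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None
        self.size = 0

    def find(self, key: Any) -> Any:
        self.root = _splay(self.root, key)
        if self.root is not None and self.root.key == key:
            return self.root.value
        return None

    def insert(self, key: Any, value: Any) -> None:
        if self.root is None:
            self.root, self.size = Node(key, value), 1
            return
        self.root = _splay(self.root, key)
        if self.root.key == key:                     # already present: update in place
            self.root.value = value
            return
        node = Node(key, value)
        if key < self.root.key:                      # old root becomes the right child
            node.left, node.right = self.root.left, self.root
            self.root.left = None
        else:                                        # old root becomes the left child
            node.left, node.right = self.root, self.root.right
            self.root.right = None
        self.root = node
        self.size += 1

    def keys(self) -> List[Any]:
        out: List[Any] = []
        stack: List[Node] = []
        node = self.root
        while stack or node is not None:             # iterative in-order walk
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            out.append(node.key)
            node = node.right
        return out
```

Compared with a linked list, lookups no longer scan every trace object, and the splaying gives amortised logarithmic cost per operation.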

Scamper can now solicit more than one response per hop, which allows it to collect more RTT measurements to each hop. Scamper also records the source address used to transmit probes toward a destination, and the TTL received in each response packet, which can be used to guess the length of the reverse path. All address list details were moved into a separate file, and code was added to allow per-trace parameters, such as the number of attempts to make for each hop and the trace method (UDP, TCP, etc.).
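
One common way to turn a received TTL into a reverse-path length guess is to assume the responder started from one of the usual initial TTL values and take the smallest such value that is at least the received TTL; the difference is the number of routers that decremented it on the way back. The sketch below applies that heuristic and also summarises multiple responses per hop. It is an illustration under stated assumptions, not Scamper's method: the initial-TTL list is an assumption (a device may use any value), and `guess_reverse_hops` and `summarize_hops` are hypothetical names.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple

# Initial TTL values commonly used by host and router stacks (an assumption).
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def guess_reverse_hops(reply_ttl: int) -> int:
    """Guess how many routers decremented the TTL on the way back to us."""
    if not 0 <= reply_ttl <= 255:
        raise ValueError("TTL must be in 0..255")
    initial = next(t for t in COMMON_INITIAL_TTLS if t >= reply_ttl)
    return initial - reply_ttl


def summarize_hops(responses: Iterable[Tuple[int, float, int]]) -> Dict[int, dict]:
    """Group (hop, rtt_ms, reply_ttl) tuples, e.g. several responses per hop."""
    by_hop = defaultdict(list)
    for hop, rtt, ttl in responses:
        by_hop[hop].append((rtt, ttl))
    summary = {}
    for hop, items in sorted(by_hop.items()):
        rtts = [r for r, _ in items]
        summary[hop] = {
            "replies": len(items),
            "min_rtt": min(rtts),
            "median_rtt": median(rtts),
            # Derived from the smallest reply TTL seen at this hop.
            "reverse_hops": guess_reverse_hops(min(t for _, t in items)),
        }
    return summary
```

Comparing a hop's forward position with its guessed reverse length gives a rough hint of path asymmetry, though a wrong initial-TTL guess makes the estimate meaningless, so it should be treated as a guess.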

The new Scamper_file API was imported for writing traces, and it was decided to use an API to record PMTU discovery alongside a traceroute.  We also have a solution to the problem with Linux netlink returning the PMTU of the path rather than the MTU of the outgoing interface.

A bug was found in Scamper's logic that caused it to send probes at about half the specified rate. Large source files were broken down into smaller units and the code was generally tidied up, making it easier to navigate and reuse.

~ Upgrades, troubleshooting, and maintenance on the AMP servers and infrastructure

The failed disk in the AMP server disk array (disk7), which had been replaced last month, failed again. Data was restored to the replaced disk, and data collection resumed on both AMP and VOLT. A disconcerting element of this latest failure is that the am_slave data collection process did not halt; a halt would normally trigger an "am_slave down" warning message. We began investigating this anomaly. Shortly afterward, however, the same disk in the AMP server disk array failed again (for the second time in two weeks). Rather than installing another 36GB disk, we decided to return to 18GB disks, in the hope that this will help with this chronic problem; we had recently switched to 36GB drives for the added capacity.

The HPSS ftp interface used for archiving the AMP data still proves very unreliable at times, causing the archiving process to fail and sometimes requiring multiple restarts.
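
The report does not describe how the archiving step is driven. A hedged sketch of one way to cut down on manual restarts is to wrap each transfer in a retry loop with exponential backoff. `with_retries` and the commented `upload` function are hypothetical; the host name, credentials, file name, and delays are placeholders; only `ftplib.all_errors`, `ftplib.FTP`, `login`, and `storbinary` are standard Python library names.

```python
import ftplib
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 5, base_delay: float = 30.0,
                 retry_on: Tuple[Type[BaseException], ...] = ftplib.all_errors,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, retrying on transient errors with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once `attempts` tries have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Hypothetical usage: reconnect and re-send the whole file on each attempt.
# def upload():
#     with ftplib.FTP("archive.example.org") as ftp:
#         ftp.login("user", "password")
#         with open("amp-archive.tar", "rb") as f:
#             ftp.storbinary("STOR amp-archive.tar", f)
# with_retries(upload)
```

Reconnecting inside each attempt avoids reusing a control connection that the server may already have dropped; the final re-raise keeps persistent failures visible instead of hiding them.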

IP address migration ~

SDSC asked us to relocate the NLANR hosts at SDSC to a different address space. For AMP, this involved IP address migration for several servers and updating the 150 or so active monitors. Additional network interfaces were installed in the AMP, VOLT, and calorie machines to facilitate the network transition.

At this stage of the transition, the machines are on both the .123 and .74 networks. We currently expect to complete the transition by August 1. In discussions with Jay Dombrowski and Lyle Carlson, we arranged to keep one of the machines on both networks through October, or until all the AMP monitors that have been shipped but not yet installed are connected. This will allow us to connect to those monitors from the .74 network and edit their /etc/hosts.allow files to permit connections from the new .123 network.

The network transition of AMP and VOLT caused a connectivity anomaly with the HPSS, which appeared to be a TCP-wrapper issue within the HPSS. We worked with the HPSS staff on this.

Following his attendance at the Joint Techs meeting, Tony visited SDSC. During his visit, much progress was made in planning transitions, especially the transition to the new AMP and VOLT servers. In light of the recent data disk failures, there is an even greater need to phase out the old AMP and VOLT servers. Tony discussed with the AMP team his plan to make use of the new servers' RAID array disks by setting them up as NFS file servers. Tony also worked closely with Jeremy during the visit, helping him understand the structure of the AMP network and the reimplementation (his primary reason for visiting at this time).

Support and troubleshooting, existing AMP measurement sites:  

A total of 10 remote sites in the AMP infrastructure received attention during this period. (Outages are considered "open" until the monitor is again collecting data.) Details follow.  

10 problem sites:  7 resolved, 3 open - at period end.

amp-emory (Emory U. in Atlanta, GA) ~ experiencing an outage, which the site technician is diagnosing; not collecting data.

amp-fsu (Florida State U.) ~ moved to a new network outside their firewall, with the new IP address distributed by the system manager; collecting data.

amp-gatech (Georgia Tech) ~ short outage due to power connection problems; collecting data.

amp-korea (KREONet2 in Korea) ~ short outage, corrected quickly; collecting data.

amp-ncsa-dca (NCSA Access Center in Arlington) ~ short outage, resolved by a power-cycle reboot; collecting data.

amp-ou (Oklahoma U.) ~ short outage while switch port cables were re-arranged at the site; collecting data.

amp-rnpb (RNPNet in Brazil) ~ worked with the site technician and concluded a replacement monitor is necessary. The replacement was prepared and shipped under new, more complicated UCSD foreign shipping procedures; not collecting data.

amp-surf (SURFNet in Amsterdam) ~ lost network connectivity three times. The first two losses were corrected through the "out-of-band" connection, but the third time the connection was lost completely. We plan to replace the monitor as soon as possible and are working to restore out-of-band capability; not collecting data.

amp-uah (U. of Alabama, Huntsville) ~ taken offline briefly because it was interfering with a ping diagnostic; collecting data. (Site technicians requested a plan to move the AMP monitor outside the existing firewall.)

amp-yale (Yale University) ~ suffered a strange anomaly that prevented it from creating echo requests to other hosts. We worked closely with site technician Jeremy George to diagnose the problem. A replacement monitor was needed; it was prepared, shipped, installed on site, and initialized, and after some weeks of outage it is now collecting data. (We will analyze the failed machine upon its return to us.)

Outreach, Collaborations, and Activities supporting Network Research

~ Papers, Presentations, and Conference/Meeting Participation

July Joint Techs Meeting ~

  • Ronn gave a presentation which covered AMP achievements and outlined 10GigE monitor "firsts" as a segue to Jörg's talk on PMA.
  • Jörg presented a talk on "Monitoring the 10 Gigabit Abilene Network," which included insight into the actual hardware used and some of the first results of our instrumentation at IPLS. Feedback indicated that it was well received. Jörg used the opportunity to talk with Steve Corbato, I2, and NLR about various ideas for further network instrumentation. He also approached Rick Summerhill about possibly instrumenting the MAN LAN (Manhattan Landing) exchange point with a 10GIGEMON. There are some technical hurdles, but it appears to be a real opportunity. Matt Zekauskas was either present for most of these dialogs or was briefed afterwards.

Initial planning and arrangements were made for Jörg to give a presentation on 10GigE application software at the Alliance SC2004 booth (November).

There was a brief dialog with Paul Love (I2) who was pleased to receive our proposals for presentations and tutorials at the next Joint Techs (scheduled for the end of January).

~ Collaborations and activities supporting network research

Regarding I2's HOPI project and various updates on regional optical networks (RONs) at the July Joint Techs Meeting, Jörg spoke with Luke Fowler (Indiana University) and with Dave Reese and his colleagues at CENIC about details of the CISCO 15808 and 15454 used in NLR, with a view to future projects. He also spoke with Nicolas Simar about possible pan-European passive measurement activities and encouraged him to promote joint US-EU measurements, such as instrumenting the GEANT/Abilene peering points.

John Hicks, Chris Robb, and Caroline Carver, Indiana University ~
PMA: In a discussion with John, Jörg happened to mention that he had questions about the Cisco gear used in NLR. John then invited Jörg to have a look at the Cisco 15454s that IU is putting in place for its campus network and for the ETF link between IU and Chicago. John quickly arranged the visit, and they received great support from Caroline and Chris. As a result of this meeting, Jörg is fairly confident about opportunities to instrument modern DWDM networks, specifically those using recent Cisco gear.

Taiwanese R&D network ~
PMA: Had a longer talk and writeup with a representative regarding options for passive measurements and possible collaborations with us. [Jörg Micheel]

Margaret Murray, CAIDA~
PMA: Phone conversations with her about some possible future measurement collaborations and various project matters. [Ronn Ritke, Jörg Micheel]

kc claffy, CAIDA~
AMP: Maureen and I sent her some info about the AMPViz (divide and conquer tool). [Tony McGregor]

Brian Court, CENIC~
AMP: Communications with Brian, who is preparing to submit requests for seven monitors for the CENIC backbone nodes. He has assigned the task to Darrell Newcomb. I will be working with Darrell to implement this installation. [Bud Hale]

Bob Aiken, CISCO~
AMP & PMA: Phone call about possible active and passive measurements. [Ronn Ritke]

Chris Bruja, CISCO~
Phone call to check on feedback about the 10Gig AMP and some future planning. [Ronn Ritke]

CRCnet (NZ wireless network) ~
Tony supported them with their deployment of AMP on CRCnet.

Jason Hurd, Endace~
PMA: Final phone call with him and a very attractive offer for DAG equipment to use during the 3rd year of PMA. Ronn and I are trying to determine with Alma how quickly we can move forward with a PO, which will ultimately gate how long it will take us to get the gear into the field. [Jörg Micheel]

Greg Cole, NCSA, GLORIAD~
PMA: Dialog with him on the instrumentation of the OC3c links to Russia and China. He expressed interest in purchasing some PMA monitors for deployment and measurement on GLORIAD links. [Jörg Micheel, Ronn Ritke]

Ivan Koga~
Helped him to compile AMP; he didn't realise you need curses installed. [Tony McGregor]

Paul Love~
AMP & PMA: He responded positively to requests for an AMP presentation and PMA tutorial at the next Joint Techs meeting. [Ronn Ritke]

Warren Matthews~
AMP: Talked with him about recent changes in the GGF measurement standards.  I thanked him in my Joint Techs presentation. [Ronn Ritke]

Eric Boyd, Pipes Project~
AMP: He will contact the Mona Lisa project about using the AMP data.  [Ronn Ritke]

Bill Cleveland and Scott Ballew, Purdue~
PMA: Email exchanges with them, including on the Purdue GIGEMON. Dialog with Bill on monitoring ideas and on getting the Purdue system to work; nothing definitive, unfortunately. [Jörg Micheel]

Teri Simas~
We will work together on a project for the next PRAGMA meeting at SDSC. [Ronn Ritke]

Rick Summerhill~
Spoke with him about the OC192 monitors at IND. [Ronn Ritke]

University of Auckland~
AMP: Tuesday I went to Auckland to meet with some people from the Computer Science department, including Nevil Brownlee and Ilze Ziedins.  We talked about a range of things including some work they've been doing on calculating a metric called t-entropy that may be useful for clustering and event detection. [Tony McGregor]

Matt Zekauskas ~
PMA & AMP: We have finished the description of the installation process at the Abilene IPLS location and have shared it with a number of people for proofreading and comments. I exchanged quite a lot of email with Matt Zekauskas on various topics; he has been extremely helpful in providing more details on the HOPI design process, technical specifications, and further contacts to follow up with questions, for which I am grateful. [Jörg Micheel]

Followed up with Matt on the status of the Surveyor active measurement project. [Ronn Ritke]

Documentation, Web work, Utilization Improvement, Publications

Work began on a database to store and sort information about our Citings and Collaborations/Partners. It will serve as a backend content source for the related Web pages and for print needs as well. In-depth planning meetings were held regarding the approach. Originally, we explored a combined flat-file, Perl, and m4 solution. After a conference call with Tony to get his input, we decided to drop the flat file and instead use PostgreSQL to manage all our information in one big database (a big improvement). We are focusing first on the Citings, as they are less complex than the Collaborations. We developed a list of parameters that each citing/citation will have (in order to create the several different Web pages). Lana began learning about PostgreSQL databases and Perl query scripts. One benefit we expect from this project is that updating these pages (Citings and Collaborations) will become quite easy.
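
The list of citing parameters is not given here, so the following is only a hedged sketch of the idea that one table can feed several different Web pages through different queries. Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a database server, and Python stands in for the team's Perl scripts; the table name, columns, and functions (`open_db`, `add_citing`, `page_lines`) are hypothetical.

```python
import sqlite3
from typing import List, Optional

# Hypothetical columns; the team's actual parameter list is not reproduced here.
SCHEMA = """
CREATE TABLE citing (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    authors TEXT NOT NULL,
    year    INTEGER NOT NULL,
    venue   TEXT,
    project TEXT NOT NULL,   -- e.g. 'AMP' or 'PMA'
    url     TEXT
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_citing(conn: sqlite3.Connection, title: str, authors: str, year: int,
               project: str, venue: Optional[str] = None, url: Optional[str] = None) -> None:
    # Parameter substitution keeps quoting problems out of the SQL text.
    conn.execute(
        "INSERT INTO citing (title, authors, year, venue, project, url) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (title, authors, year, venue, project, url),
    )


def page_lines(conn: sqlite3.Connection, project: str) -> List[str]:
    """One page's worth of entries for a project, newest first; missing venues are skipped."""
    rows = conn.execute(
        "SELECT authors, title, venue, year FROM citing "
        "WHERE project = ? ORDER BY year DESC, title",
        (project,),
    )
    return [", ".join(p for p in (a, f'"{t}"', v, str(y)) if p) + "."
            for a, t, v, y in rows]
```

Adding another page (say, by year) is then just another query over the same rows, which is the maintenance benefit the paragraph above anticipates.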

The Reports index page (http://moat.nlanr.net/Reports/) was updated and its layout redesigned.

A minor link error in the AMP template was corrected and the pages were updated. This process served as a great reminder of how well the m4/Makefile system is working; I am very happy with it.

A PMA monitor installation Web page was begun. While intended for internal use, we think that it could be useful for PMA installations by facilities remotely hosting passive monitors.

Created a "media pitch" briefing for use by Greg Lund (SDSC External Relations) on our AMP and PMA, 10-gigabit, and international activities.

Jay Dombrowski (SDSC NetOps) asked that NLANR switch to a new subnet to allow for the expansion of additional SDSC assets. A strategy for the IP address migration was developed and implemented for the MNA servers (a new VLAN). There was quite a bit of manual configuration necessary.

An updated Windows OS PC was built and configured for increased speed, for use with more complex graphics and layout programs.




Activities of each individual on the project

AMP team

  • Tony McGregor ~
    AMP reimplementation; 10GigE AMP; attended Joint Techs; Program Plan modifications; IP address migration; failed disk; working with Jeremy.
  • Jeremy Kallstrom ~
    AMP reimplementation programming.
  • Matthew Luckie ~
    AMP IPMP; AMP IPv6/Scamper.
  • Bud Hale, Jim Hale ~
    10GigE NICs; IP address migration; failed disk; new deployment machine prep; upgrades, troubleshooting, and maintenance on the AMP servers and existing infrastructure; SDSC Systems Administrator Security meeting.

PMA team

  • Jörg Micheel ~
    Joint Techs; Program Plan modifications; IP address migration; new PMA server; IPMI; potential new deployments.
  • Klaus Mochalski ~
    PMA real-time applications.
  • Chris Gross ~
    Web logs project.
  • Jim Hale, Bud Hale ~
    new PMA server; IPMI; IP address migration; new deployment machine prep; upgrades, troubleshooting, and maintenance on the PMA server and existing infrastructure; failed Dag4.3GE card; SDSC Systems Administrator Security meeting; installation Web page.

MNA, AMP, and PMA Outreach, Documentation, Web work

  • Maureen Curran ~
    write/edit (slides, reports, updated Program Plan); database backend project; preparations and help for new AMP FTE; Web logs help; minor Web work.
  • Mike Gannis ~
    SC2004 initial prep; media pitch.
  • Lana Kennedy ~
    database backend project for Web and print (PostgreSQL, Perl); write/edit/compile information (report); built new PC.

Management and Administrative

  • Hans-Werner Braun, Ronn Ritke, Tony McGregor, Jörg Micheel ~
    Weekly NLANR/MNA managers conference calls.
  • Hans-Werner Braun ~
    developed strategy for address migration of MNA servers (new VLAN) and implemented migration; HPWREN measurements.
  • Ronn Ritke ~
    Joint Techs; budget; preparations and help for new AMP FTE and visiting PhD student, Klaus Mochalski; SC2004 prep; media pitch; slides.

- 30 -

2004 Aug 21