Summary of Research Activities - Oct. 2003
~ Continuing development of new metrics and real-time analysis for PMA
- Started to rework my real-time analysis program. The SDA monitor was recently disconnected because of the switch in the Abilene connections at SDSC. Have gained access to another monitoring point at the MAX GigaPop (OC48 one direction). Spent time porting my code to the MAX machine and learning all the idiosyncrasies of the 4.2GE. Set things up on MAX to do data collection. Plan to generate graphs with RRDtool once data collection starts. [Chris Gross]
- Chris has requested access to another system in the field to run his real-time analysis program for testing (now that SDA is disappearing as an ATM OC12c system). I have suggested MRA or MAX. MRA is not working, MAX only on one leg, but these are the "heavy hitters" within the current PMA infrastructure, so the best test field. I am going to continue working with Chris on this. [Jörg Micheel]
- I continued coding work on my flow engine. After consulting the source code of tcptrace and the BSD TCP implementation I am feeling more confident that it is on the right track. I added support for direct Dag access via dagapi to the library I am using for trace handling. This has to be tested with a Dag monitor, which I will do once I am back in Leipzig. This is an important prerequisite for any real-time application. [Klaus Mochalski]
- For the histogram sequence tool I implemented a mechanism to store and replay the sequences. This will help with the analysis of larger trace files. I intend to implement functions to fast-forward through a stored sequence and to search for anomalous events. [Klaus Mochalski]
~ Progress on the reimplementation of AMP and the development of a new testing architecture continued.
- Finished the rotation of the recovery files and use of multiple files (and associated testing). Finished the transfer of trace data code and the unit test for it. [Tony McGregor]
- Working on splitting the recovery file into a group of files and supporting rotation of the files when they reach a given size. That turned out to be a bit trickier than expected because of the possible need to cross file boundaries during recovery. [Tony McGregor]
- Completed the command interface (without remote IO, which I'll leave for a later release). [Tony McGregor]
- Implemented IPMP to the point where I can send echo requests and process and record the replies. [Tony McGregor]
- Currently there are ~12500 lines of code. [Tony McGregor]
- While coding IPMP I wanted to check the checksum I was generating from a tcpdump trace and went looking for a tool that I could feed a string of hex from tcpdump into that would give me the checksum. I could not find one so I wrote one in Javascript; it is something I have often wanted to do, so I am guessing it will be helpful to others as well. http://moat.nlanr.net/Software/HEC/index.html [Tony McGregor]
- Read through the add_test file which is included with the amplet code. Worked with Ben to get the amplet code to compile. Had some issues with cross-platform compiling, but a couple of quick type casts took care of that problem. [Lana Kennedy]
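The checksum calculation behind the hex-string tool mentioned above can be sketched as follows. This is a minimal Python sketch, assuming the checksum in question is the standard 16-bit one's-complement Internet checksum (RFC 1071) used by IP and ICMP headers; it is not the Javascript from the HEC page, and the function name is hypothetical. The input is assumed to be the packet bytes only, as hex digits separated by whitespace, with any offset columns from the tcpdump output already removed.

```python
def internet_checksum(hex_string):
    """Return the 16-bit one's-complement checksum of a hex string (RFC 1071)."""
    # bytes.fromhex skips whitespace and raises ValueError on bad input.
    data = bytes.fromhex(hex_string)
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Sum the data as big-endian 16-bit words.
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

To generate a checksum, feed in the header with the checksum field set to zero. To verify one, feed in the header as captured; a result of zero means the checksum is consistent.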
~ IPMP
Development work continued on the cross-traffic-from-trace (ctft) generator. Wrote up an experiment I want to do on the emulation network that involves using a 10 Mbit/s link in the path immediately after a 100 Mbit/s link, using my IPMP packet tailgating method and the ctft generator. [Matthew Luckie]
While working on the IPMP Internet-Draft, made a small change to the layout of the options field to make things more efficient (well, in my opinion; we're talking 1 bit here), which was picked up by Tony during "conformance testing" (my user-space implementation of the draft differed from his). [Matthew Luckie]
Met with the CEO of Allied Telesyn in NZ (they do the development work for their layer 3 devices here). He has agreed to revisit the idea of them implementing IPMP. [Tony McGregor]
~ Progress on the reimplementation of the Cichlid 3-D Visualization System continued.
- Worked on the GUI for Cichlid, refining the way in which the Qt objects are set up and connections are initiated. Learning more OpenGL and writing (and adapting from Jeff Brown's code) routines for drawing shapes. [Ben Reesman]
- Studied 3-D graphics picking and clipping, with an effort to start picking points correctly out of a mass of objects. This is something that each graph must be able to do correctly on its own, because mouse coordinates are fed in through the Graph object. In addition, I am trying to learn how to draw text correctly next to objects in 3-D space. This is something that Cichlid does very well in the code that Jeff Brown wrote, but I do not understand it well enough to adapt it yet. I have a couple of trivial examples but it still does not look correct. [Ben Reesman]
- Began creating a 'release' version of the Cichlid code. Several features still need to be completed (particularly support for labels and vertex-edge graphs). Worked on the state table that governs the GUI (ensuring that the GUI driving the playback is always in a self-consistent state has proven an elusive goal). I did a little bit of GUI redesign after Todd and I had a relatively in-depth conversation about usability. I also spent a fair amount of time in the old Cichlid code, trying to understand and adapt the code that draws labels. This is complicated by the fact that Cichlid2 uses GLUT for its font management, so I have been working on duplicating those capabilities without GLUT. I also spent time trying to learn to use automake. I would like to have Cichlid configured and built using './configure'. There are some complications that arise out of the fact that the main Cichlid makefile comes out of a tool called qmake, which ships with Qt. [Ben Reesman]
Began work on a collaborative project with HPWREN/ROADNet which involves developing a Cichlid server for the display of earthquake related data (animated). Wrote code that uses triangles to approximate an arbitrary terrain, such that a terrain map of San Diego County could be read in and displayed. I also wrote code that will animate a ripple-like effect over the surface of the terrain. I have tried it with a couple of different terrains and the effect is quite stunning. Worked on the adaptation of the terrain map/seismic activity animation code into a Cichlid graph type. Now that the proof of concept has been completed, the biggest challenge that I have to solve with this problem is the generation of averaged normal vectors to make the shading of the surface look smooth and eliminate the natural stitching effect of the triangles. This project was suggested by Todd. [Ben Reesman]
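The averaged-normal step described above can be sketched roughly as follows. This is a minimal Python sketch, not the Cichlid code: the function name and the data layout (a list of (x, y, z) vertices plus a list of index triples) are assumptions made for illustration. Each vertex normal is the normalized sum of the face normals of the triangles that share it, which is what removes the visible seams between triangles under smooth shading.

```python
import math

def vertex_normals(vertices, triangles):
    """Average the face normals of all triangles sharing each vertex."""
    sums = [[0.0, 0.0, 0.0] for _ in vertices]
    for a, b, c in triangles:
        ax, ay, az = vertices[a]
        bx, by, bz = vertices[b]
        cx, cy, cz = vertices[c]
        # Two edge vectors of the triangle.
        ux, uy, uz = bx - ax, by - ay, bz - az
        vx, vy, vz = cx - ax, cy - ay, cz - az
        # Unnormalized cross product; its length is twice the triangle
        # area, so larger triangles contribute more to the average.
        n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
        for idx in (a, b, c):
            for k in range(3):
                sums[idx][k] += n[k]
    normals = []
    for s in sums:
        length = math.sqrt(s[0] ** 2 + s[1] ** 2 + s[2] ** 2)
        if length:
            normals.append((s[0] / length, s[1] / length, s[2] / length))
        else:
            # Vertex used by no triangle (or a degenerate one): default to "up".
            normals.append((0.0, 0.0, 1.0))
    return normals
```

Weighting by area is one common choice; weighting each face equally, or by the angle at the vertex, are alternatives that may look better on very irregular terrain grids.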
~ New (and developing) strategically important measurements & deployments
I set up amp-naukanetnwu to test to a set of machines in Russia that Greg Cole arranged for us to test to. Greg hopes this will be a stepping stone to a Russian AMP mesh (as do we). [Tony McGregor]
Bud, Ronn, and I had a very positive conference call with Rick Summerhill and other folks from Internet2 about putting AMP monitors into Internet2 nodes. There is a bit of an issue with the availability of -48 VDC sourced power supplies for the one rack unit AMP monitor chassis, which Bud is working hard to try to resolve. If we can get over that, it looks like the first machine will be deployed early in the next period. The goal is to locate AMPs at the 11 Abilene backbone nodes as part of the Observatory Project. [Tony McGregor]
Visited two more NZ universities. In total, I have three new offers to host AMP machines in NZ. I think this is good news because it seems like we may have passed the critical mass for AMP in NZ. Waikato and some of the universities will be funding these machines. [Tony McGregor]
Continuing work on the OC192 monitor, in preparation for our OC192 presence at SC2003 next month (Phoenix, AZ):
- Working to share utilization of the Adtech equipment with Kevin Walsh. We are using it to provide signal generation for the OC192 monitor. Kevin also needs it for preparation of the equipment he plans to take to Supercomputing 2003. [Jim Hale]
- New results are available at Endace, and it is now firm that the cards will be upgraded, which involves soldering to the boards. I have asked Jim to package all four of the SDSC cards for return to NZ and he responded immediately, held back only by some customs paperwork that still needs to be resolved. Endace is giving priority to completing the work in time to return the cards to the US before the end of October, so SC2003 can be covered. Jim also received the second Dell 2650, so that we will have both systems up and running at that stage. It appears that the integration work will be supported by Stephen Donnelly from Endace Technical Support, who will be visiting the US West Coast during the week of October 19th-23rd, with Thursday and Friday at UCSD. [Jörg Micheel]
- Packed up the DAG 6.1 cards and shipped them back to New Zealand to accommodate Jörg in his preparation for Supercomputing 2003. (Due to some new Homeland Security regulations, found I need to be certified in order to qualify to ship international parcels; went through the certification process.) Jörg and Endace worked double time to prepare and return the OC192 cards for SC2003. Stephen Donnelly (Endace) then came to SDSC to help us with OC192 card testing and assembly into the Dell 2650 machines. The time I spent with Stephen was very well spent. I feel I'm far better prepared for installing the equipment in Phoenix than before. [Jim Hale]
- In communication with Jon Dugan for preparation of the gear needed to pull off our objectives at the Supercomputing 2003 conference. [Jim Hale]
- We managed to eliminate nearly all of the issues that had been uncovered with the Dag6.1 cards. Because one residual issue occurred with a particular motherboard, the decision was made to return two of the four cards to San Diego and retain the second pair in Hamilton. This lets us continue investigating the cause while having equipment at SDSC in time for Stephen's arrival. The plan worked: the cards arrived in time, and Stephen and Jim had a very successful testing session. I have had no feedback yet on whether they were successful with the pair of cards in the 2650 as well. At a minimum, we will be able to show a working 10GIGEMON at SC2003. [Jörg Micheel]
- Conversations with Ronn, Matt Zekauskas, and Jon Dugan volunteering to support us with the installation at SC2003. Jim will also be there for the first day to help the installation process. We are targeting the bandwidth challenge as one of the measurement targets over there. [Jörg Micheel]
- Making arrangements with the Network Operations technicians to put the OC192 monitors on the TeraGrid network to capture data till we remove the monitor for SC2003. I placed the previous OC192 machine in the racks in the machine room. I placed a request for a new address for the additional OC192 machine (OC192a). Due to the fires throughout San Diego, many staff have been unavailable and the university was closed for three days because of extremely poor air quality, which has delayed responses to requests. [Jim Hale]
The new unit at the Internet2 location in Ann Arbor, Michigan is being installed, and the /etc/network/interfaces file is being edited to reflect a subnet address that changed after the machine was shipped. Site technician Dan Pritts assigned the new IP addressing while we walked him through the re-configuration of the machine. That machine is now reachable. The fiber connection is expected to be made soon. When Dan gets it mounted and connected to the switch we should be able to log into it. [Bud Hale]
Worked with Ohio State site; they are switching to a GigE connection, which they are interested in monitoring. We are arranging to create another GigE monitor for that site. Jim worked this week to plan the task and acquire components. [Bud Hale]
Making progress on the AMPATH site in Miami. We shipped an optical power meter to them on loan to verify signal power levels. The power levels measured on inbound and outbound traffic were -21 dBm and -15 dBm, respectively. This led to additional troubleshooting while we had a very helpful technician, Ernie Rubi, there on site. Ernie was able to help us isolate the problem to a failed Dag3.2 card. We are currently preparing and testing a replacement pair of Dag cards, which should be shipped and installed soon. [Bud Hale, Jim Hale]
Together with a close friend of mine I have also managed to install a new monitor at another university, with GPS support. This data source should become available shortly and deliver some new runs of valuable long term traces, as we used to collect a good year ago in a number of locations. [Jörg Micheel]
Received word from Jörg to begin work on an OC48 machine for the Pittsburgh Supercomputer Center (PSC). Began working on it. Contacted Deb Shaw of NetOptics and placed the order for the splitter equipment shell needed for this machine. [Jim Hale]
Order has been placed for additional SCSI disk storage for the OC192a machine, which will raise it from 130 GB to about 660 GB; should arrive early in the next period. [Jim Hale]
Progress was made on the development of GigE measurements on the passive monitor at SDSC:
- Had discussions regarding the future and evolution of the nai-p-sda monitor. [Jörg Micheel, Jim Hale, Klaus Mochalski, Chris Gross]
- Since nai-p-sda lost its connection to the Abilene network, I am continuing to pursue a connection to Abilene through the CENIC network, where we will be making use of the new regenerative tap. Bud received a response from NetOps that the regenerative tap for the link to the Abilene network through CENIC is active and ready for PMA to start monitoring. I will need to upgrade the machine, as the new Dag4.3GE cards (GigE) require a PCI-X bus. Am looking at the Intel Trinity motherboard as a replacement; it only has ATA66 IDE capacity, but it does have 160 SCSI capacity.
- The Dag 4.3GE cards are now on order. One of them will go into nai-p-sda to measure the Abilene network through the CENIC network. It seems these cards are just now rolling off the assembly line, so the three cards we ordered may not arrive quickly. Jörg has assured us he will do what he can to get them to us as quickly as possible. [Jim Hale]
I noticed that William Maton has started using a variation of my script to generate IPv6 Scamper traces automatically. Contacted him about the automatically generated Scamper runs, mainly to thank him for doing them, but to also point out that it was being run twice. I will put together a new address list in the coming months. [Matthew Luckie]
There was talk on one of Joe St Sauver's mailing lists about tools for collecting statistics on paths that support a >1500 byte MTU. I will put the MTU discovery item into Scamper in the near future. There are 9180 byte clean paths out of SDSC, and I believe Phil Dykstra might also be able to help out with paths if he is interested in this kind of thing. [Matthew Luckie]
~ Papers
Wrote and submitted an abstract on my IPMP Bandwidth Capacity work for the CAIDA ISMA Bandwidth Estimation Workshop (December) - which has already been accepted. Am in the process of putting together the presentation. I want to ensure that it will be compelling and interesting and will be using the emulation network to do additional experiments. http://voodoo.cs.waikato.ac.nz/~mjl12/luckie2003isma.pdf [Matthew Luckie, Maureen Curran (ed.)]
Worked on the IPMP Internet-Draft, making changes as suggested by Randy Presuhn, editor of the SNMPv2 RFC. Also made a small change to the layout of the options field in the IPMP draft to make things more efficient (well, in my opinion; we're talking 1 bit here), which was picked up by Tony during "conformance testing" (my user-space implementation of the draft differed from his). http://voodoo.cs.waikato.ac.nz/~mjl12/draft-mcgregor-ipmp-03.txt [Matthew Luckie]
Jae-Min Lee, Jian-Bo Gao, Ronn Ritke, and Tony McGregor's "Characterization of end-to-end packet dynamics in the Internet" was submitted to Sigmetrics 2004. Another version of this paper will be submitted to the Special issue of Performance Evaluation. [Tony McGregor, Ronn Ritke, Maureen Curran (ed.)]
Discussed several potential papers to be submitted to PAM2004, relating to the real-time analysis efforts. Klaus will definitely do an abstract re his Internet burstiness metric work and Chris will write one on his real-time tool. [Jörg Micheel, Klaus Mochalski, Chris Gross, Maureen Curran]
I started writing up the ideas of the two approaches I am currently pursuing. This is meant to be the foundation for one or two papers. [Klaus Mochalski]
Had a preliminary meeting with Chris re his paper for PAM2004. [Maureen Curran]
~ Presentations and Conference/Meeting Participation
Prepared and gave a talk on NLANR AMP to the JET meeting; spoke with Paul Love (Internet2) in advance regarding topic and length. [Tony McGregor]
Gave the NLANR AMP talk at Auckland University; this is the last of the NZ talks. [Tony McGregor]
JET meeting introduction for Tony McGregor's presentation. [Ronn Ritke]
Attended IMC2003; it was a good and worthwhile conference, even though I was disappointed with the content and impact of quite a few papers. I am still missing the applied component in those and it continues to upset me how academic the solutions are that are being proposed (over and over). I met a lot of people and had a fantastic number of good face-to-face conversations with many folks. With Matt Zekauskas we prepared the installation of the 10GIGEMON for SC2003. I also had a very good talk with Kathy Benninger at PSC, and we agreed to push the installation of an OC48MON at her place. The preparatory work is already in progress by Jim. [Jörg Micheel]
Reviewed Tony's slides for his JET talk, put together and sent a possible replacement site map. [Maureen Curran]
~ Collaborations and activities supporting network research
Working with Peter Arzberger of PRAGMA. Sent him some NLANR/MNA slides which he presented at the CUDI meeting in Mexico City. He also pointed out that the CUDI NOC in Mexico City is hosting an NLANR/MNA AMP machine. Peter will update Bill Chang of NSF about NLANR/MNA activities in the Pacific Rim area while attending the PRAGMA meeting in Taiwan. (I will not be able to attend this meeting the 3rd week in Oct.) Next PRAGMA meeting will be in China in May. We hope to get Peter, Greg Cole, and Director Yan together. Director Yan is from the CNNIC site in Beijing - which expressed interest in hosting an NLANR/MNA AMP monitor and is participating in the Gloriad project. [Ronn Ritke]
Emails, a phone call and a conference call with Rick Summerhill about the Observatory Project. The goal is to locate AMPs at the 11 Abilene backbone nodes. [Ronn Ritke]
Have been following up on a number of emails coming in from research groups in the US and Europe. Strong interest continues in the OC192c monitoring. I am talking to Casey O'Leary from PNNL who would like to offer his support in NLANR's OC192MON project for SC2003. [Jörg Micheel]
I am in touch with a number of groups in Europe who are intending to launch a passive measurement infrastructure for the benefit of the European community and I am currently considering the various options to participate and contribute, in the hope that strong ties between PMA and these folks will build a global environment with payoffs to all involved parties. For NLANR we would be able to draw on additional data collectors, without having to go through the process of installing or operating them, or even having to pay for the equipment. [Jörg Micheel]
I have been in contact with UPC Barcelona, and Pere Barlet is keen to pay NLANR a visit either early 2004 for six weeks or mid-2004 for 8-10 weeks. We are in discussion which options would be best for the parties involved. [Jörg Micheel]
I set up am_master and sent the am_slave code to Warren Matthew (formerly of SLAC now at Georgia Tech). I made some changes to am_* to make it compile cleanly on recent versions of Linux. Warren set up a SOAP Web services interface for some of the Internet2 data and wants to do something similar for AMP data, which is very cool. [Tony McGregor]
I looked into producing some data that Jonghyun Kim from ou.edu wanted. I'm waiting on a response from him before going further. Basically he was after a matrix of RTTs for all the sites, but did not tell me over what duration. [Tony McGregor]
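The kind of site-by-site RTT matrix requested above could be assembled along these lines once the duration is known. This is a minimal Python sketch under stated assumptions: the sample layout (timestamp, source, destination, RTT in ms) and the function name are hypothetical and do not reflect the actual AMP data format, and the median is just one reasonable summary. The time window is an explicit parameter because that is exactly the detail still outstanding.

```python
from collections import defaultdict
from statistics import median

def rtt_matrix(samples, start, end):
    """Return {src: {dst: median RTT}} for samples with start <= ts < end.

    samples is an iterable of (timestamp, src, dst, rtt_ms) tuples;
    this layout is assumed for illustration only.
    """
    buckets = defaultdict(list)
    for ts, src, dst, rtt_ms in samples:
        if start <= ts < end:
            buckets[(src, dst)].append(rtt_ms)
    matrix = defaultdict(dict)
    for (src, dst), rtts in buckets.items():
        matrix[src][dst] = median(rtts)
    # Convert to plain dicts so missing pairs raise KeyError instead of
    # silently creating empty rows.
    return {src: dict(row) for src, row in matrix.items()}
```

Pairs with no samples in the window are simply absent from the result, which keeps unreachable or blocked sites distinguishable from sites with a measured RTT.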
I sent the latest international list to Ronn to send to Che-nan Yang in Taiwan and created a Web page where that can be fetched from (using a username and password). [Tony McGregor]
Talked with Stephen Campbell at Dartmouth about the ping filter. He has promised to open a hole for AMP very soon. [Tony McGregor]
Met with the CEO of Allied Telesyn in NZ (they do the development work for their layer 3 devices here). Amongst other things, he has agreed to revisit the idea of them implementing IPMP. [Tony McGregor]
Phone conversation with Ian Graham to discuss overall network measurement issues. [Hans-Werner Braun]
Met with Kevin Thompson (which I found very helpful). Afterwards, I met with Larry Landweber (who I had not met in years) to discuss various topics. Mark Ellismann also joined us later. [Hans-Werner Braun]
Emails and questions on the phone for Matt Z about the upcoming December measurement workshop at SDSC. [Ronn Ritke]
We worked with Matt Zekauskas this week to prepare for the monitors at SC2003. [Bud Hale]
It was a pleasure working with Stephen Donnelly from Endace while he was visiting this period. [Bud Hale]
Wrote a new overview paragraph for AMP, including the IPv6 mesh, and sent to Ronn for Rick Summerhill to use on their Web page re AMP and Abilene's Observatory project. [Maureen Curran]
Sent the new international AMP map that Ben created (he, Todd, and Tony pinpointed the problem with the script last period and the program was fixed) to Peter Arzberger for possible use on the back cover of the forthcoming PRAGMA brochure. [Maureen Curran]
~ Documentation, networked data
Completed the "NLANR International Successes" article, sent to NSF for comment. The article will be put on the Web in two forms: an overview on the SDSC Website (TBA) that links to a more detailed second part on the NLANR Web site (http://moat.nlanr.net/International/partB_article.html), and a version on the NLANR Web site that combines both of these into one document. http://moat.nlanr.net/International/overviewOct03.html [Mike Gannis, Ronn Ritke, Maureen Curran (rev. and Web), Lana Kennedy (Web)]
Decided to put together a quick issue of the NATimes for distribution at SC2003, quick because we will use the handout that Matthew and I put together on IPv6, other IPv6 related material, and an abbreviated version of Mike's International article, part B. In keeping with the IPv6 theme and focus on users, decided to ask Bill Owens (NYSERNET) and Joe St. Sauver (U. of Oregon) if they had time to contribute an article (each) to the issue. They have both readily agreed. So, it's shaping up to be a pretty good issue. [Maureen Curran]
I talked with Maureen about using the IPv6 flyer we did of the AMP IPv6 project in the NATimes. [Matthew Luckie]
Started putting together the layout for the new issue of the NATimes. [Lana Kennedy]
Worked on code in PHP for the AMP scripts that manage latitude and longitude and display the locations of AMPlets (for the AMP splash pages). One of the problems is that the large number of AMPlets makes displaying them clearly very difficult. Tony and I are starting to specify a system of scripts that will generate arbitrary maps of amplets, generalized in a way to be useful on any scale with any coordinate system. Tony suggested that we implement a more intelligent algorithm for placing sites on the map: one that would place stars far enough apart to not overlap excessively. I learned more PHP in order to do this. [Ben Reesman]
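One simple way to keep stars from overlapping excessively, in the spirit of the placement idea above, is to nudge any pair that is too close apart along the line joining them. This is a minimal sketch in Python rather than PHP, and it is not the algorithm Tony and Ben specified: the function name, the pixel-coordinate input, and the pairwise relaxation approach are all assumptions made for illustration.

```python
import math

def spread_points(points, min_dist, max_iter=100):
    """Nudge 2-D points apart until each pair is at least min_dist apart.

    Stops after max_iter passes even if some pairs are still too close.
    """
    pts = [list(p) for p in points]
    for _ in range(max_iter):
        moved = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                dx = pts[j][0] - pts[i][0]
                dy = pts[j][1] - pts[i][1]
                dist = math.hypot(dx, dy)
                # Small tolerance so rounding error does not cause endless nudging.
                if dist >= min_dist - 1e-9:
                    continue
                if dist == 0.0:
                    ux, uy = 1.0, 0.0  # coincident points: pick an arbitrary direction
                else:
                    ux, uy = dx / dist, dy / dist
                # Move both points half of the missing distance, in opposite directions.
                push = (min_dist - dist) / 2.0
                pts[i][0] -= ux * push
                pts[i][1] -= uy * push
                pts[j][0] += ux * push
                pts[j][1] += uy * push
                moved = True
        if not moved:
            break
    return [tuple(p) for p in pts]
```

The trade-off is that stars drift away from their true coordinates, so in dense regions it may be worth drawing a short leader line back to the real location or capping the total displacement.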
Discussed the final details of the new splash page for AMP with Ben and some bugs he is chasing. Also talked to Ben about using Cichlid for some work at Waikato (real-time header capture to Cichlid display). [Tony McGregor]
I did some work on the Otter topology display, adding options for displaying just the most common path each day and for omitting the tabular display. I also updated to the most recent Otter. I hit a bug in Otter, but Brad Huffaker (from CAIDA), who is the main author of Otter, was able to fix it for me and do a new release. This version prints even sized nodes, which is good for our use of it. [Tony McGregor]
Developed a proposed PMA Web page directory infrastructure; discussed it with Jörg and Klaus during a conference call. After the call, Klaus and I went over ways to handle the redirects (we had discussed these previously, but Jörg wants to go a different route). During our discussion, we found a great answer. Klaus generated a list of all the html files in the Web directory. We removed the 27,000, or so ;^) belonging to the Datacube, leaving about 165 html files to redirect. [Maureen Curran]
Phone call with Klaus and Maureen. We reached an understanding with respect to the new PMA Web pages, which basically means minimal delay: "just get it out real quick". We discussed publishing the Leipzig-I and -II data sets and Klaus' return to Leipzig, as well as his continuing involvement with PMA. [Jörg Micheel]
I had a short discussion with Maureen related to the PMA Web site. We agreed that I will be available for further discussion after being back in Leipzig. [Klaus Mochalski]
Tony and I discussed cascading style sheets and what the real needs are regarding making changes to Web pages. CSS is not the best bet for our Web pages; Tony has several ideas on other ways we can handle this. Exchanged emails with Klaus, which were also very helpful. [Maureen Curran]
Have been discussing with Maureen about making the new Web page templates more maintainable, probably using the C preprocessor. [Tony McGregor]
Installed the HEC calculators on the MNA software page using Maureen's template. http://moat.nlanr.net/Software/HEC/index.html [Tony McGregor]
Worked with Tony and Hans-Werner to post Tony's new tool (Javascript checksum calculator). When I added it to the tools page, I converted the page to the new template and made a couple of other quick fixes on it. At the same time, fixed the MNA navbar so that both of the feature images are live (now go to appropriate pages, vs. holder page). The two featured images currently are the Citings pages and the OC48 Abilene-I page. I did it this way in order to expedite it (vs. creating the middle summary pages, as will happen later). http://moat.nlanr.net/tools.html [Maureen Curran]
Met with Klaus to discuss the Web logs/data collected/downloaded project, regarding picking up where Cooper left off; Chris joined us. We discussed this with Jörg during a conference call. It was decided that when Chris has time, he will do these (quick and clean). Followed up with Chris and with Cooper to obtain his latest versions. [Maureen Curran]
Began working on creating the new site map for the passive machines. [Ben Reesman]
Began creating an outline file with information about each of our activities from the past quarterlies and annual report, to be used with Maureen's next design for the MNA home page highlights, and possibly for use in the next issue of the NATimes. [Lana Kennedy]
Discussed the topology project with Bud re obtaining replacement diagrams for currently outdated ones, as well as sites without one. [Maureen Curran]
Discussed some project needs, clarified the passive users list Web page, etc. [Hans-Werner Braun, Maureen Curran]
Started to learn XML and the related languages, DTD and XSD. (The Global Grid Forum Network Measurements working group is using schemas written in XSD for exchange of measurement data.) While continuing to bring myself up to speed with XML Schema, started to investigate SOAP. I noticed that DAST are using XML-RPC for their advisor tool, so I may need to look at that too. [Tony McGregor]
There was improvement in the functioning of the AMP and VOLT servers with the addition of the new 36GB drive installation and they did well most of the month. However, the am_slave process on AMP again failed during the archiving process before the disk fill could be reduced. As discussed before, disk fill continues to accelerate as the sites grow and the consequent data collection rate grows. It seems now that archiving is needed on both servers at a frequency of about five weeks maximum (the am_slave process failure occurs at about ninety-one percent fill). [Bud Hale]
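A small check like the following could flag when archiving is due before the failure point is reached. This is a minimal Python sketch under stated assumptions: only the ninety-one percent figure comes from the observation above, while the function names and the five-point safety margin are hypothetical. It is not part of the existing AMP or VOLT tooling.

```python
import shutil

# Fill fraction at which the am_slave failure is reported above.
FAILURE_FILL = 0.91

def fill_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def should_archive(fraction, margin=0.05):
    """True once the fill fraction comes within margin of the failure point."""
    return fraction >= FAILURE_FILL - margin
```

Run periodically against each data disk, for example `should_archive(fill_fraction(mount_point))`, this would give some days of warning rather than relying on a fixed five-week schedule as the fill rate keeps accelerating.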
To improve disk balance, I often move directories, from /disk to /disk. The combination of moves applies to both the AMP and VOLT servers, since I am keeping the directory locations on the respective data disks identical. This is to facilitate disk copying in the event of failure. However, I am not sure this is having the desired effect. Some changes in the archiving period may be needed soon. This would be an interim measure until major changes in AMP data storage are accomplished. [Bud Hale]
I have suggested to Bud that he start pricing replacement machines for the AMP and VOLT data collection servers. The current machines are quite old. I want to increase the disk space and use a RAID with redundancy so that recovering from faulty disks is easy. I'm proposing to go with IDE because it is much less expensive and we can now get enough disk space in one box. [Tony McGregor]
There has been a temporary upside to some of the effects of the recent rash of worm problems. Sites have blocked ICMP echoes at many routers. This has reduced the amount of data the AMP and VOLT servers are getting and thus slowed the disk fill. But those issues are getting resolved and the data rate is resuming. In regard to AMP and VOLT data disks, Tony has given the go-ahead to upgrade the AMP and VOLT data servers. Jim and I have started working to get that upgrade accomplished. We are gathering data on price and availability of the components needed, in advance of Tony's arrival early next month, when we will discuss the options available. [Bud Hale, Jim Hale]
During a staff meeting discussion this period, it was concluded that more information regarding AMPlet security is needed. This information is meant to inform site administrators and managers of the security measures implemented in the AMPlets, and of how and why the AMPlets on sites do not pose a security risk. I took the lead to get this information source implemented. [Bud Hale]
I finally managed to collect the new 120 GB disk I needed to complete the disk chassis for the RAID array on the PMA server (480GB total). Am now ready to install the additional storage. [Jim Hale]
Mid-October was the last week of the visit to NLANR by Klaus Mochalski from Leipzig, Germany. I have enjoyed working with Klaus during his three month visit and I value the experience and the great benefit derived from the association. I hope for more opportunities to work with him. I believe NLANR will derive much benefit by continued collaboration with Klaus as he returns to Leipzig University. [Bud Hale]
I got Klaus to sit down with me and explain some fundamental aspects of the DAG cards and tools that I didn't understand. Just the short time he has been here has been very beneficial to my understanding of the nuts and bolts of the composition of PMA mechanism. Before Klaus left, I spent a considerable amount of time with him picking his brains on the Endace Dag cards, Dag tools, and analysis of the results of the tools. [Jim Hale]
Weekly NLANR/MNA managers conference calls. [Hans-Werner Braun, Ronn Ritke, Tony McGregor, Jörg Micheel]
Existing measurement sites maintenance and troubleshooting:
As reported by Tony, all sites have had patched sshd version 3.4 installed except amp-ufl (U. of Florida at Gainesville), amp-mit (Mass. Inst. of Technology), amp-odu (Old Dominion U.), and amp-sdsu (San Diego State U.), which were offline during the update. I will run the system manager to perform that update as soon as I get those sites back online. I will make that run while logged into the individual monitors to be sure the connection is not lost, as happened at a few sites during the first update. [Bud Hale]
A total of 29 remote sites in the NAI infrastructure received attention during this period: 15 have been resolved and the monitors are again collecting data. 14 were still being investigated, or pending site action, at the end of the period. (Outages are considered "open" until the monitor is again collecting data.)
AMP - 21 problem sites: 10 resolved, 11 open **see below
PMA - 8 problem sites: 5 resolved, 3 open
** The AMP numbers are somewhat skewed this period because several of the sites that reported outages were brought back online but then began blocking ICMP traffic.
In regard to the ICMP blockage issues, it appears this started with the Blaster worm, and that the Nachi worm was created as a counter to it. The Nachi worm, however, seems to have caused router tables to fill and created DoS conditions. As noted in previous reports, some AMP machines were turned off as a result of the chaos. At its peak, fourteen AMP machines were either turned off or had ICMP blocked at the routers. [Bud Hale]
~ AMP machines
Site amp-ufl (U. of Florida at Gainesville) was off for machine room rework and network and equipment rearrangements and reconfiguration. That site came back online blocking ICMP. However, site tech Matt Grover was persuaded to create a router hole. [Bud Hale]
Site amp-mit (Mass. Inst. of Technology) was taken off line for network and equipment rearrangements and reconfiguration, and also because of the Blaster/Nachi worm. The router blocking issue discussed over the past several weeks continues to exist. The plan there is to relocate the monitor to a different subnet and create a router hole for the needed traffic types, but that has not yet happened. Site people are very much aware of our desire to get it relocated and restarted, and they are promising to move it as soon as possible. [Bud Hale]
Sites amp-odu (Old Dominion U.) and amp-sdsu (San Diego State U.) were taken off line as a security precaution and still report security issues. Of the two, amp-odu has been reconnected and is reachable via ssh login. However, that site continues to block ICMP packets, which stops the traceroute portion of the AMP measurement. Will continue to work with site people to resolve that issue. amp-sdsu expects to have its monitor back online soon. [Bud Hale]
Helped to diagnose an outage at amp-cudi (Corp. Univ. Desarrollo de Internet, Mexico). The problem there was a simple network connection issue, and the site has been recovered. [Bud Hale]
Site amp-jpl (Jet Propulsion Lab) is again in UPS rework mode this weekend, which is said to be the final time. amp-jpl will therefore be in an outage from late Friday until mid-weekend. [Bud Hale]
Site amp-asu (Arizona State) was blocking ICMP but created a router hole for the AMP monitor. [Bud Hale]
Site amp-arizona (U. of Arizona, Tempe, AZ) was blocking ICMP but was persuaded to move the AMP monitor to a subnet where a router hole could be created. I changed the IP address and restarted the monitor on the new addressing. [Bud Hale]
Sites blocking ICMP were: amp-cornell (Cornell U.), amp-dartmouth (Dartmouth U.), amp-ksu (Kansas St. U.), amp-montana (Montana St. U.), amp-nmsu (New Mexico St. U.), amp-odu (Old Dominion U.), amp-rpt (Rensselaer Poly. Inst.), amp-sdsu (San Diego St. U.), amp-ua (U. of Alabama at Tuscaloosa), amp-unm (U. of New Mexico, Albuquerque), and amp-wayne (Wayne St. U., Detroit, MI). Messages and other communications have been directed to all of these sites discussing the ICMP issue and requesting that router holes be created for the AMP monitors. Of these sites, five are back online: amp-dartmouth (Dartmouth U.), amp-montana (Montana St. U.), amp-ua (U. of Alabama at Tuscaloosa), amp-cornell (Cornell U.), and amp-ksu (Kansas St. U.). [Bud Hale]
Two new outages occurred at sites amp-aarn (Australia Research and Education Network) and amp-thor (Norwegian U. of Sci. and Tech). I expect those outages to be very transient in nature. [Bud Hale]
~ PMA machines
Worked on reviving a number of OC3MONs (TXS, APN, and OSU), which brought the number of collecting machines from five up to eight. Have been busy talking with Ronn about a plan of attack to ramp up the number of active data collectors in the field. We have only just started; more planning work remains to be done. [Jörg Micheel]
The nai-p-odu (Old Dominion U.) OC3mon and nai-p-ufl (U. of Florida, Gainesville) OC12mon sites are off line. The nai-p-odu site is still off line due to security issues. The nai-p-ufl site is off line due to machine room rework problems and the pending decision on the new rack space location; that should be resolved soon. The ODU machine has been reconnected. It is reachable and can be restarted to collect data again. As with the AMP monitor there, ICMP packets are blocked, but that should not affect the passive monitor. Still working to get nai-p-ufl back online. [Bud Hale]
The nai-p-txs (Rice U.) OC3mon unit had to be rebooted this week. [Bud Hale]
Other sites we are working on at this time include nai-p-buf (U. of Buffalo) and nai-p-mra (Merit Communications GigaPop, Ann Arbor, Michigan). We examined nai-p-mra at some length, and after some additional work (running diagnostics with the Dagtools and restarting the measurement cards) Jim was able to start that monitor as well. We are communicating with the U. of Buffalo to get the machine there moved to a new network and connected to the OC12 connection. [Bud Hale, Jim Hale]
The NCAR (Nat. Center for Atmospheric Res.) GigE monitor was found to be unreachable. However, working with a very cooperative and helpful site technician under Scot Coburn, we were able to restore it. Jim has since reworked the TCP wrapper files to ensure the machine's security. It is now in condition to be restarted. [Bud Hale]
Worked with Klaus on getting the AIX machine back to collecting data. It appeared that the DAG software had simply locked up, and restarting the card was all that was necessary to get it back up. Klaus nevertheless went over all the processes and ran many of the tools to show me their functions, and by the next morning the machine appeared in the PMA trace summary as a collecting machine. [Jim Hale]
Began working on the replacement machine for Front Range GigaPop. Short of some hardware needs, the machine is configured and nearly ready to ship. [Jim Hale]
Site naianl (Argonne Nat. Labs.) OC3mon machine is working and collecting data. However, it appears port 22 has been blocked, preventing ssh login. Despite this, it continues to transfer data. (Not included in counts above.) [Bud Hale]
Am preparing to visit the OC48 monitor site at the Mid-Atlantic Crossroads GigaPop (nai-p-max) in DC during the week of Nov. 9th to the 13th. I am planning to troubleshoot the remaining non-operating channel of the OC48 monitor there. [Bud Hale]
Jim and I worked on some of the OC3 monitors with Fore cards. [Bud Hale]
Bud, Jim, and I are working together to create a current PMA status list. [Ronn Ritke]