Summary of Research Activities - May 2004
Development and distribution of measurement and analysis tools
Progress on the reimplementation of AMP and the development of a new testing architecture: I made some further changes to the code that returns standard output from remote commands. That allowed Xing to complete the pathrate, pathload and Iperf test code. [Tony McGregor] Meeting with and Bill Decker to sign paperwork for the release of the new AMP software. [Ronn Ritke, Tony McGregor] New Path Display tool - divide and conquer graphic (pathviz) ~ I have re-enabled Xing's usercode on amp so he can try the new pathviz code he has been working on with real data. Xing has finished a pretty decent demonstration version of the pathviz tool. [Tony McGregor]
Extending the Network Analysis Infrastructure (NAI) in support of new and developing HPC needs
Special Traces ~ I have worked a considerable amount on the new Florida monitors. In Gainesville we now have five contiguous hours during a busy working day as a new special trace file. Anonymization took the best part of a working day, and quite a bit of time has been occupied with working on infrastructure. I have copied the Florida-I data set from Gainesville to SDSC, and processed it. Since then, I have switched the machine to daily 8x90 captures. ftp://pma.nlanr.net/traces/long/uflg/1/ [Jörg Micheel] ~ New (and developing) strategically important measurements and deployments I shipped out the third of the Internet2 AMP monitors. At this time there are a total of three new AMP monitors on sites, awaiting installation and initialization. Those are: amp-i2in (Internet2, Indianapolis), amp-i2ch (Internet2, Chicago), amp-i2su (Internet2, Sunnyvale). Installation progress depends upon the Internet2 crew scheduling. Seven more AMP monitors remain to be prepared and shipped. [Bud Hale] I contacted Brian Court of CENIC to inquire if he plans to continue with his earlier indication to place AMP monitors on CENIC. He said he expects to follow up with that very soon. [Bud Hale] I finished the machine for the University of Cambridge, England, amp-ucam, and it has been installed. I spent some time working with Tony and site people attempting to resolve some network issues there. It appears we resolved the final issue and I initialized the machine. It is now collecting data in the international mesh. [Bud Hale] Word has come that the AMP monitor we are placing in Beijing, China is on site and ready for installation. I sent additional instructions for the installation to the site contact, Wanming Luo. Issues arose having to do with equipment location, however. Following the connection, it appears there are routing issues needing to be resolved at the site.
I expect the initialization to proceed soon. [Bud Hale] The AMP machine is now at the CNIC site in China. I will meet with Wanming Luo - he is the contact person for the AMP at CNIC. [Ronn Ritke] Alongside the meeting in Arlington, Jörg traveled to several passive monitor sites to deploy new PMA monitors at some locations and revive others: Indianapolis Abilene backbone OC192/OC48MON instrumentation. Jörg has arranged to install a monitor at the Department of Statistics at Purdue University in West Lafayette while he is in Indiana. I managed to remove the SDA monitor, collect and order the gear needed for the installation, and get it shipped to West Lafayette with the gear to Indianapolis. [Jim Hale] All the work put into getting monitoring gear distributed around the country paid off this week. The installations went well and have been successful. Jörg managed to install three monitors in Indianapolis (IPLS), one monitor in West Lafayette (Purdue) and work out all the details needed to get the Pittsburgh Supercomputer Center on line. Sent a replacement card to Kathy Benninger at the Pittsburgh Supercomputer Center; we are just waiting for it to be installed. The data from PSC is expected to be very interesting. [Jörg Micheel, Jim Hale] I have integrated the two new data collectors into our PMA sites page and added information per site for trace users as well. http://pma.nlanr.net/Sites/ [Jörg Micheel] A new AMP request was received from Purdue University in Lafayette, Indiana. That monitor was prepared and shipped and has been installed. When it was initiated, there was a slight glitch from a directory ownership issue that Tony helped me find. Now site amp-purdue is online and collecting data. <Long-time collaborator Bill Cleveland is now at Purdue.> [Bud Hale] I have moved amp-kiwi into the full international mesh. We have come to a special arrangement with the University for paying for measurement traffic.
This means we do not pay by the byte any more for that traffic as long as we do not cause the University's link to fill. [Tony McGregor] ~ IPv6 and IPv6 Scamper I am designing and implementing a new file format that is flexible and allows Scamper to record MTU data, source routing data, IPv6 and IPv4 data, as well as list data. [Matthew Luckie] Began an implementation of a file format for recording Scamper traces. Kenjiro Cho (WIDE) supplied a patch to Scamper that fixed up some compile issues on NetBSD and changed the logic slightly for detecting loops. He reported a few issues with the operation of the PMTU code, which spurred me into action on dealing with some of them. The PMTU code seems like it is getting closer to being something with which I am comfortable. [Matthew Luckie] I spent quite a bit of time on the Scamper file format. I talked with Perry Lorier about the Linux equivalent of the BSD routing socket. I also received from Kenjiro Cho the results of a Scamper-pmtu run over a 900-entry address list of dual-stack nodes. I was pleased to see that PMTU works for v4 addresses, having never actually tested the code myself. The Scamper run found one v4 path with a PMTU less than 1500. The MTU was limited by the end-link, which appeared to be a PPPoE+ADSL connection with the optimal MTU of 1454 set. I implemented Scamper PMTU support on Linux by obtaining the outgoing MTU from netlink. Perry Lorier (WAND) pointed me at netlink, and I am very grateful for that. [Matthew Luckie] I ran Scamper in PMTU mode on a large address list from sorcerer and everything seemed to work fine. I found a router that returns ICMP6 Fragmentation Needed messages for 68 byte packets, with a reported next-hop MTU field of 0. On closer inspection, I noticed it generating two ICMP6 Fragmentation Needed messages for one probe packet, both saying the MTU of the next hop is 0. The first has Scamper's UDP header as sent in it, while the second has the source and destination ports zeroed out.
I might get in contact with whoever is listed as the contact for the IP block with this particular system in it. [Matthew Luckie] I followed up with Bill Owens over something we had discussed many months ago. Bill had offered the use of a Linux machine with a Broadcom GigE card directly plugged into a NYSERnet router that can route packets >1500 bytes. The router can route 4470 byte packets, less than the 9180 byte packets the card can do, but still good enough to locate Ethernet links whose MTU constrains the PMTU. For the most part, I have only seen such links at the last few hops to each AMP monitor, which is to be expected. [Matthew Luckie] I would like to find somewhere with a 9180 byte path clean to I2 to do a survey. [Matthew Luckie] Worked on the Scamper file API, nearly there. [Matthew Luckie] I exchanged emails with some AMP IPv6 site administrators about a troublesome router that has been returning ICMP responses that do not make sense. The router is obviously a v6-in-v4 tunnel. Instead of returning a fragmentation required message with the MTU of the path, it returns Destination Unreachable, No Route. I first emailed them about this problem a few weeks ago, and I prompted them again this week. [Matthew Luckie]
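To illustrate why a reported next-hop MTU of 0 is awkward for path MTU discovery, here is a minimal sketch of one possible fallback. It is not Scamper's code. It assumes the prober steps down through the plateau values listed in RFC 1191 whenever the reported MTU is zero or implausible, and it runs against a simulated path rather than a real network. The function names and the (link_mtu, reports_mtu) hop representation are hypothetical, chosen only for this example.

```python
# RFC 1191 (Table 7-1) plateau values, largest first.
PLATEAUS = [65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68]


def next_probe_size(sent_size, reported_mtu):
    """Choose the next probe size after a 'too big' reply to a probe of sent_size."""
    if 0 < reported_mtu < sent_size:
        return reported_mtu          # plausible value: trust the router
    for plateau in PLATEAUS:         # zero or implausible: step down a plateau
        if plateau < sent_size:
            return plateau
    return None                      # already at the smallest plateau


def send_probe(path, size):
    """Simulated probe. path is a list of (link_mtu, reports_mtu) hops.

    Returns None if the probe fits every link, else (hop_index, reported_mtu),
    where reported_mtu is 0 for a hop that does not report its MTU.
    """
    for hop, (link_mtu, reports_mtu) in enumerate(path):
        if size > link_mtu:
            return hop, (link_mtu if reports_mtu else 0)
    return None


def discover_pmtu(path, first_size):
    """Return (size_that_got_through, list_of_constraining_hops)."""
    size, constraining_hops = first_size, []
    while size is not None:
        reply = send_probe(path, size)
        if reply is None:
            return size, constraining_hops
        hop, reported = reply
        if hop not in constraining_hops:
            constraining_hops.append(hop)
        size = next_probe_size(size, reported)
    return None, constraining_hops


# Hypothetical path: 4470 byte router link, 1500 byte Ethernet link,
# then a 1454 byte end-link whose router reports a next-hop MTU of 0.
example_path = [(4470, True), (1500, True), (1454, False)]
print(discover_pmtu(example_path, 4470))
```

With the hypothetical path above, the search settles on the 1006 byte plateau because the last hop never reports its real MTU of 1454. The plateau fallback therefore gives only a lower bound, and a real tool would probably need a finer search between 1006 and 1492 bytes to recover the true value.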
Outreach, application support, utilization improvement, and documentation activities
~ Papers I am an author of two papers accepted to the SIGCOMM Network Troubleshooting Workshop (NetTS). The workshop took 13 papers from 36 submitted. The first paper is "User Level Internet Path Diagnosis with IPMP" co-authored with Tony and the other is "Identifying IPv6 Network Problems in the Dual-Stack World" with Kenjiro Cho as the first author and Brad Huffaker as a co-author. The reviews were well done, certainly the most useful reviews I have seen of any of the conferences I have submitted to in the past. [Matthew Luckie] Used the material I wrote for the NetTS paper on diagnosing paths with IPMP and merged it into my thesis. [Matthew Luckie] The PRAGMA Web site has a link to the International Successes white paper written by Ronn Ritke and Mike Gannis. Article title: NLANR/MNA Builds an International Infrastructure for Network Research. http://www.pragma-grid.net/news_items.htm Worked on a paper about simulation using real code from FreeBSD with one of my PhD students. It has now been submitted to the International Conference on Network Protocols. [Tony McGregor] Meetings with a number of people at the PRAGMA meeting: Wanming Luo was assigned as the contact person at CNIC in Beijing for the NLANR AMP monitor. We met the first day for a few hours. I stepped him through my latest set of slides and answered a number of his questions. He would like AMP to be IPv6 compatible and measure Gloriad. I sent emails to Greg Cole about measuring Gloriad and to Tony about adding IPv6 capability to the AMP machine at that site. We also met again the next day. The CNIC site has a new equipment room, equipment is being moved and the AMP machine will be assigned a new IP address in about a month. [Ronn Ritke] Putchong Uthayopas, the Director of High Performance Computing and Networking Center in Thailand, will purchase one PC to host the AMP software.
If we are interested, he will put in a proposal for funding to purchase 11-12 more PCs to host NLANR AMP code for Universities on the Uninet network in Thailand. [Ronn Ritke] Larry Ang from Singapore will host an NLANR AMP monitor. [Ronn Ritke] Conversations with Bill Chang from NSF about the NLANR AMP in Beijing and the Gloriad proposal. [Ronn Ritke] Dr. Byeon from the KISTI group in Korea, which has deployed 14 Korean AMP monitors, has a new student working in measurement. [Ronn Ritke] ~ Collaborations and activities supporting network research Worked with Peter Arzberger regarding PRAGMA6 meeting planning and NLANR slides. [Ronn Ritke] I had some discussion with Andrew Moore at Cambridge about their new AMP box. There seem to be major network problems getting to it. [Tony McGregor] Communications with Bill Cleveland about a project for instrumentation at Purdue with PMA monitors; also discussed a possible AMP deployment with him and referred him to Tony. [Jörg Micheel] Talked to Bill Cleveland from Purdue who has offered to host a machine (after a suggestion from Jörg). [Tony McGregor] I added Warren Matthews to the throughput test list so he can test the Web services interface to that. [Tony McGregor] Positive response from Kathy Benninger to meet at Pittsburgh to make the OC48MON operational. [Jörg Micheel] Some email exchanges with the Klaus-es at Leipzig <Mochalski and Degner> re the research internship at SDSC during summer and late summer this year. [Jörg Micheel] Eric Harder at NCSC has asked about Colorado State data from the 26/27/28th of January this year, which I have pulled from the HPSS for him. [Jörg Micheel] I had an inquiry from trace data user Hao Jiang, at Georgia Tech, a student/colleague of Constantinos Dovrolis. [Jörg Micheel] There was an inquiry from a Canadian university regarding Auckland-4 data. [Jörg Micheel] I have gone back to Rick Summerhill at I2 to see how we can fit the IPLS installation along with this upcoming trip. 
Communications with John Hicks and I2 folks regarding the Abilene instrumentation. [Jörg Micheel] Discussions with Margaret Murray about possible joint work on measuring high performance systems; possible NLANR participation in TeraGrid measurements. [Ronn Ritke, Tony McGregor, Jörg Micheel] I have been talking to George Clapp from Telcordia about using traces for router and switch performance simulation. I pointed them at Jörg w.r.t. PMA traces. [Tony McGregor] Phone conversations with John Hicks about future TransPac plans. [Ronn Ritke] Conference call with Greg Cole about NLANR and the Gloriad proposal. Worked with Mike Gannis on writing and revising text for the Gloriad proposal and support letters. Letters of support for two proposals were sent out. [Ronn Ritke, Mike Gannis] ~ Documentation, Web work, networked data, publications In the process of preparing the collaborations for the Annual Report, I designed a way to do them for the AR that can also be used for the Web pages (using an otl file in NoteTab). This will automatically create the reference list of collaborations, and with a couple global search and replaces can be a separate page. Have compiled and converted the collabs through December, will add the recent months as well. [Maureen C. Curran] Made a few changes to the Meet The Team page update that Lana did, ran m4 and uploaded it. This is a little different since the MTM is the only m4 system html file on mave, so I run it on moat first and scp to mave. http://mave.nlanr.net/MTM/ [Maureen C. Curran, Lana Kennedy] I created the new NLANR Traces User Community page, using the old announcement with new intro text. Added the Web logs to the PMA navbar (Collection and Use Statistics). [Maureen C. Curran] Gathered together the Citings info to get our users' comments about our work. Looked through new meetings (PAM2004, IMC2003, and SIGMETRICS2003) for numbers and citings. Sent to Maureen for use in the NLANR handouts for the review. 
Had some graphics created that are very similar to the ones on the current Citings page. Maureen gave them a thumbs-up, so I added text to them to reflect the percentages. Will add those to the updated page soon. [Lana Kennedy] Set up a preliminary system to keep track of all pings, and when they are posted. Gathered some preliminary info on a couple of new Pings for the home page. [Maureen C. Curran] Updated Pubs list with new papers; fixed some tags that needed to be changed to reflect the new system. Posted by Maureen: http://mna.nlanr.net/Papers [Lana Kennedy] The first phase of the Weblogs for PMA's FTP activity is complete. I made some more changes to the look of the graphs generated for the Weblogs. Fixed a bug in the code and set up the internal structure of the /Stat directory on PMA; put the Weblogs into the PMA Web tree. Worked with Maureen on completing the language and look of those pages. Currently, there are three graphs for each month (the interval is Oct 2001 to Apr 2004). The graphs are volume of traffic, number of users, and number of files. I plan to produce year-long graphs for 2002 and 2003, and incorporate HTTP stats as well. http://pma.nlanr.net/Stats [Chris Gross] Worked with Chris on the Weblogs and their relevant Web pages. I changed some permissions so he can create the htmlm4 files and add them to the directory. Lined him up with the m4 templates on the PMA server. Sent him a possible change of language for the captions to the download graphs. We went over permissions with Jim, which are all fixed and Chris can run gmake. In the graphs, Chris charted number of users, total volume of traffic and number of files for the months from October 2001 through April 2004 and made three pages, one for each parameter with all the monthly graphs. I wrote the index page and did the layout of the links. [Maureen C. 
Curran] Tony, Chris, and I discussed ways to deal with the fact that Chris only needs to run the Makefile on his Web logs pages but does not have permission on some of the other pages, which stops the Makefile before it reaches his pages. Chris and Tony discussed that there is a way to change the Makefile to continue rather than stopping. They both thought that I would still receive error messages regarding any problems. Chris researched it and changed the PMA Makefile. When I used it a bit later, I found that I did not receive errors, so I have temporarily changed it back. [Maureen C. Curran] Copied the new NLANR/MNA logo into the Images directory on PMA, aliased the old moat image to it (per previous email discussion with Jörg). [Maureen C. Curran] I updated the PMA map for the review presentation this week. I wanted to make sure all the sites were represented. [Jim Hale] The date problem with the m4 makefile/preprocess system showed up on PMA. We had this problem on AMP and Tony corrected it by changing it so that the date process runs from the template itself. Tony sent me a copy of the new AMP template head. I updated the PMA template head and the date part of the PMA Makefile. Works fine now. [Maureen C. Curran] Updated a few more pages with the template and updated some links in the template. I also added the AMP favicon into the template. I have written some PHP code to store user preferences in a PostgreSQL database. The list of preferences, and how to display them, is also stored in the database. The user can retrieve them via an inter-session cookie or with a username/password login. I am pretty pleased with that effort since it is my first use of PHP, cookies and PostgreSQL. I updated the splash pages that Ben did so that they now use the template. [Tony McGregor] I have been working on the AMP data interface this week. I have written a PHP extension with the basic functionality for RTT data.
It is modelled around the way databases are interfaced to PHP and returns a PHP resource that can then be used repeatedly to get the next data item as a PHP 'object'. It starts at an arbitrary time and continues across multiple files if required. I am implementing the time zone translation code, so that users can get the data in whatever time zone they want. That turned out to be more complicated than I expected. I need to translate from one time zone to another taking account of daylight savings changes in both time zones. (We cannot do what some systems do and require the user to select either standard or daylight time because our data is currently in Pacific local time--the new data will be stored UTC). I could not find code that converted from one time zone to another that I was happy with. Since we will be doing this a lot, I am writing code that will parse the time zone data files myself. It is mostly done; I can dump out all the data in the zone file and make sense of it. I fixed a bug with the template which meant that sometimes people could not get permission to run it because of an outstanding date file. I noticed a bug with the current Web pages where sometimes an attempt to fetch a page was made to the opposite system to the one it was generated on and fixed that. [Tony McGregor] Tony did a lot of the file conversions to the new AMP m4 template for the AMP pages. Worked with him on a couple of things, including changing the AMP navbar link for IPv6 from Matthew's voodoo machine to a Waikato server. Discussed the new AMP Web Services page and efforts by Warren Matthews with Tony. [Maureen C. Curran] While working on one of the reports, I spotted a problem with the template used for the AMP splash pages and had an email discussion with Tony about it; he has fixed it. [Maureen C. Curran] Edited language for AMP Web services page. 
[Lana Kennedy] ~ Handouts to be used for the Review, and later for other uses I am very happy and proud of the handout packages we created for the NSF review which were the result of my, Lana's, and Gail's efforts. We have great results from some long running projects, previous work, and lots of new work; they turned out great. Gail and I created seven new handouts, many of which used previous work of Lana's: [Maureen C. Curran]
Gail gave each page a fabulous look. She created a wonderful cover in heavy stock using a still frame of the spread of the Slammer worm that Tony and some Waikato students created from a Cichlid animation of the network. Also included in the handout package were the previous 10GigE/OC192 and International handouts, as well as the most recent issue of the NATimes. [Maureen C. Curran] Had an idea about the handout page on the Special Traces that I sent to Gail along with the relevant pointers. I thought a page using a flowchart/ schematic type design showing the interrelations between the different Special Traces that are grouped for comparative studies would be quite cool. [Maureen C. Curran] In developing the list of Citings/ papers which reference our work last summer, Lana had found some comments in the text of some of the papers. I asked her to retrieve these, as well as do the figures for the recent PAM, IMC, and SIGMETRICS meetings, where she found some more good comments to use in the handouts. [Maureen C. Curran] Created a handout on the new AMP PathVis 'divide and conquer' tool. Tony sent me the text for it; and it, too, looks very good. Gail and I decided to expand it from one side to two. We also turned the AMP graphs into a handout (this one needed to be on an 11x17 page in order to have the graphs line up). Tony and I rewrote some important parts of the AMP PathVis handout which really improved the clarity. It was full throttle with loads of work, but definitely with good results. [Maureen C. Curran] Worked with Tony about the December 19th daily graphs that show quite well all of the graphs working together. Turns out that Tony regenerated them, except that the event plot graph is no longer available. I tracked down my copies of these graphs from the NATimes article of a couple of years ago and Gail recreated the event plot. Tony and I have also been scouting around for another good example (not so easy to find). 
In the queue for sometime later, I will create a static Web page with these graphs to use to demonstrate them working together. [Maureen C. Curran] Gail and I originally planned to have the cover with loose handouts inside. After some feedback regarding loose papers and losing them, it was apparent that it would be a good idea to contain the sheets somehow. I thought some "add a pocket" adhesive-backed inserts might take care of the problem, but unfortunately the first set of covers had been printed and cut too small to accommodate the new pockets. So Gail reprinted the covers and Lana bailed us out by going to Imprints and waiting to have them cut (bigger this time), then put the pockets in. Gail was happy to know about this solution for her future reference when working with other groups. [Maureen C. Curran] At the student meeting when she got to see the final package, Lana was surprised to see so many things that she has worked on over the past year as part of various handouts. She gathered and compiled the citations referencing our work and did the resulting calculations to determine the percentages for the pie charts. Long ago she had spotted a comment in one of the papers about us and we made this a part of the process to look over the text for comments. As a result, we have five really great quotes taken right out of published papers, in addition to some other comments. [Maureen C. Curran] Tony picked up the finished handout packages to take on the plane. [Maureen C. Curran] Created slides for the NLANR MNA review. [Ronn Ritke, Tony McGregor, Jörg Micheel] I gave Tony some material for the NSF review. [Matthew Luckie] For the slides, I also extracted some stats from the AMP Web server about the number of hits, sites, sites over multiple days, etc. that are accessing AMP data. [Tony McGregor] Dry run through the NSF review slides; edits were made, based on suggestions. Completed slides were sent to Kevin Thompson so he can distribute them to the review panel. 
[Ronn Ritke, Tony McGregor, Jörg Micheel, Hans-Werner Braun] We did a dry run of the presentations for the NSF NLANR review. Ronn, Tony, and Jörg did well, in my opinion. I attended the NSF panel review of both remaining NLANR projects (MNA and DAST) via video conference, and had various discussions with people before and after the meeting. [Hans-Werner Braun] We had the NLANR review at NSF in Arlington. Maureen worked extremely hard in advance to prepare some additional materials in paper form, and those were well received. Chris finally sent through some legible HTML pages on PMA FTP user statistics on Wednesday night and I used those straight away for the presentation. Kevin Thompson, our program manager, had taken special care to assemble a team of reviewers who were competent and professional to provide a fair and supportive assessment of our work to date. In particular David Meyer (UoOregon/CISCO) as head of the review team offered very balanced views of the various issues in discussion. It has to be highlighted that a group of people exposed to our research work has a very limited time budget to provide a picture of rich contrasts, and while some of the comments are valid and to the point, some of the suggestions and solutions proposed will require deeper analysis, and will possibly still have to be discarded. Particular heat was generated around NLANR's anonymization policies protecting individuals and organizations from being exposed to third parties. The discussion delivered very few new insights into a problem that is more than a decade old and has no obvious solutions. I took the opportunity to ask Paul Barford from Wisconsin-Madison for a contribution to the traces archived and I was promised some data upon sending him an email reminder, which I did on Friday. [Jörg Micheel]
Ongoing measurement and analysis, networked data, and infrastructure support
~ Servers, system disk, and upgrades - AMP Warren Matthews has determined a need for additional space on the calorie.nlanr.net machine for the AMP Web Services database. Tony moved the databases around on calorie to make some immediate space for Warren. Many concepts for the long-term storage of this data were investigated. Initially, we thought that the addition of a three hundred GB disk would provide the answer. Testing determined that the legacy system would not support the larger IDE drives. The next solution appeared to be a 288 GB disk array from the old surplus PMA server put into service as a RAID 0. Turns out there were some issues with the kernel on calorie we will have to deal with in the future. Matt provided critical assistance in this. He pointed out that the ccd could be installed by modules. After this, the installation was simple. Tony copied Warren's database into the concatenated disk array. [Jim Hale, Bud Hale, Tony McGregor, Matthew Luckie] Much attention was focused on the latest security event at SDSC, which affected NLANR infrastructure operability and the HPSS. I noticed it when I could no longer connect to email. The longer-term effect was that changes occurred in the HPSS path which disabled the AMP data archiving process for a time. This came at a time when the AMP and VOLT data disk fill had reached the limit. However, working with the HPSS people, the issues have been resolved. The archiving process on both AMP and VOLT was successfully restarted, and when it finished, the AMP and VOLT data disk fill was reduced to the mid-seventy percent level. [Bud Hale] Work on the implementation of the HSI HPSS interface for AMP and VOLT archiving. The access issues from AMP and VOLT are resolved. I am looking at the HSI command set for use in the AMP/VOLT script. Coleen Shannon of CAIDA sent a helpful archiving script.
Don Fredrick of the HPSS group is working to create access to the HSI HPSS interface. [Bud Hale] This week we were treated to a two day visit by Tony McGregor while he was here preparing for the NSF review. And, as usual, it was a productive visit. We had some brief opportunities to discuss the upcoming AMP software distribution and implementation. Also conferred with Tony some on the HSI interface requirements to the HPSS for AMP and VOLT data disk archiving. [Bud Hale] I spent some time this week looking at the system manager system on photon. I have experienced some anomalies with it over the last few weeks. It appears to fail to upload files when all the parameters are correct. I will look more at this in the near future. [Bud Hale] ~ Servers, system disk, and upgrades - PMA Continued to pursue the crashing issue of the new PMA server. I spent a good amount of time looking for answers to the problem. Tyan continued to have me pursue part numbers, apparently sourcing the problem to us not following the compatibility list. I found the best research in the FreeBSD AMD 64 mailing list. I saw a lot of issues due to the BIOS delivered with the system. Upgraded to the latest BIOS from Tyan and flashed the BIOS ROM. Jörg ran the data transfer that usually delivers the fatal blow and the machine continued to run. Now the new PMA data collector/server is working. We are using the old PMA server equipment on other applications such as the storage upgrade on calorie. [Jim Hale, Bud Hale] Jim is my hero of the week--after persistently chasing the crashes on the new pma.nlanr.net, he has found that a firmware upgrade may be the key to the solution. It appears as if there was a specific issue with machines running on 4GB of main memory, which holds true for our server. I have kept the box busy all week with copy operations from the old system, and am more than 3/4s of the way through, thanks to the Gigabit Ethernet link we put in place between the two systems. 
With any luck, we may be in a position to swap the main service over soon, if there are no further crashes of the system. [Jörg Micheel] Working on the new pma.nlanr.net server to enable various services. Great news, we have had no more instabilities and crashes. Jim's work on tracking down the BIOS upgrade did do the trick. However, AMD64 remains a challenge as a platform. I have been facing sudden crashes of the ftpd server and managed to track it down as a subtle varargs problem. I have been helped by Tim Robbins at freebsd.org who devised a fix; since then ftp services seem stable with all debug flags (-d -ll) turned on. I ported the dagtools as well, in order to use the high performance system to process the newly collected special trace files, and to fully migrate all services to the new PMA server. I think we are just at the edge, 64bit servers are still a bit of a challenge, if you are willing to make the extra effort you will be rewarded with performance and valuable experience on how to run your code on the new systems, but there is an initial price to be paid for it and you have to be willing to spend the time and effort to start with. The new pma.nlanr.net is ultrafast. [Jörg Micheel] Existing measurement sites maintenance and troubleshooting: A total of 16 remote sites in the NAI infrastructure received attention during this period: 11 have been resolved and the monitors are again collecting data. Five were still being investigated, or pending site action, at the end of the period. (Outages are considered "open" until the monitor is again collecting data.) AMP - 14 problem sites: 10 resolved, 4 open ** does not include Jörg's travel to sites: ~ AMP machines Site amp-csupomona was back online temporarily while CENIC created a new network at CSU Pomona. That was completed and the monitor was moved. System Manager is currently running to initialize the machine on the new network and to upload the new HPC.list file. 
I was able to initiate it and start it collecting data after the resolution of a small issue on the router by Ken Diliberto, the site technician. [Bud Hale] Site amp-ncsa-dca (NCSA Access Center, in Arlington) went down. It was power cycled by site technician Tom Coffin and came back online. The logs did not yield information as to why it hung. [Bud Hale] Another outage reported was amp-cudi (Internet2 in Mexico City). That monitor also was brought back online with a power cycle. No indication as yet as to why it hung and needed the reboot. [Bud Hale] Site amp-ukans (U. of Kansas) was taken off line for fear of viruses. Again, as has happened before, the trace route function was mistaken for port scans by a campus firewall. I have discussed the AMP monitor security with site people and expect to have the situation resolved soon. They reported a severe virus spread on campus. Discussions with site people have not yet resulted in the monitor being restored online. I am continuing to persuade them to put it back online for examination. [Bud Hale] Two foreign sites: amp-taiwan (TANet2, Taiwan) and amp-rnpb (RNPnet, Brazil) went off line. Site amp-rnpb is still down. I am working with site people to resolve the problem. [Bud Hale] Two other sites went down: amp-alaska (U. of Alaska, Fairbanks) and amp-gmu (George Mason U., Fairfax, VA). I am working with technicians at both sites to resolve the problems. Repeated messages and phone calls with the site technician at amp-alaska. That monitor was on a network on DREN and had been blocked due to security concerns. The site technician moved the monitor to a more suitable network and I restarted it on the new network. [Bud Hale] A message was received from Steve Campbell of the amp-dartmouth site (Dartmouth U.), saying the AMP machine had been hacked and was serving files. The machine was powered down. I recall that Dartmouth U. had that same problem about four months back.
That time it proved to be their scanner mistaking traceroute for file serving. After some discussion, the machine was put back online for examination. After a somewhat exhaustive examination nothing was found to indicate the machine was compromised. I am continuing to work with the site people to determine what the scanner revealed that indicated a compromise. Nothing solid has come out and the monitor is online and working correctly. [Bud Hale]

Site amp-odu is out at this time. It was taken down by disabling the switch port, due to some network changes. Communications with the site technician indicate the netmask needs to be changed from .192 to .252. The site technician had the machine back online shortly after the changes were edited into the rc.conf file. [Bud Hale]

Site amp-fsu (Florida State U.) was another site taken down due to security concerns, similar to amp-dartmouth. After some discussion with the site technician it was put back online. I examined it and found nothing indicating a compromise. As with the Dartmouth site, I am asking the site technician to share the information and sources that caused the security concerns. I will report anything I learn from that. [Bud Hale]

A brief outage occurred on site amp-unc (U. of North Carolina, Chapel Hill) to move the power connection. It was halted and powered down and back up without incident. During that exchange they discussed moving the monitor to a more suitable network. That is expected to happen in the next week or two. [Bud Hale]

Other sites of concern are amp-utexas (U. of Texas at Austin) and amp-vanderbilt (Vanderbilt U. in Nashville). Site amp-utexas moved the AMP machine to a new network and required initialization on the new location. The amp-vanderbilt site has an issue with ssh logins. It is collecting data but is failing to upgrade due to failure of the ssh login. Site people reported that port 22 is not blocked but I expect more checking will be needed to get the block removed.
[Bud Hale]

~ PMA machines

We had a brief but productive visit by Jörg Micheel. Besides discussing existing passive monitoring sites, we exchanged some thoughts about ways to cut the cost of implementing passive monitor interfaces. As a result we may explore ideas with organizations such as S2io, Intel and Nortel as well as Endace. [Bud Hale]

I completed the Gigabit monitor for the National Center for Atmospheric Research (nai-p-ncg). The NCG monitor is back for installation. It appeared that the system drive gave up the ghost. [Jim Hale]

The replacement Old Dominion University machine (nai-p-odu) was installed. As we connected to the monitor we noticed it was seeing no traffic. It appeared the optical fiber from the splitters had been inserted into the wrong ports. After we communicated this issue to Sheila Beilsmith, traffic became visible, but as data collection began again, the switch port was closed. She reported that the ODU PMA machine had been taken off line by disabling the switch port. Some network changes required a netmask change in the machine from .192 to .252. Sheila placed the monitor online, at which time Jim made the corrections to the netmask. The site is now online and collecting traces. [Jim Hale, Bud Hale]

Working on preparations for Jörg's arrival. Jörg will be visiting some PMA monitor sites, especially to install a very complete collection configuration at IPLS in Indianapolis, and also to visit sites such as Pittsburgh Supercomputing Center. Gathering all the gear and preparing it for shipping has been a challenge. The two OC192 passive monitors were removed from the SDSC rack and packed with a CDMA timer, a splitter frame and mounting rails for shipping, along with a new 2650 from Dell to collect the remaining OC48 traffic. When the new monitor arrived I installed the OC48 cards and operating system and configured what I could before shipping the machine out with the rest of the gear to Indianapolis.
[Jim Hale]

Jim has been busy working on the IPLS site (Internet2, Indianapolis) OC48 monitor, spending endless hours on the phone with Dell to acquire the Dell 2650 and components. For making this happen I owe many thanks to John Everett, a very senior and very talented buyer in UCSD purchasing. We both spent much time and effort getting the equipment for this machine prepared and shipped. All the equipment shipped to the IPLS site arrived and was installed. [Jim Hale, Bud Hale]

I managed to get the new SDSC passive monitor in and configured. Jörg started it spinning and it is now collecting data again. I always learn a lot from his visits. I got the new AMPATH monitor in time for Jörg's visit. Oddly, we have been running into network interface issues on all the new Dell machines. It has been a while since I last found a solution for issues like these, but I expect to have it figured out soon. [Jim Hale]

A great many preparations for my trip to Indianapolis, West Lafayette, and Pittsburgh. Great support by Jim and Bud. [Jörg Micheel]

Alongside the meeting in Arlington, Jörg traveled to several passive monitor sites to deploy new PMA monitors at some locations and revive others:

Indianapolis Abilene backbone OC192/OC48MON instrumentation

A very busy week indeed. I flew into Indianapolis on Saturday and went into the Qwest POP pretty much straight after arrival late in the evening. I was supported by Caroline Carver and Chris Small from the I2 GlobalNOC team at Indiana. Together, we did the first round of installation work, which involved getting the three Dell 2650s in place, connecting -48VDC, and getting the fiber optic splitters into the links to Kansas City, Chicago (both OC192c PoS), and Atlanta (OC48c PoS). We also did initial work on the CDMA time receiver support. Early in the morning we were joined by John Hicks, who had arrived from Beijing that same evening and brought various supplies from the previous backbone installation two years ago.
We convinced ourselves that the fiber optic light levels would be sufficient for the data capture and left the POP exhausted, but satisfied, at 4:00 am. [Jörg Micheel]

On Sunday, I drove to West Lafayette to meet with Bill Cleveland at the Statistics department, Purdue University. We installed the AMP monitor and ran preliminary tests on the PMA monitor and the CDMA time receiver unit. For the evening we met with Mary Ellen Bock, the head of the department, a group of PhD students, and Sonia Fahmy. Sonia is an assistant professor at Purdue in Computer Science and she wants to focus her research on network anomaly detection and denial-of-service attacks. Sonia is finishing her fifth year in a teaching position and is looking forward to spending her full time on research. [Jörg Micheel]

Early on Monday morning we got hold of Scott Ballew, who is the local guru-in-charge of the central network infrastructure at Purdue. We found him very supportive and available on short notice to walk with us into the Telecommunications building to survey the site for installation of the PMA monitor. We did all the configuration in place, but did not connect the system. Scott promised to carry out all the final work by Wednesday, when a fiber cut was already scheduled. This system is planned to go into place at the GigE link which sees all of the Internet2 traffic towards the Indy GigaPOP, as well as the legacy traffic towards Switch-and-Data, which connects in a complicated way via Indy to Chicago on an OC3c link. [Jörg Micheel]

Having done all that could be done in the time available, I rushed back to Indianapolis for the meeting with the I2 folks. There we met with Rick Summerhill (Assoc. Director Backbone Infrastructure for Abilene), Matt Zekauskas, Guy Almes, John Hicks (TransPAC), and Chris from I2 in Washington DC. Everyone was keen to hear the latest about our progress during the night of Saturday to Sunday.
I was also grilled on numerous questions regarding our anonymization policies, trace data ownership, various collaboration ideas and plans with the gear while at IPLS. We agreed on the following agenda: [Jörg Micheel]
As a side note, the link to Atlanta is going to be upgraded at some stage and we may require a third OC192MON. Another request came in regarding support of Abilene's IS-IS measurements, and I have since confirmed with Hans-Werner that we would be in a position to support such network-internal work, as it does not violate any privacy concerns of ours. The same group met on Tuesday in Bloomington for further discussions, and from some of the feedback I gather that NLANR might be in a position to obtain further data sets. Some of the links are not likely to go 10 Gigabit, and we would be in a position to use existing spare lower-speed equipment to plant monitors at some of the new international links now being planned. Good news for the budget and future measurements. [Jörg Micheel]

Tuesday morning we collected various pieces of equipment, and with Caroline Carver finished nearly all of the remaining work at the Qwest POP in a three-hour shift. Various cabling work needed completion and rework. One of the OC192MONs did not successfully connect to the Ethernet switch; the I2 folks are looking into it. The other two monitors are ready to go, and an initial trace run was successful right before I left for the airport to fly over to Pittsburgh. [Jörg Micheel]

Early on Wednesday morning (6:00 am) I met with Kathy Benninger at PSC to work on the Pittsburgh GigaPOP OC48c monitor. We carried out quite a few smallish tasks and this visit proved crucial: we doubt we would ever have got this system to work without teamwork. It turns out this link is operating at 1550nm wavelength, which is very unusual, and we had to convince ourselves that both splitters and DAG4.2 cards would support it. After having swapped the splitter for a NetOptics 96142-20 we got lucky and the signals appeared to be at the right level. One of the cards would not come up reliably, and it turned out that a bit error was systemic and the card needed to be replaced. Jim sent one straight away from SDSC.
Next, I worked on running the CDMA 1PPS signal into both of the cards, which took some soldering and running up and down stairs at PSC. Finally, we had to disable SMTP and upgrade sshd to OpenSSH 3.4.1p1 to keep PSC net administrators happy. You would think this is simple from inside PSC, but if a machine does not meet the required security criteria, they do not even allow the outbound connections one would need to fetch and upgrade the required packages. With a lot of copying via other machines we managed to fix that just before I had to rush off to the airport to catch up with Tony on the flight to Washington Reagan. [Jörg Micheel]

On Friday morning I flew together with Ronn back to San Diego, where we were met by Jim at the airport (thanks Jim!). We used the six hours to work through various parts of the infrastructure. We convinced ourselves that the PSC monitor with the new OC48c card is now working. We reinstalled the SDA GigE monitor together. Jim informed me of his progress in getting the NCG box back into the field. Together we worked on the new Dell-based AMPATH system. I took an hour to have a longer phone call with Hans-Werner to discuss various project-related matters. [Jörg Micheel]

Overall, we have successfully installed, or nearly installed, six PMA monitors this week, all of them with CDMA time support, plus two new AMP monitors. Two more (NCAR and AMPATH) are in progress. This appears to be a new record for the team. [Jörg Micheel]

~ Management And Administrative

Weekly NLANR/MNA managers conference calls. [Hans-Werner Braun, Ronn Ritke, Tony McGregor, Jörg Micheel]

Reports. [Maureen Curran, Lana Kennedy, Mike Gannis (January)]

Kept the two AMP FTE final candidates apprised of where we are in the process of conducting the references (barely begun). Talked to Tony a couple of times about various aspects of the process, including an extended discussion regarding the references. [Maureen C. Curran]

- 30 -