
Summary of Research Activities - December 2003


Development and distribution of measurement and analysis tools

~ Continuing development of new metrics and real-time analysis for PMA

Did more development on my real-time tool. Worked up the flows on the sensor; it is coming along quite well. When I received comments back from the PAM2004 reviewers, I did additional work on the sensor to address them. (The submission was not accepted for the workshop, but I will try elsewhere as opportunities arise.) When SDA is back online, I plan to move some development back to it from the MAX machine. [Chris Gross]

Further email discussion with Pere Barlet at UPC Barcelona. Pere will be down here in Hamilton on January 6th to work with me for the next two months on various real-time analysis work. We laid out a plan for the period he is going to be here, as below. [Jörg Micheel]

  • Port SMARTxAC to other link layer technologies, OC3c/OC12c/OC48c ATM/PoS
  • Run performance tests at various link speeds, with different test patterns, assess the real-time performance
  • Develop a version which will display anonymized data, general use
  • Make SMARTxAC run on NLANR PMA monitors for display on the Web, starting, say, with the MAX OC48MON.

~ Progress on the reimplementation of AMP and the development of a new testing architecture

Completed as much of the IPMP code as I can without an implementation of the information exchange protocol. Matthew is going to give me that. I then implemented arbitrary file transfer from the AMPlets to the server. This is needed for some of the more complex tests which I am expecting to include in the second release. Xing is working on pathchar for AMP and needed that. He also needs a way to start the pathchar server on demand. [Tony McGregor]

Improved the mechanism that is used to terminate the transfer server and to check whether it is possible to restart the server (i.e., if the socket is free). This is done as part of the test suite and the new code speeds that up a lot. I am still having a few problems with that. [Tony McGregor]
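The "is the socket free" check described above can be sketched as a trial bind. This is a minimal illustration in Python, not the AMP code itself; the function name `port_is_free` and the loopback default are assumptions made for the example.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP server could bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        # SO_REUSEADDR keeps connections lingering in TIME_WAIT from a
        # previous server run from making the port look busy.
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
        except OSError:
            # Another socket still holds the address (for example the old
            # server has not finished shutting down yet).
            return False
        return True
```

A test suite could poll this after asking the old server to terminate and restart the server as soon as it returns True, instead of sleeping for a fixed interval. The check is only advisory, since another process could take the port between the probe and the real bind.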

Currently working on the command interface that allows tests that require a server to be run. Progress was slow because I ran into a bug while passing a structure to a thread. It turned out to be a race with the termination of the calling stack frame that I had not thought about. [Tony McGregor]

~ New Path Display tool - divide and conquer graphic (pathviz)

Finished writing and debugging the classes which handle the timestamp and data from the traceroute files. I modified the way files are read in so that now the process is much cleaner and can be used for all instances of file input I need, except the machine list. The algorithm to read in the list only requires a few lines of code, so I separated that out to make things clearer, and then spent quite a bit of time debugging it. I was having difficulty with pointers and passing char arrays to my functions, but after looking at some reference material, and speaking with Ben about C++ syntax conventions and memory allocation, I was able to figure out my problem and get that part of the code working properly. [Lana Kennedy]

Spoke with Tony to discuss data analysis. Wrote some preliminary code: drawing some pictures helped me visualize what I am doing. I put in code which does (right now) a basic comparison of the paths. I know I am going to need to add some more functionality to it, but for now I am just trying to get it to run and give me some sort of textual output. Continuing to work on debugging. Also installed the GD library and gave a bit of thought to the drawing output. [Lana Kennedy]

~ IPMP

I participated in the discussion of IPMP on IMRG a little. I have a few points I would like to make in the future. [Matthew Luckie]

~ Progress on the reimplementation of the Cichlid 3-D Visualization System Activities

Worked on integrating the project with automake/autoconf. The tools are very complicated and integrating them with Qt is proving difficult. I spent some time experimenting with the ACX_PTHREAD autoconf macro. Worked on the Datastream class, patching at least one of the leaks that it was suffering. [Ben Reesman]

Wrote a few more tests for the network threading elements. I am also trying to learn CVS, to make the sources available for others to evaluate. [Ben Reesman]

Worked on unit tests. I am changing the network threads over to use the boost.org threading library because it is seamless between pthreads/win32, and I never got the custom wrappers I wrote to work as well as they do. [Ben Reesman]

Activities extending the Network Analysis Infrastructure (NAI) in support of new and developing HPC needs

~ Special Traces

Took another long trace file at the University of Auckland, which is now instrumented with a DAG3.5E. Data collected is between 100MB/hour (night) and 500MB/hour (day), both directions. Since we have some 70GB of space, about 7-10 days contiguous should be possible. I am working on getting the data off the monitor once we are finished capturing; I do not think copying over the Internet is a good option. The Auckland-8 data set seems like a success: it ran for 14 days contiguously. I have started postprocessing the trace and managed to get the file size down to 55%, which suggests the entire data set will be somewhere around 40GB compressed. It is really showing that the PC itself is five years old: it takes about 3 times longer to anonymize and recompress the data than to scp transfer it from Auckland to San Diego. Continued the postprocessing and the copy process to pma.nlanr.net, as well as archival to the HPSS. I am done with about 14GByte, or one third, of the overall data set. ftp://pma.nlanr.net/traces/long/auck/8/ [Jörg Micheel]
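As a rough check of the 7-10 day capacity estimate above, the arithmetic can be sketched as follows. The split of each day into busy and quiet hours is an assumption made for illustration (the report gives only the two rates), as is the use of decimal units (1 GB = 1000 MB).

```python
def capture_days(disk_gb, night_mb_per_hour, day_mb_per_hour, day_hours):
    """Estimate how many days of contiguous capture fit on the disk."""
    night_hours = 24 - day_hours
    mb_per_day = day_hours * day_mb_per_hour + night_hours * night_mb_per_hour
    return disk_gb * 1000 / mb_per_day


# Assumed busy periods of 12 and 18 hours per day at the daytime rate.
for day_hours in (12, 18):
    days = capture_days(70, 100, 500, day_hours)
    print(f"{day_hours} busy hours/day: about {days:.1f} days")
```

Under these assumptions the sketch prints about 9.7 days for 12 busy hours and about 7.3 days for 18, which brackets the quoted range.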

Have looked at a number of systems which became active lately, including FRG (Front Range GigaPOP) and NCG (new NCAR Gigabit tap). Both collecting data now. NCG is particularly busy, so I took the opportunity to collect a long data set at the site, for one hour, 7GBytes, uncompressed. NCG is connected to a SPAN port (I had a look at the data) and hence the timestamps might be distorted and some data could get lost on the switch before reaching the monitor. I would prefer we had a direct link tap (I believe we sent splitters with the box) but am reluctant to trade a working monitor against the uncertainty of making the link tap work. ftp://pma.nlanr.net/traces/long/ncar/1. [Jörg Micheel]

I am in the process of retrieving the Leipzig-II data set, which was collected by Klaus Mochalski some time ago. This data set is of interest as it is a two-point measurement at a university access link, thus comparable to Auckland-6, but with very different physical link settings and traffic characteristics. Data at ftp://pma.nlanr.net/traces/long/leip/2/. [Jörg Micheel]

Initiated a long trace at MAX (MAX GigaPOP) to have a fresh long OC48c trace, but the machine hung with a SCSI disk error after about 40 minutes of operation. This is unfixable from my end, so I have asked Jim to get in touch with Dan Magorian to see what they can do about it. [Jörg Micheel]

~ New (and developing) strategically important measurements and deployments

Shipped an AMP monitor in response to the request recently received from TANet2 in Taiwan. [Bud Hale]

Jim and I discussed the planned Internet2 AMP monitor addressing with Kevin Walsh (SDSC ENS/OPS). The issue is who is to create the IP address space. Kevin asked that the issue be settled the following week, when most of the people involved were scheduled to be at SDSC. Conversations with all involved (Kevin, Matt Zekauskas, Bud, Jim, Ronn, etc.) were held. Followed up with Matt Z.; he reported that he has it resolved and will be assigning the addresses very soon. [Bud Hale, et al.]

Four new PMA deployments are in various states of construction, including the OC192mon for Internet2:

OC192 ~

Contacted Internet2's Rick Summerhill to introduce myself and give him a status report. Heard from Stephen Donnelly at Endace that he wants to do some testing on the machine. The OS on the OC192b machine was corrupted, so the system needed reinstallation and reconfiguring. Arranged for the Adtech with Kevin Walsh. I installed two OC192 cards in the OC192a machine. The machine is up and connected to the Adtech for continued work on the issue that gave us problems with two-card systems. Early next period Stephen Donnelly from Endace will connect to the machine and install the new image and adjustments to get the two measurement cards working in the same machine. [Jim Hale]

The OC192MON is being prepared for Indianapolis. Endace has finally caught the Last Bug(tm) which now enables us to put a pair of DAG6.1 cards into the same Dell 2650. I am hearing the second pair of cards is also being returned from rework back to San Diego. [Jörg Micheel]

nai-p-i2a, Internet2 GigaPop in Ann Arbor, MI ~

This OC48 machine is on site in Ann Arbor, and Matt Zekauskas and Dan Pratt are making the connections. Matt reported that the machine had suffered considerable damage during shipping by Federal Express. However, Matt indicated he has fixed or can fix the damage to make the machine function properly (and it was not as bad as he originally thought). Jim was able to work with Matt to get it connected to the Ethernet and determine the machine is operating within expected parameters. [Bud Hale, Jim Hale]

nai-p-psc, Pittsburgh Supercomputer Center, Pittsburgh, PA ~

The OC48 machine and splitter were received after an unexplained delay in Memphis. Kathy Benninger notified me she received the monitor. It is in good condition and should be installed soon. Received the CDMA timing unit and will be shipping it out shortly. [Jim Hale]

The PSC OC48MON is coming along nicely; Jim is on the ball. Had some email exchange with Chris Rapier, who shares an office with Kathy Benninger and is working on a future measurement proposal. I wonder how much opportunity for collaboration with NLANR there is. [Jörg Micheel]

nai-p-sda, San Diego Supercomputer Center, SDSC ~

Researched and purchased new server equipment and components for the machine. This Gigabit Ethernet monitor uses the Dag4.3GE cards from Endace and requires a system board with PCI-X slots. After assembly of the machine and installation of the four 80 GB SATA hard drives, it turned out that the backplane controlling the drives is not functioning. Am arranging with SuperMicro to ship out another. [Jim Hale]

~ IPv6 and IPv6 Scamper

Made excellent progress on Scamper's path MTU discovery function (PMTU support). It seems to be working pretty well, although some work is still remaining. Bill Owens has offered to help me out with a 9k MTU clean path by installing an OC3 ATM card into a Linux machine and plugging that directly into a router. [Matthew Luckie]

amp-mtu's IPv6 feed was changed from a 6bone tunnel to native on Abilene. I changed their IPv6 address and sent a new HPCv6.list file out to the IPv6 AMP monitors. [Matthew Luckie]

before:   http://amp.nlanr.net/active/cgi-bin/v6_linkcomparison.cgi?from=amp-nysernet&to=amp-mtu&date=103.11.4

after:   http://amp.nlanr.net/active/cgi-bin/v6_linkcomparison.cgi?from=amp-nysernet&to=amp-mtu&date=103.12.19

I would like to try to get a few runs of the new release of Scamper done so that I can spend some time analyzing the data and creating a list of better IPv6 routes between the same pairs of IPv6 addresses. I would like to make the "better routes" data public, and then measure a month later whether things have "improved". I think this could potentially be an interesting paper. [Matthew Luckie]

Outreach, application support, utilization improvement, and documentation activities

Received notification that one of my PAM2004 abstracts, "Flow Clustering Using Machine Learning" was accepted. [Tony McGregor]

Gina Intrilligator (SDSC External Relations Dept.) has posted Mike and Ronn's article on our international efforts on the SDSC Web site. She created a great image and tag line. It was on the main SDSC page for a week or so; it is now available in the news section at:   http://www.sdsc.edu/Press/features/120203_NLANR.html

~ Presentations and Conference/Meeting Participation

Attended and presented at the ISMA Bandwidth Estimation Workshop. There were a number of good papers and discussions held. The presentation immediately before mine, "Cross-traffic: noise or data?" by Dina Katabi of MIT (who incidentally used PMA data in her work) was the highlight of the conference for me. My talk had a few tough questions at the end. I tried to present a theme that even if we could isolate the delay contributed by individual segments, increasing the signal to noise at later hops, then capacity estimation of those links is still very hard for reasons that we know already: the packet tailgating pair has to follow through the network as a pair, and this is hard to do on links where the egress is an order of magnitude larger than the difference between the size of the packets. [Matthew Luckie]

Matthew and Tony attended the ISMA Bandwidth Estimation Workshop (by invitation only). Bud, Jim, Ronn, Maureen and Mike sat in on Matthew's presentation.  

Prepared for and attended the Performance Measurement Architecture Workshop 2003 at SDSC. [Ronn Ritke, Tony McGregor]

~ Collaborations and activities supporting network research

Surasak Sanguanpong from Korea has been talking to Ronn and me about setting up an AMP mesh using their own hardware but our software. I would like to have the new AMPlet code out in time for him to do his deployment over the next couple of months. Bruce Morgan from AARNet, Australia, is attempting a deployment like this also. [Tony McGregor]

Met with Vijay Samalam, SDSC's new Program Director for Networking. He asked to see how NLANR/MNA might be able to help with performance monitoring tools for Grid computing and for me to sit in on a few meetings with Grid people. Hopefully with the existing infrastructure, we can provide some needed data. [Ronn Ritke]

Exchanged a few emails with George Michaelson (APNIC) about his desire to support the Scamper project through running it from various locations in the APNIC region. He is going to be running Scamper from Brisbane (AU), Japan, and Hong Kong. [Matthew Luckie]

Spoke with Matt Zekauskas regarding the Ann Arbor, Michigan PMA deployment at the I2 GigaPoP; he has the splitter and will test the PMA machine. [Ronn Ritke]

Worked with Sevcan Bilir, a new PhD student with The University of Texas At Dallas. We discussed him using our traceroute data in his research. I am not sure yet exactly what he is after or what he is doing. [Tony McGregor]

Spoke with Greg Cole. NSF just released a press release on the Gloriad Project and Greg would like to include some text on the NLANR/MNA collaboration. I sent him some draft text and he will add to it and post it online.   http://www.gloriad.org/  [Ronn Ritke]

Phone call to Tony regarding a number of things, including collaboration with a group in Thailand. [Ronn Ritke]

Spoke with Brian Tierney about the GGF measurement standards. He was happy to hear this standard will be used for AMP data. [Ronn Ritke]

Tue NLANR call about possible future collaborations with the DAST group. [Ronn Ritke]

Worked on PAM2004 abstract reviews. [Ronn Ritke, Jörg Micheel]

~ Documentation, networked data, publications

Developed the list of activities which will be highlighted on the new NLANR poster. It includes AMP software repackage goals, OC192 plans, OC48 traces, longer traces, real-time analysis, IPMP, Observatory project with international collaborations map at bottom. The idea is to provide a couple of sentences and/or a URL for each topic. Maureen began writing some of the needed text. [Ronn Ritke, Maureen Curran]

Edited the NLANR International Successes article in response to new developments and conversations with Vijay Samalam. We will be turning the NLANR/MNA International Successes article into a white paper. We met several times and created a first draft. We will continue to organize the text and add to sections like the collaboration text where needed. With the Gloriad project going public, we can add text on Gloriad to the white paper. Also worked on modifying text regarding NLANR/MNA's activities at SC2003. [Mike Gannis, Ronn Ritke]

Developed and created the final design style parameters for the MNA, PMA, and AMP Web pages. I reviewed some of the simpler elements of cascading style sheets (i.e., CSS1) and incorporated them into the page templates as embedded and inline styles in the HTML. We do not want to use the advanced style sheet features or external style sheets because a large number of our core audience do not use browsers capable of rendering them correctly. [Maureen Curran]

Learned about m4 macros and makefiles, structural templates, etc., checking the GNU pages and some tutorials and the templates that Tony created for me. Then I expanded on these originals and began to develop full templates that can be adapted for AMP, PMA, and MNA pages. Sent Klaus some info on the m4 stuff. Set up the htmlm4 template with comments, made format changes in the Web templates, test drove them - it works great! This is going to save me loads of time in creating, maintaining, and updating Web pages. Met with Tony when he was here and we went over his m4 makefile examples, showed him the one I developed for AMP. He answered several questions I had and helped me streamline the comments that I had written for the defines. [Maureen Curran]

Tony and I discussed the Splash page template and integrating it into the AMP templates (which cover both types/sizes of AMP pages). Began working with Ben on integrating the m4 makefile system with the splash page format. This will not be trivial, as there are significant differences between the underlying structure of the splash pages and that of the regular AMP pages (including the data ones). [Maureen Curran]

I had a long chat with Ben about the mechanism we should use for preserving state for the new AMP Web pages. We want something that works well over a range of different user types, in particular users who do not use cookies. [Tony McGregor]

Worked on the new AMP Web interface. I am writing a small library of methods to allow all AMP pages to use the same templates and to enable consistent session management. I had a long conversation with Tony about the project and we discussed many details. I am working on a way to force PHP to manage sessions in the URL rather than via cookies; I have not yet figured this out completely. Began experimenting with PHP's facilities with XML for the purpose of storing user data. I also worked on the AMP site code. [Ben Reesman]
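The general idea of carrying the session in the URL rather than in a cookie can be sketched as below. This is an illustration in Python rather than PHP, and it is not the AMP site code; the parameter name `sid`, the helper name `add_session_id`, and the example.org URL are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_session_id(url, session_id, param="sid"):
    """Return url with the session id carried as a query parameter."""
    parts = urlsplit(url)
    # Drop any stale session parameter so that links never carry two ids.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != param]
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Every link written into a page is passed through the helper.
print(add_session_id("http://example.org/amp/page.php?site=amp-mtu", "abc123"))
```

A shared template library could apply such a helper to every internal link it emits, so that users without cookies keep their state from page to page. The trade-off is that session ids become visible in bookmarks and server logs.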

Met with Maureen; she showed me the m4 html stuff she is working on, which is really interesting. Sat in on a meeting with her and Ben regarding his script and how it fits in with the templates she has. I will be using the m4 system for the Citings and other pages. [Lana Kennedy]

Did many Web development related activities, working towards revamping the PMA Web pages. Created drafts of five of the new PMA pages and posted them in the drafts folder. Jim set me up so I now have the access I need on the PMA Web pages directories. I fixed the bad Software links on the PMA home page. [Maureen Curran]

Researched and trial and errored different ways to redirect Web pages. Learned loads about redirects and aliases. Set up the 100 pixel navbar feature images in a new directory and created aliases for the current ones, including a new one I created for the NATimes latest issue. Thanks to help from several folks, decided on the best way to handle the rotating "more info" links in the navbar. For these, I will be sending Hans-Werner new redirects to paste into the config file. I plan to change them every two weeks, or so, once the system is fully in place. [Maureen Curran]

As I have been posting new pages, I have been using the new MNA page template to create pages. I received feedback that my intention/goal with the design of the navbar, using the rotating featured images, is working. Todd sent me an email saying that he liked the Citings page. When I asked him about it, it turns out that he was on the Meet the Team page, saw the pie chart image in the navbar and clicked to find out more info.  [Maureen Curran]

Met with Kevin Thompson from NSF about the new PMA and AMP Web pages.  [Ronn Ritke]

Discussed Chris's plans for the working prototype of his real-time tool and the Web logs project. [Maureen Curran, Chris Gross]

Ongoing measurement and analysis, networked data, and infrastructure support

~ Servers, system disk, and upgrades

AMP staff discussed the implementation of the serial console on AMPlets. Worked with Matt to research the implementation of the serial console, and it turned out to be quite straightforward in FreeBSD. The kernel we ship can already do it; we just have to make small adjustments to two files (/boot.config and /etc/ttys). This may be something to deploy on all AMP sites, since it allows the use of a laptop instead of a keyboard and monitor to examine misbehaving AMP monitors. This capability had been requested by site amp-surf (SURFnet, in Amsterdam) and has now been implemented on the SURFnet monitor. [Bud Hale, Jim Hale, Matthew Luckie]

The SURFnet people are interested in NLANR implementing a remote or automatic reboot on the machine because it is difficult for them to get it done manually. Therefore, we have also been researching up-to-date methods of remote rebooting. We have found a product from KVMSwitches Inc., that appears to be the solution. We will do some more research and may recommend the use of the product on some sites. We are testing an IP addressable power controller. With an additional assigned IP address we can remotely reboot a machine. [Bud Hale, Jim Hale]

We continued to test a new AMP monitor system board this week. It has been performing well, with no failures or anomalies. Researched alternate boards with a built-in GigE interface. As Tony suggests, this will become more important each passing day. [Bud Hale, Jim Hale]

Archived the VOLT server, but slow progress resulted in disk usage only dropping to the mid-eighty percent range by the time of HPSS maintenance. I decided not to restart archiving on VOLT until it again reached the full level. I discussed the apparent data rate of the VOLT archiving with the SDSC HPSS people. Their response is the same as in the past: they continue to recommend we explore switching to a different interface system such as HSI. The AMP server was archived as well, and it did not perform very well either, only being reduced to the low-eighty percent level. The increase in data collection means archiving is now needed every two to three weeks. Both Jim and I are pushing to get the new AMP/VOLT servers delivered and online. [Bud Hale]

Spent time on problems that arose on the purchase of the new AMP and VOLT servers from RackSaver. As noted in earlier reports, we had initiated the purchase of the machines to a quote from RackSaver. After receiving this a short time back, we analyzed it and discussed it with Tony McGregor and Ronn Ritke. We spoke about it with Tony during his visit the first week of November and he approved the purchase. The purchase order was initiated and RackSaver was informed. However, the sales representative at RackSaver, Uros Vukovich, stated RackSaver could not accept the purchase order to that quote and re-quoted the machine almost three thousand dollars higher. After much re-examination and discussion, we discussed it with Cody Lutsch, who committed to accept the first quote and deliver the equipment to that price. We have had a very good relationship with RackSaver (CPP) over several years and hope it continues, but we remember it is a competitive business and we can go elsewhere when price and service dictate. [Bud Hale, Jim Hale]

For most of the long traces work, which involves UNIX terminal sessions running for days in a row, I have started using the screen(1) utility, so that I can keep sessions running while I move my laptop back and forth between home and work. It turns out to be very useful, especially if you have an unexpected network hang: your foreground screen might disconnect from your ssh session on the server, while the actual session you care about keeps running uninterrupted. I had trouble with those situations in the past, when my ADSL modem would pick up a different IP address every time the DSLAM got rebooted, thus ruining a good chunk of leg work, sometimes 2 or 3 times a day. [Jörg Micheel]

Existing measurement sites maintenance and troubleshooting:

A total of 23 remote sites in the NAI infrastructure received attention during this period: Twelve have been resolved and the monitors are again collecting data. Eleven were still being investigated, or pending site action, at the end of the period. (Outages are considered "open" until the monitor is again collecting data.)

AMP -  15 problem sites:  9 resolved, 6 open
PMA -   8 problem sites:  3 resolved, 5 open

~ AMP machines

I am happy to report that AMP site outage is back to levels enjoyed a few months back, before the rash of worms (blaster and nachi). [Bud Hale]

The failed machine at amp-AMPATH-mia (AMPATH GigaPop at Miami) was replaced and the replacement machine was started. [Bud Hale]

Site amp-surf (SURFnet in Amsterdam, Holland), has had frequent network disconnect problems, requiring reboot to bring it back online. By connecting keyboard and monitor, site people verified the machine was running and had merely lost the path to the default router in the routing table. Since the monitor is in a very remote location they suggested that we provide serial console capability, and were kind enough to help me implement it. The serial console capability allows a better diagnosis to determine if it is a network connection anomaly or a machine problem. After some false starts we have the access working well now. Besides having access to the arp and routing tables when it fails, we can quickly bring the machine back online. My experience on this late in the period was that the site loses connection about every two days. I worked with the site people to implement the "out-of-band" access to the AMP machine; the login account was set up such that I can now gain console access to the monitor when it has lost IP connection. The site appears to lose IP connectivity with the OS continuing to operate. We traced that to the apparent loss of the routing table. Of course, rebooting the machine restored the routing table and the connection. Through this console access we will be able to examine the machine when this anomaly re-occurs to search for the cause. This anomaly has been suspected in other monitors in the past. It was suspected to be a network anomaly but without access to the AMP machine it was difficult to diagnose.  [Bud Hale]

One troublesome site outage is amp-mit (Mass. Inst. of Tech). As reported quite some time back the machine was disconnected and set aside while the entire facility was rearranged. However, Jeff Schiller, the site leader, has promised to get it reinstalled soon. This has happened before and we will persist until it happens. Jeff has assured us they are working on it and now have it re-mounted in a rack. Next, they will boot it single user and edit the rc.conf file for the new subnet. [Bud Hale]

Site amp-utah went down this period. A power outage occurred on the Univ. of Utah campus, and shortly after the power came back on, the power supply in the AMP monitor failed. We shipped a power supply to the site technician, Joe Breen, and he had the machine back up very shortly. [Bud Hale]

Site amp-uwyo (U. of Wyoming) reported they needed to move the AMP monitor to a new subnet. I edited the new IP and GW into the rc.conf file and halted the machine. The move was completed, but when the monitor was reconnected the new router was blocking ICMP echo requests. I was able to get a router hole opened to the AMP monitor for ICMP traffic to and from the machine. I will start the system manager to upload the HPC.list file with the new IP for amp-uwyo. [Bud Hale]

Site amp-ucf (U. of Central Florida) went off line this period. Investigation revealed ICMP traffic had been blocked at the router. It is understandable that this would stop ping responses, but it also halted ssh logins. ssh login was restored with a power cycle of the monitor. However, ICMP traffic is still blocked. The site technician is working on a router hole for ICMP traffic, but there is no answer as to whether or not this affected the ssh login. Perhaps this is another routing table anomaly. The site continues to be down with unresolved network issues. [Bud Hale]

Site amp-unin (UNINet in Thailand) is showing an outage. A message has been sent to the site technician. [Bud Hale]

Minor and transient outages occurred at sites amp-okst (Okla. State), amp-ucsb (UC, Santa Barbara), amp-rnpb (RNPnet, Brazil), amp-penn (Penn. State U.), and amp-startap (STARtap at U. of Ill., Chicago). These required little to no corrective action and all are back online. [Bud Hale]

Sites amp-colostate (Colorado State) and amp-fiu (Florida International U) experienced outages. Both were brief and quickly corrected. The outage at amp-colostate was caused by an inadvertent power disconnect. The outage at amp-fiu was caused by a failed switch port. [Bud Hale]

Outages at sites amp-memphis (U. of Memphis) and amp-uc (U. of Cincinnati). Site amp-memphis reports a power outage, and that it will not be corrected until later. No report from amp-uc as of this time. [Bud Hale]

~ PMA machines

This was a productive time with PMA sites, especially when, working with site technicians, three PMA sites were corrected. As is generally known, PMA machines are often located in unattended sites. That makes it difficult to get assistance with the machine to make corrections. Site support people must travel from their normal locations to the unattended site. In one case, our support contact Scot Coburn traveled approximately 100 miles, from Boulder to Denver, CO. We had support at three of the needed sites. Those were: nai-p-ncg (National Center for Atmospheric Research GigE monitor), nai-p-frg (Front Range GigaPop in Denver) and nai-p-amp (AMPATH GigaPop in Miami). Logged in to PMA monitors and tested the interfaces; communicated with site people by email and phone. We were able to correct the problems in all three monitors. [Bud Hale, Jim Hale]

When Scot Coburn (National Center for Atmospheric Research) made the trip to the Front Range GigaPop to install the replacement PMA machine (nai-p-frg) for that site, he informed us that the container he received had arrived in bad shape. After examining the contents, he noticed that despite the secure packing of the equipment, the front cover of the new machine was broken off, indicating very rough treatment during transit. He was able to repair most of the damage and install it. However, we were unable to make a connection, not even able to ping the monitor. After examination it was clear the machine was not booting completely. It appeared that a connection - though glued before shipment - had dislodged in the data disk array and was preventing the machine from booting entirely. Scot worked quite a while with us, and we finally had the machine online and one Dag3.2 card was working. However, the other card could not be made to work. Scot will be returning to the site soon and he will use a Dag3.2 card from the old machine to replace the non-working card in the new one. The machine is online and functioning, though it is only using the one working card. [Jim Hale, Bud Hale]

We had made arrangements for Donnie Sakosky to help us get the National Center for Atmospheric Research Gigabit Ethernet Monitor (nai-p-ncg) back online, and planned to have him copy a file (which I had emailed to him) to disk and transfer it to the monitor to restore ssh. After a little research and experimentation, I figured out that with the addition of a single command line we could bring the machine back up. With a little instruction, Donnie was able to bring the unit up in single user mode, add the command, and give us access to the machine to complete the corrections to the corrupted files and get this machine back online. [Jim Hale]

The monitor at AMPATH (nai-p-amp) suffered a failure on a measurement card. We shipped another set of cards, and for a while it looked as though we might have to ship the monitor back for repairs. At the last minute, however, the technician was able to get two cards operating and we were able to get the monitor reading again. [Jim Hale]

Have looked at a number of systems which came online lately, including FRG (Front Range GigaPOP), NCG (new NCAR Gigabit tap) and AMPATH (AMPATH Florida). AMPATH was unlucky: the link configuration seemed ok, but one of the cards is playing up. I hope it is a simple refit, but someone local will have to attend to the monitor. FRG and NCG are both collecting now. NCG is particularly busy, so I took the opportunity to collect a long data set at the site (see Special Traces section above). [Jörg Micheel]

nai-p-tau, Tel Aviv U., Israel. This site has been offline waiting for site people to install and connect the machine, and much time has passed waiting for this to happen. In recent conversations with Hank Nussbacher, I was told that support has been completely withdrawn and the installation there will not go forward. I have explored other possible connection locations there and Hank has responded very negatively. I will be retrieving the equipment from this location unless other possibilities present themselves. [Bud Hale]

nai-p-fla, U. of Florida at Gainesville. The network/computer facility at this campus has been under re-work for some time. The PMA machine there was disconnected and removed from the rack. It has now been assigned rack space. Site technician Matt Grover reports that progress is being made, and promises to complete the reinstall very shortly now. Recall that the re-work at the site was also an equipment room remodel with everything going into new racks. [Bud Hale]

nai-p-buf, U. of Buffalo. This machine was disconnected due to lack of confidence in the security and the value of the data. These issues were not addressed by NLANR in a timely fashion. Continued communication with this location suggested they would be willing to reconnect the monitor, but we have not had the success we had hoped for; they cite security as their reason for refusing. We checked to see if Kevin Thompson can contact them and have a positive effect, and they will get additional information via email to answer their questions. [Bud Hale, Ronn Ritke]

The MAX GigaPop site is down. At this time it appears to need a reboot. This event with the MAX machine has sparked more discussion of the implementation of remote rebooting and access, especially for sites such as MAX, where just a reboot takes a major effort on the part of the site people and can take days or even weeks to make happen. [Bud Hale]

Have also had a go at the MAX OC48MON and have got it close to capturing. Only the dagsplit tool is giving me a headache, with a strange coredump; something must have got broken with the last fix we put in there. [Jörg Micheel]

I have had a good sweep around the infrastructure. Two monitors, TXG and MRA, had been collecting empty data sets for some time. This appears to have been fixed, and the machines are ok now. [Jörg Micheel]

In summary, the following PMA machines are fully operational and are ready to collect traces:

nai-p-aix, Ames Internet Exchange, Moffett Field, CA
nai-p-anl, Argonne National Labs, Argonne, Ill
nai-p-apn, APAN HPC, Chicago, Ill
nai-p-bwy, Columbia U., New York, NY
nai-p-cos, Colorado State, Fort Collins, CO
nai-p-mem, U. of Memphis, Memphis, Tenn.
nai-p-mra, Merit Comm., U. of Michigan, Ann Arbor, MI
nai-p-nca, National Center for Atmospheric Research (OC3mon), Boulder, CO
nai-p-ncg, National Center for Atmospheric Research (GigE), Boulder, CO
nai-p-odu, Old Dominion U., Norfolk, VA
nai-p-osu, Ohio State U., Columbus, OH
nai-p-txg, Texas GigaPop, Texas A & M U., Houston, TX
nai-p-txs, Rice U., Houston, TX
nai-p-max, Mid-Atlantic Crossroads GigaPop, Washington, DC
nai-p-amp, AMPATH GigaPop, Miami, Fla.
nai-p-frg, Front Range GigaPop, Denver, CO

~ Management and administrative

Damage done by Federal Express in shipment has become an important issue. The packing of the nai-p-frg machine (discussed above) was as good as it could be. We have been gluing parts into place as well as adding internal packing to the machines. From the description of the damage to the package, it must have been dropped three or four feet. We have always been at a disadvantage with packages damaged in Federal Express shipment, because we are primarily focused on getting the machine installed and working and therefore always work through the damage. However, we have now resolved to take this matter to Federal Express. If we must find an alternate shipping method we will. [Jim Hale, Bud Hale]

Lengthy phone discussion with Ronn in which we went over various project management issues. My biggest concern is that we need to get out of a mode where monitors can fail in the field, and I have highlighted three areas. First, ramp up the quality of the systems that we ship by using robust chassis in which connectors and cards cannot shake loose or get damaged. Second, use motherboards and IPMI modules that put us in a position to shut down and reboot systems in the field remotely. Third, package the systems properly before shipment, if necessary adding another layer of carton and stuffing, to ensure that they arrive at the destination in working condition. I have suggested various chassis, motherboards, and solutions that we have found to deliver satisfactory results in my work here for Endace Technology over the last 3-4 months. [Jörg Micheel, Ronn Ritke]

AMP FTE, Software Engineer ~

  • Wrote an email as a preliminary search re the AMP FTE programmer and sent to several campus contacts. In the process, put together a list of email addresses for CS advisors across departments, the SOLO folks (student organizations), the special programs advisors, and the Career Services folks, etc. Thanks to a helpful email from Sean, I was able to expand my list to include Career Services and all CS focus students, including Cog Sci, Math, and Physics. [Maureen Curran]
  • Received a half dozen or so emails and resumes as a result of my contacts forwarding the preliminary search info. Met with Tony when he was in town and we went over the resumes/applicants. Two look promising, so we set up interview times for them on the following Monday. Tony will write a new (more difficult) coding skills test after he returns to NZ, which I'll administer to the applicants. Arranged with Ronn to use his office and the machine that Klaus used when he was here (for the tests). Due to Tony's illness (kidney stone) we cancelled the two interviews we had set. Corresponded with new applicants and forwarded their resumes to Tony.  [Maureen Curran]

Tony McGregor and Matt Luckie paid a visit to SDSC. They were able to attend our weekly staff meeting in person and had some interesting things to share. Tony demonstrated some progress on the Cichlid visualization. Matt and Bud worked together on the implementation of the serial console on AMPlets. Tony met individually with each of us to discuss various AMP related efforts while he was here.

Lengthy meeting to discuss NLANR/MNA objectives with Ronn, including visions for OC192 and international activities, the NLANR budget and project overview. Discussions with Tony when he was in town re state of AMP and NLANR/MNA in general. [Hans-Werner Braun]

Weekly NLANR/MNA managers conference calls. In addition, multiple planning and strategy discussions regarding specific plans and aspects of both the AMP and PMA projects. [Hans-Werner Braun, Ronn Ritke, Tony McGregor, Jörg Micheel]

- 30 -

last modified: 3/26/04