
Summary of Research Activities - March 2004


Development and distribution of measurement and analysis tools

~ Continuing development of new metrics and real-time analysis for PMA

Pere Barlet and I met in order to finish off some loose ends and report on his work on SMART, a real-time analysis application performing at Gigabit speeds. What we have achieved in these two months of work can be summarized as follows.  [Jörg Micheel, Pere Barlet]

  • a real-time application capable of doing packet classification according to a number of different criteria (see below), recording those into an RRD database and displaying those at different levels of aggregation.
  • the smallest time interval displayed is 20 seconds, with graphs showing the performance over the last hour, day, week, month, and year
  • the performance of the engine generally depends on the following factors:
    • the level of sophistication of classification (flows, etc)
    • the traffic mix (diversity of IP addresses, etc)
    • traffic load (packets/sec, bytes/sec, ...)
    • the frequency of snapshots of the database (currently 20 seconds)
    • (remotely) the ability of the graphing Web server to produce the PNG graphs and HTML pages
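
The aggregation levels in the list above can be pictured with a small sketch. It does not use RRD itself; the function name `consolidate`, the sample values, and the choice of averaging as the consolidation function are assumptions made only for illustration.

```python
from statistics import mean

STEP_SECONDS = 20  # smallest interval stored, as stated in the list above


def consolidate(samples, target_seconds, step_seconds=STEP_SECONDS):
    """Average fixed-step samples into coarser buckets (averaging is an assumption)."""
    if target_seconds % step_seconds != 0:
        raise ValueError("target interval must be a multiple of the step")
    per_bucket = target_seconds // step_seconds
    # Only complete buckets are emitted; a trailing partial bucket is dropped.
    usable = len(samples) - len(samples) % per_bucket
    return [mean(samples[i:i + per_bucket]) for i in range(0, usable, per_bucket)]


# Hypothetical packet counts for six 20-second intervals (two minutes of traffic).
packets = [100, 140, 120, 200, 220, 180]
per_minute = consolidate(packets, 60)
print(per_minute)
```

The same function could be applied repeatedly to derive the coarser views (hour, day, week) from the finest series, at the cost of losing short-lived peaks.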

During the last two weeks we have successfully demonstrated the performance of the application on one of the OC192MONs at SDSC, sustaining about 1.5 Gigabits/sec over an interval of more than five minutes without loss. Unfortunately, due to lack of time and other circumstances, we have not been able to stress test the application against a commercial traffic generator to learn more about its performance characteristics and limits. [Jörg Micheel, Pere Barlet]

Continuing the list of challenges, it is worth noting that the interface to RRD is not well structured at present. Since SMART needs to drive (minimally) a pair of DAG cards, it is designed to be multithreaded, and there is, at present, no published interface to RRD, let alone one that is thread safe (reentrant). Things are left to chance for the moment. [Jörg Micheel]

The HTML and PNG graphs produced also deliver an enormous amount of detail, which requires the user to define quite a bit of structure up front, then enter into a trial-and-error fine tuning of the definitions for the database. Also, there is no dynamic way to add structure, only to completely redefine the database (a limit of RRD). In time, we would want user selectable structures (Web interface) to make this a tool which can be used by network operators on a day-to-day basis. [Jörg Micheel]

Sample graphs here:  http://pma.nlanr.net/Special/tera2.html

Having said that, I think that SMART is a significant achievement and I am looking forward to continuing to work with Pere now that he is back in Barcelona. [Jörg Micheel]

Continued testing my tool and implemented some new features, such as collecting from multiple interfaces at the same time. [Pere Barlet]

I took a look at my sensor code and fixed up some bothersome code, such as the command line interface and the like. I also took a look at flows again. [Chris Gross]

~ Progress on the reimplementation of AMP and the development of a new testing architecture

I have continued on the debugging of the amplet code. At the present time I am pretty sure I am nearly there. The code has been happily testing away to a bunch of machines for a few hours without problems. I do have a problem to fix with transferring IPMP data, but I am pretty sure I know what that is, I just have not got to it yet. [Tony McGregor]

I am very hopeful I will be done coding soon and will have enough docs for a release. Mind you, I keep finding more little jobs to do before it goes out. One outstanding issue is whether I should integrate Xing's throughput testing code into this release or a slightly later one. There is huge pressure to get it out to a bunch of people so I may take the quicker option. [Tony McGregor]

Much to my frustration, the amplet code still isn't quite ready to go beta. It seems that right at the last moment I keep coming across problems that I really need to fix before releasing it. [Tony McGregor]

The beta for the amplet package is finally done. I wrote up some release Web pages. I am just waiting on details of the appropriate public licence to release it under. Filled in the copyright release form for the amp code. [Tony McGregor]

I went through all of Xing's code this week. (Xing is the programmer here who has been working on AMP throughput and bandwidth estimation tests.) We have worked through a bunch of small things I wanted to see changed before integrating his code into the main distribution. It looks like Xing has dealt with most (or all) of those this week and we can probably release a second beta version of the AMPlet package with scheduled throughput tests soon.  [Tony McGregor]

Activities extending the Network Analysis Infrastructure (NAI) in support of new and developing HPC needs

~ Special Traces

Postprocessing the Auckland-8 data set. To my surprise, quite a bit of tool rework had to be done; most of the graphing applications showed problems using the Endace ERF format. I have taken the same approach as in ipanon in terms of making them capable of processing any type of file format using a mixture of fixed C structs and some code to determine context. Seems to be working well. Walking through the 15 days of data kept pma.nlanr.net busy for a good day and we have something over a thousand .png graphs with lots of interesting details to be explored. Draft page at http://pma.nlanr.net/Special/auck8/. Presently all pages index into a one hour display of various properties. Because it is hard to look for interesting features in such a way (342 individual pages), I have also computed daily displays. Later I intend to also produce pages which display only one property, but with multiple graphs along the time axis, such as bandwidth, packets, flows, for hours and days in a row.  [Jörg Micheel]

I have also worked on similar displays for the CESCA-I data set, but have not gotten nearly as far as with Auckland-8, due to time (and computing) limits. [Jörg Micheel]

Focus this week was on postprocessing the recent data sets we collected. I now have the application-related graphs for Auckland-8 in place. Have worked on enabling the dagflows support for ERF file format and done most of the processing on Auckland-8. Standard graphs for the CESCA-I data set are now also in place. It is surprising how smooth those graphs look; it appears that the more traffic there is, the steadier the various characteristics are. Auckland tends to be much more eventful in comparison. The pma.nlanr.net server is presently busy day and night computing all the graphs. I am waiting to start the processing for the San Diego-I and Tera-I data sets. [Jörg Micheel]

The analysis of Auckland-8 is now complete; we have over 3200 PNG graphs online, with details at http://pma.nlanr.net/Special/auck8/. A total of nine different graphs are produced, on an hourly and daily basis, and those are indexed in two different sets of HTML pages: one on a time basis (one page per hour and one per day) and another per graph, displaying a particular property along the time axis, with hourly and daily graphs. [Jörg Micheel]
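
As a rough plausibility check on the graph count, the figures can be multiplied out. This is only a sketch: it assumes the 342 hourly pages and 15 days reported for the draft still apply to the final set, and that each of the nine graphs is drawn exactly once per hour and once per day.

```python
GRAPH_TYPES = 9      # "nine different graphs"
HOURLY_PAGES = 342   # hourly pages reported for the draft (assumed unchanged)
DAYS = 15            # days of data in the trace set (assumed unchanged)


def total_graphs(graph_types, hours, days):
    """Count PNG files if every graph type is drawn once per hour and once per day."""
    return graph_types * (hours + days)


total = total_graphs(GRAPH_TYPES, HOURLY_PAGES, DAYS)
print(total)  # 3213
```

Under those assumptions the result is consistent with "over 3200".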

I have done the same processing with the CESCA-I data set and ran into serious issues with the pma.nlanr.net data server on flow analysis. The process to calculate flow data would reach 350 MBytes in size and, with the server having only half a Gigabyte of main memory, would page in and out endlessly, failing to finish even after more than 48 hours. Jim came to the rescue just in time and upgraded the server to 2 Gigabytes; the process now finishes in less than two hours and I am done with all the graphs. Details at: http://pma.nlanr.net/Special/cesc1.html (at the bottom). [Jörg Micheel]

I have used the opportunity and started computing graphs for NCAR-1 as well. There is an old bug with xmgrace that prevents the PNG graphs from being produced, and I am now debugging this program, which will take a bit longer. [Jörg Micheel]

I took a long trace at the OC192c machine and the first trace run failed due to some 2GB file restrictions in dagsplit (from dagtools). I have made the necessary changes to tools and makefiles and the second attempt was error free; we collected for more than 12 hours in a row (2x140GB). Unfortunately, space constraints on pma.nlanr.net prevent us from keeping those monster traces at the moment, hence the upgrade in progress (as above). [Jörg Micheel]

~ New (and developing) strategically important measurements and deployments

The 10 new AMPlet machines for the Internet2 backbone nodes have been picked up and are ready for final configuration. The IP addressing was received from Christopher Small. At this time I anticipate that I will use that information to edit the sites into the database. One of the new machines is up online as the amp-sdsc2 test site. I expect the machines for the Internet2 sites will start going out very soon. [Bud Hale]

This week the AMP machine shipped to Taiwan some time back was finally released from customs and installed. I am replacing a temporary machine that was created locally there, and I have initialized the new machine. It is now online, and collecting and transferring data. [Bud Hale]

Followed up on request from Brian Court for AMPs on CENIC hops. [Ronn Ritke]

OC192 ~ When Jörg took a break from collecting on the OC192a machine at SDSC, I took the opportunity to troubleshoot the traffic problem we were having on the other OC192 monitor (OC192b). After some confusion trying to figure out why there were no working fiber optic signal meters in all of SDSC, I finally figured it out when I saw the cables to the router dangling, connected to nothing (my last thought was that someone would unplug them from the system without so much as a little notice). I never even looked until I had concluded there was no other possibility. I was told it would be reinstalled this week, but as of the writing of this report, that has not been done. [Jim Hale]

I did communicate with Rick Summerhill regarding the installation of the OC192 monitors. It appears that Matt Zekauskas and Rick have been out of the country for a time and incommunicado. He did reiterate, though, that when a few details are worked out and meetings are set, he wants to be included in them. [Jim Hale]

All the OC192 network problems have been worked out. Placement is now the primary interest. [Jim Hale]

It appears that we need to make a concerted effort to get systems into the field. I have written an email (following Jim approaching Rick Summerhill) to Matt Zekauskas and Steve Corbato to ask if we can go ahead with instrumenting Indianapolis (Abilene backbone). I have suggested two different scenarios, one just with one OC192MON, the other instrumenting all three (2x OC192c, 1x OC48c) backbone links into IPLS. [Jörg Micheel]

I have also sent an introductory letter to Pacific North West GigaPOP, in an effort to perhaps be in a position to place multiple OC48MONs and lower speed systems in a prestigious location. [Jörg Micheel]

Another thought that crossed my mind was to use PAM2004 in France to announce the availability of further monitors. This should go in line with some preliminary work for a Web site where we explain in detail our procedures and conditions, so that those are public and anyone can see them. It appears that we should also post a troubleshooting guideline for getting the splitters into place. [Jörg Micheel]

10GigE AMP ~ We began research to gather information in support of the assembly of a 10GigE AMPlet. We are interested in taking active measurements at this speed. CISCO has expressed interest in this. Les Cottrell (SLAC) and Phil Dykstra (WareOnEarth) are interested and engaged in pursuing this as well and have been very helpful. Have also spoken with Reagan Moore (SDSC). [Tony McGregor, Bud Hale]

~ IPv6 and IPv6 Scamper

I made some small modifications to Scamper to make it handle IPv6 anonymous routers (routers without global unicast interface addresses that address each other with site local or link local addresses). In this case, IPv6 routers send ICMP errors back with the source address set to the destination address of the probe - i.e., they spoof the source address of the response. This is pretty easy to detect - if you get a TTL expired message from the address you are probing, then you have hit an anonymous router. I had to change the trace termination condition to check for this. Recent versions of Scamper (that have not been used outside of my own Scamper runs) terminated on receiving any packet with the source address of the ICMP set to the target IP, as I found many IPv4 targets responding with TTL expired messages. [Matthew Luckie]
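
The detection rule described above can be sketched as a small classifier. This is not Scamper code; the function names and category labels are hypothetical, and only the ICMPv6 Time Exceeded type number (3) is taken as given.

```python
import ipaddress

ICMP6_TIME_EXCEEDED = 3  # ICMPv6 type for Time Exceeded


def classify_response(target, source, icmp_type):
    """Classify one ICMPv6 reply to a traceroute-style probe sent toward `target`."""
    # Compare parsed addresses so different spellings of one address still match.
    same = ipaddress.ip_address(source) == ipaddress.ip_address(target)
    if icmp_type == ICMP6_TIME_EXCEEDED:
        if same:
            # Treated as an anonymous router that copied the probe's destination
            # into the source field of its error message.
            return "anonymous-router"
        return "intermediate-hop"
    if same:
        return "reached-target"
    return "other"


def should_terminate(target, source, icmp_type):
    """Stop the trace only when the target itself appears to have answered."""
    return classify_response(target, source, icmp_type) == "reached-target"


print(classify_response("2001:2a0:0:bb0a::1", "2001:2a0:0:bb0a:0:0:0:1", 3))
```

The key design point is that the termination check looks at the ICMP type as well as the source address, instead of stopping on any packet whose source equals the target.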

I started writing code using my Scamper library to read in Scamper traces and discover alternate paths between pairs of IP addresses. I will continue writing this code. [Matthew Luckie]

Wrote a program that picks out alternate paths out of Scamper traces. The idea is that short IPv6 routes may use tunnels over IPv4, which may not be the most direct route and lead to a higher RTT despite being a shorter [AS?] path. [Matthew Luckie]

The code is pretty much done. It does a depth first search, identifying possible locations of an alternative path as paths with a start node with >=2 entry nodes, and an exit node into >=2 nodes. From there, it takes the candidate list and removes entries that cannot be grouped into source and destination sets, and then filters out paths that have the shortest alternative route embedded in an alternative route i.e., the paths A B C D E and A B F D E are removed because the alternate path is between B and D, not A and E. [Matthew Luckie]
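
The final filtering step can be illustrated on a pair of paths. The sketch below is a pairwise simplification, not the depth-first search described above: it assumes both paths are already known, and the hypothetical helper `divergent_segment` strips the shared prefix and suffix to expose where the routes really differ.

```python
def divergent_segment(path_a, path_b):
    """Return the two sub-paths between the branch node and the merge node."""
    if path_a == path_b:
        return None
    shortest = min(len(path_a), len(path_b))
    # Length of the shared prefix.
    start = 0
    while start < shortest and path_a[start] == path_b[start]:
        start += 1
    # Length of the shared suffix, never overlapping the prefix.
    end = 0
    while end < shortest - start and path_a[-1 - end] == path_b[-1 - end]:
        end += 1
    if start == 0 or end == 0:
        return None  # no common branch node or no common merge node
    # Keep the last shared node before the split and the first shared node after it.
    seg_a = path_a[start - 1:len(path_a) - end + 1]
    seg_b = path_b[start - 1:len(path_b) - end + 1]
    return seg_a, seg_b


print(divergent_segment(list("ABCDE"), list("ABFDE")))
```

For the example in the text, this reduces A B C D E and A B F D E to B C D and B F D, so the alternate path is recorded between B and D.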

I am testing the code by feeding it Scamper traces that Henk @ RIPE NCC was kind enough to supply a few months back from a series of TTM monitors. [Matthew Luckie]

I did some work on my alternate paths detection code. The initial code held all the candidate paths in an array, which made insertion of unique paths difficult and deletion of duplicate paths/prefixes more difficult. I replaced the array with a splay tree for unique insertion and the array with a list for deletion. The code runs pretty quick now. The last bit of code I intend to write for this application is to cull out candidate paths that will not be alternate paths as I do my depth first traversals. [Matthew Luckie]

I wrote an application called `ring', which sends IPv6 strict-source routed packets to measure the round-trip time over a specific route. If we have a route A B C D and an alternate path A E F D, then we can measure the relative delay between the two routes by the difference in RTT returned from probing D via A B C and A E F. The only difference between the RTTs will be the one-way delay incurred between A and D. ring uses UDP probes to high numbered ports, although it would be simple to do ICMP6 ECHO as well (as ping6 can do). [Matthew Luckie]

Using the last Scamper run produced by William F. Maton, I found 178 alternate paths. I have probed a few with my ring tool to check that it works. [Matthew Luckie]

ring -a 2 -t 1 2001:400:0:61::2 2001:400:0:50::2 3ffe:80a::e 2001:2a0:0:bb0a::1
probe 0 ICMP from 2001:2a0:0:bb0a::1 ttl 53 type 1 code 4 rtt 205 ms
probe 1 ICMP from 2001:2a0:0:bb0a::1 ttl 53 type 1 code 4 rtt 205 ms

ring -a 2 -t 1 2001:400:0:61::2 2001:400:0:80::2 3ffe:2900:a:4::1 2001:440:1239:1003::1 2001:2a0:3:7::4716 2001:2a0:0:bb0a::1
probe 0 ICMP from 2001:2a0:0:bb0a::1 ttl 51 type 1 code 4 rtt 277 ms
probe 1 ICMP from 2001:2a0:0:bb0a::1 ttl 51 type 1 code 4 rtt 276 ms

Here, the second alternative route between 2001:400:0:61::2 and 2001:2a0:0:bb0a::1 is about 70ms longer. We can also confirm that these routes actually exist, as we strict-source routed the probe through them. [Matthew Luckie]
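
The comparison can be reproduced from the probe lines quoted above. This is a minimal sketch that assumes the output format is exactly as shown (an `rtt N ms` field per probe line); the helper name `mean_rtt` is hypothetical.

```python
import re
from statistics import mean

RTT_PATTERN = re.compile(r"rtt (\d+) ms")


def mean_rtt(output):
    """Average every 'rtt N ms' value found in a block of ring output."""
    values = [int(v) for v in RTT_PATTERN.findall(output)]
    if not values:
        raise ValueError("no RTT samples found")
    return mean(values)


first_route = """probe 0 ICMP from 2001:2a0:0:bb0a::1 ttl 53 type 1 code 4 rtt 205 ms
probe 1 ICMP from 2001:2a0:0:bb0a::1 ttl 53 type 1 code 4 rtt 205 ms"""
second_route = """probe 0 ICMP from 2001:2a0:0:bb0a::1 ttl 51 type 1 code 4 rtt 277 ms
probe 1 ICMP from 2001:2a0:0:bb0a::1 ttl 51 type 1 code 4 rtt 276 ms"""

extra_delay = mean_rtt(second_route) - mean_rtt(first_route)
print(extra_delay)  # 71.5
```

With only two probes per route the figure is indicative at best; more samples would be needed before drawing conclusions about the two paths.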

I am working with William F. Maton - who has been collecting Scamper runs for about 4 or 5 months now from one source to a constant target list - to generate data that will enable me to compare IPv6 routes that have changed over time. [Matthew Luckie]

My idea on how to cull paths out of the candidates before I run a (small) system out of memory failed, but not before spending the best part of a day on it. Live and learn, I guess. [Matthew Luckie]

I did some work on ring, trying to get it runnable on Linux so I can get William F. Maton to run it for me. It works perfectly on FreeBSD, but not everyone else lives in a perfect world. The glibc people have not got the inet6_rthdr* functions which provide an interface to adding routing headers to packets with a call to sendmsg, despite the RFC describing these functions being published in 1998. I can apparently pass some magic to sendmsg which will give me the same end as using the inet6_rthdr functions, but it is frustrating nonetheless. [Matthew Luckie]
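
The report does not say what the "magic" is. One plausible reading is that the routing header is assembled by hand and handed to sendmsg as ancillary data (the IPV6_RTHDR option). The sketch below shows only the byte layout of a Type 0 routing header as defined in RFC 2460; the function name is hypothetical, and whether a given kernel accepts such a buffer is an assumption that is not tested here.

```python
import socket
import struct


def build_type0_routing_header(hops):
    """Pack an IPv6 Type 0 routing header (RFC 2460 layout) for the given hops."""
    if not hops:
        raise ValueError("at least one intermediate hop is required")
    next_header = 0               # assumed to be filled in by the kernel on transmit
    hdr_ext_len = 2 * len(hops)   # 8-octet units, not counting the first 8 octets
    routing_type = 0
    segments_left = len(hops)
    # Fixed part: four one-byte fields followed by a 32-bit reserved word.
    header = struct.pack("!BBBBI", next_header, hdr_ext_len, routing_type, segments_left, 0)
    for hop in hops:
        header += socket.inet_pton(socket.AF_INET6, hop)  # 16 bytes per address
    return header


header = build_type0_routing_header(["2001:400:0:50::2", "3ffe:80a::e"])
print(len(header))  # 40
```

Building the header by hand trades the convenience of the inet6_rthdr functions for portability to C libraries that lack them.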

I got a v6 tunnel of my own from Hurricane Electric and ran ring across a bunch of alternate paths. The code seems to be working well. I have been working on primitive analysis scripts, and the results seem to be encouraging so far, in that the mechanism I use for source routing seems to work just fine. [Matthew Luckie]

type 1 code 3 count 4    (Destination Unreachable, Address Unreachable)
type 1 code 4 count 124  (Destination Unreachable, Port Unreachable)
type 1 code 0 count 2    (Destination Unreachable, No Route)
type 1 code 1 count 6    (Destination Unreachable, Admin Prohib)
type 3 code 0 count 20   (Time Exceeded)

The table above says that the port unreachable messages I am soliciting seem to be provided, which is what I want. I will be doing more analysis work to make sure that the packet is being source routed all the way to the address I want. I want to keep working on this area of Scamper research, as I think there's a good paper here. [Matthew Luckie]
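
The counts in the table can be tallied to see how dominant the port unreachable replies are. This is only a sketch of the "primitive analysis" idea; the dictionary layout and the helper name `share` are hypothetical, while the counts and ICMPv6 type/code meanings are those listed above.

```python
ICMP6_NAMES = {
    (1, 0): "Destination Unreachable, No Route",
    (1, 1): "Destination Unreachable, Admin Prohibited",
    (1, 3): "Destination Unreachable, Address Unreachable",
    (1, 4): "Destination Unreachable, Port Unreachable",
    (3, 0): "Time Exceeded",
}

# (type, code) -> number of responses, copied from the table above.
counts = {(1, 3): 4, (1, 4): 124, (1, 0): 2, (1, 1): 6, (3, 0): 20}


def share(counts, key):
    """Fraction of all responses that carry the given (type, code) pair."""
    return counts[key] / sum(counts.values())


total = sum(counts.values())
port_unreachable = share(counts, (1, 4))
for key, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{ICMP6_NAMES[key]:<45} {n:>4} {100 * n / total:5.1f}%")
```

Roughly four out of five responses are the solicited port unreachable messages; the Time Exceeded replies are the ones that would need a closer look when checking that probes are source routed all the way.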

Outreach, application support, utilization improvement, and documentation activities

~ Impact: Data Users and Citings

The following paper briefly discusses IPMP in the context of network troubleshooting from the user's computer. The paper itself talks about what kind of network support would be necessary to debug loss, delay, and re-ordering. Interestingly enough, the authors make the point that if IPMP had a field in the path record for a per-flow packet counter, you could identify the point of loss in a network using a stream of packets. This seems like something that would not be present in most of today's routers, and I am not sure how feasible this is in the fast path. Sections 3.2.2 and 6.2 are the interesting things for IPMP, but the whole paper is worth a read. [Matthew Luckie]

User-level Internet Path Diagnosis, Ratul Mahajan, Neil Spring, David Wetherall, and Thomas Anderson, SOSP, October 2003.

~ Papers

A positive email from Klaus Mochalski: our paper on microscopic traffic analysis has been accepted for the Terena Networking Conference 2004 in Rhodos, Greece, in June.  [Jörg Micheel]

Met with Chris to discuss his abstract and possible options other than submitting to APAN's Network Measurement Workshop. We decided to wait until Chris has a chance to run more tests as well as more comparative research. [Maureen Curran]

Comments from Peter Arzberger and Teri Simas on the NLANR/MNA white paper. Mike Gannis has included those edits. I sent the white paper to Greg Cole for review, and finally emailed the final draft of the White Paper to Kevin Thompson, Doug Gatchell, Bill Chang and Dr. Fredrica Darema. [Ronn Ritke, Mike Gannis]

~ Presentations and Conference/Meeting Participation

Attended the CENIC Meeting in Marina Del Rey, CA. Chris Thomas from UCLA will install a new Gig interface for the UCLA AMP machine. Met with Greg Hidley and will meet with him again to discuss the OptiPuter project. I distributed handouts on the 10GigE PMA traces and copies of the NATimes. [Ronn Ritke]

~ Collaborations and activities supporting network research

Been in touch with Nevil Brownlee, who was keen to get one of his students at Auckland introduced to the Auckland VIII data set to start some studies of his own. It will be interesting to get some feedback, in time. Nevil also promised to keep nagging ITSS to reconnect the monitor to the access link; the connection was lost during some network reconfigs at the end of January.  [Jörg Micheel]

Brief email exchange with Bill Cleveland (formerly Bell Labs) who is now at Purdue. We discussed his success in being able to instrument a link at Global Crossing and new opportunities for joint work in collecting and publishing more trace data. [Jörg Micheel]

Email conversations with Nick Duffield from AT&T Research Florham Park, asking about IPv6 traffic traces. We do not have any such in our program as yet, although all traces contain enough information to gather the total amount of IPv6 packets and bytes (and relative to other data). [Jörg Micheel]

I exchanged email with Eric Boyd and Warren Matthews about getting Warren's Web services code onto AMP. That went very well and Warren is going to do some work on it. I created an account for him on AMP and VOLT, explained the machine set up, installed SOAP-Lite and generally tried to get things sorted for him to work on the project. It is a big relief for me to have him do that, as I was wondering when I would get the time. We are also trying to organize a meeting to discuss whether there is anything that we can do between OWAMP and AMP. [Tony McGregor]

I was contacted this week by Rasmus Hansen from Dante about possible cooperation between us and them in measuring their network (GEANT2). I am going to meet with one of his colleagues when I am at PAM. [Tony McGregor]

Spoke with Vijay Samalan (SDSC Networking Director) about possible work with the OptiPuter project. [Tony McGregor]

Meeting with Alan Blatecky, Vijay Samalan and Hans-Werner. One action item is to schedule a meeting with Greg Hidley and others from the OptiPuter project to collect information. [Ronn Ritke]

Manager call - Vijay Samalan joined the call and gave Tony and Jörg background information on the OptiPuter project. [Ronn Ritke]

Met with Greg Hidley and Aaron Chen on Wed to discuss the OptiPuter project. Jörg was able to attend this information gathering meeting. [Ronn Ritke]

Wrote to Mark Allman about IPMP. [Tony McGregor]

CISCO expressed interest in the 10GigE AMP. Phone call to Tony about a faster AMP interface as a follow on to a meeting with Bob Aiken of CISCO. [Ronn Ritke]

At Ronn's suggestion I have started thinking about a 10 Gbps AMP for a possible grant request to CISCO. I have asked Bud to do some work on that. [Tony McGregor]

I had a talk with Les Cottrell from SLAC about 10 GigE cards and perhaps working with us on a 10 GigE AMP. Also exchanged email with Phil Dykstra on the same issues. Both are very switched on people and it is great they are interested in working with us on this. [Tony McGregor]

I also talked to Les Cottrell of Stanford Linear Accelerator Labs. Les has been evaluating 10GigE interface cards and is interested in possible collaboration in the research. Les also said he is anxious to help in any way he can, with or without being included. Les had me discuss his interests with Reagan Moore, here at SDSC, since they involve data transfer performance and management from the SLAC lab to the Storage Resource Broker. [Bud Hale]

Continued to research the availability of 10GigE network interface cards in support of the 10GigE AMPlet interest. Continued research to gather information to support the assembly of a machine for the 10GigE AMPlet. Also had more discussion with Les Cottrell of Stanford Linear Accelerator Lab who is interested in collaborating on the project. [Bud Hale]

This week, during discussions with Reagan Moore about the 10GigE AMPlet, I discovered some possible interest in passive monitoring of links used to transfer large amounts of data from labs such as the Stanford Linear Accelerator to third party data storage such as the Storage Resource Broker. I hope to learn more on this with time. [Bud Hale]

Conference call with Greg Cole. Tony and Jörg participated on the call and both expressed interest in the GLORIAD project. The group in Russia will submit a formal request for an NLANR AMP. [Ronn Ritke]

Planning for JET conference call. [Ronn Ritke]

~ Documentation, networked data, publications

Ronn and I discussed creating a handout for the CENIC meeting highlighting our OC192mon/10GigE efforts. I pulled together all of the background text and current info I had and sent it to Ronn and Mike to create the draft of the handout. Fortunately I had recently created a time-line formatted history of our OC192 activities which made gathering the material quite easy. I would like to eventually have a time line overview and current status on each of our major sub-projects (and have them posted on the Web, archived off the Latest Pings and Highlights sections). Worked with Gail, who designed a great layout for the handout, on getting them produced, proofed the final version, then sent them off by FedEx (along with 100 copies of the NATimes) to Ronn's hotel (to arrive tomorrow, Saturday). So, thanks to the efforts of several people (Gail, Mike, Ronn, Jörg, and myself) we put together a really nice one page handout on our OC192/10GigE efforts, including the Tera I and II traces. It is two sided with the International one page handout on the other. I plan to post these on the Web sometime soon. [Maureen Curran]

I worked with Maureen Curran, Mike Gannis and Gail Bamber on the one page handout for the CENIC meeting next week. [Ronn Ritke]

Worked with Ronn Ritke and Maureen Curran to create a one-page flier on NLANR's OC192MON measurements on the TeraGrid, to be distributed at the CENIC meeting in Los Angeles. [Mike Gannis]

Discussions with Ronn Ritke about an article on NLANR for EnVision, and some discussions and preliminary graphics work on posters and handouts for the upcoming PAM 2004 conference. [Mike Gannis]

Michelle Merrill (front desk) called to let me know that they were running out of NATimes in the lobby; sent some over. [Maureen Curran]

User comments project: in anticipation of the May review meetings, Ronn asked me for the comments from users that I have gathered from various sources including Tony and Jörg forwarding them to me. These are scattered throughout my various and sundry citings and collaborations files and email folders. Went through my citings, pmaINFO, and ampINFO text and email archive files looking for possible quotes from users, grabbing the entire context (sometimes a lengthy email), then went through this first cull and edited out unnecessary text, keeping the context and date, and sorted as AMP or PMA related. Sent to masg. My plan is to make posting of data users comments part of the summer projects that Lana and I will be doing. Will ask for permission (and possible additional comments, and perhaps an NATimes article) of each before posting. [Maureen Curran]

Made progress on the PMA Special Traces pages. Fixed Leipzig I and II; they now work across all browsers, including older versions of Netscape (thanks to Jörg who ran HTML Tidy on them, a much better solution than my original plan of hand placing the needed quotes in the link tags). Made some minor changes to the templates and cleaned up the Makefile, adding a line about the need to add new pages in two places. Began work on the other pages. Also changed the format for the teaser navbar images for the Special Traces by adding the trace name to them (this will hold us over until much later when I can create the page with the full images and descriptions). Added the trace name lines to the Leipzig I and II, CESCA-I, and SDSC-I images. Lana recently created several wonderful navbar size images for the new traces. [Maureen Curran]

For the PMA Web pages, did not have time to do as much as I would have liked, but I did create the 555 pixel images of the new address-less CENIC and TeraGrid topology images that Jim sent me and uploaded them. Also sent a note to Felix Hernandez-Campos who had been looking for information on our new traces a month or so ago, gave him the new index page pointer. Sent a redirect for this page (from the old temporary one) for Jim to incorporate into the Apache HTTP server configuration file. I wanted to have a way to direct the Special Traces users to the new traces that Jörg has been working on and posting and still keep the clean lines on the index page. So I added a new line after the introduction and before the list of traces that lists those traces which are "recently added" with links to the brief information further down the index page. Talked to Chris a bit about the Web stats work and the bug he is working on. [Maureen Curran]

By email Jörg and I discussed having different versions of the index of Special Traces (alpha, chronological, and grouped for comparison analysis). We only got to wave hi when he was here, so after I create the pages, I will get more comments from him about the comparison grouping by email. [Maureen Curran]

Sundry updates to the Web tree (for the several new special traces recently published). [Jörg Micheel]

Spent some time with Weblogs. [Chris Gross]

Warren seems to have been making good progress on the Web services implementation. He needs information about the CPU type and speed of the test boxes involved in a particular test. I installed x86info on all the AMPlets and updated the database so it has fields for the information. I went through all the forms and I think I have updated them all to take account of the extra fields in the database. I also wrote a Perl program (actmon:src/pinger/updateCPUinfo.pl) that works through the database a site at a time, remotely runs x86info, extracts the CPU type and speed and updates the database. So that information should be available to Warren now and we can update it from time to time by running the script. I also asked Ben to send Warren the new data interface specification. [Tony McGregor]

This week I worked on improving our AMP libraries in several ways. First, I redesigned the interface to the data library to correctly support time zones, and to automatically query the database to discover what time zone the files are stored in. [Ben Reesman]

I have also made progress on an interface to the database that will be available seamlessly from C and PHP. I have started drafting a new version of all the AMP maps to use this new code rather than the somewhat less elegant solution that we have now. [Ben Reesman]

Tony and I have settled on an interface to use for access to AMP data and AMP database information from C language callers, and I have written bindings for it in PHP as well. I have written a reference implementation for this library which allows its use under current AMP implementations, though it will be modified in the future to support changes to the AMP server implementation.

This library provides the following services:

1. Query the database for information about meshes and machines in those meshes.

2. Retrieve measurement data from the AMP server from any of the test types, machine combinations, and time ranges.

3. Operate correctly across time zones, specified separately for the AMP server data as well as the user.

4. Cache said data aggressively enough to allow for real-time usage from live Web-page generating scripts and services.

This library will provide the foundation for all future development of AMP data pages and other live-content pages on the AMP Website, and any custom AMP site deployed by users of the amplet code. [Ben Reesman]
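
Services 3 and 4 in the list above (time zone handling and caching) can be sketched together. This is not the AMP library or its interface: the function names, the fixed UTC offsets, the site names, and the per-day fetch granularity are all assumptions chosen to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone
from functools import lru_cache

SERVER_TZ = timezone(timedelta(hours=-8))  # assumed storage time zone of the server
USER_TZ = timezone(timedelta(hours=12))    # assumed time zone of the person querying

fetch_calls = 0  # counts how often the (pretend) server is actually contacted


@lru_cache(maxsize=256)
def fetch_day(test_type, src, dst, server_day):
    """Hypothetical fetch of one server-local day of measurements; results are cached."""
    global fetch_calls
    fetch_calls += 1
    return (test_type, src, dst, server_day)  # placeholder for real measurement data


def server_days_for(user_start, user_end):
    """Map a user-local time range onto the server-local calendar days covering it."""
    day = user_start.astimezone(SERVER_TZ).date()
    last = user_end.astimezone(SERVER_TZ).date()
    days = []
    while day <= last:
        days.append(day.isoformat())
        day += timedelta(days=1)
    return days


def query(test_type, src, dst, user_start, user_end):
    """Collect every server-local day needed to answer a user-local query."""
    return [fetch_day(test_type, src, dst, d) for d in server_days_for(user_start, user_end)]


start = datetime(2004, 3, 10, 0, 0, tzinfo=USER_TZ)
end = datetime(2004, 3, 10, 23, 59, tzinfo=USER_TZ)
first = query("rtt", "site-a", "site-b", start, end)
second = query("rtt", "site-a", "site-b", start, end)  # served from the cache
print(server_days_for(start, end), fetch_calls)
```

A single calendar day for the user spans two server-local days here, which is the kind of mismatch a time-zone-aware interface has to hide; the second identical query costs no further fetches.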

I spent time trying to debug the tracefile reading code in the library and thoroughly test the remaining methods. The hops in the tracefile were being garbled before making it into the testing script. I found and fixed several bugs. I also autotooled the library from scratch. [Ben Reesman]

Talked with Ben about the PHP data and database accessing libraries he is working on. Also talked to him for a couple of hours about the data interface and the splash page. [Tony McGregor]

Ben and I spent a couple of hours on the phone discussing the new data interface library for C and PHP. It looks like we have got a first attempt pretty well specified and he is confident of implementing it over the next few days. [Tony McGregor]

Ongoing measurement and analysis, networked data, and infrastructure support

~ Servers, system disk, and upgrades - AMP

I did some initial performance tests between the two new AMP servers. The FreeBSD one is beating the Debian one well and truly. However, the Debian one is running on a 2.2.20 kernel, so I have been trying to build a new kernel for it. That has been proving to be problematic without being there to see the error messages. [Tony McGregor]

The new AMP/VOLT server installation is proceeding. The final step is the port assignment and connection to the 198.202.74 network. That is in process. [Bud Hale]

Jim and I got the new AMP and VOLT servers installed and connected this week. At week's end Jim was doing the final install and configuration on the FreeBSD unit. As requested by Tony, they have minimal installs of FreeBSD and Debian Linux. [Bud Hale]

The new AMP/VOLT servers are installed and working. Jim worked out the details of Tony's access to them. I did some more research on 10GigE PCIx network interface cards. This is in support of the possible proposal to create a 10GigE AMP monitor. [Bud Hale]

As discussed in earlier reports, new servers to replace the AMP and VOLT servers are installed in the NLANR racks in the machine room. At this time Tony is evaluating and testing to find the best OS and configuration, after which those machines will replace the existing AMP and VOLT servers. This will relieve the need for such frequent archiving of accumulated AMP data to the HPSS. [Bud Hale]

NetOps has completed the router assignments we need to complete the installation of the new Debian and FreeBSD AMP servers. [Jim Hale]

I worked on some details on the new AMP servers. Tony was unable to connect and there were some issues with the TCP Wrapper. [Jim Hale]

As reported last period, the issue of drivers for the 3Com NIC chip on the ASUS system board under test was resolved. Following that, Jim and I started a test of the board under the AMPlet configuration. That test has proceeded without incident and the board appears quite acceptable. Earlier this week I informed our sales representative at RackSaver to proceed with the existing order using the ASUS boards. On Friday, while I was in the field on HPWREN, that sales representative sent an email message reporting that this board had also become unavailable. [Bud Hale]

We learned Friday that the motherboards we ordered for the new AMP monitors have been discontinued (unbelievable). I suspect the boards we use to replace this board will function with the AMP software after Matt's upgrade of the BGE driver. [Jim Hale]

Jim and I continue to search for a replacement system board for AMPlets. With the response from RackSaver Inc. being less than hoped for, we started looking at an alternate supplier. That alternate supplier is System Design Computer (sdcom.com), on Miramar Road. This supplier is about the size that RackSaver was when Hans-Werner first developed them as a supplier. Even though RackSaver prices have remained competitive, I feel they may have outgrown us and are not able to give us the service and support they did in the past. sdcom.com has suggested a Gigabyte board, the 8I845GV. They supplied a board and Jim and I placed it in test. It is working well. It uses the Intel 845 chip set, but the onboard NIC is the Realtek 100 Mbit instead of the Broadcom Gigabit we had hoped for. AMPlets on gigabit interfaces will require add-in NICs as in the past. [Bud Hale]

We are finally moving ahead with the acquisition of AMPlet monitor machines. As previously reported, the acquisition will be from SDCom (System Design Computers) at 8282 Miramar Road, San Diego, 92126. Attempts to acquire these machines from RackSaver have been canceled. Machines of this first acquisition are intended for the I2 sites. If this order goes smoothly and on time, I will acquire an additional five units as recommended by Tony and Ronn. [Bud Hale]

AMP and VOLT data disk fill is going as expected. [Bud Hale]

~ Servers, system disk, and upgrades - PMA

With Jim and Ronn we have discussed a proposal for the next pma.nlanr.net (server), and we'll be building something similar to the machine that we had discussed with Hans-Werner last week in Ramona, based upon an Opteron and SATA RAID. [Jörg Micheel]

We installed additional memory in the PMA server to deal with its really inadequate processing capabilities. I also tried to add an additional processor, but for some reason the motherboard refused to accept it. I have been getting some quotes together for possible replacements for the PMA server. [Jim Hale]

I am in the process of collecting prices, parts and permissions for the assembly of the new PMA server. Jörg has put together a very interesting list of components. I am looking forward to starting work on this machine. I am expecting to have it installed in a little over a week. [Jim Hale]

Worked to implement Jörg's request to increase the RAM of the PMA machine to two gigabytes. This is needed to support trace analysis processing on the machine. [Jim Hale, Bud Hale]

Continuing to pursue developing a test bench for the Dag cards and PMA monitors. Kevin Walsh configured the equipment Wednesday, and I finally began tests on the Endace Dag cards using the Spirent SMB 2000 unit. I was able to gain access to the testing unit, get it producing data, and collect the data on the Dag 3.5 cards installed in the AMPATH replacement machine and the Dag 3.2 cards installed in the new Old Dominion University machine. I am now shipping the new monitors to their respective sites to begin collecting. [Jim Hale]

Jim has accomplished the setup of the Spirent 2000 machine to test the Dag3.5 cards before PMA machines are deployed. This will be a big plus in the future to know that the Dag interfaces are working before the machines are deployed to participating sites. The Dag interfaces have now been tested for replacement passive monitors for the nai-p-amp (AMPATH GigaPop in Miami) and nai-p-odu (Old Dominion U.) in Norfolk, Virginia. Jim has received a response from Sheila Beilsmith, the ODU technician. She said they are moving to a new building where they expect to install the replacement machine. Both those machines are now being packed for shipment. [Bud Hale]

This week with Jörg Micheel was extremely productive. A lot of information was shared. We spent time on inventory and objectives. I got a great feel for Jörg's thinking, and I think Jörg conveyed a strong grasp of future planning. I look forward to some very interesting projects ahead. [Jim Hale]

Existing measurement sites maintenance and troubleshooting:

A total of 23 remote sites in the NAI infrastructure received attention during this period: 15 have been resolved and the monitors are again collecting data. 8 were still being investigated, or pending site action, at the end of the period. (Outages are considered "open" until the monitor is again collecting data.)

AMP -  14 problem sites:  11 resolved, 3 open
PMA -  9 problem sites:  4 resolved, 5 open
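
As a quick consistency check on the figures above, the per-system counts can be summed and compared against the reported totals. This is a small Python sketch; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outages:
    resolved: int
    still_open: int

    @property
    def total(self):
        return self.resolved + self.still_open


# Counts as reported for this period.
REPORTED = {"AMP": Outages(resolved=11, still_open=3),
            "PMA": Outages(resolved=4, still_open=5)}


def summarize(by_system):
    """Add up resolved and open outages across all systems."""
    resolved = sum(o.resolved for o in by_system.values())
    still_open = sum(o.still_open for o in by_system.values())
    return {"total": resolved + still_open,
            "resolved": resolved,
            "open": still_open}
```

Calling `summarize(REPORTED)` gives 23 sites in total, 15 resolved and 8 open, which matches the summary paragraph.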

~ AMP machines

Site amp-arizona (U. of Arizona) corrected an ICMP echo request blockage that was recently installed. [Bud Hale]

Following a power failure at amp-fiu (Florida International U.), the AMP monitor there failed to come back up. Investigation revealed the power supply had failed. The speculation is that the power supply failure was caused by transients during the power failure. I had RackSaver ship a replacement supply, and the site technician had the monitor back online the following day. [Bud Hale]

Sites amp-bcm (Baylor College of Medicine) and amp-missouri (U. of Missouri) were down. Site technicians investigated. It was later discovered that amp-bcm had started an equipment move at the end of last week that took the monitor offline. The move was completed early this week and amp-bcm was restored. [Bud Hale]

Site amp-aarn (Australia Research Network) was down for a short time this week. However, it came back online a short time after a message was sent to site tech Bruce Morgan requesting help. [Bud Hale]

Site amp-memphis (U. of Memphis) experienced a router power supply failure, taking both the AMP and PMA monitors offline. The router was soon repaired and the NLANR monitors were back online. [Bud Hale]

Site amp-unin (UNInet in Thailand) went down early in the week. It was restored shortly after I sent a message to site people. I have not had a complete report of the cause.  [Bud Hale]

Site amp-ncar (National Center for Atmospheric Research) went down on Friday of last week. Scot Colburn at that site is working on the problem. This started as merely a bad chassis fan creating enough noise to cause the site people to power the machine down. A replacement fan was shipped and installed. However, when the machine was powered up it appeared to turn on correctly but did not come back online. It continued to be offline due to ongoing site power reconfiguration. That was completed soon after and the machine was restored. [Bud Hale]

Site amp-asu (Arizona State U.) had an outage which turned out to be merely an erroneous power shutoff, and was corrected quickly. [Bud Hale]

Site amp-dartmouth (Dartmouth U.) was down. Investigation revealed a router problem at the site that was quickly corrected.  [Bud Hale]

There was a short outage on the AMP site amp-eltn (ELTENet, Hungary). It was back online shortly after I sent a message to the site technician. I have not yet had an answer to my inquiry as to the cause.  [Bud Hale]

AMP site amp-msoe (Milwaukee School of Engineering) had a short outage due to a configuration error of the default router. The error sent that network offline, but the problem was corrected as a result of the AMP data. [Bud Hale]

Otherwise, on the AMP sites, I am investigating some site-to-site outages such as amp-csu-sb (Cal. State, Santa Barbara) to amp-caltech (Calif. Tech). These sites are not currently exchanging traffic. [Bud Hale]

~ PMA machines

Jim has sent a few notes about the status of various monitors and it appears we are having difficulties pushing the subject due to some technical constraints. I am investigating whether there is a chance for me to visit SDSC for a week later in March to work on those issues, in preparation for the trip in the second half of April. [Jörg Micheel]

I have used the window of opportunity to come over to San Diego for four days to have a look at the gear and spend time with everyone to work on current issues, in particular with Jim to get the infrastructure into shape. [Jörg Micheel]

Tuesday I spent most of the afternoon going through various project management and budgeting questions. Wednesday and Thursday we worked on SDSC infrastructure. Jim and I tracked down the problem with the second OC192MON to a cable fault in one leg of one of the fibre runs from the splitter to the NLANR rack. Once we had figured out that we cannot replace it easily (ENS has no spares), we decided to leave it at that, for two reasons: we want to ship the second OC192MON off site shortly anyway, and with SDSC now maintaining three load-balanced OC192c links, we could not take a complete snapshot of the traffic anyway. [Jörg Micheel]

A survey of the existing gear gave the following insight [Jörg Micheel, Jim Hale]:

OC3/OC12MONs: cards for 3 systems available (1 system ready to ship)
OC48MONs: cards for 4 systems available (1 system ready to ship)
GIGEMONs: cards for 2 systems available (no systems)

We also have stock of UoWaikato DAG3.2 cards left, and we are considering not using them unless unavoidable, as these are now three years old, are out of warranty and support, and would not do any of the modern work we are interested in, such as real-time analysis. We are short on splitters of any type, should requests come in. [Jörg Micheel]

At present, my plan is to use the return from PAM2004 for a visit to the following sites: Indianapolis (Abilene instrumentation and Purdue), Pittsburgh (Supercomp. Center, to initialize the OC48MON we sent there recently), Ann Arbor (OC3MON at the Internet2 facility, formerly the ADV monitor we had in White Plains), University of Florida at Gainesville (FLA) and the new AMPATH monitor we are about to ship (Miami, FL). It appears that the monitors in Texas (TXG, TXS) do not need any attention. Jim is also working to get the ODU monitor out (it is sitting right in front of us here). [Jörg Micheel]

Jim and I have been busy chasing sites on the installation and replacement of PMA monitors. We have been partly lucky, with positive responses from ODU and FLA. No feedback from Internet2 on the Abilene backbone instrumentation, or from PNWGP, thus far. [Jörg Micheel]

Old Dominion University (nai-p-odu) is having operating system problems. This is one of the original PMA monitors. We are replacing this monitor with an upgraded system. The monitor is ready to ship out to replace the failed unit there; I am just waiting on getting NetOps to set me up on the test equipment. I offered that NLANR might be willing to purchase the operating software for running the test gear, but Kevin Walsh informs me that is not a reasonable option. I confirmed that Sheila Beilsmith of Old Dominion University is expecting the new monitor. It turns out the data center at ODU is moving to a new building and the new monitor will not be installed until about the end of the month. [Jim Hale, Bud Hale]

Merit (nai-p-mra) was down for a little bit. As soon as I saw it down, I was able to get it collecting again. [Jim Hale]

National Center for Atmospheric Research (nai-p-ncg). This monitor is suffering operating system failures. I have asked Scot Colburn and Donnie Sakosky to pack the monitor up and send it back for repairs; it should arrive soon. I am very anxious to get it returned and collecting. [Jim Hale, Bud Hale]

The monitor at the Front Range Gigapop has ceased to collect. I asked Scot Colburn and Donnie Sakosky to try to ping the monitor locally and if they could not to reboot the unit. They rebooted the unit remotely and conditions did not improve. Scot informed me that Donnie would be going to the Gigapop and he would have a look and let me know the condition. [Jim Hale]

University of Memphis (nai-p-mem) ran into hardware failures. After I noticed the PMA monitor had stopped collecting data, I got in contact with Chandrathilaka Wanigasekara, the technician at the University. Chandra informed me of a power supply failure in their router and said it would be at least a day until repairs could be made. The repairs took slightly longer than expected, but they have now been made and the PMA monitor is back up. This hardware problem also affected the AMP monitor at the same site. [Jim Hale]

As mentioned earlier, both the AMP and PMA machines at nai-p-mem (U. of Memphis) were taken down by the router failure. Jim coordinated the restoration with site people. [Bud Hale]

The situation at the Texas Gigapop (nai-p-txg) has gotten interesting. All of our records show the monitor was to be an OC12 monitor, but while working with Jason Tasker and investigating the lack of data on the unit, I found out that we are only monitoring an OC3 link. I have asked Jason a few questions, though his time to work on this issue seems thin. I am still pursuing the possibility that there is an OC12 line that we are supposed to be connected to, with a treasure of data; I just need to find the vein. Jason was able to get traffic trace collection going. The monitor collected for about 24 hours, then stopped. After a communication with Jason this afternoon and a "wiggle" on the fiber connection, traffic collection resumed. Jason is planning to replace the fiber optic cable while I monitor the traffic remotely. I followed Jason's extended assistance in troubleshooting, and it does seem to have been a cable problem. Now that traffic has become evident, it appears some machine issues have developed. [Jim Hale]

After much effort and coordination with the site technician at the nai-p-txg (Texas GigaPop) site, Jim accomplished more diagnosis and resolved the problem there. It turned out to be a bad fiber patch cord. Jim replaced the patch cord and data collection resumed. Simple and small problems can take much effort. [Bud Hale]

The new monitor for nai-p-amp (AMPATH in Miami, Florida) is ready and awaiting shipment. Once testing on the Dag interface cards is done this unit will also be replaced. It was confirmed that Ernesto Rubi of AMPATH at Florida International University is expecting the new monitor. [Jim Hale, Bud Hale]

I pursued the sudden lack of traffic on the link to the GigE monitor at SDSC (nai-p-sda). There I looked at all the possibilities. It is at times like this when I really look forward to Jörg's visit next week to get hands-on experience from a master. It turns out the lack of traffic showing on the monitor was due to a switch on the Dag 4.3 GigE card. Once the "norxpkt" switch was turned to "rxpkt", the monitor began showing traffic. [Jim Hale]

I have been chasing the infrastructure and made several changes to the PMA monitor pinging and reporting. The SDA machine (SuperMicro 1U, SATA based) seems to be stable, and I have integrated it into the 8x90 collection. I have also aliased the OC192-b machine at SDSC as TRG (TeraGrid), and that one is collecting as well. [Jörg Micheel]

I am working to get PMA machines returned from nai-p-tau (Tel Aviv U.), MAX GigaPop in DC and nai-p-buf (U. of Buffalo). Joe Pautler at U. of Buffalo promised that one soon. [Bud Hale]

~ Management and administrative

AMP FTE:  

Composed the various letters and boilerplate language that I need for the AMP FTE. Sent announcement emails about the posting to my list of campus contacts, the CS, Math, and CogSci departments, and Career Services, as well as to those possible applicants who had responded to my preliminary search around the holidays. Worked with Tony and improved my information paragraph regarding the coding skills test. So far two of our original responders have contacted me regarding the posting; one of them is a candidate that we rated pretty highly before. [Maureen Curran]

Tracked down the two preferential rehires and contacted them by email and phone. (This was almost a "Roseanne RosannaDanna - it is always something" experience, sort of a dueling HRs thing.) One does not appear to be the right fit; he stated on his resume that he had limited experience with C/C++. Of course, he called me and expressed an interest in taking the coding skills test (which I had offered). He did not have time this week, but will call me Monday. [Maureen Curran]

Maureen and I pre-interviewed one of the preferential re-hire candidates. He will do the programming test Sunday. I scanned some of the other applicants but did not get to go through them all. Maureen will get some of them in to do the programming test while I am offline next week. [Tony McGregor]

I tidied up a few issues with the coding skills test and answered some questions for a couple of the candidates who have taken the test. Also talked with Maureen quite a bit about the test, what to do with non-local people etc. We seem to still have a reasonable pool of applicants. [Tony McGregor]

Have been very busy with the AMP FTE. There was tons of wrangling with HR (both UCSD and SDSC), and SDSC's is short-staffed since Sharon left for another position. Lots of confusion and lost time, but I finally got access to the 16 applicants for the job, after partially handling the preferential rehires. I reviewed the 16 applicants and made determinations of their appropriateness for the position. Developed three categories, wrote the relevant new boilerplate emails, and sent them to all. This of course released a ton of follow-up emails and phone calls. As I began arranging the testing times, I realized that my annotated map and instructions on how to come to take the test were based on our old location in the SDSC building, so I created a new annotated map and instructions for our current location. [Maureen Curran]

In advance of administering the first of the coding skills tests, I worked with Tony, Bud, and Ben (who test drove the test for us) regarding several aspects of the test from getting scripts to work, policies (re standard libraries, and other questions) to logistics (thanks to Todd, Bud, Jim, et al. who got the flicker on the machine fixed as well as Bud and Jim who revived the mouse, the machine may have been jarred by custodians). I have administered the programming test to two applicants (worked out some kinks with the first one). It seems to be going smoothly now, long, but smoothly. I have 5 more tests (4+ hours each) lined up for the coming week, including one each on Saturday and Sunday. Worked with Tony on a plan for the out of state applicants who want to take the test. Wrote a follow-up boilerplate email to send to test takers asking how they thought they did (and any suggestions to improve). [Maureen Curran]

Of the 16 applicants, 6 are in the "look very good" category, 8 are in the "may be good" category, and 2 are in the "likely not qualified" group. Five of the six "look very good" candidates have already contacted me and want to pursue the position (one has taken the test and three have times arranged); two applicants from the "may be good" group have test times scheduled. [Maureen Curran]

Administered the coding skills test to 6 more applicants. Because the first few applicants took up to seven or so hours each and we told applicants that there is not a time limit per se, I only schedule one test per day (must be fair across the board - quite frankly it never occurred to me that folks would be willing to invest that much time). [Maureen Curran]

Contact with Thelma Vanesiac - the AMP FTE position was classified at the PAIII level. [Ronn Ritke]

I have been working this week with Suzanne Lee from Prolificx and Sheila Cullom from FedEx regarding an issue with New Zealand Customs. It appears there is some confusion about the purchase and shipping back and forth between SDSC and Endace of the OC192 cards. This is apparently an issue between Endace and New Zealand customs that could wind up costing NLANR a lot of money. I have spent a considerable amount of time going through what records I have and I have been in contact with Endace trying to get records from them. Hopefully I have provided what was needed and this transaction will be over. [Jim Hale]

Meeting with Bill Decker at UCSD to review copyright and other legal issues regarding Tony's planned AMP software release. [Ronn Ritke]

Ronn and I met at SDSC with Alan Blatecky and Vijay Samalam to discuss future NLANR strategies. [Hans-Werner Braun, Ronn Ritke]

Weekly NLANR/MNA managers conference calls. [Hans-Werner Braun, Ronn Ritke, Tony McGregor, Jörg Micheel]

I also spent some time with a Law lecturer discussing the new anti-hacker legislation in NZ. Unfortunately, it seems very poorly drafted and appears to make all measurement illegal. The lecturer said that the law is so unclear that no one will know whether that is the case until the courts make some determinations on the law. That leaves us in a bit of a limbo. He thought the courts probably wouldn't interpret the law that way, even though it was what the law actually said. [Tony McGregor]

- 30 -

last modified: 10 May 2004