Overview
"Because, sometimes, the Internet doesn't quite work..."
The MIT RON (Resilient Overlay Networks) project is a DARPA-funded effort
motivated by the desire to improve the robustness and availability of Internet
paths between hosts by an order of magnitude over today's wide-area Internet
routing infrastructure. The key design goal in RON is to develop techniques
to allow end-hosts and applications to cooperatively gain improved reliability
and performance from the Internet. At a glance, RON nodes examine the condition
of the Internet between themselves and the other nodes, and, based upon
how the network looks, decide if they should let packets flow directly to
other nodes, or if they should send them indirectly via other RON nodes. For
instance, a group of cooperating systems can mutually provide a more
available and better-performing routing service than what vanilla Internet
routing can provide.
RON is an architecture that allows a small group of distributed Internet
applications to detect and recover from path outages and periods of degraded
performance within several seconds, improving over today's wide-area routing
protocols that take at least several minutes to recover. A RON is an application-layer
overlay on top of the existing Internet routing substrate. The RON nodes
monitor the functioning and quality of the Internet paths among themselves,
and use this information to decide whether to route packets directly over
the Internet or by way of other RON nodes, optimizing application-specific
routing metrics.
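The path-selection idea described above can be illustrated with a minimal sketch. It is not the RON implementation: the function name `best_path`, the probe table layout, the use of latency as the only metric, and the limit of one intermediate node are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical probe table: latency_ms[(a, b)] holds the latest measured
# latency from a to b, or None if recent probes were lost (treated as an outage).
Latency = Dict[Tuple[str, str], Optional[float]]


def best_path(src: str, dst: str, nodes: List[str],
              latency_ms: Latency) -> Optional[List[str]]:
    """Pick the lowest-latency path that uses at most one intermediate node.

    Returns None when neither the direct path nor any one-hop detour works.
    """
    candidates = []

    # Direct Internet path, if probes say it is up.
    direct = latency_ms.get((src, dst))
    if direct is not None:
        candidates.append((direct, [src, dst]))

    # Indirect paths through each other overlay node (assumed limit: one hop).
    for hop in nodes:
        if hop in (src, dst):
            continue
        first = latency_ms.get((src, hop))
        second = latency_ms.get((hop, dst))
        if first is not None and second is not None:
            candidates.append((first + second, [src, hop, dst]))

    if not candidates:
        return None
    # min() keeps the first of equal entries, so ties favor the direct path.
    return min(candidates, key=lambda c: c[0])[1]


if __name__ == "__main__":
    nodes = ["A", "B", "C"]
    probes: Latency = {
        ("A", "B"): None,   # direct path is down
        ("A", "C"): 20.0,
        ("C", "B"): 30.0,
    }
    print(best_path("A", "B", nodes, probes))
```

An application with different needs could swap the latency sum for another metric, such as loss rate or throughput, without changing the overall structure of the decision.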
The RON project has several components, including:
- Overlay configuration and maintenance
- Probing and outage detection
- Routing around outages and performance failures
- Application-controlled routing
- Policy routing
- Multi-path routing; QoS routing
- Data forwarding
- API and RON libraries
- Applications (e.g., resilient VPN, resilient conferencing, etc.)
- Data analysis and understanding wide-area routing and fault-tolerance
behavior; BGP interactions
- Simulations of RON behavior
RON is part of a larger research agenda on large-scale, robust,
Internet-based distributed systems, which spans areas ranging from
resilient routing (as in RON) to emerging peer-to-peer systems. Our
work on peer-to-peer systems is based on Chord, a scalable p2p lookup
service.
RON is also closely related to other current projects at LCS in the
area of robust Internet infrastructures and uses some of the ideas
from these projects: CM, the
Internet Congestion Manager; and Click-SMP, a modular
PC-based router.
RON data, Internet experiments
RON deployment sites
Since early 2001, we have run a real-life RON, which now has 17 sites located
around the Internet. Our deployment is international. We have
also collected extensive data sets and analyzed them. They will soon
be made publicly available on this page.
Papers
- Scaling All-Pairs Overlay Routing
David Sontag, Yang Zhang, Amar Phanishayee, David G. Andersen, David Karger
CoNEXT, Rome, Italy, December 2009.
- Measuring the Effects of Internet Path Faults on Reactive Routing
Nick Feamster, David Andersen, Hari Balakrishnan, and Frans Kaashoek
ACM SIGMETRICS 2003, San Diego, CA, June 2003.
Presentation
- Mayday: Distributed Filtering for Internet Services
David G. Andersen
4th USENIX Symposium on Internet Technologies and Systems,
Seattle, Washington, March 2003.
Presentation:
[Postscript (390k)]
[PDF (110k)]
- Topology Inference from BGP Routing Dynamics
David G. Andersen, Nick Feamster, Steve Bauer, and Hari Balakrishnan
2nd SIGCOMM Internet Measurement Workshop, Marseille, France, November 2002.
- Resilient Overlay Networks
David G. Andersen, Hari Balakrishnan, M. Frans Kaashoek, Robert Morris
Proc. 18th ACM SOSP, Banff, Canada, October 2001.
Presentation (PDF) (292 KB)
- DNS Performance and the Effectiveness of Caching
Jaeyeon Jung, Emil Sit, Hari Balakrishnan, and Robert Morris
Proc. 1st ACM SIGCOMM Internet Measurement Workshop, San Francisco,
CA, November 2001.
- Resilient Overlay Networks
David G. Andersen, SM Thesis, Massachusetts Institute of Technology, May 2001.
[Postscript (8.9 MB)]
[ps.gz (1.2 MB)]
[PDF (2.2 MB)] (86 pages)
- The Case for Resilient Overlay Networks
David G. Andersen, Hari Balakrishnan, M. Frans Kaashoek, and Robert Morris
Proc. HotOS VIII,
Schloss Elmau, Germany, May 2001. (best student paper award)
Presentation:
[Slides (ps)]
[Slides (pdf)]
[Notes (ps)]
[Notes (pdf)]
- Fine-Grained Failover Using Connection Migration
Alex C. Snoeren, David G. Andersen, and Hari Balakrishnan
Proc. 3rd USENIX USITS,
San Francisco, CA, March 2001.
(Also MIT-LCS-TR-812, September 2000.)
Talks
- Topology Inference from BGP Routing Dynamics. 2002 Internet Measurement Workshop.
[Postscript (400k)]
[PDF (150k)]
- RON: Choosing Resiliency. 2002 Opensig workshop, Lexington, KY. [Postscript (780k)]
[PDF (240k)]
- Resilient Overlay Networks,
18th SOSP, Lake Louise, Alberta, Canada, October 2001.
- Resilient Overlay Networks,
MIT LCS Annual Retreat, Cape Cod, June 2001.
- Resilient Overlay Networks,
DARPA PI Meeting, Colorado Springs, CO, July 2001.
- Slides from an old presentation comparing existing link probing mechanisms.
Resources
- RIPE NCC stores data about BGP routing table updates.
People
Projects
- The Detour Project
at the University of Washington. They developed "sting", which uses
TCP to determine forward and reverse path packet loss rates. There has
also been a small follow-on project to Detour by some of David Wetherall's
students to test Detour. They simulated some algorithms for forming the
routing topology:
[Orig ps]
[Local Mirror]
The projects list is also available.
There are some important differences between RON and
Detour. First, RON seeks to prevent disruptions in end-to-end
communication in the face of failures. RON takes advantage of
underlying Internet path redundancy on time-scales of a few seconds,
reacting quickly to path outages and performance failures.
Second, RON is designed as an application-controlled routing overlay;
because each RON is more closely tied to the application using it, RON
more readily integrates application-specific path metrics and path
selection policies. Third, we present and analyze experimental
results from a real-world deployment of a RON to demonstrate fast
recovery from failure and improved latency and loss-rates even over
short time-scales.
- The Berkeley SPAND project.
The Shared Passive Network Performance Discovery (SPAND) toolkit lets
applications measure and share performance information with other local
clients to make better guesses about, for example, which mirror site to
use.
The SPAND paper contains more information
[ps]
[local ps]
as does Mark Stemm's thesis
[html]
[ps]
[local ps].
- RAMP
Reliable Adaptive Multipath Routing, from UCSD.
Network Characterization
Measurement Tools
Overlay Networks
Funding
We gratefully acknowledge funding for RON from DARPA under the
Fault-Tolerant Networking (FTN) program of the ATO; the project is
supported by DARPA and the Space and Naval Warfare Systems Center
(SPAWAR), San Diego, under contract N66001-00-1-8933.