Introduction
The conventional method for downloading a file over a computer
network is to establish a single connection between a client
and a single server (e.g., FTP, HTTP). The total download time in
this scenario depends heavily on the network traffic along the single
route between the client and the server. During a file transfer,
throughput can drop significantly when a sudden surge of heavy traffic
crosses any point on the path. A connection may have to be
restarted when the congestion is too heavy or when any of the routers
along the path fails.
In the past, researchers have explored various techniques to make
connections "route around" pockets of congestion. Because traffic
patterns change over time, this is a very hard problem to solve.
More recently, mirror servers have been established to provide both
reliability and the opportunity to pick an optimal path for downloading
a file. Clients can choose from a list of one or more mirror
locations from which to download data. Ideally, they would choose
the site with the highest bandwidth and the least traffic. However,
accurate measurements and predictions of these metrics are difficult
to obtain, so clients often end up choosing a site at random. Even
when the "optimal path" has been chosen, throughput can still drop in
the face of changing traffic patterns.
Because network resources are scarce, the research community
has focused much attention on improving router performance and
end-to-end congestion control/avoidance algorithms. However, very
little work has been done on accessing data in parallel at the
wide-area network level to increase the performance of downstream
data transfers. By downloading different fragments of a file in
parallel from multiple servers and adapting to the changing conditions
of each path, a simple system can exploit the aggregate bandwidth of
the multiple network paths and avoid pockets of heavy network
congestion.
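To make the idea concrete, here is a minimal sketch of the simplest,
non-adaptive variant of such a scheme in Python, under the assumption
of a set of hypothetical mirror URLs that all serve an identical file
and support HTTP Range requests; it assigns one fixed fragment to each
mirror and fetches the fragments in parallel.

# Minimal paraloading sketch: fixed, equal-sized fragments, one per mirror.
# The mirror URLs are hypothetical placeholders, not real servers.
import concurrent.futures
import urllib.request

MIRRORS = [
    "http://mirror1.example.org/pub/file.tar.gz",
    "http://mirror2.example.org/pub/file.tar.gz",
    "http://mirror3.example.org/pub/file.tar.gz",
]

def fetch_range(url, start, end):
    """Fetch bytes [start, end] of the file from a single mirror."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return start, resp.read()

def paraload(mirrors, total_size):
    """Download one fragment from each mirror in parallel and reassemble."""
    frag = total_size // len(mirrors)
    with concurrent.futures.ThreadPoolExecutor(len(mirrors)) as pool:
        jobs = [
            pool.submit(fetch_range, url, i * frag,
                        total_size - 1 if i == len(mirrors) - 1
                        else (i + 1) * frag - 1)
            for i, url in enumerate(mirrors)
        ]
        pieces = sorted(j.result() for j in jobs)  # order fragments by offset
    return b"".join(data for _, data in pieces)

Because the fragment boundaries are fixed up front, this variant cannot
shift work away from a path whose throughput collapses mid-transfer;
that limitation is exactly what an adaptive scheme must address.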
Goals
The goal of this project is to first determine whether parallel data
transfer across multiple servers is a feasible and efficient way to
decrease download time. We will begin by conducting simple experiments
on real traffic, comparing the times to download files of various sizes
from various sets of mirror servers. Based on the results obtained, we
will explore optimization techniques that enhance the performance of
the paraloading scheme.
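As a rough illustration of this measurement methodology (not the
project's actual scripts), the sketch below times a whole-file download
from each mirror in turn, producing the single-server baselines against
which a paraloaded transfer of the same file can be compared. The
mirror URLs are again hypothetical.

# Time full single-server downloads to establish per-mirror baselines.
# Real experiments would repeat each measurement at different times of
# day to capture changing traffic conditions.
import time
import urllib.request

MIRRORS = [
    "http://mirror1.example.org/pub/file.tar.gz",
    "http://mirror2.example.org/pub/file.tar.gz",
    "http://mirror3.example.org/pub/file.tar.gz",
]

def time_download(url):
    """Return (elapsed seconds, bytes received) for one full download."""
    t0 = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        nbytes = len(resp.read())
    return time.monotonic() - t0, nbytes

for url in MIRRORS:
    elapsed, nbytes = time_download(url)
    print(f"{url}: {nbytes} bytes in {elapsed:.2f} s "
          f"({nbytes / elapsed / 1024:.1f} KB/s)")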
We will then use ns to generate simulations based on the results we
have gathered and perform further analysis to explore how we can build
a system in which each network connection adapts to the dynamics of
the network. Because opening multiple connections tends to be
"aggressive" in consuming network resources, we will also use
simulations to examine whether this impact is severe.
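One plausible way to realize this per-connection adaptation (a sketch
under our own assumptions, not a mechanism the project has settled on)
is to divide the file into many small blocks and let each connection
claim the next unclaimed block as soon as it finishes its current one,
so that faster paths automatically carry more of the file:

# Adaptive paraloading sketch: many small blocks on a shared work queue.
# Faster connections drain the queue faster, so load balances itself.
# Block size and mirror URLs are illustrative choices, not measured values.
import concurrent.futures
import queue
import urllib.request

BLOCK = 64 * 1024  # 64 KB: small enough to rebalance quickly as paths change

def worker(url, blocks, results):
    """Claim and fetch blocks from the shared queue until none remain."""
    while True:
        try:
            start, end = blocks.get_nowait()
        except queue.Empty:
            return
        req = urllib.request.Request(url,
                                     headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            results[start] = resp.read()

def adaptive_paraload(mirrors, total_size):
    """Download a file of total_size bytes using one connection per mirror."""
    blocks = queue.Queue()
    for start in range(0, total_size, BLOCK):
        blocks.put((start, min(start + BLOCK, total_size) - 1))
    results = {}
    with concurrent.futures.ThreadPoolExecutor(len(mirrors)) as pool:
        futures = [pool.submit(worker, url, blocks, results) for url in mirrors]
        for f in futures:
            f.result()  # surface any per-connection errors
    return b"".join(results[s] for s in sorted(results))

With small blocks, a path whose throughput collapses simply stops
claiming work, which approximates "routing around" congestion without
requiring any explicit bandwidth measurement or prediction.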
People:
Allen Miu
Eugene Shih
Hari Balakrishnan
Papers
Miu, A., Shih, E., 6.892 Project Proposal, October 1999.
Miu, A., Shih, E., Performance Analysis of a Dynamic Parallel
Downloading Scheme from Mirror Sites Throughout the Internet, 6.892
Term Paper, December 1999.
Related links:
1. Digital Fountain (research on using Tornado codes for reliable data
distribution)
2. Performance Characteristics of Mirror Servers on the Internet (CMU)
3. On Individual and Aggregate TCP Performance (Cornell)
4. Client Side Load Balancing (Open Group Research Institute)
5. Mirror, Mirror on the Web: A Study of Host Pairs with Replicated
Content (Compaq Systems Research Center)