Introduction
The conventional method for downloading a file over a computer
network is to establish a single connection between a client
and a single server (e.g., FTP, HTTP). The total download time in
this scenario depends heavily on the network traffic along the single
route between the client and the server. During a file transfer,
throughput can drop significantly when a sudden surge of heavy traffic
crosses any point on the path. A connection may have to be
restarted when the congestion is too heavy or when any of the routers
along the path fails.
In the past, researchers have explored various techniques to make
connections "route around" pockets of congestion. Because traffic
patterns change over time, this is a very hard problem to solve.
More recently, mirror servers have been established to provide both
reliability and the opportunity to pick an optimal path for downloading
a file. Clients can choose from a list of one or more mirror
locations from which to download data. Ideally, they would choose
the site with the highest bandwidth and the least traffic. However,
accurate measurements and predictions of these metrics are difficult
to obtain, so clients often end up choosing a site at random. Even
when the "optimal path" has been chosen, throughput can still drop in
the face of changing traffic patterns.
Because network resources are scarce, the research community
has focused much attention on improving router performance and
end-to-end congestion control/avoidance algorithms. However, very
little work has been done on accessing data in parallel at the
wide-area network level to increase the performance of downstream
data transfers. By downloading different fragments of a file in
parallel from multiple servers and adapting to the changing conditions
of each path, a simple system can exploit the aggregate bandwidth of
the multiple network paths and avoid pockets of heavy network
congestion.
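To make the idea concrete, here is a minimal sketch of the simplest,
non-adaptive variant of such a scheme in Python, under the assumption
of a set of hypothetical mirror URLs that all serve an identical file
and support HTTP Range requests; it assigns one fixed fragment to each
mirror and fetches the fragments in parallel.

# Minimal paraloading sketch: fixed, equal-sized fragments, one per mirror.
# The mirror URLs are hypothetical placeholders, not real servers.
import concurrent.futures
import urllib.request

MIRRORS = [
    "http://mirror1.example.org/pub/file.tar.gz",
    "http://mirror2.example.org/pub/file.tar.gz",
    "http://mirror3.example.org/pub/file.tar.gz",
]

def fetch_range(url, start, end):
    """Fetch bytes [start, end] of the file from a single mirror."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return start, resp.read()

def paraload(mirrors, total_size):
    """Download one fragment from each mirror in parallel and reassemble."""
    frag = total_size // len(mirrors)
    with concurrent.futures.ThreadPoolExecutor(len(mirrors)) as pool:
        jobs = [
            pool.submit(fetch_range, url, i * frag,
                        total_size - 1 if i == len(mirrors) - 1
                        else (i + 1) * frag - 1)
            for i, url in enumerate(mirrors)
        ]
        pieces = sorted(j.result() for j in jobs)  # order fragments by offset
    return b"".join(data for _, data in pieces)

Because the fragment boundaries are fixed up front, this variant cannot
shift work away from a path whose throughput collapses mid-transfer;
that limitation is exactly what an adaptive scheme must address.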
Goals
The goal of this project is to first determine whether parallel data
transfer across multiple servers is a feasible and efficient way to
decrease download time. We will begin by conducting simple experiments
on real traffic, comparing the times to download files of various sizes
from various sets of mirror servers. Based on the results obtained, we
will explore optimization techniques that enhance the performance of
the paraloading scheme.
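As a rough illustration of this measurement methodology (not the
project's actual scripts), the sketch below times a whole-file download
from each mirror in turn, producing the single-server baselines against
which a paraloaded transfer of the same file can be compared. The
mirror URLs are again hypothetical.

# Time full single-server downloads to establish per-mirror baselines.
# Real experiments would repeat each measurement at different times of
# day to capture changing traffic conditions.
import time
import urllib.request

MIRRORS = [
    "http://mirror1.example.org/pub/file.tar.gz",
    "http://mirror2.example.org/pub/file.tar.gz",
    "http://mirror3.example.org/pub/file.tar.gz",
]

def time_download(url):
    """Return (elapsed seconds, bytes received) for one full download."""
    t0 = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        nbytes = len(resp.read())
    return time.monotonic() - t0, nbytes

for url in MIRRORS:
    elapsed, nbytes = time_download(url)
    print(f"{url}: {nbytes} bytes in {elapsed:.2f} s "
          f"({nbytes / elapsed / 1024:.1f} KB/s)")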
We will then use ns to generate simulations based on the results we
have gathered and perform further analysis to explore how we can build
a system in which each network connection adapts to the dynamics of
the network. Because opening multiple connections tends to be
"aggressive" in consuming network resources, we will also use
simulations to examine whether this impact is severe.
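One plausible way to realize this per-connection adaptation (a sketch
under our own assumptions, not a mechanism the project has settled on)
is to divide the file into many small blocks and let each connection
claim the next unclaimed block as soon as it finishes its current one,
so that faster paths automatically carry more of the file:

# Adaptive paraloading sketch: many small blocks on a shared work queue.
# Faster connections drain the queue faster, so load balances itself.
# Block size and mirror URLs are illustrative choices, not measured values.
import concurrent.futures
import queue
import urllib.request

BLOCK = 64 * 1024  # 64 KB: small enough to rebalance quickly as paths change

def worker(url, blocks, results):
    """Claim and fetch blocks from the shared queue until none remain."""
    while True:
        try:
            start, end = blocks.get_nowait()
        except queue.Empty:
            return
        req = urllib.request.Request(url,
                                     headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            results[start] = resp.read()

def adaptive_paraload(mirrors, total_size):
    """Download a file of total_size bytes using one connection per mirror."""
    blocks = queue.Queue()
    for start in range(0, total_size, BLOCK):
        blocks.put((start, min(start + BLOCK, total_size) - 1))
    results = {}
    with concurrent.futures.ThreadPoolExecutor(len(mirrors)) as pool:
        futures = [pool.submit(worker, url, blocks, results) for url in mirrors]
        for f in futures:
            f.result()  # surface any per-connection errors
    return b"".join(results[s] for s in sorted(results))

With small blocks, a path whose throughput collapses simply stops
claiming work, which approximates "routing around" congestion without
requiring any explicit bandwidth measurement or prediction.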
People:
Allen Miu
Eugene Shih
Hari Balakrishnan
Papers
Miu, A., Shih, E., 6.892 Project Proposal, October 1999.
Miu, A., Shih, E., Performance Analysis of a Dynamic Parallel
Downloading Scheme from Mirror Sites Throughout the Internet, 6.892
Term Paper, December 1999.
Related links:
1. Digital Fountain (research on using Tornado codes for reliable data
distribution)
2. Performance Characteristics of Mirror Servers on the Internet (CMU)
3. On Individual and Aggregate TCP Performance (Cornell)
4. Client Side Load Balancing (Open Group Research Institute)
5. Mirror, Mirror on the Web: A Study of Host Pairs with Replicated
Content (Compaq Systems Research Center)