This paper describes our experiences with improving TCP and Web performance over a WaveLAN-based wireless network. In previous work, we analyzed the problems with TCP performance in error-prone wireless networks and designed the Berkeley Snoop protocol to significantly improve performance over such links [7, 5].
Most efforts to date have focused on improving performance for transfers to a mobile host. We present a novel protocol based on Explicit Loss Notification (ELN) to improve performance when the mobile host is the TCP sender, a situation that is becoming increasingly common. Then, we use experimental packet traces of wireless errors from a production wireless network to derive an empirical model of channel errors. We use this model to evaluate the performance of TCP Reno, TCP with Selective Acknowledgments (SACK), and the Snoop protocol for Web workloads to mobile hosts. We also discuss the scaling behavior of the Snoop protocol and reflect on some general lessons we have learned about efficient protocol design for reliable wireless transport.
Wireless technologies are playing an increasingly prominent role in the global Internet infrastructure. They are ideal as Internet access technologies, providing a convenient and cheap solution to the "last mile" problem. However, reliable transport protocols such as the Transmission Control Protocol (TCP) [22, 25], used by popular applications like the World Wide Web, file transfer, electronic mail, and interactive remote terminal applications, show greatly degraded performance over wireless networks. TCP has been tuned over many years for traditional networks comprising wired links and stationary hosts. It assumes that congestion in the network is the primary cause of packet losses and unusual delays, and adapts to it. The TCP receiver sends cumulative acknowledgments (ACKs) for successfully received segments, which the sender uses to determine which segments have been successfully received. The sender identifies the loss of a packet either by the arrival of several duplicate cumulative ACKs, triggering a fast retransmission, or by the absence of an ACK for a timeout interval equal to the sum of the smoothed round-trip delay and four times its mean deviation. TCP reacts to packet losses by retransmitting missing data, and simultaneously invoking congestion control by reducing its transmission (congestion) window size and backing off its retransmission timer. These measures reduce the level of congestion on the intermediate links.
Unfortunately, when packets are lost for reasons other than congestion, these measures result in an unnecessary reduction in end-to-end throughput and hence, in sub-optimal performance. Communication over wireless links is often characterized by high bit-error rates due to channel fading, noise or interference, and intermittent connectivity due to handoffs. TCP performance in such networks suffers from significant throughput degradation and very high interactive delays because the sender misinterprets corruption as congestion [see, e.g., 8].
Recently, several schemes have been proposed to alleviate the effects of non-congestion-related losses on TCP performance over error-prone networks [3, 7, 26]. In previous work, we compared many existing schemes and argued for transport-aware link protocols to improve performance. The end result of this work was a more adaptive TCP/IP protocol architecture that adapts not only to network congestion, but also to error-prone wireless links.
However, most design and measurement efforts to date have focused on the (common) case of data transfers to a mobile host. The case of a mobile host transmitting data over a first-hop wireless link to hosts on the wired Internet has largely been ignored. Such access scenarios are becoming increasingly common in the Internet; for example, researchers at Daimler Benz are working on a prototype of a Mercedes car that runs a Web server, which disseminates information over wireless and wired links about the status of the vehicle to users on the Internet.
In this paper, we present a novel protocol based on Explicit Loss Notification (ELN) to improve transport performance when the mobile host is the TCP sender, a situation that is becoming increasingly common in many wireless access networks (see the section "Transfers from a Mobile Host"). Then, we obtain experimental packet traces of wireless errors from a production outdoor wireless network deployed as part of the Reinas environmental monitoring project at UC Santa Cruz, and derive an empirical model of channel errors based on this data (see the section "Modeling Wireless Errors"). We use this model to evaluate the performance of TCP Reno, TCP SACK, and the Snoop protocol for an empirically derived Web workload to mobile hosts (see the section "Performance: Fixed to Mobile Host"). Finally, we reflect on some lessons learned about improving wireless transport performance in the face of wireless bit-errors (see the section "Discussion") and conclude with a summary and directions for future work (see the section "Summary").
The main advantage of employing a link-layer protocol for loss recovery is that it fits naturally into the layered structure of network protocols and achieves local reliability. However, there are several situations in which transport performance does not improve when such protocols are used. There are three chief modes of adverse interaction that could arise between independent reliable transport and link layers:
Indirect-TCP is a split-connection solution that uses standard TCP for its connection over the wireless link. Like other split-connection proposals, it attempts to separate loss recovery over the wireless link from that across the wireline network, thereby shielding the original TCP sender from the wireless link. However, as our experiments indicate, the choice of TCP over the wireless link results in several performance problems. Since TCP is not well-tuned for the lossy link, the TCP sender of the wireless connection often times out, causing the original sender to stall. In addition, every packet incurs the overhead of going through TCP protocol processing twice at the base station (as compared to zero times for a non-split-connection approach), although extra copies are avoided by an efficient kernel implementation. Another disadvantage of split connections is that the end-to-end semantics of TCP ACKs are violated, since ACKs for packets can now reach the source even before the packets actually reach the mobile host. This makes the system vulnerable to base station crashes. Also, since split-connection protocols maintain a significant amount of per-connection state at the base station, handoff procedures tend to be complicated and slow.
The main advantage of this approach is that it suppresses duplicate ACKs for TCP segments lost and retransmitted locally, thereby avoiding unnecessary fast retransmissions and congestion control invocations by the sender. The per-connection state maintained by the snoop agent at the base station is soft, and is not essential for correctness. A detailed description of the Snoop protocol appears in our earlier work.
We now proceed to discuss the case of data transfer from a mobile host. While this has conventionally been a less common scenario for wireless information access, it is rapidly gaining in importance. There is an increasing number of Web and other information servers that serve data across first-hop wireless links [13, 9]. A representative topology for this scenario is shown in the figure "Topology for data transfer from a mobile host."
It might be tempting to conclude that a protocol very similar to the snoop protocol for data transfer to the mobile host, described earlier, can be used in this case as well. In particular, one of the important features of that case was that it required changes only to the base station and not to any other entities in the network. However, it is unlikely, if not impossible, that a protocol with modifications made only at the base station can substantially improve the end-to-end performance of reliable bulk data transfers from the mobile host to other hosts on the network, while preserving the precise semantics of TCP ACKs. For example, caching packets at the base station and retransmitting them as necessary is not useful, since most of the packet losses will occur between the mobile host and the base station. Running a Snoop agent at the mobile host would also be inappropriate, because the same duplicate ACKs signify both corruption and congestion losses: there is no way for the mobile sender to know if the loss of a packet happened on the wireless link or elsewhere in the network due to congestion.
There are at least two ways in which the basic Snoop scheme can be enhanced to achieve good performance for this case -- the first involves the use of negative ACKs, and the second the use of Explicit Loss Notifications (ELN). We describe the NAK-based approach first, and then the ELN-based one, which we believe to be an elegant and simple solution to the problem. We also show how the algorithm naturally generalizes to the case when a cellular wireless link is the transit link bridging two or more wired networks.
An agent at the base station keeps track of the packets that were lost in any transmitted window and generates negative acknowledgments (NAKs) for those packets back to the mobile sender. This is especially useful if several packets are lost in a single transmission window, a situation that happens often under high interference or in fades where the strength and quality of the signal are low. These NAKs are sent either when a threshold number of packets (from a single window) have reached the base station or when a certain amount of time has expired without any new packets from the mobile. Encoding these NAKs as a bit vector ensures that the fraction of the scarce wireless bandwidth consumed by NAKs is relatively low. The mobile host uses these NAKs to selectively retransmit lost packets.
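As a concrete illustration, the bit-vector encoding described above might look like the following sketch. The function names and framing are our own, and we index whole segments for simplicity; the actual agent would operate on TCP sequence numbers and piggyback the vector on a NAK packet.

```python
def encode_nak_bitvector(base_seq, received, window):
    """Mark each missing segment in [base_seq, base_seq + window) with a set
    bit, so a whole window's losses fit in a few bytes (hypothetical sketch)."""
    bits = 0
    for i in range(window):
        if base_seq + i not in received:
            bits |= 1 << i
    return bits

def decode_nak_bitvector(base_seq, bits, window):
    """Recover the list of segment numbers to selectively retransmit."""
    return [base_seq + i for i in range(window) if (bits >> i) & 1]
```

For a 32-segment window, the vector costs only four bytes, which keeps the NAK overhead on the wireless link low.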
NAKs can be implemented using the TCP SACK option. The agent could use SACKs to enable the mobile host to quickly (relative to the round-trip time of the connection) retransmit missing packets. The only change required at the mobile host is to enable SACK processing. No changes of any sort are required in any of the fixed hosts. Also, SACKs are generated by the snoop agent when it detects a gap corresponding to missing packets; we emphasize again that no transport-layer code runs at the base station to do this.
At first glance, this scheme appears to violate end-to-end semantics: the snoop agent sends SACK information about segments it has seen, but they may not yet have been received by the intended receiver. This would lead to problems if the corresponding segments were lost later in the path. However, no violation actually occurs, because of the semantics of TCP SACKs: they are advisory in nature, and only cumulative ACKs are binding. This means that even if a TCP sender receives SACKs for certain data blocks, it must not remove those segments from its transmission buffer or assume successful reception until cumulative ACKs arrive for them.
However, this approach based on SACKs is inelegant because it relies on a relatively obscure feature of the SACK specification. Of course, we could implement an explicit NAK scheme, but that would require complex modifications at a sender that would, in any case, implement SACKs in the future. In addition, we would like to avoid explicit protocol messaging as far as possible, and neither SACKs nor NAKs generated at the base station avoid it. All these issues led us to investigate alternate protocols; what follows is one such scheme.
Explicit Loss Notification (ELN) is a mechanism by which the reason for the loss of a packet can be communicated to the TCP sender. In particular, it provides a way by which senders can be informed that a loss happened because of reasons unrelated to network congestion (e.g., due to wireless bit errors), so that sender retransmissions can be decoupled from congestion control. If the receiver or a base station knows for sure that the loss of a segment was not due to congestion, it sets the ELN bit in the TCP header and propagates it to the source. In the situation at hand, this ELN message is sent as part of the same connection (rather than out-of-band, for instance via ICMP). This simplifies the sender implementation, since it receives all messages in-band. ELN is a general concept that has applications in a wide variety of wireless topologies. In this section, we describe it in the context of data transfer from a mobile host connected to the rest of the Internet via a cellular wireless link.
The snoop agent running at the base station monitors all TCP segments that arrive over the wireless link. However, it does not cache any TCP segments, since it does not perform any retransmissions. Rather, it keeps track of holes in the sequence space as it receives data segments, where a hole is a missing interval in the sequence space. These holes correspond to segments that have been lost over the wireless link. However, a packet could also have been lost due to congestion at the base station. To avoid wrongly attributing such a congestion loss to wireless corruption, the agent adds a hole to its list only when the number of packets queued on the base station's input interface is not close to the maximum queue length.
When ACKs, especially duplicate ACKs, arrive from the receiver, the agent at the base station consults its list of holes. If the ACK corresponds to a segment in the list, the agent sets the ELN bit on the ACK before forwarding it to the data sender. It also cleans up all holes with sequence numbers smaller than the current ACK, since they correspond to segments that have been successfully received by the receiver. When the sender receives an ACK with ELN information in it, it retransmits the next segment, but does not take any congestion control actions. The sender also makes sure that each segment is retransmitted at most once during the course of a single round-trip, since the snoop agent would otherwise flag an ELN for each duplicate ACK following a loss.
Data structures: At the BS, the snoop agent detects holes in the data transmission from the MH. The basic data structure used for this purpose is a list of blocks, or maximal contiguous segments of data specified by a starting sequence number and size. Gaps between blocks correspond to missing segments in the transmission sequence. At the MH, a new variable is added to the connection's TCP control block to keep track of the last ELN-triggered retransmission.
Data path: When new data arrives from the MH, the agent attempts to coalesce it with an existing block and checks if it bridges a previous hole. A new data segment that is out of the normal increasing sequence creates a hole, because segments in between have been lost. We must ensure that every hole is the consequence of a corruption-induced loss, not of congestion. Congestion-induced losses can occur in two ways -- due to the base station's input queue overflowing, or due to the mobile host's output queue overflowing. Piggybacked on each packet from the MH is its instantaneous output queue length, which the base station can use to classify each loss it detects. When a new, out-of-sequence packet arrives at the BS, the snoop agent checks the size of its input queue and obtains the size of the MH's output queue from the packet. If either of the instantaneous queue sizes is above a certain threshold of the maximum (set to 75% in our implementation), then the loss is not attributed to corruption. Because the agent should flag ELN information only for losses unrelated to congestion, the implicitly maintained list of holes includes only packets lost due to corruption.
ACK processing: Whenever an ACK arrives at the BS, the agent updates the list of blocks by cleaning up all blocks (or parts of blocks) that have been acknowledged so far. It also checks if the ACK corresponds to a hole, i.e., if the next segment in sequence was lost due to corruption on the wireless link. It does so by comparing the ACK to the sequence number of the earliest block currently in its list for that connection. If the value of the ACK is smaller, it sets the ELN bit in the TCP header, modifies the TCP checksum, and forwards the ACK on to the MH. Because there is currently no specific bit in the TCP header for ELN, we use one of the unreserved bits for this purpose. Note that while ELN marking happens most often for duplicate ACKs, it can (and does) also happen for new ones. However, it does not happen when the list of blocks is empty, because there is no way the agent at the BS can know if any further data has been transmitted by the mobile host.
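The hole-tracking and ELN-marking logic just described can be sketched as follows. This is a simplified Python model, not the kernel implementation: the class and method names are ours, sequence numbers are plain integers, and the queue-occupancy fractions stand in for the kernel-level measurements described above.

```python
class ElnAgent:
    """Simplified sketch of the base-station agent's hole tracking for ELN.
    Blocks are contiguous received byte ranges; gaps between blocks are holes.
    A gap that opens while either queue is near overflow is remembered as
    congestion-induced and never flagged as ELN."""
    QUEUE_THRESHOLD = 0.75   # fraction of max queue length, as in the paper

    def __init__(self):
        self.blocks = []               # sorted (start, end) received ranges
        self.congestion_marks = set()  # start seqs of congestion-induced gaps

    def data_arrival(self, start, size, bs_queue_frac, mh_queue_frac):
        """Coalesce the arriving segment [start, start+size) into the block
        list; if it opens a new gap, classify the gap using the queue sizes."""
        end = start + size
        if self.blocks and start > self.blocks[-1][1]:   # new gap opened
            if max(bs_queue_frac, mh_queue_frac) >= self.QUEUE_THRESHOLD:
                self.congestion_marks.add(self.blocks[-1][1])
        merged = []
        for s, e in self.blocks:
            if e < start or end < s:       # disjoint: keep as-is
                merged.append((s, e))
            else:                          # overlapping/adjacent: coalesce
                start, end = min(s, start), max(e, end)
        merged.append((start, end))
        self.blocks = sorted(merged)

    def ack_arrival(self, ack):
        """Drop acknowledged (parts of) blocks; return True iff the ELN bit
        should be set, i.e. the ACK points below the earliest remaining block
        and the corresponding gap was not congestion-induced."""
        self.blocks = [(max(s, ack), e) for s, e in self.blocks if e > ack]
        self.congestion_marks = {c for c in self.congestion_marks if c >= ack}
        return (bool(self.blocks) and ack < self.blocks[0][0]
                and ack not in self.congestion_marks)
```

The real agent additionally rewrites the TCP checksum after setting the ELN bit before forwarding the ACK; that packet-rewriting step is omitted here.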
When the MH receives an ACK with the ELN bit set, it retransmits the missing packet and updates a variable ( eln_last_rxmit ) that keeps track of the last ELN-induced retransmission. This ensures that the MH does not retransmit the same packet on every ACK with ELN that it receives via the base station; thus, the retransmission policy is left to the end-host, and the base station only conveys relevant information to it. The TCP stack at the MH uses a configurable variable, eln_rexmt_thresh , to determine when to retransmit a segment upon receiving an ACK with ELN set. Upon receiving eln_rexmt_thresh ACKs with ELN, it retransmits the missing segment. In our implementation and experiments, we set its value to 2.
When the MH retransmits a segment based on ELN information, it does not reduce its congestion window and perform congestion control. It also bypasses the fast recovery code that inflates the congestion window for every duplicate ACK. Finally, we note that these actions are taken only for duplicate ACKs with ELN, and not for other duplicate ACKs that signify network congestion.
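The mobile host's side of the protocol can be sketched similarly. The variables eln_last_rxmit and eln_rexmt_thresh come from the text; the class structure and the integer sequence numbers are our own simplification of the TCP control block.

```python
class ElnSender:
    """Sketch of the mobile host's ELN handling. An ELN-triggered
    retransmission bypasses congestion control and fast recovery."""
    def __init__(self, eln_rexmt_thresh=2):
        self.eln_rexmt_thresh = eln_rexmt_thresh  # ELN ACKs before retransmit
        self.eln_last_rxmit = -1   # highest seq already retransmitted via ELN
        self.eln_acks = 0          # ELN-marked ACKs seen for the current hole

    def on_ack(self, ack_seq, eln_set):
        """Return the sequence number to retransmit, or None. Congestion
        window and retransmission timer are deliberately left untouched on
        the ELN path (not modeled here)."""
        if not eln_set:
            self.eln_acks = 0
            return None            # normal ACK path: congestion control applies
        self.eln_acks += 1
        if (self.eln_acks >= self.eln_rexmt_thresh
                and ack_seq > self.eln_last_rxmit):
            self.eln_last_rxmit = ack_seq
            self.eln_acks = 0
            return ack_seq         # retransmit without shrinking cwnd
        return None
```

The eln_last_rxmit check is what ensures a segment is retransmitted at most once per round-trip even though every duplicate ACK for the hole carries ELN.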
We performed several experiments to measure the performance of data transfer from the MH to the FH. These results, measured across a range of exponentially distributed bit-error rates, are shown in the figure "Throughput of TCP Reno and Reno enhanced with ELN across a range of exponentially distributed bit-error rates for transfers from a mobile host." As before, there are significant performance benefits from using the snoop protocol coupled with the ELN mechanism in this situation. These measurements were made for wide-area transfers between UC Berkeley and IBM Watson, across one wireless WaveLAN hop and 16 Internet hops. At medium to high error rates, the performance improvement due to ELN is roughly a factor of 2. At lower error rates, TCP Reno performs quite well, as expected, and the benefits of ELN are not as pronounced. The main advantage of ELN is that it helps maintain a large TCP congestion window even when wireless error rates are high, reacting only to congestion.
While ELN performs about 100% better than Reno at high error rates, the relative improvement is not as large as that of the Snoop protocol in the FH-to-MH case. The reason is that the Snoop protocol performs quick local recovery in addition to shielding the sender from unnecessary congestion control, whereas loss recovery with ELN relies on TCP retransmissions alone. When multiple losses occur in a window, TCP with ELN incurs a coarse timeout, leading to "only" a factor-of-two improvement in throughput.
The ELN-based approach presented above for improving end-to-end performance also generalizes to cellular wireless transit links where the TCP ACKs traverse the same base stations as the data packets on the forward path. Consider a connection over one cellular wireless transit link and several wired links, as shown in the figure "Topology for data transfer over a cellular wireless transit link." Suppose a loss happened due to corruption over the wireless link. In this case, the agent at base station A would have seen the packet, while B would not, so the packet gets added to B's list of holes using the same algorithms as in the section "Using Explicit Loss Notifications." When duplicate ACKs for the missing packet arrive at B, its agent sets the ELN bit and propagates them towards A. Since A originally saw the packet (it is not in A's list of holes), it infers that the packet was lost over the wireless link and simply lets the ACK go through to the data sender with the ELN information set on it. Of course, it is possible to deploy a snoop agent that caches and locally retransmits packets at A in this topology. In this case, the agent also suppresses duplicate ACKs for corrupted packets, as described earlier. It is important to note that local retransmissions from the snoop agent at A now happen only upon duplicate ACKs with ELN set.
Now suppose that the packet was lost due to congestion elsewhere on the path. If it was lost before A, then the agent at A, upon seeing an ELN ACK via B, resets this bit because this packet did not reach A to start with. The sender now reacts to this loss in the conventional way and performs congestion control. Similarly, if the packet was lost due to congestion after B, no agent in the network sets the ELN information on any duplicate ACK, leading to correct congestion control. It is easy to see how this approach generalizes to the case when there is more than one single-hop wireless link on the path.
Our performance experiments show that in the first case, observed performance is identical to the ELN case shown in the figure "Throughput of TCP Reno and Reno enhanced with ELN across a range of exponentially distributed bit-error rates for transfers from a mobile host." In the second case, performance closely tracks that of the last-hop snoop agent, presented in [7, 4].
Wireless testbed: We used the experimental network deployed in the Reinas Environmental Monitoring project at the University of California, Santa Cruz to obtain experimental data. This network has a number of point-to-point long-haul WaveLAN links as well as lower-bandwidth RF links. The Snoop protocol has been in daily production use in this network since 1996, during which time we have been able to perform numerous performance experiments and obtain error traces of real network behavior. We use ns (http://www-mash.cs.berkeley.edu/ns) as our simulator.
Analytic state models: The simplest error models are one-state models that are characterized by a single rate parameter and a statistical distribution that determines the error rate. For example, a one-state exponential error model generates errors at a certain rate based on an exponential distribution in the appropriate domain (packet, byte, or time). More complex multi-state error models are implemented by composing one-state models. Such models have been extensively studied in the literature and are characterized by a number of states organized as a Markov chain. Each state is a one-state error model governed by a certain rate and distribution, with transitions occurring between states according to a transition matrix of probabilities.
Empirical distribution-based models: While analytic state models are sound in theory, it is hard to determine in practice what the parameters of these models should be. The process of fitting experimental channel error data in this framework is not straightforward. While using simple one-state models enables us to quantify the relative performance as a function of a base error rate, it does not give us any insight into what "typical" performance might be like in a real wireless environment under realistic error conditions. This motivates us to develop an error model based on actual traces from a wireless environment and use that model to evaluate our protocols. This section describes how we do this.
Our methodology for constructing a realistic wireless error model is motivated by an earlier approach in which the authors analyze in-building traces of wireless traffic and construct empirical models for them. Rather than use the traces they collected, we decided to obtain fresh ones, because this allows us to validate the performance observed in simulations driven by our error traces against the performance of our actual implementation.
To develop an empirical error model of the channel, we proceed as follows. We transmit a constant rate UDP stream at rate R bps, with a fixed packet size, S bits. We ensure that the transmitted rate is smaller than the sustainable link rate, and that the size of the packets is large enough that the number of packets per second does not overwhelm the receiving processor, which needs to take an interrupt for every reception. Then, we send a large number of packets, N , with an application-level sequence number on each packet. Thus, the receiver can now determine which packets have arrived correctly, and which have not. UDP checksum processing at the receiver is enabled, so any packet whose lower-layer headers are correct but whose payload is corrupted will be discarded. Of course, no packets are retransmitted.
Using this data, the receiver constructs histograms of the lengths of error and error-free sequences. We obtain the distribution of the number of successive packets that were successfully received, as well as the number of missing ones once there was a loss. At this point, this histogram is in terms of packets, rather than time. However, the underlying characteristics of the channel's error properties are temporal, related to time-dependent factors like the average fade duration and duration of inter-symbol interference due to multi-path effects. Hence, we need to convert our packet-based distribution into a time-based one, which we achieve by multiplying the number of packets in any histogram bin by the packet transmission time, S/R .
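The conversion from packet counts to durations can be sketched directly. This is an illustrative helper, not code from our measurement tools; it takes the observed run lengths (in packets) and the S and R parameters defined above.

```python
def durations_cdf(run_lengths, S_bits, R_bps):
    """Convert a list of run lengths (in packets) into an empirical CDF of
    durations (in seconds) by scaling with the packet transmission time S/R."""
    t_pkt = S_bits / R_bps                       # seconds per packet
    durations = sorted(n * t_pkt for n in run_lengths)
    n = len(durations)
    # Empirical CDF: fraction of runs with duration <= each sample.
    return durations, [(i + 1) / n for i in range(n)]
```

With S = 8000 bits and R = 1 Mbps, each packet slot is 8 ms, so a run of four packets maps to a 32 ms duration.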
We performed eight separate experiments using this approach, with N = 100,000 packets in each experiment, packet size S = 8000 bits, and R = 1 Mbps. The resulting error and error-free temporal distributions are shown in the figure "Cumulative Distribution Functions (CDF) of empirically measured error and error-free durations," computed over 100,000 packets sent in 800 seconds. In the experiments that follow, we picked one of these data sets. In this data set, 10,591 packets were lost due to wireless errors, a gross loss rate of 10.6%. While this might seem high, it is in fact quite representative of the quality of the outdoor wireless links in the Reinas network. The typical loss rates in the network vary from 1% to over 30%, depending on external factors and environmental conditions.
The cumulative distribution functions (CDF) of error and error-free durations are used to generate bit errors in the following manner. There are two linked states, corresponding to the error-free and error durations, between which the error model transitions. It starts in the error-free state, where no errors occur, and stays there for an amount of time determined by a lookup into the corresponding CDF. After this time has elapsed, there is an automatic transition to the error state, where all incoming packets are corrupted. The amount of time spent in this state is, as before, determined by the corresponding CDF. Thus, this empirical error model, shown in the figure "Conventional multi-state analytic error models (left) and empirical distribution-based models (right)," is a two-state Markov chain, but it is quite different from the analytic models: here, the states themselves correspond to time, with automatic state transitions. The analytic models are richer in that each state has a different base error rate, with explicit transition probabilities between states. However, it is hard to reconcile experimental data with meaningful parameters for these models.
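The two-state mechanism above can be sketched as a small simulation model. This is our own illustrative reconstruction, not the ns code: the CDFs are supplied as sorted (duration, cumulative-probability) samples, and the one-point CDFs used in the example are stand-ins for measured data.

```python
import bisect
import random

class TraceErrorModel:
    """Two-state error model driven by empirical CDFs. The model alternates
    between an error-free state (no losses) and an error state (all packets
    corrupted); state occupancy times are drawn from the respective CDFs."""
    def __init__(self, good_cdf, bad_cdf, seed=0):
        self.good_cdf, self.bad_cdf = good_cdf, bad_cdf
        self.rng = random.Random(seed)
        self.in_error = False
        self.now = 0.0                       # start of the current state
        self.state_len = self._draw(good_cdf)

    def _draw(self, cdf):
        """Inverse-CDF sampling: pick the smallest duration whose cumulative
        probability covers a uniform random draw."""
        durations, probs = cdf
        u = self.rng.random()
        return durations[bisect.bisect_left(probs, u)]

    def corrupts(self, t):
        """Advance the model to time t; return True if a packet sent at t is
        corrupted (i.e. the model is in the error state at t)."""
        while t >= self.now + self.state_len:    # automatic state transition
            self.now += self.state_len
            self.in_error = not self.in_error
            cdf = self.bad_cdf if self.in_error else self.good_cdf
            self.state_len = self._draw(cdf)
        return self.in_error
```

With degenerate one-point CDFs (say, 1.0 s error-free and 0.5 s error), the model deterministically alternates between the two states, which makes the transition logic easy to check.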
In the previous sections, we saw that the Snoop protocol significantly improved over TCP Reno and TCP SACK across a wide range of error rates and network topologies, for bulk transfer workloads. To investigate performance on shorter transfers, we performed a set of experiments with a Web-like access workload. We use the empirical Web workload based on data collected by Mah.
We conducted experiments using TCP Reno, TCP SACK, and the Snoop protocol on the Web workload, varying the number of concurrent TCP connections from 1 to 4, as well as using persistent HTTP (P-HTTP). The results of these experiments are shown in the figure "Performance of TCP Reno, TCP SACK, and the Snoop protocol on a Web workload in different protocol configurations," which depicts the number of successfully completed individual downloads (connections) in 1000 seconds for 1 to 4 concurrent connections per client and for P-HTTP. We see that the performance of the Snoop protocol is between three and six times better than that of the other protocols. We also note that the P-HTTP performance does not account for user think times between successive downloads, and is hence an optimistic upper bound on the performance achievable in practice.
These results show that not only does the Snoop protocol perform well for large bulk transfers, but that it also results in significant performance improvements for shorter transfers (combined with occasional long ones) that characterize Web workloads today. The performance improvement is between a factor of three and six for this realistic workload under experimentally measured and realistic wireless error conditions.
In this section, we discuss the results of experiments to quantify the scaling behavior of the Snoop protocol as the number of connections grows. There are two potential sources of overhead associated with the protocol at the base station: processing overhead and memory consumption. We first show that the processing overhead is negligible and then evaluate the memory consumed by the snoop agent.
The per-packet processing required at the BS when no packet losses occur is not significantly more than without the Snoop protocol, for transfers in either direction. For each incoming packet, the protocol performs a lookup in a hash table that maintains all active TCP connections, keyed by source and destination address/port tuples, and extracts a pointer to the connection state. Then, depending on the direction, the packet is either added to a cache (without any extra memory copies) and forwarded on, or else forwarded on after a check is made for any holes in the transmission sequence. In both cases, this takes a negligible number of instructions compared to the (background) cost of routing and forwarding packets. Most of the protocol processing complexity arises when losses occur and duplicate ACKs arrive, but our measurements show this to be a negligible component of the forwarding code time.
We now consider the question of memory consumption at the base station. For transfers originating from a mobile host, the amount of memory consumed is dependent on the number of holes in the transmission sequence and is largely independent of the transmission window size. It is also independent of the packet size, since no caching occurs in this case. For transfers to a mobile host, memory consumption is dominated by the per-packet storage cost, which we quantify below.
As the number of connections increases, the state maintained by the snoop agent increases roughly linearly. This state has two components -- a small amount of control state and state required to store unacknowledged packets. The maximum amount of cache memory consumed by the snoop agent per connection is bounded by the maximum transmission window size of the connection. In our implementation, this space is not statically allocated, but is dynamic, reflecting the actual window size changes in the connection.
Thus, the key scaling question is: as the total amount of cache memory in the base station varies, how does the number of sustainable connections vary, at a given error rate? Here, "sustainable" means that every connection sees a noticeable performance improvement from having the Snoop protocol run at the base station; without this requirement, we could always sustain as many connections as we want, limited only by the wireless link bandwidth, with the snoop agents providing no performance benefit.
We answer this question by varying the amount of per-connection cache memory at the base station, and by measuring the performance of the Snoop protocol for bulk TCP transfers as a function of the amount of cache memory. The results of this experiment, performed with 1 KB packets, are shown in the figure plotting the throughput of a TCP connection as a function of the maximum memory allocated to the snoop cache; the maximum window size of the connection was 32 KBytes. We see that performance increases linearly with cache memory until it plateaus at its maximum value once the cache size reaches a certain threshold. This threshold is equal to the "effective" window size for the connection. We note that this curve highlights the worst-case performance degradation because the bandwidth-delay product of the connection is large (50 KB).
This linear performance degradation when cache memory is small is expected, as the following analysis shows. Clearly, the existence of the cache does not influence packets that were successfully received by the receiver. For any lost packet needing retransmission, let p be the probability of it existing in the snoop cache. If W is the effective window size for the connection and m the available cache memory, then p = m/W. Now, for each lost packet in the cache, the corresponding connection throughput is B_h (cache hit contribution). Similarly, the throughput for a cache miss is B_m < B_h. Thus, the effective connection throughput is p*B_h + (1-p)*B_m. Since p = m/W, the performance is linear in m, and is constant for m >= W.
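The analysis above reduces to a simple closed form, sketched below. The hit/miss throughput values B_h and B_m are free parameters of the model, not measured numbers from the paper.

```python
def effective_throughput(m, W, B_hit, B_miss):
    """Expected connection throughput under the cache-memory model in
    the text: a lost packet is found in the snoop cache with
    probability p = m/W (capped at 1), contributing B_hit on a hit
    and B_miss < B_hit on a miss.  Throughput is linear in m and
    constant once m >= W."""
    p = min(m / W, 1.0)            # probability a lost packet is cached
    return p * B_hit + (1 - p) * B_miss
```

For example, with half the effective window cached (m = W/2), throughput sits exactly midway between the miss-only and hit-only values, matching the linear segment of the measured curve.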
For the error rates observed in our experiments, performance was close to the best possible for snoop cache sizes over 12 KBytes per connection, corresponding to the effective window size. Given current technology trends and the low cost of memory, we do not expect base station memory to be a practical limitation to schemes like the Snoop protocol, especially considering the performance benefits obtained. A nice property of the soft state in the Snoop protocol is that performance degrades only linearly as the available memory shrinks.
Higher priority local retransmissions also allow us to avoid unnecessary local timeouts in the following way. When a retransmission is scheduled, the snoop agent knows precisely what the last in-sequence transmission was. Hence, it knows what the next new ACK should be, assuming no further losses occur. Now, if the next new ACK to arrive is smaller than the expected one, a retransmission occurs immediately, without a timeout. This helps in recovering from multiple losses in a single window without a timeout.
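The timeout-avoidance check described above can be sketched as follows. This is a hypothetical simplification of the agent's ACK-processing path: the function name, the return convention, and the seqno-keyed cache are illustrative, not the actual implementation.

```python
# Illustrative sketch of avoiding a local timeout after a retransmission.
# next_expected_ack is the ACK the agent expects if its last local
# retransmission filled the hole; cache maps seqno -> buffered packet.

def on_new_ack(ack, next_expected_ack, cache):
    """If a new ACK still falls short of what the last local
    retransmission should have produced, another packet in the same
    window was lost: retransmit it from the cache immediately rather
    than waiting for the local retransmission timer to fire."""
    if ack < next_expected_ack and ack in cache:
        return ("retransmit", cache[ack])   # recover without a timeout
    return ("forward", None)                # in sequence; pass ACK along
```

This is what lets the agent recover from multiple losses within a single window while keeping the TCP sender's timers and the local timer quiescent.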
In this paper, we presented a novel protocol based on Explicit Loss Notification (ELN) to improve transport performance when the mobile host is the TCP sender, a situation that is becoming increasingly common. Then, we obtained experimental packet traces of wireless errors from a production wireless network and derived an empirical model of channel errors based on this data. We used this model to evaluate the performance of TCP Reno, TCP SACK and the Snoop protocol for Web workloads to mobile hosts. We discussed the scaling behavior of the Snoop protocol and reflected on some general lessons we have learned over the past two years about efficient protocol design for reliable wireless transport, based on real-world deployments of the Snoop protocol.
We are grateful to Darrell Long, Patrick Mantey and Eric Rosen of U.C. Santa Cruz for continuous access to their WaveLAN-based Reinas wireless network. They were also very tolerant users of the Snoop protocol and suffered through crashes and downtimes during the early stages of deployment. We cannot thank them enough for their generous support that led to the Snoop implementation and deployment becoming extremely robust and tuned to perform well. We hope that our solutions continue to serve them well in the future.
This work was supported by DARPA contract DAAB07-95-C-D154, a Segasoft research grant from the Okawa Foundation, the California MICRO program, Metricom Inc., Hybrid Networks Inc., and Hughes Aircraft Corp.