
Reliable Transport (TCP)

The throughput of the system is defined as the ratio of the total number of bytes sent by the source to the time taken for the transfer. We present the TCP throughput results for several combinations of socket buffer and packet sizes. We then analyze a representative connection by detailed tracing and explain the reasons for the observed performance.
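Stated as a formula (simply a restatement of the definition above, with B the total number of bytes sent by the source and T the elapsed time of the transfer):

\[ \mathrm{Throughput} = \frac{B}{T} \]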

 

 
[Figure 3: TCP Performance, panels (a) and (b)]

Figure 3 shows the TCP throughput as a function of receiver socket buffer size for varying packet sizes. Measurements for both the commercial and local networks are presented. We see that different packet sizes and receiver socket buffer sizes yield very different performance. The reason for these significant differences is that the receiver socket buffer size is the upper bound on the maximum TCP window size advertised by the receiver. For a given TCP window size, the size of each packet determines the number of outstanding packets in the network. The flow of each of these packets must contend with the reverse flow of acknowledgements. This contention is the primary source of performance degradation, since the Ricochet packet radios are half-duplex and incur a non-trivial cost for each switch between send and receive modes. Furthermore, during a mode switch packets must be buffered, and if the window size is too large, packets or acknowledgements may be dropped, further degrading TCP performance.
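To make the connection between the receiver socket buffer and the advertised window concrete, the sketch below (not part of the original experiments; the 8 KB value is simply taken from the range explored in Figure 3) shows how an application sets the receiver socket buffer with setsockopt. In BSD-derived stacks this value caps the window that TCP can advertise.

/* Sketch: setting the receiver socket buffer size, which bounds the
 * window a BSD-derived TCP will advertise.  Values are illustrative,
 * taken from the range explored in Figure 3. */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    int rcvbuf = 8 * 1024;   /* 8 KB: near the observed peak in Figure 3 */
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
        perror("setsockopt SO_RCVBUF");

    /* Read back the value the kernel actually installed (some systems
     * round it up or double it for internal bookkeeping). */
    socklen_t len = sizeof(rcvbuf);
    if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len) == 0)
        printf("receiver socket buffer: %d bytes\n", rcvbuf);

    close(s);
    return 0;
}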

The figure shows a clear peak at socket buffers of 4-8 KB with packet sizes of 512 B and 1 KB in the experimental network. This is primarily because the queue lengths and the link-level flow control algorithm used by the radios are optimized for these values. The commercial network performs best at a packet size of 512 bytes and degrades when the packet size is 1024 bytes; the difference is most likely due to other traffic in the system at the time of our tests. This tuning matches the buffer sizes currently used in most commercial operating systems and the packet sizes used in most bulk and web transfer applications. Unfortunately, future operating systems (e.g., newer versions of Solaris) will perform path MTU discovery [5], which will diversify the packet sizes used during transmissions. Our measurements imply that in the presence of this diversity the Metricom system will suffer a degradation in performance. In general, smaller segment sizes should be used, and segments fragmented, only when the underlying link cannot support the transfer of larger sizes (e.g., Ethernets typically cannot support more than 1518 bytes per transmission) [3]. The hardware MTU for the Ricochet wireless link is 1750 bytes. Sending unnecessarily small packets (which will happen because the system currently performs best at 512 bytes) damages the overall performance of the network; this problem with the Ricochet network can be solved by techniques at the Metricom Gateway.
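As an illustration only (again not part of the measured system), an application could cap the TCP maximum segment size near the 512-byte value at which the network currently performs best. The sketch below uses the standard TCP_MAXSEG socket option; on most stacks this must be set before the connection is established, and the kernel may adjust the value it actually uses.

/* Sketch: capping the TCP maximum segment size at 512 bytes, the packet
 * size at which the Ricochet network currently performs best.  This is
 * illustrative only; TCP_MAXSEG generally must be set before connect(),
 * and the kernel may clamp the value. */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    int mss = 512;               /* best-performing packet size in Figure 3 */
    if (setsockopt(s, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss)) < 0)
        perror("setsockopt TCP_MAXSEG");

    socklen_t len = sizeof(mss);
    if (getsockopt(s, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) == 0)
        printf("effective MSS: %d bytes\n", mss);

    close(s);
    return 0;
}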

 

 
[Figure 4: TCP Retransmit Timer variability and its effect on the connection, panels (a) and (b)]

We now analyze the performance of a single connection in order to understand the reasons for the observed performance. In particular, we are interested in explaining the large difference between TCP and UDP performance (Section 3.2). Even in the best case, TCP throughput is less than 50% of the peak UDP throughput.

One of the most important parameters used by the TCP protocol is the estimate of the connection round-trip time. The mean linear deviation of individual samples from the smoothed estimate is used, together with the smoothed estimate itself, to compute the TCP retransmission timer, which is the amount of time TCP will wait for an unacknowledged segment before retransmitting. Ideally, the round-trip time will be relatively constant (i.e., have low variance) and therefore accurately reflect the nature of the connection. Figure 4(a) shows a plot of individual TCP round-trip time samples during a TCP connection over the Ricochet network. Observe that the individual samples are highly variable, which makes the retransmission timer overly conservative. In general, it is correct for the retransmission timer to trigger a segment retransmission only after an amount of time that depends on both the round-trip time and its variance (or linear deviation), since this is the only way of avoiding spurious retransmissions. However, transport protocols should rely on timers for loss recovery only as a last resort, as explained in [7]. Unfortunately, for connections like the one described here, timer-based recovery happens often. Better ways of performing loss recovery are required for TCP-like protocols, and this is a topic of active research.
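For reference, the sketch below shows the standard smoothed-RTT and mean-deviation estimator used by BSD-derived TCP implementations (Jacobson's algorithm); the gains of 1/8 and 1/4 and the factor of four on the deviation are the conventional constants, not values measured in this work. With samples as variable as those in Figure 4(a), the deviation term grows and the resulting timeout becomes very conservative.

/* Sketch of the standard smoothed-RTT / mean-deviation estimator used by
 * BSD-derived TCPs (Jacobson's algorithm).  The constants are the
 * conventional ones, not values taken from our measurements.  Highly
 * variable samples inflate rttvar and hence the retransmission timeout. */
#include <math.h>
#include <stdio.h>

struct rtt_estimator {
    double srtt;     /* smoothed round-trip time (seconds)       */
    double rttvar;   /* smoothed mean linear deviation (seconds) */
};

/* Fold one RTT sample into the estimator and return the new timeout. */
static double rtt_update(struct rtt_estimator *e, double sample)
{
    double err = sample - e->srtt;
    e->srtt   += err / 8.0;                      /* gain 1/8 on the mean      */
    e->rttvar += (fabs(err) - e->rttvar) / 4.0;  /* gain 1/4 on the deviation */
    return e->srtt + 4.0 * e->rttvar;            /* retransmission timeout    */
}

int main(void)
{
    /* Illustrative samples only: a few highly variable round-trip times
     * (seconds), of the kind seen in Figure 4(a). */
    double samples[] = { 0.5, 2.0, 0.7, 3.5, 0.6, 2.8 };
    struct rtt_estimator e = { samples[0], samples[0] / 2.0 };

    for (int i = 0; i < 6; i++)
        printf("sample %.1fs -> RTO %.2fs\n", samples[i], rtt_update(&e, samples[i]));
    return 0;
}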

This variability can be attributed directly to the interaction between forward packets and reverse-flowing acknowledgements in the half-duplex radios, as described above. A complete analysis of the effects of such networks and of large round-trip time variance on TCP performance is another topic of current research.

The effect on the connection can be seen in Figure 4(b), which shows the progression of segment sequence numbers with time. We see that the connection is idle for 35% of its duration as a result of only nine segment losses in a 200-segment transfer, with only three coarse timeouts (the other six losses are recovered by TCP's fast retransmission mechanism).

In summary, the primary cause of TCP performance degradation is the half-duplex nature of the packet radios, which severely decreases the performance of a bi-directional protocol such as TCP. Topics of ongoing research include improving this performance and analyzing the scalability of the network.





