Delayed ACK and Nagle’s Algorithm

In this article, I am taking a shot at trying to explain the interaction between Delayed ACK and Nagle’s algorithm and how this could add latency during TCP session that requires transmission of small packets.

MSS:

Maximum Segment Size or MSS denotes the data that is being sent in the “Segment” utilized in the TCP layer of the OSI model. Default MSS is 536 Bytes. Default MTU is 576 Bytes.

MTU = MSS + 20B (TCP Header) + 20B (IP Header)

TCP Transmission:

RFC 1122 talks about the conditions under which data can be transmitted in TCP implementations. Data is transmitted when any of the following 3 conditions is met.

1. Immediately, if a full MSS or more can be sent.

2. {[No unacknowledged data] && 
    [(PSH flag is set) || 
     (Buffered data > 1/2*(SND Window))]}

3. {(PSH flag set) && (Override timeout expired)}

1/2*(SND Window) is implementation dependent and can differ across Operating System and within different versions of Operating System. Override timeout is roughly 200ms. This value could change between OS too. ACK Number represents bytes and not packets.

Delayed ACK:

Delayed ACK helps in avoiding “Silly-Window-Syndrome” (SWS) at the Receiver. The Receiver will delay sending an ACK in response to data received when all the 3 conditions match.

1. When there are no 2 packets / 2*MSS received.

2. When the client has no data to send.

3. When the Delayed ACK timer has not yet expired.

In the UNIX World, 2*MSS has to be received by Receiver in order for it to send an ACK and in the Windows World, 2 packets of any size has to be received by Received in order for it to send an ACK.

Nagle’s Algorithm (RFC896):

The goal of Nagle’s algorithm is to lower the number of small packets exchanged during a TCP session. This helps in avoiding “Silly-Window-Syndrome” (SWS) at the Transmitter.

Nagles algorithm can be summarized as follows:

A. If there are unacknowledged data 
   (i.e., data in flight > 0 Bytes), 
   new data is buffered.

B. If data to be sent is less than MSS, it is 
   buffered till the data to be sent is 
   greater than or equal to MSS.

Problem:

Under the right conditions, the 1-3 points outlined under Delayed ACK and A-B points outlined in Nagle’s Algorithm will freeze the interaction between the sender and the receiver for the duration of timeout which is roughly 200ms. This is often seen in applications that rely on smaller packet sizes.

A Simple Scenario:

Sender is a client machine that updates the Receiver with information. Receiver could be some kind of a data warehouse which stores information on financial transactions. In this case, the Sender has data to send to the Receiver and the Receiver acknowledges the data received and does not transmit any data to the client in response other than a simple ACK.

During the course of a TCP session, Sender has just sent 500B of data to the Receiver and this matches condition A outlined in Nagle’s algorithm. The application at the Sender side moves 400B of data to the TCP stack. At this point, Sender has not yet received an ACK from the Receiver and because the next 400B of data meets the B condition outlined in Nagle’s algorithm.

Thus, the 400B of data will be buffered till either one of the Nagle’s condition is met:

A. ACK is received from Receiver for the 
   previously sent 500B of data.

B. Application sends the TCP stack more data 
   that will push the existing buffered data 
   (400B) more than MSS i.e., application needs 
   to send 136B or more to the TCP stack in 
   order to push the buffered data to or beyond
   the MSS limit.

On the Receiver side, the receiver will refrain from sending an ACK after receiving the first 500B of data because the 1-3 conditions outlined under Delayed ACK hasn’t been met.

1. Only 1 packet of 500B (less than MSS) has been 
   received.

2. Receiver does not have any data to transmit, 
   other than ACK.

3. Delayed ACK timer has not yet expired.

Sender keeps the 400B of data buffered. Receiver will not ACK the previously sent 500B of data. Effectively, there will be a communication freeze between the Sender and the Receiver till the timeout expires. Usually this timeout is 200ms in different OS implementations.

For further understanding, I would highly recommend this youtube video on this subject by Hansang Bae.