			Update tcp.txt to fix mandatory congestion control ops and default CCA selection. Also, fix comment in tcp.h for undo_cwnd. Signed-off-by: Anmol Sarma <me@anmolsarma.in> Signed-off-by: David S. Miller <davem@davemloft.net>
TCP protocol
============

Last updated: 3 June 2017

Contents
========

- Congestion control
- How the new TCP output machine [nyi] works

Congestion control
==================

The following variables are used in the tcp_sock for congestion control:
snd_cwnd		The size of the congestion window
snd_ssthresh		Slow start threshold. We are in slow start if
			snd_cwnd is less than this.
snd_cwnd_cnt		A counter used to slow down the rate of increase
			once we exceed the slow start threshold.
snd_cwnd_clamp		This is the maximum size that snd_cwnd can grow to.
snd_cwnd_stamp		Timestamp for when the congestion window was last validated.
snd_cwnd_used		Used as a highwater mark for how much of the
			congestion window is in use. It is used to adjust
			snd_cwnd down when the link is limited by the
			application rather than the network.
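
To illustrate how these fields interact, here is a minimal userspace sketch, not kernel code. The struct cwnd_state type and the helpers in_slow_start(), on_ack() and on_app_limited() are hypothetical. The fields mirror the tcp_sock variables above. The growth rule is a Reno-style assumption: snd_cwnd_cnt counts ACKs so that, above snd_ssthresh, snd_cwnd grows by about one segment per window. The application-limited shrink loosely follows the idea of congestion window validation (RFC 2861) and is not the kernel's exact logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the tcp_sock congestion fields. */
struct cwnd_state {
	uint32_t snd_cwnd;       /* congestion window, in segments */
	uint32_t snd_ssthresh;   /* slow start threshold */
	uint32_t snd_cwnd_cnt;   /* ACKs counted toward the next +1 in avoidance */
	uint32_t snd_cwnd_clamp; /* upper bound for snd_cwnd */
	uint32_t snd_cwnd_used;  /* highwater mark of the window actually used */
};

/* We are in slow start while snd_cwnd is below snd_ssthresh. */
int in_slow_start(const struct cwnd_state *s)
{
	return s->snd_cwnd < s->snd_ssthresh;
}

/* Reno-style growth for one ACKed segment (an assumption for this sketch). */
void on_ack(struct cwnd_state *s)
{
	assert(s->snd_cwnd > 0);
	if (in_slow_start(s)) {
		s->snd_cwnd++;                 /* +1 per ACK: roughly doubles per RTT */
	} else if (++s->snd_cwnd_cnt >= s->snd_cwnd) {
		s->snd_cwnd_cnt = 0;           /* a full window of ACKs has been seen */
		s->snd_cwnd++;                 /* +1 segment per window */
	}
	if (s->snd_cwnd > s->snd_cwnd_clamp)
		s->snd_cwnd = s->snd_cwnd_clamp;
}

/* When the application, not the network, limited sending, move
 * snd_cwnd halfway down toward the amount actually used. */
void on_app_limited(struct cwnd_state *s)
{
	if (s->snd_cwnd_used < s->snd_cwnd)
		s->snd_cwnd = (s->snd_cwnd + s->snd_cwnd_used) / 2;
	s->snd_cwnd_used = 0;
}
```

For example, starting from snd_cwnd = 10 and snd_ssthresh = 12, two ACKs end slow start. After that, a full window of 12 further ACKs is needed before snd_cwnd reaches 13.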

As of 2.6.13, Linux supports pluggable congestion control algorithms.
A congestion control mechanism can be registered through functions in
tcp_cong.c. The functions used by the congestion control mechanism are
registered by passing a tcp_congestion_ops struct to
tcp_register_congestion_control. At a minimum, the congestion control
mechanism must provide a valid name and must implement either the
ssthresh, cong_avoid and undo_cwnd hooks or the "omnipotent" cong_control
hook.
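
The registration rule above can be modelled in a few lines. This is a userspace sketch, not the kernel's struct tcp_congestion_ops. The struct ca_ops type, its simplified hook signatures, ca_ops_valid() and the demo_* hooks are all hypothetical, and the kernel's own check in tcp_cong.c may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for tcp_congestion_ops. */
struct ca_ops {
	const char *name;
	uint32_t (*ssthresh)(void *sk);
	void (*cong_avoid)(void *sk, uint32_t ack, uint32_t acked);
	uint32_t (*undo_cwnd)(void *sk);
	void (*cong_control)(void *sk, const void *rate_sample);
};

/* Valid if it has a non-empty name and either the classic trio of
 * hooks (ssthresh, cong_avoid, undo_cwnd) or the cong_control hook. */
int ca_ops_valid(const struct ca_ops *ops)
{
	int classic, omnipotent;

	assert(ops != NULL);
	if (ops->name == NULL || ops->name[0] == '\0')
		return 0;
	classic = ops->ssthresh && ops->cong_avoid && ops->undo_cwnd;
	omnipotent = ops->cong_control != NULL;
	return classic || omnipotent;
}

/* Placeholder hooks, used only to populate example structures. */
uint32_t demo_ssthresh(void *sk) { (void)sk; return 2; }
void demo_cong_avoid(void *sk, uint32_t ack, uint32_t acked)
{
	(void)sk; (void)ack; (void)acked;
}
uint32_t demo_undo_cwnd(void *sk) { (void)sk; return 10; }
void demo_cong_control(void *sk, const void *rs) { (void)sk; (void)rs; }
```

A structure that sets only cong_control, leaving the other three hooks NULL, also passes this check. A structure missing undo_cwnd without providing cong_control does not.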

Private data for a congestion control mechanism is stored in tp->ca_priv.
tcp_ca(tp) returns a pointer to this space. This is preallocated space, so
it is important to check that your private data fits in it; alternatively,
space could be allocated elsewhere and a pointer to it could be stored here.
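
The following is a minimal userspace sketch of the preallocated-area pattern. It assumes a hypothetical 64-byte area (CA_PRIV_SIZE) inside a hypothetical struct sock_model; the real size and accessor come from the kernel headers and are not reproduced here. The _Static_assert turns an oversized private struct into a compile-time error, which is the check recommended above. In-tree modules typically perform the equivalent check with BUILD_BUG_ON().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CA_PRIV_SIZE 64 /* hypothetical size of the preallocated area */

struct sock_model {
	uint32_t snd_cwnd;
	/* Preallocated per-connection space for the algorithm; u64 elements
	 * keep it suitably aligned for the private struct. */
	uint64_t ca_priv[CA_PRIV_SIZE / sizeof(uint64_t)];
};

/* Example private state for a hypothetical algorithm. */
struct my_ca {
	uint32_t last_max_cwnd;
	uint32_t epoch_start;
	uint32_t ack_cnt;
};

/* Fail the build if the private state outgrows the reserved area. */
_Static_assert(sizeof(struct my_ca) <= CA_PRIV_SIZE,
	       "private data does not fit in ca_priv");

/* Zero the private area before the algorithm first uses it. */
void my_ca_init(struct sock_model *sk)
{
	assert(sk != NULL);
	memset(sk->ca_priv, 0, sizeof(sk->ca_priv));
}

/* Analogue of tcp_ca(tp): a typed pointer into the reserved space. */
struct my_ca *my_ca_of(struct sock_model *sk)
{
	assert(sk != NULL);
	return (struct my_ca *)sk->ca_priv;
}
```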

There are currently three kinds of congestion control algorithms. The
simplest ones are derived from TCP Reno (highspeed, scalable) and just
provide an alternative congestion window calculation. More complex
ones like BIC try to look at other events to provide better
heuristics. There are also round trip time based algorithms like
Vegas and Westwood+.
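
To make "an alternative congestion window calculation" concrete, here is a hypothetical sketch. It keeps Reno's structure (add one segment after N ACKs) and changes only N. The Scalable-style rule and its cap of 100 ACKs (SCALABLE_AI_CNT) are illustrative assumptions, not values quoted from the kernel's scalable module.

```c
#include <assert.h>
#include <stdint.h>

#define SCALABLE_AI_CNT 100 /* illustrative cap, an assumption */

/* Reno: +1 segment after a full window of ACKs (about +1 per RTT). */
uint32_t reno_acks_per_increment(uint32_t cwnd)
{
	assert(cwnd > 0);
	return cwnd;
}

/* Scalable-style: never wait for more than SCALABLE_AI_CNT ACKs, so
 * large windows grow by about cwnd / SCALABLE_AI_CNT segments per RTT. */
uint32_t scalable_acks_per_increment(uint32_t cwnd)
{
	assert(cwnd > 0);
	return cwnd < SCALABLE_AI_CNT ? cwnd : SCALABLE_AI_CNT;
}

/* Approximate segments added over one RTT (cwnd ACKs) for a given rule,
 * ignoring that cwnd itself changes during the RTT. */
uint32_t growth_per_rtt(uint32_t cwnd, uint32_t (*acks_per_inc)(uint32_t))
{
	return cwnd / acks_per_inc(cwnd);
}
```

With a 1000-segment window, the Reno rule adds about one segment per round trip while the capped rule adds about ten. Below the cap the two rules behave identically.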

Good TCP congestion control is a complex problem because the algorithm
needs to maintain fairness and performance. Please review current
research and RFCs before developing new modules.

The default congestion control mechanism is chosen based on the
DEFAULT_TCP_CONG Kconfig parameter. If you really want a particular default
value then you can set it using sysctl net.ipv4.tcp_congestion_control. The
module will be autoloaded if needed and you will get the expected protocol. If
you ask for an unknown congestion method, then the sysctl attempt will fail.
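
The sysctl is exposed as /proc/sys/net/ipv4/tcp_congestion_control, so the current default can be read like any other file. The read_default_cca() helper below is a hypothetical name. It only reads the value and strips the trailing newline. Changing the default, for example with `sysctl -w net.ipv4.tcp_congestion_control=reno`, normally requires root, and an unknown name is rejected as described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CCA_SYSCTL_PATH "/proc/sys/net/ipv4/tcp_congestion_control"

/* Read the algorithm name stored at `path` into buf (NUL-terminated,
 * newline stripped). Returns 0 on success, -1 on error. */
int read_default_cca(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(buf != NULL && len > 0);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(buf, (int)len, f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Typical use is `char name[32]; if (read_default_cca(CCA_SYSCTL_PATH, name, sizeof(name)) == 0) printf("%s\n", name);`, which prints whatever algorithm is currently the default on the running system.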

If you remove a TCP congestion control module, then you will get the next
available one. Since reno cannot be built as a module, and cannot be
removed, it will always be available.
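
The fallback behaviour can be modelled with a simple ordered registry in which the first entry is the default. This is a hypothetical userspace sketch. The array-based struct cca_registry and the cca_* function names are assumptions, and the kernel's own list handling in tcp_cong.c is not reproduced here.

```c
#include <assert.h>
#include <string.h>

#define MAX_CCA 8

/* Hypothetical registry: names in order, entry 0 is the default.
 * "reno" is expected to be registered and can never be removed. */
struct cca_registry {
	const char *names[MAX_CCA];
	int count;
};

int cca_register(struct cca_registry *r, const char *name, int make_default)
{
	if (r->count == MAX_CCA)
		return -1;
	if (make_default) {
		/* Shift existing entries down and put the new one first. */
		memmove(&r->names[1], &r->names[0],
			(size_t)r->count * sizeof(r->names[0]));
		r->names[0] = name;
	} else {
		r->names[r->count] = name;
	}
	r->count++;
	return 0;
}

/* Remove `name`; if it was the default, the next remaining entry
 * becomes the default. Refuses to remove reno. */
int cca_unregister(struct cca_registry *r, const char *name)
{
	int i;

	if (strcmp(name, "reno") == 0)
		return -1;
	for (i = 0; i < r->count; i++) {
		if (strcmp(r->names[i], name) == 0) {
			memmove(&r->names[i], &r->names[i + 1],
				(size_t)(r->count - i - 1) * sizeof(r->names[0]));
			r->count--;
			return 0;
		}
	}
	return -1;
}

const char *cca_default(const struct cca_registry *r)
{
	assert(r->count > 0); /* reno is always present */
	return r->names[0];
}
```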

How the new TCP output machine [nyi] works.
===========================================

Data is kept on a single queue. The skb->users flag tells us if the frame is
one that has been queued already. To add a frame we throw it on the end. An
ACK walks down the list from the start.
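
The single-queue design can be sketched as a singly linked list with a tail pointer. Writes append at the end, and an incoming ACK frees fully acknowledged frames from the head. The struct frame type and its `queued` field are hypothetical stand-ins; the text uses skb->users for this purpose. Sequence numbers are compared with the usual wraparound-safe signed-difference idiom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct frame {
	uint32_t end_seq;    /* sequence number just past this frame's data */
	int queued;          /* stands in for the skb->users check */
	struct frame *next;
};

struct frame_queue {
	struct frame *head;  /* oldest unacknowledged frame */
	struct frame *tail;  /* where new frames are added */
};

/* Wraparound-safe "a is before or equal to b" for 32-bit sequence numbers. */
static int seq_leq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

/* To add a frame we throw it on the end. */
void queue_append(struct frame_queue *q, struct frame *f)
{
	assert(!f->queued);
	f->queued = 1;
	f->next = NULL;
	if (q->tail)
		q->tail->next = f;
	else
		q->head = f;
	q->tail = f;
}

/* An ACK walks down the list from the start, unlinking every frame it
 * fully covers. Returns the number of frames released. */
int queue_ack(struct frame_queue *q, uint32_t ack_seq)
{
	int released = 0;

	while (q->head && seq_leq(q->head->end_seq, ack_seq)) {
		struct frame *f = q->head;

		q->head = f->next;
		f->queued = 0;
		f->next = NULL;
		released++;
	}
	if (!q->head)
		q->tail = NULL;
	return released;
}
```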

We keep a set of control flags:

	sk->tcp_pend_event

		TCP_PEND_ACK			Ack needed
		TCP_ACK_NOW			Ack needed now
		TCP_WINDOW			Window update check
		TCP_WINZERO			Zero window probing

	sk->transmit_queue		Start of the transmit queue
	sk->transmit_new		First new frame pointer
	sk->transmit_end		Where to add frames

	sk->tcp_last_tx_ack		Last ack seen
	sk->tcp_dup_ack			Dup ack count for fast retransmit

Frames are queued for output by tcp_write. We do our best to send the frames
off immediately if possible, but otherwise we queue them and compute the body
checksum during the copy.
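
Computing the checksum "during the copy" means folding the Internet checksum (RFC 1071) into the same loop that copies the user data, so each byte is touched only once. The sketch below is a simple byte-oriented version under that assumption. The names csum_copy() and csum_fold() are hypothetical, and real kernels use optimised, architecture-specific routines instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst and add the data, taken as big-endian
 * 16-bit words (an odd trailing byte is padded with zero), onto `sum`.
 * Chaining via `sum` is only correct for even-length chunks. */
uint64_t csum_copy(uint8_t *dst, const uint8_t *src, size_t len, uint64_t sum)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		sum += ((uint32_t)src[i] << 8) | src[i + 1];
	}
	if (i < len) {               /* odd trailing byte */
		dst[i] = src[i];
		sum += (uint32_t)src[i] << 8;
	}
	return sum;
}

/* Fold the carries into 16 bits and take the one's complement. */
uint16_t csum_fold(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

With the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7, the running sum is 0x2ddf0, which folds to 0xddf2 and gives a checksum of 0x220d.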

When a write is done, we try to clear any pending events and piggyback them.
If the window is full, we queue full-sized frames. On the first timeout in
zero window we split this.

On a timer we walk the retransmit list to send any retransmits, update the
backoff timers, etc. A change of the route table stamp causes the header to be
rebuilt and recomputed. We add any new TCP-level headers and finish the
checksum before sending.

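The timer step can be sketched as exponential backoff of the retransmission timeout plus a check of a cached route-table stamp. Everything here is an assumption for illustration. The struct rtx_state type, the millisecond units, the 120-second cap (RTO_MAX_MS) and the rebuild_header flag are hypothetical and are not taken from this document.

```c
#include <assert.h>
#include <stdint.h>

#define RTO_MAX_MS 120000u /* illustrative upper bound on the timeout */

struct rtx_state {
	uint32_t base_rto_ms;   /* timeout before any backoff */
	uint32_t backoff;       /* number of consecutive timeouts */
	uint32_t route_stamp;   /* route table stamp the header was built for */
	int rebuild_header;     /* set when cached headers are stale */
};

/* Current timeout: base_rto doubled once per backoff step, capped. */
uint32_t current_rto(const struct rtx_state *s)
{
	uint64_t rto = s->base_rto_ms;
	uint32_t i;

	for (i = 0; i < s->backoff && rto < RTO_MAX_MS; i++)
		rto <<= 1;
	return rto < RTO_MAX_MS ? (uint32_t)rto : RTO_MAX_MS;
}

/* Called when the retransmit timer fires. */
void on_rtx_timer(struct rtx_state *s, uint32_t route_table_stamp)
{
	assert(s->base_rto_ms > 0);
	s->backoff++;                       /* back off the next timeout */
	if (s->route_stamp != route_table_stamp) {
		s->route_stamp = route_table_stamp;
		s->rebuild_header = 1;      /* headers and checksums must be redone */
	}
}
```

Starting from a 200 ms base, successive timeouts give 400 ms, 800 ms and so on until the cap is reached. A differing route stamp only marks the header for rebuilding; it does not reset the backoff.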