Enabling High Performance Data Transfers

From: http://www.psc.edu/networking/projects/tcptune

System Specific Notes for System Administrators (and Privileged Users)

These notes are intended to help users and system administrators maximize TCP/IP performance on their computer systems. They summarize all of the end-system (computer system) network tuning issues, including a tutorial on TCP tuning, easy configuration checks for non-experts, and a repository of operating-system-specific instructions for getting the best possible network performance on these platforms.

This material is currently under active revision. Please send any suggestions, additions or corrections to us at nettune@psc.edu so we can keep the information here as up-to-date as possible.

Introduction


Tutorial


High Performance Networking Options

The options below are presented in the order that they should be checked and adjusted.

  1. Maximum TCP Buffer (Memory) space: All operating systems have some global mechanism to limit the amount of system memory that can be used by any one TCP connection.

  2. Socket Buffer Sizes: Most operating systems also support separate per connection send and receive buffer limits that can be adjusted by the user, application or other mechanism as long as they stay within the maximum memory limits above. These buffer sizes correspond to the SO_SNDBUF and SO_RCVBUF options of the BSD setsockopt() call.

  3. TCP Large Window Extensions (RFC1323): These enable optional TCP protocol features (window scale and time stamps) which are required to support paths with a large bandwidth-delay product (BDP).

  4. TCP Selective Acknowledgments Option (SACK, RFC2018): This allows a TCP receiver to inform the sender exactly which data is missing and needs to be retransmitted.

  5. Path MTU: The host system must use the largest possible MTU for the path. This may require enabling Path MTU Discovery (RFC1191, RFC1981, RFC4821).

Note that both ends of a TCP connection must be properly tuned, independently of each other, before it will support high-speed transfers.
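"Properly tuned" here means buffers sized to at least the bandwidth-delay product (BDP) of the path. A sketch of the calculation, using hypothetical figures (a 100 Mbit/s bottleneck and a 50 ms round-trip time):

```shell
#!/bin/sh
# BDP (bytes) = bottleneck bandwidth (bits/s) * round-trip time (s) / 8
# The path figures below are hypothetical examples, not measurements.
BW=100000000   # bottleneck bandwidth in bits per second (100 Mbit/s)
RTT_MS=50      # round-trip time in milliseconds

BDP=$(( BW / 8 * RTT_MS / 1000 ))
echo "BDP: $BDP bytes"   # prints "BDP: 625000 bytes"
```

Socket buffers (and the system-wide maximums discussed above) should be at least this large; allowing extra headroom, e.g. twice the BDP, is a common rule of thumb.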

Using Web Based Network Diagnostic Servers

Most tuning problems (and many other network problems) can be diagnosed with a single test from an appropriate diagnostic server. There are several different servers that test various aspects of the end-system and network path.


Detailed procedures for system tuning under various operating systems

See the specific instructions for each system:

Note that the instructions below only indicate that they have been tested for specific OS versions. However, most OS vendors rarely make significant changes to their TCP/IP stacks, so these directions are often correct for many versions before or after the stated version. If you find that you need to tweak our directions (especially for newer OS versions), please let us know at nettune@psc.edu.


Procedure for raising network limits under FreeBSD

All system parameters can be read or set with 'sysctl'. E.g.:

	sysctl [parameter]
	sysctl -w [parameter]=[value]

You can raise the maximum socket buffer size by, for example:

	sysctl -w kern.ipc.maxsockbuf=4000000

FreeBSD 7.0 implements automatic receive and send buffer tuning, which is enabled by default. The default maximum value is 256 KB, which is likely too small. These maximums should be increased, e.g. as follows:

	net.inet.tcp.sendbuf_max=16777216
	net.inet.tcp.recvbuf_max=16777216

You can also set the TCP and UDP default buffer sizes using the variables

	net.inet.tcp.sendspace
	net.inet.tcp.recvspace
	net.inet.udp.recvspace

When using larger socket buffers, you probably need to make sure that the TCP window scaling option is enabled. (The default is not enabled!) Check that /etc/rc.conf contains 'tcp_extensions="YES"', and confirm it is in effect via the sysctl variable:

        net.inet.tcp.rfc1323

FreeBSD's TCP has a feature called "inflight limiting", which is turned on by default and can be detrimental to TCP throughput in some situations. If you want "normal" TCP behavior, you should disable it:

         sysctl -w net.inet.tcp.inflight_enable=0

You may also want to confirm that SACK is enabled (working since FreeBSD 5.3):

        net.inet.tcp.sack.enable

MTU discovery is on by default in FreeBSD. If you wish to disable MTU discovery, you can toggle it with the sysctl variable:

        net.inet.tcp.path_mtu_discovery
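To make these settings persist across reboots, they can be added to /etc/sysctl.conf, which FreeBSD applies at boot. A sketch using the example values from this section (adjust the buffer sizes to your own BDP):

```shell
# /etc/sysctl.conf -- example values from this section
kern.ipc.maxsockbuf=4000000        # maximum socket buffer size
net.inet.tcp.sendbuf_max=16777216  # ceiling for send-buffer autotuning (FreeBSD 7.0+)
net.inet.tcp.recvbuf_max=16777216  # ceiling for receive-buffer autotuning
net.inet.tcp.rfc1323=1             # RFC1323 window scaling and timestamps
net.inet.tcp.sack.enable=1         # RFC2018 selective acknowledgments
net.inet.tcp.inflight_enable=0     # disable inflight limiting for "normal" TCP
```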

Contributors: Pekka Savola and David Malone.
Checked for FreeBSD 7.0, Sept 2008


Tuning TCP for Linux 2.4 and 2.6

NB: Recent versions of Linux (version 2.6.17 and later) have full autotuning with 4 MB maximum buffer sizes. Except in some rare cases, manual tuning is unlikely to substantially improve the performance of these kernels over most network paths, and is not generally recommended.

Since autotuning and large default buffer sizes were released progressively over a succession of different kernel versions, it is best to inspect and only adjust the tuning as needed. When you upgrade kernels, you may want to consider removing any local tuning.

All system parameters can be read or set by accessing special files in the /proc file system. E.g.:

	cat /proc/sys/net/ipv4/tcp_moderate_rcvbuf

If the parameter tcp_moderate_rcvbuf is present and has value 1 then autotuning is in effect. With autotuning, the receiver buffer size (and TCP window size) is dynamically updated (autotuned) for each connection. (Sender side autotuning has been present and unconditionally enabled for many years now).

The per-connection memory space defaults are set with two 3-element arrays:

	/proc/sys/net/ipv4/tcp_rmem       - memory reserved for TCP rcv buffers
	/proc/sys/net/ipv4/tcp_wmem       - memory reserved for TCP snd buffers

These are arrays of three values: minimum, initial and maximum buffer size. They are used to set the bounds on autotuning and balance memory usage while under memory stress. Note that these are controls on the actual memory usage (not just TCP window size) and include memory used by the socket data structures as well as memory wasted by short packets in large buffers. The maximum values have to be larger than the BDP of the path by some suitable overhead.

With autotuning, the middle value just determines the initial buffer size. It is best to set it to some optimal value for typical small flows. With autotuning, excessively large initial buffers waste memory and can even hurt performance.

If autotuning is not present (Linux 2.4 before 2.4.27 or Linux 2.6 before 2.6.7), you may want to get a newer kernel. Alternatively, you can adjust the default socket buffer size for all TCP connections by setting the middle tcp_rmem value to the calculated BDP. This is NOT recommended for kernels with autotuning. Since the sending side is autotuned, this is never recommended for tcp_wmem.

The maximum buffer size that applications can request (the maximum acceptable values for SO_SNDBUF and SO_RCVBUF arguments to the setsockopt() system call) can be limited with /proc variables:

	/proc/sys/net/core/rmem_max       - maximum receive window
	/proc/sys/net/core/wmem_max       - maximum send window

The kernel sets the actual memory limit to twice the requested value (effectively doubling rmem_max and wmem_max) to provide for sufficient memory overhead. You do not need to adjust these unless you are planning to use some form of application tuning.

NB: Manually adjusting socket buffer sizes with setsockopt() disables autotuning. Applications that are optimized for other operating systems may implicitly defeat Linux autotuning.

The following values (which are the defaults for 2.6.17 with more than 1 GByte of memory) would be reasonable for all paths with a 4MB BDP or smaller (you must be root):

	echo 1 > /proc/sys/net/ipv4/tcp_moderate_rcvbuf
	echo 108544 > /proc/sys/net/core/wmem_max
	echo 108544 > /proc/sys/net/core/rmem_max
	echo "4096 87380 4194304" > /proc/sys/net/ipv4/tcp_rmem
	echo "4096 16384 4194304" > /proc/sys/net/ipv4/tcp_wmem

Do not adjust tcp_mem unless you know exactly what you are doing. This array (in units of pages) determines how the system balances the total network buffer space against all other LOWMEM memory usage. The three elements are initialized at boot time to appropriate fractions of the available system memory.

You do not need to adjust rmem_default or wmem_default (at least not for TCP tuning). These are the default buffer sizes for non-TCP sockets (e.g. unix domain and UDP sockets).

All standard advanced TCP features are on by default. You can check them by:

	cat /proc/sys/net/ipv4/tcp_timestamps
	cat /proc/sys/net/ipv4/tcp_window_scaling
	cat /proc/sys/net/ipv4/tcp_sack

Linux supports both /proc and sysctl (using alternate forms of the variable names - e.g. net.core.rmem_max) for inspecting and adjusting network tuning parameters. The following is a useful shortcut for inspecting all tcp parameters:

sysctl -a | fgrep tcp

For additional information on kernel variables, look at the documentation included with your kernel source, typically in some location such as /usr/src/linux-<version>/Documentation/networking/ip-sysctl.txt. There is a very good (but slightly out of date) tutorial on network sysctl's at http://ipsysctl-tutorial.frozentux.net/ipsysctl-tutorial.html.

If you would like these changes to be preserved across reboots, you can add the tuning commands to the file /etc/rc.d/rc.local.
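Equivalently, on most distributions the same tuning can be placed in /etc/sysctl.conf (using the sysctl variable names), which is applied at boot; running 'sysctl -p' applies it immediately. A sketch with the example values above:

```shell
# /etc/sysctl.conf -- example values from this section (paths with <= 4 MB BDP)
net.ipv4.tcp_moderate_rcvbuf = 1
net.core.wmem_max = 108544
net.core.rmem_max = 108544
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
```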

Autotuning was prototyped under the Web100 project. Web100 also provides complete TCP instrumentation and some additional features to improve performance on paths with very large BDP.

Contributors: John Heffner and Matt Mathis

Checked for Linux 2.6.18, 12/5/2006

Tuning TCP for Mac OS X

Mac OS X has a single sysctl parameter, kern.ipc.maxsockbuf, to set the maximum combined buffer size for both sides of a TCP (or other) socket. In general, it can be set to at least twice the BDP. E.g.:


sysctl -w kern.ipc.maxsockbuf=8000000

The default send and receive buffer sizes can be set using the following sysctl variables:


sysctl -w net.inet.tcp.sendspace=4000000
sysctl -w net.inet.tcp.recvspace=4000000

If you would like these changes to be preserved across reboots you can edit /etc/sysctl.conf.
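For example, the corresponding /etc/sysctl.conf entries for the example values above would be:

```shell
# /etc/sysctl.conf -- example values from this section
kern.ipc.maxsockbuf=8000000
net.inet.tcp.sendspace=4000000
net.inet.tcp.recvspace=4000000
```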

RFC1323 features are supported and on by default. SACK is present and enabled by default as of OS X version 10.4.6.

Although we have never tested it, there is a commercial product to tune TCP on Macintoshes: http://www.sustworks.com/products/prod_ottuner.html. We don't endorse the product (since we've never tried it); however, it is available for a free trial, and they appear to do an excellent job of describing performance-tuning issues for Macs.

Tested for 10.3, MBM 5/15/05

Procedure for raising network limits under Solaris

All system TCP parameters are set with the 'ndd' tool (man 1 ndd). Parameter values can be read with:

  ndd /dev/tcp [parameter]

and set with:

  ndd -set /dev/tcp [parameter] [value]

RFC1323 timestamps, window scaling and RFC2018 SACK should be enabled by default. You can double check that these are correct:

  ndd /dev/tcp tcp_wscale_always    #(should be 1)
  ndd /dev/tcp tcp_tstamp_if_wscale #(should be 1)
  ndd /dev/tcp tcp_sack_permitted   #(should be 2)

Set the maximum (send or receive) TCP buffer size an application can request:

  ndd -set /dev/tcp tcp_max_buf 4000000

Set the maximum congestion window:

  ndd -set /dev/tcp tcp_cwnd_max 4000000

Set the default send and receive buffer sizes:

  ndd -set /dev/tcp tcp_xmit_hiwat 4000000
  ndd -set /dev/tcp tcp_recv_hiwat 4000000
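Note that ndd settings are not preserved across reboots. A common approach is to put the commands in a boot-time rc script; a sketch (the script name below is our own illustrative choice, not a system standard):

```shell
#!/bin/sh
# /etc/rc2.d/S99nettune -- illustrative boot script name;
# the values are the examples from this section, adjust to your BDP.
ndd -set /dev/tcp tcp_max_buf 4000000
ndd -set /dev/tcp tcp_cwnd_max 4000000
ndd -set /dev/tcp tcp_xmit_hiwat 4000000
ndd -set /dev/tcp tcp_recv_hiwat 4000000
```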

Contributors: John Heffner (PSC), Nicolas Williams (Sun Microsystems, Inc)

Checked for Solaris 10.?, 4/12/06

Procedure for raising network limits for Windows XP (and Windows 2000)

The easiest way to tune TCP under Windows XP (and many earlier versions of windows) is to get DrTCP from "DSL Reports" [download page]. Set the "Tcp receive window" to your computed BDP (e.g. 400000), turn on "Window Scaling" and "Selective Acks". If you expect to use 90 Mb/s or faster, you should also turn on "Time Stamping". You must restart for the changes to take effect.

If you need to get down in the details, you have to use the 'regedit' utility to read and set system parameters. If you are not familiar with regedit you may want to follow the step-by-step instructions [here].

BEWARE: Mistakes with regedit can have very serious consequences that are difficult to correct. You are strongly encouraged to backup the entire registry before you start (use the backup utility) and to export the Tcpip\Parameter subtree to a file, so you can put things back if you need to (use "export" under regedit).

The primary TCP tuning parameters appear in the registry under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.

To enable high performance TCP you must turn on RFC1323 features (create REG_DWORD key "Tcp1323Opts" with value 3) and set the maximum TCP buffer size (create REG_DWORD key "GlobalMaxTcpWindowSize" with an appropriate value such as 4000000, decimal).
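Equivalently, these two keys can be set by importing a .reg file, which avoids hand-editing in regedit. A sketch using the example values from the text (4000000 decimal is 0x003D0900 in hex):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"Tcp1323Opts"=dword:00000003
"GlobalMaxTcpWindowSize"=dword:003d0900
```

Double-click the file (or run "regedit /s file.reg") to import it; a restart is still required for the changes to take effect.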

If you want to set the system wide default buffer size create REG_DWORD key "TcpWindowSize" with an appropriate value. This parameter can also be set per interface at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interface\interfaceGUID, which may help to protect interactive applications that are using different interfaces from the effects of overbuffering.

For the most up to date detailed technical information, go to the Microsoft knowledge base (at support.microsoft.com) and search product "windows XP" for "TCP/IP performance tuning".

Speedguide summarizes this material with an intermediate level of detail; however, it is aimed at relatively low data rates.

There is also a very good page on tuning Windows XP, by Carl Harris at Virginia Tech.

Contributors: Jim Miller (at PSC).
Checked for Windows XP Service Pack 2, July 2006


Acknowledgments

Jamshid Mahdavi maintained this page for many years, both at PSC and later, remotely from Novell. We are greatly indebted to his vision and persistence in establishing this resource.

Thanks Jamshid!

Many, many people have helped us compile this information. We want to thank everyone who sent us updates, additions and corrections. We have decided to include attributions for all future contributors. (Sorry not to be able to give full credit where credit is due for past contributors.)

This material has been maintained as a sideline of many different projects, nearly all of which have been funded by the National Science Foundation. It was started under NSF-9415552, but also supported under Web100 (NSF-0083285) and the NPAD project (ANI-0334061).

Matt Mathis <mathis@psc.edu> and Raghu Reddy <rreddy@psc.edu>
(with help from many others, especially Jamshid Mahdavi)
$Id: index.php,v 1.21 2008/02/04 21:35:27 mathis Exp $
 
