ESP: Excerpt from the NIST Guide

Encapsulating Security Payload (ESP)

ESP is the second core IPsec security protocol. In the initial version of IPsec, ESP provided only encryption for packet payload data. Integrity protection was provided by the AH protocol if needed, as discussed in Section 3.1. In the second version of IPsec, ESP became more flexible. It can perform authentication to provide integrity protection, although not for the outermost IP header. Also, ESP's encryption can be disabled through the Null ESP Encryption Algorithm. Therefore, in all but the oldest IPsec implementations, ESP can be used to provide only encryption; encryption and integrity protection; or only integrity protection. This section mainly addresses the features and characteristics of the second version of ESP; the third version, currently in development, is described near the end of the section.

ESP Mode

ESP has two modes: transport and tunnel. In tunnel mode, ESP creates a new IP header for each packet. The new IP header lists the endpoints of the ESP tunnel (such as two IPsec gateways) as the source and destination of the packet. Because of this, tunnel mode can be used with all three VPN architecture models described in Section 2. As shown in Figure 3-6, tunnel mode can encrypt and/or protect the integrity of both the data and the original IP header for each packet.[28] Encrypting the data protects it from being accessed or modified by unauthorized parties; encrypting the IP header conceals the nature of the communications, such as the actual source or destination of the packet. If authentication is being used for integrity protection, each packet will have an ESP Authentication section after the ESP trailer.

ESP tunnel mode is used far more frequently than ESP transport mode. In transport mode, ESP uses the original IP header instead of creating a new one. Figure 3-7 shows that in transport mode, ESP can only encrypt and/or protect the integrity of packet payloads and certain ESP components, but not IP headers. As with AH, ESP transport mode is generally only used in host-to-host architectures. Also, transport mode is incompatible with NAT. For example, in each TCP packet, the TCP checksum is calculated on both TCP and IP fields, including the source and destination addresses in the IP header. If NAT is being used, one or both of the IP addresses are altered, so NAT needs to recalculate the TCP checksum. If ESP is encrypting packets, the TCP header is encrypted; NAT cannot recalculate the checksum, so NAT fails. This is not an issue in tunnel mode; because the entire TCP packet is hidden, NAT will not attempt to recalculate the TCP checksum. However, tunnel mode and NAT have other potential compatibility issues.[29] Section 4.2.1 provides guidance on overcoming NAT-related issues.

Note 29: One possible issue is the inability to perform incoming source address validation to confirm that the source address is the same as that under which the IKE SA was negotiated. Other possible issues include packet fragmentation, NAT mapping timeouts, and multiple clients behind the same NAT device.
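
The checksum dependency behind the transport-mode NAT problem can be illustrated with a short Python sketch. It computes a TCP checksum over the IPv4 pseudo-header (source address, destination address, protocol, and segment length) plus the TCP segment, and shows that rewriting the source address, as NAT does, changes the result. The addresses, ports, and payload are made-up sample values, and the helper names `internet_checksum` and `tcp_checksum` are illustrative rather than taken from any library.

```python
import socket
import struct

TCP_PROTOCOL = 6  # IP protocol number for TCP


def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum used by IP, TCP, and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum a TCP segment together with its IPv4 pseudo-header."""
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, TCP_PROTOCOL, len(segment))
    )
    return internet_checksum(pseudo_header + segment)


# Sample TCP header (checksum field zeroed) followed by a short payload.
tcp_header = struct.pack(
    "!HHIIBBHHH",
    49152,   # source port
    80,      # destination port
    1000,    # sequence number
    0,       # acknowledgment number
    5 << 4,  # data offset: 5 words, no options
    0x18,    # flags: PSH + ACK
    65535,   # window
    0,       # checksum placeholder
    0,       # urgent pointer
)
segment = tcp_header + b"hello"

# Same segment, once with the private source address and once with the
# (hypothetical) public address a NAT device would substitute.
before_nat = tcp_checksum("192.168.1.10", "198.51.100.7", segment)
after_nat = tcp_checksum("203.0.113.5", "198.51.100.7", segment)
print(f"checksum before NAT: {before_nat:#06x}")
print(f"checksum after NAT:  {after_nat:#06x}")
print("NAT must rewrite the checksum:", before_nat != after_nat)
```

When ESP transport mode encrypts the TCP header, a NAT device can neither read nor update the checksum field, which is the failure the paragraph above describes.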

Encryption Process

As described in Section 3.2, ESP uses symmetric cryptography to provide encryption for IPsec packets. Accordingly, both endpoints of an IPsec connection protected by ESP encryption must use the same key to encrypt and decrypt the packets. When an endpoint encrypts data, it divides the data into small blocks (for the AES algorithm, 128 bits each), and then performs multiple sets of cryptographic operations (known as rounds) using the data blocks and key. Encryption algorithms that work in this way are known as block cipher algorithms. When the other endpoint receives the encrypted data, it performs decryption using the same key and a similar process, but with the steps reversed and the cryptographic operations altered. Examples of encryption algorithms used by ESP are AES-Cipher Block Chaining (AES-CBC), AES Counter Mode (AES-CTR), and Triple DES (3DES).

ESP Packet Fields

ESP adds a header and a trailer around each packet's payload. As shown in Figure 3-8, each ESP header is composed of two fields:
SPI. Each endpoint of each IPsec connection has an arbitrarily chosen SPI value, which acts as a unique identifier for the connection. The recipient uses the SPI value, along with the destination IP address and (optionally) the IPsec protocol type (in this case, ESP), to determine which SA is being used.
Sequence Number. Each packet is assigned a sequential sequence number, and only packets within a sliding window of sequence numbers are accepted. This provides protection against replay attacks because duplicate packets will use the same sequence number. This also helps to thwart denial of service attacks because old packets that are replayed will have sequence numbers outside the window, and will be dropped immediately without performing any more processing.
The next part of the packet is the payload. It is composed of the payload data, which is encrypted, and the initialization vector (IV), which is not encrypted. The IV is used during encryption. Its value is different in every packet, so if two packets have the same content, the inclusion of the IV will cause the encryption of the two packets to have different results. This makes ESP less susceptible to cryptanalysis.
The third part of the packet is the ESP trailer, which contains at least two fields and may optionally include one more:
Padding. An ESP packet may optionally contain padding, which is additional bytes of data that make the packet larger and are discarded by the packet's recipient. Because ESP uses block ciphers for encryption, padding may be needed so that the encrypted data is an integral multiple of the block size. Padding may also be needed to ensure that the ESP trailer ends on a multiple of 4 bytes. Additional padding may also be used to alter the size of each packet, concealing how many bytes of actual data the packet contains. This is helpful in deterring traffic analysis.
Padding Length. This number indicates how many bytes long the padding is. The Padding Length field is mandatory.
Next Header. In tunnel mode, the payload is an IP packet, so the Next Header value is set to 4 for IP-in-IP. In transport mode, the payload is usually a transport-layer protocol, often TCP (protocol number 6) or UDP (protocol number 17). Every ESP trailer contains a Next Header value.
If ESP integrity protection is enabled, the ESP trailer is followed by an Authentication Information field. Like AH, the field contains the MAC output described in Section 3.1.2. Unlike AH, the MAC in ESP does not include the outermost IP header in its calculations. The recipient of the packet can recalculate the MAC to confirm that the portions of the packet other than the outermost IP header have not been altered in transit.
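
The trailer fields described above can be sketched in Python. The sketch assumes the default padding content of monotonically increasing byte values (1, 2, 3, ...) and a 16-byte block size as used by AES. `build_esp_trailer` and `split_plaintext` are hypothetical helper names, and the encryption step itself is left out.

```python
import math


def build_esp_trailer(payload: bytes, next_header: int, block_size: int = 16) -> bytes:
    """Return Padding + Padding Length + Next Header for the given payload data."""
    # Payload, padding, and the two one-byte trailer fields together must fill
    # whole cipher blocks and end on a 4-byte boundary.
    alignment = math.lcm(block_size, 4)
    pad_len = -(len(payload) + 2) % alignment
    padding = bytes(range(1, pad_len + 1))  # 1, 2, 3, ... pad_len
    return padding + bytes([pad_len, next_header])


def split_plaintext(plaintext: bytes) -> tuple[bytes, int]:
    """Strip the trailer from decrypted data; return (payload, next_header)."""
    pad_len, next_header = plaintext[-2], plaintext[-1]
    payload = plaintext[: len(plaintext) - 2 - pad_len]
    return payload, next_header


payload = b"A" * 21                      # sample transport-mode payload
trailer = build_esp_trailer(payload, 6)  # Next Header 6 = TCP
plaintext = payload + trailer            # the data ESP would then encrypt
print(len(plaintext), trailer[-2], trailer[-1])  # 32 9 6
```

This computes only the minimum padding. An implementation that wants to conceal packet sizes could add further whole blocks of padding, as long as the Padding Length field still fits in one byte.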

How ESP Works

Reviewing and analyzing actual ESP packets can provide a better understanding of how ESP works, particularly when compared with AH packets. Figure 3-9 shows the bytes that compose an actual ESP packet and their ASCII representations, in the same format used in Section 3.1.4. The alphabetic sequence that was visible in the AH-protected payload cannot be seen in the ESP-protected payload because it has been encrypted. The ESP packet only contains five sections: Ethernet header, IP header, ESP header, encrypted data (payload and ESP trailer), and (optionally) authentication information. From the encrypted data, it is not possible to determine if this packet was generated in transport mode or tunnel mode. However, because the IP header is unencrypted, the IP protocol field in the header does reveal which protocol the payload uses (in this case, ESP). As shown in Figures 3-6 and 3-7, the unencrypted fields in both modes (tunnel and transport) are the same.

Figure 3-9. ESP Packet Capture
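
The packet sections listed above can be walked through with a small Python parser. This sketch assumes an untagged Ethernet II frame carrying IPv4, and the sample frame is built from made-up addresses and an arbitrary SPI rather than taken from Figure 3-9. `parse_esp_frame` is a hypothetical helper name.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
IP_PROTOCOL_ESP = 50  # value of the IP protocol field when the payload is ESP


def parse_esp_frame(frame: bytes) -> dict:
    """Split an Ethernet/IPv4/ESP frame into its readable fields and opaque data."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")
    ip_packet = frame[14:]
    header_len = (ip_packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if ip_packet[9] != IP_PROTOCOL_ESP:
        raise ValueError("IP payload is not ESP")
    spi, sequence = struct.unpack("!II", ip_packet[header_len:header_len + 8])
    return {
        "spi": spi,
        "sequence": sequence,
        # Everything after the ESP header is opaque to an observer: the IV,
        # the encrypted payload and trailer, and any authentication information.
        "opaque": ip_packet[header_len + 8:],
    }


# Build a sample frame with placeholder values.
ethernet = bytes(6) + bytes(6) + struct.pack("!H", ETHERTYPE_IPV4)
esp = struct.pack("!II", 0x1234ABCD, 1) + bytes(32)  # ESP header + dummy ciphertext
ip_header = (
    bytes([0x45, 0])                    # version 4, IHL 5, TOS 0
    + struct.pack("!H", 20 + len(esp))  # total length
    + bytes(4)                          # identification, flags, fragment offset
    + bytes([64, IP_PROTOCOL_ESP])      # TTL, protocol
    + bytes(2)                          # header checksum (left as zero here)
    + bytes([10, 0, 0, 1])              # source address
    + bytes([10, 0, 0, 2])              # destination address
)
fields = parse_esp_frame(ethernet + ip_header + esp)
print(hex(fields["spi"]), fields["sequence"], len(fields["opaque"]))
```

Nothing in the parsed fields reveals whether the packet was generated in transport mode or tunnel mode, because the Next Header value sits inside the encrypted trailer.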

Although it is difficult to tell from Figure 3-9, the ESP header fields are not encrypted. Figure 3-10 shows the ESP header fields from the first four packets in an ESP session between hosts A and B. The SPI and Sequence Number fields work the same way in ESP that they do in AH. Each host uses a different static SPI value for its packets, which corresponds to an ESP connection being composed of two one-way connections, each with its own SPI. Also, both hosts initially set the sequence number to 1, and both incremented the number to 2 for their second packets.

Figure 3-10. ESP Header Fields from Sample Packets
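
The receiver-side handling of these sequence numbers, using the sliding window described under ESP Packet Fields, can be sketched as follows. This is a minimal illustration, not a full implementation: the window size of 64 packets is an assumption, `ReplayWindow` is a hypothetical class name, and a real receiver would update the window only after the packet's integrity check has succeeded.

```python
class ReplayWindow:
    """Track which sequence numbers have been seen on one inbound SA."""

    def __init__(self, size: int = 64):
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.bitmap = 0   # bit i set means (highest - i) was already received

    def check_and_update(self, seq: int) -> bool:
        """Return True and record seq if it is acceptable, otherwise False."""
        if seq == 0:
            return False  # senders start at 1, so this sketch treats 0 as invalid
        if seq > self.highest:
            # Packet advances the window: shift the bitmap and mark the new top.
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False  # older than the window: dropped immediately
        if self.bitmap & (1 << offset):
            return False  # duplicate: a replayed packet
        self.bitmap |= 1 << offset  # late but new packet inside the window
        return True


window = ReplayWindow()
for seq in (1, 2, 2, 100, 99, 1):
    verdict = "accept" if window.check_and_update(seq) else "drop"
    print(seq, verdict)
```

In this run the second packet numbered 2 is dropped as a duplicate, and the final packet numbered 1 is dropped because the window has already moved on to 100.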

ESP Version 3

A new standard for ESP, version 3, is currently in development.[31] Based on the current standard draft, there should be several major functional differences between version 2 and version 3, including the following:
The standard for ESP version 2 required ESP implementations to support using ESP encryption only (without integrity protection). The proposed ESP version 3 standard makes support for this optional.
ESP can use an optional longer sequence number, just like the proposed AH version 3 standard.
ESP version 3 supports the use of combined mode algorithms (e.g., AES Counter with CBC-MAC [AES-CCM]).[32] Rather than using separate algorithms for encryption and integrity protection, a combined mode algorithm provides both encryption and integrity protection.
The version 3 standard draft also points to another standard draft that lists encryption and integrity protection cryptographic algorithm requirements for ESP.[33] For encryption algorithms, the draft mandates support for the null encryption algorithm and 3DES-CBC, strongly recommends support for AES-CBC (with 128-bit keys), recommends support for AES-CTR, and discourages support for DES-CBC.[34] For integrity protection algorithms, the draft mandates support for HMAC-SHA1-96 and the null authentication algorithm, strongly recommends support for AES-XCBC-MAC-96, and also recommends support for HMAC-MD5-96. The standard draft does not recommend any combined mode algorithms.
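
As an illustration of HMAC-SHA1-96, the sketch below computes an HMAC-SHA1 over the ESP header and the encrypted portion of a packet and keeps the first 96 bits (12 bytes) as the authentication value. The key, SPI, and ciphertext are placeholder values, and `esp_icv` and `verify_icv` are hypothetical helper names, not part of any IPsec library.

```python
import hashlib
import hmac
import struct

ICV_LEN = 12  # HMAC-SHA1-96 keeps the first 96 bits of the 160-bit HMAC


def esp_icv(auth_key: bytes, esp_packet: bytes) -> bytes:
    """Compute the truncated MAC over ESP header, payload, and trailer."""
    return hmac.new(auth_key, esp_packet, hashlib.sha1).digest()[:ICV_LEN]


def verify_icv(auth_key: bytes, packet_with_icv: bytes) -> bool:
    """Recalculate the MAC as a recipient would and compare in constant time."""
    body, received = packet_with_icv[:-ICV_LEN], packet_with_icv[-ICV_LEN:]
    return hmac.compare_digest(received, esp_icv(auth_key, body))


auth_key = bytes(range(20))              # placeholder key for the example
esp_header = struct.pack("!II", 0x1234ABCD, 1)  # SPI, sequence number
ciphertext = bytes(32)                   # stands in for IV + encrypted data
packet = esp_header + ciphertext
packet += esp_icv(auth_key, packet)      # append authentication information

print(verify_icv(auth_key, packet))      # True

# Flip one bit in the encrypted portion, as an attacker might in transit.
tampered = bytearray(packet)
tampered[10] ^= 0x01
print(verify_icv(auth_key, bytes(tampered)))  # False
```

The outermost IP header is not part of the MAC input, which matches the description of ESP authentication earlier in this section.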

ESP Summary

In tunnel mode, ESP can provide encryption and integrity protection for an encapsulated IP packet, as well as authentication of the ESP header. Tunnel mode can be compatible with NAT. However, protocols with embedded addresses (e.g., FTP, IRC, SIP) can present additional complications.
In transport mode, ESP can provide encryption and integrity protection for the payload of an IP packet, as well as integrity protection for the ESP header. Transport mode is not compatible with NAT.
ESP tunnel mode is the most commonly used IPsec mode. Because it can encrypt the original IP header, it can conceal the true source and destination of the packet. Also, ESP can add padding to packets, further complicating attempts to perform traffic analysis.
Although ESP can be used to provide encryption or integrity protection (or both), ESP encryption should not be used without integrity protection.