Byte Order

Byte order refers to the order in which the bytes of multi-byte values (typically integers and floating point values, although floating point values are not used within the Linux kernel) are stored in memory by the hardware. Big endian is the byte order in which the big end, the most significant byte, is stored first (at the lowest storage address). Little endian is the byte order in which the little end, the least significant byte, is stored first.

For example, on a big endian system the 4-byte integer value 0x01020304 is stored as shown below (byte 0 is at the lowest address).

Byte 0	Byte 1	Byte 2	Byte 3
01	02	03	04

On a little endian system, the same value is stored with its bytes in the opposite order, as shown here.

Byte 0	Byte 1	Byte 2	Byte 3
04	03	02	01
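A short user-space C sketch can reproduce the two layouts above by copying a value's in-memory representation into a byte array, lowest address first. The function name memory_bytes_u32 is hypothetical; it only relies on memcpy() from the standard C library.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy the in-memory representation of a 32-bit
 * value into out[0..3], lowest address first.
 * Big endian host:    0x01020304 -> 01 02 03 04
 * Little endian host: 0x01020304 -> 04 03 02 01
 */
void memory_bytes_u32(uint32_t value, unsigned char out[4])
{
	memcpy(out, &value, sizeof(value));
}
```

Calling memory_bytes_u32(0x01020304, b) and printing b[0] through b[3] should give 01 02 03 04 on a big endian host and 04 03 02 01 on a little endian one.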

In general, it is not important whether an architecture is big endian or little endian; the CPU simply loads and stores the values from memory and presents them to your program in the format that you expect. However, when data is exchanged with another system, the systems must agree on a common format for the data.

The Linux kernel can be either big endian or little endian depending upon which architecture it is compiled for. The following table shows the byte order that will be used when the kernel is compiled for various architectures.

Big Endian	Little Endian	Either
AVR32	Alpha	ARM
FR-V	CRIS	SuperH (sh)
H8300	Blackfin	M32R
PA-RISC	Intel 64	MIPS
S390	IA-32 (x86)	Xtensa
Motorola 680x0	MN10300
PowerPC
SPARC

Note: ARM processors can run in either big endian or little endian mode depending upon the chip and how it is configured, but Linux on ARM is usually little endian. The PowerPC architecture can also be configured to run in either big endian or little endian mode; Linux traditionally used only big endian, although 64-bit little endian PowerPC (ppc64le) kernels are now also in common use.

Why Worry About Byte Order

In general, the underlying byte order of the processor is completely transparent to the programmer. However, there can be a problem, for example, when data is exchanged with another system, since the other system may interpret multi-byte values differently.

For example, since it is not possible to predict what type of system is at either end of a network connection, network protocols must define the byte order used for multi-byte values in their headers. This is called the network byte order, and for TCP/IP it is big endian. The sending system converts the data from its local byte order to network byte order, and the receiving system converts the data from network byte order back to its local byte order. In practice, when a system's local byte order already matches network byte order, its conversion is optimized out and no conversion takes place.
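The sender and receiver steps can be sketched in user space with the POSIX functions htonl() and ntohl() from <arpa/inet.h> (not the kernel API, which is covered later). put_net_u32() and get_net_u32() are hypothetical names; memcpy() is used so the buffer does not need any particular alignment.

```c
#include <arpa/inet.h>   /* htonl(), ntohl(): POSIX user-space functions */
#include <stdint.h>
#include <string.h>

/* Sender side: store a host-order 32-bit value in network (big endian) order. */
void put_net_u32(unsigned char buf[4], uint32_t host_value)
{
	uint32_t net_value = htonl(host_value);  /* no-op on big endian hosts */
	memcpy(buf, &net_value, sizeof(net_value));
}

/* Receiver side: read a network-order 32-bit value back into host order. */
uint32_t get_net_u32(const unsigned char buf[4])
{
	uint32_t net_value;
	memcpy(&net_value, buf, sizeof(net_value));
	return ntohl(net_value);
}
```

Whatever the host's byte order, buf[0] ends up holding 0x01 for the value 0x01020304, which is exactly the agreement that network byte order provides.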

Another example is the USB protocol, which specifies that multi-byte values use the little endian byte order.
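One host-independent way to handle a little endian field, such as a 16-bit value in a USB descriptor, is to assemble and split it with shifts instead of reinterpreting memory. The helpers get_le16() and put_le16() below are hypothetical illustrations, not kernel or USB library functions.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build a 16-bit value stored little endian
 * (least significant byte first). The shifts operate on values, not on
 * memory, so the result is the same on big and little endian hosts.
 */
uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: store a 16-bit value little endian. */
void put_le16(unsigned char *p, uint16_t v)
{
	p[0] = (unsigned char)(v & 0xff);  /* least significant byte first */
	p[1] = (unsigned char)(v >> 8);
}
```

Because this style never depends on how the host lays out a uint16_t, it needs no #ifdef on the host byte order.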

Determining Byte Order

You can write a simple user-space program to test the byte order of the current system.

#include <stdio.h>

int main(void)
{
	/* foo can be accessed as an int or as the individual bytes of that int */
	union {
		int i;
		char c[sizeof(int)];
	} foo;

	foo.i = 1;
	if (foo.c[0] == 1)	/* c[0] is the byte at the lowest address */
		printf("Little endian\n");
	else
		printf("Big endian\n");
	return 0;
}

The union defines a variable, foo, that can be accessed either as an integer or as an array of characters. The variable is initialized to the integer value 1, so its least significant byte is one and all of its more significant bytes are zero.

If byte 0 (foo.c[0], the byte at the lowest address) is the least significant byte, it will be one, and the system is little endian. If byte 0 is the most significant byte, it will be zero, and the system is big endian.
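GCC and Clang also predefine __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__, so the same question can be answered at compile time. The sketch below assumes one of those compilers; HOST_IS_LITTLE_ENDIAN and host_is_little_endian() are hypothetical names, and the run-time function is the union test above rewritten with a pointer cast.

```c
/* Compile-time check using macros predefined by GCC and Clang (assumption:
 * other compilers may not define them, hence the #error fallback). */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define HOST_IS_LITTLE_ENDIAN 1
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define HOST_IS_LITTLE_ENDIAN 0
#else
#error "byte order macros not available; use a run-time test instead"
#endif

/* Run-time check: inspect the byte at the lowest address of the int 1.
 * Reading any object through an unsigned char pointer is valid C. */
int host_is_little_endian(void)
{
	const unsigned int one = 1;
	return *(const unsigned char *)&one == 1;
}
```

The two checks should always agree; the compile-time form lets code select a byte order path with #if instead of at run time.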

Interfaces

The kernel defines a variety of types and macros for handling values whose byte order may differ from the byte order used by the processor on the current system.

Type Identifiers

The following type identifiers correspond to the u16, u32, and u64 types, except that they are defined with the bitwise attribute, which restricts their use as ordinary integers. The sparse static checker uses the bitwise attribute to make sure that such a variable is converted to the processor's native byte order before any other (unsafe) operations are performed on it.

The following types can be used for endian dependent variables after including the linux/kernel.h header file.

__le16

__le32

__le64

__be16

__be32

__be64
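Because sparse is a separate tool, an ordinary C compiler does not enforce the bitwise attribute. As an illustration only, the sketch below approximates the same idea in portable C by wrapping the raw value in a single-member struct, so the compiler itself rejects arithmetic on it. le32_sketch, cpu_to_le32_sketch() and le32_to_cpu_sketch() are hypothetical names, not the kernel's definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __le32. Writing x + 1 for an le32_sketch x,
 * or passing x where a uint32_t is expected, is a compile error. */
typedef struct { uint32_t raw; } le32_sketch;

/* Explicit conversion in: lay the value out least significant byte first. */
le32_sketch cpu_to_le32_sketch(uint32_t v)
{
	const unsigned char b[4] = {
		(unsigned char)v, (unsigned char)(v >> 8),
		(unsigned char)(v >> 16), (unsigned char)(v >> 24)
	};
	le32_sketch r;
	memcpy(&r.raw, b, sizeof(r.raw));
	return r;
}

/* Explicit conversion out: rebuild the host-order value from the bytes. */
uint32_t le32_to_cpu_sketch(le32_sketch x)
{
	unsigned char b[4];
	memcpy(b, &x.raw, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

The kernel does not wrap __le32 in a struct; it keeps the types as plain integers for the compiler and leaves the checking to sparse.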

Conversion Macros

There are many macros for converting between the byte order used by the current processor and either the big or little endian byte order. In addition, for each type of conversion, there are different macros for 16-bit, 32-bit and 64-bit values. The names of the macros encode the source and target byte order and the size of the value, so it is fairly clear what each one does.

In order to write portable code, you should always use these macros to convert to or from the target byte order, even if you know it is not necessary for the processor you are using. If the source and target byte order are the same, the macro will not do anything, so there is no performance penalty.

The following macros return the converted value. Note: linux/kernel.h is the header file to include in source files that use these macros, but it is not the header file in which the macros are actually defined.

#include <linux/kernel.h>

__u16 le16_to_cpu(const __le16);

__u32 le32_to_cpu(const __le32);

__u64 le64_to_cpu(const __le64);

__le16 cpu_to_le16(const __u16);

__le32 cpu_to_le32(const __u32);

__le64 cpu_to_le64(const __u64);

__u16 be16_to_cpu(const __be16);

__u32 be32_to_cpu(const __be32);

__u64 be64_to_cpu(const __be64);

__be16 cpu_to_be16(const __u16);

__be32 cpu_to_be32(const __u32);

__be64 cpu_to_be64(const __u64);
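These kernel macros are not available to user-space programs, but glibc and the BSDs provide <endian.h> functions with matching roles: htole32() and htobe16() correspond to cpu_to_le32() and cpu_to_be16(), and le32toh() and be16toh() to le32_to_cpu() and be16_to_cpu(). The sketch assumes glibc on Linux; the record layout and the names encode_record() and decode_record() are hypothetical.

```c
#define _DEFAULT_SOURCE   /* expose htole32() and friends in glibc's <endian.h> */
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: 32-bit little endian length, then 16-bit big endian type.
 * On a host whose byte order already matches, each conversion is a no-op. */
void encode_record(unsigned char out[6], uint32_t length, uint16_t type)
{
	uint32_t le_len = htole32(length);   /* like cpu_to_le32() */
	uint16_t be_type = htobe16(type);    /* like cpu_to_be16() */
	memcpy(out, &le_len, sizeof(le_len));
	memcpy(out + 4, &be_type, sizeof(be_type));
}

void decode_record(const unsigned char in[6], uint32_t *length, uint16_t *type)
{
	uint32_t le_len;
	uint16_t be_type;
	memcpy(&le_len, in, sizeof(le_len));
	memcpy(&be_type, in + 4, sizeof(be_type));
	*length = le32toh(le_len);           /* like le32_to_cpu() */
	*type = be16toh(be_type);            /* like be16_to_cpu() */
}
```

As with the kernel macros, the code always calls the conversion, even on a host where it does nothing, so it stays correct on both kinds of system.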

The following macros are the same as the ones above, except the parameter is a pointer to the value to convert. Notice that the names of these macros are the same except for the "p" (for pointer) at the end of each name.

#include <linux/kernel.h>

__u16 le16_to_cpup(const __le16 *);

__u32 le32_to_cpup(const __le32 *);

__u64 le64_to_cpup(const __le64 *);

__le16 cpu_to_le16p(const __u16 *);

__le32 cpu_to_le32p(const __u32 *);

__le64 cpu_to_le64p(const __u64 *);

__u16 be16_to_cpup(const __be16 *);

__u32 be32_to_cpup(const __be32 *);

__u64 be64_to_cpup(const __be64 *);

__be16 cpu_to_be16p(const __u16 *);

__be32 cpu_to_be32p(const __u32 *);

__be64 cpu_to_be64p(const __u64 *);
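The pointer variants are convenient when walking data that is already in memory. As a user-space illustration, assuming glibc's <endian.h>, be32_to_cpup_sketch() below is a hypothetical stand-in for be32_to_cpup(), used to sum a table of big endian entries; like any ordinary dereference, it assumes the pointer is suitably aligned for uint32_t.

```c
#define _DEFAULT_SOURCE   /* expose be32toh() in glibc's <endian.h> */
#include <endian.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of be32_to_cpup(): convert the value p points to. */
static inline uint32_t be32_to_cpup_sketch(const uint32_t *p)
{
	return be32toh(*p);
}

/* Sum a table of big endian 32-bit entries, e.g. read verbatim from a file. */
uint64_t sum_be32_table(const uint32_t *table, size_t n)
{
	uint64_t sum = 0;
	for (size_t i = 0; i < n; i++)
		sum += be32_to_cpup_sketch(&table[i]);
	return sum;
}
```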

The following macros are the same as the ones above, except that the value the pointer refers to is converted in its current location. Notice that the names of these macros are the same except for the "s" at the end of each name. (The "s" stands for in situ, a Latin phrase meaning "in place".)

#include <linux/kernel.h>

void le16_to_cpus(__u16 *);

void le32_to_cpus(__u32 *);

void le64_to_cpus(__u64 *);

void cpu_to_le16s(__u16 *);

void cpu_to_le32s(__u32 *);

void cpu_to_le64s(__u64 *);

void be16_to_cpus(__u16 *);

void be32_to_cpus(__u32 *);

void be64_to_cpus(__u64 *);

void cpu_to_be16s(__u16 *);

void cpu_to_be32s(__u32 *);

void cpu_to_be64s(__u64 *);
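A typical use of the in-place style is fixing up a structure once, right after reading it from a little endian source. The sketch assumes glibc's <endian.h>; struct disk_header, header_to_cpu() and the *_sketch helpers are hypothetical stand-ins for the kernel's le16_to_cpus() and le32_to_cpus().

```c
#define _DEFAULT_SOURCE   /* expose le16toh()/le32toh() in glibc's <endian.h> */
#include <endian.h>
#include <stdint.h>

/* Hypothetical analogues: overwrite the value with its host-order equivalent. */
static inline void le16_to_cpus_sketch(uint16_t *p) { *p = le16toh(*p); }
static inline void le32_to_cpus_sketch(uint32_t *p) { *p = le32toh(*p); }

/* Hypothetical header read verbatim from little endian storage. */
struct disk_header {
	uint32_t magic;
	uint16_t version;
	uint16_t flags;
};

/* Convert every field once; afterwards ordinary integer code can use it. */
void header_to_cpu(struct disk_header *h)
{
	le32_to_cpus_sketch(&h->magic);
	le16_to_cpus_sketch(&h->version);
	le16_to_cpus_sketch(&h->flags);
}
```

A trade-off of converting in place is that the same field holds little endian data before the call and host-order data after it, which is also reflected in the plain __u16/__u32/__u64 pointer parameters of the "s" macros above.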

The following macros provide aliases for the names commonly used for byte order conversions in networking code. The first two macros convert from host to network byte order, and the last two provide the reverse conversion. The "s" and "l" at the end of the names stand for short (16 bits) and long (32 bits).

#include <linux/kernel.h>

#define htons(x) cpu_to_be16(x)

#define htonl(x) cpu_to_be32(x)

#define ntohs(x) be16_to_cpu(x)

#define ntohl(x) be32_to_cpu(x)

As noted earlier, network byte order is always big endian, and these macros perform whatever conversion, if any, the host byte order requires.
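The same four names are also the standard POSIX user-space functions declared in <arpa/inet.h>. A typical user-space use is filling in an IPv4 socket address, whose port and address fields must be stored in network byte order; make_loopback_addr() is a hypothetical helper.

```c
#include <arpa/inet.h>    /* htons(), htonl() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET */

/* Build an IPv4 address for 127.0.0.1:port with fields in network byte order. */
struct sockaddr_in make_loopback_addr(uint16_t port)
{
	struct sockaddr_in addr;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;                      /* host order, not a wire field */
	addr.sin_port = htons(port);                    /* 16-bit: "s" for short */
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 32-bit: "l" for long */
	return addr;
}
```

For port 8080 (0x1F90), the two bytes of sin_port are 0x1F followed by 0x90 in memory on every host.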

--------------------------------------------------------------------------------------------------------------------------------------------

The original article: "Byte Order"