2. Instructions: Language of the Computer

I speak Spanish to God, Italian to women, French to men, and German to my horse.

Charles V, Holy Roman Emperor (1500-1558)

2.1 Introduction

2.2 Operations of the Computer Hardware

2.3 Operands of the Computer Hardware

2.4 Signed and Unsigned Numbers

2.5 Representing Instructions in the Computer

2.6 Logical Operations

2.7 Instructions for Making Decisions

2.8 Supporting Procedures in Computer Hardware

2.9 MIPS Addressing for 32-Bit Immediates and Addresses

2.10 Parallelism and Instructions: Synchronization

2.11 Translating and Starting a Program

2.12 A C Sort Example to Put It All Together

2.13 Advanced Material: Compiling C

2.14 Real Stuff: ARMv7 (32-bit) Instructions

2.15 Real Stuff: x86 Instructions

2.16 Real Stuff: ARMv8 (64-bit) Instructions

2.17 Fallacies and Pitfalls

2.18 Concluding Remarks

2.19 Historical Perspective and Further Reading

2.20 Exercises

2.1 Introduction

To command a computer's hardware, you must speak its language. The words of a computer's language are called instructions, and its vocabulary is called an instruction set. In this chapter, you will see the instruction set of a real computer, both in the form written by people and in the form read by the computer. We introduce instructions in a top-down fashion. Starting from a notation that looks like a restricted programming language, we refine it step-by-step until you see the real language of a real computer. Chapter 3 continues our downward descent, unveiling the hardware for arithmetic and the representation of floating-point numbers.

　　You might think that the languages of computers would be as diverse as those of people, but in reality computer languages are quite similar, more like regional dialects than like independent languages. Hence, once you learn one, it is easy to pick up others.

　　The chosen instruction set comes from MIPS Technologies, and is an elegant example of the instruction sets designed since the 1980s. To demonstrate how easy it is to pick up other instruction sets, we will take a quick look at three other popular instruction sets.

　　ARMv7 is similar to MIPS. More than 9 billion chips with ARM processors were manufactured in 2011, making it the most popular instruction set in the world.
　　The second example is the Intel x86, which powers both the PC and the cloud of the PostPC Era.
　　The third example is ARMv8, which extends the address size of the ARMv7 from 32 bits to 64 bits. Ironically, as we shall see, this 2013 instruction set is closer to MIPS than it is to ARMv7.

This similarity of instruction sets occurs because all computers are constructed from hardware technologies based on similar underlying principles and because there are a few basic operations that all computers must provide. Moreover, computer designers have a common goal: to find a language that makes it easy to build the hardware and the compiler while maximizing performance and minimizing cost and energy. This goal is time-honored; the following quote was written before you could buy a computer, and it is as true today as it was in 1947:

It is easy to see by formal-logical methods that there exist certain [instruction sets] that are in abstract adequate to control and cause the execution of any sequence of operations... The really decisive considerations from the present point of view, in selecting an [instruction set], are more of a practical nature: simplicity of the equipment demanded by the [instruction set], and the clarity of its application to the actually important problems together with the speed of its handling of those problems.

Burks, Goldstine, and von Neumann, 1947

　　The "simplicity of the equipment" is as valuable a consideration for today's computers as it was for those of the 1950s. The goal of this chapter is to teach an instruction set that follows this advice, showing both how it is represented in hardware and the relationship between high-level programming languages and this more primitive one. Our examples are in the C programming language; Section 2.13 shows how these would change for an object-oriented language like Java.

　　By learning how to represent instructions, you will also discover the secret of computing: the stored-program concept. Moreover, you will exercise your "foreign language" skills by writing programs in the language of the computer and running them on the simulator that comes with this book. You will also see the impact of programming language and compiler optimization on performance. We conclude with a look at the historical evolution of instruction sets and an overview of other computer dialects.

　　We reveal our first instruction set a piece at a time, giving the rationale along with the computer structures. This top-down, step-by-step tutorial weaves the components with their explanations, making the computer's language more palatable. Figure 2.1 gives a sneak preview of the instruction set covered in this chapter.

2.2 Operations of the Computer Hardware

Every computer must be able to perform arithmetic. The MIPS assembly language notation

　　　　　　　　　　　　add　　a,　　b,　　c

instructs a computer to add the two variables b and c and to put their sum in a.

　　This notation is rigid in that each MIPS arithmetic instruction performs only one operation and must always have exactly three variables. For example, suppose we want to place the sum of four variables b, c, d, and e into variable a. (In this section we are being deliberately vague about what a "variable" is; in the next section we'll explain in detail.)

　　The following sequence of instructions adds the four variables:

　　add　　a,　　b,　　c　　# The sum of b and c is placed in a

　　add　　a,　　a,　　d　　# The sum of b, c, and d is now in a

　　add　　a,　　a,　　e　　# The sum of b, c, d, and e is now in a

Thus, it takes three instructions to sum the four variables.
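
The same one-operation-per-instruction discipline can be imitated in C. The sketch below is only an illustration: the helper `add` and the function `sum_four` are hypothetical names, not part of MIPS or of the original text, and `int32_t` is assumed as a stand-in for a 32-bit value.

```c
#include <stdint.h>

/* Hypothetical helper modeling one three-operand add:
   dst = src1 + src2, exactly one operation per call. */
static void add(int32_t *dst, int32_t src1, int32_t src2) {
    *dst = src1 + src2;
}

/* Sums four values with three adds, mirroring the sequence above. */
int32_t sum_four(int32_t b, int32_t c, int32_t d, int32_t e) {
    int32_t a;
    add(&a, b, c);  /* a = b + c         */
    add(&a, a, d);  /* a = b + c + d     */
    add(&a, a, e);  /* a = b + c + d + e */
    return a;
}
```

Each call names a destination and two sources, so summing four values needs three calls, just as the assembly needs three instructions.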

　　The words to the right of the sharp symbol (#) on each line above are comments for the human reader, so the computer ignores them. Note that, unlike other programming languages, each line of this language can contain at most one instruction. Another difference from C is that comments always terminate at the end of a line.

　　The natural number of operands for an operation like addition is three: the two numbers being added together and a place to put the sum. Requiring every instruction to have exactly three operands, no more and no less, conforms to the philosophy of keeping the hardware simple: hardware for a variable number of operands is more complicated than hardware for a fixed number. This situation illustrates the first of three underlying principles of hardware design:

　　Design Principle 1: Simplicity favors regularity.

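We can now show, in the two examples that follow, the relationship between programs written in higher-level programming languages and programs in this more primitive notation. This segment of a C program contains the five variables a, b, c, d, and e. Since Java evolved from C, this example and the next few work for either high-level programming language: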

　　a　　=　　b　　+　　c;

　　d　　=　　a　　-　　e;

The translation from C to MIPS assembly language instructions is performed by the compiler. Show the MIPS code produced by a compiler.

A MIPS instruction operates on two source operands and places the result in one destination operand. Hence, the two simple statements above compile directly into these two MIPS assembly instructions:

　　add　　a,　　b,　　c

　　sub 　　d,　　a,　　e

Compiling a Complex C Assignment into MIPS

A somewhat complex statement contains the five variables f, g, h, i, and j:

　　f　　=　　(g　　+　　h)　　-　　(i　　+　　j);

What might a C compiler produce?

The compiler must break this statement into several assembly instructions, since only one operation is performed per MIPS instruction. The first MIPS instruction calculates the sum of g and h. We must place the result somewhere, so the compiler creates a temporary variable, called t0:

　　add　　t0,　　g,　　h　　# temporary variable t0 contains g+h

Although the next operation is subtract, we need to calculate the sum of i and j before we can subtract. Thus, the second instruction places the sum of i and j in another temporary variable created by the compiler, called t1:

　　add　　t1,　　i,　　j　　# temporary variable t1 contains i+j

Finally, the subtract instruction subtracts the second sum from the first and places the difference in the variable f, completing the compiled code:

　　sub　　f,　　t0,　　t1　　# f gets t0-t1, which is (g+h)-(i+j)
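
Written back in C, the three compiled instructions correspond to three single-operation statements. This is a minimal sketch under the assumption that the variables are 32-bit integers; the function name `compile_example` is hypothetical.

```c
#include <stdint.h>

/* f = (g + h) - (i + j), broken into one operation per statement,
   the way the compiler breaks it into one operation per instruction. */
int32_t compile_example(int32_t g, int32_t h, int32_t i, int32_t j) {
    int32_t t0 = g + h;    /* add t0, g, h   */
    int32_t t1 = i + j;    /* add t1, i, j   */
    int32_t f  = t0 - t1;  /* sub f, t0, t1  */
    return f;
}
```

The temporaries `t0` and `t1` play the same role as the compiler-created temporary variables in the assembly version.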

For a given function, which programming language likely takes the most lines of code? Put the three representations below in order.

1. Java
2. C
3. MIPS assembly language

Elaboration: To increase portability, Java was originally envisioned as relying on a software interpreter. The instruction set of this interpreter is called Java bytecode (see Section 2.13), which is quite different from the MIPS instruction set. To get performance close to the equivalent C program, Java systems today typically compile Java bytecodes into native instruction sets like MIPS. Because this compilation is normally done much later than for C programs, such Java compilers are often called Just In Time (JIT) compilers. Section 2.11 shows how JITs are used later than C compilers in the start-up process, and Section 2.12 shows the performance consequences of compiling versus interpreting Java programs.

2.3 Operands of the Computer Hardware

Unlike programs in high-level languages, the operands of arithmetic instructions are restricted; they must be from a limited number of special locations built directly in hardware called registers. Registers are primitives used in hardware design that are also visible to the programmer when the computer is completed, so you can think of registers as the bricks of computer construction. The size of a register in the MIPS architecture is 32 bits; groups of 32 bits occur so frequently that they are given the name word in the MIPS architecture.

　　One major difference between the variables of a programming language and registers is the limited number of registers, typically 32 on current computers, like MIPS. (See Section 2.19 for the history of the number of registers.) Thus, continuing in our top-down, stepwise evolution of the symbolic representation of the MIPS language, in this section we have added the restriction that the three operands of MIPS arithmetic instructions must each be chosen from one of the 32 32-bit registers.

　　The reason for the limit of 32 registers may be found in the second of our three underlying design principles of hardware technology:

　　Design Principle 2: Smaller is faster.

A very large number of registers may increase the clock cycle time simply because it takes electronic signals longer when they must travel farther.

　　Guidelines such as "smaller is faster" are not absolutes; 31 registers may not be faster than 32. Yet, the truth behind such observations causes computer designers to take them seriously. In this case, the designer must balance the craving of programs for more registers with the designer's desire to keep the clock cycle fast. Another reason for not using more than 32 is the number of bits it would take in the instruction format, as Section 2.5 demonstrates.

　　Chapter 4 shows the central role that registers play in hardware construction; as we shall see in this chapter, effective use of registers is critical to program performance.

　　Although we could simply write instructions using numbers for registers, from 0 to 31, the MIPS convention is to use two-character names following a dollar sign to represent a register. Section 2.8 will explain the reasons behind these names. For now, we will use $s0, $s1, . . . for registers that correspond to variables in C and Java programs and $t0, $t1, . . . for temporary registers needed to compile the program into MIPS instructions.

Compiling a C Assignment Using Registers

It is the compiler's job to associate program variables with registers. Take, for instance, the assignment statement from our earlier example:

　　　　f　　=　　(g　　+　　h)　　-　　(i　　+　　j);

The variables f, g, h, i, and j are assigned to the registers $s0, $s1, $s2, $s3, and $s4, respectively. What is the compiled MIPS code?

The compiled program is very similar to the prior example, except we replace the variables with the register names mentioned above plus two temporary registers, $t0 and $t1, which correspond to the temporary variables above:

　　add　　$t0,　　$s1,　　$s2　　# register $t0 contains g+h

　　add　　$t1,　　$s3,　　$s4　　# register $t1 contains i+j

　　sub　　$s0,　　$t0,　　$t1　　# f gets $t0 - $t1, which is (g+h) - (i+j)
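
One way to picture the register version is to model the register file as an array of 32 words and let each instruction read and write array slots. The sketch below is an illustration only. The function name `run_assignment` is hypothetical, and the register numbers follow the usual MIPS numbering ($t0 and $t1 are 8 and 9, $s0 through $s4 are 16 through 20), which the text has not introduced at this point.

```c
#include <stdint.h>

/* Register numbers under the usual MIPS numbering (an assumption here;
   the text has not yet given the name-to-number mapping). */
enum { T0 = 8, T1 = 9, S0 = 16, S1 = 17, S2 = 18, S3 = 19, S4 = 20 };

/* Runs the three compiled instructions on a 32-entry register file.
   On entry g, h, i, j are in $s1..$s4; on exit f is in $s0. */
void run_assignment(int32_t reg[32]) {
    reg[T0] = reg[S1] + reg[S2];  /* add $t0, $s1, $s2 */
    reg[T1] = reg[S3] + reg[S4];  /* add $t1, $s3, $s4 */
    reg[S0] = reg[T0] - reg[T1];  /* sub $s0, $t0, $t1 */
}
```

Every operand is an index into the same small array, which is the restriction this section adds: all three operands of an arithmetic instruction must be registers.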

Memory Operands

Programming languages have simple variables that contain single data elements as in these examples, but they also have more complex data structures--arrays and structures. These complex data structures can contain many more data elements than there are registers in a computer. How can a computer represent and access such large structures?

　　Recall the five components of a computer introduced in Chapter 1 and repeated on page 61. The processor can keep only a small amount of data in registers, but computer memory contains billions of data elements. Hence, data structures (arrays and structures) are kept in memory.

　　As explained above, arithmetic operations occur only on registers in MIPS instructions; thus, MIPS must include instructions that transfer data between memory and registers. Such instructions are called data transfer instructions. To access a word in memory, the instruction must supply the memory address. Memory is just a large, single-dimensional array, with the address acting as the index to that array, starting at 0. For example, in Figure 2.2, the address of the third data element is 2, and the value of Memory[2] is 10.

　　The data transfer instruction that copies data from memory to a register is traditionally called load. The format of the load instruction is the name of the operation followed by the register to be loaded, then a constant and register used to access memory. The sum of the constant portion of the instruction and the contents of the second register forms the memory address. The actual MIPS name for this instruction is lw, standing for load word.

Compiling an Assignment When an Operand Is in Memory

Let's assume that A is an array of 100 words and that the compiler has associated the variables g and h with the registers $s1 and $s2 as before. Let's also assume that the starting address, or base address, of the array is in $s3. Compile this C assignment statement:

　　g　　=　　h　　+　　A[8];

Although there is a single operation in this assignment statement, one of the operands is in memory, so we must first transfer A[8] to a register. The address of this array element is the sum of the base of the array A, found in register $s3, plus the number to select element 8. The data should be placed in a temporary register for use in the next instruction. Based on Figure 2.2, the first compiled instruction is

　　lw　　$t0,　　8($s3)　　# Temporary reg $t0 gets A[8]

(We'll be making a slight adjustment to this instruction, but we'll use this simplified version for now.) The following instruction can operate on the value in $t0 (which equals A[8]) since it is in a register. The instruction must add h (contained in $s2) to A[8] (contained in $t0) and put the sum in the register corresponding to g (associated with $s1):

　　add　　$s1,　　$s2,　　$t0　　# g　　=　　h　　+　　A[8]

The constant in a data transfer instruction (8) is called the offset, and the register added to form the address ($s3) is called the base register.
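
The load-then-add sequence can be sketched in C by modeling memory as an array of words, in the same simplified word-indexed form the text uses at this point. The names `load_word` and `run_g_equals_h_plus_a8` are hypothetical, and the register numbers assume the usual MIPS numbering ($t0 is 8, $s1 through $s3 are 17 through 19).

```c
#include <stdint.h>

/* Register numbers under the usual MIPS numbering (an assumption). */
enum { T0 = 8, S1 = 17, S2 = 18, S3 = 19 };

/* Simplified load word: memory is an array of words indexed by word
   number, so the address is just offset + base register contents. */
static int32_t load_word(const int32_t *memory, int32_t offset, int32_t base) {
    return memory[base + offset];
}

/* g = h + A[8], with g in $s1, h in $s2, and the base of A in $s3. */
void run_g_equals_h_plus_a8(int32_t reg[32], const int32_t *memory) {
    reg[T0] = load_word(memory, 8, reg[S3]);  /* lw  $t0, 8($s3)    */
    reg[S1] = reg[S2] + reg[T0];              /* add $s1, $s2, $t0  */
}
```

The offset (8) is a constant baked into the instruction, while the base comes from a register at run time, so the same instruction works wherever the array happens to be placed in memory.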

In addition to associating variables with registers, the compiler allocates data structures like arrays and structures to locations in memory. The compiler can then place the proper starting address into the data transfer instructions.

　　Since 8-bit bytes are useful in many programs, virtually all architectures today address individual bytes. Therefore, the address of a word matches the address of one of the 4 bytes within the word, and addresses of sequential words differ by 4. For example, Figure 2.3 shows the actual MIPS addresses for the words in Figure 2.2; the byte address of the third word is 8.

　　In MIPS, words must start at addresses that are multiples of 4. This requirement is called an alignment restriction, and many architectures have it. (Chapter 4 suggests why alignment leads to faster data transfers.)
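
The arithmetic behind byte addressing and alignment is small enough to sketch in C. The function names and the `WORD_BYTES` constant below are hypothetical, introduced only for illustration; the 4-byte word size comes from the 32-bit MIPS word described earlier.

```c
#include <stdint.h>
#include <stdbool.h>

#define WORD_BYTES 4u  /* a MIPS word is 32 bits, i.e. 4 bytes */

/* Byte address of word number `index` in an array whose first word
   sits at byte address `base`. */
uint32_t word_byte_address(uint32_t base, uint32_t index) {
    return base + index * WORD_BYTES;
}

/* A word address meets the alignment restriction when it is a
   multiple of 4. */
bool is_word_aligned(uint32_t byte_address) {
    return byte_address % WORD_BYTES == 0;
}
```

With a base of 0, the third word (index 2) lands at byte address 8, matching the figure cited in the text. By the same arithmetic, element 8 of a word array lies 32 bytes past its base, which is presumably the adjustment the earlier lw example alludes to.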