Operating Systems: Three Easy Pieces --- Limited Direct Execution (Note)

In order to virtualize the CPU, the operating system needs to somehow share the physical CPU

among many jobs running seemingly at the same time. The basic idea is simple: run one process

for a little while, then run another one, and so forth. By time sharing the CPU in this manner, 

virtualization is achieved. There are a few challenges, however, in building such virtualization

machinery. The first is performance: how can we implement virtualization without adding excessive

overhead to the system? The second is control: how can we run processes efficiently while retaining

control over the CPU? Control is particularly important to the OS, as it is in charge of resources;

without control, a process could simply run forever and take over the machine, or access information

that it should not be allowed to access. Attaining performance while maintaining control is thus one

of the central challenges in building an operating system.

Basic Technique: Limited Direct Execution

To make a program run as fast as one might expect, not surprisingly, OS developers came up with

a technique, which we call Limited Direct Execution. The "direct execution" part of the idea is simple:

just run the program directly on the CPU. Thus, when the OS wishes to start a program, it creates a process

entry for it in a process list, allocates some memory for it, loads the program code into memory (from

disk), locates its entry point (i.e. the main() routine or something similar), jumps to it, and starts running

the user's code. Sounds simple, no? But this approach gives rise to a few problems in our quest to 

virtualize the CPU. The first is simple: if we just run a program, how can the OS make sure the program

does not do anything that we do not want it to do, while still running it efficiently? The second: when 

we are running a process, how does the operating system stop it from running and switch to another

process, thus implementing the time sharing we require to virtualize the CPU? In answering these questions

below, we will get a much better sense of what is needed to virtualize the CPU. In developing these

techniques, we will also see where the "limited" part of the name arises from; without limits on running

programs, the OS would not be in control of anything and thus would be "just a library" --- a very sad state

of affairs for an aspiring operating system.
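
As a rough sketch (not any real kernel's API; every function name below is a hypothetical placeholder), the OS-side steps of bare direct execution might look like this:

    /* A minimal sketch of "direct execution" with no limits: the OS sets up the
     * process and then simply jumps to its entry point. All names below
     * (process_list_add, allocate_memory_for, load_program) are illustrative. */
    struct proc;                                              /* per-process bookkeeping */
    struct proc *process_list_add(void);                      /* add an entry to the process list */
    void *allocate_memory_for(struct proc *p);                /* allocate memory for the program */
    void (*load_program(const char *path, void *mem))(void);  /* load code from disk, return entry point */

    void run_program_directly(const char *path)
    {
        struct proc *p = process_list_add();
        void *mem = allocate_memory_for(p);
        void (*entry)(void) = load_program(path, mem);  /* e.g., the address of main() */
        entry();   /* jump to user code: from here on, the OS has given up control */
    }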

Restricted Operations

Direct execution has the obvious advantage of being fast; the program runs natively on the hardware CPU

and thus executes as quickly as one would expect. But running directly on the CPU introduces a problem: what if

the process wishes to perform some kind of restricted operation, such as issuing an I/O request to a disk,

or gaining access to more resources such as CPU or memory?

Tip: Use Protected Control Transfer

The hardware assists the OS by providing different modes of execution. In user mode, applications do not

have full access to hardware resources. In kernel mode, the OS has access to the full resources of the

machine. Special instructions to trap into the kernel and return-from-trap back to user mode programs

are also provided, as well as instructions that allow the OS to tell the hardware where the trap table

resides in memory.

One approach would simply be to let any process do whatever it wants in terms of I/O and other related

operations. However, doing so would prevent the construction of many kinds of systems that are desired.

For example, if we wish to build a file system that checks permissions before granting access to a file, we

cannot simply let any user process issue I/Os to the disk; if we did, a process could simply read or write the

entire disk and thus all protections would be lost.

Thus, the approach we take is to introduce a new processor

mode, known as user mode; code that runs in user mode is restricted in what it can do. For example,

when running in user mode, a process cannot issue I/O requests; doing so would result in the processor

raising an exception; the OS would then likely kill the process.

In contrast to user mode is kernel mode, which the operating system (or kernel) runs in. In this mode,

code that runs can do what it likes, including privileged operations such as issuing I/O requests and 

executing all types of restricted instructions.
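
You can watch this restriction in action with a tiny experiment, assuming a Linux/x86 machine: a user-mode program that attempts a privileged instruction gets killed by the OS.

    /* User-mode attempt at a privileged instruction (hlt). On a typical
     * Linux/x86 system the CPU raises a general-protection fault and the OS
     * kills the process (the shell reports something like "Segmentation fault"). */
    #include <stdio.h>

    int main(void)
    {
        printf("about to execute a privileged instruction in user mode...\n");
        __asm__ volatile ("hlt");            /* legal only in kernel mode */
        printf("never reached\n");
        return 0;
    }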

We are still left with a challenge, however: what should a user process do when it wishes to perform some

kind of privileged operation, such as reading from disk? To enable this, virtually all modern hardware

provides the ability for user programs to perform a system call. Pioneered on ancient machines such as the

Atlas, system calls allow the kernel to carefully expose certain key pieces of functionality to user programs,

such as accessing the file system, creating and destroying processes, communicating with other processes, 

and allocating more memory. Most operating systems provide a few hundred calls; early Unix systems

exposed a more concise subset of around twenty calls.
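
From the program's point of view, a system call looks like an ordinary procedure call; for instance, on a Unix-like system, reading a file goes through thin library wrappers such as open() and read(), each of which traps into the kernel (on Linux, a tool like strace makes the traps visible):

    /* Reading a file via the open()/read() wrappers; each wrapper ends in a trap
     * into the kernel, where permissions are checked and the I/O is performed. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[64];
        int fd = open("/etc/hostname", O_RDONLY);    /* system call: trap into the kernel */
        if (fd < 0) { perror("open"); return 1; }
        ssize_t n = read(fd, buf, sizeof(buf) - 1);  /* system call: another trap */
        if (n < 0) { perror("read"); close(fd); return 1; }
        buf[n] = '\0';
        printf("read %zd bytes: %s", n, buf);
        close(fd);                                   /* system call: one more trap */
        return 0;
    }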

To execute a system call, a program must execute a special trap instruction. This instruction simultaneously

jumps into the kernel and raises the privilege level to kernel mode; once in the kernel, the system can now

perform whatever privileged operations are needed (if allowed), and thus do the required work for the calling

process. When finished, the OS calls a special return-from-trap instruction, which, as you might expect, returns

into the calling user program while simultaneously reducing the privilege level back to user mode.
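
As a concrete (Linux/x86-64) example of what such a wrapper boils down to, the sketch below issues a write system call by loading the call number and arguments into registers and executing the syscall trap instruction; this is an illustration of the mechanism, not how one would normally write the code:

    /* A raw write system call on Linux/x86-64: put the call number in rax and
     * the arguments in rdi/rsi/rdx, then execute the syscall (trap) instruction.
     * The kernel does the work in kernel mode and the return-from-trap (sysret)
     * puts us back here in user mode with the result in rax. */
    static long raw_write(int fd, const void *buf, unsigned long len)
    {
        long ret;
        __asm__ volatile ("syscall"
                          : "=a"(ret)                            /* result returned in rax */
                          : "a"(1), "D"(fd), "S"(buf), "d"(len)  /* 1 = write; fd, buf, len */
                          : "rcx", "r11", "memory");             /* syscall clobbers rcx/r11 */
        return ret;
    }

    int main(void)
    {
        raw_write(1, "hello from a raw system call\n", 29);
        return 0;
    }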

The hardware needs to be a bit careful when executing a trap, in that it must make sure to save enough of the

caller's registers in order to be able to return correctly when the OS issues the return-from-trap instruction.

On x86, for example, the processor will push the program counter, flags, and a few other registers onto a

per-process kernel stack; the return-from-trap will pop these values off the stack and resume execution of 

the user-mode program. Other hardware systems use different conventions, but the basic concepts are similar

across platforms.
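
To make this concrete, here is the flavor of trap frame an x86 teaching kernel such as xv6 builds on the per-process kernel stack; this is a simplified sketch (several segment-register fields are omitted), and real layouts differ across operating systems and architectures.

    /* Simplified x86 trap frame, loosely modeled on xv6's struct trapframe.
     * The top fields are pushed by the OS's trap entry code; the fields from
     * eip down are pushed by the hardware itself when the trap occurs. */
    struct trapframe {
        unsigned int edi, esi, ebp, oesp;   /* general registers (saved by trap entry code) */
        unsigned int ebx, edx, ecx, eax;
        unsigned int trapno;                /* which trap/interrupt fired */
        unsigned int err;                   /* error code (pushed by hardware for some traps) */
        unsigned int eip;                   /* saved program counter (pushed by hardware) */
        unsigned int cs;                    /* saved code segment */
        unsigned int eflags;                /* saved flags (pushed by hardware) */
        unsigned int esp;                   /* saved user stack pointer */
        unsigned int ss;                    /* saved user stack segment */
    };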

There is one important detail left out of this discussion: how does the trap know which code to run inside the

OS? Clearly, the calling process can't specify an address to jump to (as you would when making a procedure

call); doing so would allow programs to jump anywhere into the kernel, which clearly is a bad idea (imagine

jumping into code to access a file, but just after a permission check; in fact, it is likely such an ability would

enable a wily programmer to get the kernel to run arbitrary code sequences). Thus the kernel must carefully

control what code executes upon a trap.

The kernel does so by setting up a trap table at boot time. When the machine boots up, it does so in privileged

kernel code, and thus is free to configure machine hardware as need be. One of the first things the OS thus

does is to tell the hardware what code to run when certain exceptional events occur. For example, what code

should run when a hard-disk interrupt takes place, when a keyboard interrupt occurs, or when a program makes

a system call? The OS informs the hardware of the locations of these trap handlers, usually with some kind of

special instruction. Once the hardware is informed, it remembers the location of these handlers until the machine

is next rebooted, and thus the hardware knows what to do (what code to jump to) when system calls and other

exceptional events take place.
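
A minimal sketch of that boot-time setup follows; the names and vector numbers are purely illustrative (no real kernel's API is implied): fill in a table of handler addresses, then use a privileged instruction to hand its location to the hardware.

    /* Boot-time trap-table initialization (illustrative sketch). The trap and
     * interrupt numbers below are made up; real ones are architecture-specific. */
    #define NUM_VECTORS   256
    #define T_SYSCALL      64      /* hypothetical vector used for system calls */
    #define IRQ_DISK       32      /* hypothetical hard-disk interrupt vector */
    #define IRQ_KEYBOARD   33      /* hypothetical keyboard interrupt vector */

    typedef void (*trap_handler_t)(void);

    static trap_handler_t trap_table[NUM_VECTORS];

    void syscall_handler(void);        /* entered when a program executes the trap instruction */
    void disk_intr_handler(void);
    void keyboard_intr_handler(void);
    void hw_load_trap_table(trap_handler_t *table, int n);  /* wraps the privileged
                                          "here is where my trap table lives" instruction */

    void trap_init(void)               /* runs once, early in boot, in kernel mode */
    {
        trap_table[T_SYSCALL]    = syscall_handler;
        trap_table[IRQ_DISK]     = disk_intr_handler;
        trap_table[IRQ_KEYBOARD] = keyboard_intr_handler;
        hw_load_trap_table(trap_table, NUM_VECTORS);  /* privileged: faults in user mode */
    }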

One last aside: being able to execute the instruction to tell the hardware where the trap tables are is a very

powerful capability. Thus, as you might have guessed, it is also a privileged operation. If you try to execute this

instruction in user mode, the hardware won't let you, and you can probably guess what will happen.

There are two phases in the LDE protocol. In the first (at boot time), the kernel initializes the trap table, and

the CPU remembers its location for subsequent use. The kernel does so via a privileged instruction. In the second (

when running a process), the kernel sets up a few things (e.g., allocating a node on the process list, allocating

memory) before using a return-from-trap instruction to start the execution of the process; this switches the CPU

to user mode and begins running the process. When the process wishes to issue a system call, it traps back into

the OS, which handles it and once again returns control via a return-from-trap to the process. The process then

completes its work, and returns from main(); this usually will return to the stub code which will properly exit the

program (say, by calling the exit() system call, which traps into the OS). At this point, the OS cleans up and we 

are done.
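
Put together, the timeline looks roughly like this (a condensed restatement of the protocol just described, not any additional mechanism):

    Phase 1: at boot (kernel mode)
        OS:       initialize trap table
        Hardware: remember the addresses of the syscall handler and other trap handlers

    Phase 2: when running a process
        OS:       create entry on process list, allocate memory, load program,
                  set up user stack, fill kernel stack with registers/PC, return-from-trap
        Hardware: restore registers, switch to user mode, jump to main()
        Program:  run main() ... issue a system call (trap into OS)
        Hardware: save registers to kernel stack, switch to kernel mode, jump to trap handler
        OS:       handle the trap, do the work of the system call, return-from-trap
        Hardware: restore registers, switch to user mode, resume after the trap
        Program:  ... return from main(), trap into OS via exit()
        OS:       free the process's memory, remove it from the process list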

Original source: https://www.cnblogs.com/miaoyong/p/4884967.html