Operating Systems: Three Easy Pieces --- Locks: Test and Set (Note)

Because disabling interrupts does not work on multiple processors, system designers started to

invent hardware support for locking. The earliest multiprocessor systems, such as the Burroughs

B5000 in the early 1960s, had such support; today all systems provide this type of support, even

for single CPU systems.

The simplest bit of hardware support to understand is what is known as a test-and-set instruction,

also known as atomic exchange. To understand how test-and-set works, let's first try to build a 

simple lock without it. In this failed attempt, we use a simple flag variable to denote whether the

lock is held or not.

In this first attempt, the idea is quite simple: use a simple variable to indicate whether some

thread has possession of a lock. The first thread that enters the critical section will call lock(), 

which tests whether the flag is equal to 1 (in this case, it is not), and then sets the flag to 1 to

indicate that the thread now holds the lock. When finished with the critical section, the thread

calls unlock() and clears the flag, thus indicating that the lock is no longer held.

typedef struct __lock_t { int flag; } lock_t;

void init(lock_t* mutex) {
    // 0 -> lock is available, 1 -> lock is held
    mutex->flag = 0;
}

void lock(lock_t* mutex) {
    while (mutex->flag == 1)  // TEST the flag
        ;                     // spin-wait (do nothing)
    mutex->flag = 1;          // now SET it!
}

void unlock(lock_t* mutex) {
    mutex->flag = 0;
}

If another thread happens to call lock() while that first thread is in the critical section, it will

simply spin-wait in the while loop for that thread to call unlock() and clear the flag. Once the first

thread does so, the waiting thread will fall out of the while loop, set the flag to 1 for itself, and

proceed into the critical section.

Unfortunately, the code has two problems: one of correctness, and another of performance. The

correctness problem is simple to see once you get used to thinking about concurrent programming.

Imagine this interleaving of two threads, with an untimely interrupt; assume flag = 0 to begin:

    Thread 1                                 Thread 2
    call lock()
    while (flag == 1)   // flag is 0, so fall through
    [interrupt: switch to Thread 2]
                                             call lock()
                                             while (flag == 1)   // flag is still 0!
                                             flag = 1;
    [interrupt: switch to Thread 1]
    flag = 1;           // also sets flag to 1

As you can see from this interleaving, with timely (untimely?) interrupts, we can easily produce

a case where both threads set the flag to 1 and both threads are thus able to enter the critical

section. This behavior is what professionals call "bad" - we have obviously failed to provide the

most basic requirement: providing mutual exclusion.

The performance problem, which we will address more later on, is the way a thread

waits to acquire a lock that is already held: it endlessly checks the value of flag, a technique 

known as spin-waiting. Spin-waiting wastes time waiting for another thread to release a lock. The

waste is exceptionally high on a uniprocessor, where the thread that the waiter is waiting for 

cannot even run (at least, until a context switch occurs!). Thus, as we move forward and develop

more sophisticated solutions, we should also consider ways to avoid this kind of waste.

                    Building A Working Spin Lock

While the idea behind the example above is a good one, it is not possible to implement without

some support from the hardware. Fortunately, some systems provide an instruction to support

the creation of simple locks based on this concept. This more powerful instruction has different

names -- on SPARC, it is the load/store unsigned byte instruction (ldstub), whereas on x86, it is the

atomic exchange instruction (xchg) -- but basically does the same thing across platforms, and is

generally referred to as test-and-set. We define what the test-and-set instruction does with the

following C code snippet:

int TestAndSet(int* old_ptr, int new) {
    int old = *old_ptr;  // fetch old value at old_ptr
    *old_ptr = new;      // store 'new' into old_ptr
    return old;          // return the old value
}

What the test-and-set instruction does is as follows. It returns the old value pointed to by old_ptr,

and simultaneously updates said value to new. The key, of course, is that this sequence of 

operations is performed atomically. The reason it is called test-and-set is that it enables you to 

test the old value (which is what is returned) while simultaneously setting the memory location to

a new value; as it turns out, this slightly more powerful instruction is enough to build a simple

spin lock, as we now examine in Figure 28.3. Or better yet: try to figure it out yourself first!

Let's make sure we understand why this lock works. Imagine first the case where a thread calls

lock() and no other thread currently holds the lock; thus, flag should be 0. When the thread calls

TestAndSet(flag, 1), the routine will return the old value of flag, which is 0; thus, the calling 

thread, which is testing the value of flag, will not get caught spinning in the while loop and will

acquire the lock. The thread will also atomically set the value to 1, thus indicating that the lock

is now held. When the thread is finished with its critical section, it calls unlock() to set the flag

back to zero.

typedef struct __lock_t {
    int flag;
} lock_t;

void init(lock_t* lock) {
    lock->flag = 0;
}

void lock(lock_t* lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        ;  // spin-wait (do nothing)
}

void unlock(lock_t* lock) {
    lock->flag = 0;
}

The second case we can imagine arises when one thread already has the lock held (i.e., flag is 1).

In this case, this thread will call lock() and then call TestAndSet(flag, 1) as well. This time,

TestAndSet() will return the old value at flag, which is 1 (because the lock is held), while simultaneously

setting it to 1 again. As long as the lock is held by another thread, TestAndSet() will repeatedly 

return 1, and thus this thread will spin and spin until the lock is finally released. When the flag is

finally set to 0 by some other thread, this thread will call TestAndSet() again, which will now 

return 0 while atomically setting the value to 1 and thus acquire the lock and enter the critical

section.

By making both the test of the old lock value and set of the new value a single atomic operation,

we ensure that only one thread acquires the lock. And that's how to build a working mutual

exclusion primitive!

You may also now understand why this type of lock is usually referred to as a spin lock. It is the

simplest type of lock to build, and simply spins (using CPU cycles) until the lock becomes available.

To work correctly on a single processor, it requires a preemptive scheduler (i.e., one that will

interrupt a thread via a timer, in order to run a different thread, from time to time). Without

preemption, spin locks don't make much sense on a single CPU, as a thread spinning on a CPU

will never relinquish it.

                  TIP: Think About Concurrency As A Malicious Scheduler

From this example, you might get a sense of the approach you need to take to understand 

concurrent execution. What you should try to do is to pretend you are a malicious scheduler, one

that interrupts threads at the most inopportune of times in order to foil their feeble attempts at

building synchronization primitives! What a mean scheduler you are! Although the exact sequence

of interrupts may be improbable, it is possible, and that is all we need to demonstrate that a

particular approach does not work. It can be useful to think maliciously! (At least, sometimes.)

Original source: https://www.cnblogs.com/miaoyong/p/4991344.html