virtual memory(4)

Linux organizes the virtual memory as a collection of areas (also called segments).

An area is a contiguous chunk of existing (allocated) virtual memory whose pages are related in some way.

Figure 9.27 highlights the kernel data structures that keep track of the virtual memory areas in a process.

The kernel maintains a distinct task structure (task_struct in the source code) for each process in the system.

The elements of the task structure either contain or point to all of the information that the kernel needs to run the process (e.g., the PID, pointer to the user stack, name of the executable object file, and program counter).

One of the entries in the task structure points to an mm_struct that characterizes the current state of the virtual memory.

The two fields of interest to us are pgd, which points to the base of the level 1 table (the page global directory),

and mmap, which points to a list of vm_area_structs (area structs), each of which characterizes an area of the current virtual address space.

When the kernel runs this process, it stores pgd in the CR3 control register.

For our purposes, the area struct for a particular area contains the following fields:

. vm_start: Points to the beginning of the area
. vm_end: Points to the end of the area
. vm_prot: Describes the read/write permissions for all of the pages contained in the area
. vm_flags: Describes (among other things) whether the pages in the area are shared with other processes or private to this process
. vm_next: Points to the next area struct in the list
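
To make the pointer chain concrete, here is a minimal C sketch of the three structures and a walk over the area list. The `*_sketch` type names and the `count_areas` helper are hypothetical, and the real kernel definitions carry many more fields; treating `vm_end` as the first address past the area is an assumption of this sketch.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures described above.
   The *_sketch names are hypothetical; the real structs are far larger. */
struct vm_area_sketch {
    unsigned long vm_start;          /* beginning of the area */
    unsigned long vm_end;            /* end of the area (assumed exclusive here) */
    unsigned long vm_prot;           /* read/write permissions for the area's pages */
    unsigned long vm_flags;          /* shared vs. private, among other things */
    struct vm_area_sketch *vm_next;  /* next area struct in the list */
};

struct mm_sketch {
    void *pgd;                       /* base of the level 1 page table */
    struct vm_area_sketch *mmap;     /* head of the list of area structs */
};

struct task_sketch {
    int pid;
    struct mm_sketch *mm;            /* current state of the virtual memory */
};

/* Count a process's areas by following task -> mm -> mmap -> vm_next. */
size_t count_areas(const struct task_sketch *task)
{
    size_t n = 0;
    for (const struct vm_area_sketch *a = task->mm->mmap; a != NULL; a = a->vm_next)
        n++;
    return n;
}
```

Calling `count_areas` on a task whose list holds two area structs returns 2; the same traversal pattern underlies the address lookup in the page fault handler.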

Linux Page Fault Exception Handling

Suppose the MMU triggers a page fault while trying to translate some virtual address A.

The exception results in a transfer of control to the kernel’s page fault handler, which then performs the following steps:

1 Is virtual address A legal? In other words, does A lie within an area defined by some area struct?

To answer this question, the fault handler searches the list of area structs, comparing A with the vm_start and vm_end in each area struct.

If the address is not legal, then the fault handler triggers a segmentation fault, which terminates the process.

This situation is labeled “1” in Figure 9.28.

Because a process can create an arbitrary number of new virtual memory areas (using the mmap function described in the next section),

a sequential search of the list of area structs might be very costly.

So in practice, Linux superimposes a tree on the list, using some fields that we have not shown, and performs the search on this tree.

2 Is the attempted memory access legal?

In other words, does the process have permission to read, write, or execute the pages in this area?

For example, was the page fault the result of a store instruction trying to write to a read-only page in the text segment?

Is the page fault the result of a process running in user mode that is attempting to read a word from kernel virtual memory?

If the attempted access is not legal, then the fault handler triggers a protection exception, which terminates the process.

This situation is labeled “2” in Figure 9.28.

3 At this point, the kernel knows that the page fault resulted from a legal operation on a legal virtual address.

It handles the fault by selecting a victim page, swapping out the victim page if it is dirty, swapping in the new page, and updating the page table.

When the page fault handler returns, the CPU restarts the faulting instruction, which sends A to the MMU again.

This time, the MMU translates A normally, without generating a page fault.
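
The three checks above can be sketched as a small user-space C function. This is an illustration, not kernel code: `struct area`, the `ACCESS_*` bits, `find_area`, and `classify_fault` are all hypothetical names, the search is the simple linear list walk rather than the tree Linux uses in practice, and `vm_end` is assumed to be exclusive.

```c
#include <stddef.h>

#define ACCESS_READ  0x1UL   /* hypothetical permission bits for this sketch */
#define ACCESS_WRITE 0x2UL

enum fault_result { FAULT_SEGV, FAULT_PROT, FAULT_HANDLED };

struct area {
    unsigned long vm_start;  /* beginning of the area */
    unsigned long vm_end;    /* end of the area (assumed exclusive) */
    unsigned long vm_prot;   /* permitted accesses, as ACCESS_* bits */
    struct area *vm_next;    /* next area struct in the list */
};

/* Step 1: linear search for the area that contains addr, if any. */
static const struct area *find_area(const struct area *head, unsigned long addr)
{
    for (const struct area *a = head; a != NULL; a = a->vm_next)
        if (addr >= a->vm_start && addr < a->vm_end)
            return a;
    return NULL;
}

/* Decide which of the three cases a fault on addr falls into. */
enum fault_result classify_fault(const struct area *head,
                                 unsigned long addr, unsigned long access)
{
    const struct area *a = find_area(head, addr);
    if (a == NULL)
        return FAULT_SEGV;                 /* case 1: address not in any area */
    if ((a->vm_prot & access) != access)
        return FAULT_PROT;                 /* case 2: access not permitted */
    return FAULT_HANDLED;                  /* case 3: kernel would page in and retry */
}
```

With a read-only area at 0x1000-0x2000 followed by a read/write area at 0x2000-0x3000, a write to 0x1800 is classified as a protection fault, and any access to 0x0500 as a segmentation fault.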

Memory Mapping

Linux (along with other forms of Unix) initializes the contents of a virtual memory area by associating it with an object on disk, a process known as memory mapping.

Areas can be mapped to one of two types of objects:

(1) Regular file in the Unix file system: An area can be mapped to a contiguous section of a regular disk file, such as an executable object file.

The file section is divided into page-sized pieces, with each piece containing the initial contents of a virtual page.

Because of demand paging, none of these virtual pages is actually swapped into physical memory until the CPU first touches the page (i.e., issues a virtual address that falls within that page’s region of the address space).

If the area is larger than the file section, then the area is padded with zeros.

(2) Anonymous file: An area can also be mapped to an anonymous file, created by the kernel, that contains all binary zeros.

The first time the CPU touches a virtual page in such an area, the kernel finds an appropriate victim page in physical memory, swaps out the victim page if it is dirty, overwrites the victim page with binary zeros, and updates the page table to mark the page as resident.

Notice that no data is actually transferred between disk and memory.

For this reason, pages in areas that are mapped to anonymous files are sometimes called demand-zero pages.
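
As a rough user-level illustration of demand-zero pages, the sketch below maps one anonymous private page with `mmap` and checks that every byte reads as zero. The function name `anonymous_page_is_zero` is hypothetical, and the code assumes a Linux/glibc system where `MAP_ANONYMOUS` is available.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one anonymous private page and scan it.
   Returns 1 if every byte is zero, 0 if not, -1 if the mapping fails. */
int anonymous_page_is_zero(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int all_zero = 1;
    for (size_t i = 0; i < len; i++) {   /* the first touch happens here */
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }
    munmap(p, len);
    return all_zero;
}
```

No file is opened and nothing is read from disk; the zeros come from the kernel when the page is first touched.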

In either case, once a virtual page is initialized, it is swapped back and forth between physical memory and a special swap file maintained by the kernel.

The swap file is also known as the swap space or the swap area.

An important point to realize is that at any point in time, the swap space bounds the total number of virtual pages that can be allocated by the currently running processes.

(ref: "How should the memory mapping done by the mmap() function in Linux be understood?" - answer by in nek - Zhihu https://www.zhihu.com/question/48161206/answer/110418693)

Shared Objects Revisited

An object can be mapped into an area of virtual memory as either a shared object or a private object.

If a process maps a shared object into an area of its virtual address space, then any writes that the process makes to that area are visible to any other processes that have also mapped the shared object into their virtual memory.

Further, the changes are also reflected in the original object on disk.

Changes made to an area mapped to a private object, on the other hand, are not visible to other processes, and any writes that the process makes to the area are not reflected back to the object on disk.

A virtual memory area into which a shared object is mapped is often called a shared area; likewise, an area into which a private object is mapped is called a private area.
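
The difference between the two kinds of mapping can be observed from user space. The sketch below, which assumes a Linux system with a writable /tmp, maps a one-byte temporary file with the given flags, writes through the mapping, and then reads the file back with `pread`. The helper name `byte_in_file_after_write` is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes mkstemp and pread declarations on glibc */
#endif
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a temp file holding 'a', map it with map_flags (MAP_SHARED or
   MAP_PRIVATE), store 'X' through the mapping, then read the file itself.
   Returns the byte found in the file, or -1 on any error. */
int byte_in_file_after_write(int map_flags)
{
    char path[] = "/tmp/mapdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* file disappears once fd is closed */

    int result = -1;
    if (write(fd, "a", 1) == 1) {
        char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, map_flags, fd, 0);
        if (p != MAP_FAILED) {
            p[0] = 'X';                  /* write through the mapped area */
            char c;
            if (pread(fd, &c, 1, 0) == 1)
                result = (unsigned char)c;
            munmap(p, 1);
        }
    }
    close(fd);
    return result;
}
```

With `MAP_SHARED` the function is expected to return 'X', because the write reaches the underlying object; with `MAP_PRIVATE` it returns 'a', because the write stays in the process's private copy.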

Private objects are mapped into virtual memory using a clever technique known as copy-on-write.

A private object begins life in exactly the same way as a shared object, with only one copy of the private object stored in physical memory.

For each process that maps the private object, the page table entries for the corresponding private area are flagged as read-only, and the area struct is flagged as private copy-on-write.

So long as no process attempts to write to its private area, the processes continue to share a single copy of the object in physical memory.

However, as soon as a process attempts to write to some page in the private area, the write triggers a protection fault.

When the fault handler notices that the protection exception was caused by the process trying to write to a page in a private copy-on-write area,

it creates a new copy of the page in physical memory, updates the page table entry to point to the new copy, and then restores write permissions to the page, as shown in Figure 9.30(b).

When the fault handler returns, the CPU reexecutes the write, which now proceeds normally on the newly created page.

By deferring the copying of the pages in private objects until the last possible moment, copy-on-write makes the most efficient use of scarce physical memory.

The fork Function Revisited

When the fork function is called by the current process, the kernel creates various data structures for the new process and assigns it a unique PID.

To create the virtual memory for the new process, it creates exact copies of the current process’s mm_struct, area structs, and page tables.

It flags each page in both processes as read-only, and flags each area struct in both processes as private copy-on-write.

When either of the processes performs any subsequent writes, the copy-on-write mechanism creates new pages, thus preserving the abstraction of a private address space for each process.
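
The private-address-space abstraction is easy to observe from user code. In the minimal sketch below, the child writes to a variable after `fork` and the parent then checks its own copy. The function name `parent_value_after_child_write` is hypothetical; the copy-on-write mechanics happen inside the kernel and are not visible here, only their effect.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, let the child overwrite x, and report what the parent still sees.
   Returns the parent's value of x, or -1 on error. */
int parent_value_after_child_write(void)
{
    volatile int x = 1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        x = 2;                      /* child's write lands on its own copy of the page */
        _exit(x == 2 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return x;                       /* parent's copy is unaffected by the child's write */
}
```

The function returns 1: the child's store changed only the child's page, so the parent's value is untouched.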