Computer System Basic Concepts (2)

Note: this is copied from my Google Doc, so the formatting is a bit odd

 

  • Be able to describe how different virtual memory schemes are implemented, including base and bound, segmentation, single-level page tables, and multi-level page tables. Understand the advantages and shortcomings of each implementation.

Answer:

There are many ways to implement an MMU (Memory Management Unit):

- Base and bound registers

- Segmentation

- Page tables

- Multi-level page tables

Base and Bound

* BASE and BOUND are protected registers

-- only code in Ring 0 may modify BASE and BOUND

-- Prevents processes from modifying their own sandbox

* Each CPU has one BASE and BOUND register

-- BASE and BOUND must be saved and restored during context switching

* Advantages of Base and Bound

-- Simple hardware implementation

-- Simple to manage each process' virtual space

-- Processes can be loaded at an arbitrary physical address

-- Offer protection and isolation

-- Offers flexible placement of data in memory

* Limitations of Base and Bound

- Processes can overwrite their own code

-- processes aren't protected from themselves

- No sharing of memory

-- Code (read-only) is mixed in with data (read/write)

- Process memory cannot grow dynamically

-- May lead to internal fragmentation

Segmentation

* Segmentation is a generalization of the base and bounds approach

- Give each process several pairs of base/bounds

- Each pair defines a segment

- Each segment can be moved or resized independently

* Advantages of Segmentation

- All the advantages of base and bound

- Better support for sparse address spaces

-- Code, heap, and stack are in separate segments

-- Segment sizes are variable

-- Prevents internal fragmentation

- Supports shared memory

- Per segment permission

-- Prevents overwriting code, or executing data

* Disadvantages of Segmentation

- External Fragmentation

Page Table

The paged memory model is a generalization of the segmented memory model

Page Table Implementation

* The OS creates the page table for each process

- Page tables are typically stored in kernel memory

- The OS stores a pointer to the page table in a special register in the CPU

- On a context switch, the OS swaps the pointer from the old process's table to the new process's table

* The CPU's MMU uses the page table to translate virtual addresses into physical addresses (on x86, as in Pintos, this walk is done by the hardware)

 

Advantages of Page tables

- All the advantages of segmentation

- Even better support for sparse address spaces

-- Each page is relatively small

-- Fine-grained page allocations to each process

-- Prevents internal fragmentation

- All pages are the same size

-- Easy to keep track of free memory

-- Prevents external fragmentation

- Per-page permissions

-- Prevents overwriting code, or executing data

Problems With Page Tables

- Page tables are huge (one entry per virtual page, for every process)

- Page table indirection adds significant overhead to all memory accesses

- Page table lookups are slow: each translation requires extra reads from main memory

Multiple-Level Page Tables

Key idea: split the linear page table into a tree of sub-tables

Benefit: branches of the tree that are empty can be pruned

Trade-off: the tree must be traversed to translate a virtual address, which increases access time

 

  • Be able to discuss the TLB. What is it, why is it needed, and how is it implemented? Be able to discuss TLB replacement strategies (i.e. cache eviction), as well as problems due to aliasing. Be able to compare and contrast hardware and software TLB management.

        What is it

The Translation Lookaside Buffer (TLB) is a cache that holds page table entries directly in the CPU's MMU to speed up virtual address translation


        Why is it needed

The issue with a naive page table: every lookup requires extra reads from main memory, which is time consuming

TLBs address the slowdown associated with page tables

- Frequently used page table entries are cached in the CPU

- Prevents repeated lookups for PTEs in main memory

TLBs reduce the speed overhead of page tables by an order of magnitude or more

- Caching works very well in this particular scenario

- Lots of spatial and temporal locality


        TLB replacement strategies

FIFO: easy to implement, but certain access patterns result in worst-case TLB hit rates

Random: easy to implement, fair, but suboptimal hit rates

LRU (Least Recently Used): algorithm typically used in practice


         Potential solutions to problems due to aliasing

The problem with TLB aliasing occurs when two identical virtual memory addresses from two different processes point to different physical pages. For example, say we have process A and process B. They each have their own page tables, they are executing different code (no shared pages), and in both cases the .text segment has been mapped to a virtual page with starting address 0x0800FFFF.

 

Let's assume process A runs first, and populates the TLB with an entry mapping 0x0800FFFF -> 0x000A0000.

 

Now the OS context switches to process B. The true mapping for process B's code is 0x0800FFFF -> 0x00BB0000. However, process A's entry is still in the TLB. Thus when B executes, the CPU will erroneously use A's cached PTE, and try to execute code in A's frame.

 

There are two ways to deal with this. One is to simply flush the TLB before each context switch. This solves the aliasing issue, but it also means that each process starts with a cold TLB, which hurts performance. The second approach is to "tag" each TLB entry with its associated process (an address space ID). This requires hardware and OS support.


Implementing Software-Managed TLBs

            - Key differences vs. hardware managed TLBs:

CPU doesn’t insert entries into the TLB

CPU has no ability to read page tables from memory

        - On TLB miss, the OS must handle the exception

Locate the correct page table entry in memory

Insert the PTE into the TLB (evict if necessary)

Tell the CPU to retry the previous instruction

      Note: TLB management instructions are privileged

Only the kernel can modify the TLB


        Comparing Hardware and Software TLBS

                  Hardware TLB:

Advantages [Easier to program]

- Less work for kernel developers, CPU does a lot of work for you

Disadvantages

- Page table data structure format must conform to hardware specification

- Limited ability to modify the CPU's TLB replacement policies

                  Software TLB:

Advantages [Greater flexibility]

- No predefined data structure for the page table

- OS is free to implement novel TLB replacement policies

Disadvantages

- More work for kernel developers

- Beware infinite TLB misses!

-- The OS's page fault handler must always be present in the TLB


  • Be prepared to discuss swapping in detail. Why do most systems support swapping? When should memory be swapped out, and how does the system decide what should be swapped out? Be able to articulate the key pieces of information that the swapping system uses to make decisions and keep track of state. Furthermore, you should know which pieces of information are maintained by the hardware, and which are maintained by software. Be able to discuss eviction algorithms, ways to approximate them, and why approximations are necessary.

Why do most systems support swapping?

If the system completely runs out of physical memory, swapping lets the OS move pages to disk and keep running

When should memory be swapped out?

On-demand approach

- If a page needs to be created and no free pages exist, swap a page to disk

Proactive approach

- Most OSes try to maintain a small pool of free pages

- Implement a high watermark

- Once physical memory utilization crosses the high watermark, a background process starts swapping out pages

What should be swapped out?

Optimal eviction strategy

- Impossible to implement in practice

Practical strategies for selecting which page to swap to disk

- FIFO

- Random

- LRU (least recently used)

Key pieces of information that the swapping system uses


Bits related to swapping

– A – accessed bit – has this page been read recently?

– D – dirty bit – has this page been written recently?

– The MMU (hardware) sets the accessed bit in a PTE when it reads the page

– The MMU (hardware) sets the dirty bit when it writes to the page referenced in the PTE

– The OS (software) may clear these flags as it wishes

Approximating LRU

LRU is difficult to implement

- On eviction, LRU needs to scan all PTEs to determine which have not been used

- But there are millions of PTEs, so a full scan is far too expensive (this is why approximation is necessary)

Is there a clever way to approximate LRU without scanning all PTEs?

- Yes!

Clock Algorithm


(Additions welcome.)


  • Be able to fill out a diagram describing the physical geometry of a spinning disk (i.e. cylinders, tracks, platters, heads, etc.). Be able to explain the two types of delay when accessing spinning disks, and how scheduling and caching can be used to improve disk performance. Know specific examples of types of disk cache, and types of disk scheduling algorithms, as well as their strengths and weaknesses.


Physical Geometry

Sector: hard drives expose a large number of sectors

- Typically 512 or 4096 bytes (Pintos uses 512 bytes)

- Individual sector writes are atomic

- Multi-sector writes may be interrupted (torn writes)

Track: sectors are arranged into tracks

Cylinder: the set formed by a particular track on multiple platters

Platter: tracks are arranged in concentric circles on each platter

A disk may have multiple, double-sided platters

RPM (revolutions per minute): the drive motor spins the platters at a constant rate

Disk read/write heads are small parts of a disk drive that move above the platters and transform the platter's magnetic field into electrical current (reading the disk), or transform electrical current into a magnetic field (writing the disk).



Two types of delay

Rotation delay

- Time to rotate the desired sector to read head

- Related to RPM

Seek delay

- Time to move the read head to a different track

Caching


Read caching

- Reduces read delays due to seeking and rotation

Write caching

- Write-back cache: the drive reports that writes are complete as soon as they have been cached

- Write-through cache: the drive reports that writes are complete only after they have been written to disk

Disk Scheduling

- FCFS: lots of time spent seeking

- SSTF (shortest seek time first): good performance and easy to implement, but prone to starvation

- SCAN (elevator algorithm): reasonable performance and no starvation, but average access times are higher for requests at the highest and lowest addresses

- C-SCAN: only services requests in one direction. Fairer than SCAN, but slightly worse performance

- C-LOOK: peeks at the upcoming addresses in the queue; the head only travels as far as the last pending request before sweeping back

Where should disk scheduling be implemented?


– OS scheduling

• OS can implement SSTF or LOOK by ordering the queue by LBA

• However, the OS cannot account for rotational delay

– On-disk scheduling

• Disk knows the exact position of the head and platters

• Can implement more advanced schedulers (SPTF)

• But, requires specialized hardware and drivers


  • Be able to describe the different types of RAID arrays. What are the goals of each type of array? What type of data access results in worst-case performance for each type of array?



RAID (Redundant Array of Inexpensive Disks)

RAID: use multiple disks to create the illusion of a larger, faster, more reliable disk

 

RAID 0 : Striping

Key idea: present an array of disks as a single large disk

Maximize parallelism by striping data across all N disks

Reliability: zero; if any drive fails, data is permanently lost

What type of data access results in worst-case performance? Small requests confined to a single stripe unit touch only one disk, so they see no benefit from parallelism


RAID 1 : Mirroring

RAID 0 offers high performance, but zero error recovery,

Key idea: make two copies of all data

What type of data access results in worst-case performance for each type of array?

Writes: every write must be performed on both copies, so write throughput is half that of RAID 0

RAID 4: Parity Drive

RAID 1 offers highly reliable data storage, but it uses N/2 of the array's capacity. RAID 4 is designed to achieve the same level of reliability without sacrificing so much capacity.

What type of data access results in worst-case performance for each type of array?

Small random writes: every write must also update the parity drive, so the parity drive becomes a bottleneck and write performance is terrible

RAID 5: Rotating Parity

Parity blocks are spread over all N disks

RAID 6: like RAID 5, but with two parity blocks per stripe, so the array survives two simultaneous drive failures

 

 


  • Be able to state what the write amplification problem is, and how SSD controller software tries to deal with this issue. Also, understand the wear levelling problem, and strategies for dealing with this limitation of flash. Be prepared to explain what the TRIM command is, and why it is necessary for good SSD performance.

Write amplification  problem

Flash memory is written in pages but erased in blocks. Pages are about 4-16 KB, while blocks are about 128-256 KB. Because flash memory must be erased before it can be re-written, flash memory becomes fragmented over time, and this leads to the write amplification problem: a small logical write can force the controller to copy and re-write far more data than the host asked for.

How SSDs deal with the write amplification problem

Once all pages have been written, valid pages must be consolidated to free up space. The SSD controller uses a garbage collector to compact blocks: live pages are copied out, the block is erased, and it becomes writable again.

Wear levelling

Each flash cell wears out after several thousand program/erase cycles, so SSDs use wear levelling to spread writes across all cells.

The SSD controller periodically swaps long-lived (cold) data to different blocks so that all blocks wear evenly.

TRIM

Although modern SSDs implement background GC (garbage collection), it doesn't always work well on its own. For example, most file systems don't actually erase data when a file is deleted, so the SSD cannot tell that those pages are dead.

The TRIM command allows the OS to tell the SSD that specific LBAs are invalid and may be garbage collected.


  • Understand the layout of a FAT file system. Be prepared to explain how directories are implemented, and how file data is located and stored. Be prepared to explain all of the disk access that occur when a file is created, read, written, or deleted.


Directory

Definition

In computing, a directory is a file system cataloging structure which contains references to other computer files, and possibly other directories.

Traditionally, file systems have used a hierarchical, tree-structured namespace

- Directories are objects that contain other objects

- Files are leaves in the tree

By default, directories contain at least two entries

- ".." points to the parent directory, and "." is a self pointer


FAT

Super block

- Stores basic info about the file system

- FAT version, location of boot files

- Total number of clusters

- Index of the root directory in the FAT

File Allocation Table(FAT)

- Marks which clusters are free or in-use

- Linked-list structure to manage large files

Cluster

- Stores file and directory data

- Each cluster is a fixed size (4-64 KB)

- Files may span multiple clusters

How directories are implemented

The index of the root directory is stored in the superblock (root directory index = 2)

In FAT, (sub)directories are special files whose data is the list of entries in the directory

How file data is located and stored

A directory entry stores the file's name and the index of its first cluster. The FAT entry for that cluster holds the index of the file's next cluster, forming a linked list that is followed until an end-of-file marker. Free clusters are marked with a special value in the FAT.



Explain all of the disk accesses that occur when a file is created, read, written, or deleted

- Create: read the parent directory's cluster(s) to find a free slot, write the new directory entry, and update the FAT when the first cluster is allocated

- Read: read the directory entry to find the first cluster, then alternate between reading the FAT (to find the next cluster) and reading each data cluster

- Write: as for read, plus writes to the data clusters, and FAT updates whenever new clusters are allocated

- Delete: mark the directory entry as unused, and walk the FAT chain marking each of the file's clusters free


  • Understand the layout of an ext file system (inodes). Be prepared to explain how directories are implemented, and how file data is located and stored. Be prepared to explain all of the disk access that occur when a file is created, read, written, or deleted.

 

Advantages of inodes

• Optimized for file systems with many small files

– Each inode can directly point to 48KB of data

– Only one layer of indirection needed for 4MB files

• Faster file access

– Greater meta-data locality means less random seeking

– No need to traverse long, chained FAT entries

• Easier free space management

– Bitmaps can be cached in memory for fast access

– inode and data space handled independently

EXT layout

Super block, storing:

- Size and location of bitmaps

- Number and location of inodes

- Number and location of data blocks

- Index of the root inode

Bitmap of free & used inodes

Bitmap of free & used data blocks

Table of inodes

- Each inode represents one file or directory

- Includes meta-data and lists of associated data blocks

How directories are implemented

Directories are files whose data is the list of entries in the directory

How file data is located and stored

An inode holds 12 direct block pointers plus single, double, and triple indirect pointers. The first 48 KB of a file is reached through the direct pointers; larger files require reading the indirect block(s) to locate the remaining data blocks.

Disk access of file create, read, write, or delete

- Create: update the inode bitmap, write the new inode, and update the parent directory's data block and inode

- Read: read the inode and data block(s) of each directory on the path, then the file's inode, then its data blocks

- Write: read the file's inode, update the data bitmap if a new block is allocated, then write the data block and the inode

- Delete: update the parent directory, and clear the file's entries in the inode and data bitmaps

EXT: The Good and the Bad

• The Good – ext file system (inodes) support:

– All the typical file/directory features

– Hard and soft links

– More performant (less seeking) than FAT

• The Bad: poor locality

– ext is optimized for a particular file size distribution

– However, it is not optimized for spinning disks

– inodes and associated data are far apart on the disk!


  • Be prepared to compare and contrast ext, ext2, ext3, and ext4. How has each file system improved on its predecessor? How are these improvements implemented, and what problem are these improvements designed to address?

EXT2

• In ext, there is a single set of key data structures

– One data bitmap, one inode bitmap

– One inode table, one array of data blocks

• In ext2, each block group contains its own key data structures

• ext2 attempts to keep related files and directories within the same block group

Good and Bad

• The good – ext2 supports:

– All the features of ext…

– … with even better performance (because of increased spatial locality)

• The bad

– Large files must cross block groups

– As the file system becomes more complex, the chance of file system corruption grows

• E.g. invalid inodes, incorrect directory entries, etc.

EXT3

ext3 and NTFS use journaling

• At this point (with ext2):

– The file system is optimized for spinning disks

– inodes are optimized for small files

– Block groups improve locality

• What's next?

– Consistency and reliability: ext3 adds a journal so the file system can recover cleanly after a crash


 

EXT4

• At this point:

– We not only have a fast file system

– But it is also resilient against corruption

• What’s next?

– More efficiency improvements!

Extent: includes a block pointer and a length

• ext4 inodes include 4 extents instead of block pointers

– Each extent can address at most 128MB of contiguous space (assuming 4KB blocks)

– If more extents are needed, a data block is allocated

– Similar to a block of indirect pointers

• ext4 and NTFS encode directories as B-Trees to improve lookup time to O(log N)


  • Understand the crash consistency problem, be able to describe how fsck addresses this issue, and the shortcomings of fsck. Be able to explain how journaling is implemented, and how it simplifies crash recovery.

Crash consistency problem

• The disk guarantees that sector writes are atomic

– No way to make multi-sector writes atomic

• How to ensure consistency after a crash?

1. Don’t bother to ensure consistency

• Accept that the file system may be inconsistent after a crash

• Run a program that fixes the file system during bootup

• File system checker (fsck)

2. Use a transaction log to make multi-writes atomic

• Log stores a history of all writes to the disk

• After a crash the log can be “replayed” to finish updates

• Journaling file system

FSCK

fsck Tasks

• Superblock: validate the superblock, replace it with a backup if it is corrupted

• Free blocks and inodes: rebuild the bitmaps by scanning all inodes

• Reachability: make sure all inodes are reachable from the root of the file system

• inodes: delete all corrupted inodes, and rebuild their link counts by walking the directory tree

• directories: verify the integrity of all directories

• … and many other minor consistency checks

fsck: the Good and the Bad

• Advantages of fsck

– Doesn’t require the file system to do any work to ensure consistency

– Makes the file system implementation simpler

• Disadvantages of fsck

– Very complicated to implement the fsck program

• Many possible inconsistencies that must be identified

• Many difficult corner cases to consider and handle

– fsck is super slow

• Scans the entire file system multiple times

• Imagine how long it would take to fsck a 40 TB RAID array

Journaling

• Key idea: make writes transactional by using a write-ahead log

What is write-ahead log

• Key idea: writes to disk are first written into a log

– After the log is written, the writes execute normally

– In essence, the log records transactions

Steps of Log

1. Begin a new transaction with a unique ID=k

2. Write the updated meta-data block(s)

3. Write the file data block(s)

4. Write an end-of-transaction with ID=k

Commits and Checkpoints

• We say a transaction is committed after all writes to the log are complete

• After a transaction is committed, the OS checkpoints the update

• Final step: free the checkpointed transaction

Journal Implementation

• Journals typically implemented as a circular buffer

– Journal is append-only

• OS maintains pointers to the front and back of the transactions in the buffer

– As transactions are freed, the back is moved up

• Thus, the contents of the journal are never deleted, they are just overwritten over time


  • Be able to describe, in detail, how passwords should be stored in a secure system. Be able to explain the process of taking a clear-text password input by the user and comparing it to the securely stored data. Be able to explain how these mechanisms improve security.

How password stored

1. Never store passwords in plain text

2. Always salt and hash passwords before storing them

3. Use hash functions with a high work factor

Process of verification

Authentication is the process of verifying an actor's identity. The user provides a clear-text password, which the system salts and hashes, then compares against the stored hash; the clear-text password itself is never stored.

Compared to clear-text passwords, the hash functions used for secure storage are:

Deterministic: hash(A) = hash(A)

High entropy:

md5(‘security’) = e91e6348157868de9dd8b25c81aebfb9

md5(‘security1’) = 8632c375e9eba096df51844a5a43ae93

md5(‘Security’) = 2fae32629d4ef4fc6341f1751b405e45

Collision resistant

Locating A’ such that hash(A) = hash(A’) takes a long time

Example: ~2^21 tries for MD5 (which is why MD5 should not be used for passwords)



  • Be able to describe how access control is performed on Unix systems, i.e. users, groups, and rwx permissions. Understand the limitations and exceptions (root, setuid) to the Unix model.

How access control is performed on Unix systems       

Simple policies can be written as an access control matrix

 

          file 1   file 2   dir 1   file 3

user 1     ---      r--      ---     rw-

user 2     r--      r--      rwx     r--

user 3     r--      r--      ---     ---

user 4     rw-      rwx      ---     ---

group 1 = {user 2, user 3, user 4};   group 2 = {user 1, user 2}

file 1 – owner = user 4, group = group 1, rw-r-----

file 2 – owner = user 4, group = group 1, rwxr--r--

dir 1 – owner = user 2, group = group 1, rwx------

file 3 – owner = user 1, group = group 2, rw-r-----

                    - Specifies actions that actors can take on objects

- Unix actions: read, write and execute

- For directories, x grants permission to traverse

Actors are users, each user has a unique ID

- Users also belong to >=1 groups

Files and directories have an owner and a group

Three sets of permissions:

- For the owner

- For members of the group

- For everybody else (other)

Users may only modify the permissions of files they own

Users may not change the owner of a file*

- Even if they own it

- Unless you are root

Users may only change to a group they belong to


Limitations and exceptions (root, setuid) to the Unix model

Limitations of the Unix model:

The Unix model is very simple

- Users and groups, read/write/execute

Not all possible policies can be encoded

Exceptions to the Unix model:

On Unix, the root user (ID=0) can do whatever it wants

- Access any file

- Change any permission

On Windows, called the Administrator account

Your everyday user account should never be Admin/root

Way to Access Root / How can you get root privileges?

1. Log in as root

$ ssh root@mymachine.ccs.neu.edu

2. The Switch User command (su)

$ su

Opens a new shell as root:root

3. The Switch User Do Command (sudo)

$ sudo apt-get install python

Runs the given command as root:root

How to set the setuid bit

$ chmod u+s path/to/file

The file will then run with its owner's privileges (root, if owned by root)

- Be very careful with setuid

You are giving other users the ability to run a program as you, with your privileges

- Programs that are setuid=root should drop privileges

http://www.cs.berkeley.edu/~daw/papers/setuid-usenix02.pdf


  • Be able to identify exploitable vulnerabilities in a given piece of code, and give an example of how the vulnerability can be exploited. Be able to explain various countermeasures against exploitation, including canaries, NX-bits, and ASLR. For each countermeasure, you should be able to explain how it is implemented, how it makes exploitation more challenging for the attacker, and how it can be circumvented.

Types of Exploitable Flaws:

Stack overflow

Heap overflow

char * buf = malloc(100);

strcpy(buf, argv[1]);

Double free

free(buf);

free(buf);

Format string

printf(argv[1]);

Off-by-one

int vectors[100];

for (i = 0; i <= 100; i++) vectors[i] = x; /* i == 100 writes one element past the end */

… and many more

Triggering Exploitable Flaws:

Local vulnerabilities:

- Command line arguments

- Environment variables

- Data read from a file

- Data from shared memory or pipes

Remote vulnerabilities

- Data read from a socket

Basically, any place where an attacker can give input to your process

Countermeasures against exploitation:

Stack canaries

Implementation:

Canary code and data are inserted by the compiler

- gcc supports canaries

- Disable using the -fno-stack-protector argument

Canary secret must be random

- Otherwise the attacker could guess it

Canary secret is stored on its own page at semi-random location in virtual memory

- Makes it difficult to locate and read from memory

Example:

int canary = secret_canary;  /* inserted by the compiler */

char buf[8];

for (x = 1; x < argc; ++x) {

    strcpy(buf, argv[x]);  /* an overflow here clobbers the canary first */

    num = atoi(buf);

    check_for_secret(num);

}

assert(canary == secret_canary);  /* inserted by the compiler */

/* An overflow destroys the canary, the assert fails, and the program safely exits */

return 0;


Working Mechanism:

The canary sits between local buffers and the saved return address, so a stack overflow must overwrite the canary before it can reach the return address; checking the canary before returning detects the attack.

Circumvention:

Leak the canary value first (e.g. via a format string bug) and include it in the overflow, or target data that sits before the canary, such as local function pointers.

Non-executable stack pages (NX-bit)

Implementation: make stack pages non-executable

- Compiler marks stack segment as non-executable

- Loader sets the corresponding page as non-executable

Mechanism:

NX prevents shellcode from being placed on the stack

- NX must be enabled by the process

- NX must be supported by the OS

Circumvention:

- Return-to-libc

Return-to-libc works by crafting special stack frames and jumping into existing library code; there is no need to inject code, just data, onto the stack.

- Return-oriented programming (ROP)

Return-oriented programming (ROP) is a generalization of return-to-libc

-- Why only jump to existing functions? You can jump to code anywhere in the program

-- Gadgets are snippets of assembly that form a Turing-complete language

-- Gadgets + control of the stack = arbitrary code execution

ASLR:

Implementation:

Place code and data at random locations in memory

- Address Space Layout Randomization (ASLR)

- Supported by all modern OSes

Modern compilers can produce Position Independent Code (PIC)

Also called Position Independent Executable (PIE)

Tradeoffs with PIC/PIE:

Pro: Enables the OS to place the code and data segments at a random place in memory (ASLR)

Con: Code is slightly less efficient

        Some addresses must be calculated

In general, the security benefits of ASLR far outweigh the cost

Circumvention:

ASLR is much less effective on 32-bit architectures

- Less ability to move pages around randomly

- May allow the attacker to brute-force the exploit

Use a huge NOP sled

- If the sled is enormous, even a random jump will hit it

Use heap spraying

- Technique that creates many, many, many copies of shellcode in memory

- Attempts to fill all available heap memory

- Jump to a random address is likely to hit a copy

Original post: https://www.cnblogs.com/Altaszzz/p/3662900.html