We have been focusing on CPU and Memory which are the heart of a computer system. But there are many other devices our CPU memory interface with.
How do we connect these devices to the system ?
What is the general protocol we have for communicating with them ?
Every device has an interface that it presents to the CPU. The CPU accesses the interface to ask the device to perform some action.
A device Controllers is a piece of hardware that presents a device interface which implements the device functionality.
A generic device interface has a few different standard parts
Naive Generic Device Communication Protocol:
CPU communicates with controller
What is ineffecient about the Naive Communiation protocol ?
How do we avoid the ineffiecency of polling the device status registers? We give the controller the ability to trigger an interupt on the cpu.
How do we take the work load off of the CPU for transferring data from memory to data registers on the device? We add specialized hardware, called Direct Memory Access controller. DMA read/writes directly from ram to/from device controller data register. Handles all copies.
CPU must change page settings to protect data prior to transfer the data doesn't change mid copy. This is calling pinning the page.
There are two ways a CPU can interact with device controller
Hard Drives
A hard drive is a PERSISTENT STORAGE DEVICE. Means when power gets shut off bits are preserved and come back when power comes back on.
How do we design an OS to use HD effeciently, speed ? power ?
Let's build an abstract model of a HD from a CS perspective.
hard drive componentsThe interface to harddrive is very similiar to interface to memory. It is just a storage device that you can read from, or write to.
The interface presents the hard drive as a series of numbered sectors, each sector holds 512 bytes. Each sector has an address. When we want to read or write we issue a read or write command read(sector address, length) or write (sector address, length, data)
The internals of a harddrive
The platters: A HD is made up of one or more hard metal platters which are magentacally coated on both sides. Each platter has a dedicated read write head that can move back and forth, just like a record play. The platters are all connected to the same motor spinning at the same speed.
The Data is organized into tracks. Just like a song on an LP. Each track is divided into 512 byte sectors, as mentioned previously.
The controller needs to know which sector is on which track and on which platter.
Each side of each platter has a dedicated head. heads move all together and only one sector is read or writen at a time.
There are 3 component actions involved in read/write
A hard drive is a Random Access device, but if things are far away from each other it can take along time to move the heads to the write postiion.
In a poorly designed system. There can be alot of OVERHEAD during SEEK and ROTATION. If you are reading and writing small chunks of data you will incur Seek and rotate costs if the head has to move for each small chunk. If the head can stay on a single track and ready lots of sectors in a row it will reduce the cost of seek and rotate.
Given: T(i/o) = T(seek) + T(Rotate) + T(Transfer)
How long does it take to transfer 1mb using the constants if we only have to move the arm once to read 1mb?
T(i/o) = 7ms(seek) + 3ms(rotate) + 10ms (@100mb/s transer) = 1mb/20ms = 50mb/sec (notice this is 50% of our transfer rate)
In this case 50% of the time is spent seeking and rotating(10ms) the other 50% is transferring(10ms)
What if we have do tiny little i/o's (1kb) for every arm movement using the same constants as above.
Transfer time @100mb/s for 1kb is essentially 0. but we still have to use 10ms for seek and rotate.
1kb/10ms = 100kb/second = .1Mb/sec
OMG!! our previous example had 50Mb/sec throughput
As you can see how the data is distributed across the disks is crucial to effeincy. We have to make a very concerted effort to group block of data into large chunks together to minimize seek and rotate overhead.
I/O scheduling
Given a set of I/O requests how do we decide which to do first if our goal is performance?
The nature of an I/O job is different than a scheduling the CPU. With a CPU scheduler we don't know much about the jobs we are running. With an I/O requests, with certain knowledge of disk arm location we can predict how long a job will take. We may be know the length of the job.
Remember in our chapters on CPU scheduling, shortest job first was the most effecient alogrithm. Because we know more about I/O jobs we can potentially use SJF.
At the OS level harddrive internals are hidden.We are just giving commands to read block 0, write block 1000, etc...
Inside the drive we have much more knowledge
Modern I/0 devices allow the OS to issue many requests to them and the device knowing it's internals can do the externals
Read the section on harddrive scheduling and HAPPY THNKSGIVING.
The purpose of a harddrive is persistence storage. But harddrives can fail so if we really want to ensure the persistence of data, we have to make multiple copies of it.
Modern Data centers, facebook, google, amazon, snapchat, tiktok replicate the data on their servers across data centers across the world, to ensure persistence.
Hard drives are very slow but cheap, and that's why they are still around.
Operations on harddrives are on the order of milliseconds while cpu is on the order of nano seconds, harddrives are 1 MILLION times slower than cpu's. Meaning.
It takes 1 million times longer to write 512 bytes on a hard drive compared to writing 512 bytes in ram.
If you are trying to write high performance systems. You try and AVOID doing IO as much as possible. Load as much into ram as possible in one shot.
It is always fastest to do the larges possible read and writes with data stored sequentially
RAID: Redundant array of independent disks
Raid uses many drives not one. Here's why
There are different raid types each of which emphasize each of these characterstics
Raid Interface
A hardware RAID is a physical device with some number of harddrives in it that presents itself as a single harddrive to the OS. It hides the fact that it is mulitple Drives running in parrallel.
Becuase it is made up of multiple drives it can present as having more effeciecny in the different characterstics than a single drive.
To the OS it looks just like an array of blocks that it can read or write.
To a computer a A Raid device just presents the interface of a single HD. But Notice if you look at what's in a Raid device it looks just like a computer. It has memory, a cpu and multiple hard disks.
Fault Model: how do we expect hard drives to fail, and how do we build a systemb on top of that ?
Lets make some very simplified assumptions about what it means for a drive to fail
In reality there are many other complicated and potential failures.
There are different types of raid, called raid levels. Each implement a different strategy managing redundant arrays of independent disks
There are levels 0,1,2,3,4,5,6 We will be talking about levels 0, 1, 4, 5.
Metrics:
With any raid we have some number of hard drives. The interface to a HD just looks like an linear array of blocks(sectors). We are doing reads and writes to the array of blocks and need to map them to the disks themselves. Which is similiar to Virtual Memory. We have to take a interface address and map it onto a particuliar physical disk and an offset into that disk.
Raid level 0: striping
striping maps addresses sequentially across the drives, each next address is on the next drive.
To translate from Interface Address => Disk address(HD_number:Offset)
So if we had 4 disks labeled 0 - 3 and we wanted interface address 100 it would be offset 25 on disk 0.
this is a static mapping
Striping Metric Evaluations
RAID 0 is highest level of performance of any raid or single drive.
Sequential large read/write. Given N disks with the ability to read/write at S MB/s our bandwidth will be N * S, In this case we are using all the disks in parralel. We have the power is parallelism
Random small read/write. Each disk with have a bandwidth of R MB/s where R is much less than S just as we saw last lecture, but we still have parrallelism so still faster than single drive
Striping is great for performance and capacity but you don't get any reliability. The more drives you have the more likely one is going to fail.
Raid level 1: Mirroring
for each logical block of data we will keep 2 physical copies
Capacity: given N drives and D bytes per drive, our capcity (N*D)/2 bytes
Mirroring Reliability
Given our fault model, where a whole drive fails. How many failures can we tolerate?
We can tolerate 2 failures, if it is the correct drives that fail, but to be safe we should replace drives when 1 fails.
Raid 1: performance
A classic unix file is an array of bytes stored on a disk.
You can read/write create and delete files
attributes that the os stores regarding files
They are specific type of file.
Directories are array of records, each record is the human readable name => mapped to a low level name(which is just a number) That number is called an inode number.
Files and Dirs are arranged in a tree structure, with a root. In Unix the root is named "/" (forward slash)
Files are managed using system calls. the commone pattern, is open, (read|write), (read|write), close
For each process, the OS mantains a "descriptor table", which is an array with a bunch of entries. The file descriptor returned by open is an index into this table Each entry in the table is apointer to a struct.
The struct that is being pointed to is apart of a system wide table that is a list of all files opened by all processess
the struct keeps track of
we use the system call unlink
int unlinkat(int dirfd, const char *pathname, int flags);
Why is it called unlink?
what happens when we rename a file using the mv command, what is being changed?
stat command gives us some information being stored about a file
when we just create a file it has 0 blocks and size 0
After we just echo one byte to the file we have allocated 8 blocks 32k?
a DIRECTORY is a table where col-1 is the human readable name of a file and col 2 is the inode number of the file, the indoe is a struct that holds information about the file.
Each row of the table is a different file.
There are several syscalls that interface directories.
You can't write to a directory. Directories are only written automatically when files are created. ls -i shows the directory table as specified above.
In a single directory we have a low level name that maps to an inode number
But to access files we use PATHNAMES. There are different kinds of pathnames
Absolute pathnames start at the root of the directory tree.
Opening a file requires a path traversal down the directory tree
relative path names start the traversal at the current working directory "." or from the parent directory "../"
notice every directory listing has a "./" and "../" listed in them
~ is interpreted by the shell to
There are also hardlinks ie: "ln file1 file2" will set file1 to map to the same inode number as file 2.
Now when we can see what happens when we remove a file using rm.
When we remove a file we are just removing the entry from a directory
We use the unlink syscall because we are unlinking the human readable lowlevel name from the inode number
Once an inode has no more human readable names linked to it, all the blocks are released and it is removed from the file system
Our abstraction into the file sytems is Files and Directories.
A file just an array of bytes and operations we can do on them, create, open ,read ,write, unlink, close
A directory is a way of organizing files and other direcotories, it is a list of mappings from human readable filename to inode number.
A directory can contain entries for regular files or other directories. That forms a hierarchy of directories called the file sytem tree. The tree starts at root "/".
In order to understand the file systme implementations we need understand "on disk data structures"
We need to understand access methods to interact with those data structures.
The very simple file system
The file system has an api. That api takes calls like read,write etc at the top and shoots out instructions to read and write the actual blocks on the disk.
How are those disk blocks structured ? Each block is made up of x number of 512b sectors
The disk is just a bunch of blocks numbered from 0 to max size. the file system has to map all the files and directories on those blocks
What info needs to be on disk?
In our "very simple file system" we separate some contigous part of the disk from block n to xn to be used to store data. Data in this case is files and directories
Block size must be a multiple of sector size.
A block is minimial unit of disk allocation for a file or directory
What are the trade offs when we have larger vs smaller blocks ?
Large blocks
small blocks
So we carve off most of the disk for blocks that will be storing data.
we carve off another small portion of the disk to store information about each file. This area is divided into an array of structures that contain information for each file. These structure are called inodes. AN INODE NUMBER, which is it's lowlevel name, IS AN INDEX INTO THE ARRAY. It is called the inode table. It is a fixed size array of possible inodes
The inode structure stores
Given an inode number you locate it on disk, by knowing the the base adddress of the inode array and the size of each inode.
We also need to track which blocks are free and which have been allocated for both the inode table area and the data area
We carve off block 1 and block 2 of the disk to be used for an Inode bitmap and Data bitmap respectively. Each bit specifies wether an inode is free or in use in the inode table, or whether a block is free or in use in the data section of the drive.
When we first access the file system,
Block is 0 is called the SUPER BLOCK, it contains all the system wide file system information