Contents
- The Minix 3 File System
- Superblock
- Inode and Zone Bitmaps
- Inodes
- Zones (data blocks)
- Directory Entries
- Files
- Allocating and Growing
The Minix 3 File System
The original Minix 1 file system was an educational file system used to demonstrate an indexed file system. However, Minix 3 is a simple, but useful file system that can be used in production. Just as a node, the term zone in Minix 3 is meant to mean the block that it is referring to.
The Minix 3 file system has the following structure.
Superblock
The superblock describes the entire file system and is placed 1024 bytes into the block device. The first 1024 bytes are reserved for booting, but we won’t use any of those bytes. The Minix 3 file system reserves exactly 1024 bytes regardless of block size. So, instead, we will read the first 32 bytes at offset 1024. The superblock is here regardless of the block size. This will give us the following superblock structure.
struct SuperBlock { u32 num_inodes; u16 pad0; u16 imap_blocks; u16 zmap_blocks; u16 first_data_zone; u16 log_zone_size; u16 pad1; u32 max_size; u32 num_zones; u16 magic; u16 pad2; u16 block_size; u8 disk_version; };
The superblock fields are explained below.
Name | Size (bytes) | Description |
---|---|---|
num_inodes | 4 | The number of inodes in the file system. This is both allocated and unallocated. Used to bounds check the inode array and inode bitmap. |
imap_blocks | 2 | The number of blocksize-byte blocks that the inode map takes. |
zmap_blocks | 2 | The number of blocksize-byte blocks that the zone map takes. |
first_data_zone | 2 | The zone number of the first data zone. This is generally not used. |
log_zone_size | 2 | 1024 << log_zone_size (left shift) size tells us the block size. This is usually the same as the block_size field. For 1024-byte blocks, this will be 0. For 2048-byte blocks, this will be 1, and so forth. |
max_size | 4 | The maximum file size. This is generally not used because we can calculate the maximum file size based on the zone pointers. |
num_zones | 4 | The number of blocksize-byte zones. This is the number of blocks after the boot block, super block, bitmaps, and inode table. |
magic | 2 | This identifies the file system. This must be the value 0x4d5a (characters ZM). If it is not, we cannot guarantee this is a Minix 3 file system, and an error should occur. |
block_size | 2 | This field is invalid for Minix 3 file systems. Do not read from it. |
disk_version | 1 | Unused, but usually 0. |
There are three padding padding shorts, labeled pad0, pad1, and pad2. These should always be the value 0, but for our filesystem, we don’t do anything with these values.
Inode and Zone Bitmaps
Bitmaps are maps that contain either a 1 or a 0 (hence bit). We have one bit for all inodes in our file system in the inode bitmap, and we have one bit for all zones in our file system in the zone bitmap. The bitmap makes it so that we can quickly and easily tell if an inode or zone is taken or free. So, when we read an inode, we first consult the bitmap to see if the inode is valid. Each byte is exactly 8 bits, so each byte in the bitmap can refer to 8 inodes or zones.
We can select which byte we need to look at by using the following formula.
\(\text{byte}=\text{inum}\div8\)We can then locate which bit corresponds to the inode in the byte given above by using the following formula.
\(\text{bit}=\text{inum}\bmod{8}\)For both of these formulas, inum is the inode number. Recall that the root (/) is inode number 1, and inode number 0 is reserved to mean an invalid inode.
When we locate the inode or zone, we can test the bit. If this bit is 1, that means that the inode or zone is occupied and is currently valid. If the bit is 0, then we can create a new inode or zone at this location. Again, recall that a zone simply means block in Minix 3.
When creating the Minix 3 file system, the size of the drive might not be an exact multiple of the block size. This means that the bitmap will contain bits that do not map to any block or inode. In this case, these are automatically set to 1, meaning occupied. So, when locating an inode or zone, check the bitmap FIRST. If the value of that bit is 0, that means the inode or zone does NOT exist, and hence you will not be able to locate that zone or inode. Then, check the size of the drive SECOND. After you calculate the offset of the zone, check to see if it fits within the size of the drive. If it does not, then the bitmap was set to 1 so that the given zone is not allocated.
Index Nodes (inodes)
The Minix 3 file system is an index-allocated file system, meaning that it contains pointers to blocks that belong to a file. Each file (including directories and links) are identified by an index node or inode. The inode contains the following information. For many of the examples, I will assume a 1024-byte block size. However, you should always verify against 1024 << log_zone_size or blocksize in the superblock.
struct Inode { u16 mode; u16 nlinks; u16 uid; u16 gid; u32 size; u32 atime; u32 mtime; u32 ctime; u32 zones[10]; };
The inode contains a mode, which tells you two things: (1) the type of file [directory, regular file, link, socket, etc], and (2) the permissions. The mode is usually read in octal. We can use the following macros to determine the type of file.
#define S_IFMT 00170000 #define S_IFSOCK 0140000 #define S_IFLNK 0120000 #define S_IFREG 0100000 #define S_IFBLK 0060000 #define S_IFDIR 0040000 #define S_IFCHR 0020000 #define S_IFIFO 0010000 #define S_ISUID 0004000 #define S_ISGID 0002000 #define S_ISVTX 0001000 #define S_ISLNK(m) (((m) & S_IFMT) == S_IFLNK) #define S_ISREG(m) (((m) & S_IFMT) == S_IFREG) #define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR) #define S_ISCHR(m) (((m) & S_IFMT) == S_IFCHR) #define S_ISBLK(m) (((m) & S_IFMT) == S_IFBLK) #define S_ISFIFO(m) (((m) & S_IFMT) == S_IFIFO) #define S_ISSOCK(m) (((m) & S_IFMT) == S_IFSOCK) #define S_IRWXU 00700 #define S_IRUSR 00400 #define S_IWUSR 00200 #define S_IXUSR 00100 #define S_IRWXG 00070 #define S_IRGRP 00040 #define S_IWGRP 00020 #define S_IXGRP 00010 #define S_IRWXO 00007 #define S_IROTH 00004 #define S_IWOTH 00002 #define S_IXOTH 00001
The mode itself is 16 bits, and the lower 9 bits contains three bits per field for the user, group, and others, respectively. The type of the file is in the upper four bits from bit 12 through bit 15. We can mask using S_IFMT to get the type of file (the format). The two types of file we will cover here are S_ISDIR and S_ISREG for a directory and regular file, respectively.
The nlinks field describes the number of hard links. This is useful when determining if a file can be deleted or not. Notice that in UNIX, there is no delete. Instead, there is a system call named unlink, which will decrement the number of links by 1. When the number of links reaches 0, it is considered deleted.
The uid and gid fields refer to the user id and group id, respectively. This is useful when determining the permissions given by the mode field.
The size field gives us the number of bytes this file will contain. Recall that in the file system, we are dealing with 1024-byte chunks, but the size tells us the exact number of bytes. For example, if the size is 2049, then we will have three blocks. The first two will be completely full, and the third block, still 1024 bytes, will only have 1 byte of actual data.
The atime, ctime, and mtime fields refer to accessed time, creation time, and modified time, respectively. The accessed time is updated to the current time whenever a file is read from or written to. The creation time is given the current time when the file is created. The modification time is updated to the current time whenever a file is written to.
The zones array contains 10 elements. The first 7 elements are called direct zone pointers. The 8th element is a singly-indirect zone pointer. The 9th element is a doubly-indirect zone pointer. The 10th element is a triply-indirect zone pointer. The interesting thing about these zone pointers is it already takes into account the boot block and super block. So, we take the zone pointer and multiply it by the block size to get the offset into the drive where the zone (aka block) is located on the disk.
The first inode in the Minix 3 file system is labeled 1, and the inode number 0 is reserved to mean invalid inode. To locate the logical block address of an inode, we use the following formula.
\(\text{offset}=1024 + \text{blocksize} +((\text{inum}-1)\times64)+(\text{iblocks}\times~\text{blocksize})+(\text{zblocks}\times~\text{blocksize})\)We first skip 1024 + blocksize, which skips the boot block and the super block. The inum is the inode number starting at 1. The iblocks is the number of inode bitmap blocks, which is reported by the superblock. The zblocks is the number of zone bitmap blocks, which is also reported by the superblock. I used 1024 bytes as an example, but you should use the blocksize field in the superblock. When we reach the inode table, each inode is listed one after the other. Each inode is 64 bytes, which is why we multiply the index (num – 1) by the size of an inode, or 64 bytes.
If we now read 64 bytes, the size of the inode structure, we now have the inode for the inode given by inum.
Zones
The zones are blocks placed after all of the other data in the Minix 3 file system. The zone pointers in an inode already takes into account the boot block, super block, bitmaps, and inodes, so we can simply multiply the zone by the block size to find the offset we need to read from or write to.
The zones point to the start of a block. The superblock’s block size will tell us how big each block is. What is located in these zones depends on the file type.
If the mode is S_IFDIR (directory), then the blocks are filled with directory entries. See below for more information about directory entries. Each directory entry is 64 bytes, so each block contains at most \(1024\div64=16\) directory entries for a blocksize of 1024 bytes. However, just like a regular file, we have direct pointers and three indirect pointers.
If the mode is S_IFREG (regular file), then the blocks are filled with the data the file contains. Take note that the zone pointers do not necessarily point to blocks in ascending order. Furthermore, not all zones may be used. If the zone is equal to 0, that means the zone is invalid and must be skipped. A zone that points to 0 is not an error, but it just means that the zone pointer isn’t being used.
Direct pointers are multiplied by 1024 (after checking they are not 0) to get to the block that belongs to the file. The zone pointers already take into account the boot block, superblock, bitmaps, and inodes. This is the simplest type of pointer, however it only means we can point to \(1024\times7=7168\) bytes of data if every block is 1024 bytes. To point to larger files, Minix 3 has three types of indirect pointers.
The singly-indirect pointer is index 7. Instead of pointing to a block that contains the data for the file, it points to a block that contains other pointers. Since each pointer is exactly 4 bytes, there are \(1024\div4=256\) pointers for the singly-indirect pointer. Just like the direct pointer zones, some of these pointers can point to 0, which means that the particular zone pointer is unused. The singly-indirect pointer can point to \(1024\times256=262144\) bytes.
The doubly-indirect pointer is index 8. Just like a singly-indirect pointer, it points to a block that contains 256 pointers. However, each of those 256 pointers points to yet another block that contains yet another 256 pointers. Then each of those 256 pointers points to a block where the data is located. This means that with the doubly indirect pointer alone, we can refer to \(1024\times256\times256=67108864\) bytes or about 67 MB.
Finally, the triply-indirect pointer is index 9. Just like singly and doubly indirect pointers, it points to a block that contains 256 pointers. Each of those 256 pointers points to another block that contains 256 pointers. Then those 256 pointers point to yet another 256 pointers. Then, each of those 256 pointers points to a block where the data is located. This means that the triply-indirect pointer can point to \(1024\times256\times256\times256=17179869184\) bytes or about 17 GB.
All of these pointers together means our maximum size is 4 GB based on the limitation of the inode’s size field.
The following graphic shows all three of the pointer schemes in Minix 3.
Directory Entries
If the format part of the mode is S_IFDIR, then the data pointed to by the zone pointers is a block of directory entries. Each directory entry has the following format.
struct DirEntry { u32 inode; char name[60]; };
A directory entry is what actually gives the name to each file. Notice that there is no name field in the inode. Instead, we get the name from the 60-byte field in the DirEntry structure. Since DirEntry structures are only pointed to by S_IFDIR modes, a DirEntry has to tell us which inode it is naming by using the inode field. Recall that the inode field can be the value 0, which means that this DirEntry is unused and should be skipped.
The . (current directory) and .. (parent directory) will be the first two directory entries in the block. The root inode will point back to itself for both the current and parent directory. Recall that the root inode is number 1.
Just like a file, the directory entries are in the order of the zone pointers. Also, a zone pointer can point to 0, which means it is invalid and must be skipped. For directory files, we can ignore the size field. Each directory entry is 64 bytes, so the number of directories per block is \(1024\div64=16\). Since directories can also use indirection, we can have up to \(17247252480\div64=269488320\) (about 269 million) total number of directory entries for a block size of 1024 bytes. Since we aren’t using the size field, the 4GB limitation does not apply. Although, reading every block and skipping where the inode is 0 is a very slow process. So, using the size field and accepting the 4GB limitation might be a good idea.
A file name in Minix 3 can have a maximum of 60 bytes. If a filename is shorter than 60 bytes, then the rest of the bytes are padded with 0s. For example, if we have a filename of ‘this_is_a_minix_file’, which is 20 bytes, would have the following elements in the name array:
this_is_a_minix_3_file\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
Files
A file’s data is contained in the zones pointed to by the zone pointers. The data in the zone pointers is in order, meaning that zone[1] contains the data that comes before the data in zone[2]. However, a zone pointer can point to the value 0, which means it needs to be skipped. For example, if zone[1] is 6114, zone[2] is 0, and zone[3] is 12, then the first 1024 bytes of the file is in zone 6114 and the next (second) 1024 bytes of the file is in zone 12, for a 1024-byte block size.
We completely ignore any zone pointer that points to 0 as if it isn’t there. Therefore, we cannot use the index of the zone to keep up with which bytes we are relating to the file.
Allocating and Growing
When creating a file, it must have a new inode associated with it. First, you should check the inode bitmap (imap) and find where an inode is 0 (not allocated). This will then give you an inode where you can store the size, mode, and times as well as increase the number of links to the inode. Then, you have to find the containing directory. In the containing directory, you will need to add a directory entry with the name and make it point to the newly allocated inode.
You will then need to allocate blocks for your new inode. Given the size, you will need enough blocks to cover the full size of the file.
If the new file is a directory, you will need at least one block to store directory entries. You will also need to add at least two entries: . and .. which stand for current directory and parent directory, respectively. You will need to set the current directory’s inode to itself, and then set the parent directory entry’s inode to the parent directory. The only time the parent directory points to itself is at the root.
When growing a file, you need to see if the new size will exceed the block allocation already given to the file. If the new size does NOT exceed the block allocation, then you just need to write the data to the end of the last block and update the size. However, if it DOES exceed the block allocation, you will need to look in the zone map (zmap) and find a new block. When you find this block, you will need to store the allocation somewhere in the zones array of the inode. Recall that the order in which the zones are presented in this array will be the order in which the data in the file is arranged.