Contents

  1. Introduction
  2. The RISC-V System
  3. Booting to SBI
  4. Setting Up Harts in SBI
  5. Linker Script
  6. Core Local Interrupts
  7. Platform Level Interrupts

Introduction

This is an operating systems course, so naturally, we have to pick an architecture to support the low-level operations of an OS. We will be using RISC-V to write our operating system, and more specifically, QEMU’s virtual RISC-V machine (virt). I have used real hardware before, but due to costs and aging of equipment, we now use the virt system, which is plenty powerful for what we need here.

We need to know some specifics about the virt system that are layered on top of the RISC-V system as a whole. This page will discuss the virt system, including busses and fixed memory locations for MMIO as well as the systems that RISC-V uses and specifies.

The specification has several weird acronyms, such as WARL or WPRI. These have specific meanings which are explained in the beginning of the specification.

Reserved Writes Preserve Values, Reads Ignore Values (WPRI)

Some whole read/write fields are reserved for future use. Software should ignore the values read from these fields and should preserve the values held in these fields when writing values to other fields of the same register. For forward compatibility, implementations that do not furnish these fields must make them read-only zero. These fields are labeled WPRI in the register descriptions.

Write/Read Only Legal Values (WLRL)

Some read/write CSR fields specify behavior for only a subset of possible bit encodings, with other bit encodings reserved. Software should not write anything other than legal values to such a field and should not assume a read will return a legal value unless the last write was of a legal value, or the register has not been written since another operation (e.g., reset) set the register to a legal value. These fields are labeled WLRL in the register descriptions.

Hardware implementations need only implement enough state bits to differentiate between the supported values but must always return the complete specified bit-encoding of any supported value when read.

Implementations are permitted but not required to raise an illegal instruction exception if an instruction attempts to write a non-supported value to a WLRL field. Implementations can return arbitrary bit patterns on the read of a WLRL field when the last write was of an illegal value, but the value returned should deterministically depend on the illegal written value and the value of the field prior to the write.

Write Any Values, Reads Legal Values (WARL)

Some read/write CSR fields are only defined for a subset of bit encodings but allow any value to be written while guaranteeing to return a legal value whenever read. Assuming that writing the CSR has no other side effects, the range of supported values can be determined by attempting to write a desired setting then reading to see if the value was retained. These fields are labeled WARL in the register descriptions.

Implementations will not raise an exception on writes of unsupported values to a WARL field. Implementations can return any legal value on the read of a WARL field when the last write was of an illegal value, but the legal value returned should deterministically depend on the illegal written value and the architectural state of the hart.
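
The write-then-read-back probe described for WARL fields can be sketched in plain C. This is only a host-side simulation: the 2-bit field, its set of legal encodings (0, 1 and 3), the choice to keep the old value on an illegal write, and all function names are assumptions made for illustration, not something the specification or the SBI defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated 2-bit WARL field (hypothetical). This made-up implementation
 * supports only the encodings 0, 1 and 3. */
static uint8_t sim_field = 0;

static bool sim_is_legal(uint8_t v)
{
    return v == 0 || v == 1 || v == 3;
}

/* A WARL write never raises an exception. Here an illegal value simply
 * leaves the previous (legal) value in place. */
static void sim_warl_write(uint8_t v)
{
    v &= 0x3;
    if (sim_is_legal(v)) {
        sim_field = v;
    }
}

/* A WARL read always returns a legal value. */
static uint8_t sim_warl_read(void)
{
    return sim_field;
}

/* Probe: write the desired value, read it back to see whether it was
 * retained, then restore the original value. */
static bool warl_supports(uint8_t desired)
{
    uint8_t saved = sim_warl_read();
    sim_warl_write(desired);
    bool retained = (sim_warl_read() == (desired & 0x3));
    sim_warl_write(saved);
    return retained;
}
```

On real hardware the two accessors would be CSR reads and writes, and the probe is only safe when writing the CSR has no other side effects, as the quoted text notes.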


The RISC-V System

The RISC-V system is split into two pieces: (1) the privileged specification and (2) the unprivileged specification. The privileged specification is what we will use most in this course, since the SBI runs in the machine privilege mode of the CPU and the operating system runs in the supervisor privilege mode. However, some things might be relevant in the unprivileged specification.

A RISC-V system consists of one or more hardware threads, or HARTs for short. These can be thought of as the same thing as cores in multiple-core systems. RISC-V abstracts the implementation, so a HART is an “execution unit.” For our purposes, each hardware thread is a CPU core, so it has its own integer and floating-point register files, MMU, memory protection, and control and status register file. However, RISC-V differentiates a core from a thread based on the ability to fetch an instruction. Processing units with an instruction fetch functional unit are considered “cores”.

The RISC-V system has three protection levels: (level 3) machine mode, (level 1) supervisor mode, and (level 0) user mode. Level 2 is for hypervisors, which we will not be covering in this course. We will be placing our SBI in machine mode, our operating system in supervisor mode, and all applications in user mode.


Booting to SBI

The system boots by jumping all HARTS to the physical memory address 0x8000_0000. All harts will be in machine mode at this time. The harts all run asynchronously, so it is important to maintain a level of control through semaphores or other locking methods.

In RISC-V, all harts have a unique identifier stored in the control and status register mhartid (machine hart identifier). The RISC-V specification requires that hart id 0 be available, but it makes no requirement of the hart ids to be sequential–but the virtual machine we will use does make them sequential. So, if you have 8 harts, you will have harts 0-7.

When all HARTs jump to 0x8000_0000, nothing is set up or ready to go, including the stack and global pointer. All HARTs will be in machine mode, so there are no physical memory protection (PMP) checks, and the MMU is turned off. In machine mode, the MMU is bypassed by default, and PMP checks apply only to entries that have been explicitly locked, which none are at boot.

Booting has to start in assembly, but you can quickly move to a higher-level language, such as C++. Before moving into C++, you will need to set the stack pointer (sp) to the bottom of the stack. We will use the values inside of the linker script. However, the stack in the linker script is for ALL harts, so you will have to do a little bit of arithmetic to determine which stack each hart gets.

.section .text.init
.global _start
_start:
        csrr  a0, mhartid
.option push
.option norelax
        la    sp, _stack_end
        la    gp, __global_pointer$
.option pop
        slli  t0, a0, 12    # mhartid * 4096
        sub   sp, sp, t0
        tail  main

The code above is from the SBI boot loader. In this case, we load the hart id into register a0 so that main can know which HART it is running on. Since we’re in machine mode, main could read mhartid itself with CSR_READ, but in this case, it’s passed as an argument.

The global pointer is a register that stores the address where the global variables and constants start. To load a global variable or constant, it can be done through the global pointer, allowing for a much lower address range. We have to turn off linker relaxation before we load the address, otherwise the global pointer won’t be properly set. The symbol __global_pointer$ comes from the linker script. NOTE: This seems to have been fixed in a later version of RISCV64-GCC, but the Tesla/Hydra machines still use an older version.

The stack pointer is set to _stack_end, which also comes from the linker script. However, since we have multiple HARTs, we set the stack based on the HART id. Each hart gets 4,096 bytes of stack space. Picking this number is more art than science, but 4,096 bytes of stack space should be enough for a small SBI. We multiply the HART ID by the stack space, and then subtract that from the bottom of the stack. So, the formula is \(\text{sp}=\text{_stack_end}-4096\times\text{mhartid}\).
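
The same arithmetic can be written as a small C helper, which is handy for checking that two harts never share a stack. This is a sketch only: the helper name and the SBI_STACK_PER_HART constant are made up here, and the value of _stack_end is passed in as a parameter so the code can run anywhere.

```c
#include <stdint.h>

/* 4096 bytes per hart, matching the shift by 12 in the boot assembly. */
#define SBI_STACK_PER_HART 4096UL

/* Returns the initial stack pointer for a hart, given the address of the
 * _stack_end linker symbol (hypothetical helper, not part of the SBI). */
static inline uintptr_t hart_initial_sp(uintptr_t stack_end, unsigned long hartid)
{
    return stack_end - SBI_STACK_PER_HART * hartid;
}
```

For example, if _stack_end were 0x8001_0000, hart 0 would start at 0x8001_0000 and hart 3 at 0x8000_D000, with each stack growing downward into its own 4,096-byte slot.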


Setting Up Harts in SBI

The SBI will park the non-zero HARTs, meaning they will be in a wait loop. The wait loop uses the WFI (wait for interrupt) instruction, which lets the hart idle to conserve power. The only way a parked hart can be started is through an interrupt or exception. For a hart to stay controllable, we must enable its interrupts BEFORE we park it; otherwise it will not be controllable unless somehow a non-maskable exception occurs, which is unlikely since it is waiting in the off state.

We bifurcate the responsibility here between hart 0 and non-0 harts. The 0 hart will be the bootstrap processor, which will configure the SBI. The non-0 harts will just be set up and parked. The zero hart is responsible for configuring everything else before the HARTs get parked. The code below shows the zero-hart’s main.

In main, I used the attributes naked and noreturn. These tell the compiler not to set up a prologue or epilogue for the function, since it never returns (we do an mret in it).

#include <stdint.h> // int64_t
#include <csr.h> // Several defines come from here. Shown in SBI lecture.

// We will use this structure as our trap frame
// to save registers from the OS when it traps
// to the SBI
struct {
    int64_t gpregs[32];
} SBI_REGS[8];

// Forward declaration from sbi/asm/trap.S
void sbi_trap_vector();

// main goes here.

Now that we have all the set up ready, we can go ahead and write main.

ATTR_NAKED_NORET
void main(int hartid) {
    if (hartid != 0) {
        // We will change this when we start managing multiple HARTs
        while (1) { WFI(); }
    }
    // If we get here, we are hart #0, the bootstrapper

    // We will write these functions in the SBI
    clear_bss_section();
    uart_init();
    plic_init();
    pmp_init();
    
    // Now that the HART is in a state we recognize, we can jump to the OS
    CSR_WRITE("mscratch", SBI_REGS + hartid);

    // The OS doesn't have access to mhartid, so we're going to pass it through
    // sscratch.
    CSR_WRITE("sscratch", hartid);

    // Now we have to set up interrupts and get ready to jump into supervisor mode (MPP=1)
    // mi = machine interrupts (asynchronous)
    // me = machine exceptions (synchronous)
    CSR_WRITE("mideleg", SIP_SEIP | SIP_STIP | SIP_SSIP);
    CSR_WRITE("medeleg", MEDELEG_ALL);
    CSR_WRITE("mie", MIE_MEIE | MIE_MTIE | MIE_MSIE);

    CSR_WRITE("mtvec", sbi_trap_vector);  // in sbi/asm/sbitrap.S
    CSR_WRITE("mepc", OS_LOAD_ADDRESS);   // 0x8005_0000

    // Set bits 14:13 to 1 (floating point system = initial)
    // Set bits 12:11 to 1 (supervisor mode)
    // Set bit 7 to 1 (interrupts enabled)
    CSR_WRITE("mstatus", MSTATUS_FS_INITIAL | MSTATUS_MPP_SUPERVISOR | MSTATUS_MPIE);

    // MRET instruction is "machine return"
    MRET();
}

The non-zero harts will have their PMP access open and then wait for the zero hart to finish. We set each hart start data to the stopped state. We set the interrupts enabled through the mie and mstatus.

Much of the code above is just like the code for the non-zero harts. The first thing that must be done is to clear the BSS section. The compiler expects all values in this section to be initialized to 0. However, the syscall_hsd (hart start data) is in the BSS section, so the non-zero harts have to wait for the BSS section to be cleared before they can set the HSD. This is where the primitive locking variable, hart_0_is_good, comes into play. Since only hart 0 writes to it, and it starts in the locked state, we don’t need any fancy locks here–just a single variable.

Since hart 0 will jump to the operating system (at 0x8005_0000), it must delegate the proper supervisor interrupts to supervisor mode. This is what the mideleg register does. You need to make sure that only the proper interrupts are delegated, otherwise the SBI will never be able to handle certain calls.

The non-zero harts will be parked, and they will come out of the wait loop whenever they receive an interrupt. In our case, this will be an inter-processor interrupt (IPI) raised through the hart’s MSIP bit in the CLINT. This is why it is important that interrupts are enabled AND the trap vector is set properly.


Linker Script

When we link, we need to specify where certain symbols are located and how much memory we have. We also need to start our OS kernel at 0x8005_0000 (the SBI starts at 0x8000_0000). This is all done through a linker script, which is passed to the linker using the -T switch.

The following is the linker script for the OS. Notice that the origin point is address 0x80050000.

OUTPUT_ARCH( "riscv" )
ENTRY(_start)
MEMORY
{
  ram  (wxari) : ORIGIN = 0x80050000, LENGTH = 220M
}

PHDRS
{
  text PT_LOAD;
  data PT_LOAD;
  rodata PT_LOAD;
  bss PT_NULL;
}

SECTIONS
{
  PROVIDE(_memory_start = ORIGIN(ram));
  PROVIDE(_memory_end = _memory_start + LENGTH(ram));

  .text : {
    PROVIDE(_text_start = .);
    *(.text.init) *(.text .text.*)
    PROVIDE(_text_end = .);
  } >ram AT>ram :text

  . = ALIGN(8);
  PROVIDE(__global_pointer$ = .);

  .bss : ALIGN(4096) {
    PROVIDE(_bss_start = .);
    *(.sbss .sbss.*) *(.bss .bss.*)
    PROVIDE(_bss_end = .);
  } >ram AT>ram :bss
  
  .rodata : ALIGN(4096) {
    PROVIDE(_rodata_start = .);
    *(.rodata .rodata.*)
    PROVIDE(_rodata_end = .);
  } >ram AT>ram :rodata

  .data : ALIGN(4096) {
    . = ALIGN(4096);
    PROVIDE(_data_start = .);
    *(.sdata .sdata.*) *(.data .data.*)
    PROVIDE(_data_end = .);
  } >ram AT>ram :data

  .eh_hdr : {
    *(.eh*)
  } >ram AT>ram :data

  /* We need to make sure that the stack and heap are aligned by
   a page size, which for Risc-V (and most architectures) is 4096.
  */
  . = ALIGN(4096);

  PROVIDE(_stack_start = .);
  PROVIDE(_stack_end = _stack_start + 8K * 8);
  PROVIDE(_heap_start = _stack_end);
  PROVIDE(_heap_end = _memory_end);
}

We use the linker script to set relative offsets, but also to make sure that we put the instructions in the correct place. We create a new section, called .text.init, so that the linker script knows to put that section first, which will be placed at 0x8005_0000.
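
Symbols such as _bss_start and _bss_end have no storage of their own; only their addresses are meaningful in C. A minimal sketch of how a routine could use them to zero the BSS section is shown here. The helper name zero_range is made up for this example, and the pointers are passed in as parameters so the function can be compiled and tried on any host.

```c
/* Zero every byte in [start, end).
 *
 * In the kernel, the arguments would be the addresses of the linker
 * script symbols, declared as arrays so that their names decay to the
 * addresses the linker assigned:
 *
 *     extern char _bss_start[], _bss_end[];
 *     zero_range(_bss_start, _bss_end);
 *
 * The volatile pointer keeps the compiler from replacing the loop with a
 * call to memset, which may not exist in a freestanding build. */
static void zero_range(char *start, char *end)
{
    for (volatile char *p = start; p < end; p++) {
        *p = 0;
    }
}
```

Writing one byte at a time is slow but makes no alignment assumptions. Since the script aligns .bss to 4096, a word-at-a-time loop would also be a reasonable choice.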


Core Local Interrupts (CLINT)

The core local interrupts are per-hart (per-core) interrupts. We will be using the CLINT through the SBI and not directly through our operating system.

Figure: Core Local Interrupt Controller (CLINT) Memory Map

The diagram above shows 5 HARTs, but the same memory map can be expanded for more harts or contracted for fewer harts.

There are three functions of the CLINT: (1) to generate software interrupts to other HARTs, (2) to read the current time, which can also be done through the rdtime pseudo-instruction (which reads the time CSR), and (3) to set the timer compare register.

On the virt machine, the CLINT interrupts generated by both the MSIP and MTIMECMP registers are level sensitive. This means that as long as MSIP is set to 1, it will continue to interrupt through the MSIP pin until we clear it to 0. So, part of the interrupt handling of the MSIP interrupt is to clear the MSIP for the given HART on the CLINT.

The mtimecmp register holds a time in the future. If the mtime register is greater than or equal to the mtimecmp register, the CLINT will signal an MTIP (machine timer interrupt pending). We will be using this to schedule a context switch.

The virt implementation of mtimecmp is a signed long. Therefore, to put the mtimecmp sufficiently in the future so that it will stop interrupting, we have to put the value 0x7FFF_FFFF_FFFF_FFFF instead of -1UL.
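
The register layout and the signed comparison can be sketched in C as follows. The offsets are assumptions based on the common SiFive-style CLINT layout that virt follows (base 0x0200_0000, MSIP at offset 0, MTIMECMP at 0x4000, MTIME at 0xBFF8), since the memory map diagram is not reproduced in the text. The helper names are made up, and timer_would_fire only models the signed comparison described above; it is not QEMU's code.

```c
#include <stdint.h>

/* Assumed SiFive-style CLINT layout on virt. */
#define CLINT_BASE            0x02000000UL
#define CLINT_MSIP_OFFSET     0x0000UL  /* one 4-byte register per hart */
#define CLINT_MTIMECMP_OFFSET 0x4000UL  /* one 8-byte register per hart */
#define CLINT_MTIME_OFFSET    0xBFF8UL  /* one shared 8-byte register   */

/* Largest positive signed 64-bit value: parks the timer "forever". */
#define CLINT_MTIMECMP_INFINITE 0x7FFFFFFFFFFFFFFFULL

static inline uintptr_t clint_msip_addr(unsigned hart)
{
    return CLINT_BASE + CLINT_MSIP_OFFSET + 4UL * hart;
}

static inline uintptr_t clint_mtimecmp_addr(unsigned hart)
{
    return CLINT_BASE + CLINT_MTIMECMP_OFFSET + 8UL * hart;
}

/* Models the comparison as signed, as the text describes for virt. */
static inline int timer_would_fire(uint64_t mtime, uint64_t mtimecmp)
{
    return (int64_t)mtime >= (int64_t)mtimecmp;
}
```

Under this model, writing -1UL to mtimecmp is seen as -1, so any non-negative mtime is already past it and the timer interrupts immediately, whereas 0x7FFF_FFFF_FFFF_FFFF is the farthest point in the future.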


Platform Level Interrupt Controller (PLIC)

The PLIC is an interrupt controller that is programmable. Every interrupt that makes it through the PLIC will come to a HART as an external interrupt, which is mcause = 8 (user), 9 (supervisor), or 11 (machine). The PLIC we will be programming supports only supervisor and machine mode external interrupts.

The PLIC registers are separated into 6 sections: (1) global priority [per interrupt], (2) interrupt pending, (3) interrupt enables, (4) priority thresholds, (5) claims, and (6) completion.

The virt machine connects the following interrupt numbers to the PLIC.

IRQ #   Device                  MMIO Physical Address
10      UART0                   0x1000_0000
11      RTC (realtime clock)    0x0010_1000
32      PCIE INTA#              programmable
33      PCIE INTB#              programmable
34      PCIE INTC#              programmable
35      PCIE INTD#              programmable
Virt machine interrupts

There are also virtio interrupts from 1-8 for memory-mapped virtio, but we will be using PCIe, which uses interrupts 32, 33, 34, and 35. How each device is connected to each interrupt will be discussed later in the PCIe notes.

The PLIC works by routing and prioritizing interrupts through the external pin. The PLIC has an external pin connected to each HART, and each hart can be configured separately through the PLIC.


PLIC Registers

Some PLIC registers are global (per interrupt) and some are per-HART AND per-MODE.

Each individual interrupt number can be given a priority between 0 and 7. If we give an individual interrupt the priority 0, it can never be heard since the PLIC will mask anything AT or BELOW the threshold, and the smallest threshold is 0.

If we want to assign interrupt 35 the priority 5, we would first calculate the memory address \(\text{0x0C00_0000} + 4\times35=\text{0x0C00_0000} + 140= \text{0x0C00_008C}\), and then set this memory address to 5.


The PLIC will mask any interrupt AT or BELOW the threshold. The threshold allows us to move the cutoff up and down depending on what important task the OS is doing. Thresholds can be 0 through 7, inclusively. A threshold of 7 will mask ALL interrupts, meaning no interrupt will be “heard” through the external pin. A threshold of 0 means all non-zero priority interrupts are allowed. If we set a specific interrupt’s priority to 0, it can never be heard through the PLIC.

Each hart and each mode has a priority threshold register. Each register is 4 bytes, and there are two registers for each hart (one for each mode). The base memory address for the priority threshold register is 0x0C20_0000.

#define PLIC_BASE           0x0c000000UL
#define PLIC_PRIORITY_BASE  0x4
#define PLIC_PENDING_BASE   0x1000
#define PLIC_ENABLE_BASE    0x2000
#define PLIC_ENABLE_STRIDE  0x80
#define PLIC_CONTEXT_BASE   0x200000
#define PLIC_CONTEXT_STRIDE 0x1000

#define PLIC_MODE_MACHINE    0x0
#define PLIC_MODE_SUPERVISOR 0x1

// Macro arguments are parenthesized so that expressions such as
// PLIC_PRIORITY(base_irq + 1) expand correctly.
#define PLIC_PRIORITY(interrupt) \
    (PLIC_BASE + PLIC_PRIORITY_BASE * (interrupt))

#define PLIC_THRESHOLD(hart, mode) \
    (PLIC_BASE + PLIC_CONTEXT_BASE + PLIC_CONTEXT_STRIDE * (2 * (hart) + (mode)))

#define PLIC_CLAIM(hart, mode) \
    (PLIC_THRESHOLD(hart, mode) + 4)

#define PLIC_ENABLE(hart, mode) \
    (PLIC_BASE + PLIC_ENABLE_BASE + PLIC_ENABLE_STRIDE * (2 * (hart) + (mode)))

Per-HART Enable Registers

Base Address    Description
0x0C00_2000     Hart 0 M-mode enable registers
0x0C00_2080     Hart 0 S-mode enable registers
0x0C00_2100     Hart 1 M-mode enable registers
0x0C00_2180     Hart 1 S-mode enable registers
0x0C00_2200     Hart 2 M-mode enable registers
0x0C00_2280     Hart 2 S-mode enable registers

Each enable register is 4 bytes, and each mode’s bank spans 0x80 bytes (128 bytes / 4 = 32 registers), even though the current PLIC only uses 16 of those 32 registers. To get from one hart to the next, we offset by 0x100. To get from one HART mode to the next (HART 0 M to S mode) we offset by 0x80.

We still need to enable specific interrupts. The PLIC has registers connected for each HART and each mode of the HART that allow us to specifically turn on or turn off interrupts to each HART and what mode those will be taken. These are the “enable” registers, which are connected to each HART and each mode.

The configuration of virt is “MS”, meaning that machine mode registers come first, then the supervisor registers. So, each hart has two banks of registers: one for machine mode and one for supervisor mode. The virt machine does not allow the PLIC to trap in user mode. The enable register is the only bitfield register. For example, to enable interrupt number 8, we would set bit 8 of the enable register. Since we have interrupts numbered 32 and above, there are multiple enable registers, numbered 0 up to and including 15. Each enable register is 4 bytes. So, to enable interrupt 35, we would need to go to enable register \(\lfloor 35/32 \rfloor = 1\) and set bit index \(35 \bmod 32 = 3\).

Each enable register is 4 bytes, so enable register 15 for hart 0 in M-mode would be at memory address \(\text{0x0C00_2000} + 4\times15=\text{0x0C00_203C}\).
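
Putting the bank, register, and bit calculations together, a small helper can return the exact address and bit mask for any hart, mode, and interrupt. This is a sketch: the struct and function names are made up, and the three constants are copied from the macro listing above so the block stands on its own.

```c
#include <stdint.h>

#define PLIC_BASE          0x0c000000UL
#define PLIC_ENABLE_BASE   0x2000UL
#define PLIC_ENABLE_STRIDE 0x80UL

/* Where a single interrupt's enable bit lives (hypothetical type). */
struct plic_enable_slot {
    uintptr_t addr;  /* address of the 4-byte enable register */
    uint32_t  mask;  /* bit to set or clear in that register  */
};

/* mode is 0 for machine and 1 for supervisor, as in the "MS" layout. */
static struct plic_enable_slot plic_enable_locate(unsigned hart, unsigned mode,
                                                  unsigned irq)
{
    struct plic_enable_slot slot;
    uintptr_t bank = PLIC_BASE + PLIC_ENABLE_BASE
                   + PLIC_ENABLE_STRIDE * (2UL * hart + mode);

    slot.addr = bank + 4UL * (irq / 32);       /* which 32-bit register */
    slot.mask = (uint32_t)1 << (irq % 32);     /* which bit inside it   */
    return slot;
}
```

For interrupt 35 on hart 0 in M-mode this gives address 0x0C00_2004 (enable register 1) and mask 0x8 (bit 3), matching the arithmetic in the text.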


The pending register shows us which interrupts are still waiting to be handled. We don’t use the pending interrupt register directly. Instead, we will use the claim register to receive the next interrupt in priority order and then the complete register to tell the PLIC we are finished with it.


Per-HART Threshold and Claim/Complete Registers

Figure: The priority register for HART 0 in M-mode
Figure: The claim/complete register for HART 0 in M-mode

There are two priority threshold registers (labeled “priority register” below) and two claim/complete registers per hart: the first of each is for M-mode and the second is for S-mode. Each claim/complete register sits 4 bytes past the priority threshold register for the same hart and mode. Each HART mode is displaced 0x1000 from the previous, starting at 0x0C20_0000. This means each HART is displaced 0x2000 from the previous.

Base Address    Description
0x0C20_0000     Hart 0 M-mode priority register
0x0C20_0004     Hart 0 M-mode claim/complete register
0x0C20_1000     Hart 0 S-mode priority register
0x0C20_1004     Hart 0 S-mode claim/complete register
0x0C20_2000     Hart 1 M-mode priority register
0x0C20_2004     Hart 1 M-mode claim/complete register
0x0C20_3000     Hart 1 S-mode priority register
0x0C20_3004     Hart 1 S-mode claim/complete register
Example PLIC memory map for 2 HARTs. U-mode is not a valid PLIC mode.

You can see from above that each HART takes 0x2000 worth of MMIO address space (first 0x1000 for M mode and second 0x1000 for S mode). However, there are only two registers per hart, per mode. First comes the 4-byte priority register followed by the 4-byte claim/complete register, which I describe below.


Claim/Complete Sequence

When we read all four bytes of the claim register, it will give us an interrupt ID of the next interrupt in priority order. For example, if interrupt 35 was made, when we read the claim register, it will give us the value 35. We can then handle the interrupt, or we can delay work on the interrupt. However, to finish the interrupt and allow for more interrupts, we have to complete the interrupt. When we write 35 back into this same 4-byte memory address, we are telling the PLIC that we just completed interrupt #35. The PLIC will then remove 35 from the list and the next priority interrupt will come next.

Generally, when we receive an interrupt from the PLIC, we want to handle all interrupts until there are no more to handle. We can tell if there are no more because we will get 0 from the claim register. So, essentially, place the claim inside of a while loop. It might be a good idea to have a maximum claims per interrupt so that you don’t starve out processes.
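
The claim loop with a cap on claims per trap can be sketched as follows. To keep the example runnable on a host, the real claim and complete MMIO accesses are replaced with stand-ins that serve a fixed list of interrupt IDs. The names fake_claim, fake_complete, plic_drain, and the budget MAX_CLAIMS_PER_TRAP are all assumptions made for this illustration.

```c
#include <stdint.h>

#define MAX_CLAIMS_PER_TRAP 4  /* hypothetical budget per trap */

/* Stand-ins for the claim/complete register so the sketch runs anywhere. */
static uint32_t fake_pending[] = { 35, 10, 33, 32, 34 };
static unsigned fake_head = 0;
static unsigned fake_completed = 0;

/* Like reading the claim register: next interrupt ID, or 0 if none. */
static uint32_t fake_claim(void)
{
    unsigned n = sizeof(fake_pending) / sizeof(fake_pending[0]);
    return (fake_head < n) ? fake_pending[fake_head++] : 0;
}

/* Like writing the ID back to the claim/complete register. */
static void fake_complete(uint32_t id)
{
    (void)id;
    fake_completed++;
}

/* Handle interrupts until the claim register reads 0 or the budget is
 * used up. Returns how many interrupts were handled. */
static unsigned plic_drain(void)
{
    unsigned handled = 0;
    while (handled < MAX_CLAIMS_PER_TRAP) {
        uint32_t id = fake_claim();
        if (id == 0) {
            break;  /* nothing left to claim */
        }
        /* A real handler would dispatch to the driver for 'id' here. */
        fake_complete(id);
        handled++;
    }
    return handled;
}
```

Stopping early does not lose interrupts: anything not yet claimed is still pending in the PLIC, so the hart will trap again and the next call picks up where this one left off.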


Using the macros above, we can write the functions necessary to control the PLIC. If the following are put into our operating system, we need to be in PLIC_MODE_SUPERVISOR. However, if we are writing these in our SBI, then these would be in PLIC_MODE_MACHINE.

// MMIO registers are accessed through volatile pointers so the compiler
// cannot cache, reorder away, or drop the reads and writes.
void plic_set_priority(int interrupt_id, char priority)
{
    volatile uint32_t *base = (volatile uint32_t *)PLIC_PRIORITY(interrupt_id);
    *base = priority & 0x7;
}
void plic_set_threshold(int hart, char priority)
{
    volatile uint32_t *base = (volatile uint32_t *)PLIC_THRESHOLD(hart, PLIC_MODE_SUPERVISOR);
    *base = priority & 0x7;
}
void plic_enable(int hart, int interrupt_id)
{
    volatile uint32_t *base = (volatile uint32_t *)PLIC_ENABLE(hart, PLIC_MODE_SUPERVISOR);
    base[interrupt_id / 32] |= 1U << (interrupt_id % 32);
}
void plic_disable(int hart, int interrupt_id)
{
    volatile uint32_t *base = (volatile uint32_t *)PLIC_ENABLE(hart, PLIC_MODE_SUPERVISOR);
    base[interrupt_id / 32] &= ~(1U << (interrupt_id % 32));
}
// Reading the claim register returns the next interrupt ID (0 if none).
uint32_t plic_claim(int hart)
{
    volatile uint32_t *base = (volatile uint32_t *)PLIC_CLAIM(hart, PLIC_MODE_SUPERVISOR);
    return *base;
}
// Writing the ID back to the same register completes the interrupt.
void plic_complete(int hart, int id)
{
    volatile uint32_t *base = (volatile uint32_t *)PLIC_CLAIM(hart, PLIC_MODE_SUPERVISOR);
    *base = id;
}

Interrupt Routing

  • An interrupt is generated through the device itself. For this section, I will take UART0 as an example. UART0 has an interrupt-enable register. If this register is 0, then the UART0 cannot generate interrupts.
  • When the UART0 generates an interrupt, it first must pass the interrupt-enable register. After that, the signal is routed to the PLIC. In the case of virt, this pin is #10.
  • The PLIC then receives the signal through pin #10 and checks the PLIC_ENABLE bits for each hart and mode. Where bit 10 is cleared, the interrupt is not forwarded to that hart and mode.
  • If bit 10 is enabled, the PLIC then checks the priority for pin 10. If the priority is > the threshold for that hart and mode, then the PLIC will signal the interrupt on that hart’s external interrupt pin for the given mode.
  • The HART will then see the interrupt as pending. If the mode is M, the HART checks the MIE bit in the MSTATUS register. If this bit is cleared, the interrupt is not taken. If the mode is S, the HART checks the SIE bit in the MSTATUS (and SSTATUS) register. If this bit is cleared, the interrupt is not taken.
  • If the MIE/SIE bits are set, then the HART checks the MIE (machine interrupt enable) register, bit 11 (MEIE), if this is an M-mode interrupt. If this bit is cleared, the interrupt is not taken. If this is an S-mode interrupt, then the HART checks the SIE (supervisor interrupt enable) register, bit 9 (SEIE). If this bit is cleared, the interrupt is not heard.
  • PLIC interrupts are level triggered. It will keep trying to interrupt until we claim and complete it.
  • If an interrupt finally makes it here, then the HART will trap. In M-mode, the MCAUSE register will be an asynchronous trap with the cause 11 (machine external interrupt). In S-mode, the SCAUSE register will be an asynchronous trap with the cause 9 (supervisor external interrupt).

In short, if you are having a hard time “hearing” interrupts:

  1. Make sure the device’s interrupts are enabled.
  2. Make sure the PLIC has the correct pin enabled for your hart and mode.
  3. Make sure the PLIC threshold allows the priority of the interrupt to pass.
  4. Make sure the mode’s [MS]STATUS register is accepting interrupts.
  5. Make sure the mode’s [MS]IE register is enabling interrupts.
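
The checklist above can be turned into a small debugging helper that reports the first stage at which an interrupt would be blocked. This is a sketch only: the struct, its field names, and the function are made up for illustration, and the caller is assumed to have already read the relevant device, PLIC, and CSR values into the struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Snapshot of everything on the path from a device to a trap
 * (hypothetical type; values are filled in by the caller). */
struct irq_path {
    bool     device_irq_enabled; /* 1. device's own interrupt enable       */
    uint32_t plic_enable_word;   /* 2. PLIC enable register for this IRQ   */
    unsigned irq;                /*    IRQ number, e.g. 10 for UART0       */
    unsigned priority;           /* 3. PLIC priority of the IRQ (0..7)     */
    unsigned threshold;          /*    PLIC threshold for this hart/mode   */
    bool     status_ie;          /* 4. mstatus.MIE or sstatus.SIE          */
    bool     external_ie;        /* 5. mie.MEIE or sie.SEIE                */
};

/* Returns 0 if the interrupt can be heard, otherwise the number of the
 * first checklist step (1 to 5) that blocks it. */
static int first_blocker(const struct irq_path *p)
{
    if (!p->device_irq_enabled) {
        return 1;
    }
    if (!(p->plic_enable_word & ((uint32_t)1 << (p->irq % 32)))) {
        return 2;
    }
    if (p->priority <= p->threshold) {
        return 3;  /* masked: priority is AT or BELOW the threshold */
    }
    if (!p->status_ie) {
        return 4;
    }
    if (!p->external_ie) {
        return 5;
    }
    return 0;
}
```

Note that plic_enable_word must be the enable register that actually covers the IRQ (register irq / 32 in the hart and mode's bank), since the helper only tests bit irq % 32 within it.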