RISC-V in Your Browser with Verilog

RISC-V is an open CPU architecture with roots in academia; the “V” marks it as the fifth generation of RISC designs to come out of UC Berkeley. Now that it is headed for ISO/IEC standardization, it’s attracting more interest.

One nice thing about the RISC-V instruction set is its modularity. There are 32-bit and 64-bit base instruction sets, each with a full and compact version. The base instruction sets are deliberately minimal; even multiplication and division are provided in an optional extension (the M extension).

We can easily run the FemtoRV32 Verilog implementation in 8bitworkshop. This gives us a basic RV32I (32-bit integer) instruction set that we can use in projects.

RISC stands for “Reduced Instruction Set Computer”, and the philosophy is to have a lot of registers but just a few instructions. For example, here’s a snippet from the CPU module that decodes all 10 instruction classes by comparing bits 6:2 of the opcode:

wire isLoad    =  (instr[6:2] == 5'b00000); // rd <- mem[rs1+Iimm]
wire isALUimm  =  (instr[6:2] == 5'b00100); // rd <- rs1 OP Iimm
wire isAUIPC   =  (instr[6:2] == 5'b00101); // rd <- PC + Uimm
wire isStore   =  (instr[6:2] == 5'b01000); // mem[rs1+Simm] <- rs2
wire isALUreg  =  (instr[6:2] == 5'b01100); // rd <- rs1 OP rs2
wire isLUI     =  (instr[6:2] == 5'b01101); // rd <- Uimm
wire isBranch  =  (instr[6:2] == 5'b11000); // if(rs1 OP rs2) PC<-PC+Bimm
wire isJALR    =  (instr[6:2] == 5'b11001); // rd <- PC+4; PC<-rs1+Iimm
wire isJAL     =  (instr[6:2] == 5'b11011); // rd <- PC+4; PC<-PC+Jimm
wire isSYSTEM  =  (instr[6:2] == 5'b11100); // rd <- cycles
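To see the decode in action without a simulator, here is a minimal Python sketch of the same comparison. The dictionary and the `classify` helper are hypothetical names for illustration; only the bit patterns come from the Verilog above. Bits 1:0 are ignored because they are always `11` for 32-bit RISC-V instructions.

```python
# Hypothetical Python model of the decode; the values mirror the Verilog wires.
OPCODE_CLASSES = {
    0b00000: "isLoad",
    0b00100: "isALUimm",
    0b00101: "isAUIPC",
    0b01000: "isStore",
    0b01100: "isALUreg",
    0b01101: "isLUI",
    0b11000: "isBranch",
    0b11001: "isJALR",
    0b11011: "isJAL",
    0b11100: "isSYSTEM",
}

def classify(instr):
    """Return the class selected by instr[6:2], or None if it is not one of the ten."""
    return OPCODE_CLASSES.get((instr >> 2) & 0b11111)

# 0x00000013 is `addi x0, x0, 0`, the canonical RISC-V NOP.
print(classify(0x00000013))
```

The NOP lands in `isALUimm`, since `addi` is a register-immediate ALU operation.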

For ALU instructions, the 3-bit funct3 field (instr[14:12]) selects one of eight ALU (Arithmetic Logic Unit) functions:

// The ALU function, decoded in 1-hot form (doing so reduces LUT count)
wire [7:0] funct3Is = 8'b00000001 << instr[14:12];

wire [31:0] aluOut =
    (funct3Is[0]  ? instr[30] & instr[5] ? aluMinus[31:0] : aluPlus : 32'b0) |
    (funct3Is[1]  ? leftshift                                       : 32'b0) |
    (funct3Is[2]  ? {31'b0, LT}                                     : 32'b0) |
    (funct3Is[3]  ? {31'b0, LTU}                                    : 32'b0) |
    (funct3Is[4]  ? aluIn1 ^ aluIn2                                 : 32'b0) |
    (funct3Is[5]  ? shifter                                         : 32'b0) |
    (funct3Is[6]  ? aluIn1 | aluIn2                                 : 32'b0) |
    (funct3Is[7]  ? aluIn1 & aluIn2                                 : 32'b0) ;
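The one-hot selection can be modeled in a few lines of Python. This is a sketch, not the FemtoRV32 implementation: the snippet above does not show how `leftshift`, `shifter`, `LT`, or `LTU` are computed, so the model assumes the standard RV32I meaning of each funct3 value (including bit 30 choosing between logical and arithmetic right shift). The function and argument names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit value as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu(funct3, in1, in2, instr30=0, instr5=0):
    """Model of the ALU mux; assumes standard RV32I semantics per funct3."""
    in1 &= MASK32
    in2 &= MASK32
    shamt = in2 & 31
    one_hot = 1 << funct3  # same idea as 8'b00000001 << instr[14:12]
    out = 0
    if one_hot & 0x01:  # ADD, or SUB when bits 30 and 5 are both set
        out |= (in1 - in2) if (instr30 & instr5) else (in1 + in2)
    if one_hot & 0x02:  # shift left
        out |= in1 << shamt
    if one_hot & 0x04:  # signed less-than
        out |= int(to_signed(in1) < to_signed(in2))
    if one_hot & 0x08:  # unsigned less-than
        out |= int(in1 < in2)
    if one_hot & 0x10:  # XOR
        out |= in1 ^ in2
    if one_hot & 0x20:  # shift right: arithmetic if bit 30 is set, else logical
        out |= (to_signed(in1) >> shamt) if instr30 else (in1 >> shamt)
    if one_hot & 0x40:  # OR
        out |= in1 | in2
    if one_hot & 0x80:  # AND
        out |= in1 & in2
    return out & MASK32

print(hex(alu(0, 5, 3)), hex(alu(0, 5, 3, instr30=1, instr5=1)))
```

The `instr[30] & instr[5]` test matters because bit 5 of the opcode is set for register-register operations but clear for register-immediate ones, so an `addi` whose immediate happens to set bit 30 still adds.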

Including the CPU module in a Verilog project is relatively simple. It’s not too dissimilar from the CPU8 and CPU16 examples already provided in 8bitworkshop. You just instantiate the module with a couple of parameters, and hook up the address and data buses:

FemtoRV32 #(
    .RESET_ADDR(32'h00010000),  // Start execution from ROM area
    .ADDR_WIDTH(24)             // 24-bit address space
) cpu (
    .clk(clk),                  // Clock
    .reset(reset),              // CPU reset (active high)
    .mem_addr(mem_addr),        // Read/write address
    .mem_wdata(mem_wdata),      // Write data
    .mem_wmask(mem_wmask),      // Write mask
    .mem_rdata(mem_rdata),      // Read data
    .mem_rstrb(mem_rstrb),      // Read strobe
    .mem_rbusy(mem_rbusy),      // Read busy
    .mem_wbusy(mem_wbusy)       // Write busy
);
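The write mask is the least obvious of these signals. The sketch below models how a memory would apply it, assuming `mem_wmask` is 4 bits wide with one bit per byte lane and bit 0 covering the least significant byte. That layout is an assumption here, since the port list above only labels it "Write mask", and `masked_write` is a hypothetical helper name.

```python
def masked_write(old_word, wdata, wmask):
    """Merge wdata into old_word, one byte lane per set bit of wmask.

    Assumes a 4-bit mask where bit 0 enables the least significant byte.
    """
    result = old_word
    for lane in range(4):
        if wmask & (1 << lane):
            byte_mask = 0xFF << (8 * lane)
            result = (result & ~byte_mask) | (wdata & byte_mask)
    return result & 0xFFFFFFFF

# A byte store to lane 0 replaces only the low byte of the word.
print(hex(masked_write(0x11223344, 0xAABBCCDD, 0b0001)))
```

Under this assumption a word store drives all four mask bits, a byte store drives one, and a mask of zero means no write is taking place.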

We can make a simple memory map by selecting certain bits of mem_addr:

wire ram_sel = (mem_addr[16] == 1'b0);       // 0x00000-0x0FFFF: RAM (64KB)
wire rom_sel = (mem_addr[16:13] == 4'b1000); // 0x10000-0x11FFF: ROM (8KB)
wire io_sel  = (mem_addr[16:13] == 4'b1100); // 0x18000-0x19FFF: I/O (8KB)
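A quick way to check a bit-select against the address range you intended is to compute the base and size it implies. The two helpers below are hypothetical and only do the arithmetic: a comparison on `addr[hi:lo]` picks out a block of `2**lo` bytes starting at `pattern << lo`.

```python
def bit_select(addr, hi, lo):
    """Return addr[hi:lo] as an integer, like a Verilog part-select."""
    return (addr >> lo) & ((1 << (hi - lo + 1)) - 1)

def region(hi, lo, pattern):
    """Base address and size in bytes of the block where addr[hi:lo] == pattern.

    Bits above `hi` are not checked, so the block repeats every 2**(hi+1) bytes.
    """
    return pattern << lo, 1 << lo

for name, pattern in (("rom_sel", 0b1000), ("io_sel", 0b1100)):
    base, size = region(16, 13, pattern)
    print(f"{name}: 0x{base:05X}-0x{base + size - 1:05X} ({size // 1024}KB)")
```

Comparing four bits down to bit 13 yields 8KB blocks, and a one-bit compare on bit 16 alone would cover 64KB. Because nothing above bit 16 is examined, each region also aliases at every 128KB boundary within the 24-bit address space.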

The rest of the project implements a simple frame buffer. The program is built with a small RISC-V assembler defined in 8bitworkshop’s JSON file format. Here’s the RISC-V code, which repeatedly fills the frame buffer with a pattern:

start:
    lui x2, 0x0           ; x2 = 0x0 (framebuffer start)
    addi x1, x0, 0        ; x1 = 0 (counter)
    lui x4, 0x20          ; x4 = 0x20000 (end addr, 0x20 << 12)
    lui x5, 0x18          ; x5 = 0x18000 (I/O base, 0x18 << 12)

loop:
    add x3, x1, x0       ; x3 = counter value as pattern
    sw x3, 0(x2)         ; Store pattern to framebuffer
    addi x2, x2, 4       ; Increment address by 4 bytes
    addi x1, x1, 1       ; Increment counter
    blt x2, x4, loop     ; Loop if address < end

    ; Restart: rewind the address and fill again
    lui x2, 0x0          ; Reset to framebuffer start (0x0)
    addi x1, x1, 1       ; Bump counter so the next pass shifts the pattern
    jal x0, loop         ; Jump back to loop
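To sanity-check the constants, here is a small Python model of the first pass through `loop`. It is only a sketch: `lui` and `fill_pass` are hypothetical helpers, memory is a plain dictionary, and the signed comparison done by `blt` is modeled with an ordinary `<` because every address involved is small and positive.

```python
def lui(imm20):
    """LUI places a 20-bit immediate in bits 31:12 and zeroes the low 12 bits."""
    return (imm20 & 0xFFFFF) << 12

def fill_pass(start, end, counter):
    """Model one pass of the loop body: store the counter, then advance 4 bytes."""
    mem = {}
    addr = start
    while True:
        mem[addr] = counter & 0xFFFFFFFF  # sw x3, 0(x2)
        addr += 4                         # addi x2, x2, 4
        counter += 1                      # addi x1, x1, 1
        if not addr < end:                # blt x2, x4, loop
            break
    return mem, counter

start = lui(0x0)   # x2
end = lui(0x20)    # x4
mem, counter = fill_pass(start, end, 0)
print(hex(end), len(mem), counter)
```

With these constants a single pass performs one word store for every 4 bytes below `0x20000`, and the counter that leaves the loop is the pattern value the next pass starts from.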

You can try it online in the 8bitworkshop dev branch.