Mira

Written September 15, 2021

Mira is an 8-bit CPU architecture designed to have an efficient implementation in Minecraft completed in September 2021.

CPU

Instruction set

The instruction set architecture (ISA) for the CPU takes inspiration from the 6502 and 8008 microprocessors. The CPU has an accumulator and could be classified as CISC with its variable length instructions and ability to use memory directly as an argument. The feature-rich and orthogonal ISA makes it a joy to write programs for. It features:

ISA

This is the canonical bit-encoding for the ISA. An implementation is free to shift instructions and bits around to decode it more efficiently, as was done for Minecraft. An implementation is considered valid as long as any assembly program can be executed on it.

Flags

On the left, we have the four processor flags. The presence of a letter in a column indicates whether the instruction updates this flag. The flags can not be written to or read from directly. They are modified as a side effect of other instructions, or read when performing conditional branching.

Addressing modes

Shifting our attention to the top-right, we have the addressing modes. All instructions with an arg field can use one of the addressing modes in this list. Whether 000 refers to the A register or an immediate (K) depends on the "instruction group" (explained in following section).

Mira has 5 registers - A, B, C, X, and Y. Register A (accumulator) functions implicitly as an operand and destination in "Group 0" instructions. B, C, X, and Y serve as general-purpose registers. X and Y can also function as index registers during indexed memory addressing.

Mira lacks a program counter since all jumps use direct addressing which eliminates the need to compute offsets. Function call return addresses are stored as constants and pushed to the stack manually

Instruction groups

The first two bits of the opcode decode the four instruction groups.

Group 0: Two-operand arithmetic and logical operations. arg==000 is a NOP instruction.

Group 1: Same as group 0, but the arg "becomes the accumulator", and the operation is performed with an immediate (a constant stored in the next byte of the program). arg==000 is the A register.

Group 2: A move instruction that moves the arg to a dest. dest/arg==0 is the A register. If dest==arg, then arg is an immediate. This also supports moving an immediate to memory. Memory to memory moves are not supported by any implementation, but there's nothing intrinsic that would prevent them.

Group 3: Single argument arithmetic and logical operations. arg==0 is the A register. Shoehorned into this group are the conditional and unconditional jump instructions. For unconditional jumps, arg==0 is an immediate. Conditional jumps use the conditions in the bottom-right branching table. For example, to branch if the carry flag is set, use bcs, for branch if carry clear, use bcc. The same goes for the other flags n,z,v.

Minecraft implementation details

The Minecraft implementation features:

It also incorporates techniques to reduce the number of updates it causes during program execution, including:

ISA rationale

Why no SEC/CLC instructions like the 6502?

The 6502 couldn't inc/dec the accumulator, and it didn't have add/sub without carry instructions. Having these instructions makes SEC/CLC unneeded.

Why don't INC and DEC update carry flag?

Why no dedicated call/subroutine instructions?

Why no fancy hardware, such as stack, barrel-shifter, multiplier, etc.?

Why not use port-mapped I/O?

Why an accumulator architecture as opposed to a 3-op load-store?

Memory-mapped I/O

Memory-mapped devices are a great way to extend the functionality of the CPU. A device just has to define some interface of reading and writing to designated addresses. Here are some examples:

8x16 TTY Screen
[screen]: read char at cursor, write char to cursor
[flags]: auto-increment, clear, character-set, etc.
[cursor]: the index location of where to write to
7-bit value for location of character, rrrcccc

Ticker tape: for printing streams of decimal values
[ticker]: write a value

Keyboard:
[keyboard]: works as a queue of characters
[flags]: blocking/non-blocking read

Multiplier 8x8=16
[multiplicand]: Write multiplicand, read lower byte of product
[multiplier]: Write multiplier, read upper byte of product

Divider 8/8=(8,8) 
[dividend]: Write dividend, read quotient
[divisor]: Write divisor, read remainder

Assembler

I wrote an assembler in Rust.

Assembler directives

.str    "hello"
.strz   "hello" ; null-terminated string
.u8     5       ; define 8 bit integer
.i16    32      ; define a signed 16-bit integer
.u8     32, 26  ; define a list of unsigned 32-bit integers

Addressing modes

jmp $0x01   ; jump to the address stored at memory address 0x01
jmp 0x01    ; jump to the address 0x01
jmp label   ; jump to label or ram address
jmp B       ; jump to the value stored in register B

add $0x05   ; add the value stored at memory address 0x05
add 0x05    ; add the value 0x05

add A, arr[1]   ; add the value at memory[arr+1] to A

Example programs

; Countdown program
; Print A..=0 to the TICKER device
; input a: the starting count

        mov a, 10       ; move the decimal constant 10 to A
loop:   mov TICKER, a   ; write value in A to the ticker device
        dec a           ; decrement the value in A, updating the flags
        bnc loop        ; Branch to label "loop" while A is not negative


; Array sum
; input: x = length of the array minus 1
; result: sum in A
arr:    .i8 1, 0, 2, 5, -2, 4 
        mov x, 5        ; store length of array minus 1 in X
loop:   add arr[x]      ; add the value at memory address [arr+x]
        dec x           ; decrement X register
        bnc loop        ; while x >= 0, goto loop


; Count set bits (popcnt) (5 bytes)
; input: B, C=0
; result: A
loop:   adc c       ; add the carry flag to the accumulator
start:  shr b       ; shift LSB into carry flag
        bne loop    ; loop while b != 0
        adc c       ; dont forget final bit


; 8x8=8 multipy (7 bytes)
; multiplicant in b
; multiplier in c
; output in a
do_add: add b
loop:   shl b
start:  shr c       ; is multiplier odd?
        bcs do_add  ; add the multiplicand
        bne loop    ; else loop while bits remaining in multiplier


; Collatz conjecture (14 bytes)
; starting value in A
loop:   mov TICKER, a   ; print value
        shr a           ; divide by 2
        bzs end         ; if we hit 0 we're done
        bcc loop        ; loop while even
        rol a           ; restore true odd value
        mov b, a        ; compute a = 3a+1
        shl a
        add b
        inc a           ; a = 3a+1 at this point
        jmp loop
end:


; u8 insertion sort
arr:    .u8 8, 6, 4, 10, 36, 23, 48, 12, 42, 5

        mov c, 9        ; length of array minus one

outer:  mov b, arr[x+1]
        mov y, x

inner:  mov a, arr[y]
        cmp b
        bcc next
        mov arr[y+1], a
        dec y
        bpl inner

next:   mov arr[y+1], b
        inc x
        dec c
        bne outer

Calling convention

The calling convention isn't strict and up to the user. My preference is for the stack to grow downward with post-decrement for push. Here's an example implementation of that

- post-decrement stack for push
- pre-increment for pop
- PC pushed first, then function args in reverse order


; def my_func(a, b):
;     return a - b

my_func:
    mov a, [sp+1]    ; access 1st arg
    sub [sp+2]       ; access 2nd arg

    ; despite being accumulator, we can add imm constants
    ; to registers, handy for collapsing stack frame
    add sp, 3        ; num args + 1 for pc
    jmp [sp]         ; return


; my_func(3, 5)
    ; push the program counter
    mov [sp], PC
    dec sp

    ; push args in reverse order
    mov [sp], 5      ; can push imm/reg directly
    dec sp
    mov [sp], 3
    dec sp
    jmp my_func      ; call