← Will Donnelly

For a recent hobby project I tried to write a self-hosting Lisp which compiled to an x86 binary. I'm not happy enough with the main compiler code to publish it at present, but there is one bit that I'm rather proud of.

See, a lot of the time when a minimal Lisp is "self hosting" it actually compiles down to C or assembly language [1] [2] [3], but I didn't want to go that route, I wanted the system to actually be self-hosting without external tools, because one stretch goal would have been to later port it to bare metal on a microcontroller or retro CPU.

For the full source code, see: asm.scm and run.sh.

Expanding Declarations

This isn't quite a traditional standalone assembler. Because I built it as part of a self-hosting compiler and didn't want to mess around with object files and integrating with normal linkers, everything starts out chunked into declarations.

So the first step is to expand these declarations via an iterative rewrite process which transforms each declaration into a list of lower-level ones, until we end up with a list of nothing but generic "objects".

(define (decls xs)
  "Iteratively desugar declarations into OBJECTs."
  (define (step out xs)
    (match xs
      [`((object ,@b) ,@ys) (step (cons (car xs) out) ys)]
      [`(,d ,@ys) (step out (append (decls/expand d) ys))]
      [_ (reverse out)]))
  (step '() xs))

(define (decls/expand decl)
  "Desugar a single declaration into simpler forms."
  (match decl
    [`(variable ,name)
     `((object ,name vars (value 4 0)))]
    [`(variable ,name ,val)
     `((object ,name vars (value 4 ,val)))]
    [`(asm ,name ,@body)
     `((object ,name code ,@(asm body)))]
    [`(func ,name ,args ,@body)
      (error "This is an Assembler, Not a Compiler!" name)]
    [err (error "Invalid Declaration" err)]))

In the original project, the func declaration was handled by invoking some compiler code to translate a function's arguments and body into assembly, which the iterative expansion process would lower further by invoking the assembler on it.

There were also additional declaration forms for pairs and symbols and the like, because this was part of a self-hosting Lisp backend.

Assembling x86 Instructions

The whole "iterative rewrite" process worked so well for declarations that I decided to use it a second time to assemble individual instructions. The only forms allowed to escape this process were generic (CONST name expr) and (VALUE size expr) instructions.

The full list of rewrite rules for all the instructions I needed ended up somewhat long. There's probably plenty of instruction-set regularities I could have used to simplify the mapping, but that would have made the actual logic much more complex than "here's a big list of patterns to match against", and it's still only 100 lines of code.

(define (asm xs)
  (define (step out xs)
    (match xs
      [`((const ,n ,x) . ,ys) (step `((const ,n ,x) . ,out) ys)]
      [`((value ,n ,x) . ,ys) (step `((value ,n ,x) . ,out) ys)]
      [`(,inst . ,ys) (step out (append (asm/desugar inst) ys))]
      [_ (reverse out)]))
  (step '() xs))

(define (asm/desugar inst)
  (match inst
    ;; Data Literals
    [`(vals ,n) `()]
    [`(vals ,n ,x . ,xs) `((value ,n ,x) (vals ,n . ,xs))]
    [`(u8 . ,xs)  `((vals 1 . ,xs))]
    [`(u16 . ,xs) `((vals 2 . ,xs))]
    [`(u32 . ,xs) `((vals 4 . ,xs))]
    [`(u64 . ,xs) `((vals 8 . ,xs))]
    [`(str ,x) `((u8 . ,(string->bytes x)) (u8 0))]
    [`(repeat 0 ,x) `()]
    [`(repeat ,n . ,xs) `(,@xs (repeat ,(- n 1) . ,xs))]
    [`(label ,n) `((const ,n $))]
    ;; Instruction "Pieces"
    [`(disp ,addr)       `((u32 (- ,addr (+ $ 4))))]
    [`(disp.short ,addr) `((u8 (- ,addr (+ $ 1))))]
    [`(modrm ,mod ,rm) `((u8 (+ ,mod ,(asm/regix rm))))]
    [`(modrm ,mod ,reg ,rm)
     `((u8 (+ ,mod (* ,(asm/regix reg) 8) ,(asm/regix rm))))]
    ;; Instructions
    [`(nop) `((u8 #x90))]
    [`(cdq) `((u8 #x99))]
    [`(inc ,rd) `((u8 (+ #x40 ,(asm/regix rd))))]
    [`(dec ,rd) `((u8 (+ #x48 ,(asm/regix rd))))]
    [`(mov.imm32 ,rd ,x)
     `((u8 (+ #xB8 ,(asm/regix rd))) (u32 ,x))]
    [`(add.imm32 ,rd ,n)
     `((u8 #x81) (modrm #xC0 ,rd) (u32 ,n))]
    [`(mov.ld32 ,rd esp)
     `((u8 #x8B) (modrm #x00 ,rd esp) (u8 #x24))]
    [`(mov.st32 esp ,rs)
     `((u8 #x89) (modrm #x00 ,rs esp) (u8 #x24))]
    [`(mov.ld32 ,rd ebp) `((mov.ld32 ,rd (+ ebp 0)))]
    [`(mov.st32 ebp ,rs) `((mov.st32 (+ ebp 0) ,rs))]
    [`(mov.ld32 ,rd (+ ebp ,n))
     `((u8 #x8B) (modrm #x40 ,rd ebp) (u8 ,n))]
    [`(mov.st32 (+ ebp ,n) ,rs)
     `((u8 #x89) (modrm #x40 ,rs ebp) (u8 ,n))]
    [`(mov.ld32 ,rd ,rs) `((u8 #x8B) (modrm #x00 ,rd ,rs))]
    [`(mov.st32 ,rd ,rs) `((u8 #x89) (modrm #x00 ,rs ,rd))]
    [`(mov ,rd ,rs)      `((u8 #x8B) (modrm #xC0 ,rd ,rs))]
    [`(add ,rd ,rs)      `((u8 #x03) (modrm #xC0 ,rd ,rs))]
    [`(sub ,rd ,rs)      `((u8 #x2B) (modrm #xC0 ,rd ,rs))]
    [`(and ,rd ,rs)      `((u8 #x23) (modrm #xC0 ,rd ,rs))]
    [`( or ,rd ,rs)      `((u8 #x0B) (modrm #xC0 ,rd ,rs))]
    [`(xor ,rd ,rs)      `((u8 #x33) (modrm #xC0 ,rd ,rs))]
    [`(cmp ,rd ,rs)      `((u8 #x3B) (modrm #xC0 ,rd ,rs))]
    [`(not ,rd)          `((u8 #xF7) (modrm #xD0 ,rd))]
    [`(neg ,rd)          `((u8 #xF7) (modrm #xD8 ,rd))]
    [`(mul ,rd)          `((u8 #xF7) (modrm #xE0 ,rd))]
    [`(div ,rd)          `((u8 #xF7) (modrm #xF0 ,rd))]
    [`(imul ,rd)         `((u8 #xF7) (modrm #xE8 ,rd))]
    [`(idiv ,rd)         `((u8 #xF7) (modrm #xF8 ,rd))]
    [`(shl ,rd)    `((u8 #xD1 (+ #xE0 ,(asm/regix rd))))]
    [`(shr ,rd)    `((u8 #xD1 (+ #xE8 ,(asm/regix rd))))]
    [`(sar ,rd)    `((u8 #xD1 (+ #xF8 ,(asm/regix rd))))]
    [`(shl ,rd ,n) `((u8 #xC1 (+ #xE0 ,(asm/regix rd)) ,n))]
    [`(shr ,rd ,n) `((u8 #xC1 (+ #xE8 ,(asm/regix rd)) ,n))]
    [`(sar ,rd ,n) `((u8 #xC1 (+ #xF8 ,(asm/regix rd)) ,n))]
    [`(shl.cl ,rd) `((u8 #xD3 (+ #xE0 ,(asm/regix rd))))]
    [`(shr.cl ,rd) `((u8 #xD3 (+ #xE8 ,(asm/regix rd))))]
    [`(sar.cl ,rd) `((u8 #xD3 (+ #xEF ,(asm/regix rd))))]
    [`(push ,rs) `((u8 (+ #x50 ,(asm/regix rs))))]
    [`(pop ,rd)  `((u8 (+ #x58 ,(asm/regix rd))))]
    [`(ret)      `((u8 #xC3))]
    [`(ret ,n)   `((u8 #xC2) (u16 ,n))]
    [`(call.reg ,rs) `((u8 #xFF (+ #xD0 ,(asm/regix rs))))]
    [`(call ,addr) `((u8 #xE8)      (disp ,addr))]
    [`(jmp ,addr)  `((u8 #xE9)      (disp ,addr))]
    [`( jb ,addr)  `((u8 #x0F #x82) (disp ,addr))]
    [`(jae ,addr)  `((u8 #x0F #x83) (disp ,addr))]
    [`( je ,addr)  `((u8 #x0F #x84) (disp ,addr))]
    [`(jne ,addr)  `((u8 #x0F #x85) (disp ,addr))]
    [`(jbe ,addr)  `((u8 #x0F #x86) (disp ,addr))]
    [`( ja ,addr)  `((u8 #x0F #x87) (disp ,addr))]
    [`( jl ,addr)  `((u8 #x0F #x8C) (disp ,addr))]
    [`(jge ,addr)  `((u8 #x0F #x8D) (disp ,addr))]
    [`(jle ,addr)  `((u8 #x0F #x8E) (disp ,addr))]
    [`( jg ,addr)  `((u8 #x0F #x8F) (disp ,addr))]
    [`(jmp.short ,addr) `((u8 #xEB) (disp.short ,addr))]
    [`( jb.short ,addr) `((u8 #x72) (disp.short ,addr))]
    [`(jae.short ,addr) `((u8 #x73) (disp.short ,addr))]
    [`( je.short ,addr) `((u8 #x74) (disp.short ,addr))]
    [`(jne.short ,addr) `((u8 #x75) (disp.short ,addr))]
    [`(jbe.short ,addr) `((u8 #x76) (disp.short ,addr))]
    [`( ja.short ,addr) `((u8 #x77) (disp.short ,addr))]
    [`( jl.short ,addr) `((u8 #x7C) (disp.short ,addr))]
    [`(jge.short ,addr) `((u8 #x7D) (disp.short ,addr))]
    [`(jle.short ,addr) `((u8 #x7E) (disp.short ,addr))]
    [`( jg.short ,addr) `((u8 #x7F) (disp.short ,addr))]
    [`(set.eq ,rd) `((u8 #x0F #x94) (modrm #xC0 ,rd))]
    [`(set.ne ,rd) `((u8 #x0F #x95) (modrm #xC0 ,rd))]
    [`(set.lt ,rd) `((u8 #x0F #x9C) (modrm #xC0 ,rd))]
    [`(set.le ,rd) `((u8 #x0F #x9E) (modrm #xC0 ,rd))]
    [`(set.gt ,rd) `((u8 #x0F #x9F) (modrm #xC0 ,rd))]
    [`(set.ge ,rd) `((u8 #x0F #x9D) (modrm #xC0 ,rd))]
    [`(int ,n)     `((u8 #xCD ,n))]
    ;; Error
    [err (error "Invalid Instruction" err)]))

Linking and ELF Header

After lowering everything to a list of generic objects, these objects were concatenated and wrapped in an ELF header to produce a Linux x86 executable. This step was pretty straightforward because I didn't bother to implement multiple segments, instead it's all combined into a single RWX area.

(define (link objs)
  (catmap link/flatten (link/elf-wrapper objs)))

(define (link/flatten obj)
  (match obj
    [`(object ,name ,type . ,body)
     `((const ,name $) ,@body (const ,(symcat name 'end) $))]
    [err (error "Invalid Object" err)]))

(define (link/elf-wrapper syms)
  "Wraps SYMS in a 32bit x86 ELF header."
  `((object %forig% data (origin #x08048000))
    (object %fbase% data)
    (object %elfhdr% data ,@(asm
      `((u8 127 69 76 70 1 1 1 3)  ;; ident[0:8]
        (u8 0 0 0 0 0 0 0 0)       ;; ident[8:16]
        (u16 2 3)                  ;; type/machine
        (u32 1 %start%)            ;; version/entry
        (u32 (- %elfpht% %fbase%)) ;; phoff
        (u32 0 0)                  ;; shoff/flags
        (u16 52 32 1)              ;; ehsize/phentsize/phnum
        (u16 0 0 0))))             ;; shentsize/shnum/shstrndx
    (object %elfpht% data ,@(asm
      `((u32 1 0)                   ;; type/offset
        (u32 %fbase% %fbase%)       ;; vaddr/paddr
        (u32 (- %flast% %fbase%))   ;; filesz
        (u32 (- %flast% %fbase%))   ;; memsz
        (u32 7 4096))))             ;; flags/align
    ,@syms
    (object %start% exec ,@(asm
      `((call init)
        (mov ebx eax)               ;; Exit Status
        (mov.imm32 eax 1)           ;; Syscall 1 (Exit)
        (int #x80))))
    (object %flast% data)))

(define (concat xs) (apply append xs))
(define (catmap f xs) (concat (map f xs)))
(define (intersperse y xs)
  (cdr (apply append (map (lambda (x) (list y x)) xs))))
(define (symcat . xs)
  "Concatenate symbols with ':' in between"
  (string->symbol
    (strcat (intersperse ":" (map symbol->string xs)))))

A more professional implementation would group objects by type and emit multiple PHT entries with appropriate R/W/X flags.

Resolving Symbolic Expressions

Throughout all of the assembly and linking, none of the actual values present in the code have really been touched, it's all been about rewriting declarations and instructions into simpler forms. An important function of assemblers beyond "knowing what opcode corresponds to an instruction" is resolving expressions with label and constant values.

I implemented a two-pass assembler, first scanning over the input and accumulating a mapping from CONST names to their actual values, and then mapping over the input a second time to resolve VALUE statements into actual integers.

(define (bin/eval expr addr env)
  "Resolve a value expression into an actual number."
  (define (subexpr x) (bin/eval x addr env))
  (match expr
    ['$ addr]
    [(? integer?) expr]
    [(? symbol?)
     (if (assoc expr env)
         (cdr (assoc expr env))
         (error "Not Defined" expr))]
    [`(+ . ,xs) (apply + (map subexpr xs))]
    [`(- . ,xs) (apply - (map subexpr xs))]
    [`(* . ,xs) (apply * (map subexpr xs))]
    [`(/ ,x ,y) (quotient (subexpr x) (subexpr y))]
    [`(% ,x ,y) (modulo (subexpr x) (subexpr y))]
    [err (error "Invalid Expression" err)]))

(define (bin/bytes len val out)
  "Expand VAL into LEN bytes (little-endian)."
  (if (<= len 0) out
    (bin/bytes (- len 1) (floor (/ val 256))
      (cons (modulo val 256) out))))

(define (bin/values out off stmts env)
  "Assemble primitive statements into a list of bytes."
  (match stmts
    [`() (reverse out)]
    [`((const ,name ,expr) . ,xs) (bin/values out off xs env)]
    [`((origin ,newoff) . ,xs) (bin/values out newoff xs env)]
    [`((align ,size) . ,xs)
      (let ((len (modulo (- 0 off) size)))
	(bin/values (bin/bytes len 0 out)
          (+ off len) xs env))]
    [`((value ,n ,x) . ,xs)
     (let ([val (bin/eval x off env)])
       (bin/values (bin/bytes n val out)
         (+ off n) xs env))]
    [err (error "Invalid Statement" err)]))

(define (bin/consts off stmts env)
  "Accumulate a mapping from constant names to offsets."
  (match stmts
    [`() env]
    [`((const ,name ,expr) . ,xs)
     (if (assoc name env)
       (error "Already Defined" name)
       (bin/consts off xs
         (cons (cons name (bin/eval expr off env)) env)))]
    [`((origin ,newoff) . ,xs) (bin/consts newoff xs env)]
    [`((align ,size) . ,xs)
      (let ((len (modulo (- 0 off) size)))
        (bin/consts (+ off len) xs env))]
    [`((value ,n ,x) . ,xs) (bin/consts (+ off n) xs env)]
    [err (error "Invalid Statement" err)]))

(define (bin/resolve stmts)
  (bin/values '() 0 stmts (bin/consts 0 stmts '())))

In the CONST and VALUE expressions, there is a special symbol $ representing the virtual address of each statement, and there are ORIGIN and ALIGN statements which manipulate that address. Note that there's no LABEL statement, a label is just a constant whose value is the current address and gets expanded as such in the assembler.

Hello, World

Putting all the pieces together, here's a simple "Hello, world" program written in the assembler I've just described:
(define *code*
  '((asm hello (str "Hello, world!\n"))
    (asm init
      (mov.imm32 ebx (- hello:end hello))
      (mov.imm32 eax hello)
      (call sys:write)
      (mov.imm32 eax 0)
      (ret))
    (asm sys:write
      (mov edx ebx)
      (mov ecx eax)
      (mov.imm32 ebx 1)
      (mov.imm32 eax 4)
      (int #x80)
      (ret))))

(hexdump (bin/resolve (link (decls *code*))))

Conclusion

It's not very difficult to implement the core of an assembler, even for a fiddly architecture like x86. The combination of dynamic types, symbol-tagged lists, pattern matching and quasiquotation enables complex rewrite rules to be implemented in very few lines of code.

This was a fun project.