Architecture


Overview

Eta’s compiler is structured as a pipeline of six compilation stages that transform UTF-8 source text into stack-based bytecode, which is then executed on a virtual machine. All phases are orchestrated by the Driver class, which owns the full runtime state and supports incremental execution: each REPL input shares the same VM globals and linker state.

Pipeline:  Source → Lex → Parse → Expand → Link → Analyze → Emit → Execute
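To show why one long-lived Driver is enough for incremental execution, the toy model below collapses the pipeline into a `lex` step, a `parse` step, and direct execution over a made-up language of `name=integer` definitions and bare `name` references. Everything in it (`Driver::run_input`, `Stmt`, the toy syntax) is hypothetical; only the idea that state owned by the driver survives between inputs comes from the text.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy language: each token is "name=integer" (a definition)
// or "name" (a reference). Only lex, parse and execute are modelled.
struct Stmt {
    std::string name;
    bool is_define;
    long value;  // meaningful only when is_define is true
};

// Lex: split source text into whitespace-separated tokens.
std::vector<std::string> lex(const std::string& src) {
    std::vector<std::string> toks;
    std::istringstream in(src);
    for (std::string t; in >> t;) toks.push_back(t);
    return toks;
}

// Parse: turn each token into a statement.
std::vector<Stmt> parse(const std::vector<std::string>& toks) {
    std::vector<Stmt> out;
    for (const auto& t : toks) {
        auto eq = t.find('=');
        if (eq == std::string::npos) out.push_back({t, false, 0});
        else out.push_back({t.substr(0, eq), true, std::stol(t.substr(eq + 1))});
    }
    return out;
}

// Driver: owns state that outlives a single input (here only the globals),
// so a later input can reference a binding created by an earlier one.
class Driver {
public:
    // Returns the value of the last reference in the input, or -1 if the
    // input contained no references.
    long run_input(const std::string& src) {
        long last = -1;
        for (const Stmt& s : parse(lex(src))) {
            if (s.is_define) {
                globals_[s.name] = s.value;
            } else {
                auto it = globals_.find(s.name);
                if (it == globals_.end()) throw std::runtime_error("unbound: " + s.name);
                last = it->second;
            }
        }
        return last;
    }

private:
    std::map<std::string, long> globals_;  // shared by every run_input call
};
```

A second call to `run_input` sees the bindings made by the first, mirroring how each REPL line reuses the same VM globals.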

System Diagram

flowchart TD
    subgraph Reader["Reader (eta/reader/)"]
        LEX["Lexer\nlexer.h"]
        PAR["Parser\nparser.h"]
        EXP["Expander\nexpander.h"]
        LNK["Module Linker\nmodule_linker.h"]
    end
    subgraph Semantics["Semantics (eta/semantics/)"]
        SEM["Semantic Analyzer\nsemantic_analyzer.h"]
        IR["Core IR\ncore_ir.h"]
        EMT["Emitter\nemitter.h"]
    end
    subgraph Runtime["Runtime (eta/runtime/)"]
        VM["VM\nvm.h"]
        HEAP["Heap\nheap.h"]
        GC["Mark-Sweep GC\nmark_sweep_gc.h"]
        NB["NaN-Box\nnanbox.h"]
        IT["Intern Table\nintern_table.h"]
        BI["Builtins\nbuiltin_env.h"]
    end
    subgraph Frontends["Frontends"]
        REPL["eta_repl\nmain_repl.cpp"]
        INTERP["etai\nmain_etai.cpp"]
        LSP["eta_lsp\nmain_lsp.cpp"]
    end
    REPL --> DRV["Driver\ndriver.h"]
    INTERP --> DRV
    LSP --> LSPS["LSP Server\nlsp_server.h"]
    LSPS --> DRV
    DRV --> LEX --> PAR --> EXP --> LNK
    LNK --> SEM --> IR --> EMT
    EMT --> VM
    VM --> HEAP
    VM --> IT
    VM --> NB
    VM --> BI
    HEAP --> GC
    DIAG["Diagnostic Engine\ndiagnostic.h"]
    DRV -.-> DIAG
    LEX -.-> DIAG
    PAR -.-> DIAG
    EXP -.-> DIAG
    LNK -.-> DIAG
    SEM -.-> DIAG
    VM -.-> DIAG

Stage-by-Stage Walkthrough

1. Lexer

File: lexer.h / lexer.cpp

The lexer converts raw UTF-8 source into a stream of Token values. It handles:

Each token carries a Span with file ID, line, column, and byte offset for precise diagnostic messages.
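Line and column can be derived from a byte offset by counting newlines. The sketch below assumes a hypothetical `Span` layout: the field names, 1-based numbering, and columns counted in bytes rather than code points are all assumptions, and Eta's lexer may well track position incrementally instead of rescanning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical Span layout; field names and 1-based numbering are assumptions.
struct Span {
    std::uint32_t file_id;
    std::uint32_t line;    // 1-based
    std::uint32_t column;  // 1-based, counted in bytes (not code points)
    std::size_t offset;    // byte offset from the start of the file
};

// Computes the Span of the byte at `offset` by scanning for newlines.
Span span_at(std::uint32_t file_id, std::string_view src, std::size_t offset) {
    std::uint32_t line = 1, column = 1;
    for (std::size_t i = 0; i < offset && i < src.size(); ++i) {
        if (src[i] == '\n') {
            ++line;
            column = 1;
        } else {
            ++column;
        }
    }
    return Span{file_id, line, column, offset};
}
```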


2. Parser

File: parser.h / parser.cpp

The parser consumes tokens and produces an S-expression tree. The AST node types form a variant (SExpr):

Node         Description
Nil          The empty list '()
Bool         #t or #f
Char         Character literal
String       String literal
Symbol       Identifier
Number       int64_t or double
List         Proper or dotted list
Vector       #(…) vector literal
ByteVector   #u8(…) byte-vector literal
ReaderForm   Reader shorthand: ' (quote), ` (quasiquote), , (unquote), ,@ (unquote-splicing)
ModuleForm   (module name …), parsed structurally
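A variant-based node type like SExpr maps naturally onto `std::variant`. This is a sketch under assumptions, not the contents of parser.h: the alternative names follow the table, while the `shared_ptr` indirection, the `tail` field for dotted lists, and the `make` and `count_atoms` helpers are illustrative. ReaderForm and ModuleForm are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct SExpr;  // forward declaration so nodes can hold children
using SExprPtr = std::shared_ptr<SExpr>;

struct Nil {};
struct Bool { bool value; };
struct Char { char32_t value; };
struct String { std::string value; };
struct Symbol { std::string name; };
struct Number { std::variant<std::int64_t, double> value; };
struct List {
    std::vector<SExprPtr> elems;
    SExprPtr tail;  // non-null only for a dotted list such as (a b . c)
};
struct Vector { std::vector<SExprPtr> elems; };
struct ByteVector { std::vector<std::uint8_t> bytes; };

struct SExpr {
    std::variant<Nil, Bool, Char, String, Symbol, Number, List, Vector, ByteVector> node;
};

// Wraps any alternative in a heap-allocated SExpr.
template <typename T>
SExprPtr make(T value) {
    return std::make_shared<SExpr>(SExpr{std::move(value)});
}

// Counts leaf nodes, recursing into lists (including a dotted tail) and vectors.
// The empty list is not counted as an atom.
std::size_t count_atoms(const SExpr& e) {
    if (auto* l = std::get_if<List>(&e.node)) {
        std::size_t n = 0;
        for (const auto& x : l->elems) n += count_atoms(*x);
        if (l->tail) n += count_atoms(*l->tail);
        return n;
    }
    if (auto* v = std::get_if<Vector>(&e.node)) {
        std::size_t n = 0;
        for (const auto& x : v->elems) n += count_atoms(*x);
        return n;
    }
    if (std::holds_alternative<Nil>(e.node)) return 0;
    return 1;
}
```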

3. Expander (Macro Expansion)

File: expander.h / expander.cpp

The expander desugars high-level Scheme forms into a minimal core language. After expansion, only these forms remain:

if  begin  set!  lambda  quote
define  module
dynamic-wind  values  call-with-values  call/cc  apply

Derived forms that are expanded away:

Surface form          Expands to
let                   Immediately-applied lambda
let*                  Nested let chain
letrec                let + set! chain
named let             letrec + lambda
cond                  Nested if
and / or              Nested if
when / unless         if + begin
do                    Named letrec loop
defun                 define + lambda
define-record-type    Constructor + predicate + accessor defines
syntax-rules macros   Pattern-matched template instantiation
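Two of the rewrites in the table can be sketched on a toy S-expression type: `let` as an immediately applied lambda, and `and` as nested `if`. The `S`, `A`, `L`, `show`, `expand_let`, and `expand_and` names are hypothetical, and the sketch ignores hygiene, error checking, and source spans, all of which a real expander has to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy S-expression: an atom when `atom` is non-empty, otherwise a list.
struct S {
    std::string atom;
    std::vector<S> list;
};

S A(std::string a) { return S{std::move(a), {}}; }
S L(std::vector<S> xs) { return S{"", std::move(xs)}; }

// Prints in Scheme syntax so the rewrites can be checked as text.
std::string show(const S& s) {
    if (!s.atom.empty()) return s.atom;
    std::string out = "(";
    for (std::size_t i = 0; i < s.list.size(); ++i) {
        if (i > 0) out += ' ';
        out += show(s.list[i]);
    }
    return out + ")";
}

// (let ((x e) ...) body ...)  =>  ((lambda (x ...) body ...) e ...)
S expand_let(const S& form) {
    const S& bindings = form.list.at(1);
    S params = L({});
    for (const S& b : bindings.list) params.list.push_back(b.list.at(0));
    S lambda = L({A("lambda"), params});
    for (std::size_t i = 2; i < form.list.size(); ++i) lambda.list.push_back(form.list[i]);
    S call = L({lambda});
    for (const S& b : bindings.list) call.list.push_back(b.list.at(1));
    return call;
}

// (and) => #t ; (and e) => e ; (and e1 e2 ...) => (if e1 (and e2 ...) #f)
// The inner `and` is expanded immediately, producing nested ifs.
S expand_and(const S& form, std::size_t i = 1) {
    if (i == form.list.size()) return A("#t");
    if (i + 1 == form.list.size()) return form.list[i];
    return L({A("if"), form.list[i], expand_and(form, i + 1), A("#f")});
}
```

`or` is deliberately not shown: a faithful expansion must bind the first operand to a fresh temporary so it is not evaluated twice, which needs a generated or hygienically renamed identifier.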

The expander also rewrites internal defines in lambda bodies into letrec per R7RS semantics (controlled by enable_internal_defines_to_letrec).
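The internal-define rewrite can be sketched on the same kind of toy S-expression. This assumes only leading `(define name expr)` forms (no `(define (f x) …)` shorthand and no defines after expressions) and uses hypothetical helper names; it is not the expander's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy S-expression: an atom when `atom` is non-empty, otherwise a list.
struct S {
    std::string atom;
    std::vector<S> list;
};

S A(std::string a) { return S{std::move(a), {}}; }
S L(std::vector<S> xs) { return S{"", std::move(xs)}; }

std::string show(const S& s) {
    if (!s.atom.empty()) return s.atom;
    std::string out = "(";
    for (std::size_t i = 0; i < s.list.size(); ++i) {
        if (i > 0) out += ' ';
        out += show(s.list[i]);
    }
    return out + ")";
}

// True for the simple form (define name expr).
bool is_simple_define(const S& s) {
    return s.atom.empty() && s.list.size() == 3 && s.list[0].atom == "define" &&
           !s.list[1].atom.empty();
}

// Rewrites a lambda body so that its leading defines become one letrec
// wrapping the remaining expressions. Bodies without defines are unchanged.
std::vector<S> rewrite_body(const std::vector<S>& body) {
    S bindings = L({});
    std::size_t i = 0;
    for (; i < body.size() && is_simple_define(body[i]); ++i)
        bindings.list.push_back(L({body[i].list[1], body[i].list[2]}));
    if (bindings.list.empty()) return body;
    S letrec = L({A("letrec"), bindings});
    for (; i < body.size(); ++i) letrec.list.push_back(body[i]);
    return {letrec};
}
```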

For syntax-rules, literal keywords are matched by lexical binding identity (not raw symbol text), and free template identifiers are rebound to their definition-site context so use-site locals do not capture them.
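Literal matching by binding identity can be illustrated with a scoped environment that maps names to binding ids. Following the R7RS rule, a use-site identifier matches a literal if both resolve to the same binding, or if both are unbound and spelled the same. `Scope`, `resolve`, and `literal_matches` are hypothetical names, and the sketch does not cover the renaming of free template identifiers.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>

// Hypothetical lexical scope: names map to unique binding ids.
struct Scope {
    std::map<std::string, int> bindings;
    std::shared_ptr<const Scope> parent;  // null for the outermost scope
};

// Innermost binding id for `name`, or nullopt if it is unbound.
std::optional<int> resolve(const Scope* s, const std::string& name) {
    for (; s != nullptr; s = s->parent.get()) {
        auto it = s->bindings.find(name);
        if (it != s->bindings.end()) return it->second;
    }
    return std::nullopt;
}

// R7RS-style literal matching: same binding, or both unbound with equal names.
bool literal_matches(const std::string& literal, const Scope* def_scope,
                     const std::string& input, const Scope* use_scope) {
    std::optional<int> lit = resolve(def_scope, literal);
    std::optional<int> in = resolve(use_scope, input);
    if (lit && in) return *lit == *in;
    if (!lit && !in) return literal == input;
    return false;
}
```

For example, if the use site wraps the macro call in a scope that binds `else`, the literal `else` no longer matches even though the text is identical.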


4. Module Linker

File: module_linker.h / module_linker.cpp

The linker resolves inter-module dependencies in two phases:

  1. Index — scan all (module …) forms, collecting define and export names into a ModuleTable.
  2. Link — resolve import clauses, applying only, except, and rename filters, populating each module’s visible set.

Errors: UnknownModule, CircularDependency, ConflictingImport, ExportOfUnknownName, DuplicateModule.
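A minimal sketch of the two phases, under an assumed data model: `ModuleDecl`, `Import`, `Module`, `index_modules`, `link_module`, and `link_all` are hypothetical names, errors are thrown as exceptions tagged with the error names listed above, and cycles are detected with an in-progress set during a depth-first walk. It does not check that names in only/except/rename clauses are actually exported.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical data model; Eta's ModuleTable and import clauses may differ.
struct Import {
    std::string from;                            // module being imported
    enum class Kind { All, Only, Except, Rename } kind = Kind::All;
    std::set<std::string> names;                 // used by Only / Except
    std::map<std::string, std::string> renames;  // used by Rename: old -> new
};

struct ModuleDecl {  // one parsed (module ...) form
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> exports;
    std::vector<Import> imports;
};

struct Module {
    std::set<std::string> exports;
    std::vector<Import> imports;
    std::map<std::string, std::string> visible;  // local name -> "module:name"
};

using ModuleTable = std::map<std::string, Module>;

struct LinkError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Phase 1 (Index): record every module's exports, validating them.
ModuleTable index_modules(const std::vector<ModuleDecl>& decls) {
    ModuleTable table;
    for (const ModuleDecl& d : decls) {
        if (table.count(d.name)) throw LinkError("DuplicateModule: " + d.name);
        for (const std::string& e : d.exports)
            if (!d.defines.count(e)) throw LinkError("ExportOfUnknownName: " + e);
        table[d.name] = Module{d.exports, d.imports, {}};
    }
    return table;
}

// Phase 2 (Link): depth-first over imports; meeting a module that is still
// "in progress" means the import graph has a cycle.
void link_module(ModuleTable& t, const std::string& name,
                 std::set<std::string>& in_progress, std::set<std::string>& done) {
    if (done.count(name)) return;
    if (!in_progress.insert(name).second) throw LinkError("CircularDependency: " + name);
    Module& m = t.at(name);
    for (const Import& imp : m.imports) {
        auto src = t.find(imp.from);
        if (src == t.end()) throw LinkError("UnknownModule: " + imp.from);
        link_module(t, imp.from, in_progress, done);
        for (const std::string& e : src->second.exports) {
            if (imp.kind == Import::Kind::Only && !imp.names.count(e)) continue;
            if (imp.kind == Import::Kind::Except && imp.names.count(e)) continue;
            std::string local = e;
            if (imp.kind == Import::Kind::Rename) {
                auto r = imp.renames.find(e);
                if (r != imp.renames.end()) local = r->second;
            }
            const std::string origin = imp.from + ":" + e;
            auto [it, inserted] = m.visible.emplace(local, origin);
            if (!inserted && it->second != origin) throw LinkError("ConflictingImport: " + local);
        }
    }
    in_progress.erase(name);
    done.insert(name);
}

void link_all(ModuleTable& t) {
    std::set<std::string> in_progress, done;
    for (const auto& entry : t) link_module(t, entry.first, in_progress, done);
}
```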


5. Semantic Analyzer

File: semantic_analyzer.h / semantic_analyzer.cpp

The semantic analyzer walks the desugared S-expressions and produces a Core IR graph (core_ir.h). Key responsibilities:

Core IR Node Types

All nodes are arena-allocated (core::Arena, 16 KB blocks) for pointer stability and cache locality.

Node  =  Var | Const | Quote | If | Begin | Set
       | Lambda | Call | Values | CallWithValues
       | DynamicWind | CallCC | Apply
Node                                             Semantics
Var { addr }                                     Variable read: Local, Upval, or Global
Const { literal }                                Literal value: bool, char32_t, string, int64_t, or double
Quote { datum }                                  Quoted data: arbitrary S-expression preserved at runtime
If { test, conseq, alt }                         Conditional branch
Begin { exprs }                                  Sequencing
Set { target, value }                            Mutation (set!)
Lambda { params, rest, locals, upvals, body }    Function with captured upvalues
Call { callee, args }                            Function application
Apply { proc, args }                             Variadic apply (last arg unpacked)
Values { exprs }                                 Multiple return values
CallWithValues { producer, consumer }            call-with-values
CallCC { consumer }                              call/cc
DynamicWind { before, body, after }              Dynamic wind guards
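Pointer stability in a block arena comes from never reallocating blocks, only appending new ones. The bump allocator below is a sketch in that spirit with hypothetical names (`Arena::make`, `block_count`); unlike a production arena it never runs destructors, so it is restricted to trivially destructible types, and it assumes no node needs more than fundamental alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Bump-pointer arena with fixed 16 KB blocks (a sketch, not core::Arena).
class Arena {
public:
    static constexpr std::size_t kBlockSize = 16 * 1024;

    template <typename T, typename... Args>
    T* make(Args&&... args) {
        static_assert(sizeof(T) <= kBlockSize, "node larger than a block");
        static_assert(alignof(T) <= alignof(std::max_align_t), "over-aligned node");
        static_assert(std::is_trivially_destructible_v<T>, "destructors are never run");
        void* p = allocate(sizeof(T), alignof(T));
        return new (p) T{std::forward<Args>(args)...};
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    void* allocate(std::size_t size, std::size_t align) {
        std::size_t start = (used_ + align - 1) / align * align;  // round up to alignment
        if (blocks_.empty() || start + size > kBlockSize) {
            // Start a new block; existing blocks (and nodes in them) never move.
            blocks_.push_back(std::make_unique<std::byte[]>(kBlockSize));
            start = 0;
        }
        used_ = start + size;
        return blocks_.back().get() + start;
    }

    std::vector<std::unique_ptr<std::byte[]>> blocks_;
    std::size_t used_ = 0;  // bytes used in the current (last) block
};
```

Because each block is a separate heap allocation owned by a `unique_ptr`, growing `blocks_` moves only the owning pointers, never the nodes themselves.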

6. Emitter

File: emitter.h / emitter.cpp

The emitter walks the Core IR tree and produces BytecodeFunction objects containing sequences of Instructions. Each module produces a top-level _init function. Nested lambdas are emitted recursively and stored in a thread-safe BytecodeFunctionRegistry (backed by std::deque for pointer stability). Named functions use deterministic <module>:<binding> labels, and anonymous lambdas use source-stable labels <lambda@module:line:column>.
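The registry's pointer-stability guarantee rests on a property of `std::deque`: `push_back` invalidates iterators but not references or pointers to existing elements. The sketch below assumes simplified `Instruction` and `BytecodeFunction` shapes and a mutex for thread safety; it also treats the angle brackets in `<module>:<binding>` as placeholders but those in `<lambda@module:line:column>` as literal characters.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shapes; Eta's real types carry more fields.
struct Instruction { int opcode; int operand; };

struct BytecodeFunction {
    std::string label;
    std::vector<Instruction> code;
};

// Thread-safe registry. std::deque::push_back never relocates existing
// elements, so pointers handed out earlier stay valid as the registry grows.
class BytecodeFunctionRegistry {
public:
    BytecodeFunction* add(BytecodeFunction fn) {
        std::lock_guard<std::mutex> lock(mu_);
        funcs_.push_back(std::move(fn));
        return &funcs_.back();
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mu_);
        return funcs_.size();
    }

private:
    mutable std::mutex mu_;
    std::deque<BytecodeFunction> funcs_;
};

// Deterministic label for a named binding: module:binding.
std::string named_label(const std::string& module, const std::string& binding) {
    return module + ":" + binding;
}

// Source-stable label for an anonymous lambda: <lambda@module:line:column>.
std::string lambda_label(const std::string& module, int line, int column) {
    return "<lambda@" + module + ":" + std::to_string(line) + ":" +
           std::to_string(column) + ">";
}
```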

The emitter’s key decisions:


7. VM Execution

File: vm.h / vm.cpp

The VM is a stack-based interpreter with a frame stack. See Bytecode & VM for full details.
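As a concrete picture of a stack-based interpreter with a frame stack, here is a toy VM with a made-up six-opcode instruction set. The opcodes, `Instr`, `Frame`, and the calling convention (the caller pushes arguments, which the callee addresses relative to its frame base) are assumptions for illustration only; the real instruction set is described in Bytecode & VM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical instruction set, not Eta's.
enum class Op { PushConst, LoadLocal, Add, Mul, Call, Return };
struct Instr { Op op; std::int64_t a = 0; std::int64_t b = 0; };
struct Function { std::vector<Instr> code; };

struct Frame {
    const Function* fn;
    std::size_t pc;    // index of the next instruction
    std::size_t base;  // stack index of the frame's first argument
};

// Runs funcs[entry] on an empty stack and returns the value it returns.
std::int64_t run(const std::vector<Function>& funcs, std::size_t entry) {
    std::vector<std::int64_t> stack;
    std::vector<Frame> frames{{&funcs.at(entry), 0, 0}};
    while (true) {
        Frame& f = frames.back();
        const Instr& in = f.fn->code.at(f.pc++);
        switch (in.op) {
        case Op::PushConst: stack.push_back(in.a); break;
        case Op::LoadLocal: stack.push_back(stack.at(f.base + in.a)); break;
        case Op::Add:
        case Op::Mul: {
            std::int64_t rhs = stack.back(); stack.pop_back();
            std::int64_t lhs = stack.back(); stack.pop_back();
            stack.push_back(in.op == Op::Add ? lhs + rhs : lhs * rhs);
            break;
        }
        case Op::Call:  // a = callee index, b = number of arguments already pushed
            frames.push_back({&funcs.at(in.a), 0, stack.size() - in.b});
            break;
        case Op::Return: {
            std::int64_t result = stack.back();
            stack.resize(f.base);  // drop the callee's arguments and temporaries
            frames.pop_back();
            if (frames.empty()) return result;
            stack.push_back(result);
            break;
        }
        }
    }
}
```

For example, a `main` that pushes 7, calls a one-argument `square`, adds 1, and returns yields 50.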


Diagnostic System

File: diagnostic.h

All compiler phases report errors through a single DiagnosticEngine. Each error carries:

Stage-specific error types (LexError, ParseError, ExpandError, LinkError, SemanticError, VMError) are converted via to_diagnostic<T>() template specializations.
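The `to_diagnostic<T>()` pattern can be sketched as a declared-but-undefined primary template plus one explicit specialization per error type, all feeding a single engine. The `Diagnostic` fields, the two error structs, and `DiagnosticEngine::report` are hypothetical; the real types carry whatever fields diagnostic.h defines.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical shapes for a diagnostic and two stage-specific errors.
struct Diagnostic {
    std::string stage;
    std::string message;
    int line;
    int column;
};

struct LexError { std::string msg; int line; int column; };
struct ParseError { std::string expected; std::string found; int line; int column; };

// Primary template is declared but never defined, so converting an error
// type that has no specialization fails at link time.
template <typename T>
Diagnostic to_diagnostic(const T& err);

template <>
inline Diagnostic to_diagnostic<LexError>(const LexError& e) {
    return {"lex", e.msg, e.line, e.column};
}

template <>
inline Diagnostic to_diagnostic<ParseError>(const ParseError& e) {
    return {"parse", "expected " + e.expected + ", found " + e.found, e.line, e.column};
}

// One engine collects diagnostics from every stage.
class DiagnosticEngine {
public:
    template <typename T>
    void report(const T& err) { diags_.push_back(to_diagnostic(err)); }
    const std::vector<Diagnostic>& all() const { return diags_; }

private:
    std::vector<Diagnostic> diags_;
};
```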


Frontends

Executable   Entry point     Role
etac         main_etac.cpp   Ahead-of-time compiler; emits .etac bytecode
etai         main_etai.cpp   File interpreter; loads the prelude and runs .eta or .etac files
eta_repl     main_repl.cpp   Interactive REPL; incremental execution with shared state
eta_lsp      main_lsp.cpp    Language server; JSON-RPC over stdio, publishes diagnostics
eta_dap      main_dap.cpp    Debug adapter; DAP over stdio (breakpoints, stepping, heap inspection)

All frontends create a Driver instance, which constructs the Heap, InternTable, VM, BuiltinEnvironment, and BytecodeFunctionRegistry and wires them together.


Builtin Catalog Contract

Files: builtin_catalog.h / builtin_catalog.cpp

The builtin catalog is the single source of truth for builtin slot metadata (name, arity, rest-arg flag, owner, and optional docs metadata).
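One way to make a catalog the single source of truth is to let each entry's index be its slot number. The sketch below assumes a `BuiltinSpec` with the fields named above (the exact meaning of owner is a guess) and hypothetical `add`, `slot_of`, and `accepts` methods; the real builtin_catalog.h interface may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical entry layout mirroring the listed metadata.
struct BuiltinSpec {
    std::string name;
    std::size_t arity;               // number of required arguments
    bool has_rest;                   // accepts extra arguments beyond `arity`
    std::string owner;               // assumed: the component that provides it
    std::optional<std::string> doc;  // optional documentation text
};

// The vector index doubles as the builtin's slot number, so slot numbers are
// derived from one list rather than maintained separately.
class BuiltinCatalog {
public:
    std::size_t add(BuiltinSpec spec) {
        specs_.push_back(std::move(spec));
        return specs_.size() - 1;
    }
    const BuiltinSpec& at(std::size_t slot) const { return specs_.at(slot); }
    std::optional<std::size_t> slot_of(std::string_view name) const {
        for (std::size_t i = 0; i < specs_.size(); ++i)
            if (specs_[i].name == name) return i;
        return std::nullopt;
    }
    // Arity check that a caller could run before invoking a builtin.
    bool accepts(std::size_t slot, std::size_t argc) const {
        const BuiltinSpec& s = at(slot);
        return s.has_rest ? argc >= s.arity : argc == s.arity;
    }

private:
    std::vector<BuiltinSpec> specs_;
};
```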