Bytecode Compiler (etac)


Overview

etac is Eta’s ahead-of-time bytecode compiler. It runs the full compilation pipeline (lex → parse → expand → link → analyze → emit) and serializes the resulting bytecode to a compact binary .etac file instead of executing it.

The interpreter (etai) can then load .etac files directly, skipping every front-end stage and jumping straight to VM execution. The typical workflow is to compile once and then run from the cached bytecode:

etac  hello.eta        →  hello.etac       # compile once
etai  hello.etac                           # run from cache (instant load)

Quick Start

# Compile an Eta source file to bytecode
$ etac cookbook/basics/hello.eta
compiled cookbook/basics/hello.eta → cookbook/basics/hello.etac (3 functions, 1 module(s))

# Run the compiled bytecode
$ etai cookbook/basics/hello.etac
Hello, world!
2432902008176640000

CLI Reference

Usage: etac [options] <file.eta> [-o <file.etac>]
| Flag | Description |
|------|-------------|
| -o <output> | Output file path. Defaults to the input path with its extension changed to .etac (hello.eta → hello.etac). |
| -O, --optimize | Enable IR optimization passes (constant folding, primitive specialisation, dead code elimination). |
| -O0 | Disable optimization (default). |
| --disasm | Print disassembly to stdout instead of writing a .etac file. |
| --no-debug | Strip debug info (source maps) from the output, producing a smaller file. |
| --path <dirs> | Module search path (semicolon-separated on Windows, colon-separated on Linux). Falls back to ETA_MODULE_PATH. |
| --help | Show the help message. |
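
To make the -o default and the --path separator rule concrete, here is a minimal sketch. The helper names (default_output_path, split_search_path, module_dirs) are hypothetical and not taken from main_etac.cpp. The sketch assumes the default output simply swaps the file extension, as the Quick Start output suggests, and that empty path entries are skipped.

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>
#include <vector>

// Separator rule from the table above: ';' on Windows, ':' elsewhere.
constexpr char kPathSeparator =
#ifdef _WIN32
    ';';
#else
    ':';
#endif

// Hypothetical helper: hello.eta -> hello.etac (extension is replaced).
std::filesystem::path default_output_path(std::filesystem::path input) {
    input.replace_extension(".etac");
    return input;
}

// Hypothetical helper: split a search-path string on `sep`,
// skipping empty entries such as the one in "a::b".
std::vector<std::string> split_search_path(const std::string& dirs, char sep) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    while (start <= dirs.size()) {
        auto end = dirs.find(sep, start);
        if (end == std::string::npos) end = dirs.size();
        if (end > start) out.push_back(dirs.substr(start, end - start));
        start = end + 1;
    }
    return out;
}

// Use the --path value when given, otherwise fall back to ETA_MODULE_PATH.
std::vector<std::string> module_dirs(const char* flag_value) {
    const char* raw = flag_value ? flag_value : std::getenv("ETA_MODULE_PATH");
    return raw ? split_search_path(raw, kPathSeparator)
               : std::vector<std::string>{};
}
```

Calling module_dirs(nullptr) models the case where --path is omitted, so only the environment variable is consulted.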

Compilation Pipeline

etac reuses the same six-stage pipeline as the interpreter, but replaces the final VM execution step with binary serialization:

flowchart LR
    SRC["Source\n(.eta)"] --> LEX["Lexer"]
    LEX --> PAR["Parser"]
    PAR --> EXP["Expander"]
    EXP --> LNK["Module\nLinker"]
    LNK --> SEM["Semantic\nAnalyzer"]
    SEM --> OPT["Optimization\nPasses (-O)"]
    OPT --> EMT["Emitter"]
    EMT --> SER["Serialize\n(.etac)"]

When etai is given a .etac file it takes the fast-load path — deserializing the function registry directly into the VM:

flowchart LR
    ETAC[".etac file"] --> DES["Deserialize"]
    DES --> VM["VM\nExecution"]

Optimization Passes

When the -O flag is supplied, etac runs an IR optimization pipeline between semantic analysis and bytecode emission. The following passes are currently implemented:

| Pass | Description |
|------|-------------|
| Constant Folding | Evaluates compile-time-constant arithmetic expressions and replaces them with their result. |
| Primitive Specialisation | Lowers proven builtin calls to dedicated VM opcodes (Add/Sub/Mul/Div/Eq/Cons/Car/Cdr). |
| Dead Code Elimination | Removes pure non-tail expressions in begin blocks. |

Passes are composable — the pipeline runs them in sequence and can be extended with additional passes in the future.
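
As an illustration of how sequential, composable passes can work, the sketch below runs a constant-folding pass over a toy expression tree. Everything here is hypothetical: the Node type, the Pass alias, and run_pipeline are stand-ins and do not reflect the real Eta IR or the OptimizationPipeline interface. Integer overflow is ignored for brevity.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical miniature IR: an integer constant or a binary Add/Mul node.
struct Node {
    enum class Kind { Const, Add, Mul } kind;
    long long value = 0;             // meaningful when kind == Const
    std::unique_ptr<Node> lhs, rhs;  // meaningful for Add / Mul
};

using NodePtr = std::unique_ptr<Node>;
using Pass = std::function<NodePtr(NodePtr)>;  // a pass maps a tree to a tree

NodePtr constant(long long v) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Kind::Const;
    n->value = v;
    return n;
}

NodePtr binary(Node::Kind k, NodePtr l, NodePtr r) {
    assert(k != Node::Kind::Const && l && r);
    auto n = std::make_unique<Node>();
    n->kind = k;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

// Constant folding: fold the children first, then collapse this node
// when both operands turned out to be constants.
NodePtr fold_constants(NodePtr n) {
    if (n->kind == Node::Kind::Const) return n;
    n->lhs = fold_constants(std::move(n->lhs));
    n->rhs = fold_constants(std::move(n->rhs));
    if (n->lhs->kind == Node::Kind::Const && n->rhs->kind == Node::Kind::Const) {
        const long long a = n->lhs->value, b = n->rhs->value;
        return constant(n->kind == Node::Kind::Add ? a + b : a * b);
    }
    return n;
}

// Run the passes in order; each consumes the previous pass's output.
NodePtr run_pipeline(const std::vector<Pass>& passes, NodePtr root) {
    for (const auto& pass : passes) root = pass(std::move(root));
    return root;
}
```

Because each pass has the same tree-in, tree-out shape, adding another pass in this sketch only means appending another entry to the vector.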

Full details: see Optimization Pipeline (architecture, IR node reference, pass authoring guide).

# Compile with optimizations enabled
$ etac -O cookbook/basics/hello.eta -o hello-opt.etac

Disassembly

The --disasm flag prints a human-readable dump of the emitted bytecode to stdout, which is useful for understanding what the compiler produces or debugging performance issues:

$ etac --disasm cookbook/basics/hello.eta

Both etac and etai support disassembly:

| Command | What it disassembles |
|---------|----------------------|
| etac --disasm file.eta | Compile from source, print bytecode to stdout (no .etac written). |
| etai --disasm file.eta | Compile + execute from source, then dump the registry. |
| etai --disasm file.etac | Load a pre-compiled .etac file and dump its bytecode. |
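
To give a feel for what a disassembler does with the instruction stream, here is a rough sketch that decodes the (opcode:u8 + arg:u32) records described in the Binary Format section. It is not the real Disassembler. The opcode numbering is invented (only the mnemonic names appear in this document), the output format is made up, and the sketch assumes records are packed into 5 bytes with a little-endian argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical opcode numbering; only the names come from the text.
const char* mnemonic(std::uint8_t op) {
    static const char* const names[] = {"Add", "Sub", "Mul", "Div",
                                        "Eq",  "Cons", "Car", "Cdr"};
    return op < sizeof(names) / sizeof(names[0]) ? names[op] : "Unknown";
}

// Decode packed 5-byte records (opcode:u8 + arg:u32, little-endian arg
// assumed) and render one "index: Mnemonic arg" line per instruction.
std::string disassemble(const std::vector<std::uint8_t>& code) {
    constexpr std::size_t kRecord = 5;
    assert(code.size() % kRecord == 0 && "truncated instruction stream");
    std::ostringstream out;
    for (std::size_t i = 0; i < code.size(); i += kRecord) {
        std::uint32_t arg = 0;
        for (int b = 0; b < 4; ++b)
            arg |= static_cast<std::uint32_t>(code[i + 1 + b]) << (8 * b);
        out << i / kRecord << ": " << mnemonic(code[i]) << ' ' << arg << '\n';
    }
    return out.str();
}
```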

Binary Format (.etac)

Each .etac file stores a serialized BytecodeFunctionRegistry — the same structure the emitter produces in memory. The format is designed for fast loading with minimal parsing.

┌──────────────────────────────────────────────────────┐
│  Magic          4 B   "ETAC"                         │
│  Version        2 B   format version (1)             │
│  Flags          2 B   endianness, debug-info present │
├──────────────────────────────────────────────────────┤
│  Source Hash     8 B   hash of the .eta source       │
├──────────────────────────────────────────────────────┤
│  Module Table   var    one entry per module          │
│    ┌─ name_len     4 B                               │
│    ├─ name         var   UTF-8                       │
│    ├─ init_func_ix 4 B   index into function table   │
│    ├─ total_globals 4 B                              │
│    └─ main_slot    4 B   (0xFFFFFFFF if absent)      │
├──────────────────────────────────────────────────────┤
│  Constant Pool  var    NaN-boxed literals, interned  │
│                        strings, function indices     │
├──────────────────────────────────────────────────────┤
│  Function Table var   one entry per BytecodeFunction │
│    ┌─ arity        4 B                               │
│    ├─ has_rest     1 B                               │
│    ├─ stack_size   4 B                               │
│    ├─ name_len     4 B                               │
│    ├─ name         var   UTF-8                       │
│    ├─ n_consts     4 B   indices into constant pool  │
│    ├─ const_refs   var                               │
│    ├─ n_instrs     4 B                               │
│    └─ instructions var   (opcode:u8 + arg:u32) × n   │
├──────────────────────────────────────────────────────┤
│  Debug Info     var    (optional) source spans per   │
│                        instruction for diagnostics   │
└──────────────────────────────────────────────────────┘

Note

Use --no-debug to strip the Debug Info section, producing a smaller file at the cost of losing source-location information in error messages.
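
The fixed-size prefix of the layout above (magic, version, flags, source hash) can be read with a few lines of code. The following is a sketch under stated assumptions, not the BytecodeSerializer implementation: it assumes the four fields are stored back to back with no padding and that multi-byte integers are little-endian. The EtacHeader struct and both function names are hypothetical, and the flag bits are left uninterpreted because their encoding is not given here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical in-memory view of the fixed 16-byte prefix.
struct EtacHeader {
    std::uint16_t version;
    std::uint16_t flags;        // bit meanings are not specified in this document
    std::uint64_t source_hash;
};

// Read an unsigned little-endian integer of `width` bytes (assumed byte order).
std::uint64_t read_le(const std::uint8_t* p, std::size_t width) {
    assert(width <= 8);
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < width; ++i)
        v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    return v;
}

// Parse magic (4 B), version (2 B), flags (2 B), source hash (8 B).
// Returns std::nullopt for short input or a wrong magic value.
std::optional<EtacHeader> parse_header(const std::vector<std::uint8_t>& bytes) {
    constexpr std::size_t kHeaderSize = 16;
    if (bytes.size() < kHeaderSize) return std::nullopt;
    if (std::memcmp(bytes.data(), "ETAC", 4) != 0) return std::nullopt;
    EtacHeader h{};
    h.version = static_cast<std::uint16_t>(read_le(bytes.data() + 4, 2));
    h.flags = static_cast<std::uint16_t>(read_le(bytes.data() + 6, 2));
    h.source_hash = read_le(bytes.data() + 8, 8);
    return h;
}
```

Checking the magic before anything else lets a loader reject non-.etac input without touching the variable-length sections.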


Source-Hash Validation

Every .etac file embeds a hash of the original .eta source text. The BytecodeSerializer computes this hash at compile time and stores it in the file header. This enables cache-invalidation strategies — a tool can compare the stored hash against the current source to detect stale bytecode.

uint64_t hash = BytecodeSerializer::hash_source(source_text);
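
A cache-invalidation check built on this idea might look like the sketch below. The algorithm used by BytecodeSerializer::hash_source is not described in this document, so the example substitutes 64-bit FNV-1a purely as a stand-in. The function names fnv1a64 and is_stale are hypothetical.

```cpp
#include <cstdint>
#include <string_view>

// Stand-in hash: 64-bit FNV-1a. The real hash_source may use a
// different algorithm; this is only for illustration.
std::uint64_t fnv1a64(std::string_view text) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : text) {
        h ^= c;
        h *= 1099511628211ull;                  // FNV prime
    }
    return h;
}

// Bytecode is stale when the hash stored in the .etac file no longer
// matches the hash of the current source text.
bool is_stale(std::uint64_t stored_hash, std::string_view current_source) {
    return stored_hash != fnv1a64(current_source);
}
```

A build tool could call is_stale with the hash read from the .etac file and the contents of the matching .eta file, and recompile only when it returns true.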

Integration with etai

The interpreter auto-detects the .etac extension and uses the fast-load path (Driver::run_etac_file). No special flags are needed:

$ etai hello.etac          # just works — skips lex/parse/expand/link/analyze/emit

The fast-load path:

  1. Deserializes the BytecodeFunctionRegistry from the binary stream.
  2. Merges the deserialized functions into the interpreter’s live registry.
  3. Installs builtin primitives and resizes the globals vector.
  4. Executes each module’s _init function, then invokes (defun main …) if present.

This means .etac files participate in the same module system as .eta files — the prelude is loaded normally, and cross-module imports resolve as expected.
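
The four steps of the fast-load path can be modelled roughly as follows. This is a toy model, not Driver::run_etac_file: the Function, Module, Registry, and Interpreter types are hypothetical stand-ins, deserialization (step 1) is assumed to have already produced the loaded registry, builtin installation is omitted, and main is modelled as a function index rather than a global slot. The sketch also assumes globals grow by the sum of each module's total_globals and that every init function runs before any main.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

constexpr std::size_t kAbsent = 0xFFFFFFFF;  // mirrors the "absent" sentinel

// Hypothetical stand-ins for the real registry and VM types.
struct Function { std::string name; std::function<void()> body; };
struct Module {
    std::string name;
    std::size_t init_func_ix;   // index into the loaded function table
    std::size_t total_globals;
    std::size_t main_func_ix;   // simplification: kAbsent when there is no main
};
struct Registry {
    std::vector<Function> functions;
    std::vector<Module> modules;
};

struct Interpreter {
    Registry live;
    std::vector<long long> globals;

    void run_loaded(Registry loaded) {
        // Step 2: merge, remembering where the new functions start.
        const std::size_t base = live.functions.size();
        for (auto& f : loaded.functions) live.functions.push_back(std::move(f));

        // Step 3: resize globals (assumed: sum of per-module counts).
        std::size_t needed = globals.size();
        for (const auto& m : loaded.modules) needed += m.total_globals;
        globals.resize(needed);

        // Step 4: run every module's init function, then any main.
        for (const auto& m : loaded.modules)
            live.functions[base + m.init_func_ix].body();
        for (const auto& m : loaded.modules)
            if (m.main_func_ix != kAbsent)
                live.functions[base + m.main_func_ix].body();

        for (auto& m : loaded.modules) live.modules.push_back(std::move(m));
    }
};
```

Offsetting indices by base is what lets the loaded functions coexist with anything already in the live registry, such as the prelude.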


Key Source Files

| File | Role |
|------|------|
| main_etac.cpp | etac entry point: CLI parsing, pipeline orchestration, serialization. |
| bytecode_serializer.h | BytecodeSerializer: serialize / deserialize the .etac binary format. |
| disassembler.h | Disassembler: human-readable bytecode dump. |
| optimization_pipeline.h | OptimizationPipeline: composable IR pass runner. |
| constant_folding.h | Constant folding pass. |
| dead_code_elimination.h | Dead code elimination pass. |
| driver.h | Driver::run_etac_file: fast-load path in the interpreter. |