Eta Optimisation Plan

This plan inventories existing IR-level optimisation in Eta and lists the highest-ROI improvements available in the compiler, the bytecode emitter, and the VM. Items are ordered by expected payoff per unit of work.

Current state (April 2026)

Three passes exist under eta/core/src/eta/semantics/passes/:

  1. ConstantFolding (constant_folding.h)
  2. PrimitiveSpecialisation (primitive_specialisation.h)
  3. DeadCodeElimination (dead_code_elimination.h)

The optimisation pipeline is wired and active under etac -O.

Important correctness caveat: ConstantFolding currently identifies arithmetic by global binding name only. If user code rebinds +, -, *, or /, -O can fold using builtin semantics even when the runtime call target differs.

For calls proven to target immutable builtins with supported arity, etac -O now emits dedicated opcodes (Add/Sub/Mul/Div/Eq/Cons/Car/Cdr) instead of generic LoadGlobal + Call.

Calls that are not proven safe (shadowed names, unsupported arity, or non-builtin/global targets) still take the generic call path.

Shortlist (best ROI first)

  1. Self-recursive tail call -> backward Jump in the emitter.
  2. Drop the shared_lock on BytecodeFunctionRegistry::get and pass std::span<LispVal> to primitives (zero-copy).
  3. Harden and extend ConstantFolding (builtin-identity checks, n-ary, comparisons, identities, fixpoint loop).
  4. Constant + copy propagation through let-bindings.
  5. Extend primitive specialisation coverage (more builtins/arity forms) where semantic proof is straightforward.

Items 1-2 alone should produce a several-fold speedup on arithmetic-heavy workloads (AAD primal, Monte Carlo path generation in pure Eta) without introducing any new opcodes or new IR.


Detailed items

1. Primitive call specialisation status and next extensions

Where: implemented in eta/core/src/eta/semantics/passes/primitive_specialisation.h, consumed by the existing emitter.

bytecode.h defines dedicated opcodes Add, Sub, Mul, Div, Eq, Cons, Car, Cdr. The pass lowers a call to PrimitiveCall, which the emitter lowers to the matching opcode, when the callee resolves to a proven immutable builtin global with matching arity.

Remaining work in this area:

2. Harden and extend constant folding

3. Constant / copy propagation through let (i.e. Begin + Set{Local})

Let is desugared into a Begin that starts with Set{Local} initialisers (see letrec pattern detection in emit_begin). For a non-mutated local whose initialiser is a Const, Var, or Quote, replace each LoadLocal slot with the initialiser, then DCE drops the Set. Combined with item 2 this enables folding through (let ((x 2) (y 3)) (* x y)) style code that macro expansion produces everywhere. BindingInfo::mutable_flag already records mutability.
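The substitution step can be sketched over a toy node type (the structs and names below are illustrative stand-ins, not the real Core IR):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Toy IR: an expression is either a constant or a variable reference.
struct Expr {
    std::variant<long, std::string> v;  // long = Const, string = Var name
};

// Substitute non-mutated locals whose initialiser is a constant.
// `inits` maps local name -> constant initialiser; `mutated` stands in
// for the per-binding mutability information.
Expr propagate(const Expr& e,
               const std::map<std::string, long>& inits,
               const std::map<std::string, bool>& mutated) {
    if (auto* name = std::get_if<std::string>(&e.v)) {
        auto it = inits.find(*name);
        auto mu = mutated.find(*name);
        if (it != inits.end() && (mu == mutated.end() || !mu->second))
            return Expr{it->second};  // replace the local load with the constant
    }
    return e;
}
```

In the real pass the mutability check would come from BindingInfo::mutable_flag, and the replacement node would be arena-allocated via ModuleSemantics rather than copied by value.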

4. Inlining of small lambdas / direct-recursive calls

Two cheap forms:

5. Global resolution caching (“ICs lite”)

LoadGlobal is already O(1). But every global call still does callee object checks and function resolution from the registry, currently with a shared_lock on the read path.

Options:
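One option worth considering is a per-call-site versioned cache, sketched here with hypothetical types (the real registry API may differ): resolve once through the slow path, then revalidate with a cheap generation load instead of taking the shared_lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct Function { int id; };

// Hypothetical registry: bumps a generation counter on every mutation.
struct Registry {
    std::atomic<std::uint64_t> generation{0};
    Function* current = nullptr;
    Function* slow_lookup() { return current; }  // stands in for the locked path
};

// Per-call-site cache: valid while the registry generation is unchanged.
struct CallSiteCache {
    std::uint64_t seen_generation = ~0ull;
    Function* cached = nullptr;

    Function* resolve(Registry& r) {
        auto g = r.generation.load(std::memory_order_acquire);
        if (g != seen_generation) {   // miss: take the slow path once
            cached = r.slow_lookup();
            seen_generation = g;
        }
        return cached;                // hit: no lock, no hash lookup
    }
};
```

The trade-off is one extra atomic load per call against a lock acquisition plus registry lookup; the cache invalidates itself whenever the registry mutates.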

6. Stop reallocating args per primitive call

dispatch_callee builds a std::vector<LispVal> args for every primitive call and primitive code usually iterates once. Either:

This is a runtime change, not an IR pass, but it is low-hanging VM work.

7. Bytecode peephole

After emission, a quick scan can collapse:
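One example of such a collapse, over a toy instruction set (Eta's real opcodes and encoding will differ): a constant that is pushed and immediately popped can be dropped in a single linear scan.

```cpp
#include <cassert>
#include <vector>

enum class Op { LoadConst, Pop, Add };

struct Insn { Op op; long imm = 0; };

// Collapse LoadConst immediately followed by Pop: the pair is a no-op.
// A real pass must also check that no jump target lands on the Pop.
std::vector<Insn> peephole(const std::vector<Insn>& in) {
    std::vector<Insn> out;
    for (const Insn& i : in) {
        if (i.op == Op::Pop && !out.empty() && out.back().op == Op::LoadConst) {
            out.pop_back();  // cancel the push instead of emitting the pop
            continue;
        }
        out.push_back(i);
    }
    return out;
}
```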

8. Closure / upvalue elimination for non-escaping lambdas

If a Lambda is only used as the immediate callee of a Call (no escape), it does not need MakeClosure / heap allocation - its body can be inlined or compiled as a direct sibling function with shared frame access.

9. AAD / numerical hot-path specific

Given the XVA plan and existing AD tape integration:

10. Dead-store / dead-local elimination

DCE today only touches pure expressions inside Begin. It does not detect:

A liveness sweep over IR can implement both.

11. Constant lifting and quote interning at emit time

emit_const already caches string constants per function. Quote data and numeric constants are not interned across the constant pool, so repeated quoted literals still duplicate work and storage.

A hashed datum pool keyed by value would shrink bytecode and improve constant-pool locality.
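A minimal sketch of such a pool, keyed here by a printed canonical form of the datum for simplicity (the real key scheme would more likely hash the datum structurally):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Toy datum pool: equal datums share one constant-pool slot.
struct DatumPool {
    std::vector<std::string> pool;                        // constant pool
    std::unordered_map<std::string, std::uint32_t> index; // printed form -> slot

    std::uint32_t intern(const std::string& printed) {
        auto [it, inserted] = index.try_emplace(
            printed, static_cast<std::uint32_t>(pool.size()));
        if (inserted) pool.push_back(printed);  // first occurrence only
        return it->second;
    }
};
```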

12. Type-driven specialisation (longer term)

There is no static type system, but a flow-sensitive pass could prove “this var is always closure arity N” or “always fixnum” along a path and emit specialised opcodes.


Suggested rollout order

  1. Self-recursive tail-call -> jump (emitter only, no new opcode). [done]
  2. Drop the shared_lock on BytecodeFunctionRegistry::get; pass span<const LispVal> to primitives. [done]
  3. Harden ConstantFolding correctness under builtin shadowing.
  4. Extend folding (n-ary/comparisons/identities/fixpoint) plus constant/copy propagation through let-bindings.
  5. Bytecode peephole.
  6. Dead-store elimination + frame shrinking.
  7. AAD-specific flonum fast path + active-tape flag.
  8. Closure elimination for non-escaping lambdas.
  9. Constant pool / quote interning.
  10. Flow-sensitive type specialisation.

Each step is independently shippable and benchmarkable.

Step 1 implementation notes

Step 2 implementation notes


Optimization Pipeline Reference


Overview

Eta includes a composable IR-level optimization pipeline that transforms the Core IR graph between semantic analysis and bytecode emission. Optimizations at this level benefit from high-level type, scope, and control-flow information that is lost once the IR is lowered to bytecode.

The pipeline is invoked by the etac compiler when the -O flag is supplied. The interpreter (etai) does not run optimization passes — it prioritises fast turnaround during development.

```mermaid
flowchart LR
    SEM["Semantic\nAnalyzer"] --> OPT["Optimization\nPipeline"]
    OPT --> EMT["Emitter"]
    subgraph OPT_PASSES ["Passes (in order)"]
        CF["Constant\nFolding"]
        PS["Primitive\nSpecialisation"]
        DCE["Dead Code\nElimination"]
        CF --> PS --> DCE
    end
    OPT -.-> CF
```
# Compile with optimizations enabled
$ etac -O hello.eta

# Compile without optimizations (default)
$ etac hello.eta
$ etac -O0 hello.eta

Core IR — The Optimization Target

Passes operate on the Core IR — a typed, tree-structured intermediate representation produced by the semantic analyzer. Every expression in an Eta program is lowered to one of the following node types before bytecode emission:

| Node | Description |
|------|-------------|
| Const | Literal value (integer, double, boolean, character, string). |
| Var | Variable reference — resolved to a Local, Upval, or Global address. |
| Quote | Quoted datum (deep-copied S-expression). |
| If | Conditional: test, conseq, alt. |
| Begin | Sequence of expressions — result is the last one. |
| Set | Mutation (set!) — writes to a resolved address. |
| Lambda | Function: parameters, captured upvalues, body. |
| Call | Function call: callee + argument list. |
| PrimitiveCall | Specialised builtin call lowered to a dedicated VM opcode. |
| Apply | (apply proc args ...) — spread-call. |
| CallCC | (call/cc consumer) — first-class continuation capture. |
| Values / CallWithValues | Multiple-return-value support. |
| DynamicWind | (dynamic-wind before body after). |
| Raise / Guard | Exception raise and guard (catch). |
| MakeLogicVar / Unify / DerefLogicVar / TrailMark / UnwindTrail | Logic programming (unification, backtracking). |

All Node* pointers are arena-allocated inside ModuleSemantics — old nodes replaced by a pass are simply abandoned and freed when the arena destructs. This makes pass authoring cheap: no manual deallocation is required.


Architecture

OptimizationPass — Base Class

Every pass implements the OptimizationPass interface:

class OptimizationPass {
public:
    virtual ~OptimizationPass() = default;

    /// Human-readable name (for diagnostics / --dump-passes).
    virtual std::string_view name() const noexcept = 0;

    /// Transform the IR for a single module.
    virtual void run(ModuleSemantics& mod) = 0;
};

A pass receives a ModuleSemantics reference containing the module’s toplevel_inits (the IR roots) and its bindings metadata. It may mutate the IR graph in place, replace nodes, or remove them.

OptimizationPipeline — Pass Runner

The pipeline is an ordered container of passes. Passes execute in registration order over each module:

OptimizationPipeline pipeline;
pipeline.add_pass(std::make_unique<ConstantFolding>());
pipeline.add_pass(std::make_unique<PrimitiveSpecialisation>());
pipeline.add_pass(std::make_unique<DeadCodeElimination>());

pipeline.run_all(modules);  // runs all passes, in order, on every module

The pipeline also exposes introspection helpers:

| Method | Description |
|--------|-------------|
| size() | Number of registered passes. |
| empty() | True when no passes are registered. |
| pass_names() | Returns a vector<string> of all pass names (useful for --dump-passes). |

IRVisitor<Derived> — CRTP Tree Walker

Passes typically use the IRVisitor CRTP base class to walk the IR. It provides depth-first traversal of all node children, calling pre_visit before and post_visit after each subtree:

struct MyFolder : IRVisitor<MyFolder> {
    ModuleSemantics& mod;

    // Called after all children of `node` have been visited.
    // Return a replacement node, or `node` itself to keep it.
    core::Node* post_visit(core::Node* node, bool tail_context) {
        // ... inspect and optionally replace node ...
        return node;
    }
};

The visitor handles every Core IR node type — If, Begin, Lambda, Call, Set, DynamicWind, Values, CallWithValues, CallCC, Apply, Raise, Guard, Unify, DerefLogicVar, UnwindTrail, and all leaf nodes. A bool context parameter (typically “tail position”) is threaded through the walk.


Built-in Passes

Constant Folding

Name: constant-folding

Evaluates compile-time-constant arithmetic expressions and replaces them with their result. This is a conservative peephole — only calls to the builtin +, -, *, / primitives where both arguments are Const literal nodes with numeric payloads are folded.

| Source | Folded Result |
|--------|---------------|
| (+ 2 3) | 5 (fixnum) |
| (+ 1.5 2.5) | 4.0 (double) |
| (- 10 3) | 7 |
| (* 4 5) | 20 |
| (/ 10 2) | 5 (exact integer division) |
| (/ 7 2) | 3.5 (inexact → double) |
| (/ x 0) | not folded (division by zero) |
| (+ x 1) | not folded (x is a variable) |

Type promotion rules:

  - fixnum op fixnum yields a fixnum, except division with a remainder, which promotes to double.
  - If either operand is a double, the result is a double.
  - Division by a constant zero is never folded.

Effect on bytecode: A folded expression emits a single LoadConst instruction instead of LoadConst + LoadConst + LoadGlobal + Call.
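The folding and promotion behaviour in the table can be sketched as follows (Num is a stand-in for the real Const payload, not the actual type):

```cpp
#include <cassert>
#include <optional>
#include <variant>

using Num = std::variant<long, double>;  // fixnum or double payload

// Fold (+ a b): fixnum + fixnum stays exact, anything involving
// a double promotes to double.
std::optional<Num> fold_add(Num a, Num b) {
    if (std::holds_alternative<long>(a) && std::holds_alternative<long>(b))
        return Num{std::get<long>(a) + std::get<long>(b)};
    auto as_double = [](Num n) {
        return std::holds_alternative<long>(n)
                   ? static_cast<double>(std::get<long>(n))
                   : std::get<double>(n);
    };
    return Num{as_double(a) + as_double(b)};
}

// Division folds only when safe: never fold a division by zero.
std::optional<Num> fold_div(long a, long b) {
    if (b == 0) return std::nullopt;         // leave (/ x 0) for runtime
    if (a % b == 0) return Num{a / b};       // exact integer division
    return Num{static_cast<double>(a) / b};  // inexact -> double
}
```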

Primitive Specialisation

Name: primitive-specialisation

Lowers eligible builtin calls to dedicated primitive opcodes.

Current lowering targets: the builtin calls behind the Add, Sub, Mul, Div, Eq, Cons, Car, and Cdr opcodes.

Lowering only applies when the callee resolves to a proven immutable builtin global with matching arity. Calls that fail those checks remain generic Call/TailCall.

Effect on bytecode: Replaces LoadGlobal + Call with direct opcode instructions (Add, Sub, Mul, Div, Eq, Cons, Car, Cdr).

Dead Code Elimination

Name: dead-code-elimination

Removes side-effect-free expressions whose results are discarded. The pass targets Begin blocks (sequences):

| Source | After DCE | Notes |
|--------|-----------|-------|
| (begin 42 99) | 99 | |
| (begin (+ 1 2) x) | x | |
| (begin 1 2 3 4 5) | 5 | |
| (begin (set! x 1) x) | (begin (set! x 1) x) | set! has side effects, kept. |
| (begin x) | x | Single-element begin simplified. |

A node is considered pure (side-effect-free) if it is a Const, Var, or Quote. Non-tail expressions in a Begin that are pure are removed. The last expression is always kept — it produces the block’s value.
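The rule can be sketched over a toy Begin body (node kinds mirror the purity rule above; the types are illustrative, not the real Core IR):

```cpp
#include <cassert>
#include <vector>

enum class Kind { Const, Var, Quote, Set, Call };

// Purity rule: only Const, Var, and Quote are side-effect-free.
bool is_pure(Kind k) {
    return k == Kind::Const || k == Kind::Var || k == Kind::Quote;
}

// Drop pure non-tail expressions; the last expression is always kept
// because it produces the block's value.
std::vector<Kind> dce_begin(const std::vector<Kind>& body) {
    std::vector<Kind> out;
    for (std::size_t i = 0; i + 1 < body.size(); ++i)
        if (!is_pure(body[i])) out.push_back(body[i]);  // keep effects only
    if (!body.empty()) out.push_back(body.back());      // value position kept
    return out;
}
```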

Combined Example

When both passes run in sequence, they compose naturally:

;; Source
(begin (+ 1 2) (+ 3 4))

;; After constant folding:
(begin 3 7)

;; After dead code elimination:
7

The final bytecode emits a single LoadConst 7 — no arithmetic instructions, no discarded intermediate values.


Writing a Custom Pass

To add a new optimization pass:

  1. Create a header in eta/core/src/eta/semantics/passes/:
#pragma once

#include "eta/semantics/optimization_pass.h"
#include "eta/semantics/ir_visitor.h"
#include "eta/semantics/core_ir.h"

namespace eta::semantics::passes {

class MyPass : public OptimizationPass {
public:
    std::string_view name() const noexcept override {
        return "my-pass";
    }

    void run(ModuleSemantics& mod) override {
        Walker w{mod};
        for (auto*& node : mod.toplevel_inits) {
            node = w.visit(node, false);
        }
    }

private:
    struct Walker : IRVisitor<Walker> {
        ModuleSemantics& mod;
        explicit Walker(ModuleSemantics& m) : mod(m) {}

        core::Node* post_visit(core::Node* node, bool /*ctx*/) {
            // Inspect node->data (a std::variant of all IR node types).
            // To replace a node, create a new one via mod.emplace<T>(...)
            // and return it. The old node stays in the arena.
            return node;
        }
    };
};

} // namespace eta::semantics::passes
  2. Register the pass in main_etac.cpp (under the -O flag):
if (optimize) {
    auto& pipeline = driver.optimization_pipeline();
    pipeline.add_pass(std::make_unique<ConstantFolding>());
    pipeline.add_pass(std::make_unique<PrimitiveSpecialisation>());
    pipeline.add_pass(std::make_unique<DeadCodeElimination>());
    pipeline.add_pass(std::make_unique<MyPass>());  // ← new
}
  3. Add tests in eta/qa/test/src/optimization_tests.cpp using the OptFixture helper — it compiles source through the full pipeline with your passes applied and lets you inspect both runtime results and emitted bytecode.

Tip

Pass ordering matters. Constant folding can create new dead code (e.g. (begin (+ 1 2) x) → (begin 3 x) → x), so running DCE after folding and primitive specialisation is more effective than the reverse.


Key Source Files

| File | Role |
|------|------|
| optimization_pass.h | OptimizationPass — abstract base class for all passes. |
| optimization_pipeline.h | OptimizationPipeline — ordered pass runner. |
| ir_visitor.h | IRVisitor<Derived> — CRTP depth-first tree walker. |
| core_ir.h | Core IR node types (Node, NodeData variant). |
| constant_folding.h | Constant folding pass. |
| primitive_specialisation.h | Primitive-call specialisation pass. |
| dead_code_elimination.h | Dead code elimination pass. |
| optimization_tests.cpp | Unit tests for the pipeline, visitor, and optimisation passes. |
| main_etac.cpp | etac entry point — pass registration under -O. |