Neural Networks with libtorch


Overview

Eta integrates with libtorch — the C++ backend of PyTorch — to provide GPU-accelerated tensor operations, automatic differentiation, and a neural-network API directly from Eta code. The integration is implemented as native C++ bindings exposed through the std.torch module, so there is no FFI overhead or marshalling boundary beyond a thin wrapper.

etai cookbook/ml/torch.eta

Note

libtorch is a required dependency. If it is not found by find_package(Torch), the build system automatically downloads the official pre-built CPU archive (~200 MB). Pass -DETA_TORCH_BACKEND=cu126 to select a CUDA variant (~2 GB).


What is a Neural Network?

A neural network is a function approximator composed of layers of simple operations. Each layer applies a linear transformation followed by a non-linear activation function. By adjusting the parameters (weights and biases) through gradient descent, the network learns to map inputs to desired outputs.

flowchart LR
    X["Input\nx"] --> L1["Linear Layer\nw·x + b"]
    L1 --> A1["Activation\n(ReLU / σ)"]
    A1 --> L2["Linear Layer\nw·x + b"]
    L2 --> Y["Output\nŷ"]

| Concept | Explanation |
|---|---|
| Tensor | An n-dimensional array of numbers — the fundamental data structure for inputs, weights, and outputs. |
| Linear Layer | Computes y = W·x + b, where W (weights) and b (bias) are learnable parameters. |
| Activation | A non-linear function (e.g. ReLU, sigmoid) applied element-wise, giving the network the power to learn non-linear mappings. |
| Loss Function | Measures how far the network’s prediction is from the target (e.g. MSE, cross-entropy). |
| Backpropagation | Computes ∂loss/∂parameter for every parameter in one backward pass through the computation graph. |
| Optimizer | Updates parameters using the computed gradients (e.g. SGD, Adam). |
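
These pieces compose directly. A single layer's computation can be written by hand with the tensor primitives documented later in this section (a minimal sketch; w, b, and x are illustrative names and the shapes are chosen for the example):

;; One layer by hand: y = ReLU(W·x + b)
(define w (randn '(2 4)))            ;; weights: 2 outputs × 4 inputs
(define b (zeros '(2 1)))            ;; bias
(define x (randn '(4 1)))            ;; input column vector
(define y (relu (t+ (matmul w x) b)))
(println (shape y))                  ;; (2 1)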

Training Loop

Training a neural network repeats a fixed cycle until the loss converges:

flowchart TD
    ZG["1. Zero Gradients\n(optim-zero-grad!)"]
    FW["2. Forward Pass\n(forward model input)"]
    LS["3. Compute Loss\n(mse-loss pred target)"]
    BW["4. Backward Pass\n(backward loss)"]
    UP["5. Update Weights\n(step! optimizer)"]
    ZG --> FW --> LS --> BW --> UP
    UP -->|"next epoch"| ZG

Eta’s train-step! helper wraps all five steps into a single call:

(define loss (train-step! model optimizer loss-fn input target))
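
Expanded by hand, the helper corresponds to the five primitives below (a sketch; loss-fn stands for any loss function such as mse-loss, and the helper also returns the resulting loss, here bound explicitly):

(optim-zero-grad! optimizer)              ;; 1. zero accumulated gradients
(define pred (forward model input))       ;; 2. forward pass
(define loss (loss-fn pred target))       ;; 3. compute loss (e.g. mse-loss)
(backward loss)                           ;; 4. backward pass through the graph
(step! optimizer)                         ;; 5. apply the parameter update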

Architecture — How Eta Talks to libtorch

flowchart TD
    ETA["Eta Program\n(import std.torch)"]
    STD["std.torch Module\n(Eta wrappers)"]
    PRIM["C++ Primitives\n(torch_primitives.h)"]
    LT["libtorch\n(ATen / Autograd / NN)"]
    CPU["CPU"]
    GPU["CUDA / ROCm\n(optional)"]
    ETA --> STD --> PRIM --> LT
    LT --> CPU
    LT --> GPU

Three new heap object kinds manage native libtorch resources through the garbage collector:

| Heap Kind | Wraps | GC Behaviour |
|---|---|---|
| TensorPtr | torch::Tensor | Reference-counted via libtorch’s intrusive_ptr; destructor releases storage. |
| NNModulePtr | torch::nn::Module | Shared pointer to the module; destructor releases parameters. |
| OptimizerPtr | torch::optim::Optimizer | Shared pointer to the optimizer state. |

Quick Start — Tensor Basics

(module tensor-demo
  (import std.core)
  (import std.io)
  (import std.torch)
  (begin
    ;; Create tensors
    (define a (ones '(3)))           ;; [1, 1, 1]
    (define b (from-list '(2.0 3.0 4.0)))

    ;; Arithmetic
    (tensor-print (t+ a b))         ;; [3, 4, 5]
    (tensor-print (t* a b))         ;; [2, 3, 4]

    ;; Reductions
    (println (item (tsum b)))       ;; 9.0
    (println (item (mean b)))       ;; 3.0

    ;; Matrix multiply
    (define m1 (ones '(2 3)))
    (define m2 (ones '(3 4)))
    (println (shape (matmul m1 m2)))  ;; (2 4)
  ))

Autograd — Automatic Differentiation

libtorch’s autograd engine records operations on tensors that have requires-grad enabled, building a computation graph that is traversed in reverse by backward to compute gradients.

;; y = x² at x = 3  →  dy/dx = 2x = 6
(define x (from-list '(3.0)))
(requires-grad! x #t)
(define y (t* x x))
(backward y)
(println (item (grad x)))          ;; 6.0
;; Multivariate: f(x,y) = x·y at (2, 3)
;;   → ∂f/∂x = 3,  ∂f/∂y = 2
(define xa (from-list '(2.0)))
(define ya (from-list '(3.0)))
(requires-grad! xa #t)
(requires-grad! ya #t)
(backward (t* xa ya))
(println (item (grad xa)))         ;; 3.0
(println (item (grad ya)))         ;; 2.0

Building a Neural Network

Single Linear Layer

(define layer (linear 4 2))       ;; 4 inputs → 2 outputs
(println (module? layer))          ;; #t
(println (length (parameters layer)))  ;; 2  (weight + bias)

(define out (forward layer (randn '(1 4))))
(println (shape out))              ;; (1 2)

Multi-Layer Network

Stack layers with sequential:

(define net (sequential
              (linear 4 8)
              (relu-layer)
              (linear 8 2)))

(define out (forward net (randn '(1 4))))
(println (shape out))              ;; (1 2)

This builds the following network:

flowchart LR
    IN["Input\n(1 × 4)"] --> L1["Linear\n4 → 8"]
    L1 --> R["ReLU"]
    R --> L2["Linear\n8 → 2"]
    L2 --> OUT["Output\n(1 × 2)"]

Complete Training Example — Learning y = 2x + 1

This example trains a single linear layer to approximate the function y = 2x + 1 using gradient descent with MSE loss.

(module train-demo
  (import std.core)
  (import std.io)
  (import std.torch)
  (begin
    ;; ── Training data ────────────────────────────────────────────
    (define train-x (reshape (from-list '(1.0 2.0 3.0 4.0 5.0)) '(5 1)))
    (define train-y (reshape (from-list '(3.0 5.0 7.0 9.0 11.0)) '(5 1)))

    ;; ── Model: y ≈ w·x + b ──────────────────────────────────────
    (define model (linear 1 1))
    (define opt   (sgd model 0.01))

    ;; ── Training loop ────────────────────────────────────────────
    (display "epoch |   loss\n")
    (display "------+-----------\n")
    (letrec ((loop (lambda (epoch)
               (if (> epoch 200) 'done
                   (let ((loss (train-step! model opt mse-loss
                                            train-x train-y)))
                     (if (= (modulo epoch 50) 0)
                         (begin (display epoch)
                                (display "   |  ")
                                (println loss)))
                     (loop (+ epoch 1)))))))
      (loop 1))

    ;; ── Test predictions ─────────────────────────────────────────
    (define test-x (reshape (from-list '(6.0 7.0 10.0)) '(3 1)))
    (println (to-list (reshape (forward model test-x) '(3))))
    ;; Expected: values close to (13.0 15.0 21.0)
  ))

What happens during training:

sequenceDiagram
    participant TL as Training Loop
    participant Optim as Optimizer (SGD)
    participant Model as Linear(1,1)
    participant AG as Autograd
    TL->>Optim: zero gradients
    TL->>Model: forward(train-x) → pred
    TL->>TL: mse-loss(pred, train-y) → loss
    TL->>AG: backward(loss)
    AG-->>Model: ∂loss/∂w, ∂loss/∂b
    TL->>Optim: step! (w -= lr·∂loss/∂w)
    Note over TL: Repeat for each epoch

Optimizers

Eta exposes two optimizers from libtorch:

| Optimizer | Constructor | Description |
|---|---|---|
| SGD | (sgd model lr) | Stochastic gradient descent — simple, robust, well-understood. |
| Adam | (adam model lr) | Adaptive moment estimation — maintains per-parameter learning rates; often converges faster. |

;; SGD with learning rate 0.01
(define opt (sgd model 0.01))

;; Adam with learning rate 0.1
(define opt (adam model 0.1))

Both optimizers support the same interface:

(optim-zero-grad! opt)   ;; clear accumulated gradients
(step! opt)              ;; apply one parameter update
(optimizer? opt)         ;; → #t

Loss Functions

| Function | Eta Name | Formula |
|---|---|---|
| Mean Squared Error | mse-loss | (1/n) Σ (pred − target)² |
| L1 (Mean Absolute Error) | l1-loss | (1/n) Σ \|pred − target\| |
| Cross-Entropy | cross-entropy-loss | −Σ target · log(pred) |

(define pred   (from-list '(1.0 2.0 3.0)))
(define target (from-list '(2.0 2.0 2.0)))
(println (item (mse-loss pred target)))   ;; 0.666…
(println (item (l1-loss pred target)))    ;; 0.666…
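
The MSE figure can be reproduced from the arithmetic and reduction primitives, which makes the formula in the table concrete (a sketch reusing pred and target from above):

;; MSE by hand: (1/n) Σ (pred − target)²
(define d (t- pred target))               ;; (-1, 0, 1)
(println (item (mean (t* d d))))          ;; 0.666…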

Device Management (GPU)

Tensors and modules can be moved between CPU and GPU:

(println (gpu-available?))       ;; #t if CUDA is present
(println (gpu-count))            ;; number of GPUs

;; Move a tensor to GPU and back
(define t (ones '(3 3)))
(define t-gpu (to-gpu t))
(println (device t-gpu))         ;; "cuda:0"
(define t-cpu (to-cpu t-gpu))

;; Move an entire model to GPU
(nn-to-gpu model)
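
Since the GPU is optional, portable code can branch on gpu-available? (a sketch; the fallback simply stays on the CPU):

;; Run on GPU when present, otherwise stay on CPU
(define input (randn '(32 4)))
(define x (if (gpu-available?)
              (to-gpu input)
              input))
(if (gpu-available?) (nn-to-gpu model))   ;; move parameters to match input
(define pred (forward model x))
(println (device pred))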

std.torch API Reference

Tensor Creation

| Function | Signature | Description |
|---|---|---|
| tensor | (val [requires-grad?]) | Scalar or list → tensor |
| ones | (shape) | Tensor filled with 1.0 |
| zeros | (shape) | Tensor filled with 0.0 |
| randn | (shape) | Tensor from normal distribution |
| manual-seed | (seed) | Set torch RNG seed for reproducible sampling |
| arange | (start end step) | Range tensor |
| linspace | (start end n) | Evenly spaced points |
| fact-table->tensor | (ft col-idxs) | Convert live fact-table rows and selected columns to a float64 tensor |
| from-list | (list) | Flat list → 1-D tensor |
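
A few of these creation helpers in action (a sketch assuming the signatures above; the values randn produces depend on the seed):

(manual-seed 42)                      ;; reproducible sampling
(define r (randn '(2 2)))             ;; same values on every run
(define steps (arange 0 10 2))        ;; 0, 2, 4, 6, 8
(define grid  (linspace 0.0 1.0 5))   ;; 0.0, 0.25, 0.5, 0.75, 1.0
(println (to-list steps))
(println (to-list grid))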

Arithmetic

| Function | Description |
|---|---|
| t+, t-, t*, t/ | Element-wise arithmetic |
| matmul | Matrix multiplication |
| dot | Dot product (flattened) |
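
dot flattens both arguments before multiplying, so it yields a scalar tensor where matmul would require matching 2-D shapes (a sketch):

(define a (from-list '(1.0 2.0 3.0)))
(define b (from-list '(4.0 5.0 6.0)))
(println (item (dot a b)))            ;; 32.0  (1·4 + 2·5 + 3·6)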

Linear Algebra and Sampling

| Function | Signature | Description |
|---|---|---|
| cholesky | (cov) | Cholesky factor of a square SPD covariance matrix |
| matrix-exp | (t) | Matrix exponential of a square matrix tensor |
| mvnormal | (mean cov) | Draw one sample from N(mean, cov) |
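
A sketch of drawing a correlated sample (the covariance values are illustrative, and the sample is assumed to share the mean's shape; cholesky is useful on its own when you need the factor explicitly):

;; 2-D Gaussian with unit variances and 0.5 correlation
(define mu  (zeros '(2)))
(define cov (reshape (from-list '(1.0 0.5 0.5 1.0)) '(2 2)))
(manual-seed 7)
(define sample (mvnormal mu cov))     ;; one draw from N(mu, cov)
(println (shape sample))              ;; (2)
(println (shape (cholesky cov)))      ;; (2 2)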

Unary Operations

| Function | Description |
|---|---|
| neg, tabs | Negation, absolute value |
| texp, tlog, tsqrt | exp, log, √ |
| relu, sigmoid, ttanh | Activation functions |
| softmax | (t dim) — softmax along dimension |
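
softmax normalises along the given dimension so the result sums to 1, which is easy to check with tsum (a sketch, assuming dimensions are 0-indexed as in libtorch):

(define logits (from-list '(1.0 2.0 3.0)))
(define probs  (softmax logits 0))
(println (item (tsum probs)))         ;; 1.0
(println (item (argmax probs)))       ;; 2  (largest logit wins)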

Shape Operations

| Function | Signature | Description |
|---|---|---|
| shape | (t) | Shape as a list of fixnums |
| reshape | (t shape) | Reshape tensor |
| transpose | (t d0 d1) | Swap two dimensions |
| squeeze / unsqueeze | (t [dim]) | Remove / add size-1 dimension |
| cat | (tensor-list dim) | Concatenate along dimension |
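
A sketch of how these compose (assuming std.core provides list for building the tensor list that cat expects):

(define m (reshape (arange 0 6 1) '(2 3)))
(println (shape (transpose m 0 1)))   ;; (3 2)
(println (shape (unsqueeze m 0)))     ;; (1 2 3)
(println (shape (cat (list m m) 0)))  ;; (4 3)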

Reductions

| Function | Description |
|---|---|
| column-l2-norm / torch:column-l2-norm | (t axis) — L2 norm along the selected axis |
| tsum, mean | Sum, mean of all elements |
| tmax, tmin | Maximum, minimum |
| argmax, argmin | Index of max / min |
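
column-l2-norm reduces along one axis only; the other reductions collapse the whole tensor (a sketch, assuming axis 0 norms each column of a 2-D tensor):

(define m (reshape (from-list '(3.0 0.0 4.0 0.0 0.0 0.0)) '(2 3)))
(println (to-list (column-l2-norm m 0)))  ;; (3.0 0.0 4.0)
(println (item (tmax m)))                 ;; 4.0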

Conversion

| Function | Description |
|---|---|
| item | Scalar tensor → Eta number |
| to-list | Flat tensor → Eta list |
| numel | Number of elements |

Autograd

| Function | Description |
|---|---|
| requires-grad! | (t flag) — enable/disable gradient tracking |
| requires-grad? | (t) — query gradient tracking |
| detach | (t) — detach from computation graph |
| backward | (t) — run backpropagation |
| grad | (t) — read accumulated gradient |
| zero-grad! | (t) — reset gradient to zero |
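
Gradients accumulate across backward calls until zero-grad! resets them, which is why the training loop zeroes gradients first (a sketch; tensor's optional second argument enables tracking at creation):

(define x (tensor '(2.0) #t))         ;; requires-grad from the start
(backward (t* x x))                   ;; dy/dx = 2x = 4
(println (item (grad x)))             ;; 4.0
(backward (t* x x))                   ;; second pass accumulates
(println (item (grad x)))             ;; 8.0
(zero-grad! x)
(println (item (grad x)))             ;; 0.0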

Neural Network Layers

| Function | Description |
|---|---|
| linear | (in out) — fully connected layer |
| sequential | (layer ...) — stack layers |
| relu-layer | () — ReLU activation module |
| sigmoid-layer | () — Sigmoid activation module |
| dropout | (rate) — dropout regularisation |
| forward | (module tensor) — forward pass |
| parameters | (module) — list of parameter tensors |
| train! / eval! | (module) — set training / evaluation mode |
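
dropout only randomises activations in training mode, so flipping train!/eval! around a forward pass matters for reproducible inference (a sketch):

(define net (sequential
              (linear 4 8)
              (relu-layer)
              (dropout 0.5)
              (linear 8 2)))
(train! net)                          ;; dropout active: random masking
(define noisy (forward net (randn '(1 4))))
(eval! net)                           ;; dropout disabled: deterministic
(define clean (forward net (randn '(1 4))))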

Serialization

| Function | Description |
|---|---|
| tensor-save | (t path) — save tensor to file |
| tensor-load | (path) — load tensor from file |
| tensor-print | (t) — display tensor to stdout |
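
A round trip through the file system (a sketch; the path is illustrative):

(define t (randn '(2 2)))
(tensor-save t "t.bin")               ;; illustrative path
(define t2 (tensor-load "t.bin"))
(println (to-list (reshape t2 '(4)))) ;; same values as t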

Building with Torch Support

CMake Options

cmake -B build                                       # CPU-only (auto-downloads libtorch)
cmake -B build -DETA_TORCH_BACKEND=cu126             # CUDA 12.6

| Option | Default | Description |
|---|---|---|
| ETA_TORCH_BACKEND | cpu | libtorch variant: cpu, cu126, cu128, or cu130 |
| ETA_LIBTORCH_VER | 2.11.0 | libtorch version to download |

libtorch is downloaded automatically if not found by find_package(Torch). The build system fetches the official pre-built archive from download.pytorch.org. The download is cached in the build directory so re-configures do not re-download.

Manual libtorch

If you prefer to manage libtorch yourself:

cmake -B build -DTorch_DIR=/path/to/libtorch/share/cmake/Torch

Test Suite

The cookbook/tests/torch/ directory contains a comprehensive integration test suite that exercises every std.torch primitive through the full Eta pipeline (lex → parse → expand → link → emit → execute). Each file is a standalone module that uses the std.test framework and prints a pass/fail summary.

| File | Coverage |
|---|---|
| tensor_creation.eta | tensor, ones, zeros, randn, arange, linspace, from-list, tensor? |
| arithmetic.eta | t+, t-, t*, t/, matmul, dot, neg, tabs, texp, tlog, tsqrt, relu, sigmoid, ttanh, softmax |
| shape_ops.eta | shape, reshape, transpose, squeeze, unsqueeze, cat, numel |
| reductions.eta | tsum, mean, tmax, tmin, argmax, argmin |
| autograd.eta | requires-grad!, requires-grad?, backward, grad, zero-grad!, detach |
| nn_layers.eta | linear, sequential, relu-layer, sigmoid-layer, dropout, forward, parameters, module?, train!, eval! |
| loss_functions.eta | mse-loss, l1-loss |
| optimizers.eta | sgd, adam, step!, optim-zero-grad!, optimizer? |
| device_info.eta | gpu-available?, gpu-count, device, to-device, to-cpu |
| training.eta | train-step!, SGD/Adam convergence, sequential network training |

The stdlib test suite also includes stdlib/tests/torch.test.eta, which validates cholesky and mvnormal via the Eta test runner.

Running the tests

Run all torch tests at once:

# All tests (Linux / macOS)
for f in cookbook/tests/torch/*.eta; do echo "── $f"; etai "$f"; done

# All tests (Windows PowerShell)
Get-ChildItem cookbook\tests\torch\*.eta | ForEach-Object { Write-Host "── $_"; etai $_ }

Run a single test file:

etai cookbook/tests/torch/tensor_creation.eta

The C++ unit tests in eta/qa/test/src/torch_tests.cpp mirror these Eta tests at the primitive level, verifying heap lifecycle, GC behavior, error paths, and end-to-end training convergence directly against the C++ API.

# Run C++ torch tests
ctest --test-dir build -R torch

Implementation Notes

The integration is structured in four parts:

  1. eta/builtins/torch/ — C++ header-only library (torch_primitives.h) that registers ~60 builtins with the VM’s BuiltinEnvironment.

  2. stdlib/std/torch.eta — Eta wrapper module that re-exports the builtins under clean names (t+, matmul, forward, etc.) and provides the train-step! helper.

  3. cookbook/ml/torch.eta — End-to-end example covering tensor creation, autograd, network construction, and training with both SGD and Adam.

  4. cookbook/tests/torch/ — 10-file integration test suite covering every exported function in std.torch.

The C++ primitives handle type marshalling between NaN-boxed LispVal values and libtorch’s torch::Tensor / torch::nn::Module / torch::optim::Optimizer types through three dedicated heap object kinds (TensorPtr, NNModulePtr, OptimizerPtr), each with a destructor that runs when the GC collects the wrapper.