How We Model a Tube Amp — The Physics
We don't hide the math. We use it as art.
THE BACKBONE: WAVENET
The core architecture is a WaveNet-style network — a stack of dilated causal convolutions. “Causal” means the output at time T depends only on inputs at time T and earlier. No future information. This is not a design choice — it is a requirement. A guitar amplifier is a causal system. Our model must be too.
“Dilated” means each successive layer looks further back in time. Layer 1 has a dilation factor of 1 (adjacent samples). Layer 2 has a dilation factor of 2 (every other sample). Layer N has a dilation factor of 2^(N−1). This exponential growth means a modest number of layers creates a very large receptive field — 6,139 samples, or 0.128 seconds at 48 kHz.
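The receptive-field arithmetic is easy to check. A minimal sketch — the kernel size and the stack layout (3 stacks of 10 layers) are assumptions chosen here to reproduce the 6,139-sample figure, not a published spec:

```python
# Receptive field of a stack of dilated causal convolutions.
# Kernel size 3 and 3 stacks of 10 layers are ASSUMED hyperparameters
# that happen to yield the 6,139-sample field quoted above.
KERNEL_SIZE = 3
STACKS = 3
LAYERS_PER_STACK = 10
SAMPLE_RATE = 48_000

# Dilations double each layer: 1, 2, 4, ..., 512, then repeat per stack.
dilations = [2 ** n for n in range(LAYERS_PER_STACK)] * STACKS

# Each layer extends the field by (kernel - 1) * dilation samples.
receptive_field = 1 + (KERNEL_SIZE - 1) * sum(dilations)

print(receptive_field)                # 6139 samples
print(receptive_field / SAMPLE_RATE)  # ~0.128 seconds
```

Doubling the dilation each layer is what makes the field grow exponentially while the layer count stays modest.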
Why 0.128 seconds? Because tube amplifier circuits have memory. Push a power tube into saturation and the power supply sags — the capacitors drain, the voltage drops, and the amp compresses. Release, and the supply recovers. This “sag” is a defining characteristic of tube tone, and it happens over tens to hundreds of milliseconds. Our receptive field captures it.
THE CONDITIONING: FILM
Feature-wise Linear Modulation (FiLM) is how the knobs work.
At each layer of the network, the knob positions (Volume, Treble, Bass, Mid-Bite) are transformed into two vectors: γ (scale) and β (shift). The layer's activations are then modulated:
FiLM(x) = γ · x + β

where:
    x = layer activation
    γ = f(knob_positions) — learned scale
    β = g(knob_positions) — learned shift
This is elegant. The knob positions don't just select a preset — they continuously modulate the signal processing at every stage of the network. Turn the treble knob and every layer responds, exactly as the resistor-capacitor tone stack in the real amplifier would reshape the signal at every stage of the circuit.
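A minimal numpy sketch of one FiLM layer. The single linear maps standing in for f and g are an assumption (the real conditioning networks may be deeper); the shapes and names are illustrative only:

```python
import numpy as np

def film(x, knobs, W_gamma, b_gamma, W_beta, b_beta):
    """Feature-wise Linear Modulation: scale and shift each channel.

    x:     layer activations, shape (channels, time)
    knobs: knob positions in [0, 1], shape (n_knobs,)
    f and g are sketched as single affine maps — an assumption.
    """
    gamma = W_gamma @ knobs + b_gamma  # γ = f(knob_positions), shape (channels,)
    beta = W_beta @ knobs + b_beta     # β = g(knob_positions), shape (channels,)
    # Broadcast the per-channel scale and shift across time.
    return gamma[:, None] * x + beta[:, None]

# Toy usage: 4 knobs (Volume, Treble, Bass, Mid-Bite), 8 channels.
rng = np.random.default_rng(0)
knobs = np.array([0.5, 0.7, 0.3, 0.9])
x = rng.standard_normal((8, 128))
y = film(x, knobs,
         rng.standard_normal((8, 4)), np.ones(8),
         rng.standard_normal((8, 4)), np.zeros(8))
```

Because γ and β are recomputed from the knob vector at every layer, moving one knob changes the processing everywhere at once.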
THE ACTIVATION: SNAKE
Standard neural network activations (ReLU, tanh, sigmoid) are poor models of vacuum tube nonlinearity. Tubes produce a specific kind of distortion: smooth, asymmetric soft-clipping that generates predominantly even-order harmonics. This is why tubes “sound warm.”
We use the Snake activation function:
Snake(x) = x + sin²(αx) / α

where α is a learned parameter per channel.
Snake produces periodic, smooth nonlinearities that naturally model the saturation behavior of a vacuum tube's transfer curve. The parameter α controls the frequency of the nonlinearity — learned per channel, allowing different layers to model different aspects of the tube's behavior.
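The formula above translates directly into code. A minimal sketch with a per-channel α, using numpy (the shapes are illustrative assumptions):

```python
import numpy as np

def snake(x, alpha):
    """Snake activation: x + sin²(αx) / α, applied per channel.

    x:     activations, shape (channels, time)
    alpha: learned frequency parameter, shape (channels,)
    """
    a = alpha[:, None]  # broadcast each channel's α across time
    return x + np.sin(a * x) ** 2 / a

# One channel, small α: the curve is near-linear at low amplitude and
# develops smooth periodic ripple as the input swings larger.
x = np.linspace(-2.0, 2.0, 5)[None, :]
print(snake(x, np.array([0.5])))
```

Note that Snake passes through the origin (sin²(0) = 0) and keeps the identity term x, so small signals pass almost untouched — the nonlinearity only bites as the signal level rises, much like a tube's transfer curve.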
THE NUMBERS
1,440 multiply-accumulate operations per sample. At 48 kHz, that is 69 million operations per second. A modern CPU core handles billions. The model runs comfortably in real-time alongside your DAW, other plugins, and virtual instruments.
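The per-second figure is simple arithmetic:

```python
# Operations-per-second budget from the numbers quoted above.
MACS_PER_SAMPLE = 1_440
SAMPLE_RATE = 48_000

macs_per_second = MACS_PER_SAMPLE * SAMPLE_RATE
print(f"{macs_per_second:,}")  # 69,120,000 — about 69 million per second
```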
393 KB. The entire behavior of a legendary tube amplifier. Not a photograph. A living, breathing twin.