How We Model a Tube Amp — The Physics
We don't hide the math. We use it as art.
THE BACKBONE: WAVENET
The core architecture is a WaveNet-style network — a stack of dilated causal convolutions. “Causal” means the output at time T depends only on inputs at time T and earlier. No future information. This is not a design choice — it is a requirement. A guitar amplifier is a causal system. Our model must be too.
“Dilated” means each successive layer looks further back in time. Layer 1 has a dilation factor of 1 (adjacent samples). Layer 2 has a dilation factor of 2 (every other sample). Layer N has a dilation factor of 2^(N−1). This exponential growth means a modest number of layers creates a very large receptive field — 6,139 samples, or 0.128 seconds at 48 kHz.
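The receptive-field arithmetic is easy to check. A minimal sketch — the kernel size and the stack layout (3 stacks of 10 layers) are assumptions chosen here to reproduce the 6,139-sample figure, not a published spec:

```python
# Receptive field of a stack of dilated causal convolutions.
# Kernel size 3 and 3 stacks of 10 layers are ASSUMED hyperparameters
# that happen to yield the 6,139-sample field quoted above.
KERNEL_SIZE = 3
STACKS = 3
LAYERS_PER_STACK = 10
SAMPLE_RATE = 48_000

# Dilations double each layer: 1, 2, 4, ..., 512, then repeat per stack.
dilations = [2 ** n for n in range(LAYERS_PER_STACK)] * STACKS

# Each layer extends the field by (kernel - 1) * dilation samples.
receptive_field = 1 + (KERNEL_SIZE - 1) * sum(dilations)

print(receptive_field)                # 6139 samples
print(receptive_field / SAMPLE_RATE)  # ~0.128 seconds
```

Doubling the dilation each layer is what makes the field grow exponentially while the layer count stays modest.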
Why 0.128 seconds? Because tube amplifier circuits have memory. Push a power tube into saturation and the power supply sags — the capacitors drain, the voltage drops, and the amp compresses. Release, and the supply recovers. This “sag” is a defining characteristic of tube tone, and it happens over tens to hundreds of milliseconds. Our receptive field captures it.
THE CONDITIONING: FILM
Feature-wise Linear Modulation (FiLM) is how the knobs work.
At each layer of the network, the knob positions (Volume, Treble, Bass, Mid-Bite) are transformed into two vectors: γ (scale) and β (shift). The layer's activations are then modulated:
FiLM(x) = γ · x + β

where:
    x = layer activation
    γ = f(knob_positions) — learned scale
    β = g(knob_positions) — learned shift
This is elegant. The knob positions don't just select a preset — they continuously modulate the signal processing at every stage of the network. Turn the treble knob and every layer responds, exactly as the resistor-capacitor tone stack in the real amplifier would reshape the signal at every stage of the circuit.
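A minimal numpy sketch of one FiLM layer. The single linear maps standing in for f and g are an assumption (the real conditioning networks may be deeper); the shapes and names are illustrative only:

```python
import numpy as np

def film(x, knobs, W_gamma, b_gamma, W_beta, b_beta):
    """Feature-wise Linear Modulation: scale and shift each channel.

    x:     layer activations, shape (channels, time)
    knobs: knob positions in [0, 1], shape (n_knobs,)
    f and g are sketched as single affine maps — an assumption.
    """
    gamma = W_gamma @ knobs + b_gamma  # γ = f(knob_positions), shape (channels,)
    beta = W_beta @ knobs + b_beta     # β = g(knob_positions), shape (channels,)
    # Broadcast the per-channel scale and shift across time.
    return gamma[:, None] * x + beta[:, None]

# Toy usage: 4 knobs (Volume, Treble, Bass, Mid-Bite), 8 channels.
rng = np.random.default_rng(0)
knobs = np.array([0.5, 0.7, 0.3, 0.9])
x = rng.standard_normal((8, 128))
y = film(x, knobs,
         rng.standard_normal((8, 4)), np.ones(8),
         rng.standard_normal((8, 4)), np.zeros(8))
```

Because γ and β are recomputed from the knob vector at every layer, moving one knob changes the processing everywhere at once.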
THE ACTIVATION: SNAKE
Standard neural network activations (ReLU, tanh, sigmoid) are poor models of vacuum tube nonlinearity. Tubes produce a specific kind of distortion: smooth, asymmetric soft-clipping that generates predominantly even-order harmonics. This is why tubes “sound warm.”
We use the Snake activation function:
Snake(x) = x + sin²(αx) / α

where α is a learned parameter per channel.
Snake produces periodic, smooth nonlinearities that naturally model the saturation behavior of a vacuum tube's transfer curve. The parameter α controls the frequency of the nonlinearity — learned per channel, allowing different layers to model different aspects of the tube's behavior.
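The formula above translates directly into code. A minimal sketch with a per-channel α, using numpy (the shapes are illustrative assumptions):

```python
import numpy as np

def snake(x, alpha):
    """Snake activation: x + sin²(αx) / α, applied per channel.

    x:     activations, shape (channels, time)
    alpha: learned frequency parameter, shape (channels,)
    """
    a = alpha[:, None]  # broadcast each channel's α across time
    return x + np.sin(a * x) ** 2 / a

# One channel, small α: the curve is near-linear at low amplitude and
# develops smooth periodic ripple as the input swings larger.
x = np.linspace(-2.0, 2.0, 5)[None, :]
print(snake(x, np.array([0.5])))
```

Note that Snake passes through the origin (sin²(0) = 0) and keeps the identity term x, so small signals pass almost untouched — the nonlinearity only bites as the signal level rises, much like a tube's transfer curve.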
THE NUMBERS
1,440 multiply-accumulate operations per sample. At 48 kHz, that is 69 million operations per second. A modern CPU core handles billions. The model runs comfortably in real-time alongside your DAW, other plugins, and virtual instruments.
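The per-second figure is simple arithmetic:

```python
# Operations-per-second budget from the numbers quoted above.
MACS_PER_SAMPLE = 1_440
SAMPLE_RATE = 48_000

macs_per_second = MACS_PER_SAMPLE * SAMPLE_RATE
print(f"{macs_per_second:,}")  # 69,120,000 — about 69 million per second
```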
393 KB. The entire behavior of a legendary tube amplifier. Not a photograph. A living, breathing twin.