GPT-5.3 Instant / Sonnet 4.6 Ext

Reverb Algorithm

Technical Reference Document

Integrated Overview of Theory, Implementation, Perception, and Modern Approaches

Freeverb3 / FDN / Dattorro / Lexicon / Convolution Reverb / Neural FDN

Chapter 1: The Nature of Reverb — An Overview

1.1 What Is Reverb?

When listening to music in a large concert hall, the sound continues to resonate for some time after the instrument stops playing. Even during the performance, the sound resonates in a way that adds richness and luster to the music — this is why performing in a hall is preferred. Suntory Hall in Akasaka, Tokyo, is one of the most celebrated examples of a hall prized for its beautiful acoustics.

Reverb is not merely a "residual sound" — it is the spatial information itself, determining the depth, sense of distance, and texture of the sound. Acoustically, two temporal domains coexist simultaneously:

Early Reflections: Allow the listener to perceive the size of the space and the positions of the walls (approximately a few ms to 100 ms)
Late Reverb: Generates the sense of envelopment and richness of sound (the reverberant tail that follows)

1.2 The Two Major Methods of Reverb Generation

Using a real concert hall is impractical due to time and cost constraints, and in CD mastering and similar applications, software is used to artificially recreate the acoustic properties of a hall. Freeverb3 is a well-known example. There are two primary methods for generating such reverberation:

Method	Overview / Characteristics
FIR (Convolution Reverb)	Records and uses the actual impulse response of a real space. Highly realistic but less flexible and computationally expensive. "Technology for copying a real hall."
IIR (Algorithmic Reverb)	Artificially generates reverb using mathematical models. Lightweight, adjustable, and creatively flexible, but complex to design. "Technology for creating the feeling of a hall." Used in nearly all commercial hardware units such as Lexicon.

1.3 The Nature of Depth Perception in Sound

The front-to-back perception (depth) of a sound is primarily determined by three factors:

Direct sound / reverberant sound ratio: Higher ratio = closer; lower ratio = farther
High-frequency content: More = closer; less = farther (perceived air absorption)
Timing of early reflections: Short = nearby walls; long = large space

In other words, depth perception is reverb design itself.

1.4 Human Perceptual Characteristics and Reverb Design

Humans do not perceive individual reflections; rather, they perceive patterns of energy distribution and temporal change. What is needed in a mix is not precise reflection placement but density, decay, and frequency balance.

FIR (IR) reverb can be difficult to use in certain contexts for the following reasons:

Fixed to a specific recording position, microphone placement, and directional pattern — sounds unnatural when the source changes
The sense of distance (direct/reflected ratio) is baked into the recording, making it difficult to adjust
The "sparse buildup" of a real hall can sound thin or hollow in a mix
May contain unwanted information such as floor resonance, HVAC noise, or uneven room modes

By contrast, algorithmic reverb functions as a "perceptually optimized fiction" and proves advantageous in contexts where musical pleasantness is the priority — pop music, film scores, and game audio.

Chapter 2: The Lineage of Algorithms

2.1 Schroeder Reverb (Classic / Foundational Form)

The oldest and most widely implemented form — including in free software — is the reverb based on a combination of comb filters and allpass filters. A detailed explanation is provided on the ARI-WEB website. The precursor to Freeverb3, known as Freeverb, is based on this algorithm, which is referred to as "Schroeder Reverb."

Structure

Comb filter: Creates reverb density (multiple in parallel)
Allpass filter: Diffuses the sound (connected in series)

Characteristics

Extremely simple to implement
Prone to a metallic sound quality
Susceptible to ringing (resonance emphasis) at specific frequencies

The fundamental issue is modal unevenness — i.e., insufficient randomness. The comb filter produces equally spaced frequency peaks (analogous to a standing-wave structure), which is the root cause of the characteristic metallic "ping" sound.

As a result, it is rarely used in applications demanding high quality. However, its metallic character is sometimes exploited as a substitute for a plate reverb (a real unit that inputs sound into a metal plate and outputs the vibration). Freeverb generates its filters using empirically derived values and is considered to have relatively less metallic character, though its limitations remain.
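The comb and allpass building blocks described above can be sketched as follows. This is a minimal illustration, not Freeverb's code: the struct names, delay lengths, and gains are assumptions chosen only to show the topology (parallel combs feeding series allpasses).

```cpp
#include <cstddef>
#include <vector>

// Feedback comb filter: y[n] = x[n - D] + g * y[n - D].
// Its impulse response repeats every D samples, which is what produces
// the equally spaced spectral peaks discussed above.
struct CombFilter {
  std::vector<float> buf;
  std::size_t pos = 0;
  float g;

  CombFilter(std::size_t delaySamples, float gain)
    : buf(delaySamples, 0.0f), g(gain) {}

  float process(float x) {
    float y = buf[pos];    // value written D samples ago
    buf[pos] = x + g * y;  // feed the output back into the delay line
    pos = (pos + 1) % buf.size();
    return y;
  }
};

// Schroeder allpass: H(z) = (-g + z^-D) / (1 - g z^-D).
struct AllpassFilter {
  std::vector<float> buf;
  std::size_t pos = 0;
  float g;

  AllpassFilter(std::size_t delaySamples, float gain)
    : buf(delaySamples, 0.0f), g(gain) {}

  float process(float x) {
    float delayed = buf[pos];
    float v = x + g * delayed;
    buf[pos] = v;
    pos = (pos + 1) % buf.size();
    return delayed - g * v;
  }
};

// Parallel combs followed by series allpasses.
// All delay lengths and gains here are illustrative assumptions.
struct SchroederReverb {
  std::vector<CombFilter> combs{
    {1103, 0.84f}, {1181, 0.83f}, {1279, 0.82f}, {1361, 0.81f}};
  std::vector<AllpassFilter> allpasses{{349, 0.7f}, {113, 0.7f}};

  float process(float x) {
    float sum = 0.0f;
    for (auto &c : combs) sum += c.process(x);
    sum /= float(combs.size());
    for (auto &a : allpasses) sum = a.process(sum);
    return sum;
  }
};
```

Feeding a unit impulse into a single CombFilter makes the periodic structure visible: the output is a train of pulses D samples apart, each scaled by g relative to the previous one.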

2.2 Improved Schroeder Variants (Implementations in Freeverb3_VST)

Implementation	Characteristics
Freeverb	Improved Schroeder type. Lightweight and simple. Metallic character relatively suppressed through empirical parameters.
CCRMA NRev	A more refined version of the same algorithm as Freeverb.
NVerb (v2)	Uses nested allpass filters plus input feedback to minimize ringing as much as possible. Improved low-frequency resonance.

Nested Allpass Filter Structure

In a standard (series) configuration: input → AP → AP → AP. In a nested structure, another allpass filter (or an entire reverb network) is inserted into the delay path of an allpass filter. This results in:

A more complex temporal structure
More uniform energy distribution
Significantly improved low-frequency response

It is not widely known that nested allpass filters outperform series allpass filters in quality, but Schroeder himself referenced this algorithm long ago.
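One way to picture the nested structure is the sketch below, in which the plain delay of an outer allpass is followed by an inner allpass. The names and parameter values are assumptions for illustration; this is not the NVerb implementation. Because the inner section is itself allpass, the overall response remains allpass while the echo pattern becomes denser.

```cpp
#include <cstddef>
#include <vector>

// Plain Schroeder allpass: H(z) = (-g + z^-D) / (1 - g z^-D).
struct AllpassFilter {
  std::vector<float> buf;
  std::size_t pos = 0;
  float g;

  AllpassFilter(std::size_t delaySamples, float gain)
    : buf(delaySamples, 0.0f), g(gain) {}

  float process(float x) {
    float delayed = buf[pos];
    float v = x + g * delayed;
    buf[pos] = v;
    pos = (pos + 1) % buf.size();
    return delayed - g * v;
  }
};

// Nested allpass: the delayed signal passes through an inner allpass
// before it is fed back and mixed to the output.
struct NestedAllpass {
  std::vector<float> buf;
  std::size_t pos = 0;
  float g;
  AllpassFilter inner;

  NestedAllpass(std::size_t outerDelay, float outerGain,
                std::size_t innerDelay, float innerGain)
    : buf(outerDelay, 0.0f), g(outerGain), inner(innerDelay, innerGain) {}

  float process(float x) {
    float w = inner.process(buf[pos]);  // delayed signal, diffused again
    float v = x + g * w;
    buf[pos] = v;
    pos = (pos + 1) % buf.size();
    return w - g * v;
  }
};
```

A quick sanity check for any allpass variant is that the energy of its impulse response sums to 1.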

◆ Adding modulation to the feedback section of a comb filter (making it a chorus) tends to suppress ringing. The Lexicon Concert Hall algorithm is believed to be based on "Chorus comb filter + allpass filter + nested allpass filter."

2.3 FDN (Feedback Delay Networks)

One of the methods developed to address the limitations of Schroeder reverb. The algorithm consists of several delay lines of varying lengths along with a matrix for mixing and outputting the signal. In Freeverb3, this is implemented as Hibiki Reverb.

Basic Model (Mathematical Formulation)

An FDN can be described as a multi-input, multi-output recursive system as follows:

x(n) = A · x(n-D) + B · u(n)

y(n) = C · x(n)

x(n): State vector of each delay line
D: Delay lengths (different per channel)
A: Feedback matrix
B, C: Input/output weights

Stability

The condition for system stability is: spectral radius ρ(A) < 1. However, because FDN includes delays, a more precise condition is: "stable when A is unitary (or orthogonal) and attenuation is applied."

Unitary matrix A: A^H · A = I (for a real orthogonal matrix, A^T · A = I) → Energy preservation, norm invariance → No frequency bias in sonic energy
Inserting an LPF into the feedback loop → |eigenvalue| < 1 → Attenuation occurs

Eigenvalues and Sound

Phase of eigenvalue → Determines resonant frequency
Magnitude of eigenvalue → Determines decay rate

In other words, an FDN is a "collection of many decaying oscillation modes." If the eigenvalues are uneven, specific frequencies are emphasized (ringing); if the delay lengths form simple ratios, modes overlap. This is the same structural issue that causes the Schroeder reverb to sound metallic.

Eight or More Delay Lines and the Hadamard Matrix

Using eight or more delay lines together with a Hadamard matrix makes it relatively straightforward to generate high-quality reverberation. Many implementations also vary the matrix over time to imitate the slow phase and frequency fluctuations found in real reverberation. This approach also accommodates algorithmic generation of early reflections and is widely used.
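Part of the appeal of the Hadamard matrix is that the matrix-vector product never has to be formed explicitly. The sketch below applies a normalized Sylvester-type Hadamard matrix in place with butterfly additions and subtractions (a fast Walsh-Hadamard transform). The function name hadamardMix is an assumption for illustration.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// In-place multiplication by the normalized N x N Hadamard matrix.
// Uses N*log2(N) additions/subtractions plus one final scaling pass.
template<typename Sample, std::size_t N>
void hadamardMix(std::array<Sample, N> &x) {
  static_assert(N && ((N & (N - 1)) == 0), "N must be a power of 2");
  for (std::size_t h = 1; h < N; h *= 2) {
    for (std::size_t i = 0; i < N; i += 2 * h) {
      for (std::size_t j = i; j < i + h; ++j) {
        Sample a = x[j];
        Sample b = x[j + h];
        x[j] = a + b;      // butterfly: sum
        x[j + h] = a - b;  // butterfly: difference
      }
    }
  }
  // Scale by 1/sqrt(N) so that the transform preserves energy.
  const Sample scale = Sample(1) / std::sqrt(Sample(N));
  for (auto &v : x) v *= scale;
}
```

With eight delay lines, the vector of delay outputs would be passed through hadamardMix once per sample before being fed back. Applying the function twice returns the original vector, since the normalized matrix is symmetric and orthogonal.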

2.4 Loop Tank Reverb

Vintage digital reverbs such as the Lexicon 224 and EMT 250 are highly regarded for their "warm" and beautiful sound — much of which stems from their loop-based (tank) algorithm using allpass filters. By constructing a large loop, a reverb can be created that, like FDN, increases in density over time.

EMT 250 Structure

Pre-delay
3-stage allpass filter (diffuser)
Allpass-based feedback loop totaling approximately 80–120 ms
Output taken from multiple taps within the feedback loop
Passed through a Schroeder decorrelator (series of allpass filters)
One path is filtered through LPF and HPF before being fed back to the input

While the algorithm itself is simple, finding the optimal delay lengths and gains is challenging.

Lexicon 224 Structure (Dattorro Algorithm)

The Lexicon 224 was reverse-engineered and implemented as the ESP reverb, and was published in an academic paper. The ESP manual is mirrored on the Freeverb3 website.

Input diffuser
Figure-8 cross-feedback with modulation via allpass interpolation
Summed output from several taps

The feedback loop is relatively short and is classified as a Plate Reverb, but a suitable character can be achieved by adjusting the coefficients. In Freeverb3_VST, it is implemented as STRev, and separately as ProG Reverb (Progenitor Reverb) based on a different loop structure optimized for hall use.

◆ The main feedback loop of Progenitor Reverb is based on the published Dattorro algorithm, but the input diffuser and output taps are original. The feedback coefficient inside the loop and diffuser is approximately 0.5 as a baseline, with experimentation suggesting an optimal range of 0.4–0.8.

Dattorro Algorithm Block Structure

The overall structure follows this flow: Input → Pre-delay → Multiple Allpass (diffuser) → Main Loop (2-channel cross-feedback) → Tap output (multiple points) → Damping.

The core is the cross-feedback. The left loop and right loop each feed back into themselves while also feeding into each other (cross). The two signals are mixed on every pass, so echo density increases very rapidly.

Component	Perceptual Role
Allpass diffuser (immediately after input)	Disperses transients, eliminates early echoes — "diffusion at the moment of entering the space"
Allpass in the loop	Additional diffusion. Optimal feedback coefficient: 0.4–0.8
Damping (LPF)	Naturally attenuates high frequencies. Without it, the sound is "digitally sterile"
Modulation	Breaks fixed modes over time. Removes metallic quality; gives a "living" sense of animation
Tap outputs (multiple points)	Output from different temporal positions and phases. Generates stereo image and spatial width

The essential difference from FDN: FDN mixes all signals at once via a matrix, whereas Dattorro mixes structurally (gradually and temporally). This makes it more "musically" controllable.

2.5 Frequency Division (Multiband Processing)

More recent Lexicon reverbs use frequency division to apply the most appropriate algorithm to each band when generating reverberation.

Plate reverb algorithms: Short loops, which make it difficult to generate low-frequency reverb internally
Hall algorithms: Long loops, which make it difficult to generate high-frequency reverb

The Bass XOV parameter is a mechanism for dividing and processing low-frequency content to extend the low-band decay time, among other functions. Lexicon adopted this division algorithm quite early on, resulting in a reverb with powerful and warm low-frequency response. This approach is also physically justified, as low frequencies have low directivity in real environments while high frequencies have high directivity.

Perceptual Significance of Frequency Dependency

Low frequencies: Strong masking and low phase sensitivity → Can be extended without becoming unpleasant
High frequencies: High resolution and sensitivity → Must be short to sound natural
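The band-dependent decay described above can be expressed with the standard relation between a loop gain and RT60: a delay line of D samples needs a per-pass gain of 10^(-3·D / (fs·RT60)) to fall by 60 dB in RT60 seconds. The sketch below applies that relation to two bands. The function names and the simple two-band split are assumptions for illustration, not Lexicon's implementation.

```cpp
#include <cmath>

// Gain per pass through a delay of `delaySamples` such that the level
// drops by 60 dB after `rt60Seconds`.
inline double decayGain(double delaySamples, double sampleRate,
                        double rt60Seconds) {
  return std::pow(10.0, -3.0 * delaySamples / (sampleRate * rt60Seconds));
}

struct BandGains {
  double low;   // gain applied below the crossover
  double high;  // gain applied above the crossover
};

// A bass multiplier > 1 lengthens the low-band RT60, which translates
// into a low-band gain closer to 1.
inline BandGains bassMultiplyGains(double delaySamples, double sampleRate,
                                   double rt60Mid, double bassMultiply) {
  return {decayGain(delaySamples, sampleRate, rt60Mid * bassMultiply),
          decayGain(delaySamples, sampleRate, rt60Mid)};
}
```

For example, a delay of one second (48000 samples at 48 kHz) with an RT60 of one second yields a gain of 0.001, i.e. exactly -60 dB per pass.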

From an information-theoretic perspective: the low band achieves entropy increase (richness) early on, while the high band rapidly achieves a reduction in mutual information (diffusion).

2.6 Evolutionary Lineage of Algorithms (Summary)

Generation	Representative	Core Structure	Sonic Character
1st Gen	EMT 250 (1976)	Short loop + Allpass	Imperfect but warm, granular quality
2nd Gen	Lexicon 224 / 480L	Perceptual optimization + multiband + randomization	Musical, magical tail
3rd Gen	Bricasti M7	Waveguide-like physical approximation	Reproduces real spatial feel
4th Gen	DiffFDN / PINN	Learning-based integration	Unified perception, physics, and information

Chapter 3: Convolution and IR Reverb Technology

3.1 Accurate Definition of "Convolution"

The precise meaning of convolution is the mathematical operation commonly written as follows. The term "folding" as a rendering is sometimes seen in non-technical contexts, but the correct technical term is convolution.

Mathematical definition:

y(n) = Σ_k x(k) · h(n-k)

x(n): Input signal
h(n): Impulse response (IR)
y(n): Output

Acoustic meaning: IR = the "complete response" of a space (or device); convolution = "computing the result of playing the input signal in that space."
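The definition above translates directly into a double loop. The following is a minimal sketch of direct (time-domain) convolution for finite-length signals; the function name convolve is an assumption for illustration.

```cpp
#include <cstddef>
#include <vector>

// Direct convolution: y(n) = sum_k x(k) * h(n - k).
// The output length is x.size() + h.size() - 1.
inline std::vector<double> convolve(const std::vector<double> &x,
                                    const std::vector<double> &h) {
  if (x.empty() || h.empty()) return {};
  std::vector<double> y(x.size() + h.size() - 1, 0.0);
  for (std::size_t k = 0; k < x.size(); ++k) {
    for (std::size_t m = 0; m < h.size(); ++m) {
      y[k + m] += x[k] * h[m];  // x(k) contributes to y(k + m) via h(m)
    }
  }
  return y;
}
```

Each input sample launches a scaled copy of the impulse response, and the output is the sum of all those copies. This is also why the cost grows with the product of the two lengths.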

3.2 IR and Deconvolution (Inverse Filter)

Converting between microphone types (e.g., SM57 → U87) requires deconvolution (an inverse filter).

H_IR(ω) = Y(ω) / X(ω)

X: Input (known)
Y: Recorded output

A practical issue is noise amplification due to "division by near-zero" at frequencies where the denominator approaches zero. Regularization and band limiting are used as countermeasures.
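One common form of regularization replaces the plain division Y/X with Y·conj(X) / (|X|² + ε), which behaves like Y/X where X is strong and falls smoothly to zero where X vanishes. The sketch below applies this per frequency bin to spectra that are assumed to have been computed already by an FFT. The function name and the constant ε are assumptions for illustration.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <vector>

// Regularized spectral division: H(k) = Y(k) conj(X(k)) / (|X(k)|^2 + eps).
// With eps > 0 the denominator can never reach zero.
inline std::vector<std::complex<double>> regularizedDivide(
    const std::vector<std::complex<double>> &Y,
    const std::vector<std::complex<double>> &X,
    double epsilon) {
  const std::size_t bins = std::min(X.size(), Y.size());
  std::vector<std::complex<double>> H(bins);
  for (std::size_t k = 0; k < bins; ++k) {
    // std::norm returns the squared magnitude |X(k)|^2.
    H[k] = Y[k] * std::conj(X[k]) / (std::norm(X[k]) + epsilon);
  }
  return H;
}
```

Choosing ε is a trade-off: larger values suppress more noise but also bias the estimate in bins where the excitation is weak.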

3.3 IR Measurement: Using Sweep Signals

Ideally, a delta function input would be used, but in practice, the signal-to-noise ratio is too poor. The modern standard method uses a sine sweep or TSP (Time-Stretched Pulse).

Procedure

Play the sweep signal
Record the response

Deconvolution: h(n) = y(n) * x^-1(n), where * denotes convolution and x^-1(n) is the inverse filter of the sweep

Why Sweeps Are Superior

High SNR is achievable
With an exponential sine sweep, nonlinear distortion components are separated from the linear response along the time axis after deconvolution (each harmonic order appears at a different time)

3.4 Real-Time Convolution and FFT Acceleration

Computational Complexity

Direct computation: O(N²) → Impractical for long IRs
FFT convolution: O(N log N) → Practical

FFT convolution formula: Y = IFFT(FFT(X) · FFT(H))

Latency Problem

Because FFT operates on blocks, a delay equal to the block size is introduced (e.g., 1024 samples ≈ 21 ms at 48 kHz).

Solution: Hybrid Method

Early portion (short time) → Direct convolution (low latency)
Late portion (long time) → FFT (fast)

This achieves both low latency and fast processing simultaneously.

3.5 Fundamental Limitations of IR (LTI Systems Only)

IR can only represent LTI (Linear Time-Invariant) systems.

Can Be Reproduced	Cannot Be Reproduced
Frequency characteristics, reverb, phase response	Even-order harmonic distortion in tube amplifiers (nonlinear distortion)
General linear spatial response	Nonlinear compression effects of speakers
	FM-type modulation (time-dependent frequency variation)

IR can reproduce the "shape" of a sound but not its "behavior." Additionally, while the characteristics of recording equipment (speakers, microphones) can be partially removed through deconvolution, complete removal is impossible due to constraints of SNR, dynamic range, and distortion.

3.6 Software History and Major Products

Representative convolution reverb software of the era:

Software	Characteristics
SIR (Windows-only, freeware)	Initially high latency; progressively improved across versions
WAVES IR-1 / IR-L	Industry-standard commercial products
Altiverb	Supports stereo I/O using 4-channel IR. High spatial accuracy and high-quality bundled library. However, requires iLok and is expensive.
Voxengo Pristine Space	High-value commercial product
WizooVerb W2	Commercial product

◆ Altiverb's 4-channel IR enables superior spatial accuracy for stereo I/O. A procedure for converting Altiverb's proprietary format (little-endian 24-bit PCM + gain correction data in the resource fork) to WAV was shared publicly. Free IR distribution sites such as Noisevault, Voxengo, and memi.com were also widely used.

3.7 Creative Applications of IR

IR is not limited to "recording spaces" — it has a wide range of creative applications:

Real-space IRs (halls, rooms): The original use case
Equipment IR: IRs captured from premium digital reverb units such as the Lexicon 480/960
Instrument body IR: Violin or piano body resonance IR → adds resonance
Vintage equipment IR: Transferring the characteristics of cassette MTRs, Game Boys, etc.
Non-IR data: Loading voice, drum loops, sawtooth waves, etc. as the IR of a convolution reverb yields a spectral transformation effect of the form y = x * (arbitrary signal)

Chapter 4: FDN Implementation (C++)

4.1 Basic FDN Implementation

Below is a basic C++ implementation of an FDN. The template parameter length specifies the dimension of the FDN. The range of feedback is [-1.0, 1.0]; values outside this range will cause the system to diverge.

#include <array>
#include <cstddef>
#include <numeric>

// Delay<Sample> and RateLimiter<Sample> are helper classes defined elsewhere.
template<typename Sample, size_t length>
struct FeedbackDelayNetwork {
  size_t bufIndex = 0;
  std::array<std::array<Sample, length>, 2> buf{};
  std::array<std::array<Sample, length>, length> matrix{};
  std::array<Delay<Sample>, length> delay;
  std::array<RateLimiter<Sample>, length> delayTimeSample;

  Sample process(Sample input, Sample feedback) {
    bufIndex ^= 1;
    auto &front = buf[bufIndex];
    auto &back = buf[bufIndex ^ 1];
    front.fill(0);

    for (size_t i = 0; i < length; ++i)
      for (size_t j = 0; j < length; ++j)
        front[i] += matrix[i][j] * back[j];

    input /= Sample(length);
    for (size_t idx = 0; idx < length; ++idx) {
      auto sig = input + feedback * front[idx];
      front[idx] = delay[idx].process(sig, delayTimeSample[idx].process());
    }
    return std::accumulate(front.begin(), front.end(), Sample(0));
  }
};

◆ Reset-related methods are omitted. When the template parameter length is large, rewriting with std::vector is recommended (approximately dim=200 is the practical upper limit).

Structure of the Processing Loop

First section (up to the first blank line): Swaps the buffer that receives feedback
Middle section (up to the second blank line): Computes the feedback matrix
Final section: Computes input/output for each delay line

The argument feedback is a scalar coefficient that uniformly scales the feedback matrix values. Note that because Delay performs linear interpolation, which acts as a mild lowpass filter, the output gradually attenuates (high frequencies first) whenever the delay time is non-integer.
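The Delay class used above is not shown in this document. The sketch below is one possible stand-in with the same call shape, process(input, timeInSamples), using linear interpolation between the two neighboring samples. The buffer size, the clamping, and all internals are assumptions for illustration; RateLimiter is not covered here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Fractional delay line with linear interpolation (hypothetical stand-in).
template<typename Sample>
struct Delay {
  std::vector<Sample> buf;
  std::size_t wptr = 0;

  explicit Delay(std::size_t maxSamples = 65536)
    : buf(maxSamples + 2, Sample(0)) {}

  Sample process(Sample input, Sample timeInSamples) {
    const std::size_t size = buf.size();
    // Clamp the requested delay to the range the buffer can serve.
    Sample t = std::min(std::max(timeInSamples, Sample(0)), Sample(size - 2));
    const std::size_t whole = std::size_t(t);
    const Sample frac = t - Sample(whole);

    buf[wptr] = input;
    const std::size_t r0 = (wptr + size - whole) % size;  // `whole` samples ago
    const std::size_t r1 = (r0 + size - 1) % size;        // one sample older
    const Sample out = buf[r0] + frac * (buf[r1] - buf[r0]);

    wptr = (wptr + 1) % size;
    return out;
  }
};
```

With a delay of 2.5 samples, a unit impulse comes out as two samples of 0.5 each. The sum is preserved but the peak is halved, which is the lowpass behavior that causes the gradual attenuation inside a feedback loop.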

4.2 The Nature of Stability

Stability Condition

Condition for FDN not to diverge: ρ(g·A) ≤ 1, where ρ denotes the spectral radius.

Meaning of Orthogonal Matrices

For an orthogonal matrix A, A^T · A = I holds. This ensures |λ_i| = 1 and thus energy is preserved:

‖y‖² = ‖Ay‖²

That is, an orthogonal matrix + |g| < 1 guarantees stability. Attenuation is controlled solely by g.

◆ According to Schlecht and Habets' "On lossless feedback delay networks," making the feedback matrix unitary or triangular prevents an FDN from diverging.

4.3 Designing Delay Lengths

Key Conditions

Choose values (in samples) that are approximately coprime
Avoid values that share common factors

Bad and Good Examples

Bad: 100 ms, 200 ms → Creates strong periodicity
Good: 97 ms, 131 ms, 173 ms, 211 ms ... → Prevents modal unevenness

Automatic Generation Algorithms

Coprime generation: Use prime or pseudo-prime sequences (e.g., 97, 131, 173, 211...)
Minimum-correlation optimization: Minimize J = Σ_{i≠j} corr(D_i, D_j) using simulated annealing or similar
Logarithmic spacing: D_i = D_min · r^i (uniform on a logarithmic scale)

The optimal solution cannot be determined analytically; it is a hybrid of search and heuristics.
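As a starting point for such a search, the coprime and logarithmic-spacing ideas can be combined: place targets on a geometric grid and snap each one to the next unused prime number of samples. The sketch below does this. The function names and the snapping rule are assumptions for illustration, and the result is only a heuristic seed, not an optimum.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Trial-division primality test; adequate for delay lengths in samples.
inline bool isPrime(unsigned n) {
  if (n < 2) return false;
  for (unsigned d = 2; d * d <= n; ++d) {
    if (n % d == 0) return false;
  }
  return true;
}

// Targets follow D_i = dMinSamples * ratio^i; each target is rounded up
// to the next prime that is larger than the previous result.
// Distinct primes are pairwise coprime by construction.
inline std::vector<unsigned> primeDelayLengths(std::size_t count,
                                               double dMinSamples,
                                               double ratio) {
  std::vector<unsigned> lengths;
  double target = dMinSamples;
  unsigned last = 1;
  for (std::size_t i = 0; i < count; ++i) {
    unsigned candidate =
      std::max(static_cast<unsigned>(std::ceil(target)), last + 1);
    while (!isPrime(candidate)) ++candidate;
    lengths.push_back(candidate);
    last = candidate;
    target *= ratio;
  }
  return lengths;
}
```

Delay times given in milliseconds would first be converted to samples (time × sample rate) before calling primeDelayLengths, since coprimality is a property of the integer sample counts.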

Chapter 5: Feedback Matrix Design

5.1 Matrix Types and Acoustic Properties

According to research by Schlecht and Habets, the main types of feedback matrices that prevent FDN from diverging are unitary matrices (possibly including complex numbers) and triangular matrices. For the implementations discussed here, real orthogonal matrices are sufficient.

Matrix Type	Characteristics / Sonic Tendency
Random orthogonal matrix	High diffusion, uniform energy distribution. Smooth with minimal metallic character. Equivalent to scipy.stats.ortho_group.rvs()
Special orthogonal matrix	Determinant = 1. Difference in sound vs. random orthogonal is difficult to perceive
Householder matrix	Form: H = I - 2(vv^T)/(v^Tv). Constructable from dim random numbers. Somewhat weaker diffusion with more distinct character
Hadamard matrix	Only ±1 values. Perfectly orthogonal. Fast (additions only). Similar sound to random orthogonal but CPU-efficient
Circulant matrix	Shift structure with periodicity. "Pipe-like" sound. Diagonalizable via DFT
Triangular matrix	Strong early reflections and metallic character. Short delays are prominent
Conference matrix	Diagonal elements are 0. Very natural; minimizes self-loop (metallic sources). Excellent sound quality
Absorbent allpass matrix	From Schlecht-Habets Eq. (10). May be less efficient than not using FDN in some cases

Guidelines for Matrix Selection

For fixed matrix values, Hadamard or Conference matrices are good choices (only -1, 0, and 1 — supports integer arithmetic)
Orthogonal and special orthogonal matrices allow tonal control by adjusting the magnitude of diagonal elements
Triangular and Schroeder-type matrices can take on a distinctive character when modulated

5.2 Randomized Orthogonal Matrix (C++ Implementation)

Implementation based on Mezzadri's "How to generate random matrices from the classical compact groups" — a C++ translation of scipy.stats.ortho_group.rvs().

// Requires <array>, <cmath>, <random>, and pcg_random.hpp (pcg-cpp) for pcg64.
// Sample is a floating-point type defined by the surrounding code.
template<size_t dim>
void randomOrthogonal(unsigned seed,
  std::array<std::array<Sample, dim>, dim> &H) {
  pcg64 rng{};
  rng.seed(seed);
  std::normal_distribution<Sample> dist{};

  // Start from the identity matrix.
  H.fill({});
  for (size_t i = 0; i < dim; ++i) H[i][i] = Sample(1);

  std::array<Sample, dim> x;
  for (size_t n = 0; n < dim; ++n) {
    // Draw a Gaussian random vector whose length shrinks each iteration.
    auto xRange = dim - n;
    for (size_t i = 0; i < xRange; ++i) x[i] = dist(rng);

    Sample norm2 = 0;
    for (size_t i = 0; i < xRange; ++i) norm2 += x[i] * x[i];

    // Build the Householder vector, normalized so that x^T x = 2.
    Sample x0 = x[0];
    Sample D = x0 >= 0 ? Sample(1) : Sample(-1);
    x[0] += D * std::sqrt(norm2);
    Sample denom = std::sqrt((norm2 - x0 * x0 + x[0] * x[0]) / Sample(2));
    for (size_t i = 0; i < xRange; ++i) x[i] /= denom;

    // Apply the reflection, scaled by the sign D, along the first index of H.
    for (size_t row = 0; row < dim; ++row) {
      Sample dotH = 0;
      for (size_t col = 0; col < xRange; ++col)
        dotH += H[col][row] * x[col];
      for (size_t col = 0; col < xRange; ++col)
        H[col][row] = D * (H[col][row] - dotH * x[col]);
    }
  }
}

◆ PCG is used as the random number generator because it is adopted in NumPy's default_rng(). It is preferred over std::minstd_rand for its better statistical quality.

5.3 Hadamard Matrix (Sylvester's Construction)

Initialized with 1/sqrt(dim) and recursively constructed by tiling. dim must be a power of 2.

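// Requires <array> and <cmath>.
// Sample is a floating-point type defined by the surrounding code.
template<size_t dim>
void constructHadamardSylvester(
  std::array<std::array<Sample, dim>, dim> &mat) {
  // dim must be a power of 2. Single-argument static_assert needs C++17;
  // the assertion message is elided in the original.
  static_assert(dim && ((dim & (dim - 1)) == 0));

  mat[0][0] = Sample(1) / std::sqrt(Sample(dim));
  size_t start = 1;
  size_t end = 2;
  while (start < dim) {
    // Tile the current block M into [[M, M], [M, -M]].
    for (size_t row = start; row < end; ++row) {
      for (size_t col = start; col < end; ++col) {
        auto value = mat[row - start][col - start];
        mat[row - start][col] = value; // Upper right
        mat[row][col - start] = value; // Lower left
        mat[row][col] = -value;        // Lower right
      }
    }
    start *= 2;
    end *= 2;
  }
}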

5.4 Householder Matrix

Expressed in the form H = I - 2(vv^T)/(v^Tv). An orthogonal matrix can be constructed from dim random numbers, and the result is symmetric (H = H^T).

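// Requires <array>.
// Sample is a floating-point type defined by the surrounding code.
template<size_t dim>
void randomHouseholder(unsigned seed,
  std::array<std::array<Sample, dim>, dim> &matrix) {
  // ... (initialize vec with uniform random values; denom is v^T v)

  auto scale = Sample(-2) / denom;
  for (size_t i = 0; i < dim; ++i) {
    // Diagonal element: 1 - 2 v_i^2 / (v^T v)
    matrix[i][i] = Sample(1) + scale * vec[i] * vec[i];
    // Off-diagonal elements, mirrored because H is symmetric
    for (size_t j = i + 1; j < dim; ++j) {
      auto value = scale * vec[i] * vec[j];
      matrix[i][j] = value;
      matrix[j][i] = value;
    }
  }
}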

5.5 Conference Matrix

The size n of a Conference matrix must satisfy the condition: "n is an even number such that n - 1 is expressible as the sum of two squares (equivalent to OEIS A286636)." Candidate sizes from OEIS sequence A000952: 62, 54, 50, 46, 42, 38, 30, 26, 18, 14, 10, 6, 2.

Because the diagonal elements are 0, there are no simple comb filter sections, resulting in superior sound quality.

Construction Procedure

Set modulo = dimension - 1 and compute the set of quadratic residues
Generate an array of Legendre symbols (values 0, +1, -1)
Rotate the symbol array to construct matrix S

Final matrix: C[0][0]=0, C[0][i]=C[i][0]=1/sqrt(modulo), C[i][j]=S[i][j]/sqrt(modulo), so that every row has unit norm and C is orthogonal
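The procedure above can be sketched as follows for the simplest case, where modulo is a prime congruent to 1 mod 4 (sizes 6, 14, 18, 30, 38, 42, 54, 62). This is the Paley construction written with plain modular arithmetic. Sizes whose modulo is not prime need a different construction and are not handled. The function names and the use of std::vector are assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Builds a normalized symmetric conference matrix of size (q + 1).
// Assumes q is a prime with q % 4 == 1; this is not checked.
inline Matrix paleyConference(unsigned q) {
  // Legendre symbol: 0 for 0, +1 for quadratic residues, -1 otherwise.
  std::vector<int> symbol(q, -1);
  symbol[0] = 0;
  for (unsigned a = 1; a < q; ++a) symbol[(a * a) % q] = 1;

  const double scale = 1.0 / std::sqrt(double(q));
  const std::size_t dim = std::size_t(q) + 1;
  Matrix C(dim, std::vector<double>(dim, 0.0));
  for (std::size_t i = 1; i < dim; ++i) {
    C[0][i] = scale;
    C[i][0] = scale;
    // Each row of S is the symbol array rotated by one position.
    for (std::size_t j = 1; j < dim; ++j) {
      C[i][j] = scale * symbol[(i + q - j) % q];
    }
  }
  return C;
}

// Checks C * C^T == I within a tolerance.
inline bool isOrthogonal(const Matrix &m, double tol) {
  for (std::size_t i = 0; i < m.size(); ++i) {
    for (std::size_t j = 0; j < m.size(); ++j) {
      double dot = 0.0;
      for (std::size_t k = 0; k < m.size(); ++k) dot += m[i][k] * m[j][k];
      if (std::abs(dot - (i == j ? 1.0 : 0.0)) > tol) return false;
    }
  }
  return true;
}
```

Calling paleyConference(5) gives a 6×6 matrix with a zero diagonal, and isOrthogonal can be used to confirm that the result is usable as a lossless feedback matrix.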

5.6 Absorbent Allpass Matrix

A matrix introduced in Schlecht-Habets' "Time-varying feedback matrices in FDN." It represents a nested allpass filter structure. Converges when α is in the range (-1, 1).

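// Requires <array> and <random>; uses randomOrthogonal from Section 5.2.
template<size_t dim>
void randomAbsorbent(unsigned seed, Sample low, Sample high,
  std::array<std::array<Sample, dim>, dim> &mat) {
  static_assert(dim % 2 == 0, "dim must be even");
  constexpr size_t half = dim / 2;

  // The setup below is not shown in the original and is assumed:
  // a seeded generator, a seed source for A, gains drawn from [low, high),
  // and a zero-initialized output matrix.
  pcg64 rng{};
  rng.seed(seed);
  std::uniform_int_distribution<unsigned> seeder{};
  std::uniform_real_distribution<Sample> dist{low, high};
  mat.fill({});

  // Generate orthogonal matrix A of size half×half
  std::array<std::array<Sample, half>, half> A;
  randomOrthogonal(seeder(rng), A);

  for (size_t col = 0; col < half; ++col) {
    auto gain = dist(rng);
    mat[half + col][half + col] = gain;             // Bottom-right
    mat[half + col][col] = Sample(1) - gain * gain; // Bottom-left
    for (size_t row = 0; row < half; ++row) {
      mat[row][half + col] = A[row][col];           // Top-right
      mat[row][col] = -A[row][col] * gain;          // Top-left
    }
  }
}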

5.7 Generating Near-Identity Random Special Orthogonal Matrices

By modifying the random number generation section of randomSpecialOrthogonal as follows, the proximity to the identity matrix can be adjusted:

x[0] = Sample(1);

for (size_t i = 1; i < xRange; ++i)

x[i] = identityAmount * dist(rng);

As identityAmount approaches 0, the matrix approaches the identity (diagonal components are enhanced)
Values significantly greater than 1 reduce diagonal components relative to off-diagonal

Acoustically, lowering identityAmount tends to strengthen early reflections while weakening the late reverb tail.

5.8 Direct Eigenvalue Design

Normally, eigenvalues are determined by the feedback matrix A. However, "direct eigenvalue design" — specifying eigenvalues first and then constructing the matrix — is also possible.

Design Procedure

Eigenvalue placement: Set θ_k uniformly in the form λ_k = r_k · exp(jθ_k), with r_k set for frequency-dependent decay

Matrix reconstruction: A = V Λ V^-1 (Λ: diagonal matrix of the designed eigenvalues, V: arbitrary unitary basis; for a real-valued A, the eigenvalues must come in complex-conjugate pairs)

Practical Solutions

Keep the matrix unitary and separate attenuation as a separate filter (high numerical stability)
Frequency-dependent feedback: Use g(ω) = exp(-α(ω))

The essence of eigenvalue design is the "design of the decay spectrum."

Chapter 6: Perceptual Models and Reverb Design

6.1 Perceptual Conditions for a "Good Reverb"

Good Reverb	Bad Reverb
No echo sensation (density increases over time)	Metallic
Low frequencies extend naturally	Periodic (ping-pong feel)
High frequencies are not harsh	Muddy, or loses definition
Does not degrade stereo imaging	Specific band is prominent
Energy distribution becomes uniform over time	Ringing is clearly audible

6.2 JND (Just Noticeable Difference)

Reference values for the minimum perceptible difference (JND):

Change in RT60: approximately 5–10%
Pre-delay: on the order of a few ms

Knowing the JND leads to the design principle that precision beyond what listeners can detect is wasted effort. Optimization at the 1 ms level or frequency resolution design below the hearing limit carries no practical benefit.

6.3 Auditory Masking

Frequency Masking

A loud sound masks weaker sounds in adjacent frequency bands. The "granular quality" of the EMT 250 creates masking and can sound more pleasant than complete diffusion — a paradox in which "imperfection is the perceptual optimum."

Temporal Masking

An effect that makes sounds immediately following a loud sound inaudible. Early reflections are masked by the direct sound, so complete physical accuracy is not required.

Design Principles

Ignore differences that cannot be heard
Focus on differences that can be heard
In many cases, prioritizing pleasantness over perfect physical accuracy is the right call

6.4 Information-Theoretic Perspective

Treating the sound field as an "information source" rather than a "wave":

Early reflections: Low entropy (structured, predictable)
Late reverb: High entropy (random, unpredictable)

Goal of a good reverb: Entropy increases smoothly over time.

Mutual Information Perspective

Early stage: I(X;Y) is high (strong correlation between input and output)
Late stage: I(X;Y) is low (input information has been sufficiently diffused)

Information-theoretic interpretation of metallic sound: A phenomenon in which information fails to diffuse and structure remains perceptible.

6.5 Three-Layer Structure of Reverb Design

① Early Reflections

Role: Perception of spatial size and sense of distance from the sound source
Design: A set of distinct delays from a few ms to 100 ms, using random or geometry-based placement
Key point: Delay design is more dominant than FDN or allpass
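A set of distinct delays is naturally implemented as a multi-tap delay line. The sketch below reads several taps from one circular buffer and sums them with individual gains. The Tap and EarlyReflections names are assumptions for illustration, and how the tap times and gains are chosen (random or geometry-based) is left to the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Tap {
  std::size_t delaySamples;  // arrival time of the reflection
  float gain;                // level of the reflection
};

// Multi-tap delay line: y[n] = sum_i gain_i * x[n - delay_i].
struct EarlyReflections {
  std::vector<Tap> taps;
  std::vector<float> buf;
  std::size_t wptr = 0;

  explicit EarlyReflections(std::vector<Tap> t) : taps(std::move(t)) {
    std::size_t maxDelay = 0;
    for (const auto &tap : taps) maxDelay = std::max(maxDelay, tap.delaySamples);
    buf.assign(maxDelay + 1, 0.0f);
  }

  float process(float x) {
    buf[wptr] = x;
    float y = 0.0f;
    for (const auto &tap : taps) {
      const std::size_t idx = (wptr + buf.size() - tap.delaySamples) % buf.size();
      y += tap.gain * buf[idx];
    }
    wptr = (wptr + 1) % buf.size();
    return y;
  }
};
```

At 48 kHz, the range of a few ms to 100 ms corresponds to tap positions of roughly a few hundred to 4800 samples.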

② Diffusion

Role: "Breaks up" the sound and eliminates the echo sensation
Primary tool: Allpass filter
Key parameters: Feedback coefficient (0.4–0.8), delay length (shorter = smoother)
Insufficient diffusion results in a "grainy" texture

③ Late Reverb

Role: The reverberant tail, the texture of the space
Implementation: FDN / Loop / Schroeder-type
The primary choice point for "which algorithm to use"

Chapter 7: Perceptual Optimization in Commercial Reverbs

7.1 Lexicon 480L Random Hall

The perceptual "magic" of the Lexicon 480L lies in the combination of Bass XOV (multiband processing), controlled modulation, and the Loop Tank structure.

CHO Value (Modulation)

In Random Hall, internal modulation (Decay Optimization) corresponds to CHO.

Optimal value: CHO at medium to low settings (approximately 0.1–0.3)
Setting it too deep causes chorusing (unnatural pitch modulation), increasing the "digital" quality
Subtle randomization breaks fixed modes and introduces a "faint, living flutter"
In V2 algorithm, deeper modulation is explicitly described as "designed to be noticed" — V2 uses deeper modulation than V1

Optimal Perceptual Settings for Bass XOV

Bass XOV (crossover): approximately 500 Hz (for boost) to 1.5 kHz (for natural hall reproduction)
Bass Multiply (BAS): approximately 1.2–1.5X to extend low-frequency decay time
RT HF CUT: medium (around 0.5)
HF CUTOFF: low (around 5–7 kHz)
Pre-delay: 20–40 ms (Large RHall)

◆ The manual states that "BASS MULT maximum, RT HF CUT medium, HF CUTOFF low" is the ideal concert hall setting.

Mathematical Interpretation of Why Lexicon Sounds "Warm"

Bass XOV is a mechanism that varies the optimal information diffusion rate by frequency band, based on the following physical and perceptual facts:

Low band: RT60 is long (large τ in y_low) → Strong masking, contributes to entropy increase
High band: RT60 is short → High resolution and sensitivity, so it decays quickly
Result: "Envelopment," "depth," and "warmth" are substantially enhanced

7.2 Mathematical Reproduction of M7 V2 Modulation

According to the official manual, V2 modulation is pitch variation in the late reverb tail — different from chorus or flange.

Each delay tap time D_i(t) is independently modulated:

D_i(t) = D_i + Δ · m(t, level)

Level-Dependent Modulation Function m(t)

Low setting (1–3): m(t) = sin(2π·f·t + φ_i) (f ≈ 0.1–0.5 Hz). Left-right phase difference creates "smooth left-to-right movement"
High setting (6–High): Low-frequency smoothed noise (composite of Perlin noise or 1/f noise). Independent noise source per delay line
Mid setting (4–5): m(t) = α·sine + (1-α)·noise (α = 0.5–0.7)

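// C++ implementation (simplified)
// phase[i] and np[i] are per-delay-line phase variables maintained elsewhere.
Sample sine = std::sin(2.0f * M_PI * modFreqBase * phase[i]);

// Pseudo-noise: sum of three harmonics weighted by 1/(k+1).
Sample noise = 0;
for (int k = 0; k < 3; ++k)
  noise += std::sin(2.0f * M_PI * (modFreqBase * (k + 1)) * np[i]) / (k + 1);
noise /= 3.0f;

// Low: sine-dominant; High: noise-dominant; Mid: blend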
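A self-contained version of the same idea is sketched below: a blend factor alpha selects between the sine and the pseudo-noise component, and the result offsets the base delay time as in D_i(t) = D_i + Δ · m(t, level). The function names, the mapping from level to alpha, and the use of a harmonic sum in place of true smoothed noise are all assumptions for illustration. This is not the M7's actual modulator.

```cpp
#include <cmath>

// m(t): alpha = 1 gives a pure sine, alpha = 0 gives pure pseudo-noise.
// noiseOffset shifts the pseudo-noise in time so that each delay line
// can be given a different pattern.
inline float modulation(float t, float freqHz, float phaseOffset,
                        float noiseOffset, float alpha) {
  const float twoPi = 6.28318530718f;
  const float sine = std::sin(twoPi * freqHz * t + phaseOffset);

  // Sum of three harmonics weighted by 1/(k+1), then divided by 3.
  float noise = 0.0f;
  for (int k = 0; k < 3; ++k) {
    noise += std::sin(twoPi * freqHz * float(k + 1) * (t + noiseOffset))
             / float(k + 1);
  }
  noise /= 3.0f;

  return alpha * sine + (1.0f - alpha) * noise;
}

// D_i(t) = D_i + depth * m(t)
inline float modulatedDelay(float baseDelay, float depth, float m) {
  return baseDelay + depth * m;
}
```

With alpha between 0 and 1 the output stays within [-1, 1], so depth directly bounds the delay excursion in samples.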

7.3 Perceptual Optimization of the EMT 250

The world's first commercial digital reverb (1976). The originator of the Loop Tank concept.

Slight nonlinear granularity creates masking, preventing the perception of metallic tone
Rich low-frequency response — described as "more physical than plugin reverbs"
A classic example of "imperfection as the perceptual optimum": sometimes more pleasant than complete diffusion
For FDN reproduction: short loop + strong allpass diffuser + natural damping

7.4 Speculative Algorithm Analysis of the Bricasti M7 (Waveguide-Like)

The internal algorithm of the Bricasti M7 is entirely proprietary, but its acoustic characteristics strongly suggest a structure inspired by a scattering-type Waveguide Network.

Waveguide-Like Characteristics

Treats sound propagation as an approximation of a discretized wave equation
Connects multiple delay lines via a scattering matrix
Geometrically accurate early reflections (realistic ER), with statistically natural density buildup in the late reverb
Creates a sense of realism in which "the sound exists inside the room"

Contrast with FDN

FDN: Mixes everything at once via a matrix → Statistical naturalness ("air")
M7 (speculated Waveguide): Gradual scattering + low-dimensional wave mesh → Directionality and sense of distance reproduced physically ("space")

7.5 Modern Plugin Comparison

Characteristic	Lexicon 480L	Valhalla VintageVerb	Bricasti M7
3D depth	Magical (perceptually optimized)	Beautiful spatial spread	Overwhelming realism
Warmth / low end	Rich via multiband	Sufficiently warm	Natural but full-bodied
Musicality	Supreme (emotionally evocative)	High (Lexicon-like)	Somewhat restrained
Metallic quality	Good via randomization	Excellent	Nearly zero
Mix compatibility	Prominent, hard to lose	Forward, pop-oriented	Blends naturally
Character	Vintage magic	Modern lush	Modern realistic

◆ The Lexicon 480L embodies "perceptual correctness over physical correctness." It anticipated in the 1980s — through human ears — the perceptual loss optimization that DiffFDN and PINN now pursue computationally.

Chapter 8: Modern Approaches — Neural / Physical Hybrids

8.1 DiffFDN: Optimizing for Perceptual Objective Functions

Traditional FDN design has mathematically pursued stability and diffusion, but it is now possible to directly optimize for "how humans perceive the output."

Objective Function

J(θ) = Σ_t w_t · d(P(y_θ(t)), P(y_target(t)))

w_t: Weight applied to time frame t
θ: FDN parameters (matrix A, delay lengths D_i, gain g, etc.)
P(·): Perceptual transform (ERB filterbank + loudness compression + time integration)
d(·): Perceptual distance (mean absolute difference of mel-spectrogram, EDC error, etc.)

Mel-Spectrogram-Based Perceptual Loss (Implementation Example)

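import torch

# mel() and loudness() are assumed helpers: a mel-spectrogram transform
# (e.g., torchaudio.transforms.MelSpectrogram) and a loudness compression such as torch.log1p.
def perceptual_loss(y, y_target):
    Y = loudness(mel(y))                   # compressed mel spectrogram of the FDN output
    Yt = loudness(mel(y_target))           # same transform applied to the target
    return torch.mean(torch.abs(Y - Yt))   # mean absolute (L1) difference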
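The EDC error named above as another candidate for d(·) can be sketched without any framework. The example below computes the energy decay curve by Schroeder backward integration and compares two synthetic decays. The helper names, the −100 dB floor, and the test signals are illustrative assumptions, and the impulse responses are assumed to be non-empty.

```python
import math

def edc_db(ir, floor_db=-100.0):
    """Energy decay curve (Schroeder backward integration) in dB re total energy."""
    edc = [0.0] * len(ir)
    energy = 0.0
    for n in range(len(ir) - 1, -1, -1):   # accumulate energy from the tail backwards
        energy += ir[n] * ir[n]
        edc[n] = energy
    total = edc[0]
    return [max(floor_db, 10.0 * math.log10(e / total)) if e > 0.0 else floor_db
            for e in edc]

def edc_error(ir_a, ir_b):
    """Mean absolute difference between two EDCs in dB over their common length."""
    a, b = edc_db(ir_a), edc_db(ir_b)
    n = min(len(a), len(b))
    return sum(abs(a[i] - b[i]) for i in range(n)) / n

def exp_decay(rt60_s, length, fs=48000):
    """Synthetic envelope whose amplitude falls by 60 dB after rt60_s seconds."""
    return [10.0 ** (-3.0 * n / (fs * rt60_s)) for n in range(length)]

short_tail = exp_decay(0.05, 4800)
long_tail = exp_decay(0.08, 4800)
print(edc_error(short_tail, short_tail), edc_error(short_tail, long_tail))
```

An EDC-based distance ignores fine temporal detail and responds only to how the energy decays, which is why it is usually combined with a spectral term.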

8.2 Neural FDN (Constrained Learning)

Here the FDN's own parameters are learned by gradient descent, as in a neural network. The orthogonality constraint on the feedback matrix is preserved through parameterization.

import torch
import torch.nn as nn

class FDN(nn.Module):
    def __init__(self, N=8):
        super().__init__()
        self.S = nn.Parameter(torch.randn(N, N))          # unconstrained matrix
        self.delay = nn.Parameter(torch.rand(N) * 2000)   # delay lengths in samples

    def orthogonal_matrix(self):
        K = self.S - self.S.T        # antisymmetric matrix
        return torch.matrix_exp(K)   # exp(K) is orthogonal whenever K is antisymmetric

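Taking the matrix exponential of the antisymmetric matrix S − Sᵀ gives A = exp(S − Sᵀ), which is always orthogonal because exp(K)ᵀ = exp(Kᵀ) = exp(−K) = exp(K)⁻¹ for any antisymmetric K. An orthogonal A makes the feedback loop lossless, so stability then depends only on the decay gain g being below 1.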
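The orthogonality claim can be checked numerically without PyTorch. The sketch below uses a truncated Taylor series as a stand-in for torch.matrix_exp, which is adequate only for small matrices with moderate entries. The matrix size, random seed, and number of series terms are arbitrary assumptions.

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matrix_exp(K, terms=40):
    """Truncated Taylor series exp(K) = sum over k of K^k / k!."""
    n = len(K)
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, K)]   # K^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

rng = random.Random(0)
N = 4
S = [[rng.gauss(0.0, 0.5) for _ in range(N)] for _ in range(N)]    # unconstrained
St = transpose(S)
K = [[S[i][j] - St[i][j] for j in range(N)] for i in range(N)]     # antisymmetric
A = matrix_exp(K)

# Orthogonality check: A^T A should equal the identity.
AtA = matmul(transpose(A), A)
max_error = max(abs(AtA[i][j] - (1.0 if i == j else 0.0))
                for i in range(N) for j in range(N))
print(max_error)
```

The optimizer is free to move S anywhere, yet every resulting A stays on the orthogonal group, which is the point of the parameterization.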

8.3 Composite Loss Function

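# perceptual_loss is defined in 8.1; entropy_loss, mutual_info_loss, and
# decorrelation_loss are assumed helpers that each return a scalar tensor.
def total_loss(x, y, y_target):
    lp = perceptual_loss(y, y_target)   # perceptual match to the target
    le = -entropy_loss(y)               # maximize entropy (negated because the total is minimized)
    lmi = mutual_info_loss(x, y)        # minimize mutual information between input and output
    ld = decorrelation_loss(y)          # decorrelation penalty
    return lp + 0.1 * le + 0.1 * lmi + 0.1 * ld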
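The helper terms are not defined in the text. As one possible reading of the decorrelation term, the sketch below penalizes the squared normalized cross-correlation between two output channels at zero lag. The two-argument signature and the stereo interpretation are assumptions made for illustration, and they differ from the single-argument call above.

```python
import math

def decorrelation_loss(left, right):
    """Squared normalized cross-correlation at zero lag: 0 = uncorrelated, 1 = identical."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    if den == 0.0:
        return 0.0   # a silent channel carries no correlation
    return (num / den) ** 2

N = 480
sine = [math.sin(2.0 * math.pi * 10 * n / N) for n in range(N)]
cosine = [math.cos(2.0 * math.pi * 10 * n / N) for n in range(N)]

print(decorrelation_loss(sine, sine), decorrelation_loss(sine, cosine))
```

Identical channels give a loss of 1 and quadrature signals give a loss near 0, so minimizing this term pushes the left and right tails apart.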

8.4 Neural + Physical Hybrid Model (PINN)

Decomposition

y_early: Ray tracing or geometric acoustics (initial solution of the wave equation)
y_late: FDN(θ) + F_NN(x) (physical constraint + NN correction)

Application of Physics-Informed Neural Networks

The wave equation ∂²p/∂t² = c² ∇²p is incorporated into the loss function:

Loss = DataLoss + PhysicsLoss

PhysicsLoss = ‖∂²p/∂t² - c² ∇²p‖²
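To make the PhysicsLoss term concrete, the sketch below evaluates the wave-equation residual with central finite differences on a sampled pressure field. It is restricted to one spatial dimension for brevity, and the grid spacing, test field, and function names are assumptions. A real PINN would obtain the derivatives by automatic differentiation of the network output instead.

```python
import math

C = 343.0  # speed of sound in m/s

def physics_loss(p, dt, dx, c=C):
    """Mean squared residual of p_tt - c^2 p_xx over interior points; p[n][i] = p(t_n, x_i)."""
    total = 0.0
    count = 0
    for n in range(1, len(p) - 1):
        for i in range(1, len(p[n]) - 1):
            p_tt = (p[n + 1][i] - 2.0 * p[n][i] + p[n - 1][i]) / (dt * dt)
            p_xx = (p[n][i + 1] - 2.0 * p[n][i] + p[n][i - 1]) / (dx * dx)
            total += (p_tt - c * c * p_xx) ** 2
            count += 1
    return total / count

def standing_wave(speed, nt=50, nx=101, dt=1e-5, dx=0.01, k=math.pi):
    """Samples p(x, t) = sin(k x) cos(speed k t), a 1D standing wave."""
    return [[math.sin(k * i * dx) * math.cos(speed * k * n * dt) for i in range(nx)]
            for n in range(nt)]

# A field that propagates at the true speed of sound satisfies the equation up to
# discretization error; one that propagates at half that speed does not.
loss_true = physics_loss(standing_wave(C), dt=1e-5, dx=0.01)
loss_wrong = physics_loss(standing_wave(0.5 * C), dt=1e-5, dx=0.01)
print(loss_true, loss_wrong)
```

The physically consistent field scores many orders of magnitude lower, which is the signal that steers the network toward valid solutions when measured data are scarce.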

Advantages of the Hybrid Approach

Data-efficient (can learn from a small number of measured IRs)
Nonlinear effects (light distortion, compression) can be added via NN
Real-time processing: early via GPU ray tracing, late via lightweight FDN + NN

8.5 Neural IR (Dynamic IR)

Traditional IR: y(n) = x(n) * h(n) (h is fixed). Neural IR: y(n) = Σ_k x(k) · h(n, k) (h is time-dependent).

Time-varying filter: h = h(n, θ) (θ depends on state, time, or input)
Neural network: h = f_θ(x, t) (IR is generated from input)
Characteristics: Indirectly expresses nonlinearity and state dependence. However, dependent on training data and difficult to interpret.
Essence: "Learning the function that generates an IR"
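The difference between the fixed and the time-dependent form can be shown with a direct evaluation of y(n) = Σ_k x(k) · h(n, k). In the sketch below the sum is restricted to causal terms (k ≤ n), and the drifting kernel is a hypothetical stand-in for a learned h = f_θ(x, t). All numeric values are illustrative.

```python
def time_varying_convolve(x, kernel):
    """y(n) = sum over k <= n of x(k) * h(n, k), with h(n, k) supplied by kernel(n, k)."""
    return [sum(x[k] * kernel(n, k) for k in range(n + 1)) for n in range(len(x))]

def fixed_kernel(h):
    """Time-invariant case: h(n, k) depends only on the lag n - k."""
    def kernel(n, k):
        lag = n - k
        return h[lag] if 0 <= lag < len(h) else 0.0
    return kernel

def drifting_kernel(n, k, base_decay=0.5, drift=0.02):
    """Hypothetical time-dependent IR whose decay factor grows with output time n."""
    lag = n - k
    decay = min(0.95, base_decay + drift * n)
    return decay ** lag if lag >= 0 else 0.0

x = [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]
h = [1.0, 0.5, 0.25]

y_fixed = time_varying_convolve(x, fixed_kernel(h))    # ordinary convolution
y_dynamic = time_varying_convolve(x, drifting_kernel)  # IR changes while it plays
print(y_fixed)
print(y_dynamic)
```

With a fixed kernel the result equals ordinary convolution, while the drifting kernel gives the second impulse a longer tail than the first, something a static IR cannot express.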

8.6 LiquidSonics Seventh Heaven vs. M7

Seventh Heaven reproduces the M7 using its proprietary Fusion-IR technology (modulated capture of multiple IRs). It is fundamentally different from static IRs (e.g., Samplicity M7 IR).

Pure IR: Fixed impulse response. Density and decay are "baked in" and static. May produce metallic tones.
Seventh Heaven: Synthesizes multiple IRs via modulation. Reproduces M7's "dynamic modulation" and "living tail"
Evaluation: In blind listening tests, frequently rated as "close to hardware M7"; slightly inferior to the hardware itself, but widely regarded as reproducing over 90% of the M7's character

8.7 Real-Time Control of Eigenvalue Distribution

A model that dynamically controls eigenvalues:

λ_k(t) = r_k(t) · exp(jθ_k(t))

Implementation Methods

Matrix interpolation: A(t) = V Λ(t) V^-1
Rotation update: A(t+1) = R(t) · A(t) (R(t): rotation matrix)
Time-varying feedback matrices (demonstrated by Schlecht and Habets, 2015; see Key References)
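The rotation update A(t+1) = R(t) · A(t) keeps the feedback matrix orthogonal at every step, because a product of orthogonal matrices is orthogonal. A minimal sketch follows, assuming small Givens rotations applied in a fixed cyclic order. The starting matrix, rotation angle, and step count are arbitrary choices.

```python
import math

def givens(n, i, j, angle):
    """n x n rotation by `angle` in the (i, j) coordinate plane."""
    R = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    R[i][i] = math.cos(angle)
    R[j][j] = math.cos(angle)
    R[i][j] = -math.sin(angle)
    R[j][i] = math.sin(angle)
    return R

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def orthogonality_error(A):
    """Largest absolute deviation of A^T A from the identity."""
    n = len(A)
    return max(abs(sum(A[k][i] * A[k][j] for k in range(n)) - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

N = 4
# Scaled 4 x 4 Hadamard matrix: an orthogonal starting point.
A = [[0.5, 0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5, -0.5],
     [0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, -0.5, 0.5]]
A0 = [row[:] for row in A]

for t in range(1000):
    i = t % N
    j = (i + 1) % N
    A = matmul(givens(N, i, j, 0.001), A)   # A(t+1) = R(t) A(t)

print(orthogonality_error(A))
```

Keeping the per-step angle small is what makes the change inaudible as a discontinuity. Rounding error accumulates only slowly, although a long-running implementation may still re-orthogonalize occasionally.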

Effects and Cautions

Metallic tones caused by fixed modes are greatly reduced, because no mode stays at a fixed frequency
Smooth transitions are possible when changing reverb time in real time
Abrupt changes generate audible noise, so the variation should be driven by a slow LFO or limited using perceptual JND criteria

◆ This is an important idea that shifts reverb from a "static structure" to a "dynamic field."

Chapter 9: Volterra Nonlinear Extensions

9.1 Why Nonlinearity Is Needed

IR represents only the "first-order term" of the Volterra series. Real acoustic environments contain the following nonlinearities:

Energy-dependent decay: Louder sounds decay faster
Intermodulation distortion (IMD): Simultaneous frequency components generate sum and difference tones
Dynamic compression: Each reflection is compressed

All of these can be expressed via the second- and third-order terms of the Volterra series. A simple downstream distortion (waveshaper) is insufficient — nonlinearity must be embedded within the convolution process itself.

9.2 Definition of the Volterra Series

Linear system: y(t) = ∫ h₁(τ) x(t-τ) dτ

Volterra series (extended to nonlinear):

y(t) = ∫ h₁(τ) x(t-τ) dτ

+ ∬ h₂(τ₁,τ₂) x(t-τ₁) x(t-τ₂) dτ₁dτ₂

+ ...

h₁: Standard IR (first-order term)
h₂: Nonlinear interaction (second-order term) → harmonics, IMD
h₃ and higher: Higher-order distortion

9.3 Integration into FDN (Linear + Nonlinear)

Nonlinear FDN (Volterra Type)

x_i(n+1) = Σ_j A_ij x_j(n) + Σ_{j,k} B_ijk x_j(n) x_k(n)

Practical Simplification (Lightweight Quadratic Approximation)

// C++ implementation example

y[i] += alpha * x[i] * x[i]; // Second-order Volterra term

◆ α should be a small value on the order of 0.001–0.01. Clipping prevention (e.g., tanh) is essential.
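The effect of the quadratic term is easy to verify on a sine input, since α·sin²(ωn) = α/2 − (α/2)·cos(2ωn): the term adds a DC offset of α/2 and a second harmonic of amplitude α/2. The sketch below measures both and then applies tanh as the clipping guard. The test signal and analysis length are assumptions.

```python
import math

ALPHA = 0.01   # upper end of the 0.001–0.01 range suggested above
N = 480        # number of samples analysed
CYCLES = 10    # whole cycles of the test sine inside N samples

x = [math.sin(2.0 * math.pi * CYCLES * n / N) for n in range(N)]
y = [v + ALPHA * v * v for v in x]       # linear part plus second-order term
y_limited = [math.tanh(v) for v in y]    # soft clipping keeps the output inside (-1, 1)

# DC component and cosine projection at twice the input frequency.
dc = sum(y) / N
second_harmonic = (2.0 / N) * sum(
    y[n] * math.cos(2.0 * math.pi * 2 * CYCLES * n / N) for n in range(N))

print(dc, second_harmonic)
```

The DC offset is worth noting: inside a feedback loop it can accumulate, so a DC blocker is a common companion to this kind of term.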

9.4 Learnable Volterra in PyTorch

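import torch
import torch.nn as nn

class NonlinearFDN(nn.Module):
    def __init__(self, N=8):
        super().__init__()
        self.S = nn.Parameter(torch.randn(N, N))
        self.alpha = nn.Parameter(torch.tensor(0.01))   # learnable second-order coefficient

    def forward(self, x):
        # x: delay-line state with shape (..., N)
        A = torch.matrix_exp(self.S - self.S.T)   # orthogonal feedback matrix
        y_lin = x @ A                             # linear mixing
        y_nl = self.alpha * (x ** 2)              # diagonal second-order Volterra term
        return y_lin + y_nl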

9.5 Dynamic Volterra (Time-Varying Nonlinearity)

Rather than fixing the nonlinear coefficient α, it can be made time- or state-dependent.

Design Patterns for α(t)

Envelope-dependent: α(t) = f(|x(t)|) = 0.01 · tanh(|x(t)|)
Time-dependent (reverb progression): α(t) = α₀ · exp(-t/τ) (stronger in early stage / weaker in late stage)
Random modulation: α(t) = α₀ + ε(t) (metallic tone avoidance)

The essence: By making the nonlinearity a "state-dependent system," a more natural spatial response is simulated.
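The three design patterns for α(t) can be sketched as small functions. The envelope formula is taken from the list above, while the time constant τ, the random spread, and the helper names are illustrative assumptions.

```python
import math
import random

def alpha_envelope(x, alpha_max=0.01):
    """Envelope-dependent: alpha grows with input magnitude and saturates at alpha_max."""
    return alpha_max * math.tanh(abs(x))

def alpha_time(t, alpha0=0.01, tau=0.5):
    """Time-dependent: strongest at the start of the tail, decaying with time constant tau."""
    return alpha0 * math.exp(-t / tau)

def alpha_random(alpha0=0.01, spread=0.001, rng=random):
    """Random modulation: alpha0 plus a small uniform perturbation."""
    return alpha0 + rng.uniform(-spread, spread)

def nonlinear_step(x, alpha):
    """Applies the second-order term with the given alpha, followed by soft clipping."""
    return math.tanh(x + alpha * x * x)

rng = random.Random(0)
for x, t in [(0.1, 0.0), (0.8, 0.25), (0.3, 1.0)]:
    alphas = (alpha_envelope(x), alpha_time(t), alpha_random(rng=rng))
    print([round(nonlinear_step(x, a), 6) for a in alphas])
```

All three variants keep α at or near the small range recommended in 9.3, so the nonlinearity remains a correction to the linear FDN and never becomes the dominant effect.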

9.6 Neural Volterra (Learned Nonlinear Kernel)

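The Volterra kernel h₂(τ₁, τ₂) grows quadratically with memory length, and higher-order kernels grow faster still. To avoid this dimensionality explosion, the kernel is approximated with a neural network instead of being stored explicitly.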

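import torch.nn as nn

class NeuralVolterra(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(128, 64),   # input: one frame of 128 samples
            nn.ReLU(),
            nn.Linear(64, 1)      # output: one correction value per frame
        )

    def forward(self, x):
        return self.net(x)  # x is a time-series frame with shape (..., 128)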

9.7 FDN + Neural Volterra Integrated Model

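# Assumes the FDN class from 8.2 is extended with a forward() method, and that
# frame_signal(x) is a helper that slices x into 128-sample frames, one per output sample.
class HybridReverb(nn.Module):
    def __init__(self):
        super().__init__()
        self.fdn = FDN()             # linear reverb (primary component)
        self.nl = NeuralVolterra()   # learned nonlinear correction

    def forward(self, x):
        y_lin = self.fdn(x)
        y_nl = self.nl(frame_signal(x))    # shape (..., frames, 1)
        return y_lin + y_nl.squeeze(-1)    # drop only the trailing singleton dimension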

9.8 Ensuring Stability

Lipschitz constraint: Guarantee ‖N(x) − N(x′)‖ ≤ L‖x − x′‖, which with N(0) = 0 gives ‖N(x)‖ ≤ L‖x‖
Spectral normalization: torch.nn.utils.spectral_norm(layer)
Output clamping: y = torch.tanh(y) (also applies on the C++ side)
Denormal mitigation: To prevent CPU load spikes caused by denormal (extremely small) floating-point values, add low-level noise or enable flush-to-zero
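Spectral normalization bounds a linear layer's Lipschitz constant by dividing its weight matrix by an estimate of its largest singular value, typically obtained by power iteration. The sketch below shows that idea in plain Python. It is not PyTorch's implementation: the fixed starting vector and iteration count are simplifications, and the example matrix is arbitrary.

```python
import math

def spectral_norm(W, iterations=50):
    """Estimates the largest singular value of W by power iteration on W^T W."""
    rows, cols = len(W), len(W[0])
    v = [1.0] + [0.0] * (cols - 1)   # fixed start vector (a random one is more robust)
    for _ in range(iterations):
        u = [sum(W[r][c] * v[c] for c in range(cols)) for r in range(rows)]   # u = W v
        v = [sum(W[r][c] * u[r] for r in range(rows)) for c in range(cols)]   # v = W^T u
        norm = math.sqrt(sum(val * val for val in v))
        v = [val / norm for val in v]
    u = [sum(W[r][c] * v[c] for c in range(cols)) for r in range(rows)]
    return math.sqrt(sum(val * val for val in u))   # ||W v|| for the converged unit v

def normalize(W, target=1.0):
    """Rescales W so that its largest singular value is approximately `target`."""
    sigma = spectral_norm(W)
    return [[target * w / sigma for w in row] for row in W]

W = [[3.0, 0.0],
     [4.0, 5.0]]
sigma = spectral_norm(W)   # the exact value for this matrix is sqrt(45)
W_hat = normalize(W)
print(sigma, spectral_norm(W_hat))
```

After normalization the layer cannot increase the norm of its input, which keeps a nonlinearity placed inside the feedback loop from adding energy on every pass.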

9.9 Recommended Implementation Pipeline

The following processing pipeline is the recommended configuration for a high-quality reverb with nonlinear components:

Input → Early Reflections (physical / geometric model)
→ FDN (linear; the primary component)
→ Dynamic Volterra (lightweight; adds naturalness)
→ Neural Volterra (correction; nonlinear IR approximation)
→ Output

Implementation Notes

The Volterra terms are small; the FDN is the primary component
The NN plays a supplementary role
Nonlinearity can be placed both inside the feedback loop (most important) and at the output stage

Chapter 10: Unified Theory and Future Directions

10.1 Hierarchical Structure of Reverb Design

Level	Perspective / Approach
Level 1: Physical	Wave equation: ∂²p/∂t² = c² ∇²p
Level 2: Structural	FDN / Waveguide Network (low-dimensional wave approximation)
Level 3: Statistical	Mode distribution, diffusion, energy distribution
Level 4: Perceptual	JND, masking, ERB filters, perceptual loss
Level 5: Learning	Differentiable FDN, PINN, Neural Volterra
Level 6: Information	Entropy maximization, mutual information minimization

10.2 Reinterpreting Commercial Reverbs Through Unified Theory

Commercial units are the result of optimizing — through human ears — the following implicit objective function:

J(θ) = d_spec + d_time + d_info + d_mask

d_spec: Spectral naturalness (suppression of comb peaks, high-frequency decay, low-frequency extension)
d_time: Temporal naturalness (monotonically increasing density, randomness)
d_info: Information diffusion (decay of input dependency, decorrelation)
d_mask: Masking adaptation (concealing unpleasant components)

The Essence of Each Commercial Unit

EMT 250: Imperfection as the perceptual optimum. Warm character in low dimensionality
Lexicon 480L: The pinnacle of perceptual hacking. Multiband + randomization + Loop Tank
Bricasti M7: Physical approximation (Waveguide) creates a tangible sense of "the room exists"
Modern plugins (DiffFDN, etc.): Learned integration of all three approaches

10.3 The Final Definition of a "Good Reverb"

Synthesizing all perspectives, a good reverb is one that satisfies the following conditions:

Eigenvalues are uniformly distributed on the unit circle (no modal bias)
Energy diffuses smoothly over time (approximating exponential decay)
Each frequency band has an appropriate decay time (low-frequency RT60 > high-frequency RT60)
Temporal fluctuation is present (modulation = controlled randomness)
Perceptually optimized (errors below JND are ignored)
Computationally feasible (meets real-time constraints)

In one phrase:

"A dynamic system in which the energy, information, and perception of sound undergo controlled diffusion over time"

10.4 Future Research Frontiers

Perceptual entropy-maximizing reverb: Optimization using entropy directly as the objective function
Real-time adaptive reverb: PINN that learns and adapts during a live session
Full wave + neural integration: Dynamic spatial rendering for VR/AR
Automatic generation of optimal delay lengths (quantitative design based on eigenvalue distribution)
Closed-form methods for direct eigenvalue design in FDN
Measurement and reproduction of nonlinear IRs (real-world Volterra kernel capture)

10.5 Key References

Schlecht, S.J., Habets, E.A.P. "On lossless feedback delay networks." IEEE Trans. Signal Processing, 2016.
Rocchesso, D., Smith, J.O. "Circulant and elliptic feedback delay networks for artificial reverberation." IEEE Trans. Speech & Audio Processing, 1997.
Schlecht, S.J., Habets, E.A.P. "Time-varying feedback matrices in FDN..." JASA, 2015.
Schlecht, S.J., Habets, E.A.P. "Scattering in feedback delay networks." IEEE/ACM Trans. Audio, 2020.
Dattorro, J. "Effect design, part 1: Reverberator and other filters." JAES, 1997.
Mezzadri, F. "How to generate random matrices from the classical compact groups." arXiv:math-ph/0609050, 2006.
Gerzon, M. "Synthetic Stereo Reverberation (Part 1 & 2)." Studio Sound, 1971-1972.
Smith, J.O. Physical Audio Signal Processing - Feedback Delay Networks (FDN). CCRMA, 2010.
Getting Started With Reverb Design, Part 2: The Foundations - Valhalla DSP.
