
GPT-5.3 Instant / Sonnet 4.6 Ext

Reverb Algorithm

Technical Reference Document

Integrated Overview of Theory, Implementation, Perception, and Modern Approaches

Freeverb3 / FDN / Dattorro / Lexicon / Convolution Reverb / Neural FDN

Chapter 1: The Nature of Reverb — An Overview

1.1 What Is Reverb?

When listening to music in a large concert hall, the sound continues to resonate for some time after the instrument stops playing. Even during the performance, the sound resonates in a way that adds richness and luster to the music — this is why performing in a hall is preferred. Suntory Hall in Akasaka, Tokyo, is one of the most celebrated examples of a hall prized for its beautiful acoustics.

Reverb is not merely a "residual sound" — it is the spatial information itself, determining the depth, sense of distance, and texture of the sound. Acoustically, two temporal domains coexist simultaneously:

1.2 The Two Major Methods of Reverb Generation

Recording in a real concert hall is often impractical because of time and cost constraints, so in CD mastering and similar applications, software is used to recreate a hall's acoustic properties artificially. Freeverb3 is a well-known example. There are two primary methods for generating such reverberation:

Method

Overview / Characteristics

FIR (Convolution Reverb)

Records and uses the actual impulse response of a real space. Highly realistic but less flexible and computationally expensive. "Technology for copying a real hall."

IIR (Algorithmic Reverb)

Artificially generates reverb using mathematical models. Lightweight, adjustable, and creatively flexible, but complex to design. "Technology for creating the feeling of a hall." Used in nearly all commercial hardware units such as Lexicon.

1.3 The Nature of Depth Perception in Sound

The front-to-back perception (depth) of a sound is primarily determined by three factors:

In other words, depth perception is reverb design itself.

1.4 Human Perceptual Characteristics and Reverb Design

Humans do not perceive individual reflections; rather, they perceive patterns of energy distribution and temporal change. What is needed in a mix is not precise reflection placement but density, decay, and frequency balance.

FIR (IR) reverb can be difficult to use in certain contexts for the following reasons:

By contrast, algorithmic reverb functions as a "perceptually optimized fiction" and proves advantageous in contexts where musical pleasantness is the priority — pop music, film scores, and game audio.

Chapter 2: The Lineage of Algorithms

2.1 Schroeder Reverb (Classic / Foundational Form)

The oldest and most widely implemented form — including in free software — is the reverb based on a combination of comb filters and allpass filters. A detailed explanation is provided on the ARI-WEB website. The precursor to Freeverb3, known as Freeverb, is based on this algorithm, which is referred to as "Schroeder Reverb."

Structure

Characteristics

The fundamental issue is modal unevenness — i.e., insufficient randomness. The comb filter produces equally spaced frequency peaks (analogous to a standing-wave structure), which is the root cause of the characteristic metallic "ping" sound.

As a result, it is rarely used in applications demanding high quality. However, its metallic character is sometimes exploited as a substitute for a plate reverb (a real unit that inputs sound into a metal plate and outputs the vibration). Freeverb generates its filters using empirically derived values and is considered to have relatively less metallic character, though its limitations remain.
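The "equally spaced peaks" can be made concrete with a minimal sketch of a single feedback comb filter, y(n) = x(n) + g·y(n-D), the basic building block of Schroeder reverb. The delay D and gain g are free, illustrative parameters, not Freeverb's tuned values. The impulse response is a train of echoes every D samples. In the frequency domain, that corresponds to resonant peaks spaced fs/D apart.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Feedback comb filter: y(n) = x(n) + g * y(n - D).
// Its impulse response is 1, g, g^2, ... spaced D samples apart, which in the
// frequency domain gives resonant peaks every fs / D Hz (the "ping").
std::vector<double> combImpulseResponse(std::size_t delay, double gain,
                                        std::size_t length) {
  assert(delay > 0 && gain > -1.0 && gain < 1.0);
  std::vector<double> y(length, 0.0);
  for (std::size_t n = 0; n < length; ++n) {
    const double x = (n == 0) ? 1.0 : 0.0;              // unit impulse input
    const double fb = (n >= delay) ? y[n - delay] : 0.0;  // y(n - D)
    y[n] = x + gain * fb;
  }
  return y;
}
```

For example, D = 1000 samples at fs = 48 kHz places the peaks 48 Hz apart. Schroeder-style designs run several combs with unrelated D in parallel to blur this regular structure.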

2.2 Improved Schroeder Variants (Implementations in Freeverb3_VST)

Implementation

Characteristics

Freeverb

Improved Schroeder type. Lightweight and simple. Metallic character relatively suppressed through empirical parameters.

CCRMA NRev

A more refined version of the same algorithm as Freeverb.

NVerb (v2)

Uses nested allpass filters plus input feedback to minimize ringing as much as possible. Improved low-frequency resonance.

Nested Allpass Filter Structure

In a standard (series) configuration, the signal passes input → AP → AP → AP. In a nested structure, another allpass filter (or a more complex reverb section) is placed inside the delay path of an allpass filter. This results in:

It is not widely known that nested allpass filters outperform series allpass filters in quality, but Schroeder himself referenced this algorithm long ago.

◆ Adding modulation to the feedback section of a comb filter (making it a chorus) tends to suppress ringing. The Lexicon Concert Hall algorithm is believed to be based on "Chorus comb filter + allpass filter + nested allpass filter."

2.3 FDN (Feedback Delay Networks)

FDN is one of the methods developed to address the limitations of Schroeder reverb. It consists of several delay lines of different lengths, together with a matrix that mixes their outputs and feeds them back. In Freeverb3, this is implemented as Hibiki Reverb.

Basic Model (Mathematical Formulation)

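An FDN can be described as a multi-input, multi-output recursive system as follows:

x(n) = A · x(n-D) + B · u(n)

y(n) = C · x(n)

Here u(n) is the input and y(n) the output. x(n) is the vector of signals written into the delay lines, and x(n-D) is the vector whose i-th element is x_i(n - m_i), delayed by that line's own length m_i. A is the feedback matrix; B and C are the input and output gains.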

Stability

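The spectral-radius condition ρ(A) < 1 alone is not sufficient for stability. With delays in the loop, the effective feedback depends on the delay lengths, and a non-normal A with ρ(A) < 1 can still diverge for some delay choices. A more reliable condition is: "stable when A is unitary (or orthogonal) and attenuation is applied."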

Eigenvalues and Sound

In other words, an FDN is a "collection of many decaying oscillation modes." If the eigenvalues are uneven, specific frequencies are emphasized (ringing); if the delay lengths form simple ratios, modes overlap. This is the same structural issue that causes the Schroeder reverb to sound metallic.

Eight or More Delay Lines and the Hadamard Matrix

Using eight or more delay lines together with a Hadamard matrix makes it relatively straightforward to generate high-quality reverberation. Many implementations also vary the matrix in real time to imitate the phase fluctuations found in real reverberation. The approach also accommodates algorithmic generation of early reflections and is widely used.

2.4 Loop Tank Reverb

Vintage digital reverbs such as the Lexicon 224 and EMT 250 are highly regarded for their "warm" and beautiful sound — much of which stems from their loop-based (tank) algorithm using allpass filters. By constructing a large loop, a reverb can be created that, like FDN, increases in density over time.

EMT 250 Structure

While the algorithm itself is simple, finding the optimal delay lengths and gains is challenging.

Lexicon 224 Structure (Dattorro Algorithm)

The Lexicon 224 algorithm was reverse-engineered, implemented as the ESP reverb, and published in an academic paper. The ESP manual is mirrored on the Freeverb3 website.

The feedback loop is relatively short, so the algorithm is classified as a plate reverb, but other characters can be obtained by adjusting the coefficients. In Freeverb3_VST, it is implemented as STRev. It is also implemented separately as ProG Reverb (Progenitor Reverb), which uses a different loop structure optimized for hall use.

◆ The main feedback loop of Progenitor Reverb is based on the published Dattorro algorithm, but the input diffuser and output taps are original. The feedback coefficient inside the loop and diffuser is approximately 0.5 as a baseline, with experimentation suggesting an optimal range of 0.4–0.8.

Dattorro Algorithm Block Structure

The overall structure follows this flow: Input → Pre-delay → Multiple Allpass (diffuser) → Main Loop (2-channel cross-feedback, with damping applied inside the loop) → Tap outputs (multiple points).

The core is the cross-feedback. The output of the left half of the loop feeds the right half, and the output of the right half feeds back into the left, so the two halves form one long figure-eight loop. Because signals from both sides keep mixing on every pass, echo density builds up very quickly.

Component

Perceptual Role

Allpass diffuser (immediately after input)

Disperses transients, eliminates early echoes — "diffusion at the moment of entering the space"

Allpass in the loop

Additional diffusion. Optimal feedback coefficient: 0.4–0.8

Damping (LPF)

Naturally attenuates high frequencies. Without it, the sound is "digitally sterile"

Modulation

Breaks fixed modes over time. Removes metallic quality; gives a "living" sense of animation

Tap outputs (multiple points)

Output from different temporal positions and phases. Generates stereo image and spatial width

The essential difference from FDN: FDN mixes all signals at once via a matrix, whereas Dattorro mixes structurally (gradually and temporally). This makes it more "musically" controllable.
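Both the input diffuser and the in-loop diffusion in the table above are built from Schroeder allpass sections. The following is a minimal sketch of one section. The delay D and gain g are free parameters (the 0.4–0.8 range quoted above is a reasonable starting point), and the class name is illustrative. The magnitude response is flat, so the total energy of the impulse response is 1, but the impulse is spread into a decaying echo train.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Schroeder allpass: y(n) = -g*x(n) + x(n-D) + g*y(n-D).
// Flat magnitude response, but an impulse is smeared into a decaying echo
// train, which is what a diffuser stage relies on.
class SchroederAllpass {
 public:
  SchroederAllpass(std::size_t delay, double gain)
      : x_(delay, 0.0), y_(delay, 0.0), g_(gain) {
    assert(delay > 0 && gain > -1.0 && gain < 1.0);
  }

  double process(double in) {
    const double xd = x_[pos_];  // x(n - D), written D samples ago
    const double yd = y_[pos_];  // y(n - D)
    const double out = -g_ * in + xd + g_ * yd;
    x_[pos_] = in;
    y_[pos_] = out;
    pos_ = (pos_ + 1) % x_.size();
    return out;
  }

 private:
  std::vector<double> x_, y_;  // circular histories of input and output
  double g_;
  std::size_t pos_ = 0;
};
```

Chaining several of these with short, mutually prime delays gives an input diffuser in the spirit of the block diagram above. The exact delays and gains of the published design are not reproduced here.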

2.5 Frequency Division (Multiband Processing)

More recent Lexicon reverbs use frequency division to apply the most appropriate algorithm to each band when generating reverberation.

The Bass XOV parameter is a mechanism for dividing and processing low-frequency content to extend the low-band decay time, among other functions. Lexicon adopted this division algorithm quite early on, resulting in a reverb with powerful and warm low-frequency response. This approach is also physically justified, as low frequencies have low directivity in real environments while high frequencies have high directivity.
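Extending the low-band decay time amounts to giving the low band a larger per-pass loop gain. The sketch below assumes the standard relation that a signal recirculating through M samples of delay with gain g decays by 60 dB after RT60 seconds when g = 10^(-3·M / (RT60·fs)). The two-band helper and its bassMultiplier parameter are hypothetical; this is not Lexicon's actual implementation.

```cpp
#include <cassert>
#include <cmath>

// Per-pass gain g for a delay of `delaySamples` so that the recirculating
// signal decays by 60 dB after `rt60Seconds`: g = 10^(-3 * M / (RT60 * fs)).
double loopGainForRt60(double delaySamples, double rt60Seconds,
                       double sampleRate) {
  assert(delaySamples > 0 && rt60Seconds > 0 && sampleRate > 0);
  return std::pow(10.0, -3.0 * delaySamples / (rt60Seconds * sampleRate));
}

struct BandGains {
  double low;   // gain for content below the crossover
  double high;  // gain for content above the crossover
};

// Hypothetical two-band version: the low band's RT60 is the main RT60
// multiplied by bassMultiplier (a sketch of the idea, not Lexicon's code).
BandGains twoBandLoopGains(double delaySamples, double rt60Seconds,
                           double bassMultiplier, double sampleRate) {
  return {loopGainForRt60(delaySamples, rt60Seconds * bassMultiplier, sampleRate),
          loopGainForRt60(delaySamples, rt60Seconds, sampleRate)};
}
```

With M = 4800 samples, fs = 48 kHz and RT60 = 2 s, g ≈ 0.708. A bass multiplier of 1.5 raises the low-band gain to about 0.794.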

Perceptual Significance of Frequency Dependency

From an information-theoretic perspective: the low band achieves entropy increase (richness) early on, while the high band rapidly achieves a reduction in mutual information (diffusion).

2.6 Evolutionary Lineage of Algorithms (Summary)

Generation | Representative | Core Structure | Sonic Character

1st Gen | EMT 250 (1976) | Short loop + Allpass | Imperfect but warm, granular quality

2nd Gen | Lexicon 224 / 480L | Perceptual optimization + multiband + randomization | Musical, magical tail

3rd Gen | Bricasti M7 | Waveguide-like physical approximation | Reproduces real spatial feel

4th Gen | DiffFDN / PINN | Learning-based integration | Unified perception, physics, and information

Chapter 3: Convolution and IR Reverb Technology

3.1 Accurate Definition of "Convolution"

Convolution is the mathematical operation written below. The literal rendering "folding" is sometimes seen in non-technical contexts, but the correct technical term is convolution.

Mathematical definition:

y(n) = Σ_k x(k) · h(n-k)

Acoustic meaning: IR = the "complete response" of a space (or device); convolution = "computing the result of playing the input signal in that space."
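The sum above translates directly into code. The following is a minimal time-domain sketch (the function name is illustrative). Each input sample x(k) adds a copy of h that is shifted by k and scaled by x(k). The cost is O(N·M) for signal length N and IR length M, which is why long IRs are handled with the FFT methods of Section 3.4.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct (time-domain) linear convolution: y(n) = sum_k x(k) * h(n - k).
// Output length is x.size() + h.size() - 1. Cost is O(N * M).
std::vector<double> convolve(const std::vector<double>& x,
                             const std::vector<double>& h) {
  assert(!x.empty() && !h.empty());
  std::vector<double> y(x.size() + h.size() - 1, 0.0);
  for (std::size_t k = 0; k < x.size(); ++k)    // each input sample x(k)...
    for (std::size_t m = 0; m < h.size(); ++m)  // ...adds x(k) * h shifted by k
      y[k + m] += x[k] * h[m];
  return y;
}
```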

3.2 IR and Deconvolution (Inverse Filter)

Converting between microphone types (e.g., SM57 → U87) requires deconvolution (an inverse filter).

H_IR(ω) = Y(ω) / X(ω)

A practical issue is noise amplification due to "division by near-zero" at frequencies where the denominator approaches zero. Regularization and band limiting are used as countermeasures.
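One common form of regularization (Tikhonov-style) replaces Y/X with Y·conj(X) / (|X|² + ε). The sketch below applies it bin by bin to spectra that are assumed to have been computed already, for example with an FFT. The function name and the choice of ε are illustrative.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Spectrum = std::vector<std::complex<double>>;

// Regularized spectral division:
//   H(w) = Y(w) * conj(X(w)) / (|X(w)|^2 + eps)
// With eps = 0 this is plain Y / X; eps > 0 bounds the gain at bins where
// |X| is close to zero instead of amplifying noise without limit.
Spectrum regularizedDeconvolve(const Spectrum& Y, const Spectrum& X,
                               double eps) {
  assert(Y.size() == X.size() && eps >= 0.0);
  Spectrum H(Y.size());
  for (std::size_t k = 0; k < Y.size(); ++k)
    H[k] = Y[k] * std::conj(X[k]) / (std::norm(X[k]) + eps);  // norm = |X|^2
  return H;
}
```

Since |X| / (|X|² + ε) ≤ 1 / (2√ε), ε directly caps the gain at bins where X is nearly zero. A larger ε suppresses more noise but costs accuracy wherever X is small.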

3.3 IR Measurement: Using Sweep Signals

Ideally, a delta function input would be used, but in practice, the signal-to-noise ratio is too poor. The modern standard method uses a sine sweep or TSP (Time-Stretched Pulse).

Procedure

Deconvolution: h(n) = y(n) * x^-1(n), where * denotes convolution and x^-1(n) is the inverse filter of the sweep x(n).
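A common sweep choice is the exponential (logarithmic) sine sweep. The sketch below generates one, along with its inverse filter x^-1(n): the time-reversed sweep with an envelope falling 6 dB per octave. The function names and parameters are illustrative. Overall gain normalization of the inverse filter is omitted, so the recovered h(n) is correct only up to a constant factor and appears delayed by about one sweep length.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Exponential sine sweep from f1 to f2 Hz over `seconds`:
//   x(t) = sin(2*pi*f1*T/L * (exp(t*L/T) - 1)),  L = ln(f2/f1)
// Its instantaneous frequency rises from f1 at t = 0 to f2 at t = T.
std::vector<double> expSweep(double f1, double f2, double seconds, double fs) {
  assert(f1 > 0 && f2 > f1 && seconds > 0 && fs > 2 * f2);
  const double T = seconds;
  const double L = std::log(f2 / f1);
  const auto n = static_cast<std::size_t>(T * fs);
  std::vector<double> x(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double t = static_cast<double>(i) / fs;
    x[i] = std::sin(2.0 * kPi * f1 * T / L * (std::exp(t * L / T) - 1.0));
  }
  return x;
}

// Inverse filter: the time-reversed sweep with an envelope exp(-t*L/T),
// i.e. falling 6 dB per octave, which compensates the sweep's pink spectrum.
std::vector<double> expSweepInverse(const std::vector<double>& sweep,
                                    double f1, double f2, double fs) {
  const double T = static_cast<double>(sweep.size()) / fs;
  const double L = std::log(f2 / f1);
  std::vector<double> inv(sweep.size());
  for (std::size_t i = 0; i < sweep.size(); ++i) {
    const double t = static_cast<double>(i) / fs;
    inv[i] = sweep[sweep.size() - 1 - i] * std::exp(-t * L / T);
  }
  return inv;
}
```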

Why Sweeps Are Superior

3.4 Real-Time Convolution and FFT Acceleration

Computational Complexity

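FFT convolution formula: y = IFFT(FFT(x) · FFT(h)). Both x (length N) and h (length M) must be zero-padded to at least N + M - 1 points so that the circular convolution computed by the FFT equals linear convolution.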
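The formula can be checked with a small self-contained sketch. It uses a textbook recursive radix-2 FFT, chosen for brevity rather than speed, and zero-pads to a power of two of at least N + M - 1 points. Function names are illustrative; production code would use an optimized FFT library.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;
constexpr double kPi = 3.14159265358979323846;

// Recursive radix-2 FFT; a.size() must be a power of two. `inverse` flips
// the twiddle sign and divides by 2 at each level (1/n overall).
void fft(std::vector<cd>& a, bool inverse) {
  const std::size_t n = a.size();
  if (n == 1) return;
  std::vector<cd> even(n / 2), odd(n / 2);
  for (std::size_t i = 0; i < n / 2; ++i) {
    even[i] = a[2 * i];
    odd[i] = a[2 * i + 1];
  }
  fft(even, inverse);
  fft(odd, inverse);
  const double sign = inverse ? 1.0 : -1.0;
  for (std::size_t k = 0; k < n / 2; ++k) {
    const cd w = std::polar(1.0, sign * 2.0 * kPi * k / n) * odd[k];
    a[k] = even[k] + w;
    a[k + n / 2] = even[k] - w;
  }
  if (inverse) for (auto& v : a) v /= 2.0;
}

// Linear convolution via y = IFFT(FFT(x) * FFT(h)) with zero padding.
std::vector<double> fftConvolve(const std::vector<double>& x,
                                const std::vector<double>& h) {
  assert(!x.empty() && !h.empty());
  const std::size_t outLen = x.size() + h.size() - 1;
  std::size_t n = 1;
  while (n < outLen) n *= 2;  // power of two >= N + M - 1
  std::vector<cd> X(x.begin(), x.end()), H(h.begin(), h.end());
  X.resize(n);  // new elements are zero
  H.resize(n);
  fft(X, false);
  fft(H, false);
  for (std::size_t k = 0; k < n; ++k) X[k] *= H[k];
  fft(X, true);
  std::vector<double> y(outLen);
  for (std::size_t i = 0; i < outLen; ++i) y[i] = X[i].real();
  return y;
}
```

For the same inputs, fftConvolve matches the direct sum to within floating-point rounding.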

Latency Problem

Because FFT operates on blocks, a delay equal to the block size is introduced (e.g., 1024 samples ≈ 21 ms at 48 kHz, or 23 ms at 44.1 kHz).

Solution: Hybrid Method

This achieves both low latency and fast processing simultaneously.

3.5 Fundamental Limitations of IR (LTI Systems Only)

IR can only represent LTI (Linear Time-Invariant) systems.

Can be reproduced: frequency characteristics, reverb, and phase response; general linear spatial response.

Cannot be reproduced: even-order harmonic distortion in tube amplifiers (nonlinear distortion); nonlinear compression effects of speakers; FM-type modulation (time-dependent frequency variation).

IR can reproduce the "shape" of a sound but not its "behavior." Additionally, while the characteristics of recording equipment (speakers, microphones) can be partially removed through deconvolution, complete removal is impossible due to constraints of SNR, dynamic range, and distortion.

3.6 Software History and Major Products

Representative convolution reverb software of the era:

Software

Characteristics

SIR (Windows-only, freeware)

Initially high latency; progressively improved across versions

WAVES IR-1 / IR-L

Industry-standard commercial products

Altiverb

Supports stereo I/O using 4-channel IR. High spatial accuracy and high-quality bundled library. However, requires iLok and is expensive.

Voxengo Pristine Space

High-value commercial product

WizooVerb W2

Commercial product

◆ A procedure for converting Altiverb's proprietary format (little-endian 24-bit PCM + gain correction data in the resource fork) to WAV was shared publicly. Free IR distribution sites such as Noisevault, Voxengo, and memi.com were also widely used.

3.7 Creative Applications of IR

IR is not limited to "recording spaces" — it has a wide range of creative applications:

Chapter 4: FDN Implementation (C++)

4.1 Basic FDN Implementation

Below is a basic C++ implementation of an FDN. The template parameter length specifies the dimension of the FDN. Assuming matrix is orthogonal, feedback in (-1.0, 1.0) makes the output decay. A value of ±1.0 gives a lossless (non-decaying) network, and values outside [-1.0, 1.0] cause the system to diverge.

#include <array>
#include <cstddef>
#include <numeric>

// Delay<Sample> (interpolated delay line) and RateLimiter<Sample> (smoothed
// delay time) are helper classes that are not shown here.
template<typename Sample, size_t length>
struct FeedbackDelayNetwork {
  size_t bufIndex = 0;
  std::array<std::array<Sample, length>, 2> buf{};          // double buffer of delay outputs
  std::array<std::array<Sample, length>, length> matrix{};  // feedback matrix
  std::array<Delay<Sample>, length> delay;
  std::array<RateLimiter<Sample>, length> delayTimeSample;

  Sample process(Sample input, Sample feedback) {
    bufIndex ^= 1;
    auto &front = buf[bufIndex];
    auto &back = buf[bufIndex ^ 1];  // delay outputs from the previous sample

    // front = matrix * back
    front.fill(0);
    for (size_t i = 0; i < length; ++i)
      for (size_t j = 0; j < length; ++j)
        front[i] += matrix[i][j] * back[j];

    // Split the input evenly, add the scaled feedback, and run each delay line
    input /= Sample(length);
    for (size_t idx = 0; idx < length; ++idx) {
      auto &&sig = input + feedback * front[idx];
      front[idx] = delay[idx].process(sig, delayTimeSample[idx].process());
    }
    return std::accumulate(front.begin(), front.end(), Sample(0));
  }
};

◆ Reset-related methods are omitted. When the template parameter length is large, rewriting with std::vector is recommended (approximately dim=200 is the practical upper limit).

Structure of the Processing Loop

The argument feedback is a scalar coefficient that uniformly scales the feedback matrix values. Delay performs linear interpolation, which acts as a mild lowpass filter. As a result, the signal gradually loses high-frequency energy when the delay time is non-integer.
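Because Delay and RateLimiter are not shown, the structure above cannot be compiled on its own. The following self-contained sketch implements the same loop under simplifying assumptions:
- four lines with hypothetical, mutually prime integer delays
- a fixed normalized 4×4 Hadamard matrix (orthogonal) in place of the matrix member
- no interpolation or delay-time smoothing

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal 4-line FDN: integer delays, fixed normalized Hadamard mixing,
// scalar feedback g, input split evenly, output = sum of delay outputs.
class TinyFdn {
 public:
  TinyFdn(std::array<std::size_t, 4> delays, double feedback) : g_(feedback) {
    assert(feedback > -1.0 && feedback < 1.0);
    for (std::size_t i = 0; i < 4; ++i) {
      assert(delays[i] > 0);
      lines_[i].assign(delays[i], 0.0);
    }
  }

  double process(double input) {
    // Oldest sample of each circular buffer = line output x_i(n - m_i).
    std::array<double, 4> out{};
    for (std::size_t i = 0; i < 4; ++i) out[i] = lines_[i][pos_[i]];

    // Mix with H4 / 2 (orthogonal), scale by g, add the split input, write back.
    static constexpr double h[4][4] = {
        {1, 1, 1, 1}, {1, -1, 1, -1}, {1, 1, -1, -1}, {1, -1, -1, 1}};
    for (std::size_t i = 0; i < 4; ++i) {
      double mixed = 0.0;
      for (std::size_t j = 0; j < 4; ++j) mixed += 0.5 * h[i][j] * out[j];
      lines_[i][pos_[i]] = input / 4.0 + g_ * mixed;
      pos_[i] = (pos_[i] + 1) % lines_[i].size();
    }
    return out[0] + out[1] + out[2] + out[3];
  }

 private:
  std::array<std::vector<double>, 4> lines_;
  std::array<std::size_t, 4> pos_{};
  double g_;
};
```

With feedback = 0.9 the impulse response decays steadily. Because the mixing matrix is orthogonal, any |feedback| < 1 keeps the network stable.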

4.2 The Nature of Stability

Stability Condition

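Condition for the FDN not to diverge, for a normal feedback matrix such as g·A with A orthogonal: ρ(g·A) ≤ 1, where ρ denotes the spectral radius. Equality gives a lossless (non-decaying) network; strict inequality gives decay.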

Meaning of Orthogonal Matrices

For an orthogonal matrix A, A^T · A = I holds. This ensures |λ_i| = 1 and thus energy is preserved:

‖y‖² = ‖Ay‖²

That is, an orthogonal matrix + |g| < 1 guarantees stability. Attenuation is controlled solely by g.

◆ According to Schlecht and Habets' "On lossless feedback delay networks," making the feedback matrix unitary or triangular prevents an FDN from diverging.

4.3 Designing Delay Lengths

Key Conditions

Bad and Good Examples

Automatic Generation Algorithms

The optimal solution cannot be determined analytically; it is a hybrid of search and heuristics.
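One widely used heuristic works in two steps. First, spread target lengths geometrically between a minimum and a maximum. Then move each target up to the next unused prime. Distinct primes are pairwise coprime, so no two lines share a common period. The function names and ranges are illustrative, and the result is a starting point rather than an optimized search.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

bool isPrime(std::size_t n) {
  if (n < 2) return false;
  for (std::size_t d = 2; d * d <= n; ++d)
    if (n % d == 0) return false;
  return true;
}

// Geometrically spaced targets between minLen and maxLen, each moved up to
// the next prime that is larger than the previous length. Distinct primes
// are pairwise coprime, so the lines never share a common period.
std::vector<std::size_t> primeDelayLengths(std::size_t count,
                                           std::size_t minLen,
                                           std::size_t maxLen) {
  assert(count >= 2 && minLen >= 2 && maxLen > minLen);
  std::vector<std::size_t> out;
  const double ratio = std::pow(static_cast<double>(maxLen) / minLen,
                                1.0 / static_cast<double>(count - 1));
  for (std::size_t i = 0; i < count; ++i) {
    auto candidate = static_cast<std::size_t>(
        std::lround(minLen * std::pow(ratio, static_cast<double>(i))));
    while (!isPrime(candidate) || (!out.empty() && candidate <= out.back()))
      ++candidate;
    out.push_back(candidate);
  }
  return out;
}
```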

Chapter 5: Feedback Matrix Design

5.1 Matrix Types and Acoustic Properties

According to research by Schlecht and Habets, the main types of feedback matrices that prevent FDN from diverging are unitary matrices (possibly including complex numbers) and triangular matrices. For the implementations discussed here, real orthogonal matrices are sufficient.

Matrix Type

Characteristics / Sonic Tendency

Random orthogonal matrix

High diffusion, uniform energy distribution. Smooth with minimal metallic character. Equivalent to scipy.stats.ortho_group.rvs()

Special orthogonal matrix

Determinant = 1. Difference in sound vs. random orthogonal is difficult to perceive

Householder matrix

Form: H = I - 2(vv^T)/(v^Tv). Constructable from dim random numbers. Somewhat weaker diffusion with more distinct character

Hadamard matrix

Entries are ±1 (scaled by 1/sqrt(dim) so the matrix is orthogonal). Multiplication needs only additions and subtractions, so it is fast. Similar sound to random orthogonal but CPU-efficient

Circulant matrix

Shift structure with periodicity. "Pipe-like" sound. Diagonalizable via DFT

Triangular matrix

Strong early reflections and metallic character. Short delays are prominent

Conference matrix

Diagonal elements are 0. Very natural; minimizes self-loop (metallic sources). Excellent sound quality

Absorbent allpass matrix

From Schlecht-Habets Eq. (10). May be less efficient than not using FDN in some cases

Guidelines for Matrix Selection

5.2 Randomized Orthogonal Matrix (C++ Implementation)

Implementation based on Mezzadri's "How to generate random matrices from the classical compact groups" — a C++ translation of scipy.stats.ortho_group.rvs().

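#include <array>
#include <cmath>
#include <cstddef>
#include <random>
#include "pcg_random.hpp"  // pcg-cpp, provides pcg64

template<typename Sample, size_t dim>
void randomOrthogonal(unsigned seed,
                      std::array<std::array<Sample, dim>, dim> &H) {
  pcg64 rng{}; rng.seed(seed);
  std::normal_distribution<Sample> dist{};

  // Start from the identity. H is indexed H[column][row], so H[n + col]
  // below mirrors SciPy's column slice H[:, n:] (the transpose of an
  // orthogonal matrix is orthogonal, so either reading works).
  H.fill({});
  for (size_t i = 0; i < dim; ++i) H[i][i] = Sample(1);

  std::array<Sample, dim> x;
  for (size_t n = 0; n < dim; ++n) {
    auto xRange = dim - n;
    for (size_t i = 0; i < xRange; ++i) x[i] = dist(rng);

    Sample norm2 = 0;
    for (size_t i = 0; i < xRange; ++i) norm2 += x[i] * x[i];

    // Sign chosen to avoid cancellation; then normalize so that the
    // Householder reflection becomes I - x x^T.
    Sample x0 = x[0];
    Sample D = x0 >= 0 ? Sample(1) : Sample(-1);
    x[0] += D * std::sqrt(norm2);
    Sample denom = std::sqrt((norm2 - x0*x0 + x[0]*x[0]) / Sample(2));
    for (size_t i = 0; i < xRange; ++i) x[i] /= denom;

    // Apply the reflection to columns n .. dim-1
    for (size_t row = 0; row < dim; ++row) {
      Sample dotH = 0;
      for (size_t col = 0; col < xRange; ++col)
        dotH += H[n + col][row] * x[col];
      for (size_t col = 0; col < xRange; ++col)
        H[n + col][row] = D * (H[n + col][row] - dotH * x[col]);
    }
  }
}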

◆ PCG is used as the random number generator because NumPy's default_rng() also uses it (PCG64). It is preferred over std::minstd_rand for its better statistical quality.

5.3 Hadamard Matrix (Sylvester's Construction)

Initialized with 1/sqrt(dim) and recursively constructed by tiling. dim must be a power of 2.

#include <array>
#include <cmath>
#include <cstddef>

template<typename Sample, size_t dim>
void constructHadamardSylvester(
  std::array<std::array<Sample, dim>, dim> &mat) {
  static_assert(dim && ((dim & (dim-1)) == 0), "dim must be a power of 2");

  // Normalized 1x1 seed; each pass tiles the current block into [[H, H], [H, -H]].
  mat[0][0] = Sample(1) / std::sqrt(Sample(dim));
  size_t start = 1; size_t end = 2;
  while (start < dim) {
    for (size_t row = start; row < end; ++row)
      for (size_t col = start; col < end; ++col) {
        auto &&value = mat[row-start][col-start];
        mat[row-start][col] = value;  // Upper right
        mat[row][col-start] = value;  // Lower left
        mat[row][col] = -value;       // Lower right
      }
    start *= 2; end *= 2;
  }
}
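The Hadamard matrix's speed advantage comes from never forming the matrix at all. The sketch below is the standard in-place fast Walsh-Hadamard transform. In natural (Sylvester) ordering, it computes the same product as multiplying by the matrix built above. It needs only additions and subtractions plus one final 1/sqrt(dim) scaling, in O(dim log dim) operations. The function name is illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// In-place fast Walsh-Hadamard transform, normalized by 1/sqrt(n).
// Equivalent to multiplying by the normalized Sylvester Hadamard matrix.
// n must be a power of two.
void fastHadamard(std::vector<double>& v) {
  const std::size_t n = v.size();
  assert(n > 0 && (n & (n - 1)) == 0);
  for (std::size_t half = 1; half < n; half *= 2) {
    for (std::size_t start = 0; start < n; start += 2 * half) {
      for (std::size_t i = start; i < start + half; ++i) {
        const double a = v[i];
        const double b = v[i + half];
        v[i] = a + b;         // butterfly: sum
        v[i + half] = a - b;  // butterfly: difference
      }
    }
  }
  const double scale = 1.0 / std::sqrt(static_cast<double>(n));
  for (auto& x : v) x *= scale;
}
```

Because the normalized Sylvester matrix is symmetric and orthogonal, it is its own inverse, so applying the transform twice restores the input.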

5.4 Householder Matrix

Expressed in the form H = I - 2(vv^T)/(v^Tv). An orthogonal matrix can be constructed from dim random numbers, and the result is symmetric (H = H^T).

template<typename Sample, size_t dim>
void randomHouseholder(unsigned seed,
                       std::array<std::array<Sample, dim>, dim> &matrix) {
  // ... (initialize vec with uniform random values)

  // denom is v^T v, so scale = -2 / (v^T v) in H = I - 2 v v^T / (v^T v)
  auto scale = Sample(-2) / denom;
  for (size_t i = 0; i < dim; ++i) {
    matrix[i][i] = Sample(1) + scale * vec[i] * vec[i];
    for (size_t j = i+1; j < dim; ++j) {
      auto value = scale * vec[i] * vec[j];
      matrix[i][j] = value; matrix[j][i] = value;  // symmetric: H = H^T
    }
  }
}
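In an FDN the Householder matrix does not need to be stored either. Computing H·x = x - (2(v·x)/(v·v))·v costs O(dim) per sample instead of O(dim²). A minimal sketch (the function name is illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Applies H = I - 2 v v^T / (v^T v) to x in place without forming H:
//   H x = x - (2 (v . x) / (v . v)) v
void applyHouseholder(const std::vector<double>& v, std::vector<double>& x) {
  assert(v.size() == x.size() && !v.empty());
  double vx = 0.0, vv = 0.0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    vx += v[i] * x[i];
    vv += v[i] * v[i];
  }
  assert(vv > 0.0);
  const double scale = 2.0 * vx / vv;
  for (std::size_t i = 0; i < x.size(); ++i) x[i] -= scale * v[i];
}
```

With v set to all ones, H reduces to I - (2/dim)·11^T, a common fixed choice of FDN feedback matrix. Since H² = I, applying it twice returns the original vector.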

5.5 Conference Matrix

The size n of a (symmetric) Conference matrix must satisfy n = q + 1, where n ≡ 2 (mod 4) and q is expressible as the sum of two squares (equivalent to OEIS A286636). Candidate sizes from OEIS sequence A000952: 62, 54, 50, 46, 42, 38, 30, 26, 18, 14, 10, 6, 2.

Because the diagonal elements are 0, there are no simple comb filter sections, resulting in superior sound quality.

Construction Procedure

Final matrix: C[0][0]=0, C[0][i]=C[i][0]=1/sqrt(modulo), C[i][j]=S[i][j]
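For prime q with q ≡ 1 (mod 4), one way to obtain the core S is the Paley construction: S[a][b] = χ(a - b)/sqrt(q). Here χ is the quadratic character modulo q: +1 for nonzero squares, -1 for non-squares, and 0 for 0. The sketch below assumes that construction and handles prime q only, which covers sizes 6, 14, 18, 30, 38, 42, 54 and 62 from the list above. Prime-power sizes such as 10, 26 and 50 need finite-field arithmetic, and the remaining sizes need other constructions. It is not claimed to match the original implementation, and the function name is illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized symmetric conference matrix of size q + 1 (Paley construction),
// assuming q is a prime with q % 4 == 1. Result is orthogonal, zero diagonal.
std::vector<std::vector<double>> paleyConference(std::size_t q) {
  assert(q >= 5 && q % 4 == 1);
  for (std::size_t d = 2; d * d <= q; ++d) assert(q % d != 0);  // q prime

  // Quadratic character: +1 for nonzero squares mod q, -1 otherwise, 0 at 0.
  std::vector<int> chi(q, -1);
  chi[0] = 0;
  for (std::size_t x = 1; x < q; ++x) chi[(x * x) % q] = 1;

  const std::size_t n = q + 1;
  const double scale = 1.0 / std::sqrt(static_cast<double>(q));
  std::vector<std::vector<double>> C(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 1; i < n; ++i) {
    C[0][i] = scale;  // border row and column: 1/sqrt(q)
    C[i][0] = scale;
    for (std::size_t j = 1; j < n; ++j)
      C[i][j] = scale * chi[(i - 1 + q - (j - 1)) % q];  // chi(a - b) / sqrt(q)
  }
  return C;
}
```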

5.6 Absorbent Allpass Matrix

A matrix introduced in Schlecht and Habets' "Time-varying feedback matrices in FDN." It represents a nested allpass filter structure and stays stable when each allpass gain α (gain in the code) lies in the range (-1, 1).

template<typename Sample, size_t dim>
void randomAbsorbent(unsigned seed, Sample low, Sample high,
                     std::array<std::array<Sample, dim>, dim> &mat) {
  static_assert(dim % 2 == 0, "dim must be even");
  constexpr size_t half = dim / 2;

  // (Setup of rng, seeder, dist and the half x half matrix A is omitted here.)
  // Generate orthogonal matrix A of size half x half
  randomOrthogonal(seeder(rng), A);

  mat.fill({});  // entries not written below must be zero
  for (size_t col = 0; col < half; ++col) {
    auto gain = dist(rng);                       // allpass gain, must lie in (-1, 1)
    mat[half+col][half+col] = gain;              // Bottom-right
    mat[half+col][col] = Sample(1) - gain*gain;  // Bottom-left
    for (size_t row = 0; row < half; ++row) {
      mat[row][half+col] = A[row][col];          // Top-right
      mat[row][col] = -A[row][col]*gain;         // Top-left
    }
  }
}

5.7 Generating Near-Identity Random Special Orthogonal Matrices

By modifying the random number generation section of randomSpecialOrthogonal as follows, the proximity to the identity matrix can be adjusted:

x[0] = Sample(1);
for (size_t i = 1; i < xRange; ++i)
  x[i] = identityAmount * dist(rng);

Acoustically, lowering identityAmount tends to strengthen early reflections while weakening the late reverb tail.

5.8 Direct Eigenvalue Design

Normally, eigenvalues are determined by the feedback matrix A. However, "direct eigenvalue design" — specifying eigenvalues first and then constructing the matrix — is also possible.

Design Procedure

Matrix reconstruction: A = V Λ V^-1 (Λ: designed eigenvalues, V: arbitrary orthogonal basis, so V^-1 = V^T)

Practical Solutions

The essence of eigenvalue design is the "design of the decay spectrum."
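The reconstruction step A = V Λ V^-1 can be sketched for the simplest case: real eigenvalues and an orthogonal V. Then V^-1 = V^T, and A = V Λ V^T is symmetric. Column k of V is an eigenvector with eigenvalue λ_k, i.e. a mode whose per-pass decay is set directly. The function name and the choice of V are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <std::size_t N>
using Mat = std::array<std::array<double, N>, N>;

// A = V * diag(lambda) * V^T for an orthogonal V (so V^-1 = V^T).
// Column k of V becomes an eigenvector of A with eigenvalue lambda[k].
template <std::size_t N>
Mat<N> designFromEigenvalues(const Mat<N>& V,
                             const std::array<double, N>& lambda) {
  for (double l : lambda) assert(l > -1.0 && l < 1.0);
  Mat<N> A{};
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      for (std::size_t k = 0; k < N; ++k)
        A[i][j] += V[i][k] * lambda[k] * V[j][k];
  return A;
}
```

Oscillating modes with complex-conjugate eigenvalues would need 2×2 rotation-scaling blocks in Λ instead of a purely diagonal matrix.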

Chapter 6: Perceptual Models and Reverb Design

6.1 Perceptual Conditions for a "Good Reverb"

Good reverb: no echo sensation (density increases over time); low frequencies extend naturally; high frequencies are not harsh; stereo imaging is not degraded; energy distribution becomes uniform over time.

Bad reverb: metallic; periodic (ping-pong feel); muddy, or loses definition; a specific band is prominent; ringing is clearly audible.

6.2 JND (Just Noticeable Difference)

Reference values for the minimum perceptible difference (JND):

Knowing the JND leads to a simple design principle: precision beyond what the ear can resolve is wasted effort. Optimization at the 1 ms level, or frequency resolution finer than the hearing limit, carries no practical benefit.

6.3 Auditory Masking

Frequency Masking

A loud sound masks weaker sounds in adjacent frequency bands. The "granular quality" of the EMT 250 creates masking and can sound more pleasant than complete diffusion — a paradox in which "imperfection is the perceptual optimum."

Temporal Masking

Forward masking makes weaker sounds that arrive shortly after a loud sound hard to hear. Because early reflections are partly masked by the direct sound, complete physical accuracy is not required.

Design Principles

6.4 Information-Theoretic Perspective

Treating the sound field as an "information source" rather than a "wave":

Goal of a good reverb: Entropy increases smoothly over time.

Mutual Information Perspective

Information-theoretic interpretation of metallic sound: A phenomenon in which information fails to diffuse and structure remains perceptible.

6.5 Three-Layer Structure of Reverb Design

① Early Reflections

② Diffusion

③ Late Reverb

Chapter 7: Perceptual Optimization in Commercial Reverbs

7.1 Lexicon 480L Random Hall

The perceptual "magic" of the Lexicon 480L lies in the combination of Bass XOV (multiband processing), controlled modulation, and the Loop Tank structure.

CHO Value (Modulation)

In Random Hall, internal modulation (Decay Optimization) corresponds to CHO.

Optimal Perceptual Settings for Bass XOV

◆ The manual states that "BASS MULT maximum, RT HF CUT medium, HF CUTOFF low" is the ideal concert hall setting.

Mathematical Interpretation of Why Lexicon Sounds "Warm"

Bass XOV is a mechanism that varies the optimal information diffusion rate by frequency band, based on the following physical and perceptual facts:

7.2 Mathematical Reproduction of M7 V2 Modulation

According to the official manual, V2 modulation is pitch variation in the late reverb tail — different from chorus or flange.

Each delay tap time D_i(t) is independently modulated:

D_i(t) = D_i + Δ · m(t, level)

Level-Dependent Modulation Function m(t)

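// C++ implementation (simplified). phase[i], np[i] and modFreqBase are
// per-tap state and settings defined elsewhere; M_PI is POSIX, not standard C++.
Sample sine = std::sin(2.0f * M_PI * modFreqBase * phase[i]);

// "noise": a deterministic sum of three harmonics (1/k weights) at a separate phase np[i]
Sample noise = 0;
for (int k = 0; k < 3; ++k)
  noise += std::sin(2.0f * M_PI * (modFreqBase*(k+1)) * np[i]) / (k+1);
noise /= 3.0f;

// Low: sine-dominant; High: noise-dominant; Mid: blend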
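The delay-time equation D_i(t) = D_i + Δ·m(t, level) needs a delay line that can be read at fractional positions. The sketch below assumes linear interpolation and, for brevity, a plain sine LFO for m(t); the level-dependent blend of sine and noise above would replace modulation(). Class and parameter names are illustrative. This is not a claim about how the M7 implements it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Delay line whose length follows D(t) = D + depth * m(t), read with linear
// interpolation. m(t) is a sine LFO here (stand-in for m(t, level)).
class ModulatedDelay {
 public:
  ModulatedDelay(double baseDelay, double depth, double rateHz, double fs)
      : buf_(static_cast<std::size_t>(baseDelay + depth) + 2, 0.0),
        base_(baseDelay), depth_(depth), phaseInc_(rateHz / fs) {
    assert(depth >= 0.0 && baseDelay - depth >= 1.0);
  }

  double process(double in) {
    buf_[write_] = in;
    const double d = base_ + depth_ * modulation();  // current delay in samples
    const std::size_t size = buf_.size();
    const auto whole = static_cast<std::size_t>(d);
    const double frac = d - static_cast<double>(whole);
    const double a = buf_[(write_ + size - whole) % size];      // x(n - whole)
    const double b = buf_[(write_ + size - whole - 1) % size];  // x(n - whole - 1)
    write_ = (write_ + 1) % size;
    phase_ += phaseInc_;
    if (phase_ >= 1.0) phase_ -= 1.0;
    return a + frac * (b - a);  // linear interpolation
  }

 private:
  double modulation() const { return std::sin(2.0 * kPi * phase_); }

  std::vector<double> buf_;
  double base_, depth_, phaseInc_;
  double phase_ = 0.0;
  std::size_t write_ = 0;
};
```

Each FDN or tank delay would get its own instance with an independent rate and phase, matching the "independently modulated" taps described above.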

7.3 Perceptual Optimization of the EMT 250

The world's first commercial digital reverb (1976). The originator of the Loop Tank concept.

7.4 Speculative Algorithm Analysis of the Bricasti M7 (Waveguide-Like)

The internal algorithm of the Bricasti M7 is entirely proprietary, but its acoustic characteristics strongly suggest a structure inspired by a scattering-type Waveguide Network.

Waveguide-Like Characteristics

Contrast with FDN

7.5 Modern Plugin Comparison

Characteristic | Lexicon 480L | Valhalla VintageVerb | Bricasti M7

3D depth | Magical (perceptually optimized) | Beautiful spatial spread | Overwhelming realism

Warmth / low end | Rich via multiband | Sufficiently warm | Natural but full-bodied

Musicality | Supreme (emotionally evocative) | High (Lexicon-like) | Somewhat restrained

Metallic-artifact suppression | Good (via randomization) | Excellent | Nearly zero metallic quality

Mix compatibility | Prominent, hard to lose | Forward, pop-oriented | Blends naturally

Character | Vintage magic | Modern lush | Modern realistic

◆ The Lexicon 480L embodies "perceptual correctness over physical correctness." It anticipated in the 1980s — through human ears — the perceptual loss optimization that DiffFDN and PINN now pursue computationally.

Chapter 8: Modern Approaches — Neural / Physical Hybrids

8.1 DiffFDN: Optimizing for Perceptual Objective Functions

Traditional FDN design has mathematically pursued stability and diffusion, but it is now possible to directly optimize for "how humans perceive the output."

Objective Function

J(θ) = Σ_t w_t · d(P(y_θ(t)), P(y_target(t)))

Mel-Spectrogram-Based Perceptual Loss (Implementation Example)

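import torch
import torchaudio

# Assumption: the mel() and loudness() stages are realized as a torchaudio
# mel spectrogram (48 kHz assumed) followed by log compression.
mel = torchaudio.transforms.MelSpectrogram(sample_rate=48000, n_mels=64)

def perceptual_loss(y, y_target):
    Y = torch.log1p(mel(y))          # loudness-like compression of mel energies
    Yt = torch.log1p(mel(y_target))
    return torch.mean(torch.abs(Y - Yt))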

8.2 Neural FDN (Constrained Learning)

Learning the FDN itself with a neural network. The orthogonality constraint is preserved through parameterization.

import torch
import torch.nn as nn

class FDN(nn.Module):
    def __init__(self, N=8):
        super().__init__()
        self.S = nn.Parameter(torch.randn(N, N))
        self.delay = nn.Parameter(torch.rand(N) * 2000)  # delay lengths in samples

    def orthogonal_matrix(self):
        S = self.S - self.S.T        # Antisymmetric matrix
        return torch.matrix_exp(S)   # exp(S) is always orthogonal

    # forward() (the delay-line recursion) is omitted here

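Taking the matrix exponential of the antisymmetric matrix S (S = self.S - self.S.T in the code) gives A = exp(S), which is always orthogonal. In fact it is special orthogonal, since det A = e^(tr S) = 1. The mixing is therefore lossless, and with attenuation below unity in the loop, stability is guaranteed.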

8.3 Composite Loss Function

def total_loss(x, y, y_target):
    # entropy_loss, mutual_info_loss and decorrelation_loss are not defined in this document
    lp = perceptual_loss(y, y_target)
    le = -entropy_loss(y)          # Maximize entropy
    lmi = mutual_info_loss(x, y)   # Minimize mutual information
    ld = decorrelation_loss(y)     # Decorrelation
    return lp + 0.1*le + 0.1*lmi + 0.1*ld

8.4 Neural + Physical Hybrid Model (PINN)

Decomposition

Application of Physics-Informed Neural Networks

The wave equation ∂²p/∂t² = c² ∇²p is incorporated into the loss function:

Loss = DataLoss + PhysicsLoss

PhysicsLoss = ‖∂²p/∂t² - c² ∇²p‖²

Advantages of the Hybrid Approach

8.5 Neural IR (Dynamic IR)

Traditional IR: y(n) = x(n) * h(n) (h is fixed). Neural IR: y(n) = Σ_k x(k) · h(n, k) (h is time-dependent).
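Whatever produces h(n, k) (a neural network in the text), the summation itself is an ordinary time-varying convolution. The sketch below takes h(n, k) as a callable and evaluates only lags n - k below a finite support; the names are illustrative. When h(n, k) depends only on n - k, it reduces to the fixed-IR case.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Time-varying convolution y(n) = sum_k x(k) * h(n, k), where h(n, k) is the
// response at time n to an impulse applied at time k. Only lags n - k below
// `support` are evaluated.
std::vector<double> timeVaryingConvolve(
    const std::vector<double>& x, std::size_t support,
    const std::function<double(std::size_t, std::size_t)>& h) {
  assert(support > 0);
  std::vector<double> y(x.size() + support - 1, 0.0);
  for (std::size_t n = 0; n < y.size(); ++n) {
    const std::size_t kMin = (n + 1 > support) ? n + 1 - support : 0;
    for (std::size_t k = kMin; k <= n && k < x.size(); ++k)
      y[n] += x[k] * h(n, k);
  }
  return y;
}
```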

8.6 LiquidSonics Seventh Heaven vs. M7

Seventh Heaven reproduces the M7 using its proprietary Fusion-IR technology (modulated capture of multiple IRs). It is fundamentally different from static IRs (e.g., Samplicity M7 IR).

8.7 Real-Time Control of Eigenvalue Distribution

A model that dynamically controls eigenvalues:

λ_k(t) = r_k(t) · exp(jθ_k(t))
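Each λ_k(t) can be realized as a complex one-pole recursion s(n) = λ(n)·s(n-1) + x(n). Here r(n) sets the per-sample decay and θ(n) = 2πf/fs sets the mode frequency. The sketch below implements one such mode and assumes r and θ are supplied every sample by some control source, such as an LFO. The class name is illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// One decaying mode driven by lambda(n) = r(n) * exp(j * theta(n)):
//   s(n) = lambda(n) * s(n - 1) + x(n),  output = Re{s(n)}.
class TimeVaryingMode {
 public:
  double process(double x, double r, double theta) {
    assert(r >= 0.0 && r < 1.0);
    state_ = std::polar(r, theta) * state_ + x;
    return state_.real();
  }

 private:
  std::complex<double> state_{0.0, 0.0};
};
```

Summing a bank of these with different θ_k forms a modal reverb whose decay and pitch can be moved in real time. Keeping r below 1 at every step keeps each mode stable.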

Implementation Methods

Effects and Cautions

◆ This is an important idea that shifts reverb from a "static structure" to a "dynamic field."

Chapter 9: Volterra Nonlinear Extensions

9.1 Why Nonlinearity Is Needed

IR represents only the "first-order term" of the Volterra series. Real acoustic environments contain the following nonlinearities:

All of these can be expressed via the second- and third-order terms of the Volterra series. A simple downstream distortion (waveshaper) is insufficient — nonlinearity must be embedded within the convolution process itself.

9.2 Definition of the Volterra Series

Linear system: y(t) = ∫ h₁(τ) x(t-τ) dτ

Volterra series (extended to nonlinear):

y(t) = ∫ h₁(τ) x(t-τ) dτ

+ ∬ h₂(τ₁,τ₂) x(t-τ₁) x(t-τ₂) dτ₁dτ₂

+ ...
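Truncated to second order and discretized with memory M, the series can be sketched directly. The sketch below (function name illustrative) treats samples before n = 0 as zero and returns only as many output samples as there are inputs. h1 alone reproduces ordinary convolution, and h2 adds the quadratic part. The M² cost of h2 is the dimensionality problem discussed in 9.6.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Discrete second-order Volterra filter with memory M:
//   y(n) = sum_m h1[m] x(n-m) + sum_{m1,m2} h2[m1][m2] x(n-m1) x(n-m2)
std::vector<double> volterra2(const std::vector<double>& x,
                              const std::vector<double>& h1,
                              const std::vector<std::vector<double>>& h2) {
  const std::size_t M = h1.size();
  assert(M > 0 && h2.size() == M);
  auto past = [&](std::size_t n, std::size_t m) {
    return n >= m ? x[n - m] : 0.0;  // x(n - m), zero before the start
  };
  std::vector<double> y(x.size(), 0.0);
  for (std::size_t n = 0; n < x.size(); ++n) {
    for (std::size_t m = 0; m < M; ++m) y[n] += h1[m] * past(n, m);  // linear term
    for (std::size_t m1 = 0; m1 < M; ++m1) {
      assert(h2[m1].size() == M);
      for (std::size_t m2 = 0; m2 < M; ++m2)  // quadratic term
        y[n] += h2[m1][m2] * past(n, m1) * past(n, m2);
    }
  }
  return y;
}
```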

9.3 Integration into FDN (Linear + Nonlinear)

Nonlinear FDN (Volterra Type)

x_i(n+1) = Σ_j A_ij x_j(n) + Σ_{j,k} B_ijk x_j(n) x_k(n)

Practical Simplification (Lightweight Quadratic Approximation)

// C++ implementation example

y[i] += alpha * x[i] * x[i]; // Second-order Volterra term

◆ α should be a small value on the order of 0.001–0.01. Clipping prevention (e.g., tanh) is essential.

9.4 Learnable Volterra in PyTorch

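import torch
import torch.nn as nn

class NonlinearFDN(nn.Module):
    def __init__(self, N=8):
        super().__init__()
        self.S = nn.Parameter(torch.randn(N, N))
        self.alpha = nn.Parameter(torch.tensor(0.01))  # second-order coefficient

    def forward(self, x):
        A = torch.matrix_exp(self.S - self.S.T)  # orthogonal mixing matrix
        y_lin = x @ A                             # linear (first-order) term
        y_nl = self.alpha * (x ** 2)              # diagonal second-order Volterra term
        return y_lin + y_nl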

9.5 Dynamic Volterra (Time-Varying Nonlinearity)

Rather than fixing the nonlinear coefficient α, it can be made time- or state-dependent.

Design Patterns for α(t)

The essence: By making the nonlinearity a "state-dependent system," a more natural spatial response is simulated.

9.6 Neural Volterra (Learned Nonlinear Kernel)

The dimensionality explosion of the Volterra kernel h₂(τ₁, τ₂) is approximated using a neural network.

import torch.nn as nn

class NeuralVolterra(nn.Module):
    def __init__(self):
        super().__init__()
        # Maps a 128-sample frame to one output sample
        self.net = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.net(x)  # x is a time-series frame of shape (..., 128)

9.7 FDN + Neural Volterra Integrated Model

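class HybridReverb(nn.Module):
    def __init__(self):
        super().__init__()
        self.fdn = FDN()            # assumes FDN defines forward()
        self.nl = NeuralVolterra()

    def forward(self, x):
        y_lin = self.fdn(x)
        # frame_signal (not defined here) is assumed to slice x into 128-sample frames
        y_nl = self.nl(frame_signal(x))
        return y_lin + y_nl.squeeze()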

9.8 Ensuring Stability

9.9 Recommended Implementation Pipeline

The following processing pipeline is the recommended configuration for a high-quality reverb with nonlinear components:

Implementation Notes

Chapter 10: Unified Theory and Future Directions

10.1 Hierarchical Structure of Reverb Design

Level | Perspective / Approach

Level 1: Physical | Wave equation: ∂²p/∂t² = c² ∇²p

Level 2: Structural | FDN / Waveguide Network (low-dimensional wave approximation)

Level 3: Statistical | Mode distribution, diffusion, energy distribution

Level 4: Perceptual | JND, masking, ERB filters, perceptual loss

Level 5: Learning | Differentiable FDN, PINN, Neural Volterra

Level 6: Information | Entropy maximization, mutual information minimization

10.2 Reinterpreting Commercial Reverbs Through Unified Theory

Commercial units are the result of optimizing — through human ears — the following implicit objective function:

J(θ) = d_spec + d_time + d_info + d_mask

The Essence of Each Commercial Unit

10.3 The Final Definition of a "Good Reverb"

Synthesizing all perspectives, a good reverb is one that satisfies the following conditions:

In one phrase:

"A dynamic system in which the energy, information, and perception of sound undergo controlled diffusion over time"

10.4 Future Research Frontiers

10.5 Key References
