This article provides a comprehensive explanation of kernel-based regularized system identification — a modern nonparametric approach that estimates the impulse response of linear systems by combining regularization theory with machine learning concepts. It covers the fundamental idea of regularized impulse response estimation, key kernel designs (stable spline, tuned-correlated, diagonal-correlated), hyperparameter tuning, and practical MATLAB implementation. Related articles, research papers, and MATLAB links are placed at the bottom.
Author: Hiroshi Okajima, Associate Professor, Kumamoto University, Japan — 20 years of control engineering research
For the overview of system identification methods, see the hub article: System Identification: From Data to Dynamical Models — A Comprehensive Guide
- Why Kernel-Based Identification Matters
- Problem Setting
- Regularized Impulse Response Estimation
- Kernel Design: Encoding Prior Knowledge
- Hyperparameter Estimation
- Comparison with Parametric Methods
- MATLAB Implementation
- Connection to the Author's Research
- Related Articles and Videos
- Key References
Why Kernel-Based Identification Matters
Classical system identification has long been dominated by parametric methods — the prediction error method (PEM) with model structures such as ARX, ARMAX, and Box-Jenkins. These methods require the user to specify a model order (how many parameters to estimate), and their performance critically depends on this choice. If the model order is too low, the estimate is biased; if too high, the estimate suffers from excessive variance. Model order selection via information criteria (AIC, BIC) or cross-validation helps, but remains a discrete, combinatorial problem.
Around 2010, a fundamentally different approach emerged in the system identification community: kernel-based regularized identification. Instead of fixing a finite model order and optimizing parameters, this approach:
- Estimates the impulse response directly as a high-dimensional (potentially infinite-dimensional) object.
- Uses a regularization penalty defined through a kernel function to control model complexity.
- Tunes the regularization through a small number of hyperparameters that are estimated from data in a continuous manner.
This approach was introduced by Pillonetto and De Nicolao (2010) and further developed by Chen, Ljung, Chiuso, and others. It has since become a recognized complement to classical PEM, and is implemented in MATLAB's System Identification Toolbox through the impulseest function with regularization kernel options.
Key advantages of the kernel-based approach:
- No explicit model order selection: The regularization automatically handles the bias-variance trade-off, eliminating the need for discrete model order choices.
- Better performance with short data: Empirical studies consistently show that kernel methods outperform PEM when the number of data points is small relative to the system complexity, or when the signal-to-noise ratio is low.
- Incorporation of prior knowledge: The kernel encodes structural properties of the impulse response — such as exponential decay and BIBO stability — directly into the estimation process.
- Unified framework: The same regularized estimation framework handles SISO, MIMO, and even nonlinear systems with appropriate kernel choices.
This article explains the mathematical foundation, the key kernel designs, and practical implementation. For comparison with parametric methods (ARX, ARMAX, PEM), see the companion article: Classical Parametric System Identification: ARX, ARMAX, and the Prediction Error Method.
Problem Setting
Consider a discrete-time SISO LTI system with impulse response $\{g_k\}_{k \ge 1}$:

$$y(t) = \sum_{k=1}^{\infty} g_k\, u(t-k) + e(t)$$

where $u(t)$ is the input, $y(t)$ is the output, and $e(t)$ is white Gaussian noise with variance $\sigma^2$. Given $N$ input-output data pairs $\{u(t), y(t)\}_{t=1}^{N}$, the goal is to estimate the impulse response coefficients $g_1, \dots, g_n$ for some truncation length $n$.
In practice, $n$ can be chosen to be large (e.g., 100 or more), and the regularization will automatically suppress the coefficients that are not supported by the data. This is in stark contrast to parametric methods, where choosing too large a model order leads to overfitting.
FIR Formulation
With the truncation at order $n$, the model becomes a finite impulse response (FIR) model:

$$y(t) = \sum_{k=1}^{n} g_k\, u(t-k) + e(t) = \varphi(t)^\top \theta + e(t)$$

where the regressor vector and parameter vector are:

$$\varphi(t) = \begin{bmatrix} u(t-1) & u(t-2) & \cdots & u(t-n) \end{bmatrix}^\top, \qquad \theta = \begin{bmatrix} g_1 & g_2 & \cdots & g_n \end{bmatrix}^\top$$

Stacking all $N$ data points gives the matrix equation:

$$Y = \Phi\,\theta + E$$

where $Y = [y(1), \dots, y(N)]^\top$, $\Phi = [\varphi(1), \dots, \varphi(N)]^\top$, and $E = [e(1), \dots, e(N)]^\top$.

The ordinary least squares (LS) estimate is:

$$\hat{\theta}_{\mathrm{LS}} = (\Phi^\top \Phi)^{-1} \Phi^\top Y$$

When $n$ is large relative to $N$, the matrix $\Phi^\top \Phi$ becomes ill-conditioned, and the LS estimate has excessive variance. This is the fundamental problem that regularization addresses.
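To make the FIR formulation concrete, here is a minimal MATLAB sketch that builds the regressor matrix $\Phi$ and computes the unregularized LS estimate. The system matches the example used later in the article; the variable names and the zero initial conditions are illustrative assumptions, not part of any toolbox workflow.

```matlab
% Minimal sketch: FIR regressor matrix and ordinary least-squares estimate.
% Assumes zero initial conditions (u(t) = 0 for t <= 0) for simplicity.
N = 200;                                       % number of data points
n = 50;                                        % FIR truncation length
u = randn(N, 1);                               % random input
y = filter([0 1 0.5], [1 -1.5 0.7], u) ...     % G(z) = (z+0.5)/(z^2 - 1.5z + 0.7)
    + 0.1 * randn(N, 1);                       % additive output noise

% Row t of Phi is the regressor [u(t-1), u(t-2), ..., u(t-n)]
Phi = zeros(N, n);
for k = 1:n
    Phi(k+1:N, k) = u(1:N-k);
end

theta_LS = (Phi' * Phi) \ (Phi' * y);          % ordinary LS estimate of g_1, ..., g_n
```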
Regularized Impulse Response Estimation
The Regularized Least Squares Estimator
The kernel-based approach replaces ordinary least squares with regularized least squares:

$$\hat{\theta}_{\mathrm{R}} = \arg\min_{\theta}\; \|Y - \Phi\theta\|^2 + \gamma\, \theta^\top K^{-1} \theta$$

where $\gamma > 0$ is a regularization parameter and $K$ is a positive definite kernel matrix (also called the regularization matrix). The closed-form solution is:

$$\hat{\theta}_{\mathrm{R}} = \left(\Phi^\top \Phi + \gamma K^{-1}\right)^{-1} \Phi^\top Y$$

Equivalently, using the matrix inversion lemma:

$$\hat{\theta}_{\mathrm{R}} = K \Phi^\top \left(\Phi K \Phi^\top + \gamma I_N\right)^{-1} Y$$

Key insight: The regularization term penalizes impulse response estimates that deviate from the prior encoded in $K$. The kernel matrix $K$ defines the expected correlation structure of the impulse response coefficients.
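Continuing the sketch above, the regularized estimate only changes the normal equations through the term $\gamma K^{-1}$. The TC kernel and the hyperparameter values below are illustrative placeholders; tuning them from data is the subject of the hyperparameter estimation section.

```matlab
% Regularized least squares with a TC kernel (placeholder hyperparameter values).
lambda = 1;  alpha = 0.85;  gamma = 0.01;      % illustrative values, not tuned from data
[I, J] = ndgrid(1:n, 1:n);
K = lambda * alpha.^max(I, J);                 % TC kernel: K(i,j) = lambda * alpha^max(i,j)

% Closed-form regularized estimate (for large n, a factorization of K beats inv)
theta_R = (Phi' * Phi + gamma * inv(K)) \ (Phi' * y);

% Equivalent form via the matrix inversion lemma (avoids inverting K)
theta_R2 = K * (Phi' * ((Phi * K * Phi' + gamma * eye(N)) \ y));
```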
Bayesian Interpretation
The regularized estimate has an elegant Bayesian interpretation. If we model the impulse response as a zero-mean Gaussian process:

$$\theta \sim \mathcal{N}(0, K)$$

and the noise as:

$$e(t) \sim \mathcal{N}(0, \sigma^2)$$

then with $\gamma = \sigma^2$, $\hat{\theta}_{\mathrm{R}}$ is precisely the posterior mean (or equivalently, the MAP estimate) of $\theta$ given the data. The kernel matrix $K$ plays the role of the prior covariance of the impulse response.

This Bayesian perspective provides:
- A principled way to design kernels: encode prior knowledge about the impulse response into the covariance structure.
- A principled way to estimate hyperparameters: maximize the marginal likelihood of the data.
- Uncertainty quantification: the posterior covariance gives confidence intervals for the estimated impulse response.
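As a small illustration of the last point, the posterior covariance under this Gaussian model has a closed form. The sketch below continues the earlier FIR/TC example with $\gamma = \sigma^2$; the noise variance value is an assumed placeholder (in practice it is estimated together with the kernel hyperparameters).

```matlab
% Posterior mean and covariance of theta under the Gaussian model (gamma = sigma^2).
sigma2    = 0.01;                                          % assumed noise variance (placeholder)
P_post    = sigma2 * inv(Phi' * Phi + sigma2 * inv(K));    % posterior covariance
theta_map = (Phi' * Phi + sigma2 * inv(K)) \ (Phi' * y);   % posterior mean = MAP estimate

% Approximate 95% pointwise confidence band for each impulse response coefficient
band = 1.96 * sqrt(diag(P_post));
errorbar(1:n, theta_map, band);
xlabel('lag k'); ylabel('g_k'); title('Regularized estimate with posterior confidence band');
```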
Kernel Design: Encoding Prior Knowledge
The choice of the kernel matrix is the most critical step in kernel-based identification. The kernel determines what structural properties are imposed on the estimated impulse response. Several kernels have been proposed, each encoding different prior assumptions.
The following figure visualizes the kernel matrices for the three most commonly used kernels. The color intensity at position (i,j) represents the prior correlation between impulse response coefficients $g_i$ and $g_j$.
Fig. 1: Kernel matrices K(i,j) for TC, SS, and DC kernels (λ=1, α=0.85, ρ=0.5). The exponential decay along the diagonal and the off-diagonal correlation structure are clearly visible.
Stable Spline (SS) Kernel
The stable spline kernel (also called the second-order stable spline kernel), introduced by Pillonetto and De Nicolao (2010), is defined as:

$$K_{\mathrm{SS}}(i,j) = \lambda \left( \frac{\alpha^{\,i+j+\max(i,j)}}{2} - \frac{\alpha^{\,3\max(i,j)}}{6} \right), \qquad \lambda \ge 0,\; 0 \le \alpha < 1$$

where $\lambda$ is a scale parameter and $\alpha$ controls the exponential decay rate. For the derivation and further discussion, see Pillonetto and De Nicolao (2010) and Pillonetto et al. (2022, Chapter 7).
Properties:
- Encodes the prior that the impulse response decays exponentially (BIBO stability).
- Encodes smoothness: neighboring coefficients are correlated.
- Justified by maximum entropy arguments.
- The hyperparameters are few (typically 2), making hyperparameter estimation tractable.
Tuned-Correlated (TC) Kernel
The tuned-correlated kernel (also known as the first-order stable spline kernel) is defined as:

$$K_{\mathrm{TC}}(i,j) = \lambda\, \alpha^{\max(i,j)}, \qquad \lambda \ge 0,\; 0 \le \alpha < 1$$
This is simpler than the stable spline kernel but still encodes exponential decay and correlation between coefficients. It is the default kernel in MATLAB's impulseest function.
Diagonal-Correlated (DC) Kernel
The diagonal-correlated kernel is:

$$K_{\mathrm{DC}}(i,j) = \lambda\, \alpha^{(i+j)/2}\, \rho^{|i-j|}, \qquad \lambda \ge 0,\; 0 \le \alpha < 1,\; |\rho| \le 1$$

where $\rho$ controls the correlation between different lag coefficients.

When $\rho = 0$, this reduces to the diagonal (DI) kernel:

$$K_{\mathrm{DI}}(i,j) = \begin{cases} \lambda\, \alpha^{i} & i = j \\ 0 & i \ne j \end{cases}$$

which encodes exponential decay but no correlation between coefficients.
Squared Exponential (SE) Kernel
The squared exponential kernel (also called the RBF kernel in machine learning) is:

$$K_{\mathrm{SE}}(i,j) = \lambda \exp\!\left( -\frac{(i-j)^2}{2\ell^2} \right)$$

where $\ell$ controls the length scale. This kernel produces very smooth impulse response estimates.
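The kernel matrices above are easy to construct directly, which is also how a figure like Fig. 1 can be reproduced. The sketch below uses the definitions given in this section with the Fig. 1 hyperparameter values; the SE length scale is an arbitrary illustrative choice, and the stable spline expression follows the standard form cited above.

```matlab
% Build the kernel matrices from the definitions above (lambda = 1, alpha = 0.85, rho = 0.5).
n = 50;  lambda = 1;  alpha = 0.85;  rho = 0.5;  ell = 5;    % ell is an illustrative choice
[I, J] = ndgrid(1:n, 1:n);

K_TC = lambda * alpha.^max(I, J);                                        % tuned-correlated
K_SS = lambda * (alpha.^(I + J + max(I, J))/2 - alpha.^(3*max(I, J))/6); % stable spline
K_DC = lambda * alpha.^((I + J)/2) .* rho.^abs(I - J);                   % diagonal-correlated
K_SE = lambda * exp(-(I - J).^2 / (2*ell^2));                            % squared exponential

% Visualize, e.g., the TC kernel as in Fig. 1
imagesc(K_TC); axis square; colorbar; title('TC kernel K(i,j)');
```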
Summary of Kernel Properties
The different kernels represent different trade-offs:
- TC and SS: Good general-purpose kernels. Encode exponential decay and smoothness. SS has stronger smoothness properties.
- DC: Allows control over the correlation strength via $\rho$. Useful when neighboring impulse response coefficients may have different magnitudes.
- DI: No correlation structure. Essentially ridge regression with exponentially decaying weights. Simplest but least informative.
- SE: Strong smoothness, suitable for systems with slowly varying impulse responses.
For most practical applications, the TC kernel (MATLAB default) or the SS kernel provides a good starting point.
Hyperparameter Estimation
The kernel contains a small number of hyperparameters (typically 2–3, e.g., $\lambda$, $\alpha$, and $\rho$). These hyperparameters play a role analogous to model order selection in PEM, but their estimation is a continuous optimization problem rather than a discrete search.
Empirical Bayes (Marginal Likelihood Maximization)
The most widely used approach is empirical Bayes, which maximizes the marginal likelihood of the data:

$$\hat{\eta} = \arg\max_{\eta}\; p(Y \mid \eta)$$

where $\eta$ collects the kernel hyperparameters (and, if unknown, the noise variance $\sigma^2$). Under the Gaussian model, the log marginal likelihood is:

$$\log p(Y \mid \eta) = -\frac{1}{2}\, Y^\top \Sigma(\eta)^{-1} Y - \frac{1}{2} \log\det \Sigma(\eta) - \frac{N}{2}\log 2\pi$$

where $\Sigma(\eta) = \Phi K(\eta) \Phi^\top + \sigma^2 I_N$.
This optimization is nonconvex, but because the number of hyperparameters is small (2–3), standard gradient-based optimization methods work well in practice. The marginal likelihood naturally balances model fit and complexity, analogous to BIC in parametric methods but operating in a continuous space.
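To make the procedure concrete, here is a minimal sketch of marginal likelihood maximization for the TC kernel, continuing the earlier FIR sketch. The use of fminsearch and the unconstrained reparameterization of the hyperparameters are illustrative choices; this is not the algorithm used inside MATLAB's impulseest.

```matlab
% Empirical Bayes for the TC kernel: minimize the negative log marginal likelihood.
% eta = [log lambda; logit(alpha); log sigma^2] is an unconstrained reparameterization.
eta0    = [0; 0; log(0.01)];                           % illustrative starting point
eta_hat = fminsearch(@(eta) negLogML_tc(eta, Phi, y), eta0);

function f = negLogML_tc(eta, Phi, y)
    [N, n]  = size(Phi);
    lambda  = exp(eta(1));
    alpha   = 1 / (1 + exp(-eta(2)));                  % keeps 0 < alpha < 1
    sigma2  = exp(eta(3));
    [I, J]  = ndgrid(1:n, 1:n);
    K       = lambda * alpha.^max(I, J);               % TC kernel
    Sigma   = Phi * K * Phi' + sigma2 * eye(N);        % covariance of Y given eta
    L       = chol(Sigma, 'lower');
    v       = L \ y;
    f       = 0.5 * (v' * v) + sum(log(diag(L)));      % -log p(Y | eta), up to a constant
end
```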
Other Approaches
Other methods for hyperparameter estimation include:
- Cross-validation (CV): Leave-one-out or k-fold.
- Stein's Unbiased Risk Estimator (SURE): An unbiased estimate of the prediction error.
- Generalized cross-validation (GCV): A computationally efficient approximation to leave-one-out CV (a minimal sketch is given below).
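As one concrete illustration, a simple GCV sweep over the regularization parameter $\gamma$ can be written in a few lines. This sketch continues the earlier FIR/TC example, holds the kernel matrix K fixed, and searches only over $\gamma$, which is a deliberate simplification of how these criteria are used in practice.

```matlab
% One-dimensional GCV sweep over gamma with the TC kernel matrix K held fixed.
% GCV(gamma) = N * ||(I - H)Y||^2 / (trace(I - H))^2, with H the regularized "hat" matrix.
gammas = logspace(-4, 1, 50);
gcv    = zeros(size(gammas));
for i = 1:numel(gammas)
    H      = Phi * ((Phi' * Phi + gammas(i) * inv(K)) \ Phi');  % hat (smoother) matrix
    r      = y - H * y;                                         % residuals
    gcv(i) = N * (r' * r) / trace(eye(N) - H)^2;
end
[~, imin] = min(gcv);
gamma_gcv = gammas(imin);                                        % GCV-selected regularization
```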
Empirical evidence suggests that empirical Bayes is more robust than cross-validation for this problem, which is one of the key reasons for the success of kernel-based methods.
Comparison with Parametric Methods
The relationship between kernel-based regularized identification and classical PEM is an important topic.
When Does Regularization Help?
Regularized (kernel-based) methods tend to outperform PEM when:
- Data is short: With limited data, the variance reduction from regularization more than compensates for the introduced bias.
- Signal-to-noise ratio is low: Regularization stabilizes the estimate.
- True system order is unknown: Regularization avoids the discrete model order selection problem.
- System has many modes: High-order systems with multiple poles are difficult for PEM (many local minima), but regularization handles them naturally.
PEM tends to outperform regularized methods when:
- Data is abundant and the correct model structure is known.
- The model order is small and can be reliably selected.
The following figure demonstrates this effect. With short data (N=80), the unregularized LS estimate shows significant variance, while the TC kernel estimate remains close to the true impulse response. With longer data (N=200), both methods improve, but regularization still provides a cleaner estimate.
Fig. 2: Effect of data length on impulse response estimation. (a) Short data N=80, FIR order n=30. (b) Long data N=200, FIR order n=50. The TC kernel regularization significantly reduces estimation variance when data is limited.
Relationship to Subspace Methods
Kernel-based methods can also be combined with subspace identification:
- Use impulseest with regularization to obtain a high-quality nonparametric impulse response estimate.
- Convert the FIR model to a state-space model using model reduction techniques.
- Optionally refine with PEM (ssest in MATLAB uses N4SID initialization followed by PEM refinement).
For details on subspace identification, see: Subspace Identification Methods: N4SID, MOESP, and CVA Explained.
Comparison Table
| Aspect | PEM (ARX/ARMAX/BJ) | Subspace (N4SID) | Kernel-Based (Regularized) |
|---|---|---|---|
| Model type | Parametric (transfer function) | State-space | Nonparametric (impulse response) |
| Model order | Must be specified | Estimated via SVD | Not required |
| Optimization | Nonconvex (except ARX) | SVD (non-iterative) | Convex (given hyperparameters) |
| Hyperparameters | Model orders | SVD threshold | Kernel parameters (2–3) |
| Short data | Sensitive to order choice | Moderate | Robust |
| Asymptotic efficiency | Yes (ML under Gaussian) | No | No (biased) |
| MATLAB functions | arx, armax, oe, bj, pem | n4sid, ssest | impulseest |
MATLAB Implementation
MATLAB's System Identification Toolbox provides built-in support for kernel-based regularized impulse response estimation through the impulseest function.
Basic Usage
```matlab
% Load or create input-output data
load iddata1 z1;     % Example dataset

% Estimate regularized impulse response (default: TC kernel)
sys = impulseest(z1);

% Plot the impulse response
impulseplot(sys);
```
By default, impulseest uses the TC (tuned-correlated) kernel and automatically tunes the hyperparameters.
Specifying Kernels
```matlab
% Use stable spline kernel
opt = impulseestOptions('RegularizationKernel', 'SS');
sys_ss = impulseest(z1, 50, opt);

% Use diagonal-correlated kernel
opt = impulseestOptions('RegularizationKernel', 'DC');
sys_dc = impulseest(z1, 50, opt);

% No regularization (standard FIR / least squares)
opt = impulseestOptions('RegularizationKernel', 'none');
sys_ls = impulseest(z1, 50, opt);

% Compare results
impulseplot(sys_ss, sys_dc, sys_ls, 50);
legend('Stable Spline', 'Diagonal-Correlated', 'No Regularization');
```
Available Kernel Options
The impulseestOptions function supports the following kernels:
- 'TC' — Tuned-correlated kernel (default)
- 'SS' — Stable spline kernel
- 'HF' — High-frequency stable spline kernel
- 'DC' — Diagonal-correlated kernel
- 'DI' — Diagonal kernel
- 'SE' — Squared exponential kernel
- 'CS' — Cubic spline kernel
- 'none' — No regularization (ordinary least squares)
From Impulse Response to State-Space Model
A common workflow is to first estimate a regularized impulse response, then convert to a parametric state-space model:
```matlab
% Step 1: Regularized impulse response estimation
opt = impulseestOptions('RegularizationKernel', 'DC');
sys_fir = impulseest(data, 70, opt);

% Step 2: Convert to state-space and reduce order
sys_ss = balred(idss(sys_fir), 4);   % Reduce to order 4

% Alternative: Use ssest which internally uses regularized initialization
sys_ss2 = ssest(data, 4);
```
The following figure shows the result of this workflow applied to the example system. The regularized FIR estimate (n=70) is converted to a 2nd-order state-space model via balanced truncation, and the step response closely matches the true system.
Fig. 3: Step response comparison. The regularized FIR model (TC kernel, n=70) and its balanced-reduced state-space model (order 2) both closely match the true system. ARX(2,2,1) is shown for reference.
Complete Workflow Example
```matlab
%% Kernel-Based System Identification: Complete Workflow

% Generate simulated data
G_true = tf([1 0.5], [1 -1.5 0.7], 1);      % True system (discrete-time)
N = 200;                                    % Number of data points
u = randn(N, 1);                            % Random input
y = lsim(G_true, u) + 0.1 * randn(N, 1);    % Noisy output
data = iddata(y, u, 1);

%% Compare different identification approaches
% 1. Regularized impulse response (TC kernel)
opt_tc = impulseestOptions('RegularizationKernel', 'TC');
sys_tc = impulseest(data, 50, opt_tc);

% 2. Regularized impulse response (SS kernel)
opt_ss = impulseestOptions('RegularizationKernel', 'SS');
sys_ss = impulseest(data, 50, opt_ss);

% 3. No regularization
opt_none = impulseestOptions('RegularizationKernel', 'none');
sys_none = impulseest(data, 50, opt_none);

% 4. Parametric ARX for comparison
sys_arx = arx(data, [2 2 1]);

%% Plot comparison
figure;
impulseplot(sys_tc, sys_ss, sys_none, G_true, 50);
legend('TC Kernel', 'SS Kernel', 'No Reg.', 'True System');
title('Kernel-Based vs. Unregularized Impulse Response');
```
The following figure shows the result of this comparison. The TC and SS kernels produce smooth impulse response estimates that closely track the true response, while the unregularized LS estimate exhibits higher variance, particularly in the tail region where the impulse response has decayed.
Fig. 4: Impulse response comparison — True system vs. TC kernel, SS kernel, unregularized LS, and parametric ARX(2,2,1). System: G(z) = (z+0.5)/(z²−1.5z+0.7), N=200, σ=0.5.
MATLAB Demo Script
A complete MATLAB demo script that generates all four figures in this article is available on GitHub:
MATLAB_system_identification / 06_kernel_based
Run kernel_based_demo.m to reproduce all figures. The demo uses the same plant model with N=200 data points and additive noise (σ=0.5).
Connection to the Author's Research
Regularized Identification and Model Error Compensation
The kernel-based identification approach provides a nonparametric model (impulse response) that can serve as the nominal model in the Model Error Compensator (MEC) framework. Even if the regularized estimate has some bias due to the regularization, the MEC can compensate for the resulting model error in real-time, achieving robust control performance. The combination of kernel-based identification (for obtaining an initial model) and MEC (for compensating model inaccuracy) provides a practical workflow for control system design.
Subspace Identification and Cyclic Reformulation
The author's research on system identification focuses on subspace identification methods and their extensions to multirate and periodically time-varying systems. Kernel-based methods provide a complementary approach: while subspace methods (N4SID) directly estimate state-space models, kernel methods estimate impulse responses that can then be converted to state-space form.
For related work on subspace and multirate identification, see:
Cyclic Reformulation-Based System Identification — H. Okajima, Y. Fujimoto, H. Oku and H. Kondo, Cyclic Reformulation-Based System Identification for Periodically Time-Varying Systems, IEEE Access (2025). Develops cyclic identification for general LPTV systems using subspace methods.
Multirate System Identification — H. Okajima, R. Furukawa and N. Matsunaga, System Identification Under Multirate Sensing Environments, Journal of Robotics and Mechatronics, Vol. 37, No. 5, pp. 1102–1112 (2025) (Open Access). Extends cyclic reformulation to multirate systems with different sensor sampling rates.
Related Articles and Videos
Blog Articles (blog.control-theory.com)
- System Identification: From Data to Dynamical Models — A Comprehensive Guide (Hub)
- Classical Parametric System Identification: ARX, ARMAX, and the Prediction Error Method
- Subspace Identification Methods: N4SID, MOESP, and CVA Explained
- System Identification: Obtaining Dynamical Model
- Cyclic Reformulation-Based System Identification for Periodically Time-Varying Systems
- System Identification Under Multirate Sensing Environments
- Model Error Compensator (MEC)
- State Feedback Control and State-Space Design: A Comprehensive Guide
- State Observer and State Estimation: A Comprehensive Guide
- Linear Matrix Inequalities (LMIs) and Controller Design
- Discretization of Continuous-Time Control Systems
Research Web Pages (www.control-theory.com)
- Publications / Multi-rate System / LMI / MEC
Video
Video Portal
Key References
G. Pillonetto and G. De Nicolao, "A new kernel-based approach for linear system identification," Automatica, Vol. 46, No. 1, pp. 81–93, 2010. DOI: 10.1016/j.automatica.2009.10.031 — The seminal paper introducing stable spline kernels for system identification.
G. Pillonetto, F. Dinuzzo, T. Chen, G. De Nicolao, and L. Ljung, "Kernel methods in system identification, machine learning and function estimation: A survey," Automatica, Vol. 50, No. 3, pp. 657–682, 2014. DOI: 10.1016/j.automatica.2014.01.001 — Comprehensive survey covering the theoretical foundations and connections to machine learning.
T. Chen, H. Ohlsson, and L. Ljung, "On the estimation of transfer functions, regularizations and Gaussian processes — Revisited," Automatica, Vol. 48, No. 8, pp. 1525–1535, 2012. DOI: 10.1016/j.automatica.2012.05.026 — Key paper on the tuned-correlated kernel and its implementation.
G. Pillonetto, T. Chen, A. Chiuso, G. De Nicolao, and L. Ljung, Regularized System Identification: Learning Dynamic Models from Data, Springer, 2022. DOI: 10.1007/978-3-030-95860-2 (Open Access) — Comprehensive textbook covering the full theory and practice of regularized system identification.
L. Ljung, System Identification: Theory for the User, 2nd ed., Prentice Hall, 1999. — The standard reference for classical system identification.
MATLAB documentation:
impulseest / impulseestOptions — MATLAB implementation of kernel-based regularized impulse response estimation.
Self-Introduction
Hiroshi Okajima — Associate Professor, Graduate School of Science and Technology, Kumamoto University. Member of SICE, ISCIE, and IEEE.
If you found this article helpful, please consider bookmarking or sharing it.
#KernelMethods #SystemIdentification #RegularizedIdentification #ImpulseResponse #MachineLearning #BayesianEstimation #GaussianProcess #MATLAB #ControlEngineering #NonparametricIdentification