33. Classical Prediction and Filtering With Linear Algebra#
33.1. Overview#
This is a sequel to the earlier lecture Classical Control with Linear Algebra.
That lecture used linear algebra – in particular, the LU decomposition – to formulate and solve a class of linear-quadratic optimal control problems.
In this lecture, we’ll be using a closely related decomposition, the Cholesky decomposition, to solve linear prediction and filtering problems.
We exploit the useful fact that there is an intimate connection between two superficially different classes of problems:
deterministic linear-quadratic (LQ) optimal control problems
linear least squares prediction and filtering problems
The first class of problems involves no randomness, while the second is all about randomness.
Nevertheless, essentially the same mathematics solves both types of problem.
This connection, which is often termed “duality,” is present whether one uses “classical” or “recursive” solution procedures.
In fact, we saw duality at work earlier when we formulated control and prediction problems recursively in lectures LQ dynamic programming problems, A first look at the Kalman filter, and The permanent income model.
A useful consequence of duality is that
With every LQ control problem, there is implicitly affiliated a linear least squares prediction or filtering problem.
With every linear least squares prediction or filtering problem there is implicitly affiliated a LQ control problem.
An understanding of these connections has repeatedly proved useful in cracking interesting applied problems.
For example, Sargent [Sargent, 1987] [chs. IX, XIV] and Hansen and Sargent [Hansen and Sargent, 1980] formulated
and solved control and filtering problems using
In this lecture, we begin to investigate these ideas by using mostly elementary linear algebra.
This is the main purpose and focus of the lecture.
However, after showing matrix algebra formulas, we’ll summarize classic infinite-horizon formulas built on
And we’ll occasionally refer to some of these formulas from the infinite dimensional problems as we present the finite time formulas and associated linear algebra.
We’ll start with the following standard import:
import numpy as np
33.1.1. References#
Useful references include [Whittle, 1963], [Hansen and Sargent, 1980], [Orfanidis, 1988], [Athanasios and Pillai, 1991], and [Muth, 1960].
33.2. Finite Dimensional Prediction#
Let
Here
The
We regard the random variables as being
ordered in time so that
For example,
In that case,
We want to construct
where
(Sometimes
To find linear least squares predictors it is helpful first to construct a
The key insight here comes from noting that because the covariance matrix
and
where
Form the
The random vector
is nonsingular
It is enlightening to write out and interpret the equations
First, we’ll write
or
Next, we write
or
where
From (33.2), it follows that
From (33.4) it follows that that
Equation (33.2) forms a sequence of autoregressions that for
(Here
The equivalence of the subspaces spanned by
To proceed, it is useful to drill down and note that for
Representation (33.6) is an orthogonal decomposition of
It follows that
33.2.1. Implementation#
Here’s the code that computes solutions to LQ control and filtering problems using the methods described here and in Classical Control with Linear Algebra.
import numpy as np
import scipy.stats as spst
import scipy.linalg as la
class LQFilter:
def __init__(self, d, h, y_m, r=None, h_eps=None, β=None):
"""
Parameters
----------
d : list or numpy.array (1-D or a 2-D column vector)
The order of the coefficients: [d_0, d_1, ..., d_m]
h : scalar
Parameter of the objective function (corresponding to the
quadratic term)
y_m : list or numpy.array (1-D or a 2-D column vector)
Initial conditions for y
r : list or numpy.array (1-D or a 2-D column vector)
The order of the coefficients: [r_0, r_1, ..., r_k]
(optional, if not defined -> deterministic problem)
β : scalar
Discount factor (optional, default value is one)
"""
self.h = h
self.d = np.asarray(d)
self.m = self.d.shape[0] - 1
self.y_m = np.asarray(y_m)
if self.m == self.y_m.shape[0]:
self.y_m = self.y_m.reshape(self.m, 1)
else:
raise ValueError("y_m must be of length m = {self.m:d}")
#---------------------------------------------
# Define the coefficients of ϕ upfront
#---------------------------------------------
ϕ = np.zeros(2 * self.m + 1)
for i in range(- self.m, self.m + 1):
ϕ[self.m - i] = np.sum(np.diag(self.d.reshape(self.m + 1, 1) \
@ self.d.reshape(1, self.m + 1),
k=-i
)
)
ϕ[self.m] = ϕ[self.m] + self.h
self.ϕ = ϕ
#-----------------------------------------------------
# If r is given calculate the vector ϕ_r
#-----------------------------------------------------
if r is None:
pass
else:
self.r = np.asarray(r)
self.k = self.r.shape[0] - 1
ϕ_r = np.zeros(2 * self.k + 1)
for i in range(- self.k, self.k + 1):
ϕ_r[self.k - i] = np.sum(np.diag(self.r.reshape(self.k + 1, 1) \
@ self.r.reshape(1, self.k + 1),
k=-i
)
)
if h_eps is None:
self.ϕ_r = ϕ_r
else:
ϕ_r[self.k] = ϕ_r[self.k] + h_eps
self.ϕ_r = ϕ_r
#-----------------------------------------------------
# If β is given, define the transformed variables
#-----------------------------------------------------
if β is None:
self.β = 1
else:
self.β = β
self.d = self.β**(np.arange(self.m + 1)/2) * self.d
self.y_m = self.y_m * (self.β**(- np.arange(1, self.m + 1)/2)) \
.reshape(self.m, 1)
def construct_W_and_Wm(self, N):
"""
This constructs the matrices W and W_m for a given number of periods N
"""
m = self.m
d = self.d
W = np.zeros((N + 1, N + 1))
W_m = np.zeros((N + 1, m))
#---------------------------------------
# Terminal conditions
#---------------------------------------
D_m1 = np.zeros((m + 1, m + 1))
M = np.zeros((m + 1, m))
# (1) Constuct the D_{m+1} matrix using the formula
for j in range(m + 1):
for k in range(j, m + 1):
D_m1[j, k] = d[:j + 1] @ d[k - j: k + 1]
# Make the matrix symmetric
D_m1 = D_m1 + D_m1.T - np.diag(np.diag(D_m1))
# (2) Construct the M matrix using the entries of D_m1
for j in range(m):
for i in range(j + 1, m + 1):
M[i, j] = D_m1[i - j - 1, m]
#----------------------------------------------
# Euler equations for t = 0, 1, ..., N-(m+1)
#----------------------------------------------
ϕ = self.ϕ
W[:(m + 1), :(m + 1)] = D_m1 + self.h * np.eye(m + 1)
W[:(m + 1), (m + 1):(2 * m + 1)] = M
for i, row in enumerate(np.arange(m + 1, N + 1 - m)):
W[row, (i + 1):(2 * m + 2 + i)] = ϕ
for i in range(1, m + 1):
W[N - m + i, -(2 * m + 1 - i):] = ϕ[:-i]
for i in range(m):
W_m[N - i, :(m - i)] = ϕ[(m + 1 + i):]
return W, W_m
def roots_of_characteristic(self):
"""
This function calculates z_0 and the 2m roots of the characteristic
equation associated with the Euler equation (1.7)
Note:
------
numpy.poly1d(roots, True) defines a polynomial using its roots that can
be evaluated at any point. If x_1, x_2, ... , x_m are the roots then
p(x) = (x - x_1)(x - x_2)...(x - x_m)
"""
m = self.m
ϕ = self.ϕ
# Calculate the roots of the 2m-polynomial
roots = np.roots(ϕ)
# Sort the roots according to their length (in descending order)
roots_sorted = roots[np.argsort(abs(roots))[::-1]]
z_0 = ϕ.sum() / np.poly1d(roots, True)(1)
z_1_to_m = roots_sorted[:m] # We need only those outside the unit circle
λ = 1 / z_1_to_m
return z_1_to_m, z_0, λ
def coeffs_of_c(self):
'''
This function computes the coefficients {c_j, j = 0, 1, ..., m} for
c(z) = sum_{j = 0}^{m} c_j z^j
Based on the expression (1.9). The order is
c_coeffs = [c_0, c_1, ..., c_{m-1}, c_m]
'''
z_1_to_m, z_0 = self.roots_of_characteristic()[:2]
c_0 = (z_0 * np.prod(z_1_to_m).real * (- 1)**self.m)**(.5)
c_coeffs = np.poly1d(z_1_to_m, True).c * z_0 / c_0
return c_coeffs[::-1]
def solution(self):
"""
This function calculates {λ_j, j=1,...,m} and {A_j, j=1,...,m}
of the expression (1.15)
"""
λ = self.roots_of_characteristic()[2]
c_0 = self.coeffs_of_c()[-1]
A = np.zeros(self.m, dtype=complex)
for j in range(self.m):
denom = 1 - λ/λ[j]
A[j] = c_0**(-2) / np.prod(denom[np.arange(self.m) != j])
return λ, A
def construct_V(self, N):
'''
This function constructs the covariance matrix for x^N (see section 6)
for a given period N
'''
V = np.zeros((N, N))
ϕ_r = self.ϕ_r
for i in range(N):
for j in range(N):
if abs(i-j) <= self.k:
V[i, j] = ϕ_r[self.k + abs(i-j)]
return V
def simulate_a(self, N):
"""
Assuming that the u's are normal, this method draws a random path
for x^N
"""
V = self.construct_V(N + 1)
d = spst.multivariate_normal(np.zeros(N + 1), V)
return d.rvs()
def predict(self, a_hist, t):
"""
This function implements the prediction formula discussed in section 6 (1.59)
It takes a realization for a^N, and the period in which the prediction is
formed
Output: E[abar | a_t, a_{t-1}, ..., a_1, a_0]
"""
N = np.asarray(a_hist).shape[0] - 1
a_hist = np.asarray(a_hist).reshape(N + 1, 1)
V = self.construct_V(N + 1)
aux_matrix = np.zeros((N + 1, N + 1))
aux_matrix[:(t + 1), :(t + 1)] = np.eye(t + 1)
L = la.cholesky(V).T
Ea_hist = la.inv(L) @ aux_matrix @ L @ a_hist
return Ea_hist
def optimal_y(self, a_hist, t=None):
"""
- if t is NOT given it takes a_hist (list or numpy.array) as a
deterministic a_t
- if t is given, it solves the combined control prediction problem
(section 7)(by default, t == None -> deterministic)
for a given sequence of a_t (either deterministic or a particular
realization), it calculates the optimal y_t sequence using the method
of the lecture
Note:
------
scipy.linalg.lu normalizes L, U so that L has unit diagonal elements
To make things consistent with the lecture, we need an auxiliary
diagonal matrix D which renormalizes L and U
"""
N = np.asarray(a_hist).shape[0] - 1
W, W_m = self.construct_W_and_Wm(N)
L, U = la.lu(W, permute_l=True)
D = np.diag(1 / np.diag(U))
U = D @ U
L = L @ np.diag(1 / np.diag(D))
J = np.fliplr(np.eye(N + 1))
if t is None: # If the problem is deterministic
a_hist = J @ np.asarray(a_hist).reshape(N + 1, 1)
#--------------------------------------------
# Transform the 'a' sequence if β is given
#--------------------------------------------
if self.β != 1:
a_hist = a_hist * (self.β**(np.arange(N + 1) / 2))[::-1] \
.reshape(N + 1, 1)
a_bar = a_hist - W_m @ self.y_m # a_bar from the lecture
Uy = np.linalg.solve(L, a_bar) # U @ y_bar = L^{-1}
y_bar = np.linalg.solve(U, Uy) # y_bar = U^{-1}L^{-1}
# Reverse the order of y_bar with the matrix J
J = np.fliplr(np.eye(N + self.m + 1))
# y_hist : concatenated y_m and y_bar
y_hist = J @ np.vstack([y_bar, self.y_m])
#--------------------------------------------
# Transform the optimal sequence back if β is given
#--------------------------------------------
if self.β != 1:
y_hist = y_hist * (self.β**(- np.arange(-self.m, N + 1)/2)) \
.reshape(N + 1 + self.m, 1)
return y_hist, L, U, y_bar
else: # If the problem is stochastic and we look at it
Ea_hist = self.predict(a_hist, t).reshape(N + 1, 1)
Ea_hist = J @ Ea_hist
a_bar = Ea_hist - W_m @ self.y_m # a_bar from the lecture
Uy = np.linalg.solve(L, a_bar) # U @ y_bar = L^{-1}
y_bar = np.linalg.solve(U, Uy) # y_bar = U^{-1}L^{-1}
# Reverse the order of y_bar with the matrix J
J = np.fliplr(np.eye(N + self.m + 1))
# y_hist : concatenated y_m and y_bar
y_hist = J @ np.vstack([y_bar, self.y_m])
return y_hist, L, U, y_bar
Let’s use this code to tackle two interesting examples.
33.2.2. Example 1#
Consider a stochastic process with moving average representation
where
If we were to use the tools associated with infinite dimensional prediction and filtering to be described below,
we would use the Wiener-Kolmogorov formula (33.21) to compute the linear least squares forecasts
But we can do everything we want by instead using our finite dimensional tools and
setting
m = 1
y_m = np.asarray([.0]).reshape(m, 1)
d = np.asarray([1, -2])
r = np.asarray([1, -2])
h = 0.0
example = LQFilter(d, h, y_m, r=d)
The Wold representation is computed by example.coeffs_of_c()
.
Let’s check that it “flips roots” as required
example.coeffs_of_c()
array([ 2., -1.])
example.roots_of_characteristic()
(array([2.]), -2.0, array([0.5]))
Now let’s form the covariance matrix of a time series vector of length
Then we’ll take a Cholesky decomposition of
V = example.construct_V(N=5)
print(V)
[[ 5. -2. 0. 0. 0.]
[-2. 5. -2. 0. 0.]
[ 0. -2. 5. -2. 0.]
[ 0. 0. -2. 5. -2.]
[ 0. 0. 0. -2. 5.]]
Notice how the lower rows of the “moving average representations” are converging to the appropriate infinite history Wold representation to be described below when we study infinite horizon-prediction and filtering
Li = np.linalg.cholesky(V)
print(Li)
[[ 2.23606798 0. 0. 0. 0. ]
[-0.89442719 2.04939015 0. 0. 0. ]
[ 0. -0.97590007 2.01186954 0. 0. ]
[ 0. 0. -0.99410024 2.00293902 0. ]
[ 0. 0. 0. -0.99853265 2.000733 ]]
Notice how the lower rows of the “autoregressive representations” are converging to the appropriate infinite-history autoregressive representation to be described below when we study infinite horizon-prediction and filtering
L = np.linalg.inv(Li)
print(L)
[[0.4472136 0. 0. 0. 0. ]
[0.19518001 0.48795004 0. 0. 0. ]
[0.09467621 0.23669053 0.49705012 0. 0. ]
[0.04698977 0.11747443 0.2466963 0.49926632 0. ]
[0.02345182 0.05862954 0.12312203 0.24917554 0.49981682]]
33.2.3. Example 2#
Consider a stochastic process
where
Let’s find a Wold moving average representation for
To do this, we’ll use the Wiener-Kolomogorov formula (33.21) presented below to compute the linear least squares forecasts
We proceed in the same way as in example 1
m = 2
y_m = np.asarray([.0, .0]).reshape(m, 1)
d = np.asarray([1, 0, -np.sqrt(2)])
r = np.asarray([1, 0, -np.sqrt(2)])
h = 0.0
example = LQFilter(d, h, y_m, r=d)
example.coeffs_of_c()
array([ 1.41421356, -0. , -1. ])
example.roots_of_characteristic()
(array([ 1.18920712, -1.18920712]),
-1.4142135623731122,
array([ 0.84089642, -0.84089642]))
V = example.construct_V(N=8)
print(V)
[[ 3. 0. -1.41421356 0. 0. 0.
0. 0. ]
[ 0. 3. 0. -1.41421356 0. 0.
0. 0. ]
[-1.41421356 0. 3. 0. -1.41421356 0.
0. 0. ]
[ 0. -1.41421356 0. 3. 0. -1.41421356
0. 0. ]
[ 0. 0. -1.41421356 0. 3. 0.
-1.41421356 0. ]
[ 0. 0. 0. -1.41421356 0. 3.
0. -1.41421356]
[ 0. 0. 0. 0. -1.41421356 0.
3. 0. ]
[ 0. 0. 0. 0. 0. -1.41421356
0. 3. ]]
Li = np.linalg.cholesky(V)
print(Li[-3:, :])
[[ 0. 0. 0. -0.9258201 0. 1.46385011
0. 0. ]
[ 0. 0. 0. 0. -0.96609178 0.
1.43759058 0. ]
[ 0. 0. 0. 0. 0. -0.96609178
0. 1.43759058]]
L = np.linalg.inv(Li)
print(L)
[[0.57735027 0. 0. 0. 0. 0.
0. 0. ]
[0. 0.57735027 0. 0. 0. 0.
0. 0. ]
[0.3086067 0. 0.65465367 0. 0. 0.
0. 0. ]
[0. 0.3086067 0. 0.65465367 0. 0.
0. 0. ]
[0.19518001 0. 0.41403934 0. 0.68313005 0.
0. 0. ]
[0. 0.19518001 0. 0.41403934 0. 0.68313005
0. 0. ]
[0.13116517 0. 0.27824334 0. 0.45907809 0.
0.69560834 0. ]
[0. 0.13116517 0. 0.27824334 0. 0.45907809
0. 0.69560834]]
33.2.4. Prediction#
It immediately follows from the “orthogonality principle” of least squares (see [Athanasios and Pillai, 1991] or [Sargent, 1987] [ch. X]) that
This can be interpreted as a finite-dimensional version of the Wiener-Kolmogorov
We can use (33.7) to represent the linear least squares projection of
the vector
We have
This formula will be convenient in representing the solution of control problems under uncertainty.
Equation (33.4) can be recognized as a finite dimensional version of a moving average representation.
Equation (33.2) can be viewed as a finite dimension version of an autoregressive representation.
Notice that even
if the
If
Further, if
That is,
the “bottom” rows of
This last observation gives one simple and widely-used practical way of
forming a finite
First, form the covariance matrix
The last row of
This method can readily be generalized to multivariate systems.
33.3. Combined Finite Dimensional Control and Prediction#
Consider the finite-dimensional control problem, maximize
where
The variables
Maximization is over choices of
We saw in the lecture Classical Control with Linear Algebra that the solution of this problem under certainty could be represented in the feedback-feedforward form
for some
Using a version of formula (33.7), we can express
where
(We have reversed the time axis in dating the
The time axis can be reversed in representation (33.8) by replacing
The optimal decision rule to use at time
33.4. Infinite Horizon Prediction and Filtering Problems#
It is instructive to compare the finite-horizon formulas based on linear algebra decompositions of finite-dimensional covariance matrices with classic formulas for infinite horizon and infinite history prediction and control problems.
These classic infinite horizon formulas used the mathematics of
We’ll meet interesting lag operator and
We pose two related prediction and filtering problems.
We let
where
We impose no conditions on the zeros of
A second covariance stationary process is
where
We also assume that
The linear least squares prediction problem is to find the
That is, the problem is to find a
The linear least squares filtering problem is to find a
Interesting versions of these problems related to the permanent income theory were studied by [Muth, 1960].
33.4.1. Problem Formulation#
These problems are solved as follows.
The covariograms of
The covariance and cross-covariance generating functions are defined as
The generating functions can be computed by using the following facts.
Let
That is,
Let
Then, as shown for example in [Sargent, 1987] [ch. XI], it is true that
Applying these formulas to (33.9) – (33.12), we have
The key step in obtaining solutions to our problems is to factor the covariance generating function
The solutions of our problems are given by formulas due to Wiener and Kolmogorov.
These formulas utilize the Wold moving average representation of the
where
Here
Equation (33.17) is the condition that
Condition (33.17) requires that
This will be true if and only if the zeros of
It is an implication of (33.17) that
Consequently, an implication of (33.16) is
that the covariance generating function of
It remains to discuss how
Combining (33.14) and (33.18) gives
Therefore, we have already shown constructively how to factor the covariance generating function
We now introduce the annihilation operator:
In words,
We have defined the solution of the prediction problem as
Assuming that the roots of
We have defined the solution of the filtering problem as
The Wiener-Kolomogorov formula for
or
Formulas (33.21) and (33.22) are discussed in detail in [Whittle, 1983] and [Sargent, 1987].
The interested reader can there find several examples of the use of these formulas in economics Some classic examples using these formulas are due to [Muth, 1960].
As an example of the usefulness of formula (33.22), we let
where
Suppose that at time
given knowledge of
We shall use (33.22) to obtain the answer.
Using the standard formulas (33.14), we have that
Then (33.22) becomes
In order to evaluate the term in the annihilation operator, we use the following result from [Hansen and Sargent, 1980].
Proposition Let
where . , where , for .
Then
and, alternatively,
where
Applying formula (33.25) of the proposition to evaluating (33.23) with
or
Thus, we have
This formula is useful in solving stochastic versions of problem 1 of lecture Classical Control with Linear Algebra in which the randomness emerges because
The problem is to maximize
where
where
and
The problem is to maximize (33.27) with respect to a contingency plan
expressing
The solution of this problem can be achieved in two steps.
First, ignoring the uncertainty, we can solve the problem assuming that
The solution is, from above,
or
Second, the solution of the problem under uncertainty is obtained by replacing the terms on the right-hand side of the above expressions with their linear least squares predictors.
Using (33.26) and (33.28), we have the following solution
Blaschke factors
The following is a useful piece of mathematics underlying “root flipping”.
Let
Then define
The term multiplying
Then it can be proved directly that
and that the zeros of
33.5. Exercises#
Exercise 33.1
Let
where
Find the Wold moving average representation for
Find a formula for the
Find a formula for the
Exercise 33.2
Multivariable Prediction: Let
where
Let
Let
Define the covariograms as
Then define the matrix covariance generating function, as in (32.21), only interpret all the objects in (32.21) as matrices.
Show that the covariance generating functions are given by
A factorization of
where the zeros of
A vector Wold moving average representation of
where
That is,
The optimum predictor of
If