Numerical Linear Algebra Background¶
Solving Systems of Linear Equations — Matrix Factorizations¶
The typical system \(\mathbf{A} \mathbf{x} = \mathbf{b}\), where \(\mathbf{A} \in \mathbb{R}^{n \times n}\) and \(\mathbf{b}, \mathbf{x} \in \mathbb{R}^n\), can be solved using a QR or PLU factorization.
QR’s orthogonal transformations (e.g. Householder reflections) are numerically stable because they leave the 2-norm of the matrix unchanged.
PLU is accurate and roughly half as expensive as QR, but it can magnify the elements of \(\mathbf{A}\).
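To make the comparison concrete, here is a minimal sketch (not part of the original notes) that solves the same system both ways with numpy; numpy.linalg.solve uses an LU factorization with partial pivoting, while the QR route reduces the problem to the triangular system \(R x = Q^\top b\). The matrix below is an arbitrary example.

import numpy

A = numpy.asfarray([[4, 1, 2],
                    [1, 3, 0],
                    [2, 0, 5]])
b = numpy.asfarray([1, 2, 3])

# PLU route: numpy.linalg.solve calls LAPACK's LU with partial pivoting
x_plu = numpy.linalg.solve(A, b)

# QR route: A = Q R, so R x = Q^T b
Q, R = numpy.linalg.qr(A)
x_qr = numpy.linalg.solve(R, numpy.dot(Q.T, b))

print(numpy.allclose(x_plu, x_qr))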
Systems whose matrix has trivial analytical structure can be solved directly:

- Permutation
- Orthogonal
- Nonsingular diagonal
- Nonsingular block diagonal
- Nonsingular lower triangular
- Nonsingular upper triangular
Symmetric Positive Definite matrices can be factorized using Cholesky decomposition: \(\mathbf{A} = \mathbf{L} \mathbf{L}^\top\) or \(\mathbf{A} = \mathbf{L} \mathbf{D} \mathbf{L}^\top\).
Symmetric Indefinite matrices can be factorized using the Bunch-Parlett algorithm: \(\mathbf{A} = \mathbf{P} \mathbf{L} \mathbf{D}_B \mathbf{L}^\top \mathbf{P}^\top\) where \(\mathbf{P}\) is a permutation matrix.
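As a minimal sketch (not one of the exercises, and using numpy’s built-in routine rather than a hand-rolled factorization), a symmetric positive definite system can be solved by computing \(\mathbf{A} = \mathbf{L} \mathbf{L}^\top\) once and then performing two triangular solves. A dedicated triangular solver would exploit the structure; numpy.linalg.solve is used here only for brevity.

import numpy

A = numpy.asfarray([[4, 2],
                    [2, 3]])
b = numpy.asfarray([1, 2])

L = numpy.linalg.cholesky(A)        # lower triangular factor, A = L L^T
y = numpy.linalg.solve(L, b)        # solve L y = b
x = numpy.linalg.solve(L.T, y)      # solve L^T x = y

print(numpy.allclose(numpy.dot(A, x), b))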
Errors in Solving Linear Systems¶
The condition number \(\kappa_p(\mathbf{A}) = \lVert \mathbf{A} \rVert_p \lVert \mathbf{A}^{-1} \rVert_p\) of a matrix measures its sensitivity to perturbations; a large condition number indicates that the matrix is nearly singular. Since computing the inverse is an expensive and unstable operation, the condition number is usually estimated as \(\hat{\kappa}_p(\mathbf{A}) = \lVert \mathbf{M} \rVert_p \alpha\), where \(\mathbf{M}\) is either \(\mathbf{A}\) or a factor of \(\mathbf{A}\) with similar conditioning and \(\alpha\) is an estimate of the induced matrix norm \(\lVert \mathbf{M}^{-1} \rVert_p\).
If ill-conditioned systems occur far away from the solution, then perturbing the system into a better-conditioned one is acceptable. If ill-conditioned systems occur near the solution, then the underlying problem is not well posed.
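For instance (a minimal sketch with an arbitrary nearly singular matrix, not taken from the text), a huge condition number signals that a tiny perturbation of the data produces a large change in the computed solution:

import numpy

eps = 1e-10
A = numpy.asfarray([[1, 1],
                    [1, 1 + eps]])
b = numpy.asfarray([2, 2 + eps])       # exact solution is [1, 1]

print(numpy.linalg.cond(A, p=1))       # roughly 4 / eps

x = numpy.linalg.solve(A, b)
x_pert = numpy.linalg.solve(A, b + numpy.asfarray([1e-8, 0]))
print(numpy.linalg.norm(x - x_pert))   # large compared to the 1e-8 perturbation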
Updating Matrix Factorizations¶
Jacobi rotation can be used to efficiently update a QR decomposition. Each Jacobi rotation zeros out one element of a matrix while changing only two rows of the matrix.
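As a small illustration (a sketch with its own sign convention, not the jacobi_rotation routine defined in Exercise 3.21), a single rotation acting on rows 0 and 2 zeros out the chosen entry while leaving every other row untouched:

import numpy

M = numpy.asfarray([[3, 1, 2],
                    [5, 7, 0],
                    [4, 2, 6]])

# rotate rows 0 and 2 so that the (2, 0) entry becomes zero
alpha, beta = M[0, 0], M[2, 0]
r = numpy.hypot(alpha, beta)
c, s = alpha / r, beta / r

J = numpy.eye(3)
J[0, 0] = J[2, 2] = c
J[0, 2] = s
J[2, 0] = -s

print(numpy.dot(J, M))    # row 1 is unchanged; entry (2, 0) is zero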
Eigenvalues and Positive Definiteness¶
Given a symmetric matrix, the Gerschgorin Theorem (3.5.9) provides a lower bound of the minimum eigenvalue and an upper bound for the maximum eigenvalue. If the matrix is strictly diagonally dominant, it is positive definite.
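As an illustration (a minimal sketch with an arbitrary symmetric matrix, not from the text), the Gerschgorin bounds come from the diagonal entries and the off-diagonal absolute row sums:

import numpy

A = numpy.asfarray([[4, 1, 0],
                    [1, 5, 1],
                    [0, 1, 6]])

d = numpy.diag(A)
r = numpy.sum(numpy.abs(A), axis=1) - numpy.abs(d)   # off-diagonal row sums

print((d - r).min(), (d + r).max())   # lower/upper bounds on the spectrum
print(numpy.linalg.eigvalsh(A))       # the eigenvalues lie inside the bounds

Because every lower bound \(A_{ii} - r_i\) is positive here, the matrix is strictly diagonally dominant and therefore positive definite.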
Linear Least Squares¶
The geometric interpretation of \(\min_{\mathbf{x}} \lVert \mathbf{A} \mathbf{x} - \mathbf{b} \rVert_2\) is given in Figure 3.6.2. The closest point to \(\mathbf{b}\) in subspace \(C(\mathbf{A})\) will be \(\mathbf{A} \mathbf{x}_* \in C(\mathbf{A})\) such that the residual \(\mathbf{A} \mathbf{x}_* - \mathbf{b}\) is perpendicular to the entire subspace of \(C(\mathbf{A})\). The solution \(\mathbf{A}^\top (\mathbf{A} \mathbf{x}_* - \mathbf{b}) = \boldsymbol{0}\) is known as the normal equation because the residual vector is perpendicular to the column space of \(\mathbf{A}\) i.e. the residual is in the null space of \(\mathbf{A}^\top\). Note that SVD is more expensive than the other matrix factorization techniques, so only use it when \(\mathbf{A}\) is not known to have full row or column rank.
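To make the orthogonality condition concrete (a minimal sketch, not part of the exercises, reusing the data from Exercise 3.25), the residual returned by a least squares solver is numerically orthogonal to the columns of \(A\):

import numpy

A = numpy.asfarray([[1, -1],
                    [1, 0],
                    [1, 1],
                    [1, 2]])
b = numpy.asfarray([3, 2, 0, 4])

x, *_ = numpy.linalg.lstsq(A, b, rcond=None)
residual = numpy.dot(A, x) - b
print(numpy.dot(A.T, residual))    # approximately the zero vector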
Exercise 1¶
Exercise 2¶
The following proof is from [Olva].
Recall that a norm defines a continuous real-valued function \(f(\mathbf{v}) = \lVert \mathbf{v} \rVert\) on \(\mathbb{R}^n\). Let \(\left\Vert \cdot \right\Vert_1\) and \(\left\Vert \cdot \right\Vert_2\) be any two norms on \(\mathbb{R}^n\). Let \(S_1 = \left\{ \left\Vert \mathbf{u} \right\Vert_1 = 1 \right\}\) denote the unit sphere of the first norm. Any continuous function defined on a compact set achieves both a maximum and a minimum value. Thus, restricting the second norm function to the unit sphere \(S_1\) of the first norm yields
By inspection, \(0 < c^\star \leq C^\star < \infty\), with \(c^\star = C^\star\) if and only if the two norms are proportional (i.e. scalar multiples of each other). Hence
To generalize this result, take any nonzero \(\mathbf{v} \in \mathbb{R}^n\) and define \(\mathbf{u} = \mathbf{v} / \left\Vert \mathbf{v} \right\Vert_1 \in S_1\). Consequently,
Therefore,
Exercise 3¶
(1)¶
because
and
(2)¶
(3)¶
Exercise 4¶
The \(p\)-norm of a matrix \(A\) induced by a given vector norm (3.1.7) is
for every \(A \in \mathbb{R}^{n \times n}\) and \(\lVert A \rVert_p = 0\) only if \(\lVert A v_* \rVert_p = 0\). An induced matrix norm is said to be compatible or consistent if \(\lVert A v \rVert_p \leq \lVert A \rVert_p \lVert v \rVert_p\).
(1)¶
Given that \(A \in \mathbb{R}^{n \times n}\) is invertible, the following property holds
(2)¶
For every \(\alpha \in \mathbb{R}\) and \(A \in \mathbb{R}^{n \times n}\),
(3)¶
For every \(A, B \in \mathbb{R}^{n \times n}\),
(4)¶
For every \(A, B \in \mathbb{R}^{n \times n}\),
Exercise 5¶
(1)¶
For every \(A \in \mathbb{R}^{n \times n}\) and \(v \in \mathbb{R}^n\),
Hence the induced matrix norm satisfies \(\lVert A \rVert_1 \leq \max_j \left\{ \lVert A_{.j} \rVert_1 \right\}\). The bound is attained by choosing \(v = e_j\) for the column \(j\) with the largest 1-norm, so equality holds.
(2)¶
For every \(A \in \mathbb{R}^{n \times n}\) and \(v \in \mathbb{R}^n\),
Hence the induced matrix norm satisfies \(\lVert A \rVert_\infty \leq \max_i \left\{ \lVert A_{i.} \rVert_1 \right\}\). The bound is attained by choosing \(v_j = \operatorname{sign}(A_{ij})\) for the row \(i\) with the largest absolute row sum, so equality holds.
Exercise 6¶
For every \(A \in \mathbb{R}^{n \times n}\) and \(w \in \mathbb{R}^n\),
Hence the induced matrix norm satisfies \(\lVert A \rVert_2 \leq \left( \lambda_{\max}(A^\top A) \right)^{1/2}\).
Similarly, \(\min_w \frac{\lVert A w \rVert_2}{\lVert w \rVert_2} \geq \left( \lambda_{\min}(A^\top A) \right)^{1/2}\).
The maximum and minimum values are attained by picking \(w\) to be the eigenvector of \(A^\top A\) corresponding to \(\lambda_\max\) and \(\lambda_\min\), respectively.
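A quick numerical check (a sketch with an arbitrary matrix, not part of the exercise) confirms both the value of the induced 2-norm and where it is attained:

import numpy

A = numpy.asfarray([[2, 1],
                    [0, 3]])

eigenvalues, eigenvectors = numpy.linalg.eigh(numpy.dot(A.T, A))
w = eigenvectors[:, -1]                  # eigenvector for lambda_max

print(numpy.linalg.norm(A, ord=2))       # induced 2-norm
print(numpy.sqrt(eigenvalues[-1]))       # sqrt(lambda_max(A^T A))
print(numpy.linalg.norm(numpy.dot(A, w)) / numpy.linalg.norm(w))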
Exercise 7¶
Recall that
where \(a = \begin{bmatrix} {A_{.1}}^\top & \cdots & {A_{.n}}^\top \end{bmatrix}^\top\) and \(b = \begin{bmatrix} {B_{.1}}^\top & \cdots & {B_{.n}}^\top \end{bmatrix}^\top\) stack the columns of \(A\) and \(B\).
The Frobenius norm of \(A\) can be derived as
Exercise 8¶
The following derivations use properties of matrix norms.
(1)¶
For every \(A \in \mathbb{R}^{n \times n}\),
(2)¶
For every \(A, B \in \mathbb{R}^{n \times n}\),
(3)¶
For every \(v \in \mathbb{R}^n\) and \(A \in \mathbb{R}^{n \times n}\),
(4)¶
For every \(A, B \in \mathbb{R}^{n \times n}\),
Exercise 9¶
Notice that \(\lVert v w^\top \rVert_F = \lVert v w^\top \rVert_2\) holds because the rank-one matrix \(v w^\top\) has a single nonzero singular value, so its Frobenius and induced 2-norms coincide.
For any \(v, w \in \mathbb{R}^n\),
Exercise 10¶
(1)¶
This solution was adapted from [Olvb].
Let \(\lVert \cdot \rVert\) be any norm on \(\mathbb{R}^{n \times n}\) that obeys (3.1.10). Let \(I \in \mathbb{R}^{n \times n}\) denote the identity matrix, assume \(\lVert I \rVert = 1\) (as holds for any induced norm), and let \(E \in \mathbb{R}^{n \times n}\) satisfy \(\lVert E \rVert < 1\).
Given \(m, n \in \mathbb{Z}^*\) and \(m > n\), \(S_k = I + E + E^2 + \cdots + E^k\) for \(k \in \mathbb{Z}^*\) is a Cauchy sequence because
where \(\epsilon \in \mathbb{R}^+\).
Since \(E\) is a convergent matrix, notice that
and
implies \((I - E)^{-1} = \lim_{k \rightarrow \infty} S_k\).
Combining the foregoing observations with the fact that \(\lVert S_k \rVert\) is bounded above by the convergent geometric series \(\sum_{i = 0}^{k} \lVert E \rVert^i\) yields
(2)¶
Let \(A, B \in \mathbb{R}^{n \times n}\), \(A\) is nonsingular, and \(\lVert A^{-1} (B - A) \rVert < 1\). Observe that
Hence \(\left( I - \left( I - A^{-1} B \right) \right)^{-1} = B^{-1} A\) exists. Thus
Exercise 11¶
For any \(A \in \mathbb{R}^{n \times n}\) and any orthonormal set \(\left\{ v_i \in \mathbb{R}^n \right\}_{i = 1}^n\),
Exercise 12¶
(1)¶
For any \(v \in \mathbb{R}^n\) and orthogonal matrix \(Q \in \mathbb{R}^{n \times n}\),
(2)¶
By definition of element-wise matrix norm (3.1.6) and the result of Exercise 3.11, the following holds for every \(A \in \mathbb{R}^{n \times n}\):
(3)¶
For any \(v \in \mathbb{R}^n\) and orthogonal matrix \(Q \in \mathbb{R}^{n \times n}\), the induced matrix norm gives
while the element-wise matrix norm yields
Exercise 13¶
Hence \(x = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \end{bmatrix}^\top\).
Exercise 14¶
Let \(u \in \mathbb{R}^n\). Observe that \(I - u u^\top\) is a symmetric matrix. In order for that matrix to be orthogonal, the following must hold
The cancellation of terms will occur when \(u^\top u = 2\).
Let \(v \in \mathbb{R}^n \setminus \boldsymbol{0}\) and choose \(u = \frac{\sqrt{2} v}{\lVert v \rVert_2}\) so that
Thus \(Q = I - 2 \frac{u u^\top}{u^\top u}\) is orthogonal for any nonzero \(u \in \mathbb{R}^n\).
The final property that the Householder transformation must satisfy is
where \(e_1 = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}^\top \in \mathbb{R}^n\) and \(\alpha \in \mathbb{R} \setminus \{0\}\).
Notice that for \(2 \leq i \leq n\), the requirement \(v_i - u_i \frac{2 u^\top v}{u^\top u} = 0\) is satisfied by choosing \(u_i = v_i\) and enforcing \(\frac{2 u^\top v}{u^\top u} = 1\) such that
In the more general case of
where \(\alpha_1, \alpha_2, \ldots, \alpha_k \in \mathbb{R} \setminus \{0\}\), the same analysis
still holds. Hence the solution must satisfy
One solution would be to set \(u_i = 0\) for \(i < k\) and \(u_j = v_j\) for \(k < j\) resulting in
Exercise 15¶
[1]:
import numpy
def QR_Householder(A):
    m, n = A.shape
    Q = numpy.eye(m)
    R = A
    for i in range(min(m - 1, n)):
        x = R[i:, i]
        # use operations of the same sign to avoid cancellation
        if x[0] < 0:
            u = numpy.hstack((x[0] - numpy.linalg.norm(x), x[1:])).T
        else:
            u = numpy.hstack((x[0] + numpy.linalg.norm(x), x[1:])).T
        Q_i = numpy.eye(m)
        I = numpy.eye(len(u))
        Q_i[i:, i:] = I - 2 * numpy.outer(u, u.T) / numpy.dot(u.T, u)
        Q = numpy.dot(Q, Q_i)
        R = numpy.dot(Q.T, A)
    return Q, R

A = numpy.asfarray([[1, -1],
                    [1, 0],
                    [1, 1],
                    [1, 2]])
Q, R = QR_Householder(A)
print(Q)
print(R)
print(numpy.round(numpy.dot(Q, R)))
[[-0.5 0.67082039 0.0236068 0.5472136 ]
[-0.5 0.2236068 -0.43934466 -0.71202266]
[-0.5 -0.2236068 0.80786893 -0.21759547]
[-0.5 -0.67082039 -0.39213107 0.38240453]]
[[-2.00000000e+00 -1.00000000e+00]
[-1.11022302e-16 -2.23606798e+00]
[ 0.00000000e+00 -2.22044605e-16]
[-1.11022302e-16 3.33066907e-16]]
[[ 1. -1.]
[ 1. 0.]
[ 1. 1.]
[ 1. 2.]]
Exercise 16¶
Given \(x = \begin{bmatrix} -1 & 1 \end{bmatrix}^\top\),
illustrates that the positive definite criterion is not satisfied. Running the \(\mathbf{L} \mathbf{D} \mathbf{L}^\top\) decomposition produces a negative entry in the diagonal factor \(D\).
[2]:
import numpy
def Cholesky_LDL(A):
    n = A.shape[0]
    D = numpy.zeros(n)
    L = numpy.eye(n)
    for i in range(n):
        # compute lower triangular entries
        for j in range(i):
            _ = A[i, j] - sum([L[i, k] * L[j, k] * D[k] for k in range(j)])
            L[i, j] = _ / D[j]
        # compute diagonal entries
        D[i] = A[i, i] - sum([L[i, k]**2 * D[k] for k in range(i)])
    return L, D

A = numpy.asfarray([[1, 2],
                    [2, 3]])

L, D = Cholesky_LDL(A)
print(L, D)
print(numpy.dot(L, numpy.dot(numpy.diag(D), L.T)))
[[1. 0.]
[2. 1.]] [ 1. -1.]
[[1. 2.]
[2. 3.]]
Exercise 17¶
The matrix will no longer be positive definite when \(10\) is replaced by \(7\).
[3]:
A = numpy.asfarray([[4, 6, -2],
                    [6, 10, 1],
                    [-2, 1, 22]])
L, D = Cholesky_LDL(A)
print(L, D)
print(numpy.dot(L, numpy.dot(numpy.diag(D), L.T)))
[[ 1. 0. 0. ]
[ 1.5 1. 0. ]
[-0.5 4. 1. ]] [4. 1. 5.]
[[ 4. 6. -2.]
[ 6. 10. 1.]
[-2. 1. 22.]]
[4]:
A = numpy.asfarray([[4, 6, -2],
                    [6, 7, 1],
                    [-2, 1, 22]])
L, D = Cholesky_LDL(A)
print(L, D)
print(numpy.dot(L, numpy.dot(numpy.diag(D), L.T)))
[[ 1. 0. 0. ]
[ 1.5 1. 0. ]
[-0.5 -2. 1. ]] [ 4. -2. 29.]
[[ 4. 6. -2.]
[ 6. 7. 1.]
[-2. 1. 22.]]
Exercise 18¶
(1)¶
Since the rows and columns of an orthogonal matrix \(Q\) are orthonormal vectors, \(\lVert Q_{i.} \rVert_1 \leq \sqrt{n}\) and \(\lVert Q_{.j} \rVert_1 \leq \sqrt{n}\).
Given \(A = QR\), \(\frac{1}{n} \kappa_1(A) \leq \kappa_1(R) \leq n \kappa_1(A)\) because
and
(2)¶
Exercise 19¶
Given
Exercise 20¶
See [CMSW79] for an elegant and concise exposition on estimating the condition number of an upper triangular matrix.
[5]:
import numpy
def R_solve(R, b):
    x = numpy.copy(b)
    n = b.shape[0]
    d = numpy.diag(R)
    x[-1] = x[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        _ = x[i] - sum(x[i+1:n] * R[i, i+1:n])
        x[i] = _ / d[i]
    return x
[6]:
def cond_est(R):
    # estimate ||R||_1
    R_1 = max(numpy.linalg.norm(R, ord=1, axis=0))
    # solve R^T z = e
    d = numpy.diag(R)
    n = d.shape[0]
    z = numpy.zeros(d.shape)
    z[0] = 1 / d[0]
    p = R[0, :] * z[0]
    m = numpy.zeros(p.shape)
    for j in range(1, n):
        zp = (1 - p[j]) / d[j]
        zm = (-1 - p[j]) / d[j]
        tmp_p = abs(zp)
        tmp_m = abs(zm)
        m[j+1:n] = p[j+1:n] + R[j, j+1:n] * zm
        tmp_m += sum(numpy.abs(m[j+1:n]) / numpy.abs(d[j+1:n]))
        p[j+1:n] = p[j+1:n] + R[j, j+1:n] * zp
        tmp_p += sum(numpy.abs(p[j+1:n]) / numpy.abs(d[j+1:n]))
        if tmp_p >= tmp_m:
            z[j] = zp
        else:
            z[j] = zm
            p[j+1:n] = m[j+1:n]
    z_1 = numpy.linalg.norm(z)
    # solve R y = z
    y_1 = numpy.linalg.norm(R_solve(R, z), ord=1)
    return R_1 * y_1 / z_1
[7]:
R = numpy.asfarray([[1, 1],
                    [0, 1e-7]])
cond_est(R)
[7]:
20000002.000000022
Exercise 21¶
See Lemma 3.4.2 and [GVL12] for an algorithm to robustly compute the Jacobi rotation matrix.
Given \(\hat{R}(\alpha, \beta) \in \mathbb{R}^{n \times n}\) where \(n = 2\), the Schur decomposition states that \(\hat{R} = Q U Q^*\) where \(Q\) is a unitary matrix. Let \(A_{ij} = \hat{R}(\alpha, \beta)_{ij}\) where \(1 \leq i, j \leq 2\).
and
The eigenvalues are
and
Solving for the eigenvectors yields
and
which shows that \(v_1 = \begin{bmatrix} t_1 \\ -i t_1 \end{bmatrix}\) and \(v_2 = \begin{bmatrix} t_2 \\ i t_2 \end{bmatrix}\) where \(t_1, t_2 \in \mathbb{R}\) because each system of linear equations has a free variable.
Defining
gives
and
In order to satisfy the orthogonality constraint \(Q Q^* = Q^* Q = I\), set \(t_1 = t_2 = \frac{1}{\sqrt{2}}\).
The Schur decomposition allows \(\hat{R}(\alpha, \beta)\) to be computed more accurately because the entries of \(U\) are the two complex conjugate eigenvalues, and the entries of \(Q, Q^*\) can be computed to any number of decimal places (e.g. via Newton’s method for \(1 / \sqrt{2}\)).
[8]:
import numpy
def jacobi_rotation(n, i, j, alpha, beta):
    _ = numpy.sqrt(alpha**2 + beta**2)
    alpha_bar = alpha / _
    beta_bar = beta / _
    J = numpy.zeros((n, n))
    J[i, i] = J[j, j] = alpha_bar
    J[j, i] = beta_bar
    J[i, j] = -J[j, i]
    for k in range(n):
        if k != i and k != j:
            J[k, k] = 1
    return J
Exercise 22¶
[9]:
M = numpy.asfarray([[1, 3, 5],
                    [2, 4, 6],
                    [3, 5, 9]])
n = M.shape[0]
#zero out M[2,0]
J_il = jacobi_rotation(n, 2, 0, M[0,0], M[2,0])
J_jl = jacobi_rotation(n, 2, 0, M[2,0], -M[0,0])
print(numpy.dot(J_il, M))
print(numpy.dot(J_jl, M))
[[ 3.16227766e+00 5.69209979e+00 1.01192885e+01]
[ 2.00000000e+00 4.00000000e+00 6.00000000e+00]
[ 5.55111512e-17 -1.26491106e+00 -1.89736660e+00]]
[[-5.55111512e-17 1.26491106e+00 1.89736660e+00]
[ 2.00000000e+00 4.00000000e+00 6.00000000e+00]
[ 3.16227766e+00 5.69209979e+00 1.01192885e+01]]
Exercise 23¶
If \(A\) has full column rank, then the nullspace of \(A\) is zero.¶
This means \(A v = \boldsymbol{0}\) only when \(v = \boldsymbol{0}\); hence for any \(x \neq x_*\), \(A (x - x_*) \neq \boldsymbol{0}\) and \(\lVert A x - b \rVert_2 > \lVert A x_* - b \rVert_2\).
\(A^\top A\) is nonsingular if and only if \(A\) has full column rank.¶
One way to prove this is to show that \(A^\top A\) has the same nullspace as \(A\). See [Str09] for background on the fundamental theorem of linear algebra, subspaces, and orthogonality.
If \(x\) is in the nullspace of \(A\), then \(Ax = 0\). Multiplying by \(A^\top\) yields \(A^\top A x = 0\), which shows that \(x\) is also in the nullspace of \(A^\top A\).
If \(x\) is in the nullspace of \(A^\top A\), then \(A^\top A x = 0\). Multiplying by \(x^\top\) yields \(x^\top A^\top A x = \lVert A x \rVert_2^2 = 0\), which means every \(x\) in the nullspace of \(A^\top A\) also needs to be in the nullspace of \(A\) in order to satisfy \(A x = 0\).
Exercise 24¶
Since \(A \in \mathbb{R}^{m \times n}\) is full rank, its nullspace is zero and its columns are linearly independent. A set of vectors spans a space if their linear combinations fill the space i.e. the columns of \(A\) span its column space. A basis for a vector space is a sequence of vectors that are linearly independent and span that space. Hence the column space of \(A\), whose dimension is \(n\), could span the entire space \(\mathbb{R}^m\) or only a subspace.
Notice that the QR decomposition of
where \(Q_1 \in \mathbb{R}^{m \times n}\) and \(R_u \in \mathbb{R}^{n \times n}\). Since \(Q\) is an orthogonal matrix, the columns of \(Q_1\) must be orthonormal. Hence the column space of \(Q_1\), whose dimension is \(n\), could span the entire space \(\mathbb{R}^m\) or only a subspace.
By the theorem of Equal Dimensions Yields Equal Subspaces [Bee08], the first \(n\) columns of \(Q\) form an orthonormal basis for the space spanned by the columns of \(A\).
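A numerical way to see this (a sketch, not part of the exercise, reusing the matrix from Exercise 3.25) is that the orthogonal projector built from \(Q_1\) coincides with the projector onto the column space of \(A\):

import numpy

A = numpy.asfarray([[1, -1],
                    [1, 0],
                    [1, 1],
                    [1, 2]])

Q1, R_u = numpy.linalg.qr(A)              # reduced QR: Q1 is 4 x 2
P_qr = numpy.dot(Q1, Q1.T)
P_normal = numpy.dot(A, numpy.linalg.solve(numpy.dot(A.T, A), A.T))

print(numpy.allclose(P_qr, P_normal))     # True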
Exercise 25¶
[10]:
A = numpy.asfarray([[1, -1],
                    [1, 0],
                    [1, 1],
                    [1, 2]])
b = numpy.asfarray([[3],
                    [2],
                    [0],
                    [4]])
print(numpy.dot(numpy.dot(numpy.linalg.inv(numpy.dot(A.T, A)), A.T), b))
n = A.shape[1]
Q, R = QR_Householder(A)
print(numpy.dot(numpy.linalg.inv(R[0:n,0:n]), numpy.dot(Q.T, b)[0:n,0:n]))
[[2.2]
[0.1]]
[[2.2]
[0.1]]
Exercise 26¶
Given \(A = LQ\), the equivalent minimization is
Since \(\lVert Q x \rVert_2 = \lVert x \rVert_2\) (3.1.23), defining \(z = Q x\) gives
According to the analysis done in Exercise 3.27, \(z = \begin{bmatrix} L_l^{-1} b \\ 0 \end{bmatrix}\). Thus \(x_* = Q^\top z = Q^\top \begin{bmatrix} L_l^{-1} b \\ 0 \end{bmatrix}\).
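The following sketch (not part of the exercises) realizes the factorization as in Exercise 3.27, i.e. via the QR decomposition of \(A^\top\), so that \(R^\top\) plays the role of \(L_l\) and \(Q_1\) holds the relevant columns of \(Q^\top\); numpy.linalg.lstsq serves only as an independent check of the minimum-norm solution. The matrix below is an arbitrary full row rank example.

import numpy

A = numpy.asfarray([[1, 2, 3],
                    [4, 5, 6]])
b = numpy.asfarray([1, 1])

Q1, R = numpy.linalg.qr(A.T)        # A^T = Q1 R, hence A = R^T Q1^T
z = numpy.linalg.solve(R.T, b)      # lower triangular solve: L_l z = b
x = numpy.dot(Q1, z)                # minimum 2-norm solution x_* = Q1 z

x_ref, *_ = numpy.linalg.lstsq(A, b, rcond=None)
print(numpy.allclose(x, x_ref))     # True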
Exercise 27¶
Observe that the LQ decomposition of
where \(L_l \in \mathbb{R}^{m \times m}\), \(Q_1 \in \mathbb{R}^{m \times n}\), and \(Q_2 \in \mathbb{R}^{(n - m) \times n}\).
To carry out LQ decomposition, apply QR decomposition to the full column rank matrix
Exercise 28¶
where \(\sigma_1\) is \(A\)’s largest singular value and \(\sigma_n\) is \(A\)’s smallest singular value.
Let \(u_{\cdot j}\) and \(v_{\cdot j}\) be the columns of \(U\) and \(V\). The following are the results of perturbing \(b\):
References
- Bee08
Robert Arnold Beezer. A first course in linear algebra. Beezer, 2008.
- CMSW79
Alan K. Cline, Cleve B. Moler, George W. Stewart, and James H. Wilkinson. An estimate for the condition number of a matrix. SIAM Journal on Numerical Analysis, 16(2):368–375, 1979.
- GVL12
Gene H. Golub and Charles F. Van Loan. Matrix computations. Volume 3. JHU Press, 2012.
- Olva
Peter J. Olver. Numerical analysis lecture notes. http://www-users.math.umn.edu/~olver/num_/lnn.pdf. Accessed on 2017-02-18.
- Olvb
Sheehan Olver. Numerical complex analysis: functional analysis. http://www.maths.usyd.edu.au/u/olver/teaching/NCA/15.pdf. Accessed on 2017-03-04.
- Str09
Gilbert Strang. Introduction to Linear Algebra. Volume 4. Wellesley-Cambridge Press Wellesley, MA, 2009.