Numerical Linear Algebra Background

Solving Systems of Linear Equations — Matrix Factorizations

  • The typical \(\mathbf{A} \mathbf{x} = \mathbf{b}\), where \(\mathbf{A} \in \mathbb{R}^{n \times n}\) and \(\mathbf{b}, \mathbf{x} \in \mathbb{R}^n\), can be solved using QR or PLU.

    • QR’s orthogonal transformations (e.g. Householder reflections) are numerically stable because they do not change the 2-norm (or Frobenius norm) of a matrix.

    • PLU is accurate and about half as expensive as QR, but it can magnify the elements of \(\mathbf{A}\).

  • Systems whose matrix has trivial analytical structure can be solved directly:

    • Permutation

    • Orthogonal

    • Nonsingular diagonal

    • Nonsingular block diagonal

    • Nonsingular lower triangular

    • Nonsingular upper triangular

  • Symmetric Positive Definite matrices can be factorized using Cholesky decomposition: \(\mathbf{A} = \mathbf{L} \mathbf{L}^\top\) or \(\mathbf{A} = \mathbf{L} \mathbf{D} \mathbf{L}^\top\).

  • Symmetric Indefinite matrices can be factorized using the Bunch-Parlett algorithm: \(\mathbf{A} = \mathbf{P} \mathbf{L} \mathbf{D}_B \mathbf{L}^\top \mathbf{P}^\top\) where \(\mathbf{P}\) is a permutation matrix.

Errors in Solving Linear Systems

The condition number \(\kappa_p(\mathbf{A}) = \lVert \mathbf{A} \rVert_p \lVert \mathbf{A}^{-1} \rVert_p\) of a matrix measures its sensitivity to perturbations and indicates its nearness to singularity. Since computing the inverse is an expensive and unstable operation, the condition number is often estimated as \(\hat{\kappa}_p(\mathbf{A}) = \lVert \mathbf{M} \rVert_p \alpha\), where \(\mathbf{M}\) is either \(\mathbf{A}\) or a factor of \(\mathbf{A}\) with similar conditioning, and \(\alpha\) is an estimate of the induced matrix norm \(\lVert \mathbf{M}^{-1} \rVert_p\).
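
As a quick numerical check of these definitions, the sketch below computes the exact condition number with numpy (this is not the cheap estimator described above, which Exercise 20 implements); the 2 × 2 matrix is purely illustrative and reappears in Exercise 19.

import numpy

A = numpy.array([[1.0, 1.0],
                 [0.0, 1e-7]])

#exact 1-norm condition number via the explicit inverse (fine for a small example)
kappa_1 = numpy.linalg.norm(A, 1) * numpy.linalg.norm(numpy.linalg.inv(A), 1)

#numpy.linalg.cond computes the same quantities directly
print(kappa_1, numpy.linalg.cond(A, 1), numpy.linalg.cond(A, 2))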

If ill-conditioned systems occur far away from the solution, then perturbing the system into a better-conditioned one is acceptable. If ill-conditioned systems occur near the solution, then the underlying problem is not well posed.

Updating Matrix Factorizations

Jacobi rotation can be used to efficiently update a QR decomposition. Each Jacobi rotation zeros out one element of a matrix while changing only two rows of the matrix.
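
A minimal sketch of a single Jacobi (Givens) rotation, assuming one common sign convention for the rotation parameters; the helper `givens` and the test matrix are illustrative only, not the update routine from the text (Exercise 21 builds the full \(n \times n\) rotation).

import numpy

def givens(a, b):
    #rotation parameters (alpha_bar, beta_bar) with
    #[[alpha_bar, beta_bar], [-beta_bar, alpha_bar]] [a, b]^T = [r, 0]^T
    r = numpy.hypot(a, b)
    return a / r, b / r

A = numpy.array([[3.0, 1.0],
                 [4.0, 2.0]])
c, s = givens(A[0, 0], A[1, 0])
J = numpy.array([[c, s],
                 [-s, c]])
print(numpy.dot(J, A))   #element (1, 0) is zeroed; only rows 0 and 1 are combined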

Eigenvalues and Positive Definiteness

Given a symmetric matrix, the Gerschgorin Theorem (3.5.9) provides a lower bound on the minimum eigenvalue and an upper bound on the maximum eigenvalue. If the matrix is strictly diagonally dominant with positive diagonal entries, it is positive definite.
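
A small numerical sketch of these bounds; the helper and the symmetric test matrix below are illustrative only.

import numpy

def gerschgorin_bounds(A):
    #for symmetric A, every eigenvalue lies in [min_i (a_ii - r_i), max_i (a_ii + r_i)],
    #where r_i is the sum of the off-diagonal magnitudes in row i
    d = numpy.diag(A)
    r = numpy.sum(numpy.abs(A), axis=1) - numpy.abs(d)
    return (d - r).min(), (d + r).max()

A = numpy.array([[4.0, 1.0, -1.0],
                 [1.0, 3.0, 0.5],
                 [-1.0, 0.5, 5.0]])
print(gerschgorin_bounds(A))      #strictly diagonally dominant with positive diagonal, so the lower bound is positive
print(numpy.linalg.eigvalsh(A))   #actual eigenvalues for comparison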

Linear Least Squares

The geometric interpretation of \(\min_{\mathbf{x}} \lVert \mathbf{A} \mathbf{x} - \mathbf{b} \rVert_2\) is given in Figure 3.6.2. The closest point to \(\mathbf{b}\) in the subspace \(C(\mathbf{A})\) is \(\mathbf{A} \mathbf{x}_* \in C(\mathbf{A})\) such that the residual \(\mathbf{A} \mathbf{x}_* - \mathbf{b}\) is perpendicular to all of \(C(\mathbf{A})\). The condition \(\mathbf{A}^\top (\mathbf{A} \mathbf{x}_* - \mathbf{b}) = \boldsymbol{0}\) is known as the normal equations because the residual vector is perpendicular to the column space of \(\mathbf{A}\), i.e. the residual lies in the null space of \(\mathbf{A}^\top\). Note that the SVD is more expensive than the other matrix factorization techniques, so use it only when \(\mathbf{A}\) is not known to have full row or column rank.
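
As a hedged illustration, the sketch below solves a least squares problem with numpy's SVD-based numpy.linalg.lstsq and checks the normal equations, reusing the same data that Exercise 25 solves via the normal equations and QR.

import numpy

A = numpy.array([[1.0, -1.0],
                 [1.0, 0.0],
                 [1.0, 1.0],
                 [1.0, 2.0]])
b = numpy.array([3.0, 2.0, 0.0, 4.0])

x_star, residual, rank, sing_vals = numpy.linalg.lstsq(A, b, rcond=None)
print(x_star)                                    #least squares solution
print(numpy.dot(A.T, numpy.dot(A, x_star) - b))  #normal equations: A^T (A x_* - b) = 0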

Exercise 1

\[\begin{split}\lim_{p \rightarrow \infty} \lVert x \rVert_p &= \lim_{p \rightarrow \infty} \left( \sum_{i = 1}^n \left\vert x_i \right\vert^p \right)^{1 / p}\\ &= \lim_{p \rightarrow \infty} \left( \sum_i \left\vert \frac{x_i}{M} \right\vert^p M^p \right)^{1 / p} & \quad & M = \max_i \left\{ \left\vert x_i \right\vert \right\} = \left\vert x_m \right\vert\\ &= \lim_{p \rightarrow \infty} \left( \left\vert \frac{x_m}{M} \right\vert^p M^p + \sum_{i \neq m} \left\vert \frac{x_i}{M} \right\vert^p M^p \right)^{1 / p} & \quad & \left\vert \frac{x_i}{M} \right\vert^p < 1 \forall\, i \neq m \text{ and } \left\vert \frac{x_m}{M} \right\vert^p = 1\\ &= \max_i \left\{ \left\vert x_i \right\vert \right\} & \quad & \lim_{p \rightarrow \infty} \left\vert \frac{x_i}{M} \right\vert^p = \delta(i - m)\\ &= \lVert x \rVert_{\infty}.\end{split}\]

Exercise 2

The following proof is from [Olva].

Recall that a norm defines a continuous real-valued function \(f(\mathbf{v}) = \lVert \mathbf{v} \rVert\) on \(\mathbb{R}^n\). Let \(\left\Vert \cdot \right\Vert_1\) and \(\left\Vert \cdot \right\Vert_2\) be any two norms on \(\mathbb{R}^n\), and let \(S_1 = \left\{ \mathbf{u} \in \mathbb{R}^n : \left\Vert \mathbf{u} \right\Vert_1 = 1 \right\}\) denote the unit sphere of the first norm. Since \(S_1\) is compact and a continuous function on a compact set achieves both a maximum and a minimum value, restricting the second norm to \(S_1\) yields

\[c^\star = \min \left\{ \left\Vert \mathbf{u} \right\Vert_2 \mid \mathbf{u} \in S_1 \right\} \quad \text{and} \quad C^\star = \max \left\{ \left\Vert \mathbf{u} \right\Vert_2 \mid \mathbf{u} \in S_1 \right\}.\]

Since the norm \(\left\Vert \cdot \right\Vert_2\) is strictly positive on \(S_1\), \(0 < c^\star \leq C^\star < \infty\), with \(c^\star = C^\star\) if and only if the two norms are proportional. Hence

\[c^\star \leq \left\Vert \mathbf{u} \right\Vert_2 \leq C^\star \quad \text{when} \quad \left\Vert \mathbf{u} \right\Vert_1 = 1.\]

To generalize this result to an arbitrary nonzero \(\mathbf{v} \in \mathbb{R}^n\), define \(\mathbf{u} = \mathbf{v} / \left\Vert \mathbf{v} \right\Vert_1 \in S_1\). Consequently,

\[\left\Vert \mathbf{u} \right\Vert_2 = \left\Vert \frac{\mathbf{v}}{\left\Vert \mathbf{v} \right\Vert_1} \right\Vert_2 = \frac{ \left\Vert \mathbf{v} \right\Vert_2 }{ \left\Vert \mathbf{v} \right\Vert_1 }.\]

Therefore,

\[c^\star \left\Vert \mathbf{v} \right\Vert_1 \leq \left\Vert \mathbf{v} \right\Vert_2 \leq C^\star \left\Vert \mathbf{v} \right\Vert_1.\]

Exercise 3

(1)

\[\lVert v \rVert_1 \geq \lVert v \rVert_2 \geq \lVert v \rVert_\infty\]

because

\[\begin{split}\lVert v \rVert_2^2 &= \sum_{i = 1}^n \lvert v_i \rvert^2\\ &\leq \sum_{i = 1}^n \lvert v_i \rvert \sum_{j = 1}^n \lvert v_j \rvert\\ &= \lVert v \rVert_1 \lVert v \rVert_1\\ &= \lVert v \rVert_1^2\\ \lVert v \rVert_2 &\leq \lVert v \rVert_1\end{split}\]

and

\[\begin{split}\lVert v \rVert_\infty^2 &= \left( \max_{1 \leq i \leq n} \lvert v_i \rvert \right)^2\\ &= v_\max^2\\ &\leq v_\max^2 + \sum_{i \neq \max} v_i^2\\ &= \lVert v \rVert_2^2\\ \lVert v \rVert_\infty &\leq \lVert v \rVert_2.\end{split}\]

(2)

\[\begin{split}\lVert v \rVert_1^2 &= \left( \sum_{i = 1}^n \lvert v_i \rvert \right)^2\\ &= \left( \sum_{i = 1}^n \lvert v_i \rvert (1) \right)^2\\ &\leq \lVert v \rVert_2^2 \cdot \lVert \boldsymbol{1} \rVert_2^2 & \quad & \text{Cauchy–Schwarz inequality}\\ &= n \lVert v \rVert_2^2\\ \lVert v \rVert_1 &\leq \sqrt{n} \lVert v \rVert_2\end{split}\]

(3)

\[\begin{split}\lVert v \rVert_2^2 &= \sum_{i = 1}^n \lvert v_i \rvert^2\\ &\leq \sum_{i = 1}^n v_\max^2\\ &= n \lVert v \rVert_\infty^2\\ \lVert v \rVert_2 &\leq \sqrt{n} \lVert v \rVert_\infty\end{split}\]
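
A quick numerical check of the three bounds, using an arbitrary illustrative vector:

import numpy

v = numpy.array([1.0, -2.0, 3.0, -4.0, 5.0])
n = v.shape[0]

n1 = numpy.linalg.norm(v, 1)
n2 = numpy.linalg.norm(v, 2)
ninf = numpy.linalg.norm(v, numpy.inf)

assert ninf <= n2 <= n1            #part (1)
assert n1 <= numpy.sqrt(n) * n2    #part (2)
assert n2 <= numpy.sqrt(n) * ninf  #part (3)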

Exercise 4

The \(p\)-norm of a matrix \(A\) induced by a given vector norm (3.1.7) is

\[\begin{split}\lVert A \rVert_p &= \max_{v \in \mathbb{R}^n \setminus 0} \left\{\frac{\lVert A v \rVert_p}{\lVert v \rVert_p} \right\}\\ &= \frac{\lVert A v_* \rVert_p}{\lVert v_* \rVert_p}\\ &= \frac{1}{\lVert v_* \rVert_p} \left( \sum_{i} \lvert A_{i.} v_* \rvert^p \right)^{1 / p}\\ &\geq 0\end{split}\]

for every \(A \in \mathbb{R}^{n \times n}\), where \(v_*\) denotes a vector attaining the maximum; moreover \(\lVert A \rVert_p = 0\) only if \(A v = 0\) for every \(v\), i.e. \(A = 0\). An induced matrix norm is compatible (or consistent) with its vector norm in the sense that \(\lVert A v \rVert_p \leq \lVert A \rVert_p \lVert v \rVert_p\) for every \(v\).

(1)

Given that \(A \in \mathbb{R}^{n \times n}\) is invertible, the following property holds

\[\begin{split}\lVert A^{-1} \rVert_p &= \left( \min_{v \in \mathbb{R}^n \setminus 0} \frac{\lVert A v \rVert_p}{\lVert v \rVert_p} \right)^{-1} & \quad & \text{(3.1.18)}\\ \frac{1}{\lVert A^{-1} \rVert_p} &= \min_{v \in \mathbb{R}^n \setminus 0} \frac{\lVert A v \rVert_p}{\lVert v \rVert_p}\\ &\leq \max_{v \in \mathbb{R}^n \setminus 0} \frac{\lVert A v \rVert_p}{\lVert v \rVert_p}\\ \frac{\lVert v_* \rVert_p}{\lVert A^{-1} \rVert_p} &\leq \lVert A v_* \rVert_p.\end{split}\]

(2)

For every \(\alpha \in \mathbb{R}\) and \(A \in \mathbb{R}^{n \times n}\),

\[\begin{split}\lVert \alpha A \rVert_p &= \max_{v \in \mathbb{R}^n \setminus 0} \left\{\frac{\lVert \alpha A v \rVert_p}{\lVert v \rVert_p} \right\}\\ &= \frac{\lVert \alpha A v_* \rVert_p}{\lVert v_* \rVert_p}\\ &= \frac{1}{\lVert v_* \rVert_p} \left( \sum_{i} \lvert \alpha A_{i.} v_* \rvert^p \right)^{1 / p}\\ &= \frac{\lvert \alpha \rvert}{\lVert v_* \rVert_p} \left( \sum_{i} \lvert A_{i.} v_* \rvert^p \right)^{1 / p}\\ &= \lvert \alpha \rvert \lVert A \rVert_p.\end{split}\]

(3)

For every \(A, B \in \mathbb{R}^{n \times n}\),

\[\begin{split}\lVert A + B \rVert_p &= \max_{v \in \mathbb{R}^n \setminus 0} \left\{\frac{\lVert (A + B) v \rVert_p}{\lVert v \rVert_p} \right\}\\ &= \frac{\lVert (A + B) v_* \rVert_p}{\lVert v_* \rVert_p}\\ &= \frac{1}{\lVert v_* \rVert_p} \left( \sum_{i} \lvert (A_{i.} + B_{i.}) v_* \rvert^p \right)^{1 / p}\\ &\leq \frac{1}{\lVert v_* \rVert_p} \left( \left( \sum_{i} \lvert A_{i.} v_* \rvert^p \right)^{1 / p} + \left( \sum_{i} \lvert B_{i.} v_* \rvert^p \right)^{1 / p} \right) & \quad & \text{Minkowski inequality}\\ &= \lVert A \rVert_p + \lVert B \rVert_p.\end{split}\]

(4)

For every \(A, B \in \mathbb{R}^{n \times n}\),

\[\begin{split}\lVert A B \rVert_p &= \max_{v \in \mathbb{R}^n \setminus 0} \left\{\frac{\lVert A B v \rVert_p}{\lVert v \rVert_p} \right\}\\ &= \frac{\lVert A B v_* \rVert_p}{\lVert v_* \rVert_p}\\ &\leq \frac{\lVert A \rVert_p \lVert B v_* \rVert_p}{\lVert v_* \rVert_p}\\ &\leq \frac{ \lVert A \rVert_p \lVert B \rVert_p \lVert v_* \rVert_p }{ \lVert v_* \rVert_p }\\ &= \lVert A \rVert_p \lVert B \rVert_p.\end{split}\]

Exercise 5

(1)

For every \(A \in \mathbb{R}^{n \times n}\) and \(v \in \mathbb{R}^n\),

\[\begin{split}\lVert A v \rVert_1 &= \sum_i \lvert A_i v \rvert\\ &= \sum_i \lvert \sum_j A_{ij} v_j \rvert\\ &\leq \sum_i \sum_j \lvert A_{ij} \rvert \lvert v_j \rvert\\ &= \sum_j \left( \sum_i \lvert A_{ij} \rvert \right) \lvert v_j \rvert\\ &\leq \max_{1 \leq j \leq n} \left( \sum_i \lvert A_{ij} \rvert \right) \lVert v \rVert_1\\ \frac{\lVert A v \rVert_1}{\lVert v \rVert_1} &\leq \max_{1 \leq j \leq n} \sum_i \lvert A_{ij} \rvert.\end{split}\]

Hence the induced matrix norm \(\lVert A \rVert_1 \leq \max_j \left\{ \lVert A_{.j} \rVert_1 \right\}\).

(2)

For every \(A \in \mathbb{R}^{n \times n}\) and \(v \in \mathbb{R}^n\),

\[\begin{split}\lVert A v \rVert_\infty &= \max_{1 \leq i \leq n} \lvert A_i v \rvert\\ &= \max_{1 \leq i \leq n} \lvert \sum_j A_{ij} v_j \rvert\\ &\leq \max_{1 \leq i \leq n} \sum_j \lvert A_{ij} \rvert \lVert v \rVert_\infty\\ \frac{\lVert A v \rVert_\infty}{\lVert v \rVert_\infty} &\leq \max_{1 \leq i \leq n} \sum_j \lvert A_{ij} \rvert.\end{split}\]

Hence the induced matrix norm \(\lVert A \rVert_\infty \leq \max_i \left\{ \lVert A_{i.} \rVert_1 \right\}\).
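
In fact numpy's matrix norms with ord=1 and ord=inf return exactly the maximum absolute column sum and maximum absolute row sum, matching the bounds above; a small illustrative check:

import numpy

A = numpy.array([[1.0, -2.0, 3.0],
                 [0.0, 4.0, -1.0],
                 [2.0, 1.0, 1.0]])

max_col_sum = numpy.sum(numpy.abs(A), axis=0).max()
max_row_sum = numpy.sum(numpy.abs(A), axis=1).max()

print(max_col_sum, numpy.linalg.norm(A, 1))          #induced 1-norm: max absolute column sum
print(max_row_sum, numpy.linalg.norm(A, numpy.inf))  #induced infinity-norm: max absolute row sum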

Exercise 6

For every \(A \in \mathbb{R}^{n \times n}\) and \(w \in \mathbb{R}^n\),

\[\begin{split}\lVert A w \rVert_2^2 &= \sum_i \lvert A_i w \rvert^2\\ &= w^\top A^\top A w\\ &= \sum_{i = 1}^n \sum_{j = 1}^n \alpha_i v_i^\top A^\top A \alpha_j v_j & \quad & \text{by definition of orthonormal basis and Theorem 3.5.3}\\ &= \sum_{i = 1}^n \sum_{j = 1}^n \alpha_i v_i^\top \alpha_j \lambda_j v_j & \quad & \text{by definition of eigenvectors and eigenvalues}\\ &= \sum_{i = 1}^n \sum_{j = 1}^n \lambda_j \alpha_i \alpha_j v_i^\top v_j\\ &= \sum_{i = 1}^n \lambda_i \alpha_i^2\\ &\leq \max_\lambda \left\{ A^\top A \right\} w^\top w & \quad & \text{where}\ w = \sum_{i = 1}^n \alpha_i v_i\\ &= \max_\lambda \left\{ A^\top A \right\} \lVert w \rVert_2^2\\ \frac{\lVert A w \rVert_2}{\lVert w \rVert_2} &\leq \left( \max_\lambda \left\{ A^\top A \right\} \right)^{1/2}.\end{split}\]

Hence the induced matrix norm \(\lVert A \rVert_2 \leq \left( \max_{\lambda} \left\{ A^\top A \right\} \right)^{1/2}\).

By inspection, \(\min_w \frac{\lVert A w \rVert_2}{\lVert w \rVert_2} \geq \left( \min_{\lambda} \left\{ A^\top A \right\} \right)^{1/2}\).

Thus the maximum and minimum values are attained by picking \(w\) to be the eigenvector corresponding to \(\lambda_\max\) and \(\lambda_\min\), respectively.
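
A small numerical check of this characterization (the invertible test matrix is illustrative only):

import numpy

A = numpy.array([[1.0, 2.0],
                 [3.0, 4.0]])

eig = numpy.linalg.eigvalsh(numpy.dot(A.T, A))         #eigenvalues of the symmetric matrix A^T A
print(numpy.sqrt(eig.max()), numpy.linalg.norm(A, 2))  #induced 2-norm
print(numpy.sqrt(eig.min()),
      1 / numpy.linalg.norm(numpy.linalg.inv(A), 2))   #min_w ||A w||_2 / ||w||_2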

Exercise 7

Recall that

\[\text{tr}(A^\top B) = \sum_i A^\top_i B_{.i} = a^\top b\]

where \(a = \begin{bmatrix} {A_{.1}}^\top & \cdots & {A_{.n}}^\top \end{bmatrix}^\top\) and \(b = \begin{bmatrix} {B_{.1}}^\top & \cdots & {B_{.n}}^\top \end{bmatrix}^\top\) stack the columns of \(A\) and \(B\).

The Frobenius norm of \(A\) can be derived as

\[\begin{split}\text{tr}\left( A^\top A \right) &= \sum_i A^\top_i A_{.i}\\ &= \sum_i \sum_j A^\top_{ij} A_{ji}\\ &= \sum_j \sum_i \lvert A_{ji} \rvert^2\\ \text{tr}\left( A^\top A \right)^{1 / 2} &= \left( \sum_j \sum_i \lvert A_{ij} \rvert^2 \right)^{1 / 2}\\ &= \lVert A \rVert_F.\end{split}\]

Exercise 8

The following results concern the element-wise matrix norms (3.1.6).

(1)

For every \(A \in \mathbb{R}^{n \times n}\),

\[\begin{split}\lVert A \rVert_{p} &= \left( \sum_j \sum_i \lvert A_{ij} \rvert^p \right)^{1 / p}\\ &= \left( \sum_j \sum_i \lvert A_{ji} \rvert^p \right)^{1 / p}\\ &= \left( \sum_j \sum_i \lvert A^\top_{ij} \rvert^p \right)^{1 / p}\\ &= \lVert A^\top \rVert_{p}.\end{split}\]

(2)

For every \(A, B \in \mathbb{R}^{n \times n}\),

\[\begin{split}\lVert AB \rVert_F^2 &= \sum_j \lVert (AB)_{.j} \rVert_2^2\\ &= \sum_j \lVert A B_{.j} \rVert_2^2\\ &\leq \lVert A \rVert_2^2 \sum_j \lVert B_{.j} \rVert_2^2 & \quad & \text{compatibility of the induced 2-norm}\\ &= \lVert A \rVert_2^2 \lVert B \rVert_F^2.\end{split}\]

Applying the same bound to \(\lVert AB \rVert_F = \lVert (AB)^\top \rVert_F = \lVert B^\top A^\top \rVert_F \leq \lVert B^\top \rVert_2 \lVert A^\top \rVert_F\), together with \(\lVert A^\top \rVert_F = \lVert A \rVert_F\) from part (1) and \(\lVert B^\top \rVert_2 = \lVert B \rVert_2\) (since \(B^\top B\) and \(B B^\top\) share the same nonzero eigenvalues), gives

\[\lVert AB \rVert_F \leq \min\left\{ \lVert A \rVert_2 \lVert B \rVert_F, \lVert A \rVert_F \lVert B \rVert_2 \right\}.\]

(3)

For every \(v \in \mathbb{R}^n\) and \(A \in \mathbb{R}^{n \times n}\),

\[\begin{split}\lVert A v \rVert_2^2 &= \sum_i \lvert A_i v \rvert^2\\ &= \sum_i \left\vert \sum_j A_{ij} v_j \right\vert^2\\ &\leq \sum_i \left( \sum_j \lvert A_{ij} \rvert^2 \right) \left( \sum_k \lvert v_k \rvert^2 \right) & \quad & \text{Cauchy–Schwarz inequality}\\ &= \lVert A \rVert_F^2 \lVert v \rVert_2^2\\ \lVert A v \rVert_2 &\leq \lVert A \rVert_F \lVert v \rVert_2.\end{split}\]

(4)

For every \(A, B \in \mathbb{R}^{n \times n}\),

\[\begin{split}\lVert A + B \rVert_F^2 &= \sum_k \lvert A_k + B_k \rvert^2\\ &= \lVert A \rVert_F^2 + \lVert B \rVert_F^2 + 2 \sum_k A_k B_k\\ &\leq \lVert A \rVert_F^2 + \lVert B \rVert_F^2 + 2 \lVert A \rVert_F \lVert B \rVert_F & \quad & \text{Cauchy–Schwarz inequality (3.1.10)}\\ &= \left( \lVert A \rVert_F + \lVert B \rVert_F \right)^2\\ \lVert A + B \rVert_F &\leq \lVert A \rVert_F + \lVert B \rVert_F,\end{split}\]

where \(A_k\) and \(B_k\) enumerate the entries of \(A\) and \(B\) in a fixed order.

Exercise 9

Notice that \(\lVert v w^\top \rVert_F = \lVert v w^\top \rVert_2\) because the rank-one matrix \(v w^\top\) has a single nonzero singular value, so its Frobenius norm and induced 2-norm coincide.

For any \(v, w \in \mathbb{R}^n\),

\[\begin{split}\lVert v w^\top \rVert_F^2 &= \sum_i \sum_j \left\vert \left( v w^\top \right)_{ij} \right\vert^2\\ &= \sum_i \sum_j \lvert v_i \rvert^2 \lvert w_j \rvert^2\\ &= \sum_i \lvert v_i \rvert^2 \sum_j \lvert w_j \rvert^2\\ \lVert v w^\top \rVert_F &= \lVert v \rVert_2 \lVert w \rVert_2.\end{split}\]

Hence \(\lVert v w^\top \rVert_2 = \lVert v w^\top \rVert_F = \lVert v \rVert_2 \lVert w \rVert_2\).

Exercise 10

(1)

This solution was adapted from [Olvb].

Let \(\lVert \cdot \rVert\) be any matrix norm on \(\mathbb{R}^{n \times n}\) that obeys (3.1.10), let \(I \in \mathbb{R}^{n \times n}\) denote the identity, and let \(E \in \mathbb{R}^{n \times n}\) satisfy \(\lVert E \rVert < 1\); assume \(\lVert I \rVert = 1\), as holds for any induced norm.

Given \(m, n \in \mathbb{Z}^*\) with \(m > n\), the partial sums \(S_k = I + E + E^2 + \cdots + E^k\), \(k \in \mathbb{Z}^*\), form a Cauchy sequence because

\[\begin{split}\lVert S_m - S_n \rVert &= \lVert E^{n + 1} + \cdots + E^m \rVert\\ &\leq \lVert E \rVert^{n + 1} + \cdots + \lVert E \rVert^m\\ &\leq \frac{\lVert E \rVert^{n + 1}}{1 - \lVert E \rVert},\end{split}\]

which can be made arbitrarily small by taking \(n\) sufficiently large.

Since \(\lVert E \rVert < 1\) implies \(E^k \rightarrow 0\) as \(k \rightarrow \infty\) (i.e. \(E\) is a convergent matrix), notice that

\[\begin{split}(I - E) S_k &= (I - E) (I + E + E^2 + \cdots + E^k)\\ &= I - E + E - E^2 + E^2 - E^3 + \cdots - E^{k + 1}\\ &= I - E^{k + 1}\end{split}\]

and

\[\begin{split}(I - E) \lim_{k \rightarrow \infty} S_k &= \lim_{k \rightarrow \infty} \left( I - E^{k + 1} \right)\\ &= I - \lim_{k \rightarrow \infty} E^{k + 1}\\ &= I\end{split}\]

implies \((I - E)^{-1} = \lim_{k \rightarrow \infty} S_k\).

Combining the foregoing observations with the geometric series bound on \(\lVert S_k \rVert\) yields

\[\begin{split}\lVert (I - E)^{-1} \rVert &= \lim_{k \rightarrow \infty} \lVert S_k \rVert\\ &\leq \sum_{i = 0}^\infty \lVert E \rVert^i\\ &= \frac{1}{1 - \lVert E \rVert}.\end{split}\]
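
A hedged numerical sketch of the Neumann series and the bound (3.1.19), with an illustrative E of small norm:

import numpy

E = numpy.array([[0.1, -0.2],
                 [0.05, 0.3]])
I = numpy.eye(2)
assert numpy.linalg.norm(E, 2) < 1

S = numpy.zeros_like(E)   #partial sums S_k = I + E + ... + E^k
Ek = numpy.eye(2)
for _ in range(200):
    S = S + Ek
    Ek = numpy.dot(Ek, E)

print(numpy.linalg.norm(S - numpy.linalg.inv(I - E)))   #essentially zero
print(numpy.linalg.norm(numpy.linalg.inv(I - E), 2),
      1 / (1 - numpy.linalg.norm(E, 2)))                #bound (3.1.19)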

(2)

Let \(A, B \in \mathbb{R}^{n \times n}\), \(A\) is nonsingular, and \(\lVert A^{-1} (B - A) \rVert < 1\). Observe that

\[\begin{split}\lVert A^{-1} (B - A) \rVert &= \lVert A^{-1} B - I \rVert\\ &= \lVert -(I - A^{-1} B) \rVert\\ &= \lvert -1 \rvert \lVert I - A^{-1} B \rVert\\ &< 1.\end{split}\]

Hence \(\left( I - \left( I - A^{-1} B \right) \right)^{-1} = B^{-1} A\) exists. Thus

\[\begin{split}\left\Vert \left( I - \left( I - A^{-1} B \right) \right)^{-1} \right\Vert &\leq \frac{1}{1 - \lVert I - A^{-1} B \rVert} & \quad & \text{(3.1.19)}\\ \lVert B^{-1} A \rVert &\leq \frac{1}{1 - \lVert A^{-1} (B - A) \rVert}\\ \lVert B^{-1} \rVert = \lVert B^{-1} A A^{-1} \rVert &\leq \lVert B^{-1} A \rVert \lVert A^{-1} \rVert & \quad & \text{(3.1.10)}\\ \lVert B^{-1} \rVert &\leq \frac{\lVert A^{-1} \rVert}{1 - \lVert A^{-1} (B - A) \rVert}.\end{split}\]

Exercise 11

For any \(A \in \mathbb{R}^{n \times n}\) and any orthonormal set \(\left\{ v_i \in \mathbb{R}^n \right\}_{i = 1}^n\),

\[\begin{split}\sum_{i = 1}^n \lVert A v_i \rVert_2^2 &= \lVert A \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \rVert_F^2\\ &= \lVert A Q \rVert_F^2\\ &= \text{tr}\left( Q^\top A^\top A Q \right)\\ &= \text{tr}\left( A^\top A Q Q^\top \right)\\ &= \text{tr}\left( A^\top A \right)\\ &= \lVert A \rVert_F^2.\end{split}\]

Exercise 12

(1)

For any \(v \in \mathbb{R}^n\) and orthogonal matrix \(Q \in \mathbb{R}^{n \times n}\),

\[\begin{split}\lVert Q v \rVert_2^2 &= v^\top Q^\top Q v\\ &= v^\top v\\ \lVert Q v \rVert_2 &= \lVert v \rVert_2.\end{split}\]

(2)

By definition of element-wise matrix norm (3.1.6) and the result of Exercise 3.11, the following holds for every \(A \in \mathbb{R}^{n \times n}\):

\[\begin{split}\begin{aligned} \lVert Q A \rVert_2 &= \lVert Q A \rVert_F\\ &= \lVert A \rVert_F\\ &= \lVert A \rVert_2 \end{aligned} \quad \text{and} \quad \begin{aligned} \lVert A Q \rVert_2 &= \lVert A Q \rVert_F\\ &= \lVert A \rVert_F\\ &= \lVert A \rVert_2. \end{aligned}\end{split}\]

(3)

For any \(v \in \mathbb{R}^n\) and orthogonal matrix \(Q \in \mathbb{R}^{n \times n}\), the induced matrix norm gives

\[\begin{split}\lVert Q \rVert_2 &= \max_{v \in \mathbb{R}^n, v \neq 0} \frac{\lVert Q v \rVert_2}{\lVert v \rVert_2}\\ &= \frac{\left( v_*^\top Q^\top Q v_* \right)^{1/2}}{\lVert v_* \rVert_2}\\ &= \frac{\left( v_*^\top v_* \right)^{1/2}}{\lVert v_* \rVert_2}\\ &= \frac{\lVert v_* \rVert_2}{\lVert v_* \rVert_2}\\ &= 1\end{split}\]

while the element-wise matrix norm yields

\[\begin{split}\lVert Q \rVert_2 &= \lVert Q \rVert_F\\ &= \text{tr}(Q^\top Q)^{1/2}\\ &= \text{tr}(I)^{1/2}\\ &= \sqrt{n}.\end{split}\]
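
A short numerical illustration of the three statements; the rotation-product Q below is just one convenient way to build an orthogonal matrix.

import numpy

c, s = numpy.cos(0.7), numpy.sin(0.7)
Q = numpy.dot(numpy.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]),
              numpy.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]))
v = numpy.array([1.0, -2.0, 3.0])

print(numpy.linalg.norm(numpy.dot(Q, v)), numpy.linalg.norm(v))  #||Q v||_2 = ||v||_2
print(numpy.linalg.norm(Q, 2))                                   #induced 2-norm: 1
print(numpy.linalg.norm(Q, 'fro'))                               #element-wise (Frobenius) norm: sqrt(3)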

Exercise 13

\[2 x_2 = 6 \Rightarrow x_2 = 3.\]
\[\begin{split}\begin{bmatrix} 1 & 2\\ 3 & 1 \end{bmatrix} \begin{bmatrix} x_0\\ x_1 \end{bmatrix} = \begin{bmatrix} 5\\ 5 \end{bmatrix} \begin{matrix} \\ R_2 - 3 R_1 \end{matrix} \Rightarrow \begin{bmatrix} 1 & 2\\ 0 & -5 \end{bmatrix} \begin{bmatrix} x_0\\ x_1 \end{bmatrix} = \begin{bmatrix} 5\\ -10 \end{bmatrix} \Rightarrow \begin{aligned} -5 x_1 &= -10\\ x_1 &= 2 \end{aligned} \Rightarrow \begin{aligned} x_0 + 2 x_1 &= 5\\ x_0 &= 1 \end{aligned}\end{split}\]
\[\begin{split}\begin{bmatrix} -1 & 4\\ 2 & -1 \end{bmatrix} \begin{bmatrix} x_3\\ x_4 \end{bmatrix} = \begin{bmatrix} 16\\ 3 \end{bmatrix} \begin{matrix} \\ R_2 - (-2) R_1 \end{matrix} \Rightarrow \begin{bmatrix} -1 & 4\\ 0 & 7 \end{bmatrix} \begin{bmatrix} x_3\\ x_4 \end{bmatrix} = \begin{bmatrix} 16\\ 35 \end{bmatrix} \Rightarrow \begin{aligned} 7 x_4 &= 35\\ x_4 &= 5 \end{aligned} \Rightarrow \begin{aligned} 2 x_3 - x_4 &= 3\\ x_3 &= 4 \end{aligned}\end{split}\]

Hence \(x = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \end{bmatrix}^\top\).

Exercise 14

Let \(u \in \mathbb{R}^n\). Observe that \(I - u u^\top\) is a symmetric matrix. In order for that matrix to be orthogonal, the following must hold

\[\begin{split}(I - u u^\top)^\top (I - u u^\top) &= I - 2 u u^\top + u u^\top u u^\top\\ &= I - 2 u u^\top + u (u^\top u) u^\top\\ &= I.\end{split}\]

The cancellation of terms will occur when \(u^\top u = 2\).

Let \(v \in \mathbb{R}^n \setminus \boldsymbol{0}\) and choose \(u = \frac{\sqrt{2} v}{\lVert v \rVert_2}\) so that

\[\begin{split}u^\top u &= \left( \frac{\sqrt{2}}{\lVert v \rVert_2} v^\top \right) \left( \frac{\sqrt{2}}{\lVert v \rVert_2} v \right)\\ &= 2 \frac{v^\top v}{\lVert v \rVert_2^2}\\ &= 2.\end{split}\]

Thus \(Q = I - 2 \frac{u u^\top}{u^\top u}\) will be orthogonal for any nonzero \(u \in \mathbb{R}^n\).

The final property that the Householder transformation must satisfy is

\[\begin{split}\left( I - 2 \frac{u u^\top}{u^\top u} \right) v &= v - 2 \frac{u u^\top}{u^\top u} v\\ &= v - u \frac{2 u^\top v}{u^\top u}\\ &= \alpha e_1\end{split}\]

where \(e_1 = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}^\top \in \mathbb{R}^n\) and \(\alpha \in \mathbb{R} \setminus 0\).

Notice that requiring \(v_i - u_i \frac{2 u^\top v}{u^\top u} = 0\) for \(2 \leq i \leq n\) can be satisfied by choosing \(u_i = v_i\) for those indices and imposing the constraint \(\frac{2 u^\top v}{u^\top u} = 1\), so that

\[\begin{split}2 u^\top v &= u^\top u\\ 2 \left( u_1 v_1 + v_2^2 + \ldots + v_n^2 \right) &= \left( u_1^2 + v_2^2 + \ldots + v_n^2 \right)\\ 0 &= u_1^2 - 2 u_1 v_1 - v_2^2 - v_3^2 - \ldots - v_n^2\\ u_1 &= \frac{-(-2 v_1) \pm \sqrt{(-2 v_1)^2 - 4 (1) (- v_2^2 - v_3^2 - \ldots - v_n^2)}}{2 (1)}\\ &= \frac{2 v_1 \pm \sqrt{4 (v_1^2 + v_2^2 + \ldots + v_n^2)}}{2}\\ &= v_1 \pm \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}.\end{split}\]

In the more general case of

\[\begin{split}Q v = \begin{bmatrix} \alpha_1\\ \alpha_2\\ \vdots\\ \alpha_k\\ 0\\ 0\\ \vdots\\ \end{bmatrix}\end{split}\]

where \(\alpha_1, \alpha_2, \ldots, \alpha_k \in \mathbb{R} \setminus 0\), the same analysis

\[\begin{split}2 u^\top v &= u^\top u\\ 2 \left( u_1 v_1 + \ldots + u_k v_k + v_{k + 1}^2 + \ldots + v_n^2 \right) &= u_1^2 + \ldots + u_k^2 + v_{k + 1}^2 + \ldots + v_n^2\\ 0 &= u_1^2 - 2 u_1 v_1 + u_2^2 - 2 u_2 v_2 + \ldots + u_k^2 - 2 u_k v_k - v_{k + 1}^2 - v_{k + 2}^2 - \ldots - v_n^2\end{split}\]

still holds. Hence the solution must satisfy

\[u_1^2 - 2 u_1 v_1 + u_2^2 - 2 u_2 v_2 + \ldots + u_k^2 - 2 u_k v_k = \lVert v \rVert^2 - v_1^2 - v_2^2 - \ldots - v_k^2.\]

One solution would be to set \(u_i = 0\) for \(i < k\) and \(u_j = v_j\) for \(k < j\) resulting in

\[\begin{split}u_k^2 - 2 u_k v_k &= \lVert v \rVert^2 - v_1^2 - v_2^2 - \ldots - v_k^2\\ u_k^2 - 2 u_k v_k + v_1^2 + v_2^2 + \ldots + v_k^2 - \lVert v \rVert^2 &= 0\\ u_k &= \frac{ -(-2 v_k) \pm \sqrt{ (-2 v_k)^2 - 4 (1) (v_1^2 + v_2^2 + \ldots + v_k^2 - \lVert v \rVert^2) } }{2 (1)}\\ u_k &= \frac{ 2 v_k \pm \sqrt{ 4 \lVert v \rVert^2 - 4 \left( v_1^2 + v_2^2 + \ldots + v_{k - 1}^2 \right) } }{2}\\ u_k &= \frac{ 2 v_k \pm \sqrt{ 4 \left( v_k^2 + v_{k + 1}^2 + \ldots + v_n^2 \right) } }{2}\\ u_k &= v_k \pm \sqrt{v_k^2 + v_{k + 1}^2 + \ldots + v_n^2}.\end{split}\]

Exercise 15

[1]:
import numpy

def QR_Householder(A):
    m, n = A.shape
    Q = numpy.eye(m)
    R = A
    for i in range(min(m - 1, n)):
        x = R[i:, i]
        #use operations of the same sign to avoid cancellation
        if x[0] < 0:
            u = numpy.hstack((x[0] - numpy.linalg.norm(x), x[1:])).T
        else:
            u = numpy.hstack((x[0] + numpy.linalg.norm(x), x[1:])).T
        Q_i = numpy.eye(m)
        I = numpy.eye(len(u))
        Q_i[i:,i:] = I - 2 * numpy.outer(u, u.T) / numpy.dot(u.T, u)
        Q = numpy.dot(Q, Q_i)
        R = numpy.dot(Q.T, A)
    return Q, R

A = numpy.asfarray([[1, -1],
                    [1, 0],
                    [1, 1],
                    [1, 2]])
Q, R = QR_Householder(A)
print(Q)
print(R)
print(numpy.round(numpy.dot(Q, R)))
[[-0.5         0.67082039  0.0236068   0.5472136 ]
 [-0.5         0.2236068  -0.43934466 -0.71202266]
 [-0.5        -0.2236068   0.80786893 -0.21759547]
 [-0.5        -0.67082039 -0.39213107  0.38240453]]
[[-2.00000000e+00 -1.00000000e+00]
 [-1.11022302e-16 -2.23606798e+00]
 [ 0.00000000e+00 -2.22044605e-16]
 [-1.11022302e-16  3.33066907e-16]]
[[ 1. -1.]
 [ 1.  0.]
 [ 1.  1.]
 [ 1.  2.]]

Exercise 16

Given \(x = \begin{bmatrix} -1 & 1 \end{bmatrix}^\top\),

\[\begin{split}x^\top \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} x &= x^\top \begin{bmatrix} -1 + 2 \\ -2 + 3 \end{bmatrix}\\ &= 1 - 2 - 2 + 3\\ &= 0\end{split}\]

illustrates that the positive definite criterion is not satisfied. Running the \(L D L^\top\) (Cholesky) factorization produces a negative entry in the diagonal factor \(D\).

[2]:
import numpy

def Cholesky_LDL(A):
    n = A.shape[0]
    D = numpy.zeros(n)
    L = numpy.eye(n)

    for i in range(n):
        #compute lower triangular
        for j in range(i):
            _ = A[i,j] - sum([L[i,k] * L[j,k] * D[k] for k in range(j)])
            L[i,j] = _ / D[j]

        #compute diagonal
        D[i] = A[i,i] - sum([L[i,k]**2 * D[k] for k in range(i)])
    return L, D

A = numpy.asfarray([[1, 2],
                    [2, 3]
                    ])
L, D = Cholesky_LDL(A)
print(L, D)
print(numpy.dot(L, numpy.dot(numpy.diag(D), L.T)))
[[1. 0.]
 [2. 1.]] [ 1. -1.]
[[1. 2.]
 [2. 3.]]

Exercise 17

The matrix will no longer be positive definite when \(10\) is replaced by \(7\).

[3]:
A = numpy.asfarray([[4, 6, -2],
                    [6, 10, 1],
                    [-2, 1, 22]
                    ])
L, D = Cholesky_LDL(A)
print(L, D)
print(numpy.dot(L, numpy.dot(numpy.diag(D), L.T)))
[[ 1.   0.   0. ]
 [ 1.5  1.   0. ]
 [-0.5  4.   1. ]] [4. 1. 5.]
[[ 4.  6. -2.]
 [ 6. 10.  1.]
 [-2.  1. 22.]]
[4]:
A = numpy.asfarray([[4, 6, -2],
                    [6, 7, 1],
                    [-2, 1, 22]
                    ])
L, D = Cholesky_LDL(A)
print(L, D)
print(numpy.dot(L, numpy.dot(numpy.diag(D), L.T)))
[[ 1.   0.   0. ]
 [ 1.5  1.   0. ]
 [-0.5 -2.   1. ]] [ 4. -2. 29.]
[[ 4.  6. -2.]
 [ 6.  7.  1.]
 [-2.  1. 22.]]

Exercise 18

(1)

Since the rows and columns of an orthogonal matrix \(Q\) are orthonormal vectors, \(\lVert Q_{i.} \rVert_1 \leq \sqrt{n}\) and \(\lVert Q_{.j} \rVert_1 \leq \sqrt{n}\) for every \(i, j\); consequently \(\lVert Q \rVert_1 \leq \sqrt{n}\) and \(\lVert Q^\top \rVert_1 \leq \sqrt{n}\).

Given \(A = QR\), \(\frac{1}{n} \kappa_1(A) \leq \kappa_1(R) \leq n \kappa_1(A)\) because

\[\begin{split}\kappa_1(A) &= \lVert A \rVert_1 \lVert A^{-1} \rVert_1\\ &= \lVert Q R \rVert_1 \lVert R^{-1} Q^\top \rVert_1\\ &\leq \lVert Q \rVert_1 \lVert R \rVert_1 \lVert R^{-1} \rVert_1 \lVert Q^\top \rVert_1 & \quad & \text{(3.1.10)}\\ &\leq n \lVert R \rVert_1 \lVert R^{-1} \rVert_1 & \quad & \text{(3.1.8a)}\\ \frac{1}{n} \kappa_1(A) &\leq \kappa_1(R)\end{split}\]

and

\[\begin{split}\kappa_1(R) &= \lVert R \rVert_1 \lVert R^{-1} \rVert_1\\ &= \lVert Q^\top A \rVert_1 \lVert A^{-1} Q \rVert_1\\ &\leq \lVert Q^\top \rVert_1 \lVert A \rVert_1 \lVert A^{-1} \rVert_1 \lVert Q \rVert_1 & \quad & \text{(3.1.10)}\\ &\leq n \lVert A \rVert_1 \lVert A^{-1} \rVert_1 & \quad & \text{(3.1.8a)}\\ \kappa_1(R) &\leq n \kappa_1(A).\end{split}\]

(2)

\[\begin{split}\kappa_2(A) &= \lVert A \rVert_2 \lVert A^{-1} \rVert_2\\ &= \left( \max_{v \in \mathbb{R}^n, v \neq 0} \frac{\lVert Q R v \rVert_2}{\lVert v \rVert_2} \right) \left( \min_{v \in \mathbb{R}^n, v \neq 0} \frac{\lVert Q R v \rVert_2}{\lVert v \rVert_2} \right)^{-1} & \quad & \text{(3.1.18)}\\ &= \left( \max_{v \in \mathbb{R}^n, v \neq 0} \frac{\lVert R v \rVert_2}{\lVert v \rVert_2} \right) \left( \min_{v \in \mathbb{R}^n, v \neq 0} \frac{\lVert R v \rVert_2}{\lVert v \rVert_2} \right)^{-1}\\ &= \lVert R \rVert_2 \lVert R^{-1} \rVert_2\\ &= \kappa_2(R).\end{split}\]

Exercise 19

Given

\[\begin{split}R = \begin{bmatrix} 1 & 1\\ 0 & 10^{-7} \end{bmatrix} \quad \text{and} \quad R^{-1} = \begin{bmatrix} 1 & -10^{7}\\ 0 & 10^7 \end{bmatrix},\end{split}\]
\[\begin{split}\kappa_1(R) &= \left\Vert R \right\Vert_1 \left\Vert R^{-1} \right\Vert_1\\ &= \left( \max_j \left\Vert R_{\cdot j} \right\Vert_1 \right) \left( \max_j \left\Vert R^{-1}_{\cdot j} \right\Vert_1 \right)\\ &= \left\Vert R_{\cdot 2} \right\Vert_1 \left\Vert R^{-1}_{\cdot 2} \right\Vert_1\\ &= 20000002.\end{split}\]
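
A one-line check using numpy's exact condition number (in contrast to the cheap estimator implemented in Exercise 20):

import numpy

R = numpy.array([[1.0, 1.0],
                 [0.0, 1e-7]])
print(numpy.linalg.cond(R, 1))   #exact kappa_1(R); compare with the hand computation above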

Exercise 20

See [CMSW79] for an elegant and concise exposition on estimating the condition number of an upper triangular matrix.

[5]:
import numpy

def R_solve(R, b):
    x = numpy.copy(b)
    n = b.shape[0]
    d = numpy.diag(R)

    x[-1] = x[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        _ = x[i] - sum(x[i+1:n] * R[i,i+1:n])
        x[i] = _ / d[i]
    return x
[6]:
def cond_est(R):
    #compute ||R||_1 exactly as the maximum absolute column sum
    R_1 = max(numpy.linalg.norm(R, ord=1, axis=0))

    #solve R^T z = e
    d = numpy.diag(R)
    n = d.shape[0]
    z = numpy.zeros(d.shape)

    z[0] = 1 / d[0]
    p = R[0,:] * z[0]
    m = numpy.zeros(p.shape)
    for j in range(1, n):
        zp = (1 - p[j]) / d[j]
        zm = (-1 - p[j]) / d[j]
        tmp_p = abs(zp)
        tmp_m = abs(zm)

        m[j+1:n] = p[j+1:n] + R[j,j+1:n] * zm
        tmp_m += sum(numpy.abs(m[j+1:n]) / numpy.abs(d[j+1:n]))

        p[j+1:n] = p[j+1:n] + R[j,j+1:n] * zp
        tmp_p += sum(numpy.abs(p[j+1:n]) / numpy.abs(d[j+1:n]))

        if tmp_p >= tmp_m:
            z[j] = zp
        else:
            z[j] = zm
            p[j+1:n] = m[j+1:n]
    z_1 = numpy.linalg.norm(z)

    #solve R y = z
    y_1 = numpy.linalg.norm(R_solve(R, z), ord=1)

    return R_1 * y_1 / z_1
[7]:
R = numpy.asfarray([[1, 1],
                    [0, 1e-7]
                    ])
cond_est(R)
[7]:
20000002.000000022

Exercise 21

See Lemma 3.4.2 and [GVL12] for an algorithm to robustly compute the Jacobi matrix.

Given \(\hat{R}(\alpha, \beta) \in \mathbb{R}^{n \times n}\) where \(n = 2\), the Schur decomposition states that \(\hat{R} = Q U Q^*\) where \(Q\) is a unitary matrix. Let \(A_{ij} = \hat{R}(\alpha, \beta)_{ij}\) where \(1 \leq i, j \leq 2\).

\[\begin{split}\det(A) &= A_{11} A_{22} - A_{12} A_{21}\\ &= \frac{\alpha^2}{\alpha^2 + \beta^2} - \frac{-\beta^2}{\alpha^2 + \beta^2}\\ &= \frac{\alpha^2 + \beta^2}{\alpha^2 + \beta^2}\\ &= 1\end{split}\]

and

\[\begin{split}\text{tr}(A) &= A_{11} + A_{22}\\ &= \frac{2 \alpha}{\sqrt{\alpha^2 + \beta^2}}.\end{split}\]

The eigenvalues are

\[\begin{split}\lambda_1 &= \frac{\text{Tr}(A) + \sqrt{\text{Tr}(A)^2 - 4 \det(A)}}{2}\\ &= \frac{\alpha}{\sqrt{\alpha^2 + \beta^2}} + \sqrt{\frac{\alpha^2}{\alpha^2 + \beta^2} - 1}\\ &= \frac{\alpha}{\sqrt{\alpha^2 + \beta^2}} + \sqrt{\frac{-\beta^2}{\alpha^2 + \beta^2}}\\ &= \frac{\alpha + i \beta}{\sqrt{\alpha^2 + \beta^2}}\end{split}\]

and

\[\begin{split}\lambda_2 &= \frac{\text{Tr}(A) - \sqrt{\text{Tr}(A)^2 - 4 \det(A)}}{2}\\ &= \frac{\alpha - i \beta}{\sqrt{\alpha^2 + \beta^2}}.\end{split}\]

Solving for the eigenvectors yields

\[\begin{split}A v_1 &= \lambda_1 v_1\\ \left( A - \lambda_1 I \right) v_1 &= 0\\ \frac{1}{\sqrt{\alpha^2 + \beta^2}} \begin{bmatrix} - i \beta & -\beta\\ \beta & - i \beta \end{bmatrix} v_1 &= 0\end{split}\]

and

\[\begin{split}A v_2 &= \lambda_2 v_2\\ \left( A - \lambda_2 I \right) v_2 &= 0\\ \frac{1}{\sqrt{\alpha^2 + \beta^2}} \begin{bmatrix} i \beta & -\beta\\ \beta & i \beta \end{bmatrix} v_2 &= 0\end{split}\]

which shows that \(v_1 = \begin{bmatrix} t_1 \\ -i t_1 \end{bmatrix}\) and \(v_2 = \begin{bmatrix} t_2 \\ i t_2 \end{bmatrix}\) where \(t_1, t_2 \in \mathbb{R}\) because each system of linear equations has a free variable.

Defining

\[\begin{split}Q = \begin{bmatrix} v_1, v_2 \end{bmatrix} = \begin{bmatrix} t_1 & t_2 \\ -i t_1 & i t_2 \end{bmatrix} \quad \text{and} \quad Q^* = \begin{bmatrix} t_1 & i t_1 \\ t_2 & -i t_2 \end{bmatrix}\end{split}\]

gives

\[\begin{split}Q Q^* &= \begin{bmatrix} t_1 & t_2 \\ -i t_1 & i t_2 \end{bmatrix} \begin{bmatrix} t_1 & i t_1 \\ t_2 & -i t_2 \end{bmatrix}\\ &= \begin{bmatrix} t_1^2 + t_2^2 & i t_1^2 - i t_2^2 \\ -i t_1^2 + i t_2^2 & t_1^2 + t_2^2 \end{bmatrix}\end{split}\]

and

\[\begin{split}Q^* Q &= \begin{bmatrix} t_1 & i t_1 \\ t_2 & -i t_2 \end{bmatrix} \begin{bmatrix} t_1 & t_2 \\ -i t_1 & i t_2 \end{bmatrix}\\ &= \begin{bmatrix} t_1^2 + t_1^2 & t_1 t_2 - t_1 t_2 \\ t_1 t_2 - t_1 t_2 & t_2^2 + t_2^2 \end{bmatrix}\\ &= \begin{bmatrix} 2 t_1^2 & 0 \\ 0 & 2 t_2^2 \end{bmatrix}.\end{split}\]

In order to satisfy the orthogonality constraint \(Q Q^* = Q^* Q = I\), set \(t_1 = t_2 = \frac{1}{\sqrt{2}}\).

\[\begin{split}U &= Q^* A Q\\ &= \frac{1}{\sqrt{\alpha^2 + \beta^2}} Q^* \begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix} \begin{bmatrix} t_1 & t_2 \\ -i t_1 & i t_2 \end{bmatrix}\\ &= \frac{1}{\sqrt{\alpha^2 + \beta^2}} \begin{bmatrix} t_1 & i t_1 \\ t_2 & -i t_2 \end{bmatrix} \begin{bmatrix} \alpha t_1 + i \beta t_1 & \alpha t_2 - i \beta t_2\\ \beta t_1 - i \alpha t_1 & \beta t_2 + i \alpha t_2 \end{bmatrix}\\ &= \frac{1}{\sqrt{\alpha^2 + \beta^2}} \begin{bmatrix} \alpha t_1^2 + i \beta t_1^2 + i \beta t_1^2 + \alpha t_1^2 & \alpha t_1 t_2 - i \beta t_1 t_2 + i \beta t_1 t_2 - \alpha t_1 t_2\\ \alpha t_1 t_2 + i \beta t_1 t_2 - i \beta t_1 t_2 - \alpha t_1 t_2 & \alpha t_2^2 - i \beta t_2^2 - i \beta t_2^2 + \alpha t_2^2 \end{bmatrix}\\ &= \frac{1}{\sqrt{\alpha^2 + \beta^2}} \begin{bmatrix} 2 \alpha t_1^2 + 2 i \beta t_1^2 & 0\\ 0 & 2 \alpha t_2^2 - 2 i \beta t_2^2 \end{bmatrix}\\ &= \frac{\sqrt{\alpha^2 + \beta^2}}{\alpha^2 + \beta^2} \begin{bmatrix} \alpha + i \beta & 0 \\ 0 & \alpha - i \beta \end{bmatrix}\end{split}\]

The Schur decomposition shows that the eigenvalues of \(\hat{R}(\alpha, \beta)\) form the unit-modulus conjugate pair \((\alpha \pm i \beta) / \sqrt{\alpha^2 + \beta^2}\), so applying the rotation never magnifies the norm of a vector, and the entries of \(Q\) and \(Q^*\) are fixed constants that can be evaluated to any desired precision.

[8]:
import numpy

def jacobi_rotation(n, i, j, alpha, beta):
    _ = numpy.hypot(alpha, beta)  #equivalent to sqrt(alpha**2 + beta**2) but avoids overflow (Lemma 3.4.2)
    alpha_bar = alpha / _
    beta_bar = beta / _

    J = numpy.zeros((n,n))
    J[i,i] = J[j,j] = alpha_bar
    J[j,i] = beta_bar
    J[i,j] = -J[j,i]

    for k in range(n):
        if k != i and k != j:
            J[k,k] = 1

    return J

Exercise 22

[9]:
M = numpy.asfarray([[1, 3, 5],
                    [2, 4, 6],
                    [3, 5, 9]])
n = M.shape[0]

#zero out M[2,0]
J_il = jacobi_rotation(n, 2, 0, M[0,0], M[2,0])
J_jl = jacobi_rotation(n, 2, 0, M[2,0], -M[0,0])
print(numpy.dot(J_il, M))
print(numpy.dot(J_jl, M))
[[ 3.16227766e+00  5.69209979e+00  1.01192885e+01]
 [ 2.00000000e+00  4.00000000e+00  6.00000000e+00]
 [ 5.55111512e-17 -1.26491106e+00 -1.89736660e+00]]
[[-5.55111512e-17  1.26491106e+00  1.89736660e+00]
 [ 2.00000000e+00  4.00000000e+00  6.00000000e+00]
 [ 3.16227766e+00  5.69209979e+00  1.01192885e+01]]

Exercise 23

If \(A\) has full column rank, then the nullspace of \(A\) is zero.

\[\begin{split}\lVert Ax - b \rVert^2_2 &= \lVert A(x - x_*) + (A x_* - b) \rVert^2_2\\ &= \left[ A(x - x_*) + (A x_* - b) \right]^\top \left[ A(x - x_*) + (A x_* - b) \right]\\ &= \lVert A (x - x_*) \rVert^2_2 + \lVert A x_* - b \rVert^2_2 + 2 (x - x_*)^\top A^\top (A x_* - b)\\ &= \lVert A (x - x_*) \rVert^2_2 + \lVert A x_* - b \rVert^2_2 & \quad & \text{since}\ A^\top (A x_* - b) = 0\\ \lVert Ax - b \rVert_2 &\geq \lVert A x_* - b \rVert_2.\end{split}\]

Since \(A\) has zero nullspace, \(A(x - x_*) = 0\) only when \(x = x_*\); hence for \(x \neq x_*\), \(\lVert A(x - x_*) \rVert_2^2 > 0\) and the inequality is strict: \(\lVert Ax - b \rVert_2 > \lVert A x_* - b \rVert_2\).

\(A^\top A\) is nonsingular if and only if \(A\) has full column rank.

One way to prove this is to show that \(A^\top A\) has the same nullspace as \(A\). See [Str09] for background on the fundamental theorem of linear algebra, subspaces, and orthogonality.

If \(x\) is in the nullspace of \(A\), then \(Ax = 0\). Multiplying by \(A^\top\) yields \(A^\top A x = 0\), which shows that \(x\) is also in the nullspace of \(A^\top A\).

If \(x\) is in the nullspace of \(A^\top A\), then \(A^\top A x = 0\). Multiplying by \(x^\top\) yields \(x^\top A^\top A x = \lVert A x \rVert_2^2 = 0\), which means every \(x\) in the nullspace of \(A^\top A\) also needs to be in the nullspace of \(A\) in order to satisfy \(A x = 0\).
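
A small numerical illustration of this equivalence, with one full-column-rank matrix and one rank-deficient matrix (both illustrative):

import numpy

A_full = numpy.array([[1.0, -1.0],
                      [1.0, 0.0],
                      [1.0, 1.0]])     #full column rank
A_defc = numpy.array([[1.0, 2.0],
                      [2.0, 4.0],
                      [3.0, 6.0]])     #rank deficient: second column is twice the first

for A in (A_full, A_defc):
    G = numpy.dot(A.T, A)
    print(numpy.linalg.matrix_rank(A), numpy.linalg.matrix_rank(G), numpy.linalg.det(G))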

Exercise 24

Since \(A \in \mathbb{R}^{m \times n}\) is full rank, its nullspace is zero and its columns are linearly independent. A set of vectors spans a space if their linear combinations fill the space, i.e. the columns of \(A\) span its column space. A basis for a vector space is a sequence of vectors that are linearly independent and span that space. Hence the column space of \(A\) is an \(n\)-dimensional subspace of \(\mathbb{R}^m\) (the whole space only when \(n = m\)).

Notice that the QR decomposition of

\[\begin{split}A = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_u \\ 0 \end{bmatrix} = Q_1 R_u\end{split}\]

where \(Q_1 \in \mathbb{R}^{m \times n}\) and \(R_u \in \mathbb{R}^{n \times n}\). Since \(Q\) is an orthogonal matrix, the columns of \(Q_1\) are orthonormal. Hence the column space of \(Q_1\) is also an \(n\)-dimensional subspace of \(\mathbb{R}^m\).

By the theorem of Equal Dimensions Yields Equal Subspaces [Bee08], the first \(n\) columns of \(Q\) form an orthonormal basis for the space spanned by the columns of \(A\).

Exercise 25

[10]:
A = numpy.asfarray([[1, -1],
                    [1, 0],
                    [1, 1],
                    [1, 2]])
b = numpy.asfarray([[3],
                    [2],
                    [0],
                    [4]])

print(numpy.dot(numpy.dot(numpy.linalg.inv(numpy.dot(A.T, A)), A.T), b))

n = A.shape[1]
Q, R = QR_Householder(A)
print(numpy.dot(numpy.linalg.inv(R[0:n,0:n]), numpy.dot(Q.T, b)[0:n,0:n]))
[[2.2]
 [0.1]]
[[2.2]
 [0.1]]

Exercise 26

Given \(A = LQ\), the equivalent minimization is

\[\min_{x \in \mathbb{R}^n} \left\{ \lVert x \rVert_2 \mid LQx = b\right\}.\]

Since \(\lVert Q x \rVert_2 = \lVert x \rVert_2\) (3.1.23), defining \(z = Q x\) gives

\[\min_{z \in \mathbb{R}^n} \left\{ \lVert z \rVert_2 \mid Lz = b\right\}.\]

According to the analysis done in Exercise 3.27, the constraint \(Lz = b\) only determines the first block of \(z\) via \(L_l z_1 = b\), and \(\lVert z \rVert_2\) is minimized by setting the free block to zero, so \(z = \begin{bmatrix} L_l^{-1} b \\ 0 \end{bmatrix}\). Thus \(x_* = Q^\top z = Q^\top \begin{bmatrix} L_l^{-1} b \\ 0 \end{bmatrix}\).

Exercise 27

Observe that the LQ decomposition of

\[\begin{split}A = \begin{bmatrix} L_l & 0 \end{bmatrix} \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} = L_l Q_1\end{split}\]

where \(L_l \in \mathbb{R}^{m \times m}\), \(Q_1 \in \mathbb{R}^{m \times n}\), and \(Q_2 \in \mathbb{R}^{(n - m) \times n}\).

To carry out LQ decomposition, apply QR decomposition to the full column rank matrix

\[\begin{split}A^\top = \begin{bmatrix} Q_1^\top & Q_2^\top \end{bmatrix} \begin{bmatrix} L_l^\top \\ 0 \end{bmatrix}.\end{split}\]
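
A minimal sketch of this recipe, assuming a small full-row-rank system; the matrix, the right-hand side, and the use of numpy.linalg.qr and numpy.linalg.lstsq are illustrative only.

import numpy

A = numpy.array([[1.0, 2.0, 0.0, -1.0],
                 [0.0, 1.0, 1.0, 2.0]])
b = numpy.array([1.0, 3.0])

Qt, Rt = numpy.linalg.qr(A.T)      #A^T = Qt Rt, so A = Rt^T Qt^T is an LQ factorization with L_l = Rt^T
z = numpy.linalg.solve(Rt.T, b)    #forward solve L_l z = b
x_star = numpy.dot(Qt, z)          #x_* = Q^T [L_l^{-1} b; 0] with the zero block dropped

print(numpy.dot(A, x_star))                     #reproduces b
print(x_star)
print(numpy.linalg.lstsq(A, b, rcond=None)[0])  #numpy's minimum-norm solution for comparison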

Exercise 28

\[\begin{split}\kappa_2(A) &= \lVert A \rVert_2 \lVert A^{-1} \rVert_2\\ &= \left( \max_{v \in \mathbb{R}^n, v \neq 0} \frac{\lVert A v \rVert_2}{\lVert v \rVert_2} \right) \left( \min_{v \in \mathbb{R}^n, v \neq 0} \frac{\lVert A v \rVert_2}{\lVert v \rVert_2} \right)^{-1} & \quad & \text{(3.1.18)}\\ &= \frac{ \left( \max_\lambda \left\{ A^\top A \right\} \right)^{1/2} }{ \left( \min_\lambda \left\{ A^\top A \right\} \right)^{1/2} } & \quad & \text{(3.1.8b)}\\ &= \frac{ \left( \max_\lambda \left\{ V D^2 V^\top \right\} \right)^{1/2} }{ \left( \min_\lambda \left\{ V D^2 V^\top \right\} \right)^{1/2} } & \quad & A = U D V^\top\\ &= \frac{\sigma_1}{\sigma_n} & \quad & \text{Theorem 3.6.4}\end{split}\]

where \(\sigma_1\) is \(A\)’s largest singular value and \(\sigma_n\) is \(A\)’s smallest singular value.

Let \(u_{\cdot j}\) and \(v_{\cdot j}\) be the columns of \(U\) and \(V\). Perturbing \(b\) along a left singular vector gives the following:

\[\begin{split}A \Delta x &= \Delta b\\ \Delta x &= V D^{-1} U^\top \alpha u_{\cdot j} & \quad & \Delta b = \alpha u_{\cdot j}\\ &= V D^{-1} \begin{bmatrix} 0 & \cdots & \alpha & \cdots & 0 \end{bmatrix}^\top\\ &= \frac{\alpha}{\sigma_j} V \begin{bmatrix} 0 & \cdots & 1 & \cdots & 0 \end{bmatrix}^\top\\ &= \frac{\alpha}{\sigma_j} v_{\cdot j}.\end{split}\]
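
A short numerical check of this sensitivity result; the matrix, the perturbation size \(\alpha\), and the chosen index \(j\) are illustrative only.

import numpy

A = numpy.array([[3.0, 1.0],
                 [1.0, 2.0]])
U, s, Vt = numpy.linalg.svd(A)

alpha, j = 1e-3, 1                 #perturb b along the left singular vector u_j
delta_b = alpha * U[:, j]
delta_x = numpy.linalg.solve(A, delta_b)

print(delta_x)
print((alpha / s[j]) * Vt[j, :])   #(alpha / sigma_j) v_j, as derived above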

References

Bee08

Robert Arnold Beezer. A first course in linear algebra. Beezer, 2008.

CMSW79

Alan K Cline, Cleve B Moler, George W Stewart, and James H Wilkinson. An estimate for the condition number of a matrix. SIAM Journal on Numerical Analysis, 16(2):368–375, 1979.

GVL12

Gene H Golub and Charles F Van Loan. Matrix computations. Volume 3. JHU Press, 2012.

Olva

Peter J. Olver. Numerical analysis lecture notes. http://www-users.math.umn.edu/~olver/num_/lnn.pdf. Accessed on 2017-02-18.

Olvb

Sheehan Olver. Numerical complex analysis: functional analysis. http://www.maths.usyd.edu.au/u/olver/teaching/NCA/15.pdf. Accessed on 2017-03-04.

Str09

Gilbert Strang. Introduction to Linear Algebra. Volume 4. Wellesley-Cambridge Press Wellesley, MA, 2009.