Regression

[Sul] contains an elegant illustration of the KKT conditions. Given the constrained optimization problem

\[\begin{split}\min_{\mathbf{x} \in \mathbb{R}^n} \quad f(\mathbf{x})\\ \text{subject to} \quad h_i(\mathbf{x}) &= 0 \quad i = 1, \ldots, l\\ g_j(\mathbf{x}) &\leq 0 \quad j = 1, \ldots, m,\end{split}\]

define the Lagrangian as

\[\mathcal{L}(\mathbf{x}, \boldsymbol{\mu}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \boldsymbol{\mu}^\top \mathbf{h}(\mathbf{x}) + \boldsymbol{\lambda}^\top \mathbf{g}(\mathbf{x}).\]

The KKT optimality conditions are

Lagrangian Stationarity

\(\nabla_{\mathbf{x}} \mathcal{L} = \boldsymbol{0}\).

Primal Feasibility

\(\nabla_{\boldsymbol{\lambda}} \mathcal{L} = \mathbf{g}(\mathbf{x}^*) \preceq \boldsymbol{0}\) and \(\nabla_{\boldsymbol{\mu}} \mathcal{L} = \mathbf{h}(\mathbf{x}^*) = \boldsymbol{0}\).

Dual Feasibility

\(\boldsymbol{\lambda}^* \succeq \boldsymbol{0}\).

Complementary Slackness

\(\lambda_j^* g_j(\mathbf{x}^*) = 0\) for all \(j = 1, \ldots, m\).

Positive Definiteness

\(\mathbf{w}^\top \left( \nabla_{\mathbf{x} \mathbf{x}} \mathcal{L} \right) \mathbf{w} > 0\) for all feasible directions \(\mathbf{w}\).

Note that non-negativity constraints are merely a subset of inequality constraints [Lus]. For example, \(\mathbf{x} \succeq \boldsymbol{0}\) can be incorporated explicitly by defining the corresponding Lagrangian as

\[\tilde{\mathcal{L}}(\mathbf{x}, \boldsymbol{\mu}, \boldsymbol{\lambda}, \boldsymbol{\nu}) = \mathcal{L}(\mathbf{x}, \boldsymbol{\mu}, \boldsymbol{\lambda}) + \boldsymbol{\nu}^\top (-\mathbf{x}) \quad \text{where} \quad \boldsymbol{\nu} \succeq \boldsymbol{0}.\]

To incorporate it implicitly, [Wil] used

\[\begin{split}\boldsymbol{0} &= \nabla_{\mathbf{x}} \tilde{\mathcal{L}}\\ &= \frac{\partial \mathcal{L}}{\partial \mathbf{x}} - \boldsymbol{\nu}\\ \boldsymbol{\nu} &= \frac{\partial \mathcal{L}}{\partial \mathbf{x}}\end{split}\]

to simplify the Lagrange multipliers \(\boldsymbol{\nu}\) in the (additional) complementary slackness conditions to

\[\begin{split}0 &= -x_k^* \nu_k^*\\ &= -x_k^* \frac{\partial \mathcal{L}}{\partial x_k}\\ &= x_k^* \frac{\partial \mathcal{L}}{\partial x_k}.\end{split}\]

Furthermore, the Lagrangian stationarity conditions are replaced by \(\nabla_{\mathbf{x}} \mathcal{L} \succeq \boldsymbol{0}\), and the (additional) primal feasibility conditions are typically rewritten as \(\mathbf{x} \succeq \boldsymbol{0}\).
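
As a quick numerical illustration of these implicit conditions, the sketch below solves a tiny problem with non-negativity constraints by projected gradient descent and checks \(\mathbf{x}^* \succeq \boldsymbol{0}\), \(\nabla_{\mathbf{x}} \mathcal{L} \succeq \boldsymbol{0}\), and \(x_k^* \frac{\partial \mathcal{L}}{\partial x_k} = 0\). The objective, starting point, and step size are illustrative assumptions, not taken from [Wil].

import numpy

# Illustrative problem (an assumption, not from the text): minimize
# f(x) = (x0 - 1)^2 + (x1 + 2)^2 subject to x >= 0.  With no other constraints,
# the Lagrangian gradient is simply the gradient of f.
grad = lambda x: numpy.asarray([2 * (x[0] - 1), 2 * (x[1] + 2)])

x = numpy.ones(2)
for _ in range(1000):
    x = numpy.maximum(x - 0.1 * grad(x), 0.0)   # gradient step projected onto x >= 0

# Implicit KKT conditions for x >= 0: x_k >= 0, dL/dx_k >= 0, x_k * dL/dx_k = 0.
print('x* = {}'.format(x))
print('grad L(x*) = {}'.format(grad(x)))
print('x_k * dL/dx_k = {}'.format(x * grad(x)))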

In some cases, it may seem that the appropriate constraint is a strict inequality. However, a strict inequality constraint may lead to an ill-posed optimization problem in which the infimum is approached only on the boundary of the feasible set, where no feasible minimizer exists [Goc]. For this reason, strict inequality constraints are not used in nonlinear programming. In practice, if the strict inequality constraint would not be active at the solution, one can simply treat it as a nonstrict inequality. Otherwise, one can select the smallest feasible value and use it as the bound in a nonstrict inequality.

Exercise 12.1

[1]:
import numpy

# Least-squares fit of b against the design matrix A = [abscissae, 1] via the
# normal equations (A^T A) x = A^T b.
b = numpy.asarray([0, 3, 1, 2], dtype=float)
_ = numpy.asarray([0, 1, 2, 4], dtype=float)   # abscissa values
A = numpy.vstack((_, numpy.ones(_.shape))).T

lhs = numpy.dot(A.T, b)                        # A^T b
rhs = numpy.linalg.inv(numpy.dot(A.T, A))      # (A^T A)^{-1}
print('L2-Regression: x = {}'.format(numpy.dot(rhs, lhs)))
L2-Regression: x = [0.28571429 1.        ]

Exercise 12.2

[2]:
# L1 regression as an LP: minimize sum_i t_i subject to -t_i <= b_i - A_i x <= t_i.
m = len(b)
n = 2

from pymprog import model

lp = model('L1-Regression LP')

#lp.verbose(True)
x = lp.var('x', n, bounds=(None, None))   # regression coefficients (free)
t = lp.var('t', m, bounds=(0, None))      # absolute residuals

lp.minimize(sum(t))

# Each chained inequality registers the pair of constraints bounding residual i.
for i in range(m):
    -t[i] <= b[i] - sum(A[i, j] * x[j] for j in range(n)) <= t[i]

lp.solve()
lp.sensitivity()

lp.end()

PyMathProg 1.0 Sensitivity Report Created: 2020/04/05 Sun 03:20AM
================================================================================
Variable            Activity   Dual.Value     Obj.Coef   Range.From   Range.Till
--------------------------------------------------------------------------------
*x[0]                    0.5            0            0           -3            1
*x[1]                      0            0            0        -0.25         0.75
 t[0]                      0         0.25            1         0.75          inf
*t[1]                    2.5            0            1            0      1.33333
 t[2]                      0            1            1            0          inf
 t[3]                      0         0.75            1         0.25          inf
================================================================================
================================================================================
Constraint       Activity Dual.Value  Lower.Bnd  Upper.Bnd RangeLower RangeUpper
--------------------------------------------------------------------------------
 R1                     0      -0.75       -inf          0          0          0
*R2                     0          0       -inf          0          0          0
*R3                    -2          0       -inf          3         -2         -2
 R4                    -3         -1       -inf         -3 -1.79769e+308       -0.5
*R5                     1          0       -inf          1          1          1
*R6                    -1          0       -inf         -1         -1         -1
 R7                     2      -0.25       -inf          2          2          2
*R8                    -2          0       -inf         -2         -2         -2
================================================================================
[2]:
model('L1-Regression LP') is not the default model.

Exercise 12.3

The minimization of the maximum deviation over a sorted set of real numbers \(\{ b_1, \ldots, b_m \}\)

\[\min_{x \in \mathbb{R}} \max_i \left\vert x - b_i \right\vert\]

can be recast as

\[\begin{split}\mathcal{P} \quad \text{minimize} \quad t\\ \text{subject to} \quad x - b_i &\leq t\\ -x + b_i &\leq t\\ t &\geq 0.\end{split}\]

Recall that \(\left\vert x - b \right\vert\) is the distance to \(b\) from \(x\) on the number line. The midrange is defined as

\[\tilde{x} = \frac{b_1 + b_m}{2}\]

where \(b_1\) and \(b_m\) are the minimum and maximum values in the data set (assume \(b_1 < b_m\); otherwise all points coincide and any \(x\) is optimal). Suppose \(\tilde{x}\) is not the minimizer. Then there must exist \(\epsilon \neq 0\) such that

\[\max_i \left\vert \tilde{x} + \epsilon - b_i \right\vert < \max_i \left\vert \tilde{x} - b_i \right\vert = \frac{b_m - b_1}{2}.\]

The maximum on the right is attained at both \(b_1\) and \(b_m\), so the deviations at these two extreme points must both strictly decrease. Define the data points in terms of the midrange, \(d_i = \tilde{x} - b_i\), so that

\[\left\vert \tilde{x} + \epsilon - b_i \right\vert = \left\vert \epsilon + d_i \right\vert < \left\vert d_i \right\vert \quad \text{for } i \in \{1, m\}.\]

This is a contradiction: \(\left\vert \epsilon + d_1 \right\vert < d_1\) with \(d_1 = \tilde{x} - b_1 > 0\) forces \(\epsilon < 0\), while \(\left\vert \epsilon + d_m \right\vert < -d_m\) with \(d_m = \tilde{x} - b_m < 0\) forces \(\epsilon > 0\). Therefore, the midrange minimizes the maximum deviation from the set of observations.
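
A quick numerical sanity check (reusing the data from Exercise 12.1 as an illustrative choice) shows that no nearby grid point improves on the midrange.

import numpy

# Sanity check: compare the midrange against a grid of nearby candidates.
b = numpy.asarray([0, 3, 1, 2], dtype=float)   # data from Exercise 12.1
midrange = 0.5 * (b.min() + b.max())

max_dev = lambda x: numpy.abs(x - b).max()
grid = midrange + numpy.linspace(-1, 1, 201)
best = grid[numpy.argmin([max_dev(x) for x in grid])]

print('midrange = {}, max deviation = {}'.format(midrange, max_dev(midrange)))
print('best grid point = {}, max deviation = {}'.format(best, max_dev(best)))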

Exercise 12.4

Given a set of points \(\left\{ b_i \in \mathbb{R}^2 \right\}_{i = 1}^m\), the \(\bar{x}\) that solves

\[L = \min_x \sum_{i = 1}^m \left\Vert x - b_i \right\Vert_2^2\]

satisfies

\[\begin{split}\frac{\partial L}{\partial x} &= \sum_{i = 1}^m \frac{\partial}{\partial x} (x - b_i)^\top (x - b_i)\\ 0 &= \sum_{i = 1}^m 2x - 2b_i\\ x &= \frac{1}{m} \sum_{i = 1}^m b_i.\end{split}\]
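A quick numerical check (illustrative random points) confirms that the centroid attains a smaller sum of squared distances than nearby perturbations.

import numpy

# Compare the centroid of illustrative random points against small perturbations.
numpy.random.seed(0)
points = numpy.random.rand(5, 2)
centroid = points.mean(axis=0)

sse = lambda x: ((x - points)**2).sum()
perturbations = centroid + 0.01 * numpy.random.randn(100, 2)
print('SSE at centroid:            {:.6f}'.format(sse(centroid)))
print('min SSE over perturbations: {:.6f}'.format(min(sse(p) for p in perturbations)))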

Exercise 12.5

Given a set of points \(\left\{ b_i \in \mathbb{R}^2 \right\}_{i = 1}^m\), the \(\hat{x}\) that solves

\[\begin{split}L &= \min_x \sum_{i = 1}^m \left\Vert x - b_i \right\Vert_1\\ &= \min_x \sum_i \sum_j \left\vert x_j - b_{ij} \right\vert\end{split}\]

satisfies

\[\begin{split}\frac{\partial L}{\partial x_j} &= \sum_{i = 1}^m \sum_{j'} \frac{x_{j'} - b_{ij'}}{\left\vert x_{j'} - b_{ij'} \right\vert} \delta[j - j']\\ 0 &= \sum_i \frac{x_j - b_{ij}}{\left\vert x_j - b_{ij} \right\vert}\\ \sum_i \frac{x_j}{\epsilon_{ij}(x)} &= \sum_i \frac{b_{ij}}{\epsilon_{ij}(x)} & \quad & \epsilon_{ij}(x) = \left\vert x_j - b_{ij} \right\vert\\ x_j &= \left( \sum_i \frac{1}{\epsilon_{ij}(x)} \right)^{-1} \sum_i \frac{b_{ij}}{\epsilon_{ij}(x)} & \quad & \forall\, j = 1, \ldots, n.\end{split}\]
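The last line is a fixed-point characterization of the minimizer. A minimal iteration of it (illustrative data; the floor on \(\epsilon_{ij}\) is an assumed safeguard against division by zero, which the derivation does not address) drives \(x\) toward the coordinatewise median, the well-known \(L^1\) minimizer.

import numpy

# Fixed-point iteration from the last line above on illustrative data.  The floor
# on eps is an assumption to avoid dividing by zero when x_j hits a data coordinate.
numpy.random.seed(1)
b = numpy.random.rand(7, 2)      # seven points b_i in R^2
x = b.mean(axis=0)               # start from the centroid

for _ in range(200):
    eps = numpy.maximum(numpy.abs(x - b), 1e-12)
    x = (b / eps).sum(axis=0) / (1.0 / eps).sum(axis=0)

print('fixed point:           {}'.format(x))
print('coordinatewise median: {}'.format(numpy.median(b, axis=0)))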

Exercise 12.6

Given a set of points \(\left\{ b_i \in \mathbb{R}^2 \right\}_{i = 1}^3\), the \(\hat{x}\) that solves

\[L = \min_x \sum_i \left\Vert x - b_i \right\Vert_2\]

satisfies

\[\begin{split}\frac{\partial L}{\partial x} &= \sum_i \frac{x - b_i}{\left\Vert x - b_i \right\Vert_2}\\ 0 &= \sum_i \frac{x}{\left\Vert x - b_i \right\Vert_2} - \sum_i \frac{b_i}{\left\Vert x - b_i \right\Vert_2}\\ x &= \left( \sum_i \frac{1}{\epsilon_i(x)} \right)^{-1} \sum_i \frac{b_i}{\epsilon_i(x)} & \quad & \epsilon_i(x) = \left\Vert x - b_i \right\Vert_2.\end{split}\]

The solution \(\hat{x}\) is known as the Fermat point [Haj94]. When one of the angles exceeds \(120^\circ\), the minimum is at the vertex holding the largest angle, which is opposite the edge with the greatest length.
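
A minimal Weiszfeld-style iteration of the fixed point above (an illustrative triangle whose angles are all below \(120^\circ\); the floor on \(\epsilon_i\) is an assumed guard against division by zero) locates the Fermat point numerically.

import numpy

# Weiszfeld iteration of the fixed point above on an illustrative triangle whose
# angles are all below 120 degrees, so the Fermat point lies in the interior.
b = numpy.asarray([[0.0, 0.0], [4.0, 0.0], [1.0, 3.0]])
x = b.mean(axis=0)

for _ in range(500):
    eps = numpy.maximum(numpy.linalg.norm(x - b, axis=1), 1e-12)
    x = (b / eps[:, None]).sum(axis=0) / (1.0 / eps).sum(axis=0)

print('Fermat point estimate: {}'.format(x))
print('total distance:        {:.6f}'.format(numpy.linalg.norm(x - b, axis=1).sum()))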

Exercise 12.7

Let \(a_i\) denote the number of hours of labor needed for month \(i\), and let \(x\) denote the number of hours per month of labor that will be handled by regular employees. The total labor cost

\[\min_{x > 0} L(x) = \min_{x > 0} \sum_{i = 1}^{12} 25 \max(a_i - x, 0) + 17.5x\]

is a piecewise-linear minimization problem.

(i)

The optimization problem can be recast as an LP

\[\begin{split}\text{minimize} \quad \sum_i 17.5x + 25t_i\\ \text{subject to} \quad a_i - x &\leq t_i \quad i = 1, \ldots, 12\\ 0 &\leq t_i, x.\end{split}\]

(ii)

Given the projected labor hours by month in Table 12.2, the optimal solution is \(x^* = 340\).

(iii)

\[\begin{split}\frac{\partial L}{\partial x} = \sum_{i = 1}^{12} 25 f_i(x) + 17.5 \quad \text{where} \quad f_i(x) = \begin{cases} -1 & \text{if } a_i > x\\ 0 & \text{otherwise.} \end{cases}\end{split}\]

Applying gradient descent with an adaptive learning rate schedule converges to \(x^* \approx 340\).
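
A minimal sketch of that claim appears below (a projected subgradient method; the starting point, the \(1/k\) step schedule, and the iteration count are illustrative assumptions). It settles near the same optimum that the LP in the next cell reports.

# Projected subgradient descent on L(x) = sum_i [25 * max(a_i - x, 0) + 17.5 * x].
# Starting point, 1/k step schedule, and iteration count are illustrative choices.
a = [390, 420, 340, 320, 310, 590, 340, 580, 550, 360, 420, 600]

x = 0.0
for k in range(1, 20001):
    subgrad = -25.0 * sum(1 for a_i in a if a_i > x) + 17.5 * len(a)
    x = max(x - subgrad / k, 0.0)   # subgradient step, projected onto x >= 0

print('subgradient descent: x ~ {:.1f}'.format(x))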

[3]:
a = [390, 420, 340, 320,
     310, 590, 340, 580,
     550, 360, 420, 600]

from pymprog import model

lp = model('Piecewise-Linear LP')

#lp.verbose(True)
x = lp.var('x', bounds=(0, None))           # regular-employee hours per month
t = lp.var('t', len(a), bounds=(0, None))   # hours in month i beyond x, at $25/hour

# Summing 17.5 * x once per month yields the coefficient 12 * 17.5 = 210 on x
# that appears in the sensitivity report below.
lp.minimize(sum(17.5 * x + 25 * t_i for t_i in t))

for a_i, t_i in zip(a, t):
    a_i - x <= t_i

lp.solve()
lp.sensitivity()

lp.end()

PyMathProg 1.0 Sensitivity Report Created: 2020/04/05 Sun 03:20AM
================================================================================
Variable            Activity   Dual.Value     Obj.Coef   Range.From   Range.Till
--------------------------------------------------------------------------------
*x                       340            0          210          200          225
*t[0]                     50            0           25           10           35
*t[1]                     80            0           25           10           35
 t[2]                      0           25           25            0          inf
 t[3]                      0           25           25            0          inf
 t[4]                      0           25           25            0          inf
*t[5]                    250            0           25           10           35
 t[6]                      0           15           25           10          inf
*t[7]                    240            0           25           10           35
*t[8]                    210            0           25           10           35
*t[9]                     20            0           25           10           35
*t[10]                    80            0           25           10           35
*t[11]                   260            0           25           10           35
================================================================================
================================================================================
Constraint       Activity Dual.Value  Lower.Bnd  Upper.Bnd RangeLower RangeUpper
--------------------------------------------------------------------------------
 R1                  -390        -25       -inf       -390 -1.79769e+308       -340
 R2                  -420        -25       -inf       -420 -1.79769e+308       -340
*R3                  -340          0       -inf       -340       -340       -340
*R4                  -340          0       -inf       -320       -340       -340
*R5                  -340          0       -inf       -310       -340       -340
 R6                  -590        -25       -inf       -590 -1.79769e+308       -340
 R7                  -340        -10       -inf       -340       -360       -340
 R8                  -580        -25       -inf       -580 -1.79769e+308       -340
 R9                  -550        -25       -inf       -550 -1.79769e+308       -340
 R10                 -360        -25       -inf       -360 -1.79769e+308       -340
 R11                 -420        -25       -inf       -420 -1.79769e+308       -340
 R12                 -600        -25       -inf       -600 -1.79769e+308       -340
================================================================================
[3]:
model('Piecewise-Linear LP') is not the default model.

Exercise 12.8

Minimizing the squared \(L^2\)-norm of the residuals,

\[\begin{split}\DeclareMathOperator*{\argmin}{arg\,min} \argmin_{f, g} L(f, g) &= \argmin_{f, g} \sum_{i = 1}^n \left( \frac{1}{f} - \frac{f}{g} x_i - t_i \right)^2\\ &= \argmin_{a, b} \sum_i \left( a + b x_i - t_i \right)^2\end{split}\]

gives

\[\begin{split}\frac{\partial L}{\partial a} &= \sum_i 2 \left( a + b x_i - t_i \right) (1)\\ 0 &= \sum_i a + b x_i - t_i\\ a &= \frac{1}{n} \sum_i t_i - b x_i\end{split}\]

and

\[\begin{split}\frac{\partial L}{\partial b} &= \sum_i 2 \left( a + b x_i - t_i \right) (x_i)\\ 0 &= \sum_i a x_i + b x_i^2 - t_i x_i\\ \sum_i b x_i^2 &= \sum_i t_i x_i - a x_i\\ b &= \frac{\sum_i t_i x_i - a x_i}{\sum_i x_i^2}.\end{split}\]

Alternating these update equations converges to \(f = 1 / a^* \approx 3.00\) and \(g = -\left( a^* b^* \right)^{-1} \approx 8.92\).

The \(L^1\)-regression converges to \(f \approx 3.125\) and \(g \approx 9.25\). Note that the LP below uses the equivalent parameterization \(b = f / g\), which keeps both decision variables non-negative.

[4]:
obs = [(-10, 3.72), (-20, 7.06), (-30, 10.46),
       (-10, 3.71), (-20, 7.00), (-30, 10.48),
       (-10, 3.67), (-20, 7.08), (-30, 10.33)]

a = 1
b = 1
n = len(obs)

# Alternate the closed-form updates for a and b derived above (coordinate descent
# on the least-squares objective), then recover f = 1/a and g = -1/(a*b).
denom_const = sum(x_i**2 for x_i, _ in obs)
for i in range(256):
    a = sum(t_i - b * x_i for x_i, t_i in obs) / n
    b = sum(t_i * x_i - a * x_i for x_i, t_i in obs) / denom_const

    f = round(1 / a, 7)
    g = round(-1 / (a * b), 7)

print('L2-Regression: f = {}, g = {}'.format(f, g))
L2-Regression: f = 3.0, g = 8.9241448
[5]:
from pymprog import model

lp = model('L1-Regression LP')

#lp.verbose(True)
a = lp.var('a', bounds=(0, None))     # a = 1/f
b = lp.var('b', bounds=(0, None))     # b = f/g in this parameterization
s = lp.var('s', n, bounds=(0, None))  # absolute residuals

lp.minimize(sum(s))

for (x_i, t_i), s_i in zip(obs, s):
    -s_i <= a - b * x_i - t_i <= s_i

lp.solve()
lp.sensitivity()

lp.end()

f = round(1 / a.primal, 7)
g = round(1 / (a.primal * b.primal), 7)
print('L1-Regression: f = {}, g = {}'.format(f, g))

PyMathProg 1.0 Sensitivity Report Created: 2020/04/05 Sun 03:20AM
================================================================================
Variable            Activity   Dual.Value     Obj.Coef   Range.From   Range.Till
--------------------------------------------------------------------------------
*a                      0.32            0            0            0            0
*b                     0.338            0            0            0            0
*s[0]                   0.02            0            1            1            1
*s[1]                   0.02            0            1            0            1
 s[2]                      0            0            1            1          inf
*s[3]                   0.01            0            1            1            1
*s[4]                   0.08            0            1            0            1
*s[5]                   0.02            0            1            1            2
*s[6]                   0.03            0            1            1            1
 s[7]                      0            1            1            0          inf
*s[8]                   0.13            0            1            0            1
================================================================================
================================================================================
Constraint       Activity Dual.Value  Lower.Bnd  Upper.Bnd RangeLower RangeUpper
--------------------------------------------------------------------------------
 R1                 -3.72         -1       -inf      -3.72 -1.79769e+308       -3.7
*R2                  3.68          0       -inf       3.72       3.68       3.68
*R3                  -7.1          0       -inf      -7.06       -7.1       -7.1
 R4                  7.06         -1       -inf       7.06 -1.79769e+308       7.08
 R5                -10.46         -1       -inf     -10.46     -10.46     -10.45
*R6                 10.46          0       -inf      10.46      10.46      10.46
 R7                 -3.71         -1       -inf      -3.71 -1.79769e+308       -3.7
*R8                  3.69          0       -inf       3.71       3.69       3.69
*R9                 -7.16          0       -inf         -7      -7.16      -7.16
 R10                    7         -1       -inf          7 -1.79769e+308       7.08
 R11               -10.48         -1       -inf     -10.48 -1.79769e+308     -10.46
*R12                10.44          0       -inf      10.48      10.44      10.44
*R13                -3.73          0       -inf      -3.67      -3.73      -3.73
 R14                 3.67         -1       -inf       3.67 -1.79769e+308        3.7
*R15                -7.08          0       -inf      -7.08      -7.08      -7.08
 R16                 7.08          0       -inf       7.08       7.08      7.085
*R17               -10.59          0       -inf     -10.33     -10.59     -10.59
 R18                10.33         -1       -inf      10.33 -1.79769e+308      10.46
================================================================================
L1-Regression: f = 3.125, g = 9.2455621

Exercise 12.9

(a)

A minimizer of \(g_x(z)\) is given by

\[\begin{split}\nabla g_x(z) &= \frac{\partial}{\partial z} \left( b^\top E_x^{-1} b - 2 b^\top E_x^{-1} A z + z^\top A^\top E_x^{-1} A z \right) & \quad & E_x = E_x^\top\\ 0 &= 2 A^\top E_x^{-1} A z - 2 A^\top E_x^{-1} b\\ z &= \left( A^\top E_x^{-1} A \right)^{-1} A^\top E_x^{-1} b\\ &= T(x)\end{split}\]

because

\[\begin{split}\nabla^2 g_x(z) &= \frac{\partial}{\partial z} \left( 2 A^\top E_x^{-1} A z - 2 A^\top E_x^{-1} b \right)\\ &= 2 A^\top E_x^{-1} A\end{split}\]

is positive semi-definite. Furthermore, \(T(x)\) is a global minimizer since \(g_x(z)\) is convex:

\[\begin{split}g_x(\alpha z_1 + (1 - \alpha) z_2) &= b^\top E_x^{-1} b - 2 b^\top E_x^{-1} A \left( \alpha z_1 + (1 - \alpha) z_2 \right) + \left( \alpha z_1 + (1 - \alpha) z_2 \right)^\top A^\top E_x^{-1} A \left( \alpha z_1 + (1 - \alpha) z_2 \right)\\ &= \alpha g_x(z_1) + (1 - \alpha) g_x(z_2) - \alpha (1 - \alpha) \left\Vert E_x^{-1 / 2} A (z_1 - z_2) \right\Vert_2^2\\ &\leq \alpha g_x(z_1) + (1 - \alpha) g_x(z_2).\end{split}\]

(b)

\[\begin{split}g_x(x) &= \sum_{i = 1}^m \frac{\epsilon_i^2(x)}{\epsilon_i(x)}\\ &= \sum_{i = 1}^m \epsilon_i(x)\\ &= f(x).\end{split}\]

The preceding equality implies that

\[\begin{split}\left\Vert E_x^{-1 / 2} \left( b - Ax \right) \right\Vert_2^2 &= \left( b - Ax \right)^\top E_x^{-1} \left( b - Ax \right)\\ &= \left\Vert b - Ax \right\Vert_1.\end{split}\]

(c)

Let \(y = T(x)\) and \(\epsilon_i(y) = \epsilon_i(x) + \left( \epsilon_i(y) - \epsilon_i(x) \right)\).

\[\begin{split}g_x(y) &= \sum_i \frac{\epsilon^2_i(y)}{\epsilon_i(x)}\\ &= \sum_i \frac{ \left[ \epsilon_i(x) + \left( \epsilon_i(y) - \epsilon_i(x) \right) \right]^2 }{ \epsilon_i(x) }\\ &= \sum_i \frac{ \epsilon^2_i(x) + 2 \epsilon_i(x) \left( \epsilon_i(y) - \epsilon_i(x) \right) + \left( \epsilon_i(y) - \epsilon_i(x) \right)^2 }{ \epsilon_i(x) }\\ &= \sum_i 2 \epsilon_i(y) - \epsilon_i(x) + \frac{ \left( \epsilon_i(y) - \epsilon_i(x) \right)^2 }{ \epsilon_i(x) }\\ &= 2 f(y) - f(x) + \sum_i \frac{ \left( \epsilon_i(y) - \epsilon_i(x) \right)^2 }{ \epsilon_i(x) }\\ &\geq 2 f\left( T(x) \right) - f(x)\end{split}\]

(d)

Let \(y = T(x)\) and suppose \(T(x) \neq x\), i.e., \(x\) is not already a fixed point of the iteration.

\[\begin{split}f(y) &\leq \frac{1}{2} \left[ g_x(y) + f(x) \right] & \quad & \text{(c)}\\ &< \frac{1}{2} \left[ g_x(x) + f(x) \right] & \quad & \text{(a)}\\ &= f(x) & \quad & \text{(b)}\end{split}\]

Therefore, the sequence of iterates in the iteratively reweighted least squares algorithm produces a monotonically decreasing sequence of objective function values.
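
A minimal IRLS sketch based on the map \(T\) (reusing the data from Exercise 12.2; the floor on the residuals is an assumed safeguard that keeps \(E_x\) invertible and is not part of the argument above) illustrates this monotone decrease of \(f(x) = \left\Vert b - Ax \right\Vert_1\).

import numpy

# Iteratively reweighted least squares via T(x) = (A^T E_x^{-1} A)^{-1} A^T E_x^{-1} b
# on the data of Exercise 12.2.  The floor on the residuals is an illustrative
# safeguard so that E_x stays invertible.
b = numpy.asarray([0, 3, 1, 2], dtype=float)
A = numpy.vstack((numpy.asarray([0, 1, 2, 4], dtype=float), numpy.ones(4))).T

x = numpy.linalg.lstsq(A, b, rcond=None)[0]          # start from the L2 solution
f_vals = []
for _ in range(30):
    eps = numpy.maximum(numpy.abs(b - A.dot(x)), 1e-8)
    W = numpy.diag(1.0 / eps)                        # E_x^{-1}
    x = numpy.linalg.solve(A.T.dot(W).dot(A), A.T.dot(W).dot(b))
    f_vals.append(numpy.abs(b - A.dot(x)).sum())

print('x = {}'.format(numpy.round(x, 6)))
print('f(x) over the first iterations: {}'.format([round(v, 4) for v in f_vals[:8]]))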

Exercise 12.10

Let \(\mu \in \mathbb{R}\) be a parameter and \(\left\{ b_j \right\}_{j = 1}^n\) a set of numbers.

(a)

The optimization problem can be recast as

\[\begin{split}\min_x L &= \min_x \sum_j \left\vert x - b_j \right\vert + \mu (x - b_j)\\ &= \min_x \sum_i^{P} (x - b_i) + \mu (x - b_i) + \sum_k^{N} (b_k - x) + \mu (x - b_k) & \quad & P + N = n \land x \geq b_i \land x < b_k.\end{split}\]

The optimal solution satisfies

\[\begin{split}\frac{\partial L}{\partial x} &= \left( \sum_i^{P} 1 + \mu \right) + \left( \sum_k^{N} -1 + \mu \right)\\ 0 &= n \mu + P - N.\end{split}\]

Combining \(0 = n\mu + P - N\) with \(P + N = n\) gives \(P = \frac{n (1 - \mu)}{2}\). The following cases show that this optimization problem can be interpreted as quantile estimation.

  • \(\mu = 0 \implies P = N\).

    • This is the median, i.e., the second quartile.

  • \(\mu = \frac{1}{2} \implies P = \frac{1}{4} n \land N = \frac{3}{4} n\).

    • This is the first quartile.

  • \(\mu = -\frac{1}{2} \implies P = \frac{3}{4} n \land N = \frac{1}{4} n\).

    • This is the third quartile.

  • \(\mu = 1 \implies P = 0 \land N = n\).

    • This is the minimum value of the set, i.e., the zeroth quartile.

  • \(\mu = -1 \implies P = n \land N = 0\).

    • This is the maximum value of the set, i.e., the fourth quartile.

By inspection, the minimum does not exist when \(\mu \not\in [-1, 1]\).
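
A brute-force check (illustrative data, grid, and \(\mu\) values; the comparison is only up to grid resolution and NumPy's quantile interpolation) agrees with this reading: the minimizer tracks the \(\frac{1 - \mu}{2}\)-quantile of the data.

import numpy

# Brute-force check on illustrative data: the grid minimizer of
# sum_j |x - b_j| + mu * (x - b_j) should sit near the (1 - mu)/2 quantile of b.
numpy.random.seed(2)
b = numpy.sort(numpy.random.rand(1001))
grid = numpy.linspace(0.0, 1.0, 2001)

def grid_minimizer(mu):
    objective = [numpy.abs(x - b).sum() + mu * (x - b).sum() for x in grid]
    return grid[int(numpy.argmin(objective))]

for mu in (0.0, 0.5, -0.5):
    print('mu = {:+.1f}: grid argmin = {:.3f}, (1 - mu)/2 quantile = {:.3f}'.format(
        mu, grid_minimizer(mu), numpy.quantile(b, (1 - mu) / 2)))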

(b)

\[\begin{split}\DeclareMathOperator*{\argmax}{arg\,max} \text{minimize} \quad \sum_j t_j + \mu (x - b_j)\\ \text{subject to} \quad x - b_j &\leq t_j\\ -x + b_j &\leq t_j\\ t_j &\geq 0.\end{split}\]

Observe that the constant term \(-\mu \sum_j b_j\) in the LP objective does not affect the minimizer. Therefore, after dropping it and negating the objective, the LP in standard form is

\[\begin{split}\text{maximize} \quad -n \mu x - \sum_j t_j\\ \text{subject to} \quad x - b_j &\leq t_j\\ -x + b_j &\leq t_j\\ t_j &\geq 0.\end{split}\]

(c)

The LP in standard form can be thought of as a one parameter family of linear programming problems.

Suppose \(n = 4\) and \(b_j = 2^{j - 1}\). Rewriting the standard form LP into matrix form gives

\[\begin{split}\mathcal{B} &= \{ 5, 6, 7, 8, 9, 10, 11, 12 \} \quad & \quad \mathcal{N} &= \{ 0, 1, 2, 3, 4 \}\\ B &= \mathbf{I}_8 \quad & \quad N &= \begin{bmatrix} 1 & -1\\ -1 & -1\\ 1 & & -1\\ -1 & & -1\\ 1 & & & -1\\ -1 & & & -1\\ 1 & & & & -1\\ -1 & & & & -1 \end{bmatrix}\\ x^*_\mathcal{B} &= b = \begin{bmatrix} 1\\ -1\\ 2\\ -2\\ 4\\ -4\\ 8\\ -8 \end{bmatrix} \quad & \quad z^*_\mathcal{N} &= -c_\mathcal{N} = \begin{bmatrix} n\mu' \\ 1\\ 1\\ 1\\ 1 \end{bmatrix}.\end{split}\]

To analyze the one parameter family of LPs, arbitrarily perturb the initial dictionary to make it optimal by defining \(\bar{x}_\mathcal{B} = \boldsymbol{1}_8\) and \(\bar{z}_\mathcal{N} = \boldsymbol{1}_5\).

Evaluating \(x^*_\mathcal{B} + \bar{x}_\mathcal{B} \mu \geq \boldsymbol{0}\) and \(z^*_\mathcal{N} + \bar{z}_\mathcal{N} \mu \geq \boldsymbol{0}\) reveals the current range for \(\mu\) is

\[2^{n - 1} \leq -n\mu' \leq \mu \leq \infty.\]

By inspection, the initial dictionary is optimal as long as \(\mu' \leq -\frac{8}{n}\).

Iteration 1

Step 1

Computing

\[\begin{split}\mu^* &= \max\left\{ \max_{j \in \mathcal{N}, \bar{z}_j > 0} -\frac{z^*_j}{\bar{z}_j}, \max_{i \in \mathcal{B}, \bar{x}_i > 0} -\frac{x^*_i}{\bar{x}_i}, \right\}\\ &= \max\left\{ \max\left\{ -\frac{n\mu'}{1}, -\frac{1}{1}, -\frac{1}{1}, -\frac{1}{1}, -\frac{1}{1} \right\}, \max\left\{ -\frac{1}{1}, -\frac{-1}{1}, -\frac{2}{1}, -\frac{-2}{1}, -\frac{4}{1}, -\frac{-4}{1}, -\frac{8}{1}, -\frac{-8}{1} \right\} \right\}\\ &= -n\mu'\end{split}\]

shows that \(x_0\) is the entering variable for \(\mu' < -\frac{8}{n}\).

Step 2

The maximum is achieved by \(j = 0 \in \mathcal{N}\), so do one step of the primal simplex method.

The primal step direction is

\[\begin{split}\Delta x_\mathcal{B} = B^{-1} N e_j = \begin{bmatrix} 1 & -1\\ -1 & -1\\ 1 & & -1\\ -1 & & -1\\ 1 & & & -1\\ -1 & & & -1\\ 1 & & & & -1\\ -1 & & & & -1 \end{bmatrix} \begin{bmatrix} 1\\ 0\\ 0\\ 0\\ 0 \end{bmatrix} = \begin{bmatrix} 1\\ -1\\ 1\\ -1\\ 1\\ -1\\ 1\\ -1 \end{bmatrix}.\end{split}\]

The index of the leaving variable is

\[\begin{split}\argmin_{i \in \mathcal{B}} \left\{ \frac{x^*_i + \mu^* \bar{x}_i}{\Delta x_i} \colon \Delta x_i > 0 \right\} &= \argmin_{i \in \mathcal{B}}\left\{ \frac{1 + \mu^* (1)}{1}, \frac{2 + \mu^* (1)}{1}, \frac{4 + \mu^* (1)}{1}, \frac{8 + \mu^* (1)}{1} \right\}\\ &= 5.\end{split}\]

The dual step direction is

\[\begin{split}\Delta z_\mathcal{N} = -\left( B^{-1} N \right)^\top e_i = -\begin{bmatrix} 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1\\ -1 & -1\\ & & -1 & -1\\ & & & & -1 & -1\\ & & & & & &-1 & -1 \end{bmatrix} \begin{bmatrix} 1\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0 \end{bmatrix} = \begin{bmatrix} -1\\ 1\\ 0\\ 0\\ 0 \end{bmatrix}.\end{split}\]
Step 3

The step length adjustments are given by

\[\begin{split}t &= \frac{x^*_i}{\Delta x_i} = \frac{1}{1} = 1 \quad & \quad \bar{t} = \frac{\bar{x}_i}{\Delta x_i} = \frac{1}{1} = 1\\ s &= \frac{z^*_j}{\Delta z_j} = \frac{n\mu'}{-1} = -n\mu' \quad & \quad \bar{s} = \frac{\bar{z}_j}{\Delta z_j} = \frac{1}{-1} = -1.\end{split}\]
Step 4

The current primal and dual solutions are updated to

\[\begin{split}x^*_\mathcal{B} &\leftarrow x^*_\mathcal{B} - t \Delta x_\mathcal{B} = \begin{bmatrix} 1\\ -1\\ 2\\ -2\\ 4\\ -4\\ 8\\ -8 \end{bmatrix} - (1) \begin{bmatrix} 1\\ -1\\ 1\\ -1\\ 1\\ -1\\ 1\\ -1 \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 1\\ -1\\ 3\\ -3\\ 7\\ -7 \end{bmatrix} \quad & \quad x^*_j &\leftarrow t = 1\\ \bar{x}_\mathcal{B} &\leftarrow \bar{x}_\mathcal{B} - \bar{t} \Delta x_\mathcal{B} = \begin{bmatrix} 1\\ 1\\ 1\\ 1\\ 1\\ 1\\ 1\\ 1 \end{bmatrix} - (1) \begin{bmatrix} 1\\ -1\\ 1\\ -1\\ 1\\ -1\\ 1\\ -1 \end{bmatrix} = \begin{bmatrix} 0\\ 2\\ 0\\ 2\\ 0\\ 2\\ 0\\ 2 \end{bmatrix} \quad & \quad \bar{x}_j &\leftarrow \bar{t} = 1\end{split}\]

and

\[\begin{split}z^*_\mathcal{N} &\leftarrow z^*_\mathcal{N} - s \Delta z_\mathcal{N} = \begin{bmatrix} n\mu' \\ 1\\ 1\\ 1\\ 1 \end{bmatrix} - (-n\mu') \begin{bmatrix} -1\\ 1\\ 0\\ 0\\ 0 \end{bmatrix} = \begin{bmatrix} 0\\ 1 + n\mu'\\ 1\\ 1\\ 1 \end{bmatrix} \quad & \quad z^*_i &\leftarrow s = -n\mu'\\ \bar{z}_\mathcal{N} &\leftarrow \bar{z}_\mathcal{N} - \bar{s} \Delta z_\mathcal{N} = \begin{bmatrix} 1\\ 1\\ 1\\ 1\\ 1 \end{bmatrix} - (-1) \begin{bmatrix} -1\\ 1\\ 0\\ 0\\ 0 \end{bmatrix} = \begin{bmatrix} 0\\ 2\\ 1\\ 1\\ 1 \end{bmatrix} \quad & \quad \bar{z}_i &\leftarrow \bar{s} = -1.\end{split}\]
Step 5

The current basis is updated to

\[\begin{split}\mathcal{B} &= \{ 0, 6, 7, 8, 9, 10, 11, 12 \} \quad & \quad \mathcal{N} &= \{ 5, 1, 2, 3, 4 \}\\ B &= \begin{bmatrix} 1\\ -1 & 1\\ 1 & & 1\\ -1 & & & 1\\ 1 & & & & 1\\ -1 & & & & & 1\\ 1 & & & & & & 1\\ -1 & & & & & & & 1 \end{bmatrix} \quad & \quad N &= \begin{bmatrix} 1 & -1\\ & -1\\ & & -1\\ & & -1\\ & & & -1\\ & & & -1\\ & & & & -1\\ & & & & -1 \end{bmatrix}\\ x^*_\mathcal{B} &= \begin{bmatrix} 1\\ 0\\ 1\\ -1\\ 3\\ -3\\ 7\\ -7 \end{bmatrix} \quad & \quad z^*_\mathcal{N} &= \begin{bmatrix} -n\mu' \\ 1 + n\mu'\\ 1\\ 1\\ 1 \end{bmatrix}\\ \bar{x}_\mathcal{B} &= \begin{bmatrix} 1\\ 2\\ 0\\ 2\\ 0\\ 2\\ 0\\ 2 \end{bmatrix} \quad & \quad \bar{z}_\mathcal{N} &= \begin{bmatrix} -1\\ 2\\ 1\\ 1\\ 1 \end{bmatrix}.\end{split}\]

The current range for \(\mu\) is

\[\frac{-1 - n\mu'}{2} \leq \mu \leq -n\mu'.\]

By inspection, the dictionary is optimal as long as \(\mu' \leq -\frac{8}{n}\).

Iteration 2

Step 1

Computing

\[\begin{split}\mu^* &= \max\left\{ \max_{j \in \mathcal{N}, \bar{z}_j > 0} -\frac{z^*_j}{\bar{z}_j}, \max_{i \in \mathcal{B}, \bar{x}_i > 0} -\frac{x^*_i}{\bar{x}_i}, \right\}\\ &= \max\left\{ \max\left\{ -\frac{1 + n\mu'}{2}, -\frac{1}{1}, -\frac{1}{1}, -\frac{1}{1} \right\}, \max\left\{ -\frac{1}{1}, -\frac{0}{2}, -\frac{-1}{2}, -\frac{-3}{2}, -\frac{-7}{2} \right\} \right\}\\ &= -\frac{1 + n\mu'}{2}\end{split}\]

shows that \(x_1\) is the entering variable for \(\mu' < -\frac{8}{n}\).

Step 2

The maximum is achieved by \(j = 1 \in \mathcal{N}\), so do one step of the primal simplex method.

The primal step direction is

\[\begin{split}\Delta x_\mathcal{B} = B^{-1} N e_j = \begin{bmatrix} 1 & -1\\ 1 & -2\\ -1 & 1 & -1\\ 1 & -1 & -1\\ -1 & 1 & 0 & -1\\ 1 & -1 & 0 & -1\\ -1 & 1 & 0 & 0 & -1\\ 1 & -1 & 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} 0\\ 1\\ 0\\ 0\\ 0 \end{bmatrix} = \begin{bmatrix} -1\\ -2\\ 1\\ -1\\ 1\\ -1\\ 1\\ -1 \end{bmatrix}.\end{split}\]

The index of the leaving variable is

\[\begin{split}\argmin_{i \in \mathcal{B}} \left\{ \frac{x^*_i + \mu^* \bar{x}_i}{\Delta x_i} \colon \Delta x_i > 0 \right\} &= \argmin_{i \in \mathcal{B}}\left\{ \frac{1 + \mu^* (0)}{1}, \frac{3 + \mu^* (0)}{1}, \frac{7 + \mu^* (0)}{1} \right\}\\ &= 7.\end{split}\]

The dual step direction is

\[\begin{split}\Delta z_\mathcal{N} = -\left( B^{-1} N \right)^\top e_i = -\begin{bmatrix} 1 & 1 & -1 & 1 & -1 & 1 & -1 & 1\\ -1 & -2 & 1 & -1 & 1 & -1 & 1 & -1\\ & & -1 & -1\\ & & & & -1 & -1\\ & & & & & & -1 & -1 \end{bmatrix} \begin{bmatrix} 0\\ 0\\ 1\\ 0\\ 0\\ 0\\ 0\\ 0 \end{bmatrix} = \begin{bmatrix} 1\\ -1\\ 1\\ 0\\ 0 \end{bmatrix}.\end{split}\]
Step 3

The step length adjustments are given by

\[\begin{split}t &= \frac{x^*_i}{\Delta x_i} = \frac{1}{1} = 1 \quad & \quad \bar{t} = \frac{\bar{x}_i}{\Delta x_i} = \frac{0}{1} = 0\\ s &= \frac{z^*_j}{\Delta z_j} = \frac{1 + n\mu'}{-1} = -1 - n\mu' \quad & \quad \bar{s} = \frac{\bar{z}_j}{\Delta z_j} = \frac{2}{-1} = -2.\end{split}\]
Step 4

The current primal and dual solutions are updated to

\[\begin{split}x^*_\mathcal{B} &\leftarrow x^*_\mathcal{B} - t \Delta x_\mathcal{B} = \begin{bmatrix} 1\\ 0\\ 1\\ -1\\ 3\\ -3\\ 7\\ -7 \end{bmatrix} - (1) \begin{bmatrix} -1\\ -2\\ 1\\ -1\\ 1\\ -1\\ 1\\ -1 \end{bmatrix} = \begin{bmatrix} 2\\ 2\\ 0\\ 0\\ 2\\ -2\\ 6\\ -6 \end{bmatrix} \quad & \quad x^*_j &\leftarrow t = 1\\ \bar{x}_\mathcal{B} &\leftarrow \bar{x}_\mathcal{B} - \bar{t} \Delta x_\mathcal{B} = \begin{bmatrix} 1\\ 2\\ 0\\ 2\\ 0\\ 2\\ 0\\ 2 \end{bmatrix} - (0) \begin{bmatrix} -1\\ -2\\ 1\\ -1\\ 1\\ -1\\ 1\\ -1 \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 0\\ 2\\ 0\\ 2\\ 0\\ 2 \end{bmatrix} \quad & \quad \bar{x}_j &\leftarrow \bar{t} = 0\end{split}\]

and

\[\begin{split}z^*_\mathcal{N} &\leftarrow z^*_\mathcal{N} - s \Delta z_\mathcal{N} = \begin{bmatrix} -n\mu' \\ 1 + n\mu'\\ 1\\ 1\\ 1 \end{bmatrix} - (-1 - n\mu') \begin{bmatrix} 1\\ -1\\ 1\\ 0\\ 0 \end{bmatrix} = \begin{bmatrix} 1\\ 0\\ 2 + n\mu'\\ 0\\ 0 \end{bmatrix} \quad & \quad z^*_i &\leftarrow s = -1 - n\mu'\\ \bar{z}_\mathcal{N} &\leftarrow \bar{z}_\mathcal{N} - \bar{s} \Delta z_\mathcal{N} = \begin{bmatrix} -1\\ 2\\ 1\\ 1\\ 1 \end{bmatrix} - (-2) \begin{bmatrix} 1\\ -1\\ 1\\ 0\\ 0 \end{bmatrix} = \begin{bmatrix} 1\\ 0\\ 3\\ 1\\ 1 \end{bmatrix} \quad & \quad \bar{z}_i &\leftarrow \bar{s} = -2.\end{split}\]
Step 5

The current basis is updated to

\[\begin{split}\mathcal{B} &= \{ 0, 6, 1, 8, 9, 10, 11, 12 \} \quad & \quad \mathcal{N} &= \{ 5, 7, 2, 3, 4 \}\\ B &= \begin{bmatrix} 1 & & -1\\ -1 & 1 & -1\\ 1\\ -1 & & & 1\\ 1 & & & & 1\\ -1 & & & & & 1\\ 1 & & & & & & 1\\ -1 & & & & & & & 1 \end{bmatrix} \quad & \quad N &= \begin{bmatrix} 1\\ \\ & 1 & -1\\ & & -1\\ & & & -1\\ & & & -1\\ & & & & -1\\ & & & & -1 \end{bmatrix}\\ x^*_\mathcal{B} &= \begin{bmatrix} 2\\ 2\\ 1\\ 0\\ 2\\ -2\\ 6\\ -6 \end{bmatrix} \quad & \quad z^*_\mathcal{N} &= \begin{bmatrix} 1\\ -1 - n\mu'\\ 2 + n\mu'\\ 0\\ 0 \end{bmatrix}\\ \bar{x}_\mathcal{B} &= \begin{bmatrix} 1\\ 2\\ 0\\ 2\\ 0\\ 2\\ 0\\ 2 \end{bmatrix} \quad & \quad \bar{z}_\mathcal{N} &= \begin{bmatrix} 1\\ -2\\ 3\\ 1\\ 1 \end{bmatrix}.\end{split}\]

The current range for \(\mu\) is

\[\frac{-2 - n\mu'}{3} \leq \mu \leq \frac{-1 - n\mu'}{2}.\]

By inspection, the dictionary is optimal as long as \(\mu' \leq -\frac{2}{n}\).

Iteration 3

Step 1

Computing

\[\begin{split}\mu^* &= \max\left\{ \max_{j \in \mathcal{N}, \bar{z}_j > 0} -\frac{z^*_j}{\bar{z}_j}, \max_{i \in \mathcal{B}, \bar{x}_i > 0} -\frac{x^*_i}{\bar{x}_i}, \right\}\\ &= \max\left\{ \max\left\{ -\frac{1}{1}, -\frac{2 + n\mu'}{3}, -\frac{0}{1}, -\frac{0}{1} \right\}, \max\left\{ -\frac{2}{1}, -\frac{2}{2}, -\frac{0}{2}, -\frac{-2}{2}, -\frac{-6}{2} \right\} \right\}\\ &= -\frac{2 + n\mu'}{3}\end{split}\]

shows that \(x_2\) is the entering variable for \(\mu' < -\frac{2}{n}\).

Step 2

The maximum is achieved by \(j = 2 \in \mathcal{N}\), so do one step of the primal simplex method.

The primal step direction is

\[\begin{split}\Delta x_\mathcal{B} = B^{-1} N e_j = \begin{bmatrix} & 1 & -1\\ -1 & 2 & -2\\ -1 & 1 & -1\\ & 1 & -2\\ & -1 & 1 & -1\\ & 1 & -1 & -1\\ & -1 & 1 & & -1\\ & 1 & -1 & & -1 \end{bmatrix} \begin{bmatrix} 0\\ 0\\ 1\\ 0\\ 0 \end{bmatrix} = \begin{bmatrix} -1\\ -2\\ -1\\ -2\\ 1\\ -1\\ 1\\ -1 \end{bmatrix}.\end{split}\]

The index of the leaving variable is

\[\begin{split}\argmin_{i \in \mathcal{B}} \left\{ \frac{x^*_i + \mu^* \bar{x}_i}{\Delta x_i} \colon \Delta x_i > 0 \right\} &= \argmin_{i \in \mathcal{B}}\left\{ \frac{2 + \mu^* (0)}{1}, \frac{6 + \mu^* (0)}{1} \right\}\\ &= 9.\end{split}\]

The dual step direction is

\[\begin{split}\Delta z_\mathcal{N} = -\left( B^{-1} N \right)^\top e_i = -\begin{bmatrix} & -1 & -1\\ 1 & 2 & 1 & 1 & -1 & 1 & -1 & 1\\ -1 & -2 & -1 & -2 & 1 & -1 & 1 & -1\\ & & & & -1 & -1\\ & & & & & & -1 & -1 \end{bmatrix} \begin{bmatrix} 0\\ 0\\ 0\\ 0\\ 1\\ 0\\ 0\\ 0 \end{bmatrix} = \begin{bmatrix} 0\\ 1\\ -1\\ 1\\ 0 \end{bmatrix}.\end{split}\]
Step 3

The step length adjustments are given by

\[\begin{split}t &= \frac{x^*_i}{\Delta x_i} = \frac{2}{1} = 2 \quad & \quad \bar{t} = \frac{\bar{x}_i}{\Delta x_i} = \frac{0}{1} = 0\\ s &= \frac{z^*_j}{\Delta z_j} = \frac{2 + n\mu'}{-1} = -2 - n\mu' \quad & \quad \bar{s} = \frac{\bar{z}_j}{\Delta z_j} = \frac{3}{-1} = -3.\end{split}\]
Step 4

The current primal and dual solutions are updated to

\[\begin{split}x^*_\mathcal{B} &\leftarrow x^*_\mathcal{B} - t \Delta x_\mathcal{B} = \begin{bmatrix} 2\\ 2\\ 1\\ 0\\ 2\\ -2\\ 6\\ -6 \end{bmatrix} - (2) \begin{bmatrix} -1\\ -2\\ -1\\ -2\\ 1\\ -1\\ 1\\ -1 \end{bmatrix} = \begin{bmatrix} 4\\ 6\\ 3\\ 4\\ 0\\ 0\\ 4\\ -4 \end{bmatrix} \quad & \quad x^*_j &\leftarrow t = 2\\ \bar{x}_\mathcal{B} &\leftarrow \bar{x}_\mathcal{B} - \bar{t} \Delta x_\mathcal{B} = \begin{bmatrix} 1\\ 2\\ 0\\ 2\\ 0\\ 2\\ 0\\ 2 \end{bmatrix} - (0) \begin{bmatrix} -1\\ -2\\ -1\\ -2\\ 1\\ -1\\ 1\\ -1 \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 0\\ 2\\ 0\\ 2\\ 0\\ 2 \end{bmatrix} \quad & \quad \bar{x}_j &\leftarrow \bar{t} = 0\end{split}\]

and

\[\begin{split}z^*_\mathcal{N} &\leftarrow z^*_\mathcal{N} - s \Delta z_\mathcal{N} = \begin{bmatrix} 1\\ -1 - n\mu'\\ 2 + n\mu'\\ 0\\ 0 \end{bmatrix} - (-2 - n\mu') \begin{bmatrix} 0\\ 1\\ -1\\ 1\\ 0 \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ 0\\ 2 + n\mu'\\ 0 \end{bmatrix} \quad & \quad z^*_i &\leftarrow s = -2 - n\mu'\\ \bar{z}_\mathcal{N} &\leftarrow \bar{z}_\mathcal{N} - \bar{s} \Delta z_\mathcal{N} = \begin{bmatrix} 1\\ -2\\ 3\\ 1\\ 1 \end{bmatrix} - (-3) \begin{bmatrix} 0\\ 1\\ -1\\ 1\\ 0 \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ 0\\ 4\\ 1 \end{bmatrix} \quad & \quad \bar{z}_i &\leftarrow \bar{s} = -3.\end{split}\]
Step 5

The current basis is updated to

\[\begin{split}\mathcal{B} &= \{ 0, 6, 1, 8, 2, 10, 11, 12 \} \quad & \quad \mathcal{N} &= \{ 5, 7, 9, 3, 4 \}\\ B &= \begin{bmatrix} 1 & & -1\\ -1 & 1 & -1\\ 1 & & & & -1\\ -1 & & & 1 & -1\\ 1\\ -1 & & & & & 1\\ 1 & & & & & & 1\\ -1 & & & & & & & 1 \end{bmatrix} \quad & \quad N &= \begin{bmatrix} 1\\ \\ & 1\\ \\ & & 1 & -1\\ & & & -1\\ & & & & -1\\ & & & & -1 \end{bmatrix}\\ x^*_\mathcal{B} &= \begin{bmatrix} 4\\ 6\\ 3\\ 4\\ 2\\ 0\\ 4\\ -4 \end{bmatrix} \quad & \quad z^*_\mathcal{N} &= \begin{bmatrix} 1\\ 1\\ -2 - n\mu'\\ 2 + n\mu'\\ 0 \end{bmatrix}\\ \bar{x}_\mathcal{B} &= \begin{bmatrix} 1\\ 2\\ 0\\ 2\\ 0\\ 2\\ 0\\ 2 \end{bmatrix} \quad & \quad \bar{z}_\mathcal{N} &= \begin{bmatrix} 1\\ 1\\ -3\\ 4\\ 1 \end{bmatrix}.\end{split}\]

The current range for \(\mu\) is

\[\frac{-2 - n\mu'}{4} \leq \mu \leq \frac{-2 - n\mu'}{3}.\]

By inspection, the dictionary is optimal as long as \(\mu' \leq -\frac{10}{n}\).

Iteration 4

Step 1

Computing

\[\begin{split}\mu^* &= \max\left\{ \max_{j \in \mathcal{N}, \bar{z}_j > 0} -\frac{z^*_j}{\bar{z}_j}, \max_{i \in \mathcal{B}, \bar{x}_i > 0} -\frac{x^*_i}{\bar{x}_i}, \right\}\\ &= \max\left\{ \max\left\{ -\frac{1}{1}, -\frac{1}{1}, -\frac{2 + n\mu'}{4}, -\frac{0}{1} \right\}, \max\left\{ -\frac{4}{1}, -\frac{6}{2}, -\frac{4}{2}, -\frac{0}{2}, -\frac{-4}{2} \right\} \right\}\\ &= -\frac{2 + n\mu'}{4}\end{split}\]

shows that \(x_3\) is the entering variable for \(\mu' < -\frac{10}{n}\).

Step 2

The maximum is achieved by \(j = 3 \in \mathcal{N}\), so do one step of the primal simplex method.

The primal step direction is

\[\begin{split}\Delta x_\mathcal{B} = B^{-1} N e_j = \begin{bmatrix} & & 1 & -1\\ -1 & & 2 & -2\\ -1 & & 1 & -1\\ & -1 & 2 & -2\\ & -1 & 1 & -1\\ & & 1 & -2\\ & & -1 & 1 & -1\\ & & 1 & -1 & -1 \end{bmatrix} \begin{bmatrix} 0\\ 0\\ 0\\ 1\\ 0 \end{bmatrix} = \begin{bmatrix} -1\\ -2\\ -1\\ -2\\ -1\\ -2\\ 1\\ -1 \end{bmatrix}.\end{split}\]

The index of the leaving variable is

\[\begin{split}\argmin_{i \in \mathcal{B}} \left\{ \frac{x^*_i + \mu^* \bar{x}_i}{\Delta x_i} \colon \Delta x_i > 0 \right\} &= \argmin_{i \in \mathcal{B}}\left\{ \frac{4 + \mu^* (0)}{1} \right\}\\ &= 11.\end{split}\]

The dual step direction is

\[\begin{split}\Delta z_\mathcal{N} = -\left( B^{-1} N \right)^\top e_i = -\begin{bmatrix} & -1 & -1\\ & & & -1 & -1\\ 1 & 2 & 1 & 2 & 1 & 1 & -1 & 1\\ -1 & -2 & -1 & -2 & -1 & -2 & 1 & -1\\ & & & & & & -1 & -1 \end{bmatrix} \begin{bmatrix} 0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 1\\ 0 \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 1\\ -1\\ 1 \end{bmatrix}.\end{split}\]
Step 3

The step length adjustments are given by

\[\begin{split}t &= \frac{x^*_i}{\Delta x_i} = \frac{4}{1} = 4 \quad & \quad \bar{t} = \frac{\bar{x}_i}{\Delta x_i} = \frac{0}{1} = 0\\ s &= \frac{z^*_j}{\Delta z_j} = \frac{2 + n\mu'}{-1} = -2 - n\mu' \quad & \quad \bar{s} = \frac{\bar{z}_j}{\Delta z_j} = \frac{4}{-1} = -4.\end{split}\]
Step 4

The current primal and dual solutions are updated to

\[\begin{split}x^*_\mathcal{B} &\leftarrow x^*_\mathcal{B} - t \Delta x_\mathcal{B} = \begin{bmatrix} 4\\ 6\\ 3\\ 4\\ 2\\ 0\\ 4\\ -4 \end{bmatrix} - (4) \begin{bmatrix} -1\\ -2\\ -1\\ -2\\ -1\\ -2\\ 1\\ -1 \end{bmatrix} = \begin{bmatrix} 8\\ 14\\ 7\\ 12\\ 6\\ 8\\ 0\\ 0 \end{bmatrix} \quad & \quad x^*_j &\leftarrow t = 4\\ \bar{x}_\mathcal{B} &\leftarrow \bar{x}_\mathcal{B} - \bar{t} \Delta x_\mathcal{B} = \begin{bmatrix} 1\\ 2\\ 0\\ 2\\ 0\\ 2\\ 0\\ 2 \end{bmatrix} - (0) \begin{bmatrix} -1\\ -2\\ -1\\ -2\\ -1\\ -2\\ 1\\ -1 \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 0\\ 2\\ 0\\ 2\\ 0\\ 2 \end{bmatrix} \quad & \quad \bar{x}_j &\leftarrow \bar{t} = 0\end{split}\]

and

\[\begin{split}z^*_\mathcal{N} &\leftarrow z^*_\mathcal{N} - s \Delta z_\mathcal{N} = \begin{bmatrix} 1\\ 1\\ -2 - n\mu'\\ 2 + n\mu'\\ 0 \end{bmatrix} - (-2 - n\mu') \begin{bmatrix} 0\\ 0\\ 1\\ -1\\ 1 \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ 0\\ 0\\ 2 + n\mu' \end{bmatrix} \quad & \quad z^*_i &\leftarrow s = -2 - n\mu'\\ \bar{z}_\mathcal{N} &\leftarrow \bar{z}_\mathcal{N} - \bar{s} \Delta z_\mathcal{N} = \begin{bmatrix} 1\\ 1\\ -3\\ 4\\ 1 \end{bmatrix} - (-4) \begin{bmatrix} 0\\ 0\\ 1\\ -1\\ 1 \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ 1\\ 0\\ 5 \end{bmatrix} \quad & \quad \bar{z}_i &\leftarrow \bar{s} = -4.\end{split}\]
Step 5

The current basis is updated to

\[\begin{split}\mathcal{B} &= \{ 0, 6, 1, 8, 2, 10, 3, 12 \} \quad & \quad \mathcal{N} &= \{ 5, 7, 9, 11, 4 \}\\ B &= \begin{bmatrix} 1 & & -1\\ -1 & 1 & -1\\ 1 & & & & -1\\ -1 & & & 1 & -1\\ 1 & & & & & & -1\\ -1 & & & & & 1 & -1\\ 1\\ -1 & & & & & & & 1 \end{bmatrix} \quad & \quad N &= \begin{bmatrix} 1\\ \\ & 1\\ \\ & & 1\\ \\ & & & 1 & -1\\ & & & & -1 \end{bmatrix}\\ x^*_\mathcal{B} &= \begin{bmatrix} 8\\ 14\\ 7\\ 12\\ 6\\ 8\\ 4\\ 0 \end{bmatrix} \quad & \quad z^*_\mathcal{N} &= \begin{bmatrix} 1\\ 1\\ 0\\ -2 - n\mu'\\ 2 + n\mu' \end{bmatrix}\\ \bar{x}_\mathcal{B} &= \begin{bmatrix} 1\\ 2\\ 0\\ 2\\ 0\\ 2\\ 0\\ 2 \end{bmatrix} \quad & \quad \bar{z}_\mathcal{N} &= \begin{bmatrix} 1\\ 1\\ 1\\ -4\\ 5 \end{bmatrix}.\end{split}\]

The current range for \(\mu\) is

\[\frac{-2 - n\mu'}{5} \leq \mu \leq \frac{-2 - n\mu'}{4}.\]

By inspection, the dictionary is optimal as long as \(\mu' \leq -\frac{2}{n}\).

Iteration 5

Step 1

Computing

\[\begin{split}\mu^* &= \max\left\{ \max_{j \in \mathcal{N}, \bar{z}_j > 0} -\frac{z^*_j}{\bar{z}_j}, \max_{i \in \mathcal{B}, \bar{x}_i > 0} -\frac{x^*_i}{\bar{x}_i}, \right\}\\ &= \max\left\{ \max\left\{ -\frac{1}{1}, -\frac{1}{1}, -\frac{0}{1}, -\frac{2 + n\mu'}{5}, \right\}, \max\left\{ -\frac{8}{1}, -\frac{14}{2}, -\frac{12}{2}, -\frac{8}{2}, -\frac{0}{2} \right\} \right\}\\ &= -\frac{2 + n\mu'}{5}\end{split}\]

shows that \(x_4\) is the entering variable for \(\mu' < -\frac{2}{n}\).

Step 2

The maximum is achieved by \(j = 4 \in \mathcal{N}\), so do one step of the primal simplex method.

The primal step direction is

\[\begin{split}\Delta x_\mathcal{B} = B^{-1} N e_j = \begin{bmatrix} & & & 1 & -1\\ -1 & & & 2 & -2\\ -1 & & & 1 & -1\\ & -1 & & 2 & -2\\ & -1 & & 1 & -1\\ & & -1 & 2 & -2\\ & & -1 & 1 & -1\\ & & & 1 & -2 \end{bmatrix} \begin{bmatrix} 0\\ 0\\ 0\\ 0\\ 1 \end{bmatrix} = \begin{bmatrix} -1\\ -2\\ -1\\ -2\\ -1\\ -2\\ -1\\ -2 \end{bmatrix}.\end{split}\]

The index of the leaving variable is

\[\argmin_{i \in \mathcal{B}} \left\{ \frac{x^*_i + \mu^* \bar{x}_i}{\Delta x_i} \colon \Delta x_i > 0 \right\} = \emptyset.\]

Therefore, the primal is unbounded (i.e. the dual is infeasible) when \(\mu' < -\frac{2}{n}\).

(d)

Even though there is a pattern for the \(k\)-th iteration, it is unclear whether the results of (c) are correct because \(\mu\) and \(\mu'\) are constrained by strict inequalities at the boundaries.

[6]:
import numpy

n = 10
b = numpy.random.randint(-100, 100, size=n)   # random data (no seed, so results vary)
mu = 0                                        # mu = 0 recovers the median
print(numpy.sort(b))
from pymprog import model

lp = model('L1-Regression LP')

#lp.verbose(True)
x = lp.var('x', bounds=(None, None))
t = lp.var('t', n, bounds=(0, None))

lp.minimize(sum(t_j + mu * (x - b_j) for b_j, t_j in zip(b, t)))

for i in range(n):
    -t[i] <= x - b[i] <= t[i]

lp.solve()
lp.sensitivity()

lp.end()
[-80 -61 -61 -30   7  23  48  49  52  87]

PyMathProg 1.0 Sensitivity Report Created: 2020/04/05 Sun 03:20AM
================================================================================
Variable            Activity   Dual.Value     Obj.Coef   Range.From   Range.Till
--------------------------------------------------------------------------------
*x                        23            0            0           -1            0
*t[0]                    103            0            1            0            1
*t[1]                     26            0            1            1            2
*t[2]                     25            0            1            1            2
*t[3]                     29            0            1            1            2
*t[4]                     84            0            1            0            1
 t[5]                      0            0            1            1          inf
*t[6]                     53            0            1            0            1
*t[7]                     16            0            1            0            1
*t[8]                     84            0            1            0            1
*t[9]                     64            0            1            1            2
================================================================================
================================================================================
Constraint       Activity Dual.Value  Lower.Bnd  Upper.Bnd RangeLower RangeUpper
--------------------------------------------------------------------------------
*R1                  -126          0       -inf         80       -126       -126
 R2                   -80         -1       -inf        -80 -1.79769e+308         23
 R3                   -49         -1       -inf        -49 -1.79769e+308        -23
*R4                    -3          0       -inf         49         -3         -3
 R5                   -48         -1       -inf        -48 -1.79769e+308        -23
*R6                    -2          0       -inf         48         -2         -2
 R7                   -52         -1       -inf        -52 -1.79769e+308        -23
*R8                    -6          0       -inf         52         -6         -6
*R9                  -107          0       -inf         61       -107       -107
 R10                  -61         -1       -inf        -61 -1.79769e+308         23
 R11                  -23         -1       -inf        -23        -23         -7
*R12                   23          0       -inf         23         23         23
*R13                  -76          0       -inf         30        -76        -76
 R14                  -30         -1       -inf        -30 -1.79769e+308         23
*R15                  -39          0       -inf         -7        -39        -39
 R16                    7         -1       -inf          7 -1.79769e+308         23
*R17                 -107          0       -inf         61       -107       -107
 R18                  -61         -1       -inf        -61 -1.79769e+308         23
 R19                  -87         -1       -inf        -87 -1.79769e+308        -23
*R20                  -41          0       -inf         87        -41        -41
================================================================================
[6]:
model('L1-Regression LP') is not the default model.

Exercise 12.11

Already proved in another exercise.

References

Goc

Mark S. Gockenbach. Introduction to inequality-constrained optimization: the logarithmic barrier method. http://www.math.mtu.edu/~msgocken/ma5630spring2003/lectures/bar/bar.pdf. Accessed on 2017-09-09.

Haj94

Mowaffaq Hajja. An advanced calculus approach to finding the Fermat point. Mathematics Magazine, 67(1):29–34, 1994.

Lus

Richard Lusby. Karush-Kuhn-Tucker conditions. http://www2.imm.dtu.dk/courses/02711/lecture3.pdf. Accessed on 2017-09-09.

Sul

Josephine Sullivan. Lagrange multipliers and the Karush-Kuhn-Tucker conditions. http://www.csc.kth.se/utbildning/kth/kurser/DD3364/Lectures/KKT.pdf. Accessed on 2017-09-09.

Wil

Joshua Wilde. Math camp notes: Kuhn-Tucker theorem. http://faculty.cas.usf.edu/jkwilde/mathcamp/Kuhn-Tucker.pdf. Accessed on 2017-09-09.