The Simplex Method
Linear programming, often referred to as linear optimization, is a method for optimizing a linear objective subject to a set of linear constraints. LP was popularized around the time of World War II by George B. Dantzig, who used it to help solve logistical problems in the US Air Force. Since then, LP has found applications in personnel scheduling, network flows, production optimization, best-fit lines, and much more. One of the most well-known algorithms for solving linear programs is the Simplex Method, developed by Dantzig in 1947. The purpose of this article is to describe some aspects of the Simplex Method as I study for my exams in Math 407: Linear Optimization during my senior year at the University of Washington.
Standard Form
Although writing a linear program in standard form is not required for most modern applications, most theory assumes that the problem is stated in standard form. A problem is in standard form if it appears as follows:
\[\begin{array}{l l} \text{Maximize} & c^{T}x\\ \text{Subject to} & Ax \le b\\ & x \ge 0 \end{array}\]It is also the case that any linear program can be converted to standard form. To see this, let us consider an example:
\[\begin{array}{lccccccc} \text{Minimize} & x_1 & - & 12x_2 & - & 2x_3\\ \text{Subject to} & 5x_1 & - & x_2 & - & 2x_3 & = & 10\\ & 2x_1 & + & x_2 & - & 2x_3 & \ge & -30\\ & & & x_2 \le 0 & &1 \le x_3 \le 4 \end{array}\]Our linear program violates a number of the requirements of standard form. First, it is not a maximization problem. Second, it violates our requirement that all constraints are written as “less than or equal to”. And finally, it violates our non-negativity constraints.
Note: Different sources and textbooks have different definitions of standard form, notably in what form the constraints must be written. This article follows the convention of Linear Programming by Vašek Chvátal (1981), in which inequality constraints are written as “less than or equal to.”
First, let us convert our objective function. Minimizing an objective is equivalent to maximizing its negation; that is, we can multiply the objective by negative one to obtain an equivalent maximization problem, like so
\[\begin{array}{lccccccc} \text{Maximize} & -x_1 & + & 12x_2 & + & 2x_3\\ \end{array}\]Next, let us consider the first constraint. It violates our definition of standard form, since it is written as an equality. Any equality, however, can be written as a pair of inequalities:
\[\begin{array}{ccccccc} 5x_1 & - & x_2 & - & 2x_3 & \le & 10\\ 5x_1 & - & x_2 & - & 2x_3 & \ge & 10\\ \end{array}\]The second of these inequalities still violates our definition of standard form, however, so we must multiply it by negative one. Thus, the pair can be restated in standard form as
\[\begin{array}{ccccccc} 5x_1 & - & x_2 & - & 2x_3 & \le & 10\\ -5x_1 & + & x_2 & + & 2x_3 & \le & -10\\ \end{array}\]Likewise, the remaining “\(\ge\)” constraint can be converted to the desired form by, again, multiplying by negative one.
\[\begin{array}{ccccccc} -2x_1 & - & x_2 & + & 2x_3 & \le & 30\\ \end{array}\]As of now, our LP can be written in the following form:
\[\begin{array}{lccccccc} \text{Maximize} & -x_1 & + & 12x_2 & + & 2x_3\\ \text{Subject to} & 5x_1 & - & x_2 & - & 2x_3 & \le & 10\\ & -5x_1 & + & x_2 & + & 2x_3 & \le & -10\\ & -2x_1 & - & x_2 & + & 2x_3 & \le & 30\\ & & & x_2 \le 0 & &1 \le x_3 \le 4 \end{array}\]Our problem is not yet in standard form, however, as it fails to satisfy our non-negativity requirement. There are a few different cases here, which we will consider one at a time.
First, \(x_1\) is totally unconstrained. To remedy this, we can write \(x_1\) as the difference of two non-negative variables. The notation here is inconsistent between sources, but I will label them \(z_{1a}\) and \(z_{1b}\). Therefore, we denote \(x_1\) as follows:
\[\begin{align} x_1 & = z_{1a} - z_{1b}\\ 0 & \le z_{1a},z_{1b} \end{align}\]Next, \(x_2\) is constrained to be non-positive (\(x_2 \le 0\)). Rather than splitting it, we can replace it with the negation of a single non-negative variable:
\[\begin{align} x_2 & = -z_{2}\\ 0 & \le z_{2} \end{align}\]Now, let us consider the case of \(x_3\). Since \(x_3\) is bounded both above and below, we can handle the two bounds separately. The upper bound is a perfectly legitimate constraint that we can simply add to our LP
\[x_3 \le 4\]To deal with the lower bound, we introduce another non-negative variable \(z_3\) and rewrite \(x_3\) as follows:
\[\begin{align} x_3 & = z_{3} + 1\\ 0 & \le z_3 \end{align}\]All that is left is to substitute our new variables into the previous LP. Note that the upper bound \(x_3 \le 4\) becomes \(z_3 \le 3\), and that the substitution for \(x_3\) produces an additive constant of 2 in the objective, which does not affect the maximizer and can simply be dropped. This gives us the following linear program:
\[\begin{array}{lccccccccc} \text{Maximize} & -z_{1a} & + & z_{1b} & - & 12z_{2} & + & 2z_3\\ \text{Subject to} & 5z_{1a} & - & 5z_{1b} & + & z_{2} & - & 2z_{3} & \le & 12\\ & -5z_{1a} & + & 5z_{1b} & - & z_{2} & + & 2z_{3} & \le & -12\\ & -2z_{1a} & + & 2z_{1b} & + & z_{2} & + & 2z_3 & \le & 28\\ & & & & & & & z_3 & \le & 3\\ & & z_{1a}, & z_{1b}, & z_{2}, & z_3 & \ge & 0 \end{array}\]This gives us our LP in standard form.
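If you want to check a conversion like this numerically, here is a sketch of my own (assuming SciPy is available; scipy.optimize.linprog always minimizes, so the standard-form maximization objective is passed in negated). It solves both the original problem and the standard-form version and compares the results:

```python
from scipy.optimize import linprog

# The original LP: minimize x1 - 12*x2 - 2*x3.
orig = linprog(
    c=[1, -12, -2],
    A_eq=[[5, -1, -2]], b_eq=[10],                # 5*x1 - x2 - 2*x3 = 10
    A_ub=[[-2, -1, 2]], b_ub=[30],                # 2*x1 + x2 - 2*x3 >= -30, negated
    bounds=[(None, None), (None, 0), (1, 4)],     # x1 free, x2 <= 0, 1 <= x3 <= 4
)

# The standard-form LP in (z1a, z1b, z2, z3); the objective is negated for
# minimization and the additive constant 2 is dropped.
std = linprog(
    c=[1, -1, 12, -2],
    A_ub=[
        [ 5, -5,  1, -2],   # <= 12
        [-5,  5, -1,  2],   # <= -12
        [-2,  2,  1,  2],   # <= 28
        [ 0,  0,  0,  1],   # z3 <= 3
    ],
    b_ub=[12, -12, 28, 3],
    # default bounds are (0, None), i.e. all z >= 0
)

# Restoring the sign flip and the dropped constant, the two optimal values
# should coincide.
print(orig.fun, std.fun - 2)
```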
The following sections will be devoted to what Chvátal calls the “pitfalls” of the simplex method.
Dictionaries
The rest of this article will discuss linear programs written in dictionary form. Formally, an LP presented as
\[\begin{array}{lllll} \text{Maximize} & \displaystyle\sum_{j=1}^n c_j x_j \\ \text{Subject to} & \displaystyle\sum_{j=1}^n a_{ij}x_j & \le & b_i & (i=1,2,\dots,m)\\ & x_j & \ge & 0 & (j=1,2,\dots,n)\\ \end{array}\]can be written using the following dictionary:
\[\begin{array}{lllllll} x_{n+i} & = & b_i & - & \displaystyle\sum_{j=1}^{n} a_{ij}x_j & (i = 1,2,\dots,m)\\ z & = & & & \displaystyle\sum_{j=1}^{n} c_j x_j \end{array}\]Variables that appear on the right-hand side of the dictionary (and hence in the objective row) are called non-basic and are given the value 0. Variables that appear on the left-hand side are called basic (together, they constitute our basis), and take on whatever values their respective equations then determine.
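For example (a small example of my own, not one of Chvátal’s), consider maximizing \(3x_1 + 2x_2\) subject to \(x_1 + x_2 \le 4\), \(x_1 + 3x_2 \le 6\), and \(x_1, x_2 \ge 0\). Its initial dictionary is
\[\begin{array}{lllllll} x_3 & = & 4 & - & x_1 & - & x_2\\ x_4 & = & 6 & - & x_1 & - & 3x_2\\ z & = & & & 3x_1 & + & 2x_2 \end{array}\]Here the slack variables \(x_3\) and \(x_4\) are basic, \(x_1\) and \(x_2\) are non-basic, and the associated solution is \(x_1 = x_2 = 0\), \(x_3 = 4\), \(x_4 = 6\), with \(z = 0\).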
This can also be written in Tableau form, but for the rest of this article we will use dictionaries.
The Algorithm
I won’t spend a lot of time discussing the details of the Simplex algorithm - there are probably hundreds of places that have written about it better than me. However, I will give a general layout of the algorithm so we can discuss some of its caveats and shortcomings in the rest of this article.
Given some dictionary, our goal is to increase the value of \(z\): the objective function. To do so, we pick some non-basic variable \(x_i\) to enter our basis. Doing so is called a pivot, and we want to pick an \(x_i\) whose coefficient in the objective row is positive, so that increasing it increases \(z\). There are numerous pivot rules, but the simplest is to pick the \(x_i\) with the largest associated coefficient. Often there are ties, in which case we defer to the \(x_i\) with the smallest index. Next, we must select some basic variable \(x_j\) to leave our basis. We pick the \(x_j\) that limits how much \(x_i\) can increase without violating a non-negativity constraint. This is done by performing a ratio test: for each basic equation in which \(x_i\) appears with a negative coefficient, we divide the associated constant \(b_j\) by the magnitude of that coefficient, and the row with the smallest ratio determines the leaving variable. Once we have made a selection for \(x_j\), we solve its equation for \(x_i\) and make substitutions accordingly.
This process continues for a number of iterations. The algorithm terminates once it is no longer possible to increase the value of the objective function. There are, however, some special cases which we will consider below.
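To make the pivoting rules concrete, here is a bare-bones dictionary-based implementation of the loop just described. It is my own sketch (none of the names come from Chvátal), it assumes the LP is given as maximize \(c^{T}x\) subject to \(Ax \le b\), \(x \ge 0\) with \(b \ge 0\) so that the initial dictionary is feasible, and it uses the largest-coefficient entering rule with ties broken by smallest index, together with the ratio test.

```python
import numpy as np

def simplex(A, b, c, max_iters=1000):
    """Maximize c@x subject to A@x <= b, x >= 0, assuming b >= 0."""
    A, b, c = np.asarray(A, float), np.array(b, float), np.array(c, float)
    m, n = A.shape
    # Dictionary: x_basis[i] = b[i] + D[i] @ x_nonbasis, and z = z_val + obj @ x_nonbasis.
    D = -A.copy()
    obj = c.copy()
    z_val = 0.0
    basis = list(range(n, n + m))        # slack variables start out basic
    nonbasis = list(range(n))            # original variables start out non-basic

    for _ in range(max_iters):
        # Entering variable: largest positive objective coefficient,
        # ties broken by smallest variable index.
        candidates = [k for k in range(n) if obj[k] > 1e-12]
        if not candidates:
            x = np.zeros(n + m)
            for var, val in zip(basis, b):
                x[var] = val
            return z_val, x[:n]          # optimal dictionary reached
        e = max(candidates, key=lambda k: (obj[k], -nonbasis[k]))

        # Ratio test: only rows where the entering variable has a negative
        # coefficient limit its increase.
        rows = [i for i in range(m) if D[i, e] < -1e-12]
        if not rows:
            raise ValueError("LP is unbounded")
        l = min(rows, key=lambda i: b[i] / -D[i, e])

        # Pivot: solve row l for the entering variable, then substitute it
        # into every other row and into the objective.
        coef = D[l, e]
        D[l, e] = -1.0                   # slot for the leaving variable
        b[l] /= -coef
        D[l] /= -coef
        for i in range(m):
            if i != l:
                factor = D[i, e]
                D[i, e] = 0.0
                b[i] += factor * b[l]
                D[i] += factor * D[l]
        factor = obj[e]
        obj[e] = 0.0
        z_val += factor * b[l]
        obj += factor * D[l]
        basis[l], nonbasis[e] = nonbasis[e], basis[l]

    raise RuntimeError("iteration limit reached (possible cycling)")

# The small LP from the Dictionaries section: optimum z = 12 at x = (4, 0).
print(simplex([[1, 1], [1, 3]], [4, 6], [3, 2]))
```

The dictionary is stored as the constant column b together with the coefficient matrix D of the non-basic variables, so a pivot is just a row elimination plus a swap of indices.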
Simplex: Initialization
Given some linear program, we can always construct an initial dictionary. However, such a dictionary is not always feasible. In particular, the initial dictionary is infeasible whenever some value \(b_i\) in the linear program is negative, since this violates the non-negativity constraint for the corresponding basic variable \(x_{n+i}\). To remedy this problem, we construct the auxiliary problem. That is, for some initial linear program,
\[\begin{array}{lllll} \text{Maximize} & \displaystyle\sum_{j=1}^n c_j x_j \\ \text{Subject to} & \displaystyle\sum_{j=1}^n a_{ij}x_j & \le & b_i & (i=1,2,\dots,m)\\ & x_j & \ge & 0 & (j=1,2,\dots,n)\\ \end{array}\]we construct the auxiliary problem by introducing the variable \(x_0\) in the following manner.
\[\begin{array}{lllll} \text{Maximize} & -x_0 \\ \text{Subject to} & \displaystyle\sum_{j=1}^n a_{ij}x_j - x_0 & \le & b_i & (i=1,2,\dots,m)\\ & x_j & \ge & 0 & (j=0,1,2,\dots,n)\\ \end{array}\]And the respective auxiliary dictionary:
\[\begin{array}{llllllll} x_{n+i} & = & b_i & - & \displaystyle\sum_{j=1}^{n} a_{ij}x_j & + & x_0 & (i = 1,2,\dots,m)\\ w & = & & & & - & x_0 \end{array}\]Now, under the normal rules of the Simplex algorithm, there are no valid pivots, since increasing the value of \(x_0\) would decrease the value of our objective function \(w\). However, for the first pivot only, we ignore this rule and perform a special pivot allowing \(x_0\) to enter our basis. The leaving variable is the basic variable whose \(b_i\) is most negative. This pivot makes every constant in the dictionary non-negative, giving us a feasible dictionary and allowing us to continue with simplex as normal.
Once simplex has terminated, there are two possible cases. In the first case, \(x_0\) is non-basic and \(w=0\). Here \(x_0\) already has the value 0, so we drop it, rewrite the original objective \(z\) in terms of the current non-basic variables, and continue solving the LP; this is commonly called Phase II. In the second case, \(x_0\) is basic and \(w\) is non-zero. In this case, our LP is infeasible and we do not continue on to Phase II.
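As a concrete sketch of the initialization step (again my own code, written to match the description above rather than taken from Chvátal), the following builds the auxiliary dictionary for a problem with some \(b_i < 0\), using the same layout as the simplex sketch earlier, and performs the special first pivot:

```python
import numpy as np

def auxiliary_dictionary(A, b):
    """Build the auxiliary dictionary for A@x <= b, x >= 0 (some b_i < 0)
    and perform the special first pivot that makes it feasible."""
    A, b = np.asarray(A, float), np.array(b, float)
    m, n = A.shape
    # Rows: x_{n+i} = b_i - sum_j a_ij x_j + x0, stored as b[i] + D[i] @ x_nonbasic.
    D = np.hstack([-A, np.ones((m, 1))])       # last column belongs to x0
    w = np.zeros(n + 1)
    w[-1] = -1.0                               # objective row: w = -x0
    w_val = 0.0
    basis = list(range(n, n + m))              # slack variables
    nonbasis = list(range(n)) + [n + m]        # original variables, then x0

    # Special pivot: x0 enters, and the row with the most negative b_i leaves.
    l, e = int(np.argmin(b)), n
    coef = D[l, e]                             # this is always +1
    D[l, e] = -1.0                             # slot for the leaving variable
    b[l] /= -coef
    D[l] /= -coef                              # row l now expresses x0
    for i in range(m):                         # substitute x0 into the other rows
        if i != l:
            factor = D[i, e]
            D[i, e] = 0.0
            b[i] += factor * b[l]
            D[i] += factor * D[l]
    factor = w[e]                              # ...and into the objective row
    w[e] = 0.0
    w_val += factor * b[l]
    w = w + factor * D[l]
    basis[l], nonbasis[e] = nonbasis[e], basis[l]
    return D, b, w, w_val, basis, nonbasis     # every b[i] is now >= 0

# Example: x1 + x2 <= 2 and -x1 - x2 <= -1 (so b has a negative entry).
D, b, w, w_val, basis, nonbasis = auxiliary_dictionary([[1, 1], [-1, -1]], [2, -1])
print(b)   # [3. 1.]: the dictionary is now feasible
```

From here, Phase I runs the ordinary simplex rules on \(w\); if it terminates with \(w = 0\), we move on to Phase II as described above.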
Geometrically speaking, the initial dictionary is infeasible precisely when the feasible region does not contain the origin.
Simplex: Termination
There are various issues related to the termination of the simplex algorithm. One such issue is the uniqueness of our solution: in certain situations, there may be infinitely many solutions that all give the same optimal value. To see one such example, consider the following dictionary.
\[\begin{array}{lllllllll} x_4 & = & 3 & + & x_2 & - & 2x_5 & + & 7x_3\\ x_1 & = & 1 & - & 5x_2 & + & 6x_5 & - & 8x_3\\ x_6 & = & 4 & + & 9x_2 & + & 2x_5 & - & x_3\\ z & = & 7 & & & & & - & x_3 \end{array}\]There are clearly no more pivots to make: the only non-basic variable with a non-zero coefficient in \(z\) is \(x_3\), and that coefficient is negative. However, since \(x_2\) and \(x_5\) do not appear in \(z\) at all, we may increase their values within a certain range without affecting the objective. Keeping \(x_3 = 0\) and requiring the basic variables to remain non-negative, we may therefore describe the entire set of optimal solutions as a system of inequalities.
\[\begin{array}{rcrcl} -x_2 & + & 2x_5 & \le & 3\\ 5x_2 & - & 6x_5 & \le & 1\\ -9x_2 & - & 2x_5 & \le & 4\\ \end{array}\]Another such problem is that of an unbounded linear program. This is a system in which we may increase the value of some variable, and the objective function, indefinitely. Consider the following example:
\[\begin{array}{lllllll} x_4 & = & 4 & + & 3x_3 & - & 2x_5\\ x_2 & = & 4 & + & 2x_3 & - & x_5\\ x_1 & = & 3 & + & x_3 & - & x_5\\ z & = & 13 & + & 5x_3 & - & 4x_5 \end{array}\]Normally, we would pick \(x_3\) to be our entering variable. However, it becomes immediately clear that nothing constrains \(x_3\): its coefficient in every basic equation is positive, so no equation limits how far it can increase. We can therefore set \(x_3 = t\) and let \(t\) grow without bound; a solution of our system can be written in terms of \(t\).
\[\begin{array}{lllll} x_1 & = & 3 & + & t\\ x_2 & = & 4 & + & 2t\\ x_3 & = & & & t\\ x_4 & = & 4 & + & 3t\\ x_5 & = & 0\\ z & = & 13 & + & 5t \end{array}\]Note that there is a distinct difference between an unbounded LP and an unbounded feasible region. It is possible for the feasible region to be unbounded while the LP still has a finite optimal solution. Geometrically, the value of the objective may decrease as you move in the unbounded direction, in which case the optimal solution is still attained at the intersection of multiple constraint hyperplanes.
The final issue with regard to termination is degeneracy. Consider the following dictionary:
\[\begin{array}{lllllll} x_3 & = & & & x_1 & - & x_2\\ x_4 & = & 2 & - & x_1\\ z & = & & & & & x_2\\ \end{array}\]Clearly, \(x_2\) is the only candidate for entering our basis. However, since the constant in the \(x_3\) row is 0, the ratio test limits \(x_2\) to 0 and the pivot cannot increase the value of our objective function. In certain cases, a sequence of such degenerate pivots may return the algorithm to a dictionary it has already visited, in which case it will never terminate. This is called cycling. There are several methods for preventing cycling, such as Bland's rule (always picking the candidate variable with the smallest index) or the perturbation (lexicographic) method.
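For instance, performing the pivot anyway (with \(x_2\) entering and \(x_3\) leaving, since the ratio test gives 0) produces a new dictionary without changing the value of \(z\):
\[\begin{array}{lllllll} x_2 & = & & & x_1 & - & x_3\\ x_4 & = & 2 & - & x_1\\ z & = & & & x_1 & - & x_3\\ \end{array}\]This is a degenerate pivot: the dictionary changed but the objective value did not. In this particular example the next pivot (bringing \(x_1\) into the basis) does make progress, but in unluckier examples a sequence of degenerate pivots can lead back to an earlier dictionary, which is exactly the cycling described above.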