- Course Outline
- Introduction
- Divide and Conquer
- Integer Multiplication I
- Integer Multiplication II
- Fast Fourier Transform
- The Greedy Method
- Dynamic Programming
- Maximum Flow
- String Matching
- Linear Programming
- Intractability
- 4 assignments (10% each)
- Final Exam (60%)
- An algorithm is a collection of precisely defined steps that are executable using certain specified mechanical methods.
- By "mechanical" we mean the methods that do not involve any creativity, intuition or even intelligence.
- We deal with sequential (not parallel) and deterministic (not randomised) algorithms.
- Sometimes it is not obvious that an algorithm:
- Terminates,
- Will not run in exponentially many steps (in the size of the input), and
- Produces a desired solution.
- Mathematical proofs are needed for such circumstances.
- Suppose there are $n$ hospitals and $n$ doctors. Every hospital submits a list of doctor preferences and every doctor submits a list of hospital preferences.
- A stable matching algorithm produces a set of $n$ pairs $(h, d)$ for a hospital $h$ and a doctor $d$ so that the following never happens: a hospital $h$ and a doctor $d$ are not matched together, yet $h$ prefers $d$ over its assigned doctor and $d$ prefers $h$ over their assigned hospital.
- A stable matching always exists, but this is not obvious.
- Claims to prove: that the algorithm terminates, that it produces a complete matching, and that the matching produced is stable. (See the sketch below.)
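The classical algorithm for this problem is the Gale-Shapley proposal algorithm, presumably the one intended here. A minimal Python sketch follows; the dictionary-based input format and the names `hospital_prefs` and `doctor_prefs` are illustrative assumptions, not from the lecture.

```python
from collections import deque

def gale_shapley(hospital_prefs, doctor_prefs):
    """Hospital-proposing Gale-Shapley stable matching sketch.

    hospital_prefs: dict hospital -> list of doctors, most preferred first
    doctor_prefs:   dict doctor   -> list of hospitals, most preferred first
    Returns a dict doctor -> hospital describing a stable matching.
    """
    # rank[d][h] = position of hospital h on doctor d's list (lower = better)
    rank = {d: {h: i for i, h in enumerate(prefs)} for d, prefs in doctor_prefs.items()}
    next_proposal = {h: 0 for h in hospital_prefs}   # next index each hospital proposes to
    match = {}                                       # doctor -> hospital
    free = deque(hospital_prefs)                     # hospitals with no doctor yet

    while free:
        h = free.popleft()
        d = hospital_prefs[h][next_proposal[h]]      # best doctor not yet proposed to
        next_proposal[h] += 1
        if d not in match:                           # doctor is free: tentatively accept
            match[d] = h
        elif rank[d][h] < rank[d][match[d]]:         # doctor prefers h over current hospital
            free.append(match[d])                    # old hospital becomes free again
            match[d] = h
        else:                                        # doctor rejects h
            free.append(h)
    return match

# Example usage with 2 hospitals and 2 doctors
hospitals = {"H1": ["D1", "D2"], "H2": ["D1", "D2"]}
doctors = {"D1": ["H2", "H1"], "D2": ["H1", "H2"]}
print(gale_shapley(hospitals, doctors))   # {'D1': 'H2', 'D2': 'H1'}
```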
- We say $f(n) = O(g(n))$ if there exist positive constants $c$ and $N$ such that $0 \le f(n) \le c \, g(n)$ for all $n \ge N$.
- $g(n)$ is said to be an asymptotic upper bound for $f(n)$.
- Useful to (over-)estimate the complexity of a particular algorithm.
- We say $f(n) = \Omega(g(n))$ if there exist positive constants $c$ and $N$ such that $f(n) \ge c \, g(n) \ge 0$ for all $n \ge N$.
- $g(n)$ is said to be an asymptotic lower bound for $f(n)$.
- Useful to say that an algorithm takes at least $c \, g(n)$ steps.
- $f(n) = \Omega(g(n))$ if and only if $g(n) = O(f(n))$.
- We say $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$. That is, $f(n)$ and $g(n)$ have the same asymptotic growth rate.
- Sum property: if $f_1 = O(g_1)$ and $f_2 = O(g_2)$, then $f_1 + f_2 = O(g_1 + g_2)$.
- Product property: if $f_1 = O(g_1)$ and $f_2 = O(g_2)$, then $f_1 \cdot f_2 = O(g_1 \cdot g_2)$.
- These properties also hold with $O$ replaced by $\Omega$, $\Theta$, $o$, or $\omega$.
- $\log_a n = \Theta(\log_b n)$ for any constants $a, b > 1$; that is, logarithms of any base are interchangeable in asymptotic notation. For this reason we typically write $\log n$ instead.
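As a quick check of the last claim, the change-of-base formula gives the following (a worked equation added here for clarity, not taken from the slides):

```latex
\[
  \log_a n \;=\; \frac{\log_b n}{\log_b a} \;=\; \frac{1}{\log_b a}\,\log_b n ,
\]
% The factor 1/log_b(a) is a positive constant, hence log_a(n) = Theta(log_b(n))
% for any constants a, b > 1.
```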
- Suppose there are several users ranking the same set of $n$ movies. We want to determine for any two users $A$ and $B$ how similar their tastes are.
- Enumerate the movies on $A$'s list as $1, 2, \dots, n$. For movie $i$ on $A$'s list, we denote the position of that movie on $B$'s list as $a(i)$.
- A good measure of the degree of similarity between users $A$ and $B$ is to count the number of inversions.
- Inversions are the total number of pairs of movies $i$, $j$ such that movie $i$ precedes movie $j$ on $A$'s list but movie $j$ is higher than movie $i$ on $B$'s list, i.e. $a(j) < a(i)$.
- The brute force approach (checking all pairs) is a quadratic time algorithm, $O(n^2)$. A divide and conquer approach can achieve time $O(n \log n)$.
- We do a modified merge sort algorithm where we count inversions at the merge step.
- Each time we reach an element of the right half during the merge, each element remaining to be merged from the left half counts as an inversion. In the lecture's diagram, when we merge the element 6 there are 5 remaining elements in the left half, so we add 5 inversions.
- We add this number of cross inversions to the number of inversions within the left half and the right half themselves.
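A minimal Python sketch of this counting merge sort (illustrative code, not the lecture's own):

```python
def count_inversions(a):
    """Return (sorted copy of a, number of inversions in a) via merge sort."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = count_inversions(a[:mid])
    right, inv_right = count_inversions(a[mid:])

    merged, cross = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] precedes all remaining left elements -> each is an inversion
            merged.append(right[j])
            j += 1
            cross += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv_left + inv_right + cross

# Example: positions of A's movies on B's list
print(count_inversions([2, 4, 1, 3, 5])[1])  # 3 inversions
```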
- Recurrences arise in estimations of time complexity of divide and conquer algorithms.
- Counting inversions in an array of size $n$ requires recursing on each half of the array (the left half and the right half) and counting inversions across the partition in linear time, giving the recurrence $T(n) = 2T(n/2) + O(n)$.
- Suppose a divide and conquer algorithm reduces a problem of size $n$ to $a$ many problems of smaller size $n/b$, with overhead cost $f(n)$ to split up the problem and combine the solutions. This gives the recurrence $T(n) = a\,T(n/b) + f(n)$.
- To estimate an algorithm's efficiency we do not need the exact solution of a recurrence. We only need the growth rate of the solution (asymptotic behaviour) and the approximate sizes of the constants involved. (Master Theorem)
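For reference, the Master Theorem in the form applied throughout these notes (standard statement, paraphrased from memory rather than quoted from the slides):

```latex
% T(n) = a T(n/b) + f(n),  with a >= 1, b > 1,  critical exponent c* = log_b(a).
\[
T(n) =
\begin{cases}
  \Theta\!\left(n^{c^*}\right)        & \text{if } f(n) = O\!\left(n^{c^* - \varepsilon}\right) \text{ for some } \varepsilon > 0 \quad \text{(Case 1)},\\[2pt]
  \Theta\!\left(n^{c^*} \log n\right) & \text{if } f(n) = \Theta\!\left(n^{c^*}\right) \quad \text{(Case 2)},\\[2pt]
  \Theta\!\left(f(n)\right)           & \text{if } f(n) = \Omega\!\left(n^{c^* + \varepsilon}\right) \text{ for some } \varepsilon > 0
                                        \text{ and } a\,f(n/b) \le c\,f(n) \text{ for some } c < 1 \quad \text{(Case 3)}.
\end{cases}
\]
```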
C C C C C carry
X X X X X first integer
+ X X X X X second integer
-------------
X X X X X X result
- Adding 3 bits (two input bits plus a carry) can be done in constant time, and so the entire algorithm runs in linear time $O(n)$.
- There is no asymptotically faster algorithm to add two $n$-bit numbers because we have to read every bit of the input.
X X X X first integer
* X X X X second integer
---------
X X X X O(n^2) intermediate operations:
X X X X O(n^2) elementary multiplications
X X X X O(n^2) elementary additions
X X X X
---------------
X X X X X X X X result of length 2n
- Assume two individual bits ('X's) can be multiplied in constant time.
- The above procedure runs in time $O(n^2)$.
- It is not known whether we can multiply two $n$-bit numbers in linear time.
- We can use divide and conquer to achieve faster than quadratic time.
- Split the two input numbers $A$ and $B$ into halves:
- $A = A_1 \cdot 2^{n/2} + A_0$
- $B = B_1 \cdot 2^{n/2} + B_0$
- $AB = A_1 B_1 \, 2^{n} + (A_1 B_0 + A_0 B_1)\, 2^{n/2} + A_0 B_0$
- The 4 products $A_1 B_1$, $A_1 B_0$, $A_0 B_1$ and $A_0 B_0$ can be calculated recursively in the same manner.
- Each multiplication of two $n$-digit numbers is replaced by four multiplications of $n/2$-digit numbers. With the linear overhead to shift and add: $T(n) = 4T(n/2) + c\,n$.
- The critical exponent is $c^* = \log_2 4 = 2$, so the critical polynomial is $n^2$. Then, $f(n) = c\,n = O(n^{2 - \varepsilon})$. (Case 1)
- Therefore $T(n) = \Theta(n^2)$. We gained nothing from divide and conquer.
- In 1960, Anatoly Karatsuba found an algorithm (later called "divide and conquer") that multiplies two $n$-digit numbers in $O(n^{\log_2 3}) \approx O(n^{1.585})$ time.
- Previously we saw that $AB = A_1 B_1 \, 2^{n} + (A_1 B_0 + A_0 B_1)\, 2^{n/2} + A_0 B_0$.
- But rearranging, $A_1 B_0 + A_0 B_1 = (A_1 + A_0)(B_1 + B_0) - A_1 B_1 - A_0 B_0$.
- We save one multiplication at each round of recursion.
- 3 products: $A_1 B_1$, $A_0 B_0$ and $(A_1 + A_0)(B_1 + B_0)$.
- $AB = A_1 B_1 \, 2^{n} + \big((A_1 + A_0)(B_1 + B_0) - A_1 B_1 - A_0 B_0\big)\, 2^{n/2} + A_0 B_0$. (See the sketch below.)
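A compact Python sketch of Karatsuba's three-product trick on non-negative integers (an illustration under the assumptions above, not production code):

```python
def karatsuba(x, y):
    """Multiply non-negative integers x and y using Karatsuba's three-product trick."""
    if x < 16 or y < 16:                          # small base case: builtin multiply
        return x * y
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    x1, x0 = x >> half, x & ((1 << half) - 1)     # split x = x1*2^half + x0
    y1, y0 = y >> half, y & ((1 << half) - 1)     # split y = y1*2^half + y0
    a = karatsuba(x1, y1)                         # x1*y1
    c = karatsuba(x0, y0)                         # x0*y0
    b = karatsuba(x1 + x0, y1 + y0) - a - c       # x1*y0 + x0*y1 with one multiplication
    return (a << (2 * half)) + (b << half) + c

print(karatsuba(123456789, 987654321) == 123456789 * 987654321)  # True
```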
- We try dividing the numbers $A$, $B$ into 3 pieces. With $k = n/3$,
- $A = A_2\,2^{2k} + A_1\,2^{k} + A_0$ and $B = B_2\,2^{2k} + B_1\,2^{k} + B_0$.
- The product $AB$ yields 5 coefficients and 9 products:
- $AB = A_2B_2\,2^{4k} + (A_2B_1 + A_1B_2)\,2^{3k} + (A_2B_0 + A_1B_1 + A_0B_2)\,2^{2k} + (A_1B_0 + A_0B_1)\,2^{k} + A_0B_0$.
- Write $A = P_A(2^k)$ and $B = P_B(2^k)$ where $P_A(x) = A_2 x^2 + A_1 x + A_0$ and $P_B(x) = B_2 x^2 + B_1 x + B_0$. We seek the coefficients of $P_A(x)\,P_B(x)$.
- Let $P(x) = P_A(x)\,P_B(x) = c_4 x^4 + c_3 x^3 + c_2 x^2 + c_1 x + c_0$.
- Since $P(x)$ is of degree 4, we need 5 values to uniquely determine it. For simplicity, we choose the points $x = -2, -1, 0, 1, 2$.
- We obtain $P$ at these points by evaluating $P_A(-2)$, $P_A(-1)$, $P_A(0)$, $P_A(1)$, $P_A(2)$, and by evaluating $P_B$ at the same points in a similar way.
- For $P_A$ we have, for example, $P_A(0) = A_0$, $P_A(\pm 1) = A_2 \pm A_1 + A_0$ and $P_A(\pm 2) = 4A_2 \pm 2A_1 + A_0$, and similarly for $P_B$.
- We require only 5 multiplications of large (roughly $n/3$-bit) numbers: $P(x_i) = P_A(x_i) \cdot P_B(x_i)$ for $x_i \in \{-2, -1, 0, 1, 2\}$.
- To go from these 5 values to the coefficients that we seek, we solve a system of 5 linear equations in 5 variables: $c_4 x_i^4 + c_3 x_i^3 + c_2 x_i^2 + c_1 x_i + c_0 = P(x_i)$ for each $x_i$.
- Using Gaussian elimination, we recover the coefficients $c_0, \dots, c_4$.
- With these coefficients, we can form the polynomial $P(x)$ and then compute $AB = P(2^k)$ in linear time using bitwise shifts of the coefficients and a constant number of additions.
- Thus, we obtain $AB$ with only 5 multiplications.
- Instead of multiplying two $n$-bit numbers, we do 5 multiplications of roughly $n/3$-bit numbers with an overhead of additions, shifts, etc. all in linear time $O(n)$, so: $T(n) = 5T(n/3) + c\,n$, which by the Master Theorem gives $T(n) = \Theta(n^{\log_3 5}) \approx \Theta(n^{1.465})$.
- The original Karatsuba algorithm runs in $\Theta(n^{\log_2 3}) \approx \Theta(n^{1.585})$, and so we got a significantly faster algorithm.
- We generalise this and slice $A$ and $B$ into $k$ many slices of $n/k$ bits. That is, the slices $A_i$ and $B_i$ each have $n/k$ bits.
- $A = A_{k-1}\,2^{(k-1)n/k} + \dots + A_1\,2^{n/k} + A_0$
- $B = B_{k-1}\,2^{(k-1)n/k} + \dots + B_1\,2^{n/k} + B_0$
- Then, $AB = P_A(2^{n/k}) \cdot P_B(2^{n/k})$.
- As before, we form the polynomials $P_A(x) = \sum_{i=0}^{k-1} A_i x^i$ and $P_B(x) = \sum_{i=0}^{k-1} B_i x^i$ and let $P(x) = P_A(x)\,P_B(x)$.
- $P(x)$ is of degree $2k - 2$, so we can evaluate $P(x)$ at $2k - 1$ points, say $x = 0, \pm 1, \pm 2, \dots, \pm(k-1)$, and then reconstruct the polynomial from these values.
- Define vectors $\mathbf{a} = (A_0, \dots, A_{k-1})$ and $\mathbf{b} = (B_0, \dots, B_{k-1})$. Then let $\mathbf{c} = (c_0, \dots, c_{2k-2})$ such that $c_j = \sum_{i + l = j} A_i B_l$.
- $\mathbf{c}$ is said to be the linear convolution of $\mathbf{a}$ and $\mathbf{b}$, denoted $\mathbf{a} \ast \mathbf{b}$.
- If we form polynomials with coefficients from $\mathbf{a}$ and $\mathbf{b}$, their product has coefficients given by $\mathbf{a} \ast \mathbf{b}$.
- Evaluating $P_A(x_i)$ (and likewise $P_B(x_i)$) takes linear time $O(n)$, since it requires $k$ multiplications of an $(n/k)$-bit number by a constant.
- We then multiply the large numbers $P_A(x_i) \times P_B(x_i)$ at each of the $2k - 1$ points.
- We reconstruct the coefficients of the polynomial from these $2k - 1$ values of $P(x)$, which requires only multiplications of large numbers by constants, again linear time.
- At the multiplication step, $|P_A(x_i)| \le k\,(k-1)^{k-1}\,2^{n/k}$, since each slice satisfies $A_j < 2^{n/k}$ and $|x_i| \le k - 1$.
- Therefore, $\log_2 |P_A(x_i)| \le n/k + \log_2\!\big(k\,(k-1)^{k-1}\big)$, and so the function values are $(n/k + c)$-bit numbers, where $c$ is a constant (depending only on $k$).
- We have reduced a multiplication of two $n$-digit numbers to $2k - 1$ multiplications of $(n/k + c)$-digit numbers plus a linear overhead of additions, splitting, etc, so: $T(n) = (2k - 1)\,T(n/k + c) + O(n)$.
- We ignore the constant $c$ and apply the Master Theorem to get $T(n) = \Theta\!\big(n^{\log_k(2k-1)}\big)$.
- Note that $\log_k(2k - 1) < \log_k 2k = 1 + \log_k 2$, which can be made arbitrarily close to 1 by choosing a sufficiently large $k$.
- Therefore using a large enough number of slices allows us to get a runtime arbitrarily close to linear time.
- However, for large $k$, evaluating the polynomials at the points $0, \pm 1, \dots, \pm(k-1)$ involves an extremely large constant factor hidden in the asymptotic notation, resulting in a slow algorithm in practice despite the fact that the asymptotic bounds improve as $k$ increases.
- For this reason, Python implements multiplication of large numbers only using two slices (the original Karatsuba algorithm).
- In our strategy to multiply polynomials fast, we: evaluate both polynomials at sufficiently many points, multiply the resulting values pointwise, and then interpolate to recover the coefficients of the product.
- Previously we chose the evaluation points to be the $2k - 1$ integers $0, \pm 1, \pm 2, \dots, \pm(k-1)$, which required us to compute values such as $(k-1)^{k-1}$. As the value of $k$ increases, this value explodes, causing a rapid increase in the computational complexity of our polynomial multiplication algorithm.
- We want to choose evaluation points that avoid this explosion of size when we raise them to powers while evaluating the polynomials. We would like all the points to satisfy $|x| = 1$, but among the real numbers only $\pm 1$ do, so this cannot be achieved with real numbers alone; we turn to complex numbers.
- To multiply complex numbers, we multiply the moduli and add the arguments.
- We are interested in the solutions of $z^n = 1$, known as the roots of unity of order $n$.
- Solving $z^n = 1 = e^{2k\pi i}$, we see that $|z| = 1$ and $z = e^{2k\pi i / n}$ for an integer $k$.
- Let $\omega_n = e^{2\pi i / n}$. Then every root of unity of order $n$ is of the form $\omega_n^{k}$, i.e. all roots of unity of order $n$ can be written as powers of $\omega_n$. We say that $\omega_n$ is a primitive root of unity of order $n$. Note that there are only $n$ distinct values here, as $\omega_n^{n} = \omega_n^{0} = 1$.
- The product of two roots of unity of order $n$ is given by $\omega_n^{j} \cdot \omega_n^{k} = \omega_n^{j + k} = \omega_n^{(j + k) \bmod n}$, which is itself a root of unity of the same order.
- The set of all roots of unity of order $n$, $\{1, \omega_n, \omega_n^{2}, \dots, \omega_n^{n-1}\}$, is closed under multiplication (and by extension, under taking powers).
Cancellation Lemma. For all positive integers $n$, $k$ and integers $j$, $\omega_{kn}^{kj} = \omega_n^{j}$.
- Our polynomials $P_A(x)$ and $P_B(x)$ have degree $k - 1$, so the corresponding sequences of coefficients have only $k$ terms.
- We defined the DFT of a sequence as having the same length as the original sequence, and we must obtain the values of the product polynomial at all $2k - 1$ roots of unity of order $2k - 1$.
- We do this by padding the coefficient sequences of $P_A$ and $P_B$ with zeroes corresponding to the terms $x^{k}, \dots, x^{2k-2}$, since those coefficients are zero.
- The DFT of a sequence can be computed very fast using a divide and conquer algorithm called the Fast Fourier Transform.
- We can now compute the DFTs of the two 0-padded coefficient sequences.
- For each root of unity $\omega^{j}$, we multiply the corresponding values $P_A(\omega^{j})$ and $P_B(\omega^{j})$, obtaining $P(\omega^{j}) = P_A(\omega^{j}) \cdot P_B(\omega^{j})$.
- We then use the inverse transformation for DFT, called IDFT, to recover the coefficients of the product polynomial $P(x)$ from the sequence of values $P(\omega^{j})$.
- We can assume that the number of evaluation points $n$ is a power of 2; otherwise we can pad the coefficient sequences with further zero coefficients until their length becomes the nearest power of 2.
For every $n$ which is not a power of 2, the smallest power of 2 larger than or equal to $n$ is smaller than $2n$.
- Problem. Given a sequence $\mathbf{a} = (a_0, a_1, \dots, a_{n-1})$, compute its DFT, i.e. find the values of $P_A(x) = a_0 + a_1 x + \dots + a_{n-1} x^{n-1}$ at $x = \omega_n^{k}$ for all $k$ such that $0 \le k \le n - 1$.
- The idea is to use divide and conquer by splitting the polynomial $P_A(x)$ into the even powers and the odd powers:
- $P_A(x) = \big(a_0 + a_2 x^2 + a_4 x^4 + \dots\big) + x\,\big(a_1 + a_3 x^2 + a_5 x^4 + \dots\big)$.
- Let us define $A^{[0]}(y) = a_0 + a_2 y + a_4 y^2 + \dots$ and $A^{[1]}(y) = a_1 + a_3 y + a_5 y^2 + \dots$, so that $P_A(x) = A^{[0]}(x^2) + x\,A^{[1]}(x^2)$. Note that the highest index $n - 1$ is odd, since we assume $n$ is a power of 2.
- We have reduced the problem from evaluating $P_A(x)$ with $n$ coefficients to evaluating two polynomials $A^{[0]}$ and $A^{[1]}$, each with $n/2$ coefficients, at the points $\big(\omega_n^{k}\big)^2$ for $0 \le k \le n - 1$.
- Since $n$ is a power of 2, and hence even, we can use the cancellation lemma $\omega_n^{2k} = \omega_{n/2}^{k}$ to deduce that $\big(\omega_n^{k}\big)^2$ is a root of unity of order $n/2$.
- We can write $P_A(\omega_n^{k}) = A^{[0]}(\omega_{n/2}^{k}) + \omega_n^{k}\,A^{[1]}(\omega_{n/2}^{k})$.
- The recurrence is $T(n) = 2T(n/2) + O(n)$, which by Case 2 of the Master Theorem gives $T(n) = \Theta(n \log n)$, since the overhead $O(n)$ matches the critical polynomial $n^{\log_2 2} = n$.
- FFT is a method of replacing the $\Theta(n^2)$ multiplications of the naive evaluation with a $\Theta(n \log n)$ procedure, sketched below.
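A direct Python transcription of this recursive scheme (a sketch assuming the length is a power of 2; real implementations are iterative and more careful about floating-point precision):

```python
import cmath

def fft(a):
    """Recursive FFT: given coefficients a (len(a) a power of 2), return the values
    of the polynomial at the n-th roots of unity omega_n^k, k = 0..n-1."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2])          # A^[0] evaluated at the roots of unity of order n/2
    odd = fft(a[1::2])           # A^[1] evaluated at the roots of unity of order n/2
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(2j * cmath.pi * k / n)          # omega_n^k
        out[k] = even[k] + w * odd[k]                 # P(omega_n^k)
        out[k + n // 2] = even[k] - w * odd[k]        # uses omega_n^{k+n/2} = -omega_n^k
    return out

print([round(abs(v), 6) for v in fft([1, 1, 1, 1])])  # [4.0, 0.0, 0.0, 0.0]
```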
- The evaluation of a polynomial $P_A(x)$ at the roots of unity $1, \omega_n, \omega_n^{2}, \dots, \omega_n^{n-1}$ of order $n$ can be represented by the matrix-vector product:
- $\big(P_A(1), P_A(\omega_n), \dots, P_A(\omega_n^{n-1})\big)^T = W\mathbf{a}$, where $W$ is the $n \times n$ matrix with entries $W_{jk} = \omega_n^{jk}$ and $\mathbf{a} = (a_0, a_1, \dots, a_{n-1})^T$.
- We need a way to reconstruct the coefficients from these values.
- Since $W$ is a square Vandermonde matrix with distinct rows, it is invertible, so: $\mathbf{a} = W^{-1}\hat{\mathbf{a}}$.
The inverse of matrix $W$ is found by simply changing the signs of the exponents and dividing by $n$, i.e. $\big(W^{-1}\big)_{jk} = \frac{1}{n}\,\omega_n^{-jk}$.
- We get: $a_j = \frac{1}{n} \sum_{k=0}^{n-1} \omega_n^{-jk}\,P_A(\omega_n^{k})$.
- This is not so different to the original matrix-vector product.
- The inverse FFT requires us to convert from the sequence of values $\hat{\mathbf{a}} = \big(P_A(\omega_n^{k})\big)_{k=0}^{n-1}$ back to the sequence of coefficients $\mathbf{a}$.
- To do this we use the same FFT algorithm with 2 changes: replace $\omega_n$ by $\omega_n^{-1}$, and divide the result by $n$.
- We can now compute the product of two polynomials $P_A(x)$ and $P_B(x)$ in time $O(n \log n)$, as in the sketch below.
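Building on the `fft` sketch above, the inverse transform and the resulting polynomial multiplication routine might look like this (again only a sketch; coefficients come back as floats and are rounded):

```python
import cmath

def ifft(values):
    """Inverse FFT: the same recursion with omega_n^{-1} in place of omega_n,
    followed by division by n."""
    n = len(values)

    def rec(a):
        if len(a) == 1:
            return a[:]
        even, odd = rec(a[0::2]), rec(a[1::2])
        out = [0j] * len(a)
        for k in range(len(a) // 2):
            w = cmath.exp(-2j * cmath.pi * k / len(a))   # conjugate root of unity
            out[k] = even[k] + w * odd[k]
            out[k + len(a) // 2] = even[k] - w * odd[k]
        return out

    return [v / n for v in rec(values)]

def poly_multiply(p, q):
    """Multiply polynomials given by coefficient lists p and q in O(n log n)."""
    size = 1
    while size < len(p) + len(q) - 1:        # pad to a power of 2 at least deg(pq)+1
        size *= 2
    fp = fft(p + [0] * (size - len(p)))
    fq = fft(q + [0] * (size - len(q)))
    values = [x * y for x, y in zip(fp, fq)]           # pointwise products
    coeffs = ifft(values)
    return [round(c.real) for c in coeffs[: len(p) + len(q) - 1]]

print(poly_multiply([1, 2, 3], [4, 5]))   # [4, 13, 22, 15], i.e. (1+2x+3x^2)(4+5x)
```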
- A greedy algorithm is one that solves a problem by dividing it into stages; rather than exhaustively searching all the ways to get from one stage to the next, it considers only the choice that appears best. This obviously reduces the search space, but it is not always clear whether the locally optimal choice leads to the globally optimal outcome.
- The greedy method does not always work. Frameworks exist to determine whether a problem can be solved using a greedy algorithm.
- We focus on proving the correctness of greedy algorithms. There are two main methods of proof:
- Greedy stays ahead: prove that at every stage, no other algorithm could do better than our proposed algorithm.
- Exchange argument: consider an optimal solution, and gradually transform it to the solution found by our proposed algorithm without making it any worse.
- See the example problems in Lectures 6-8.
- Proving Correctness of Greedy Algorithms
- Suppose we have a directed graph $G = (V, E)$ with a non-negative weight $w(e) \ge 0$ assigned to each edge $e$, and a designated (source) vertex $s$.
- We want to find, for every vertex $t$, the shortest path from $s$ to $t$.
- This is accomplished by a very elegant greedy algorithm developed by Edsger Dijkstra in 1959.
For every vertex $v$ on a shortest path from $s$ to $t$, the shortest path from $s$ to $v$ is just the truncation of that path ending at $v$.
- The algorithm builds a set $S$ of vertices for which the shortest path has already been established, starting with an empty set and adding one vertex at a time.
- At each stage of the construction, we add the vertex $v \notin S$ which has the shortest path from $s$ to $v$ with all intermediate vertices already in $S$.
- At each of the $n - 1$ steps, we scan an array of length $n$. We also run the constant time update procedure at most once for each edge. The algorithm therefore runs in $O(n^2 + m) = O(n^2)$ time.
- A more efficient implementation using an augmented heap can achieve a time complexity of $O((n + m) \log n)$. Assuming the graph is connected, $m \ge n - 1$, so we can simplify this to $O(m \log n)$. A heap-based sketch follows.
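A heap-based Python sketch of Dijkstra's algorithm matching the $O(m \log n)$ bound. The adjacency-dict input format is an assumption for illustration; this version uses lazy deletion of stale heap entries rather than the augmented heap described in the lecture, but gives the same asymptotic bound.

```python
import heapq

def dijkstra(graph, s):
    """graph: dict u -> list of (v, w) with w >= 0.  Returns a dict of shortest
    distances from s to every reachable vertex."""
    dist = {s: 0}
    pq = [(0, s)]                              # (distance estimate, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):      # stale entry: u already finalised
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")): # relax edge (u, v)
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

g = {"s": [("a", 2), ("b", 5)], "a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
print(dijkstra(g, "s"))   # {'s': 0, 'a': 2, 'b': 3, 'c': 4}
```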
- A minimum spanning tree $T$ of a connected graph $G$ is a subgraph of $G$ (with the same set of vertices) which is a tree, and among all such trees it minimises the total length of all edges in $T$.
- Let $G$ be a connected graph with all edge weights distinct, and let $S$ be a non-empty proper subset of the set of all vertices of $G$. Assume that $e = (u, v)$ is an edge such that $u \in S$ and $v \notin S$, and that $e$ is of minimal weight among all the edges having this property. Then $e$ must belong to every minimum spanning tree $T$ of $G$.
- There are two famous greedy algorithms for the minimum spanning tree problem. Both algorithms build up a forest, beginning with all $n$ isolated vertices and adding edges one by one.
- Prim's algorithm uses one large component, adding one of the isolated vertices to it at each stage. This algorithm is very similar to Dijkstra's algorithm, but adds the vertex closest to the tree built so far rather than the one closest to the starting vertex $s$.
- In Kruskal's algorithm, we order the edges $e_1, e_2, \dots, e_m$ in non-decreasing order of their weights.
- An edge $e_i$ is added if its inclusion does not introduce a cycle in the graph constructed thus far, or discarded otherwise.
- The process terminates when the forest is connected, i.e. when $n - 1$ edges have been added.
- Kruskal's algorithm produces a minimal spanning tree and if all weights are distinct then such a tree is unique.
- We need to quickly determine whether a certain new edge will introduce a cycle. An edge $(u, v)$ will introduce a cycle in the forest $F$ if and only if there is already a path between $u$ and $v$, i.e. $u$ and $v$ are in the same connected component.
- In our implementation of Kruskal's algorithm, we use a data structure called Union-Find which handles disjoint sets. This data structure supports three operations:
- $\mathrm{MakeUnionFind}(S)$, which returns a structure in which all elements of $S$ are placed into distinct singleton sets. This operation runs in time $O(n)$, where $n = |S|$.
- $\mathrm{Find}(v)$, which returns the (label of the) set to which $v$ belongs. This operation runs in time $O(1)$.
- $\mathrm{Union}(A, B)$, which changes the data structure by replacing sets $A$ and $B$ with the set $A \cup B$. A sequence of $k$ initial consecutive $\mathrm{Union}$ operations runs in time $O(k \log k)$.
- We do not give the runtime of a single $\mathrm{Union}$ operation but of a sequence of $k$ consecutive such operations. Such time complexity analysis is called amortised analysis; it estimates the average cost of an operation in a sequence of operations, in this case $O(\log k)$.
- "$k$ initial consecutive $\mathrm{Union}$ operations" refers to the first $k$ $\mathrm{Union}$ operations performed after $\mathrm{MakeUnionFind}$. There may be $\mathrm{Find}$ operations between these.
- Here $S$ is the vertex set $V$ of a graph, of the form $\{v_1, v_2, \dots, v_n\}$. We will label each set by one of its elements, called the representative of the set.
- The simplest implementation: for every element we store the representative of its set, and for every representative we store the number of elements in its set together with a list of those elements.
- If $v$ is not the representative of any set, then its counter is zero and the list of elements stored at $v$ is empty.
- $\mathrm{Union}(a, b)$, with two sets $A$ and $B$ with representatives $a$ and $b$, is defined as follows: relabel the elements of the smaller set so that their representative becomes the representative of the larger set, append the smaller set's list of elements to the larger set's list, and update the counter. A Python sketch follows.
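A Python sketch of this simple Union-Find with union-by-size (class and field names are illustrative). Each element is relabelled only when the size of its set at least doubles, which is what gives the $O(k \log k)$ amortised bound for a sequence of unions.

```python
class UnionFind:
    """Union-Find with direct representative labels and union-by-size."""

    def __init__(self, elements):
        self.rep = {v: v for v in elements}        # representative of each element
        self.members = {v: [v] for v in elements}  # members list, kept only for representatives
        self.size = {v: 1 for v in elements}       # set size, kept only for representatives

    def find(self, v):
        return self.rep[v]                         # O(1) lookup

    def union(self, a, b):
        """Merge the sets with representatives a and b (relabel the smaller set)."""
        if self.size[a] < self.size[b]:
            a, b = b, a                            # ensure a is the larger set
        for v in self.members[b]:                  # relabel every element of the smaller set
            self.rep[v] = a
        self.members[a].extend(self.members[b])
        self.size[a] += self.size[b]
        del self.members[b], self.size[b]

uf = UnionFind(["u", "v", "w"])
uf.union(uf.find("u"), uf.find("v"))
print(uf.find("u") == uf.find("v"), uf.find("u") == uf.find("w"))  # True False
```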
- We first sort the $m$ edges of graph $G$, which takes $O(m \log m)$. Since $m \le n^2$, we can rewrite this as $O(m \log n)$.
- We start with $n$ isolated vertices, which will be merged into connected components until all vertices belong to a single connected component. We use the Union-Find data structure to keep track of the connected components.
- For each edge $(u, v)$ on the sorted list of edges, we use two $\mathrm{Find}$ operations to determine whether vertices $u$ and $v$ belong to the same component. If not, i.e. if $\mathrm{Find}(u) = p$ and $\mathrm{Find}(v) = q$ where $p \ne q$, we add edge $(u, v)$ to the spanning tree being constructed and perform $\mathrm{Union}(p, q)$ to merge the connected components containing $u$ and $v$.
- We perform $2m$ $\mathrm{Find}$ operations, each costing $O(1)$, as well as $n - 1$ $\mathrm{Union}$ operations which in total cost $O(n \log n)$.
- The initial sorting of the edges dominates, so $O(m \log n)$ is the overall time complexity. A sketch combining these pieces follows.
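Using the `UnionFind` sketch above, Kruskal's algorithm might be written as follows (illustrative; the edge-list format is an assumption):

```python
def kruskal(vertices, edges):
    """edges: list of (w, u, v).  Returns the list of MST edges (assumes G connected)."""
    uf = UnionFind(vertices)                      # from the sketch above
    tree = []
    for w, u, v in sorted(edges):                 # non-decreasing weight order
        p, q = uf.find(u), uf.find(v)
        if p != q:                                # u, v in different components: no cycle
            tree.append((u, v, w))
            uf.union(p, q)
        if len(tree) == len(vertices) - 1:        # spanning tree complete
            break
    return tree

verts = ["a", "b", "c", "d"]
es = [(1, "a", "b"), (3, "b", "c"), (2, "a", "c"), (4, "c", "d")]
print(kruskal(verts, es))   # [('a', 'b', 1), ('a', 'c', 2), ('c', 'd', 4)]
```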
- The idea is to solve a large problem recursively by building from (carefully chosen) subproblems of smaller size.
- Optimal substructure property. We must choose subproblems so that the optimal solutions to the subproblems can be combined into an optimal solution for the full problem.
- Greedy algorithms view a problem as a sequence of stages and we consider only the locally optimal choice at each stage. Some greedy algorithms are incorrect and fail to construct a globally optimal solution. Also, greedy algorithms are unhelpful for certain types of problems, such as "count the number of ways to ...". Dynamic programming can be used to efficiently consider all the options at each stage.
- Divide and conquer used recursion to break a large problem into disjoint subproblems. However, dynamic programming is characterised by overlapping subproblems.
- Overlapping subproblems property. We must choose subproblems so that the same subproblem occurs several times in the recursion tree. When we solve a subproblem, we store the result so that subsequent instances of the same subproblem can be answered by just looking up a value in a table.
- A dynamic programming algorithm consists of three parts:
- A definition of the subproblems,
- A recurrence relation, which determines how the solutions to smaller subproblems are combined to solve a larger subproblem, and
- Any base cases, which are the trivial subproblems - those for which the recurrence is not required.
- The original problem may be one of our subproblems, or it may be solved by combining results from several subproblems, in which case we must also describe this process.
- The time complexity of our algorithm is usually given by multiplying the number of subproblems by the average time taken to solve a subproblem using the recurrence.
- See the example problems in Lectures 9-11.
- Notation. $\operatorname{argmin}_x f(x)$ is the value of $x$ that minimises $f(x)$. We also have the analogous $\operatorname{argmax}_x f(x)$.
- Suppose we have a directed weighted graph $G = (V, E)$ with edge weights $w(e)$ which can be negative, but without cycles of negative total weight, and a designated (source) vertex $s$.
- We want to find the weight of the shortest path from vertex $s$ to every other vertex $t$.
- This differs from the SSSP problem solved by Dijkstra's algorithm because we allow negative edge weights, so the greedy strategy no longer works.
- We disallow cycles of negative total weight because with such a cycle, there is no shortest path. You can take as many laps around a negative cycle as you like.
For any vertex $t$, there is a shortest $s \to t$ path without cycles.
It follows that every shortest $s \to t$ path contains any vertex $v$ at most once, and therefore has at most $n - 1$ edges.
- For every vertex $t$, let us find the weight of a shortest $s \to t$ path consisting of at most $i$ edges, for each $i$ from 1 up to $n - 1$.
- Suppose the path in question is $s \to \dots \to v \to t$, with the final edge going from $v$ to $t$.
- Then, $s \to \dots \to v$ must itself be a shortest path from $s$ to $v$ of at most $i - 1$ edges, which is another subproblem.
- No such recursion is necessary if $t = s$ or if $i = 0$.
- Subproblems. For all $0 \le i \le n - 1$ and all $t \in V$, let $P(i, t)$ be the problem of determining $\mathrm{opt}(i, t)$, the length of a shortest path from $s$ to $t$ which contains at most $i$ edges.
- Recurrence. For all $i > 0$ and all $t \in V$, $\mathrm{opt}(i, t) = \min\Big(\mathrm{opt}(i - 1, t),\ \min\big\{\mathrm{opt}(i - 1, v) + w(v, t) : (v, t) \in E\big\}\Big)$.
- Base cases. $\mathrm{opt}(0, s) = 0$, and for all $t \ne s$, $\mathrm{opt}(0, t) = \infty$.
- The overall solutions are given by $\mathrm{opt}(n - 1, t)$.
- We proceed in $n - 1$ rounds ($i = 1, \dots, n - 1$). In each round, each edge of the graph is considered only once. Therefore the time complexity is $O(nm)$.
- This method is sometimes called relaxation because we progressively relax the additional constraint on how many edges the shortest paths can contain.
- The SPFA (Shortest Paths Faster Algorithm) speeds up the later rounds by ignoring some edges. This optimisation and others (e.g. early exit) do not change the worst case time complexity.
- The Bellman-Ford algorithm can be augmented to detect cycles of negative weight.
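A Python sketch of this DP, rolling the table over the edge-count dimension (illustrative; the optional final pass flags negative-weight cycles, and the early exit mentioned above is included as a comment):

```python
def bellman_ford(n, edges, s):
    """n vertices 0..n-1, edges: list of (u, v, w), source s.
    Returns (dist, has_negative_cycle)."""
    INF = float("inf")
    dist = [INF] * n
    dist[s] = 0                                    # opt(0, s) = 0, opt(0, t) = inf otherwise
    for _ in range(n - 1):                         # rounds i = 1 .. n-1
        changed = False
        for u, v, w in edges:                      # each edge considered once per round
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:                            # early exit optimisation
            break
    # One more pass: any further improvement implies a negative-weight cycle.
    has_neg_cycle = any(dist[u] + w < dist[v] for u, v, w in edges if dist[u] < INF)
    return dist, has_neg_cycle

es = [(0, 1, 4), (0, 2, 2), (2, 1, -3), (1, 3, 1)]
print(bellman_ford(4, es, 0))   # ([0, -1, 2, 0], False)
```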
- Suppose we have a directed weighted graph $G = (V, E)$ with edge weights $w(e)$ which can be negative, but without cycles of negative total weight.
- We want to find the weight of the shortest path from every vertex $u$ to every other vertex $v$.
- Label the vertices of $G$ as $v_1, v_2, \dots, v_n$, where $n = |V|$.
- Let $S$ be the set of vertices allowed as intermediate vertices. Initially $S$ is empty, and we add the vertices $v_1, v_2, \dots, v_n$ one at a time.
- Subproblems. For all $1 \le i, j \le n$ and all $0 \le k \le n$, let $P(i, j, k)$ be the problem of determining $\mathrm{opt}(i, j, k)$, the weight of a shortest path from $v_i$ to $v_j$ using only $v_1, \dots, v_k$ as intermediate vertices.
- Recurrence. For all $k \ge 1$, $\mathrm{opt}(i, j, k) = \min\big(\mathrm{opt}(i, j, k - 1),\ \mathrm{opt}(i, k, k - 1) + \mathrm{opt}(k, j, k - 1)\big)$.
- Base cases. $\mathrm{opt}(i, i, 0) = 0$; $\mathrm{opt}(i, j, 0) = w(i, j)$ if $(v_i, v_j) \in E$; and $\mathrm{opt}(i, j, 0) = \infty$ otherwise.
- The overall solutions are given by $\mathrm{opt}(i, j, n)$, where all vertices are allowed as intermediates.
- Each of the $O(n^3)$ subproblems is solved in constant time, so the time complexity is $O(n^3)$.
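A Python sketch of the Floyd-Warshall recurrence, updating a distance matrix in place (illustrative):

```python
def floyd_warshall(n, edges):
    """n vertices 0..n-1, edges: list of (u, v, w).  Returns an n x n matrix of
    shortest path weights (float('inf') where no path exists)."""
    INF = float("inf")
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0                              # opt(i, i, 0) = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)             # opt(i, j, 0) = w(i, j)
    for k in range(n):                              # allow v_k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

es = [(0, 1, 3), (1, 2, -2), (0, 2, 2)]
print(floyd_warshall(3, es))   # [[0, 3, 1], [inf, 0, -2], [inf, inf, 0]]
```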
- A flow network $G = (V, E)$ is a directed graph in which each edge $e = (u, v) \in E$ has a positive integer capacity $c(u, v) > 0$.
- There are two distinguished vertices: a source $s$ and a sink $t$; no edge leaves the sink and no edge enters the source.
- Examples of flow networks (possibly with several sources and sinks):
- Transportation networks
- Gas pipelines
- Computer networks
- A flow in $G$ is a function $f : E \to [0, \infty)$ which satisfies:
- the capacity constraint: $0 \le f(u, v) \le c(u, v)$ for every edge $(u, v) \in E$, and
- flow conservation: $\sum_{(u, v) \in E} f(u, v) = \sum_{(v, w) \in E} f(v, w)$ for every vertex $v \in V \setminus \{s, t\}$.
- The value of a flow is defined as $|f| = \sum_{(s, v) \in E} f(s, v) = \sum_{(u, t) \in E} f(u, t)$, i.e. the flow leaving the source or, equivalently, the flow arriving at the sink.
- Given a flow network, our goal is to find a flow of maximum value.
Integrality Theorem. If all capacities are integers (as assumed earlier), then there is a flow of maximum value such that $f(u, v)$ is an integer for each edge $(u, v)$.
- Given a flow in a flow network, the residual flow network is the network made up of the leftover capacities.
- Suppose the original flow network has an edge from $u$ to $v$ with capacity $c$, and that $f$ units of flow are being sent through this edge.
- The residual flow network has two edges: an edge from $u$ to $v$ with residual capacity $c - f$, and a "virtual" edge from $v$ to $u$ with residual capacity $f$.
- These capacities represent the amount of additional flow in each direction. Note that sending flow on the "virtual" edge from $v$ to $u$ counteracts the already assigned flow from $u$ to $v$.
- Edges of residual capacity zero (when $f = c$ or $f = 0$ respectively) need not be included.
- Suppose the original flow network has an edge from $u$ to $v$ with capacity $c_1$ and flow $f_1$ units, and an edge from $v$ to $u$ with capacity $c_2$ and flow $f_2$ units.
- In this case, the residual flow network has edges: one from $u$ to $v$ with residual capacity $c_1 - f_1 + f_2$, and one from $v$ to $u$ with residual capacity $c_2 - f_2 + f_1$.
- This is because from $u$ to $v$, the forward edge allows $c_1 - f_1$ additional units of flow, and we can also send up to $f_2$ units to cancel the flow through the reverse edge.
- The capacity of an augmenting path is the capacity of its "bottleneck" edge, i.e. the edge of smallest capacity.
- We can now send that amount of flow along the augmenting path, recalculating the flow and the residual capacities for each edge used.
- Suppose we have an augmenting path of capacity $f$, including an edge from $u$ to $v$. We should: decrease the residual capacity of the edge from $u$ to $v$ by $f$, and increase the residual capacity of the reverse edge from $v$ to $u$ by $f$.
- In the lecture's worked example, after sending 4 units of flow along the augmenting path, the residual capacities along the path are updated in exactly this way (diagram omitted).
- Keep adding flow through new augmenting paths for as long as it is possible.
- When there are no more augmenting paths, you have achieved the largest possible flow in the network.
- The proof is based on the notion of a minimal cut in a flow network.
- A cut in a flow network is any partition of the vertices of the underlying graph into two subsets $S$ and $T$ such that: $S \cup T = V$, $S \cap T = \emptyset$, $s \in S$ and $t \in T$.
- The capacity $c(S, T)$ of a cut $(S, T)$ is the sum of the capacities of all edges leaving $S$ and entering $T$, i.e. $c(S, T) = \sum_{(u, v) \in E,\ u \in S,\ v \in T} c(u, v)$. Note that capacities of edges going in the opposite direction (from $T$ to $S$) do not count.
- Given a flow $f$, the flow $f(S, T)$ through a cut $(S, T)$ is the total flow through edges from $S$ to $T$ minus the total flow through edges from $T$ to $S$, i.e. $f(S, T) = \sum_{(u, v) \in E,\ u \in S,\ v \in T} f(u, v) - \sum_{(v, u) \in E,\ u \in S,\ v \in T} f(v, u)$.
For any flow $f$, the flow through any cut $(S, T)$ is equal to the value of the flow, i.e. $f(S, T) = |f|$.
- An edge from $S$ to $T$ contributes its full capacity to $c(S, T)$, but only the flow through it to $f(S, T)$.
- An edge from $T$ to $S$ contributes zero to $c(S, T)$, but subtracts the flow through it from $f(S, T)$.
- Therefore, $f(S, T) \le c(S, T)$.
- It follows that $|f| = f(S, T) \le c(S, T)$, so the value of any flow is at most the capacity of any cut.
Max Flow Min Cut Theorem. The maximal amount of flow in a flow network is equal to the capacity of the cut of minimal capacity.
- Since $|f|$ is at most $c(S, T)$ for every cut, if we find a flow $f$ whose value equals the capacity of some cut $(S, T)$, then that flow must be maximal and the capacity of that cut must be minimal.
Assume that the Ford-Fulkerson algorithm has terminated. Define $S$ to be the source $s$ together with all vertices $v$ such that there is a path in the residual flow network from $s$ to $v$. Define $T$ to be the set of all vertices for which there is no such path. Since there are no more augmenting paths from $s$ to $t$, the sink $t$ belongs to $T$.
All the edges from $S$ to $T$ are fully occupied with flow, and all the edges from $T$ to $S$ are empty (proof omitted).
- Since all edges from $S$ to $T$ are occupied with flows to their full capacity, and also there is no flow from $T$ to $S$, then $|f| = f(S, T) = c(S, T)$. Thus, such a flow is maximal and the corresponding cut is a minimal cut.
- The Ford-Fulkerson algorithm can potentially run in time proportional to the value of the max flow, which can be exponential in the size of the input.
- In a flow network, $m \ge n - 1$; otherwise there would be vertices other than the source with no incoming edges, or vertices other than the sink with no outgoing edges. We therefore simplify $O(n + m)$ to $O(m)$.
- The Ford-Fulkerson algorithm has worst-case time complexity of $O(m\,|f|)$, where $|f|$ is the value of a maximum flow. In general, if there are $m$ edges in the graph, each of capacity at most $C$, then $|f|$ may be as large as $mC$. Since the edge capacities are specified using only $O(\log C)$ bits each, the algorithm does not run in polynomial time in general.
- In some circumstances, e.g. when the value of the maximum flow is known to be small, the time complexity of the Ford-Fulkerson algorithm is acceptable.
- The Edmonds-Karp algorithm improves the Ford-Fulkerson algorithm in a simple way: always choose a shortest path from the source $s$ to the sink $t$, where "shortest" means the fewest number of edges, regardless of their capacities.
- This algorithm runs in $O(n m^2)$ time.
- The fastest max flow algorithms to date are extensions of the Preflow-Push algorithm and run in roughly $O(nm)$ time.
- The Edmonds-Karp algorithm is a specialisation of the Ford-Fulkerson algorithm, so its time complexity is also bounded by $O(m\,|f|)$. It can be proved that it finds $O(nm)$ augmenting paths, each in $O(m)$ time using BFS, so an alternative bound for its time complexity is $O(n m^2)$.
- The time complexity can therefore be written as $O\big(m \cdot \min(|f|, nm)\big)$. A sketch follows.
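A Python sketch of Edmonds-Karp (BFS-based Ford-Fulkerson) on a residual-capacity matrix; the matrix data layout is an illustrative assumption, not the lecture's representation.

```python
from collections import deque

def edmonds_karp(capacity, s, t):
    """capacity: n x n matrix of edge capacities (0 where no edge).
    Returns the value of a maximum flow from s to t."""
    n = len(capacity)
    residual = [row[:] for row in capacity]          # residual capacities
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual network.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:                          # no augmenting path: flow is maximal
            return flow
        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Augment: decrease forward residual capacities, increase reverse ones.
        v = t
        while v != s:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 3],
       [0, 0, 0, 0]]
print(edmonds_karp(cap, 0, 3))   # 5
```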
- Flow networks with multiple sources and sinks are reducible to networks with a single source and single sink by adding a super-source and super-sink and connecting them to all sources and sinks respectively by edges of infinite capacity.
- Sometimes not only the edges but also the vertices $v$ of the flow graph might have capacities $C(v)$, which limit the total throughput of the flow coming into the vertex and leaving the vertex: $\sum_{(u, v) \in E} f(u, v) = \sum_{(v, w) \in E} f(v, w) \le C(v)$.
- We can also reduce this to a situation with only edge capacities. Suppose vertex $v$ has capacity $C(v)$. Split $v$ into two vertices $v_{\mathrm{in}}$ and $v_{\mathrm{out}}$. Attach all of $v$'s incoming edges to $v_{\mathrm{in}}$ and all of its outgoing edges to $v_{\mathrm{out}}$. Connect $v_{\mathrm{in}}$ and $v_{\mathrm{out}}$ with an edge of capacity $C(v)$.
A graph $G = (V, E)$ is said to be bipartite if its vertices can be divided into two disjoint sets $L$ and $R$ such that every edge $e \in E$ has one end in the set $L$ and the other in the set $R$.
- A matching in a graph $G$ is a subset $M \subseteq E$ such that each vertex of the graph belongs to at most one edge in $M$.
- A maximum matching in $G$ is a matching containing the largest possible number of edges.
- We can turn a Maximum Bipartite Matching problem into a Maximum Flow problem: create two new vertices $s$ and $t$ (the source and sink). Construct an edge from $s$ to each vertex in $L$, and from each vertex in $R$ to $t$. Orient the existing edges from $L$ to $R$. Assign capacity 1 to all edges.
- Since all capacities in the flow network are 1, we need only denote the direction of each edge in the residual graph. A sketch of the resulting algorithm follows.
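Because all capacities are 1, the max-flow computation specialises to repeatedly searching for augmenting paths directly in the bipartite graph. A compact Python sketch of that specialisation (often called Kuhn's algorithm) follows; the adjacency-dict format and names are illustrative.

```python
def max_bipartite_matching(adj, L, R):
    """adj: dict mapping each vertex of L to the list of its neighbours in R.
    Returns a dict R-vertex -> matched L-vertex of maximum size."""
    match = {}                                       # r -> l currently matched to it

    def try_augment(l, visited):
        for r in adj.get(l, []):
            if r in visited:
                continue
            visited.add(r)
            # r is free, or its current partner can be re-matched elsewhere:
            if r not in match or try_augment(match[r], visited):
                match[r] = l
                return True
        return False

    for l in L:                                      # one augmenting-path search per L-vertex
        try_augment(l, set())
    return match

adj = {"l1": ["r1", "r2"], "l2": ["r1"], "l3": ["r2"]}
m = max_bipartite_matching(adj, ["l1", "l2", "l3"], ["r1", "r2"])
print(len(m), m)   # e.g. 2 {'r1': 'l2', 'r2': 'l1'}
```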
- Suppose you have an alphabet of $d$ characters.
- You want to determine whether a string $B$ of length $m$ appears as a contiguous substring of a much longer string $A$ of length $n$.
- The naive string matching algorithm runs in $O(nm)$.
- We compute a hash value for the string $B$ in the following way:
- First, map each symbol of the alphabet to a corresponding integer ID between $0$ and $d - 1$, so as to identify each string with a sequence of these integers.
- Then, when we refer to an integer $a_i$ or $b_i$, we refer to the ID of the symbol at position $i$ of $A$ or $B$ respectively.
- We can therefore identify $B$ with a sequence of IDs $b_0, b_1, \dots, b_{m-1}$, each between $0$ and $d - 1$ inclusive. Viewing these IDs as digits in base $d$, we can construct a corresponding integer $h(B) = b_0\,d^{m-1} + b_1\,d^{m-2} + \dots + b_{m-2}\,d + b_{m-1}$.
- This can be evaluated efficiently using Horner's rule: $h(B) = \big(\cdots\big((b_0\,d + b_1)\,d + b_2\big)\,d + \cdots\big)\,d + b_{m-1}$, requiring only $m - 1$ additions and $m - 1$ multiplications.
- Next we choose a large prime number $p$ and define the hash value of $B$ as $H(B) = h(B) \bmod p$. We require that $d \cdot p$ fits in a single register.
- Recall that $H(B) = h(B) \bmod p$, where $h(B) = b_0\,d^{m-1} + b_1\,d^{m-2} + \dots + b_{m-1}$.
- We want to efficiently find all $s$ such that the substring of length $m$ of the form $a_s a_{s+1} \dots a_{s+m-1}$ and the string $B$ are equal.
- For each contiguous substring $A_s = a_s a_{s+1} \dots a_{s+m-1}$ of string $A$, we also compute its hash value $H(A_s) = h(A_s) \bmod p$.
- We can now compare the hash values $H(A_s)$ and $H(B)$, and do a symbol-by-symbol matching only if $H(A_s) = H(B)$.
- Such an algorithm is faster than the naive symbol-by-symbol comparison only if we can compute the hash values of the substrings $A_s$ faster than comparing the strings $A_s$ and $B$ character by character.
- We use recursion: we compute $H(A_{s+1})$ efficiently from $H(A_s)$ by doing the following:
- Since $h(A_s) = a_s\,d^{m-1} + a_{s+1}\,d^{m-2} + \dots + a_{s+m-1}$, then by multiplying both sides by $d$ we obtain $d \cdot h(A_s) = a_s\,d^{m} + a_{s+1}\,d^{m-1} + \dots + a_{s+m-1}\,d$.
- Consequently, $h(A_{s+1}) = d \cdot h(A_s) - a_s\,d^{m} + a_{s+m}$.
- To find $H(A_{s+1}) = h(A_{s+1}) \bmod p$, we use the precomputed value $d^{m} \bmod p$, multiply it by $a_s$, and again take remainders modulo $p$ after each operation.
- Also, since $H(A_s)$ and $d^{m} \bmod p$ are each less than $p$, and $a_s < d$, every product appearing in the above expression is less than $d \cdot p$.
- Thus, since we chose $p$ such that $d \cdot p$ fits in a single register, all the values and the intermediate results for the above expression also fit in a single register.
- Thus, we first compute $H(B)$ and $H(A_0)$ using Horner's rule.
- The $n - m$ subsequent values $H(A_s)$ for $s = 1, \dots, n - m$ are computed in constant time each using the above recursion.
- Each $H(A_s)$ is compared with $H(B)$, and if they are equal the strings $A_s$ and $B$ are compared by brute force character-by-character to confirm whether they are genuinely equal.
- Since $p$ was chosen large, false positives (when $H(A_s) = H(B)$ but $A_s \ne B$) are very unlikely, which makes the algorithm run fast in the average case.
- However, when we use hashing we cannot achieve useful bounds for the worst case performance.
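A Python sketch of the Rabin-Karp rolling hash described above; the particular choices of `d` and `p` are illustrative.

```python
def rabin_karp(text, pattern, d=256, p=10**9 + 7):
    """Return the starting indices of all occurrences of pattern in text.
    d is the alphabet size (IDs via ord), p a large prime."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    dm = pow(d, m - 1, p)                     # d^(m-1) mod p, precomputed
    hb = ha = 0
    for i in range(m):                        # Horner's rule for both initial hashes
        hb = (hb * d + ord(pattern[i])) % p
        ha = (ha * d + ord(text[i])) % p
    matches = []
    for s in range(n - m + 1):
        if ha == hb and text[s:s + m] == pattern:   # confirm to rule out false positives
            matches.append(s)
        if s < n - m:                         # roll the hash: drop text[s], add text[s+m]
            ha = ((ha - ord(text[s]) * dm) * d + ord(text[s + m])) % p
    return matches

print(rabin_karp("abracadabra", "abra"))   # [0, 7]
```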
- A string matching finite automaton for a pattern $B$ of length $m$ has: states $0, 1, \dots, m$, corresponding to the number of characters of $B$ matched so far, and a transition function $\delta(k, c)$ giving the next state when character $c$ is read in state $k$.
- Suppose that the last $k$ characters of the text $A$ match the first $k$ characters of the pattern $B$, and that $c$ is the next character in the text. Then $\delta(k, c)$ is the new state after character $c$ is read, i.e. the largest $l$ so that the last $l$ characters of the text (ending at the new character $c$) match the first $l$ characters of $B$.
- We first suppose that $\delta$ is given as a pre-constructed table, with one row per state and one column per alphabet symbol.
- To compute the transition function $\delta$ (this table):
- Let $B_k$ denote the prefix of length $k$ of the string $B$.
- Being at state $k$ means that so far we have matched the prefix $B_k$.
- If we now see an input character $c$, then $\delta(k, c)$ is the largest $l$ such that the prefix $B_l$ of string $B$ is a suffix of the string $B_k c$.
- In the particular case where $c$ is the next character of the pattern, i.e. $c = b_{k+1}$, then $l = k + 1$ and so $\delta(k, c) = k + 1$.
- If $c \ne b_{k+1}$, however, we can't extend our match from length $k$ to $k + 1$. To find $\delta(k, c)$, the largest $l$ such that $B_l$ is a suffix of $B_k c$, we match the string against itself: we recursively compute a function $\pi(k)$ which for each $k$ returns the largest integer $l < k$ such that the prefix $B_l$ of $B$ is a proper suffix of $B_k$.
- Suppose we have already found that $\pi(k) = l$, i.e. $B_l$ is the longest prefix of $B$ which is a proper suffix of $B_k$.
- To compute $\pi(k + 1)$, we first check whether $b_{l+1} = b_{k+1}$.
- If so, $\pi(k + 1) = l + 1$; if not, we try the successively shorter candidates $\pi(l), \pi(\pi(l)), \dots$ until a match is found or we run out of candidates, in which case $\pi(k + 1) = 0$.
- There are $m$ values of $k$, and for each we might try several candidate values $l$.
- We maintain two pointers: the left pointer $k - l$ (the start of the match we are trying to extend) and the right pointer at $k$.
- After each step of the algorithm (i.e. each comparison between $b_{l+1}$ and $b_{k+1}$), exactly one of these two pointers is moved forwards.
- Each can take up to $m$ values, so the total number of steps is $O(m)$. This is an example of amortisation.
- The time complexity of this algorithm is therefore linear, $O(m)$.
- We can now do our search for the pattern $B$ in a longer string $A$.
- Suppose $B_l$ is the longest prefix of $B$ which is a suffix of $A_k$ (the first $k$ characters of $A$).
- To answer the same question for $A_{k+1}$, we begin by checking whether $b_{l+1} = a_{k+1}$.
- If the answer for any $k$ is $l = m$, we have a match.
- By the same two-pointer argument, the time complexity is $O(n)$. A sketch of the resulting algorithm (KMP) follows.
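A Python sketch of the failure function and the resulting Knuth-Morris-Pratt search (illustrative; 0-indexed, so `pi[k]` is the length of the longest proper prefix of `pattern[:k+1]` that is also a suffix of it):

```python
def failure_function(pattern):
    """pi[k] = length of the longest proper prefix of pattern[:k+1]
    which is also a suffix of pattern[:k+1]."""
    m = len(pattern)
    pi = [0] * m
    l = 0                                   # length of the current matched prefix
    for k in range(1, m):
        while l > 0 and pattern[k] != pattern[l]:
            l = pi[l - 1]                   # fall back to the next shorter candidate
        if pattern[k] == pattern[l]:
            l += 1
        pi[k] = l
    return pi

def kmp_search(text, pattern):
    """Return starting indices of all occurrences of pattern in text, in O(n + m)."""
    pi = failure_function(pattern)
    matches, l = [], 0                      # l = length of pattern prefix currently matched
    for k, c in enumerate(text):
        while l > 0 and c != pattern[l]:
            l = pi[l - 1]
        if c == pattern[l]:
            l += 1
        if l == len(pattern):               # full match ending at position k
            matches.append(k - len(pattern) + 1)
            l = pi[l - 1]
    return matches

print(kmp_search("abababca", "abab"))   # [0, 2]
```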
- Given a very long string $A$ of length $n$, a shorter string $B$ of length $m$, where $m \ll n$, and an integer $k$, we want to find all matches for $B$ in $A$ which have up to $k$ errors.
- We split $B$ into $k + 1$ substrings of (approximately) equal length. Then any match in $A$ with at most $k$ errors must contain a substring which is a perfect match for one of these parts of $B$, since $k$ errors cannot touch all $k + 1$ parts (the pigeonhole principle).
- We look for all perfect matches in $A$ for each of the $k + 1$ parts of $B$. For every such match, we test by brute force whether the remaining parts of $B$ match sufficiently well with the corresponding parts of $A$.
- In the standard form, the objective to be maximised is given by $\sum_{j=1}^{n} c_j x_j$ and the constraints are of the form: $\sum_{j=1}^{n} a_{ij} x_j \le b_i$ for $1 \le i \le m$, together with $x_j \ge 0$ for $1 \le j \le n$.
- To get a more compact representation of linear programs, we use vectors and matrices.
- Let $\mathbf{x} = (x_1, \dots, x_n)^T$ represent a (column) vector of the variables.
- Define a partial ordering on vectors in $\mathbb{R}^n$ by $\mathbf{x} \le \mathbf{y}$ if and only if the corresponding inequalities hold coordinate-wise, i.e. if and only if $x_j \le y_j$ for all $j$.
- Write the coefficients of the objective function as $\mathbf{c} = (c_1, \dots, c_n)^T$, the coefficients of the constraints as an $m \times n$ matrix $A$, and the RHS values of the constraints as $\mathbf{b} = (b_1, \dots, b_m)^T$.
- The standard form can then be formulated simply as: maximise $\mathbf{c}^T \mathbf{x}$ subject to $A\mathbf{x} \le \mathbf{b}$ and $\mathbf{x} \ge \mathbf{0}$.
- Thus, a Linear Programming optimisation problem can be specified as a triplet $(A, \mathbf{b}, \mathbf{c})$, which is the form accepted by most standard LP solvers.
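For instance, a problem in this $(A, \mathbf{b}, \mathbf{c})$ form can be handed to an off-the-shelf solver such as scipy.optimize.linprog. Note that linprog minimises, so the objective is negated. The small example below is illustrative and not one from the lectures.

```python
import numpy as np
from scipy.optimize import linprog

# maximise c^T x subject to A x <= b, x >= 0
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
b = np.array([4.0, 5.0])

# linprog solves a minimisation problem, so pass -c; bounds encode x >= 0.
result = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(result.x, -result.fun)   # optimal x (about [1, 3]) and maximal objective (9)
```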
- The full generality of LP problems does not appear to be handled by the standard form. LP problems could have: equality constraints, variables that are allowed to be negative, or constraints involving absolute values.
- An equality constraint of the form $\mathbf{a}^T \mathbf{x} = b$ can be replaced by the two inequalities $\mathbf{a}^T \mathbf{x} \le b$ and $-\mathbf{a}^T \mathbf{x} \le -b$. Thus, we can assume all constraints are inequalities.
- Each occurrence of an unconstrained variable $x_j$ can be replaced by the expression $x_j' - x_j''$, where $x_j'$ and $x_j''$ are new non-negative variables satisfying the equality $x_j = x_j' - x_j''$.
- For a vector $\mathbf{x} = (x_1, \dots, x_n)$, we can define $|\mathbf{x}| = (|x_1|, \dots, |x_n|)$. Some problems are naturally translated into constraints of the form $|\mathbf{a}^T \mathbf{x}| \le b$. This also poses no problem, as we can replace such an absolute value constraint with two linear constraints: $\mathbf{a}^T \mathbf{x} \le b$ and $-\mathbf{a}^T \mathbf{x} \le b$.
- In the standard form, any vector $\mathbf{x}$ which satisfies the two constraints $A\mathbf{x} \le \mathbf{b}$ and $\mathbf{x} \ge \mathbf{0}$ is called a feasible solution, regardless of what the corresponding objective value $\mathbf{c}^T \mathbf{x}$ might be.
- Worked example: maximise a linear objective subject to a small set of linear constraints (specific coefficients omitted here).
- Adding the first two inequalities of the example and using the fact that all variables are constrained to be non-negative shows that the objective does not exceed 54. Can we do better?
- We try to look for multipliers $y_1, y_2, y_3 \ge 0$ to be used to form a linear combination of the constraints: multiply the $i$-th constraint by $y_i$.
- Summing up all these inequalities and factoring, we get $\sum_{j} \big(\sum_{i} a_{ij} y_i\big) x_j \le \sum_{i} b_i y_i$.
- If we compare this to our objective, we see that if we choose the $y_i$ such that: $\sum_{i} a_{ij} y_i \ge c_j$ for every $j$,
- then $\sum_{j} c_j x_j \le \sum_{j} \big(\sum_{i} a_{ij} y_i\big) x_j$.
- Combining this with the above inequalities, we get $\sum_{j} c_j x_j \le \sum_{i} b_i y_i$.
- Consequently, in order to find a tight upper bound for our objective in the original problem $P$, we have to find multipliers $y_i$ which solve the following problem $D$: minimise $\sum_{i} b_i y_i$ subject to $\sum_{i} a_{ij} y_i \ge c_j$ for all $j$ and $y_i \ge 0$.
- Then, $\sum_{i} b_i y_i$ will be a tight upper bound for the objective of $P$.
- This new problem $D$ is called the dual problem of $P$.
- We repeat the whole procedure to find the dual of $D$, denoted $D'$ (the double dual). We are now looking for multipliers $z_j \ge 0$ to form a linear combination of the constraints of $D$ and obtain a lower bound on its objective.
- Summing these up and factorising, we get $\sum_{i} \big(\sum_{j} a_{ij} z_j\big) y_i \ge \sum_{j} c_j z_j$.
- If we choose the multipliers $z_j$ such that: $\sum_{j} a_{ij} z_j \le b_i$ for every $i$,
- then $\sum_{i} b_i y_i \ge \sum_{i} \big(\sum_{j} a_{ij} z_j\big) y_i$.
- Combining this with the above, we get $\sum_{i} b_i y_i \ge \sum_{j} c_j z_j$.
- Consequently, finding the double dual program $D'$ amounts to maximising the objective $\sum_{j} c_j z_j$ subject to the constraints: $\sum_{j} a_{ij} z_j \le b_i$ for all $i$ and $z_j \ge 0$.
- Thus, the double dual program $D'$ is just $P$ itself.
- Recall that the Ford-Fulkerson algorithm produces a maximum flow by showing that it terminates only when we reach the capacity of a minimal cut. Likewise, looking for the multipliers $y_i$ reduced a maximisation problem to an equally hard minimisation problem.
- In general, the primal Linear Program $P$ and its dual $D$ are: $P$: maximise $\mathbf{c}^T \mathbf{x}$ subject to $A\mathbf{x} \le \mathbf{b}$, $\mathbf{x} \ge \mathbf{0}$; $D$: minimise $\mathbf{b}^T \mathbf{y}$ subject to $A^T \mathbf{y} \ge \mathbf{c}$, $\mathbf{y} \ge \mathbf{0}$.
If $\mathbf{x}$ is any feasible solution for $P$ and $\mathbf{y}$ is any feasible solution for $D$, then: $\mathbf{c}^T \mathbf{x} \le \mathbf{b}^T \mathbf{y}$.
- Thus, the value of (the objective of $D$ for) any feasible solution of $D$ is an upper bound for the set of values of (the objective of $P$ for) all feasible solutions of $P$, and every feasible solution of $P$ gives a lower bound for the set of values attained by the feasible solutions of $D$.
- If we find a feasible solution for $P$ whose objective value is equal to that of a feasible solution for $D$, this common value must be the maximal feasible value of the objective of $P$ and the minimal feasible value of the objective of $D$.
- A (sequential) algorithm is said to be polynomial time if for every input it terminates in polynomially many steps in the length of the input.
- The length of an input is the number of symbols needed to describe the input precisely.
- A decision problem is a problem with a YES or NO answer.
- A decision problem $A$ is in class P (polynomial time, denoted $A \in \mathbf{P}$) if there exists a polynomial time algorithm which solves it.
- A decision problem $A(x)$ is in class NP (non-deterministic polynomial time, denoted $A \in \mathbf{NP}$) if there exists a problem $B(x, y)$ such that: $A(x)$ is true if and only if there exists some $y$, with length bounded by a polynomial in the length of $x$, for which $B(x, y)$ is true; and $B(x, y)$ is decidable in polynomial time in the length of $x$.
- We call $y$ a certificate for $x$ and $B$ a certifier.
- Class NP problems are problems whose solutions can be verified in polynomial time, whereas class P problems are problems that can be solved in polynomial time.
- For example, consider the decision problem $A(x)$: "integer $x$ is not prime". Then we need to find a problem $B(x, y)$ such that $A(x)$ is true if and only if there is some $y$ for which $B(x, y)$ is true. Naturally, $B(x, y)$ is "$x$ is divisible by $y$, where $1 < y < x$".
- $B(x, y)$ can indeed be verified by an algorithm running in polynomial time in the length of $x$ only.
- Is it the case that every problem in NP is also in P?
- The conjecture that NP is a strictly larger class of decision problems than P is known as the "$\mathbf{P} \ne \mathbf{NP}$" hypothesis, and it is widely considered to be one of the hardest open problems in mathematics.
- Let $A$ and $B$ be two decision problems. We say that $A$ is polynomially reducible to $B$ if and only if there exists a function $f$ such that: $f$ maps instances of $A$ to instances of $B$; $A(x)$ is true if and only if $B(f(x))$ is true; and $f$ is computable in polynomial time.
- SAT problem: given a propositional formula, decide whether there is an assignment of truth values to its variables which makes the formula true.
- There are $2^{n}$ truth assignments of $n$ variables to consider in the SAT problem by brute force.
- The SAT problem is in class NP, since given an evaluation of the propositional variables one can determine in polynomial time whether the formula is true for such an evaluation.
Cook's Theorem. Every decision problem in NP is polynomially reducible to the SAT problem.
- This means that for every NP decision problem $A$ there exists a polynomial time computable function $f$ such that: $A(x)$ is true if and only if the propositional formula $f(x)$ is satisfiable.
- An NP decision problem $A$ is NP-complete (NP-C) if every other NP problem is polynomially reducible to $A$.
- Thus, Cook's theorem says that SAT is NP-complete.
- NP-complete problems are the hardest NP problems, since a polynomial time algorithm for solving an NP-complete problem would make every other NP problem also solvable in polynomial time.
- But if $\mathbf{P} \ne \mathbf{NP}$ (as commonly hypothesised), then there cannot be any polynomial time algorithm for solving an NP-complete problem.
Let $A$ be an NP-complete problem, and let $B$ be another NP problem. If $A$ is polynomially reducible to $B$, then $B$ is also NP-complete (proof omitted).
- We want to find a polynomial time reduction from 3SAT to Vertex Cover (VC).
- An instance of 3SAT consisting of $m$ clauses and $n$ propositional variables is satisfiable if and only if the corresponding graph has a vertex cover of size at most $n + 2m$ (proof omitted).
- Let $U$ be a problem and suppose we have a "black box" device which for every input $x$ instantaneously computes $U(x)$.
- We consider algorithms which are polynomial time in $U$. This means algorithms which run in polynomial time in the length of the input and which, besides the usual computational steps, can also use the above-mentioned "black box".
- We say that a problem $U$ is NP-hard (NP-H) if every NP problem is polynomial time in $U$, i.e. if we can solve every NP problem $A$ using a polynomial time algorithm which can also use a black box to solve any instance of $U$.
- We do not require $U$ to be an NP problem, nor even a decision problem. It can also be, for example, an optimisation problem.
- It is important to be able to figure out whether a problem at hand is NP-hard, in order to know that one has to abandon trying to come up with a feasible polynomial time solution.
- All NP-complete problems are equally difficult, because any of them is polynomially reducible to any other. However, the related optimisation problems can be very different. Some of these optimisation problems allow us to get within a constant factor of the optimal answer.
- Vertex Cover permits an approximation which produces a cover at most twice as large as the minimum vertex cover; a sketch follows after this list.
- Metric TSP permits an approximation which produces a tour at most twice as long as the shortest tour.
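The classical 2-approximation for Vertex Cover simply takes both endpoints of an arbitrary uncovered edge until no edges remain; a short Python sketch (illustrative). The chosen edges form a matching, and any vertex cover must contain at least one endpoint of each matched edge, which is what gives the factor of 2.

```python
def vertex_cover_2approx(edges):
    """Greedy 2-approximation: repeatedly pick an uncovered edge and add both
    endpoints.  The result is at most twice the size of a minimum vertex cover."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge not yet covered
            cover.add(u)
            cover.add(v)
    return cover

es = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(vertex_cover_2approx(es))   # e.g. {'a', 'b', 'c', 'd'}
```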