数值计算中的并行算法

(并行与分布式计算七)

Arithmetic Computations

线性递推式(linear recurrences)

m阶线性递推式(linear recurrece of order m):
$yi$ 由 $ y{i-1}, y{i-2}, …, y{i-m} $ 的线性组合生成

多项式求值

A = ($ a_0, a_1, …, a_n $)是多项式$ p(x) = a_0x^n + a1x^{n-1 + … + a{n-1}x} + a_n $的系数数组,给定点$x_0$计算$p(x_0)$

  • Horner’s algorithm(多项式求值的经典算法: 使用n次乘法和n次加法)
    $$ p(x_0) = (…((a_0x_0 +a_1)x_0 + a_2)x0 + … + a{n-1})x_0 + a_n $$

$ y_1 = a_0x_0 + a_1 $
$ y_2 = y_1x_0 + a_2 $
$ . $
$ . $
$ . $
$ p(x_0) = yn = y{n-1}x_0 + a_n $

即如下一阶递推式:
$ y_0 = a_0 $
$ y_i = y_ix_0 + a_i, 1 \leq i \leq n $

三对角线矩阵的LDU分解(LDU Factorization of a Tridiagonal Matrix)

  • n*n 非奇异(nonsingular)三对角线矩阵A
    $$ A =
    \left[
    \begin{matrix}
    b_1 & c_1 & . & . & . & . \
    a_2 & b_2 & c_2 & . & . & . \
    . & a_3 & b_3 & c3 & . & . \
    . & . & . & . & . & . \
    . & . & . & . & . & . \
    . & . & . & a
    {n-1} & b{n-1} & c{n-1}\
    . & . & . & . & a_n & b_n
    \end{matrix}
    \right]
    $$
  • A的LDU分解: 单位下三角矩阵L, 对角矩阵D和单位上三角矩阵U; A = LDU
  • 易证三个矩阵形式如下
    $$ L =
    \left[
    \begin{matrix}
    1 & & & & & \
    l_2 & 1 & & & & \

    & l_3 & 1   &     &     & \\
    &     & .   &  .  &     & \\
    &     &     &  .  & .   & \\
    &     &     &     & .   & . \\
    &     &     &     & l_n & 1 
    

    \end{matrix}
    \right]
    $$
    $$ U =
    \left[
    \begin{matrix}
    1 & u_1 & & & & \

    & 1   & u_2 &     &     & \\
    &     & 1   & u_3 &     & \\
    &     &     &  .  & .   & \\
    &     &     &     & .   & u_{n-1} \\
    &     &     &     &     & 1 
    

    \end{matrix}
    \right]
    $$
    $$ D =
    \left[
    \begin{matrix}
    d_1 & & & & & \

    & d_2 &     &     &     & \\
    &     & .   &     &     & \\
    &     &     &  .  &     & \\
    &     &     &     & .   & \\
    &     &     &     &     & d_n 
    

    \end{matrix}
    \right]
    $$
    其中
    $ d_1 = b_1 $
    $ d_j = b_j - ajc{j-1}/d_{j-1}, 2 \leq j \leq n $
    $ l_j = aj/d{j-1}, 2 \leq j \leq n $
    $ u_j = c_j/d_j, 1 \leq j \leq n - 1 $

  • $l_j$和$u_j$的值可以有$d_j$快速得到

  • 使$d_j = wj/w{j-1}, w_0 = 1, w_1 = b_1$
  • $w_j$可由下列二阶递推式计算:
    $w_0 = 1$
    $w_1 = b_1$
    $w_j = bjw{j-1} - (ajc{j-1})w_{j-2}, 2 \leq j \leq n$
  • 三对角线矩阵A的LDU分解转化成求解二阶线性递推式

带状三角线性系统(Banded Triangular Linear Systems)

n*n 下三角矩阵A, 其中所有非零元在主对角线或者其下的m-1对角线上(m < n)

  • e.g. (m = 3):
    $$ A =
    \left[
    \begin{matrix}
    a{11} & 0 & 0 & 0 & … & … & 0 \
    a
    {21} & a{22} & 0 & 0 & … & … & 0 \
    a
    {31} & a{32} & a{33} & 0 & . & . & . \
    0 & a{42} & a{43} & a_{44} & & & \

    &     & .   &  .  & .  & . & . \\
    &     &     &  .  & .  & . & 0\\
    &     &     &     & a_{n,n-2}  & a_{n, n-1} & a_{nn}
    

    \end{matrix}
    \right]
    $$

  • 线性系统$ Ax = b $的解可以表示为:
    $ x_1 = \frac{b1}{a{11}} $
    $ xi = -(\sum{j=i-m+1}^{i-1}\frac{a{ij}}{a{i1}}x_j) - \frac{bi}{a{i1}}, 2 \leq i \leq n $

  • 该解是一个m-1阶线性递推式

一阶线性递推式(first-order linear recurrences)

$ y_1 = b_1 $
$ y_i = aiy{i-1} + b_i, 2 \leq i \leq n $

基本思想:

  • 类似于前缀和的计算(若对所有i有$a_i = 1$,则为前缀和计算)
  • 用平衡树方法

  • 不妨设$n = 2^k$,i($2\leq i \leq n$)为偶数,则$y_i = ai(a{i-1}y{i-2} + b{i-1}) + b_i$,即$y_i = aia{i-1}y_{i-2} + aib{i-1} + b_i$

  • 这是对偶数下标i的一阶线性递推式(size: n/2)

  • 使$ai^{‘} = a{2i}a_{2i-1}$和$bi^{‘} = a{2i}b{2i-1} + b{2i}$, $1 \leq i \leq \frac{n}{2}$, 同时使$zi = y{2i}$, 则得到下列递推式:
    $ z_1 = b_1^{‘} $
    $ z_i = ai^{‘}z{i-1} + b_i^{‘}, 2 \leq i \leq n/2 $

Algorithm 8.1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
(First-Order linear Recurrence)
Input: Two arrays B = (b_1, b_2, ..., b_n) and A = (a_1 = 0, a_2, ..., a_n) representing the first-order linear recurrence y_1 = b_1 and y_i = a_iy_{i-1} + b_i, (2 <= i <= n); n is assumed to be a power of 2.
Output: The values of all the y_i terms.
begin
1. if n = 1 then {set y_1 := b_1, exit}
2. for 1 <= i <= n/2 pardo
Set a_i^{'} := a_{2i}a_{2i-1}
Set b_i^{'} := a_{2i}b_{2i-1} + b_{2i}
3. Recursively, solve the first-order linear recurrence defined by z_1 = b_1^{'} and z_i = a_i^{'}z_{i-1} + b_i^{'}, where 2 <= i <= n/2.
4. for 1 <= i <= n pardo
i even : Set y_i := z_{i/2}
i = 1 : Set y_1 := b_1
i odd > 1 : Set y_i = a_iz_{(i-1)/2} + b_i
end

  • $ T(n) = T(n/2) + O(1) $
  • $ W(n) = W(n/2) + O(n) $

  • 并行复杂度: T(n) = O(logn), W(n) = O(n)

高维(一阶)线性递推式((first-order)higher-demensional linear recurrences)

  • m维向量$b_i(1 \leq i \leq n)$, mm矩阵$A_i(2 \leq i \leq n)$, 递推式如下:
    $ y_1 = b_1 $
    *$ y_i = Aiy{i-1} + b_i $
    , $2 \leq i \leq n$

  • 将问题规约为通过计算$Ai^{‘} = A{2i}A_{2i-1}$和$bi^{‘} = A{2i}b{2i-1} + b{2i}, 1 \leq i \leq n/2$求解$y_2, y_4, …, y_n$

  • $y_2i = Ai^{‘}y{2(i-1)} + b_i^{‘}$; 偶数下标y值得出后容易通过定义算得奇数下标y值

  • 将Algorithm8.1中相应元素换为矩阵和向量即可

两个mm 矩阵乘积的可以在O(logm)的时间复杂度内用O(M(m))计算量复杂度求得,其中M(m)是计算mm矩阵乘积所需要的数学操作数量的时序约束(sequential bound),目前可知的最好上界是$M(m) = O(m^2.376)$(over a ring)

  • 并行复杂度: T(n) = O(lognlogm); W(n) = O(nM(m))

三角线性方程组(Triangular Linear Systems)

  • 高斯消元法(Gaussian elimination scheme)可以将任意的线性方程组化简为三角矩阵

  • 再用标准前向替换法(standard forward substitution algorithm)可以求解

单位下三角矩阵及其线性方程

线性方程组$Ax = b$, 其中$x$和$b是n维向量$, A是n*n 单位下三角矩阵:
$$ A =
\left[
\begin{matrix}
1 & 0 & 0 & … & 0 \
a{21} & 1 & 0 & … & 0 \
a
{31} & a{32} & 1 & … & 0 \
. & . & . & . & . \
a
{n1} & a{n2} & … & a{n,n-1} & 1
\end{matrix}
\right]
$$

  • 对任意矩阵,通过将每行元素除以$a{ii}(a{ii} \not= 1, 1 \leq i \leq n)$可得到以上形式的矩阵

  • 简单方法: $x_i = bi - \sum{j=1}^{i-1}a_{ij}x_j$; 需要O(n^2)的计算量, 但无法求得并行算法

快速矩阵求逆并行算法

用分治策略求出矩阵$A$的逆$A^{-1}$,则$x = A^{-1}b$能在$O(logn)$时间内用$O(n^2)$计算量求得

  • 矩阵分块: 将矩阵$A$分成(n/2) * (n/2)的块(不妨设n是2的幂)
    $$ A =
    \left[
    \begin{matrix}
    A_1 & 0 \
    A_2 & A_3
    \end{matrix}
    \right]
    $$
  • $A_1$和$A_3$是非奇异下三角矩阵

    非奇异矩阵: 行列式不为零(存在逆矩阵)
    $$ A^{-1} =
    \left[
    \begin{matrix}
    A_1^{-1} & 0 \
    -A_3^{-1}A_2A_1^{-1} & A_3^{-1}
    \end{matrix}
    \right]
    $$

  • n*n 三角线性方程组可以在$O(log^2n)$时间复杂度,O(M(n))的计算量内求解

  • $T(n) = O(log^2n) = O(logn) * O(log^n)$: O(logn) (分治)迭代; O(logn) 维数<=(n/2)的矩阵乘法

  • $W(n) = 2W(n/2) + 2M(n/2)$: 假设$M(n) \geq 4M(n/2)$可得$W(n) \leq M(n)$

带状三角形线性方程组

e.g.(n = 6, m = 3)
$$ A =
\left[
\begin{matrix}
a{11} & 0 & 0 & 0 & 0 & 0 \
a
{21} & a{22} & 0 & 0 & 0 & 0 \
a
{31} & a{32} & a{33} & 0 & 0 & 0 \
0 & a{42} & a{43} & a{44} & 0 & 0 \
0 & 0 & a
{53} & a{54} & a{55} & 0 \
0 & 0 & 0 & a{64} & a{65} & a_{66}
\end{matrix}
\right]
$$

  • 不失一般性假设主对角线元等于1,$Ax = b$的解可以表示为
    $ x_1 = b_1 $
    $ xi = (-\sum{j=i-m+1}^{i-1}a_{ij}x_j) + b_i, 2 \leq i \leq n $

  • 不妨设m能整除n,将A分成m*m 的块$A{i,j}, 1 \leq i,j \leq n/m$; $A{i,i}$是下三角矩阵,$A_{i,i-1}$是上三角矩阵
    e.g.:

    matrix_block.png

  • 设D为如下子矩阵:
    $$ D =
    \left[
    \begin{matrix}
    A{1,1} & & & \
    & A
    {2,2} & & \
    & & . & \
    & & & A_{\frac{n}{m},\frac{n}{m}}
    \end{matrix}
    \right]
    $$

  • 使$A^{} = D^{-1}A$, 使$d = D^{-1}b$; $Ax = b$的解同$D_{-1}Ax = D^{-1}b$, 即$A^{}x = b$, $A^{}$如下:
    $$ A =
    \left[
    \begin{matrix}
    Im & & & & \
    A
    {2,1}^{
    } & Im & & & \
    & A
    {3,2}^{} & Im & & \
    & & . & . & \
    & & & A
    {\frac{n}{m},\frac{n}{m}-1}^{
    } & I_m
    \end{matrix}
    \right]
    $$

  • $A{i,i-1}^{*} = A{i,i}^{-1}A{i,i-1}, 2 \leq i \leq n/m$; $A^{}$的计算需要对(n/m)-1个独立的mm矩阵求逆得到${A{i,i}^{-1}}{i=2}^{n/m}$,再通过(n/m)-1个独立的m*m矩阵相乘得到${A{i,i}^{-1}A{i,i-1}}{i=2}^{n/m}$

  • 运用对下三角矩阵求逆的并行算法,可以在$O(log^2m)$时间内用$O((n/m)M(m))$计算量求出$A{i,i}^{-1}, 1 \leq i \leq n/m$; (n/m)-1矩阵乘法与$d = D{-1}b$的计算显然在这个上界之内

  • 将x和d分成m维子向量$x = (x_1, x2, …, x{n/m})$和$d = (d_1, d2, …, d{n/m})$, $Ax = d$的解可表示为
    $x_1 = d_1$
    $xi = -A{i,i-1}^{
    }x_{i-1} + d_i, 2 \leq i \leq n/m$

  • $x_i$的计算可以在O(log(n/m)logm)的时间复杂度内,用O((n/m)M(m))的计算量

  • $T(n) = O(lognlogm)$; W(n) = O((n/m)M(m))

离散傅氏变换(Discrete fourier transform)

定义在复数域上的离散傅氏变换DFT

  • 定义复数$ \omega = e^{i\frac{2\pi}{n}} = \cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n} $,其中$i = \sqrt{-1}$

  • $\omega^{n} = e^{2\pi i} = 1$; $1, \omega, \omega^2, …, \omega^{n-1}$是复数域上(相异的)第n单位根(root of unity); $\omega$被称为第n本原单位根(primitive root of unity)

  • 使n*n矩阵$W_n(j,k) = \omega^{jk}, 0 \leq j,k \leq n-1$, n维列向量$x$的DFT定义为列向量$y = W_nx$

  • $yj = \sum{k=0}^{n-1}\omega^{jk}x_k, 0 \leq j \leq n-1$

e.g.

eg_DFT.png

快速傅氏变换

用分治策略计算DFT $y = W_nx$(假设n是2的幂)

  • 使j为偶数: $j = 2l, 0 \leq l \leq \frac{n}{2} - 1$

  • $yj = y{2l} = \sum_{k=0}^{n-1}\omega^{2lk}xk$, 故有
    $ y
    {2l} = x_0 + \omega^{2l}x_1 + \omega^{4l}x2 + … + \omega^{2l(\frac{n}{2}-1)}x{\frac{n}{2}-1} + x{\frac{n}{2}} + \omega^{2l}x{\frac{n}{2}+1} + \omega^{4l}x{\frac{n}{2}+2} + … + \omega^{2l(\frac{n}{2}-1)}x{n-1} $

  • $ y_{2l} = (x0 + x{\frac{n}{2}}) + \omega^{2l}(x1 + x{\frac{n}{2}+1}) + \omega^{4l}(x2 + x{\frac{n}{2}+2}) + … + \omega^{2l(\frac{n}{2}-1)}(x{\frac{n}{2}-1} + x{n-1}) $

  • 显然$\omega^2 = e^{i\frac{2\pi}{\frac{n}{2}}}$, 故$\omega^2$是第(n/2)本原单位根

  • $z^{(1)} = [y_0, y2, …, y{n-2}]^{T}$是向量$[x0 + x{\frac{n}{2}}, x1 + x{\frac{n}{2}+1}, …, x{\frac{n}{2}-1} + x{n-1}]^{T}$的离散傅氏变换

  • 如果j为奇数: $j = 2l + 1$, 类似地有
    $ y_{2l+1} = (x0 + \omega^{\frac{n}{2}})x{\frac{n}{2}} + \omega^{2l}(\omega x1 + \omega^{\frac{n}{2}+1}x{\frac{n}{2}+1}) + … + \omega^{2l(\frac{n}{2}-1)}(\omega^{\frac{n}{2}-1}x{\frac{n}{2}-1} + \omega^{n-1}x{n-1}) $

  • $\omega^{\frac{n}{2}} = e^{i\pi} = -1$, 故
    $ y_{2l+1} = (x0 - x{\frac{n}{2}}) + \omega^{2l}\omega (x1 - x{\frac{n}{2}+1}) + \omega^{4l}\omega^2(x2 - x{\frac{n}{2}+2}) + … + \omega^{2l(\frac{n}{2}-1)}\omega^{\frac{n}{2}-1}(x{\frac{n}{2}-1} - x{n-1}) $

  • $z^{(2)} = [y_1, y3, …, y{n-1}]^{T}$是向量$[x0 - x{\frac{n}{2}}, \omega (x1 - x{\frac{n}{2}+1}), …, \omega^{\frac{n}{2}-1}(x{\frac{n}{2}-1} - x{n-1})]^{T}$的离散傅氏变换

fft.png

algorithm 8.2

1
2
3
4
5
6
7
8
9
10
11
12
13
(Fast Fourier Transform)
Input: An n-dimensional vector x whose entries are complex numbers, and \omega = e^{i\frac{2\pi}{n}}, where n is assumed to be a power of 2.
Output: The vector y that is the DFT of x.
begin
1. if n = 2 then {Set y_1 := x_1 + x_2, y_2 := x_1 - x_2, exit}
2. for 0 <= l <= n/2-1 pardo
Set u_l := x_l + x_{n/2+l}
Set v_l := \omega^l (x_l - x_{n/2+l})
3. Recurrsively, compute the DFT of the two vectors [u_0, u_1, ..., u_{n/2-1}] and [v_0, v_1, ..., v_{n/2-1}], and store the results in vetors z^(1) = [z_0^(1), z_1^(1), ..., z_{n/2-1}^(1)] and z^(2) = [z_0^(2), z_1^(2), ..., z_{n/2-1}^(2)], respectively.
4. for 0 <= j <= n-1 pardo
j even : Set y_j := z_{j/2}^(1)
j odd : Set y_j := z_{(j-1)/2}^(2)
end

  • $ T(n) = T(n/2) + O(1) $
  • $ W(n) = 2W(n/2) + O(n) $

  • T(n) = O(logn); W(n) = O(nlogn)

  • 离散傅里叶逆变换(inverse discrete Fourier transform, IDFT): 矩阵$W_n$的逆$W_n^{-1}(j,k) = \frac{1}{n}\omega^{-jk}, 0 \leq j,k \leq n-1$, 向量x的离散傅氏逆变换是向量$y = W_n^{-1}x$

  • 类似地,T = O(logn), W = O(n logn)

多项式乘法和卷积

多项式$p(x) = \sum_{k=0}^{n-1}a_kx^k$可由系数列表唯一表示$(a_0, a1, …, a{n-1})$; 给定点$x_i$, $y_i = p(x_j)$值唯一

多项式乘法

  • 两个多项式
    $ p(x) = \sum_{k=0}^{n-1}akx^k $,
    $ q(x) = \sum
    {k=0}^{m-1}bkx^k $,
    其积为
    $ r(x) = p(x)q(x) = \sum
    {k=0}^{n+m-2}c_kx^k $,
    使得
    $ ck = \sum{j=0}^{k}ajb{k-j} $ (下标越界的a、b值为0)

Polynomial multiplication using the FFT algorithm

  • 设l为2的幂且使得$n+m-2 < l \leq 2(n+m-2)$; $a = [a0, …, a{l-1}]^{T}, a_j = 0$ for j > n-1, $b = [b0, …, b{l-1}]^{T}, b_k = 0$ for k > m-1

  • 计算系数列表${c_k}$的算法如下( 多项式乘法的并行算法):

  1. 用F快速傅里叶变换算法计算 $y = W_la$ 和 $z = W_lb$ , 得到p(x)和q(x)在点x = 1, $\omega, \omega^2, …, \omega^{l-1}$的值,其中$\omega$是第l本原单位根,$y_j = p(\omega^j), z_j = q(\omega^j), 0 \leq j \leq l-1$

  2. 计算 $u_j = y_jz_j$ , for $0 \leq j \leq l-1$。显然 $u_j = p(\omega^j)q(\omega^j) = r(\omega^j)$, 故能得到多项式$r(x)$在l个相异单位根的值

  3. 计算向量$u = [u_0, u1, …, u{l-1}]^{T}$的傅里叶逆变换。向量前n+m-1个元和乘积$r(x)$的系数相同。

  • T(n) = O(log(n+m)); W(n) = O((n+m)log(n+m))

用快速傅里叶变换算法计算卷积(Convolution)

向量$a$和$b$的卷积,可以表示为$ a\bigotimes b $

  • $a = [a0, …, a{n-1}]^{T}$; $b = [b0, …, b{m-1}]^{T}$
  • $a \bigotimes b$定义为向量$c = [c0, …, c{m+n-1}]^{T}$, 使得$ck = \sum{j=0}^kajb{k-j}$, 其中$a_j = 0, j > n-1$, $b_j = 0, j > m-1$

  • 除$c{n+m-1}$(总是为0)外,两个向量$a$和$b$的卷积和多项式$p(x) = \sum{j=0}^{n-1}ajx^j$和$q(x) = \sum{j=0}^{m-1}b_jx^j$的乘积的系数相同

计算n维向量x和m维向量y的卷积的并行算法的复杂度:

  • T = O(log(n+m)); W((n+m)log(n+m))

Toeplitz矩阵(托普利茨矩阵)

Toeplitz矩阵: n*n矩阵T满足$T(k,l) = T(k-1,l-1)$, for $2 \leq l,k \leq n$

  • 矩阵T同一对角线上的所有元素相等; 因此这类矩阵又出现在第一行和第一列的2n-1个元素唯一确定
    $$ T =
    \left[
    \begin{matrix}
    t{n-1} & t{n-2} & … & & t_2 & t_1 & t_0 \
    tn & t{n-1} & t_{n-2} & … & & t_2 & t1 \
    t
    {n+1} & tn & t{n-1} & t_{n-2} & … & & t2 \
    . & & & . & . & . & . \
    . & & & & . & . & . \
    t
    {2n-3} & t{2n-4} & … & & & t{n-1} & t{n-2} \
    t
    {2n-2} & t{2n-3} & … & & t{n+1} & tn & t{n-1}
    \end{matrix}
    \right]
    $$
  • $t = [t_0, t1, …, t{2n-2}]^{T}$确定了托普利茨矩阵T

Toeplitz矩阵乘以向量

  • 向量$a = [a_0, a1, …, a{n-1}]^{T}$, $d = Ta$, 其中T是n*n托普利茨矩阵 $t = [t_0, t1, …, t{2n-2}]^{T}$

  • 计算d的一般算法需要O(logn)的时间和O(n^2)的计算量

  • 易证$dl = \sum{j=0}^{n-1}ajt{n+l-j-1}, 0 \leq l \leq n-1$

  • 考察向量 $a$ 和 $t$ 的卷积c, $ck = \sum{j=0}^kajt{k-j}$, 将k=n+l-1代入得$c{n+l-1} = \sum{j=0}^{n+l-1}ajt{n+l-j-1} = \sum_{j=0}^{n-1}ajt{n+l-j-1}$($j > n-1 时 a_j = 0$)

  • $dl = c{n+l-1}, 0 \leq l \leq n-1$

  • 乘积Ta可通过计算$a \bigotimes t$, 令$dl = c{n+l-1}(0 \leq l \leq n-1)$得到

Toeplitz矩阵乘以向量的并行算法:
$c = a \bigotimes t$得到的c向量的元素$(c_{n-1}, cn, …, c{2n-2})$

  • T(n) = O(logn); W(n) = O(nlogn)

求下三角Toeplitz矩阵的逆矩阵

-下三角托普利茨矩阵T分成n/2 * n/2的块:
$$ T =
\left[
\begin{matrix}
T_1 & 0 \
T_2 & T_1
\end{matrix}
\right]
$$

  • 不妨设n是2的幂

  • 矩阵T的逆$T^{-1}$如下:
    $$ T^{-1} =
    \left[
    \begin{matrix}
    T_1^{-1} & 0 \
    -T_1^{-1}T_2T_1^{-1} & T_1^{-1}
    \end{matrix}
    \right]
    $$

  • $T^{-1}$也是一个下三角托普利茨矩阵,由第一列元素唯一确定; 计算$T^{-1}$的算法:

  1. 递归地计算$T_1^{-1}$的第一列
  2. 计算$-T_1^{-1}T_2T_1^{-1}$的第一列
  • 计算$-T_1^{-1}T_2T_1^{-1}$的第一列等价于计算$-T_1^{-1}T_2T_1^{-1}e$, 其中$e = [1, 0, …, 0]^{T}$

  • T(n) = T(n/2) + O(logn)

  • W(n) = W(n/2) + O(nlogn)

Toeplitz矩阵求逆算法的并行复杂度: $T(n) = O(log^2n)$; $W(n) = O(nlogn)$

应用Toeplitz矩阵实现多项式的除法

  • 两个多项式
    $ s(x) = \sum_{j=0}^{n-1}sjx^j $

    $ t(x) = \sum
    {j=0}^{m-1}t_jx^j $,
    存在唯一的两个多项式, 商$q(x)$和余数$r(x)$
    使得
    $ s(x) = t(x)q(x) + r(x) $
    且$r(x)$的阶(degree)小于$t(x)$的阶(余数的最高次小于除数的最高次)

引理8.2: $t(x)$和$s(x)$是两个阶分别为n-1和m-1的多项式(n>m)。

  • 如果商$q(x)$已知,$r(x)$可以在O(logn)的时间内用O(nlogn)的计算量求出来。
  • 如果余数$r(x)$已知,$q(x)$可以在O(log(n-m))的时间内用O((n-m)log(n-m))的计算量算出来。

  • $q(x)$的阶为n-m,两个矩阵$t(x)q(x)$相乘可以在O(logn)时间内用O(nlogn)计算量复杂度的基于FFT的算法。$r(x) = s(x) - t(x)q(x)$可以在O(1)时间内由O(n)计算量完成

  • $s(x) - r(x)$可在O(1)时间内由O(n)操作完成。通过多项式求值/差值和DFT的关系求$q(x) = \frac{s(x)-r(x)}{t(x)}$: 计算$s(\omega^l) - r(\omega^l)$和$t(\omega^l), 0 \leq l \leq n-m+1$, 其中$\omega$是第(n+m-1)本原单位根。FFT算法可以在O(log(n-m))的时间内用O((n-m)log(n-m))计算量计算出来。因此$q(\omega^l)$能在这个上界内求得,再使用快速傅里叶逆变换算法能求得$q(x)$

计算多项式的商:

  • 设$q(x) = \sum_{j=0}^{n-m}qjx^j$是两个多项式$s(x) = \sum{j=0}^{n-1}sjx^j$和$t(x) = \sum{j=0}^{m-1}t_jx^j$的商。
  • 不失一般性假设$t{m-1}s{n-1} \not= 0$, $t(x)q(x)$可表示为$\sum_{k=0}^{n-1}a_kx^k$, 其中$ak = \sum{j=0}^ktjq{k-j}$

  • 设r(x)为$s(x)$除以$t(x)$的余数,给定$r(x)$的阶小于m-1; 则$\sum_{k=m-1}^{n-1}skx^k = \sum{k=m-1}^{n-1}a_kx^k$, 故$s_k = a_k, m-1 \leq k \leq n-1$ , 从而$sk = \sum{j=0}^ktjq{k-j}, m-1 \leq k \leq n-1$

  • 由$t_j = 0, j \geq m$和$q_j = 0, j \geq n-m+1$可得:

polynomial_divide.png

  • 关于$s(x)$和$t(x)$的系数的矩阵T是一个(n-m+1)*(n-m+1)下三角托普利茨矩阵

两个多项式(n阶和m阶, n > m)相除的并行算法的复杂度
$T(n) = O(log^2(n-m) + logn)$;
$W(n) = O(nlogn)$

多项式求值和插值

多项式求值(polynomial evaluation):

  • n-1阶多项式$p(x) = \sum_{j=0}^{n-1}a_jx^j$
  • ${a_l | 0 \leq l \leq n-1}$是一组相异点的集合
  • 计算$p(\alpha_l), 0 \leq l \leq n-1$

  • Horner’s algorithm并发地求,$T = O(logn), W = O(n^2)$

多项式求值的分治算法

  • 令$q_l = x - \alpha_l, 0 \leq l \leq n-1$

  • $Q1(x) = \prod{i=0}^{\frac{n}{2}-1}q_l(x)$
    $Q2(x) = \prod{l=\frac{n}{2}}^{n-1}q_l(x)$

  • 设定n为2的幂,$Q_1(x)$和$Q_2(x)$是阶为$\frac{n}{2}$的多项式

  • $p_1(x) = p(x) mod Q_1(x)$; $p_2(x) = p(x) mod Q_2(x)$

  • 对某个多项式$t_1(x)$和$t_2(x)$有$p(x) = t_1(x)Q_1(x) + p_1(x)$和$p(x) = t_2(x)Q_2(x) + p_2(x)$ ($deg(p_1(x)), deg(p_2(x)) < \frac{n}{2}$)

  • $p(\alpha_l) = p_1(\alpha_l), 0 \leq l \leq \frac{n}{2}-1$; $p(\alpha_l) = p_2(\alpha_l), \frac{n}{2} \leq l \leq n-1$

polynomial_evaluation.png

algorithm 8.3

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
(Polynomial Evaluation)
Input: (1)A polynomial p(x) of degree n-1 specified by its coefficients, and (2)a set of n distinct points a_j, where 0 <= j <= n-1 and n = 2^k for some integer k.
Output: The values p(a_j), where 0 <= j <= n-1
begin
1. for 0 <= j <= n-1 pardo
Set Q_{0, j}(x) := x - a_j
2. for h = 1 to log n-1 do
for 0 <= j <= (n/2^h) - 1 pardo
Set Q_{h,j}(x) := Q_{h-1,2j}(x) * Q_{h-1,2j+1}(x)
3. Set p_{k,0}(x) := p(x)
4. for h = log n-1 to 0 do
for 0 <= j <= (n/2^h)-1 pardo
j even : Set p_{h,j}(x) := p_{h+1,j/2}(x) mod Q_{h,j}(x)
j odd : Set p_{h,j}(x) := p_{h+1,(j-1)/2}(x) mod Q_{h,j}(x)
5. for 0 <= j <= n-1 pardo
Set p(a_j) := p_{0,j}
end

  • $ T(n) = O(log^3n) $; $ W(n) = O(nlog^2n) $

多项式插值的分治算法

  • 集合$ { \betaj = p(\alpha) }{j=0}^{n-1} $

  • 拉格朗日插值公式(Lagrange interpolation formula):
    $ p(x) = \sum_{j=0}^{n-1}\betaj \frac{\prod{l=0,l \not= j}^{n-1}(x - \alphaj)}{\prod{l=0,l \not= j}^{n-1}(\alpha_j-\alpha_l)} $

  • $q_l(x) = x - \alphal$; $Q(x) = \prod{i=0}^{n-1}q_l(x)$; $Q^{‘}(\alphaj) = \prod{l=0,l \not= j}^{n-1}(\alpha_j) - \alpha_l$

  • $\gamma_j = Q^{‘}(\alpha_j), 0 \leq j \leq n-1$, $c_j = \beta_j/\gammaj, 0 \leq j \leq n-1$; $p(x) = Q(x)\sum{j=0}^{n-1}\frac{c_j}{x - \alpha_j}$

  • $\sum_{j=0}^{n-1}\frac{c_j}{x-\alphaj}$可用平衡树方法求得
    $ \frac{p(x)}{Q(x)} = \frac{p
    {k-1,0}(x)}{Q{k-1,0}(x)} + \frac{p{k-1,1}(x)}{Q{k-1,1}(x)} = \frac{p{k-1,0}(x)Q{k-1,1}(x) + p{k-1,1}(x)Q_{k-1,0}(x)}{Q(x)} $

Algorithm 8.4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
(Polynomial Interpolation)
Input: A set of pairs of numbers (a_j, b_j), where 0 <= j <= n, the a_j's are distinct, and n = 2^k for some integer k.
Output: The coefficients of the (n-1)st-degree polynomial p(x) such that p(a_j) = b_j, for all 0 <= j <= n-1.
begin
1. for 0 <= j <= n-1 pardo
Set Q_{0,j}(x) := x - a_j
2. for h = 1 to logn do
for 0 <= j <= (n/2^h)-1 pardo
Set Q_{h,j}(x) := Q_{h-1,2j}(x) * Q_{h-1,2j+1}(x)
3. Compute Q_{k,0}(x), and user Algorithm 8.3 to compute y_j = Q'_{k,0}(a_j), for all 0 <= j <= n-1
4. for 0 <= j <= n-1 pardo
Set p_{0,j}(x) := b_j/y_j
5. for h = 1 to logn do
for 0 <= j <= (n/2^h)-1 pardo
Set p_{h,j}(x) := p_{h-1,2j}(x)Q_{h-1,2j+1}(x) + p_{h-1,2j+1}(x)Q_{h-1,2j}(x)
6. Set p(x) := p_{k,0}(x)

  • $ T(n) = O(log^3n) $; $ W(n) = O(nlog^2n) $