String matching on random text.

NAIVE-STRING-MATCHER(T,P)
1. n\leftarrow length[T]
2. n\leftarrow length[P]
3. for s\leftarrow 0 to n-m
4.    if P[1\ldots m] = T[s+1\ldots s+m]
5.      then print "pattern occurs at shift s"

Ex. from CLRS. The Naïve String matching algorithm is optimal for uniform random text T[] of length n and uniform random pattern P[] of length m.

Proof. Assume an alphabet of size d\geq 2.
Let p(k) be the probability of doing k\leq m comparisons at line 4.
If k<m then p(k) = \frac{1}{d^{k-1}}(1-\frac{1}{d})=\frac{d-1}{d^k}.
Otherwise k=m and p(k) = \frac{1}{d^{m-1}}.

Then the expected number of comparisons is,
E_p = \sum_{k=0}^{m} kp(k) = \sum_{k=0}^{m-1}k\frac{d-1}{d^k} + \frac{m}{d^{m-1}} =
= (d-1)\sum_{k=0}^{m} \frac{k}{d^k} + \frac{m}{d^{m}}.

Recall that,
\sum_{k=0}^m kx^k =
= x \frac{d}{dx} (\sum_{k=0}^{m} x^k) = x(-(m+1) x^m (1-x) +1-x^{m+1})/(1-x)^2 =
= x(mx^{m+1} -mx^{m}-x^m+1)/(1-x)^2 =
= \frac{x}{(1-x)^2}x^m(m(x-1) -1+\frac{1}{x^m}).

Then,
E_p = (\frac{1-x}{x})\frac{x}{(1-x)^2}x^m(m(x-1) -1 + \frac{1}{x^m}) + mx^m = x^m(\frac{1}{x^m(1-x)}-\frac{1}{1-x}) =
= \frac{1}{1-x}(1-x^m) = \frac{1-x^m}{1-x} =
= \frac{1-\frac{1}{d^m}}{1-\frac{1}{d}}\leq \frac{1}{1/2} = 2

Thus, the expected number of total comparisons is optimal for uniform random input,
E^{\text{total}}_p = E_p(n-m+1)\leq 2(n-m+1) = O(m+n)
Q.E.D.