Maximum likelihood is equivalent to least squares in the presence of Gaussian noise

Studying machine learning and statistical pattern recognition these days, I’ve learned a nice fact about estimation. The proof is straightforward, but I’d like to remember it, so here it is.

Note. Maximum likelihood estimation is equivalent to least squares estimation in the presence of Gaussian noise.

Let y_i=f(x_i,\boldsymbol{\theta}) + r_i and let the noise r_i follow a Gaussian distribution, r_i \sim G(0,\sigma).

In a least squares estimate one minimizes the following

\min_{\boldsymbol{\theta}\in \Omega}\sum_{i=1}^n r_i^2(\boldsymbol{\theta})=

\min_{\boldsymbol{\theta}\in \Omega}\sum_{i=1}^n (y_i-f(x_i, \boldsymbol{\theta}))^2 (*)
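As a quick illustration, here is a minimal least squares fit in Python. The linear model f(x, \boldsymbol{\theta}) = \theta_0 + \theta_1 x, the synthetic data, and the noise level are assumptions made for this sketch, not part of the note.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical model f(x, theta) = theta0 + theta1 * x (assumed for illustration)
def f(x, theta):
    return theta[0] + theta[1] * x

def sum_of_squares(theta, x, y):
    r = y - f(x, theta)      # residuals r_i(theta)
    return np.sum(r ** 2)    # objective (*)

# Synthetic data with Gaussian noise of standard deviation 0.5
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = f(x, [1.0, 2.0]) + rng.normal(0.0, 0.5, size=x.shape)

theta_ls = minimize(sum_of_squares, x0=[0.0, 0.0], args=(x, y)).x
print(theta_ls)  # should be close to the true parameters [1.0, 2.0]
```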

In maximum likelihood estimation one defines the likelihood

\ell(\boldsymbol{\theta}; r_1, \ldots, r_n):= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y_i-f(x_i,\boldsymbol{\theta}))^2}{2\sigma^2}\right)

and then minimizes the negative log-likelihood

\min_{\boldsymbol{\theta}\in \Omega} \left(-\ln{\ell(\boldsymbol{\theta}; r_1, \ldots, r_n)}\right)=

\min_{\boldsymbol{\theta}\in \Omega}\left(n\ln{\sqrt{2\pi}} + n\ln\sigma + \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-f(x_i, \boldsymbol\theta))^2\right)=

n\ln{\sqrt{2\pi}} + n\ln\sigma + \frac{1}{2\sigma^2}\min_{\boldsymbol{\theta}\in\Omega} \sum_{i=1}^n (y_i-f(x_i, \boldsymbol\theta))^2

Since the first two terms do not depend on \boldsymbol{\theta} and the factor \frac{1}{2\sigma^2} is positive, the minimizer is the same as in (*), so the two estimates are equivalent. QED
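To see the equivalence numerically, the sketch below minimizes both the negative log-likelihood and the sum of squares on the same synthetic data and checks that the two estimates coincide. As before, the linear model, the data, and the noise level \sigma = 0.5 are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

def f(x, theta):                       # same hypothetical linear model as above
    return theta[0] + theta[1] * x

def neg_log_likelihood(theta, x, y, sigma):
    # -ln l(theta; r_1, ..., r_n) for Gaussian noise with known sigma
    r = y - f(x, theta)
    n = len(x)
    return (n * np.log(np.sqrt(2 * np.pi)) + n * np.log(sigma)
            + np.sum(r ** 2) / (2 * sigma ** 2))

def sum_of_squares(theta, x, y):
    return np.sum((y - f(x, theta)) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = f(x, [1.0, 2.0]) + rng.normal(0.0, 0.5, size=x.shape)

theta_ml = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x, y, 0.5)).x
theta_ls = minimize(sum_of_squares, x0=[0.0, 0.0], args=(x, y)).x

# The two minimizers should agree up to numerical tolerance
print(theta_ml, theta_ls)
print(np.allclose(theta_ml, theta_ls, atol=1e-4))  # expected: True
```

The constant terms and the positive factor 1/(2\sigma^2) shift and rescale the objective but leave its argmin unchanged, which is exactly what the numerical check confirms.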