# Maximum likelihood is equivalent to least squares in the presence of Gaussian noise

Studying machine learning and statistical pattern recognition these days, I've learned a nice fact about estimation. The proof is straightforward, but I'd like to remember it, so here it is.

Note. Maximum likelihood estimation is equivalent to least squares estimation in the presence of additive Gaussian noise.

Let $y_i=f(x_i,\boldsymbol{\theta}) + r_i$, where the residuals $r_i$ are independent and Gaussian distributed, $r_i \sim \mathcal{N}(0,\sigma^2)$.

In a least squares estimate one minimizes the following

$\min_{\boldsymbol{\theta}\in \Omega}\sum_{i=1}^n r_i^2(\boldsymbol{\theta})=$

$\min_{\boldsymbol{\theta}\in \Omega}\sum_{i=1}^n (y_i-f(x_i, \boldsymbol{\theta}))^2$ (*)

In maximum likelihood estimation one defines the likelihood

$\ell(\boldsymbol{\theta}; r_1, \ldots, r_n):= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{1}{2}\frac{(y_i-f(x_i,\boldsymbol{\theta}))^2}{\sigma^2}\right)$

and then minimizes the negative log-likelihood

$\min_{\boldsymbol{\theta}\in \Omega} -\ln{\ell(\boldsymbol{\theta}; r_1, \ldots, r_n)}=$

$\min_{\boldsymbol{\theta}\in \Omega}\left(n\ln{\sqrt{2\pi}} + n\ln\sigma + \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-f(x_i, \boldsymbol\theta))^2\right)=$

$n\ln{\sqrt{2\pi}} + n\ln\sigma + \frac{1}{2\sigma^2}\min_{\boldsymbol{\theta}\in\Omega} \sum_{i=1}^n (y_i-f(x_i, \boldsymbol\theta))^2$

Since the additive constants and the positive factor $\frac{1}{2\sigma^2}$ do not change the minimizer, this problem has the same solution as (*). QED
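The equivalence can also be checked numerically. Below is a minimal sketch (not part of the original argument; the linear model, sample size, noise level, and random seed are all arbitrary choices) that fits the same data by minimizing the sum of squared residuals and by minimizing the negative log-likelihood, and confirms the two estimates coincide.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data: y_i = theta_0 + theta_1 * x_i + r_i,  r_i ~ N(0, sigma^2)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
theta_true = np.array([2.0, -1.0])
sigma = 0.1
y = theta_true[0] + theta_true[1] * x + rng.normal(0.0, sigma, x.shape)

def f(x, theta):
    # Model f(x, theta); here a simple linear function for illustration
    return theta[0] + theta[1] * x

def sse(theta):
    # Least squares objective: sum of squared residuals
    return np.sum((y - f(x, theta)) ** 2)

def neg_log_lik(theta):
    # Negative log-likelihood under Gaussian noise with known sigma:
    # n*ln(sqrt(2*pi)) + n*ln(sigma) + SSE / (2*sigma^2)
    n = len(x)
    return n * np.log(np.sqrt(2 * np.pi)) + n * np.log(sigma) \
        + sse(theta) / (2 * sigma ** 2)

theta_ls = minimize(sse, x0=[0.0, 0.0], method="Nelder-Mead").x
theta_ml = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead").x

# The two minimizers agree up to optimizer tolerance
print(theta_ls, theta_ml)
```

Both objectives differ only by an additive constant and a positive scale factor, so any optimizer converging on one converges to the same parameter vector on the other.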