Math

Approximating factorials

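\(x! \approx x^x e^{-x}\) for large \(x\). This is Stirling's approximation to leading order, accurate in the logarithm: \(\ln x! \approx x \ln x - x\). A more accurate form is \(x! \approx \sqrt{2 \pi x} \, x^x e^{-x}\).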
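As a rough numerical check, the sketch below compares \(\ln x!\) with \(x \ln x - x\), and with the version that adds the \(\tfrac{1}{2} \ln(2 \pi x)\) correction. The function names are made up for this illustration.

```python
import math


def log_factorial_exact(x: int) -> float:
    # lgamma(x + 1) equals ln(x!) for non-negative integers x.
    return math.lgamma(x + 1)


def log_stirling_crude(x: int) -> float:
    # ln(x^x e^{-x}) = x ln x - x
    return x * math.log(x) - x


def log_stirling_full(x: int) -> float:
    # Adds the ln(sqrt(2 pi x)) correction term to the crude form.
    return log_stirling_crude(x) + 0.5 * math.log(2 * math.pi * x)


for x in (10, 100, 1000):
    exact = log_factorial_exact(x)
    print(x, exact, log_stirling_crude(x), log_stirling_full(x))
```

The crude form is off by a slowly growing additive term in the logarithm, so its relative error in \(\ln x!\) shrinks as \(x\) grows.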

Binomial distribution

Mean and variance
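For a binomial distribution with \(n\) trials and success probability \(p\) (symbols assumed here, since the notes do not name them), the standard results are mean \(np\) and variance \(np(1 - p)\). A minimal sketch that checks both by summing over the probability mass function, using arbitrary example values:

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    # Probability of exactly k successes in n independent trials.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean_var(n: int, p: float) -> tuple[float, float]:
    # Compute mean and variance directly from the definition of expectation.
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, var


n, p = 20, 0.3  # arbitrary example values
mean, var = binomial_mean_var(n, p)
print(mean, n * p)            # both close to n * p
print(var, n * p * (1 - p))   # both close to n * p * (1 - p)
```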

Differentiation rules

Exponential

Logarithm
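Assuming the two headings above refer to the standard rules \(\frac{d}{dx} e^x = e^x\) and \(\frac{d}{dx} \ln x = \frac{1}{x}\), a central-difference sketch can confirm them numerically. The helper name and the evaluation point are chosen only for illustration.

```python
import math


def central_difference(f, x: float, h: float = 1e-6) -> float:
    # Symmetric finite-difference estimate of f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)


x = 2.0  # arbitrary evaluation point
d_exp = central_difference(math.exp, x)
d_log = central_difference(math.log, x)

print(d_exp, math.exp(x))  # derivative of e^x should match e^x
print(d_log, 1 / x)        # derivative of ln x should match 1/x
```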

Linear algebra

Cross product

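\[ A \times B = \left\Vert A \right\Vert \left\Vert B \right\Vert \sin{\theta} \, n \] where \(\theta\) is the angle between \(A\) and \(B\), and \(n\) is the unit vector perpendicular to both, with its direction given by the right-hand rule.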
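A small sketch of the component form of the cross product in three dimensions. It checks that the result has magnitude \(\left\Vert A \right\Vert \left\Vert B \right\Vert \sin{\theta}\) and is perpendicular to both inputs. The helper names and example vectors are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def cross(a, b):
    # Component formula for 3-vectors.
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


a = (1.0, 0.0, 0.0)
b = (1.0, 1.0, 0.0)
c = cross(a, b)

# Angle between a and b, recovered from the dot product.
theta = math.acos(dot(a, b) / (norm(a) * norm(b)))

print(c)
print(norm(c), norm(a) * norm(b) * math.sin(theta))  # magnitudes agree
print(dot(c, a), dot(c, b))                          # both zero: c is perpendicular
```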

Dot product

\[ a \cdot b = \sum_i a_i b_i \]

Dot product intuition

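\(a \cdot b\) measures how much \(a\) and \(b\) point in the same direction, scaled by their magnitudes: \(a \cdot b = \left\Vert a \right\Vert \left\Vert b \right\Vert \cos{\theta}\), where \(\theta\) is the angle between them. It is positive when they point the same way, zero when they are perpendicular, and negative when they point in opposite directions.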
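To make the intuition concrete, the sketch below computes the dot product and the cosine of the angle for a few hand-picked vector pairs. The vectors and function names are illustrative.

```python
import math


def dot(a, b):
    # Sum of elementwise products.
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def cosine(a, b):
    # Dot product with the magnitudes divided out.
    return dot(a, b) / (norm(a) * norm(b))


a = (3.0, 4.0)
same_direction = (6.0, 8.0)
perpendicular = (-4.0, 3.0)
opposite = (-3.0, -4.0)

for b in (same_direction, perpendicular, opposite):
    print(b, dot(a, b), cosine(a, b))
```

Dividing out the magnitudes leaves only the directional part, which is why cosine similarity is often used to compare directions.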

Gaussian

\[ P(x | \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2 \sigma^2}\right) \]
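A minimal sketch of the density above, with a midpoint-rule check that it integrates to approximately 1. The integration helper and parameter values are assumptions made for this example.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    # Direct transcription of the density formula.
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def integrate(f, lo: float, hi: float, steps: int = 10_000) -> float:
    # Midpoint rule on a uniform grid.
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


mu, sigma = 1.0, 2.0  # arbitrary example parameters
total = integrate(lambda x: gaussian_pdf(x, mu, sigma), mu - 8 * sigma, mu + 8 * sigma)
print(total)  # close to 1
```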

Exponential distribution

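\[ P(x | \lambda) = \frac{e^{-\frac{x}{\lambda}}}{\mathcal{Z}} \] where \(\mathcal{Z}\) is a normalizing factor so that \(\int P(x | \lambda) \, dx = 1\). For \(x \geq 0\), this gives \(\mathcal{Z} = \int_0^\infty e^{-\frac{x}{\lambda}} \, dx = \lambda\).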
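Assuming \(x\) ranges over \([0, \infty)\), the normalizing factor works out to \(\mathcal{Z} = \lambda\). The sketch below checks this numerically by integrating the unnormalized density up to a large cutoff. The cutoff and step count are arbitrary choices.

```python
import math


def normalizer(lam: float, upper_multiple: float = 50.0, steps: int = 100_000) -> float:
    # Midpoint-rule estimate of the integral of exp(-x / lam) over [0, upper_multiple * lam].
    # The tail beyond the cutoff is on the order of exp(-50) and is ignored.
    hi = upper_multiple * lam
    width = hi / steps
    return sum(math.exp(-((i + 0.5) * width) / lam) for i in range(steps)) * width


for lam in (0.5, 2.0):
    print(lam, normalizer(lam))  # each estimate is close to lam
```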

Bayes

\[ P(A | B) = \frac{P(B | A) P(A)}{P(B)} \] \[ \text{posterior} = \text{likelihood ratio} \cdot \text{prior} \] \[ \text{likelihood ratio} = \frac{P(B | A)}{P(B)} \]
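A worked example of the rule, using made-up numbers for a diagnostic test: \(A\) is "has the condition" and \(B\) is "test is positive". The prior and the two test rates are hypothetical.

```python
# Hypothetical inputs, chosen only for illustration.
prior = 0.01                # P(A)
p_b_given_a = 0.9           # P(B | A)
p_b_given_not_a = 0.05      # P(B | not A)

# Law of total probability gives the evidence P(B).
p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)

# The ratio P(B | A) / P(B), as named in the formula above.
likelihood_ratio = p_b_given_a / p_b

posterior = likelihood_ratio * prior  # P(A | B)
print(p_b, likelihood_ratio, posterior)
```

With a low prior, the posterior stays well below the test's 0.9 hit rate, because most positives come from the much larger group without the condition.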

Maximum Likelihood Estimate vs. Maximum a Posteriori

\[ \begin{aligned} \theta_{\text{MLE}} &= \arg \max_\theta p(x | \theta) \\ \theta_{\text{MAP}} &= \arg \max_\theta p(\theta | x) = \arg \max_\theta p(x | \theta) p(\theta) \end{aligned} \]

If \(p(\theta)\) is uniform, \(\theta_{\text{MLE}} = \theta_{\text{MAP}}\).
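As an illustration, consider estimating a coin's heads probability \(\theta\) under a Beta(\(\alpha\), \(\beta\)) prior. The Beta prior, the function names, and the counts are assumptions for this sketch and are not part of the notes above. The posterior is then Beta(heads + \(\alpha\), tails + \(\beta\)), and its mode gives the MAP estimate when \(\alpha, \beta \geq 1\).

```python
def bernoulli_mle(heads: int, flips: int) -> float:
    # Maximizes the likelihood theta^heads * (1 - theta)^(flips - heads).
    return heads / flips


def bernoulli_map(heads: int, flips: int, alpha: float, beta: float) -> float:
    # Mode of the Beta(heads + alpha, flips - heads + beta) posterior.
    # Assumes alpha >= 1 and beta >= 1 so that the mode formula applies.
    return (heads + alpha - 1) / (flips + alpha + beta - 2)


heads, flips = 7, 10  # hypothetical data

mle = bernoulli_mle(heads, flips)
map_uniform = bernoulli_map(heads, flips, alpha=1, beta=1)  # Beta(1, 1) is uniform
map_informed = bernoulli_map(heads, flips, alpha=5, beta=5)  # prior favoring 0.5

print(mle, map_uniform, map_informed)
```

With the uniform prior the two estimates coincide. The informative prior pulls the MAP estimate toward 0.5.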

Using logarithms to make calculations easier

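For example, for Maximum a Posteriori, we can maximize the logarithm of the objective instead. The logarithm is monotonically increasing, so the arg max is unchanged, and the product becomes a sum:

\[ \theta_{\text{MAP}} = \arg \max_\theta \left[ \log p(x | \theta) + \log p(\theta) \right] \]

If \(x\) consists of independent observations \(x_1, \dots, x_n\), the likelihood term also splits into a sum: \(\log p(x | \theta) = \sum_i \log p(x_i | \theta)\).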
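Working with sums of logarithms also avoids numerical underflow when many small probabilities are multiplied. A minimal sketch with made-up per-observation likelihoods:

```python
import math

# Hypothetical likelihoods of 100 independent observations.
probs = [1e-5] * 100

# The true product is 1e-500, far below the smallest positive float,
# so the direct product underflows to zero.
product = math.prod(probs)

# The sum of logs stays comfortably in range.
log_sum = sum(math.log(p) for p in probs)

print(product)
print(log_sum)
```

Because only the location of the maximum matters, comparing `log_sum` across candidate parameters is enough, and the underflowed product is never needed.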

Perplexity

Wiki.

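\[ PP(x) = 2^{H(x)} \] where \(H(x) = -\sum_i p(x_i) \log_2 p(x_i)\) is the entropy of the distribution in bits. A uniform distribution over \(k\) outcomes has perplexity \(k\).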
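A short sketch of the definition, taking \(H\) to be the Shannon entropy in bits of a discrete distribution given as a list of probabilities. The example distributions are made up.

```python
import math


def entropy_bits(probs) -> float:
    # Shannon entropy in bits; zero-probability outcomes contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def perplexity(probs) -> float:
    return 2 ** entropy_bits(probs)


uniform = [1 / 8] * 8
skewed = [0.5, 0.25, 0.125, 0.125]

print(entropy_bits(uniform), perplexity(uniform))
print(entropy_bits(skewed), perplexity(skewed))
```

The uniform case behaves like a fair 8-way choice. The skewed distribution has lower perplexity because its outcomes are easier to predict.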

Properties of binary operations

Commutative

\[ f(a, b) = f(b, a) \]

Associativity

\[ f(a, f(b, c)) = f(f(a, b), c) \]

Distributive

\[ f(a, g(b, c)) = g(f(a, b), f(a, c)) \]

For example, we say multiplication distributes over addition: with \(f\) as multiplication and \(g\) as addition, \(a (b + c) = a b + a c\).
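These properties can be checked by brute force over a small set of integers. The sketch below does that for a few operators from Python's `operator` module. The checker names are made up, and passing on a finite sample is evidence rather than proof.

```python
import operator
from itertools import product


def is_commutative(f, values) -> bool:
    return all(f(a, b) == f(b, a) for a, b in product(values, repeat=2))


def is_associative(f, values) -> bool:
    return all(f(a, f(b, c)) == f(f(a, b), c) for a, b, c in product(values, repeat=3))


def distributes_over(f, g, values) -> bool:
    # True if f(a, g(b, c)) == g(f(a, b), f(a, c)) on every sampled triple.
    return all(
        f(a, g(b, c)) == g(f(a, b), f(a, c)) for a, b, c in product(values, repeat=3)
    )


values = range(-3, 4)

print(is_commutative(operator.add, values))                   # addition commutes
print(is_commutative(operator.sub, values))                   # subtraction does not
print(is_associative(operator.sub, values))                   # nor is it associative
print(distributes_over(operator.mul, operator.add, values))   # a(b + c) = ab + ac
print(distributes_over(operator.add, operator.mul, values))   # a + bc != (a + b)(a + c) in general
```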

Jacobian

Given a function \(f(x) = y\) where \(x\) and \(y\) are vectors, the gradient of \(y\) with respect to \(x\) is the Jacobian:

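\[ \frac{\partial y}{\partial x} = J = \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \frac{\partial y_3}{\partial x} & \cdots \end{bmatrix} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \frac{\partial y_3}{\partial x_1} & \cdots \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \frac{\partial y_3}{\partial x_2} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \]

In this layout, row \(i\) holds the derivatives with respect to \(x_i\) and column \(j\) holds the derivatives of \(y_j\). Many references define the Jacobian as the transpose of this matrix, with entry \((i, j)\) equal to \(\frac{\partial y_i}{\partial x_j}\).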
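A finite-difference sketch that builds the matrix in the layout shown above, with rows indexed by the components of \(x\) and columns by the components of \(y\). The example function `f` and the helper `jacobian` are made up for illustration.

```python
def f(x):
    # Hypothetical example function mapping R^2 to R^3.
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1**2]


def jacobian(func, x, h: float = 1e-6):
    # J[i][j] approximates d y_j / d x_i using central differences.
    y_dim = len(func(x))
    rows = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        y_forward, y_backward = func(forward), func(backward)
        rows.append([(y_forward[j] - y_backward[j]) / (2 * h) for j in range(y_dim)])
    return rows


J = jacobian(f, [2.0, 3.0])
for row in J:
    print(row)

# Analytic values at (2, 3) for comparison:
#   row for x1: [x2, 1, 2 * x1] = [3, 1, 4]
#   row for x2: [x1, 1, 0]      = [2, 1, 0]
```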

Polar coordinates

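Specify a point by its distance \(r\) from a central point (the origin) and its angle \(\theta\) from a reference direction (wiki). The corresponding Cartesian coordinates are \(x = r \cos{\theta}\) and \(y = r \sin{\theta}\).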
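A minimal sketch of converting between Cartesian and polar coordinates, measuring the angle in radians counterclockwise from the positive x-axis. That convention is assumed here, and the function names are illustrative.

```python
import math


def to_polar(x: float, y: float) -> tuple[float, float]:
    # atan2 picks the correct quadrant for the angle.
    return math.hypot(x, y), math.atan2(y, x)


def to_cartesian(r: float, theta: float) -> tuple[float, float]:
    return r * math.cos(theta), r * math.sin(theta)


r, theta = to_polar(1.0, 1.0)
print(r, theta)                 # sqrt(2) and pi / 4
print(to_cartesian(r, theta))   # back to approximately (1.0, 1.0)
```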