
Mathematics for Artificial Intelligence : Linear Algebra

A simplified guide on how to brush up on Mathematics for Artificial Intelligence, Machine Learning and Data Science: Linear Algebra (Important Pointers only)

Module - I : Linear Algebra

I. Vectors and their Properties.

A mathematical entity with magnitude and direction, denoted as \vec{v} or \mathbf{v}.

In component form, represented as:

\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} for 2D vectors, \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} for 3D vectors

In Cartesian coordinates represented as :

\mathbf{v} = v_1 \mathbf{i} + v_2 \mathbf{j} + v_3 \mathbf{k} in 3D space, where \mathbf{i}, \mathbf{j} and \mathbf{k} are the unit vectors along the x, y and z axes.

Types of Vectors: Zero Vector (magnitude 0, no direction), Unit Vector (magnitude 1, directional), Position Vector (the position of a point relative to the origin), Equal Vectors (same magnitude and same direction), Opposite Vectors (same magnitude but opposite direction).

Vector operations:

1. Addition and Subtraction

  • Vector Addition: \mathbf{u} + \mathbf{v} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \end{pmatrix}
  • Vector Subtraction: \mathbf{u} - \mathbf{v} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} - \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} u_1 - v_1 \\ u_2 - v_2 \end{pmatrix}

2. Scalar Multiplication

  • Multiplying a vector by a scalar k: k\mathbf{v} = k \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} k v_1 \\ k v_2 \end{pmatrix}

3. Dot Product (Scalar Product)

  • The dot product of two vectors is a scalar: \mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n
  • The dot product can also be expressed in terms of the magnitudes of the vectors and the angle between them: \mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta

4. Cross Product (Vector Product)

The cross product is only defined in three-dimensional space: \mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}
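These operations map directly onto NumPy (assuming Python with NumPy installed; the vectors below are arbitrary examples). A minimal sketch:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

print(u + v)           # vector addition       -> [5. 7. 9.]
print(u - v)           # vector subtraction    -> [-3. -3. -3.]
print(2 * u)           # scalar multiplication -> [2. 4. 6.]
print(np.dot(u, v))    # dot product u.v       -> 32.0
print(np.cross(u, v))  # cross product (3D)    -> [-3.  6. -3.]

# recover the angle between u and v from u.v = |u||v|cos(theta)
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.degrees(np.arccos(cos_theta)))  # ~12.93 degrees
```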

 

Vector Properties:

  • Commutativity of Addition: \mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}
  • Associativity of Addition: \mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}
  • Distributivity: k(\mathbf{u} + \mathbf{v}) = k\mathbf{u} + k\mathbf{v}
  • Zero Vector: \mathbf{v} + \mathbf{0} = \mathbf{v}
  • Negative Vector: \mathbf{v} + (-\mathbf{v}) = \mathbf{0}

 

II. Vector Spaces and Subspaces.

 A collection of vectors that can be added together and multiplied by scalars (real or complex numbers) and still remain within the set.

Denoted as V over a field F (usually \mathbb{R} or \mathbb{C}).

  • Vector Addition: For vectors \mathbf{u}, \mathbf{v} \in V, the sum \mathbf{u} + \mathbf{v} = \mathbf{w} is again an element \mathbf{w} \in V.
  • Scalar Multiplication: For a \in F and a vector \mathbf{v} \in V, the product a\mathbf{v} = \mathbf{w} is again an element \mathbf{w} \in V.
  • Important Axioms: 

  • Associativity of Addition: (\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})
  • Commutativity of Addition: \mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}
  • Identity Element of Addition: There exists an element \mathbf{0} \in V such that \mathbf{v} + \mathbf{0} = \mathbf{v} for all \mathbf{v} \in V.
  • Inverse Elements of Addition: For each \mathbf{v} \in V, there exists an element -\mathbf{v} \in V such that \mathbf{v} + (-\mathbf{v}) = \mathbf{0}.
  • Compatibility of Scalar Multiplication with Field Multiplication: a(b\mathbf{v}) = (ab)\mathbf{v} for all a, b \in F and \mathbf{v} \in V.
  • Identity Element of Scalar Multiplication: 1\mathbf{v} = \mathbf{v} for all \mathbf{v} \in V.
  • Distributivity of Scalar Multiplication with Respect to Vector Addition: a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v} for all a \in F and \mathbf{u}, \mathbf{v} \in V.
  • Distributivity of Scalar Multiplication with Respect to Field Addition: (a + b)\mathbf{v} = a\mathbf{v} + b\mathbf{v} for all a, b \in F and \mathbf{v} \in V.
  • Subspaces:

    A subset W of a vector space V is called a subspace of V if:

    • Zero Vector: \mathbf{0} \in W.
    • Closed under Addition: For all \mathbf{u}, \mathbf{v} \in W, \mathbf{u} + \mathbf{v} \in W.
    • Closed under Scalar Multiplication: For all a \in F and \mathbf{v} \in W, a\mathbf{v} \in W.

    If W satisfies these conditions, W is a subspace of V.

    Eg: The zero subspace \{\mathbf{0}\} is a subspace of any vector space.

    Properties of Subspaces:

    • Intersection: The intersection of two subspaces of V is also a subspace of V.
    • Sum: The sum of two subspaces W_1 and W_2, defined as W_1 + W_2 = \{ \mathbf{u} + \mathbf{v} \mid \mathbf{u} \in W_1, \mathbf{v} \in W_2 \}, is also a subspace of V.
    • Span: The span of a set of vectors in V is the smallest subspace of V that contains all those vectors.

    Spanning Set:

    A set of vectors S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\} in V is said to span V if every vector in V can be written as a linear combination of the vectors in S.

    Basis:

    A basis of a vector space V is a set of vectors in V that is linearly independent and spans V.

    Dimension:

    The dimension of a vector space V is the number of vectors in a basis of V. It is a measure of the "size" or "degrees of freedom" of the vector space.
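In practice, linear independence, span, and dimension are often checked numerically via the rank of a matrix whose columns are the candidate vectors. A small sketch (the vectors are illustrative):

```python
import numpy as np

# Three candidate basis vectors for R^3, stacked as columns.
V = np.column_stack([[1, 0, 0],
                     [1, 1, 0],
                     [1, 1, 1]])

rank = np.linalg.matrix_rank(V)
print(rank)                # 3: the columns are linearly independent
print(rank == V.shape[1])  # True: three independent vectors span R^3, so they form a basis
```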

     

    III. Matrices.  

    A rectangular array of numbers, symbols, or expressions arranged in rows and columns, denoted by A. If a matrix has m rows and n columns, it is called an m \times n matrix.

    For example, a 2 \times 3 matrix A:

    A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}

    Types of Matrices:

    • Square Matrix: A matrix with the same number of rows and columns (m=n).
    • Row Matrix: A matrix with a single row (1×n).
    • Column Matrix: A matrix with a single column (m×1).
    • Zero Matrix: A matrix in which all elements are zero.
    • Identity Matrix: A square matrix with ones on the main diagonal and zeros elsewhere.
    • Diagonal Matrix: A square matrix in which all off-diagonal elements are zero.
    • Symmetric Matrix: A square matrix that is equal to its transpose (A = A^T).
    • Skew-Symmetric Matrix: A square matrix that is equal to the negative of its transpose (A = -A^T).
    • Upper Triangular Matrix: A square matrix in which all elements below the main diagonal are zero.
    • Lower Triangular Matrix: A square matrix in which all elements above the main diagonal are zero.

     

    Matrix Operations :

    1. Addition:

    A + B = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{pmatrix}

    2. Subtraction

    A - B = \begin{pmatrix} a_{11} - b_{11} & a_{12} - b_{12} \\ a_{21} - b_{21} & a_{22} - b_{22} \end{pmatrix}

    3. Scalar Multiplication

    cA = \begin{pmatrix} c \cdot a_{11} & c \cdot a_{12} \\ c \cdot a_{21} & c \cdot a_{22} \end{pmatrix}

    4. Matrix Multiplication

    Two matrices A (m \times n) and B (n \times p) can be multiplied to form an m \times p matrix C:

    C = AB

    where each element c_{ij} is calculated as:

    c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}

    5. Transpose

    The transpose A^T is formed by swapping rows and columns:

    A^T = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{pmatrix}

    6. Determinant

    The determinant is a scalar value. For a 2 \times 2 matrix:

    \det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{12} a_{21}
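All six operations are one-liners in NumPy (the matrices below are arbitrary 2 x 2 examples):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A + B)             # element-wise addition
print(A - B)             # element-wise subtraction
print(3 * A)             # scalar multiplication
print(A @ B)             # matrix multiplication: c_ij = sum_k a_ik * b_kj
print(A.T)               # transpose
print(np.linalg.det(A))  # determinant: 1*4 - 2*3 = -2
```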

     

    IV. Matrix Inversion.

    The inverse of a square matrix A, denoted A^{-1}, is the matrix such that AA^{-1} = A^{-1}A = I, where I is the identity matrix. The inverse exists only if \det(A) \neq 0.

    For a 2×2 matrix:

    A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}

    Conditions for Inversion:

    Not all matrices have inverses. A matrix A is invertible (or non-singular) if and only if:

    1. A is a square matrix.
    2. The determinant of A is non-zero, i.e., \det(A) \neq 0.

    If these conditions are not met, the matrix is said to be singular or non-invertible.

    Properties of Inverse Matrix:

    • Uniqueness: If A is invertible, its inverse A^{-1} is unique.
    • Product of Inverses: If A and B are invertible matrices of the same dimension, then the product AB is invertible and
      (AB)^{-1} = B^{-1} A^{-1}
    • Inverse of Transpose: If A is invertible, then
      (A^T)^{-1} = (A^{-1})^T
    • Inverse of a Scalar Multiple: If A is invertible and c is a non-zero scalar, then
      (cA)^{-1} = \frac{1}{c} A^{-1}

    Methods for Finding the Inverse:

    1. Gaussian Elimination

    To find the inverse of a matrix A using Gaussian elimination:

    1. Form the augmented matrix [A \mid I], where I is the identity matrix of the same dimension as A.
    2. Use row operations to transform [A \mid I] into [I \mid A^{-1}].
    3. If this is possible, the matrix A is invertible and the right half of the augmented matrix will be A^{-1}.
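A minimal sketch of this procedure (a hand-rolled Gauss-Jordan reduction with partial pivoting; in practice you would simply call np.linalg.inv):

```python
import numpy as np

def inverse_gauss_jordan(A):
    """Invert A by row-reducing the augmented matrix [A | I] to [I | A^-1]."""
    n = A.shape[0]
    aug = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        # partial pivoting: move the largest remaining entry into the pivot row
        pivot = col + np.argmax(np.abs(aug[col:, col]))
        if np.isclose(aug[pivot, col], 0.0):
            raise ValueError("matrix is singular")
        aug[[col, pivot]] = aug[[pivot, col]]
        aug[col] /= aug[col, col]          # scale the pivot row so the pivot is 1
        for row in range(n):
            if row != col:                 # eliminate the column from every other row
                aug[row] -= aug[row, col] * aug[col]
    return aug[:, n:]                      # right half now holds A^-1

A = np.array([[2.0, 3.0], [1.0, 4.0]])
print(inverse_gauss_jordan(A))  # [[ 0.8 -0.6] [-0.2  0.4]]
```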

    2. Adjugate Method

    For a square matrix A, the inverse can also be found using the adjugate (or adjoint) and the determinant:

    A^{-1} = \frac{1}{\det(A)} \text{adj}(A)

    where \text{adj}(A) is the adjugate of A. The steps are:

    1. Compute the determinant \det(A).
    2. Find the matrix of cofactors.
    3. Transpose the matrix of cofactors to get the adjugate.
    4. Divide each entry of the adjugate by \det(A).
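The same four steps, sketched in NumPy for small matrices (cofactor expansion grows factorially, so this is illustrative rather than production code):

```python
import numpy as np

def inverse_adjugate(A):
    """Invert A via A^-1 = adj(A) / det(A)."""
    n = A.shape[0]
    det_A = np.linalg.det(A)                       # step 1: determinant
    if np.isclose(det_A, 0.0):
        raise ValueError("matrix is singular")
    cof = np.empty((n, n))
    for i in range(n):
        for j in range(n):                         # step 2: matrix of cofactors
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T / det_A                           # steps 3-4: transpose, divide by det

A = np.array([[2.0, 3.0], [1.0, 4.0]])
print(inverse_adjugate(A))  # [[ 0.8 -0.6] [-0.2  0.4]]
```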

    3. Using Elementary Matrices

    An elementary matrix is obtained by performing a single elementary row operation on an identity matrix. The inverse of A can be found by expressing A as a product of elementary matrices:

    A = E_1 E_2 \cdots E_k

    Then,

    A^{-1} = E_k^{-1} \cdots E_2^{-1} E_1^{-1}

    Eg:

    Let's find the inverse of the following 2 \times 2 matrix A:

    A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

    The inverse is given by:

    A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}

    where \det(A) = ad - bc.

    For example, if:

    A = \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix}

    Then:

    \det(A) = (2)(4) - (3)(1) = 8 - 3 = 5

    A^{-1} = \frac{1}{5} \begin{pmatrix} 4 & -3 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 0.8 & -0.6 \\ -0.2 & 0.4 \end{pmatrix}

    Applications

    • Solving Linear Systems: Given A\mathbf{x} = \mathbf{b}, if A is invertible, the solution is \mathbf{x} = A^{-1}\mathbf{b} (see the sketch after this list).
    • Computer Graphics: Inverse matrices are used for transforming coordinates and manipulating images.
    • Control Theory: Inverse matrices are essential in system design and stability analysis.
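For the linear-systems application, a quick sketch (the system here is made up; np.linalg.solve is preferred over forming A^{-1} explicitly because it is faster and numerically safer):

```python
import numpy as np

A = np.array([[2.0, 3.0], [1.0, 4.0]])
b = np.array([8.0, 9.0])

x = np.linalg.solve(A, b)    # solves A x = b directly
print(x)                     # [1. 2.]

print(np.linalg.inv(A) @ b)  # same answer via x = A^-1 b
```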

     

    V. Properties of Determinants. 

    1. Determinant of the Identity Matrix

      The determinant of an identity matrix I of any size is 1:

      \det(I) = 1
    2. Determinant of a Diagonal Matrix

      The determinant of a diagonal matrix (a square matrix in which all off-diagonal elements are zero) D with elements d_{11}, d_{22}, \ldots, d_{nn} is:

      \det(D) = d_{11} \cdot d_{22} \cdots d_{nn}
    3. Determinant of a Triangular Matrix

      Similar to diagonal matrices, the determinant of a triangular matrix (either upper or lower triangular) is the product of its diagonal elements.

    4. Determinant of the Transpose

      The determinant of a matrix is equal to the determinant of its transpose:

      \det(A) = \det(A^T)
    5. Multiplicative Property

      The determinant of the product of two matrices is the product of their determinants:

      \det(AB) = \det(A) \cdot \det(B)
    6. Determinant of an Inverse

      If A is an invertible matrix, the determinant of its inverse is the reciprocal of the determinant of A:

      \det(A^{-1}) = \frac{1}{\det(A)}
    7. Determinant of a Scalar Multiple

      If A is an n \times n matrix and c is a scalar, the determinant of the scalar multiple of A is:

      \det(cA) = c^n \det(A)
    8. Row and Column Operations

      • Row Interchange: Swapping two rows (or two columns) of a matrix changes the sign of the determinant:

        \det(B) = -\det(A)

        if B is obtained by interchanging two rows (or columns) of A.

      • Row Scaling: Multiplying a row (or column) by a scalar multiplies the determinant by the same scalar:

        \det(B) = k \cdot \det(A)

        if B is obtained by multiplying a row (or column) of A by k.

      • Row Addition: Adding a multiple of one row (or column) to another row (or column) does not change the determinant:

        \det(B) = \det(A)

        if B is obtained by adding a multiple of one row (or column) to another row (or column) of A.

    9. Determinant of a Block Matrix

      For a block diagonal matrix:

      A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}

      The determinant is the product of the determinants of the blocks:

      \det(A) = \det(A_1) \cdot \det(A_2)
    10. Singular Matrix

      A matrix is singular (non-invertible) if and only if its determinant is zero:

      \det(A) = 0 \iff A \text{ is singular}
    11. Linearity in Rows and Columns

      The determinant is a linear function in each row and each column. 
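Several of these properties can be spot-checked numerically (A and B below are arbitrary non-singular examples):

```python
import numpy as np

A = np.array([[2.0, 1.0], [3.0, 4.0]])
B = np.array([[1.0, 2.0], [0.0, 5.0]])
n = A.shape[0]
det = np.linalg.det

print(np.isclose(det(A @ B), det(A) * det(B)))        # multiplicative property
print(np.isclose(det(A.T), det(A)))                   # transpose property
print(np.isclose(det(3 * A), 3**n * det(A)))          # scalar multiple: c^n det(A)
print(np.isclose(det(np.linalg.inv(A)), 1 / det(A)))  # inverse property
print(np.isclose(det(A[::-1]), -det(A)))              # row interchange flips the sign
```

A[::-1] reverses the rows, which for a 2 x 2 matrix is exactly one row swap.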

       

    VI. Eigenvalues and Eigenvectors.

    Given a square matrix A of dimension n×n:

    • Eigenvalue: A scalar \lambda is called an eigenvalue of A if there exists a non-zero vector \mathbf{v} (called an eigenvector) such that:

      A\mathbf{v} = \lambda\mathbf{v}
    • Eigenvector: A non-zero vector \mathbf{v} is called an eigenvector of A corresponding to the eigenvalue \lambda if it satisfies the above equation.

     Finding Eigenvalues and Eigenvectors

    To find the eigenvalues and eigenvectors of a matrix A:

    1. Eigenvalues: Solve the characteristic equation:

      \det(A - \lambda I) = 0

      This equation is derived from A\mathbf{v} = \lambda\mathbf{v} by rearranging it to:

      (A - \lambda I)\mathbf{v} = \mathbf{0}

      For non-trivial solutions (non-zero \mathbf{v}), the matrix A - \lambda I must be singular, which means its determinant must be zero.

    2. Eigenvectors: For each eigenvalue \lambda, solve the linear system:

      (A - \lambda I)\mathbf{v} = \mathbf{0}

      to find the corresponding eigenvector(s) \mathbf{v}.
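In NumPy both steps are handled by np.linalg.eig (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)  # eigenvectors are the *columns* of eigvecs
print(eigvals)                       # [5. 2.]  (roots of (4-l)(3-l) - 2 = 0)

# verify A v = lambda v for the first eigenpair
lam, v = eigvals[0], eigvecs[:, 0]
print(np.allclose(A @ v, lam * v))   # True

# properties 1 and 2 below: sum = trace, product = determinant
print(np.isclose(eigvals.sum(), np.trace(A)))        # True: 5 + 2 = 7
print(np.isclose(eigvals.prod(), np.linalg.det(A)))  # True: 5 * 2 = 10
```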

    Properties of Eigenvalues and Eigenvectors

    1. Sum of Eigenvalues: The sum of the eigenvalues of a matrix A is equal to the trace of A (the sum of the diagonal elements).

      \sum_{i=1}^{n} \lambda_i = \text{tr}(A)
    2. Product of Eigenvalues: The product of the eigenvalues of a matrix A is equal to the determinant of A.

      \prod_{i=1}^{n} \lambda_i = \det(A)
    3. Eigenvectors of Different Eigenvalues: Eigenvectors corresponding to distinct eigenvalues are linearly independent.

    4. Diagonalizability: A matrix A is diagonalizable if it has n linearly independent eigenvectors. In such cases, A can be written as:

      A = PDP^{-1}

      where P is the matrix of eigenvectors and D is the diagonal matrix of eigenvalues.

    5. Similarity Transformation: If A and B are similar matrices, they have the same eigenvalues.

    6. Power of a Matrix: If A is diagonalizable, then A^k can be expressed as:

      A^k = P D^k P^{-1}

      where D is the diagonal matrix with the eigenvalues of A on the diagonal, and D^k is obtained by simply raising each diagonal entry to the k-th power.

    Applications:

    1. Differential Equations: Eigenvalues and eigenvectors are used to solve systems of linear differential equations.
    2. Stability Analysis: In control theory, the stability of a system can be analyzed using the eigenvalues of the system matrix.
    3. Principal Component Analysis (PCA): In statistics, PCA uses eigenvalues and eigenvectors of the covariance matrix to reduce the dimensionality of data.
    4. Quantum Mechanics: Eigenvalues and eigenvectors are used to solve the Schrödinger equation.
    5. Vibration Analysis: In mechanical engineering, the natural frequencies (eigenvalues) and mode shapes (eigenvectors) of a system are analyzed.

     

    VII. Diagonalization of Matrices.  

    Diagonalization of matrices is a process by which a given square matrix A is decomposed into a product of three matrices: A = PDP^{-1}, where P is an invertible matrix whose columns are the eigenvectors of A, D is a diagonal matrix whose entries are the eigenvalues of A, and P^{-1} is the inverse of P.

    Here’s a step-by-step guide to diagonalizing a matrix:

    1. Find the Eigenvalues of A:

      • Solve the characteristic equation \det(A - \lambda I) = 0 for \lambda. The solutions \lambda_1, \lambda_2, \ldots, \lambda_n are the eigenvalues of A.
    2. Find the Eigenvectors of A:

      • For each eigenvalue \lambda, solve the equation (A - \lambda I)\mathbf{v} = \mathbf{0} for the eigenvector \mathbf{v}.
    3. Form the Matrix P:

      • Construct the matrix P using the eigenvectors as columns.
    4. Form the Diagonal Matrix D:

      • Construct the matrix D by placing the eigenvalues λi on the diagonal. The order of the eigenvalues in D should correspond to the order of the eigenvectors in P.
    5. Verify the Diagonalization:

      • Verify that A = PDP^{-1} by calculating both sides of the equation.
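A compact end-to-end check of these five steps (same illustrative matrix as in the eigenvalue sketch above):

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])

eigvals, P = np.linalg.eig(A)   # steps 1-3: eigenvalues, eigenvectors as columns of P
D = np.diag(eigvals)            # step 4: eigenvalues on the diagonal, same order as P

# step 5: verify A = P D P^-1
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True

# bonus: the power property A^k = P D^k P^-1
print(np.allclose(np.linalg.matrix_power(A, 3),
                  P @ np.diag(eigvals**3) @ np.linalg.inv(P)))  # True
```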
       

    VIII. Singular Value Decomposition (SVD).

    For a given m \times n matrix A, the SVD is given by:

    A = U \Sigma V^T

    where:

    • U is an m \times m orthogonal matrix whose columns are called the left singular vectors of A.
    • \Sigma is an m \times n diagonal matrix with non-negative real numbers on the diagonal, known as the singular values of A.
    • V is an n \times n orthogonal matrix whose columns are called the right singular vectors of A.

    Steps to Compute the SVD

    1. Compute A^T A and A A^T:

      • These are both symmetric matrices.
    2. Find the eigenvalues and eigenvectors:

      • Compute the eigenvalues and eigenvectors of A^T A. The eigenvalues are the squares of the singular values of A, and the eigenvectors form the columns of V.
      • Compute the eigenvalues and eigenvectors of A A^T. The eigenvectors form the columns of U.
    3. Construct \Sigma:

      • The singular values \sigma_i are the square roots of the eigenvalues of A^T A.
    4. Form the matrices U and V:

      • The columns of U are the eigenvectors of A A^T.
      • The columns of V are the eigenvectors of A^T A.
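np.linalg.svd performs the whole decomposition; the sketch below also confirms the relationship between the singular values and the eigenvalues of A A^T (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])   # a 2 x 3 matrix, so m = 2, n = 3

U, s, Vt = np.linalg.svd(A)        # s holds the singular values, descending
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))  # True: A = U Sigma V^T

# sigma_i^2 equals the (nonzero) eigenvalues of A A^T
eig_AAt = np.linalg.eigvalsh(A @ A.T)[::-1]  # eigvalsh returns ascending; reverse it
print(np.allclose(s**2, eig_AAt))            # True
```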

    Applications of SVD in Machine Learning:

    1. Reducing Dimensions in Data (in PCA - Principal Component Analysis)
    2. Compressing Images (see the truncated-SVD sketch after this list)
    3. Removing Noise
    4. Making Recommendation systems (like Netflix or Amazon)
    5. Understanding Text in Latent Semantic Analysis (LSA)
    6. Designing Control Systems like robots or aircraft
    7. Face Recognition
    8. Analyzing Genetic Datasets
    9. Compressing Data for Communication
    10. Preventing Overfitting in ML models.
    11. Quantum Computing
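For applications 1-2 in particular, the workhorse is the truncated SVD: keeping only the top-k singular values gives the best rank-k approximation of the data (Eckart-Young theorem). A minimal sketch, with a random matrix standing in for an image or data matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80))   # stand-in for an image / data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10                                     # keep the 10 largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # best rank-k approximation of A

print(A_k.shape)                   # (100, 80): same shape, but...
print(np.linalg.matrix_rank(A_k))  # 10: far less information to store
```

Storing U[:, :k], s[:k] and Vt[:k] takes k(m + n + 1) numbers instead of mn, which is where the compression comes from.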
     

