Mathematics for Artificial Intelligence: Discrete Mathematics

A simplified guide on how to prep up on Mathematics for Artificial Intelligence, Machine Learning and Data Science: Discrete Mathematics (Important Pointers only)

Module V : Discrete Mathematics.

I. Graphs and their Properties.

A graph $G$ is defined as $G = (V, E)$ , where $V$ is a set of nodes (also called vertices) and $E$ is a set of edges (also called links) connecting pairs of nodes.

Important Concepts:

1. Nodes (Vertices) $V$ :

Degree: The degree of a node is the number of edges connected to it.
- In-degree: In directed graphs, the in-degree of a node is the number of edges directed towards it.
- Out-degree: The out-degree of a node is the number of edges directed away from it.

2. Edges $E$ :

An edge $e$ is a pair $(u, v)$ where $u$ and $v$ are nodes in $V$ .
Types of edges:
- Directed edge: An edge with a direction, represented as $(u \to v)$ .
- Undirected edge: An edge without a direction, represented as $\{u, v\}$ .
Weight: A value assigned to an edge, representing cost, distance, or any other metric.

3. Graph Types:

Undirected Graph: All edges are undirected.
Directed Graph (Digraph): All edges have a direction.
Weighted Graph: Edges have weights.
Unweighted Graph: Edges do not have weights.
Simple Graph: No loops (edges connecting a node to itself) and no multiple edges between two nodes.
Multigraph: Multiple edges between the same set of nodes are allowed.
Complete Graph: Every pair of distinct nodes is connected by a unique edge.
Bipartite Graph: Nodes can be divided into two disjoint sets such that every edge connects a node in one set to a node in the other.

4. Special Types of Graphs:

Tree: A connected, acyclic undirected graph.
Forest: A disjoint set of trees.
Cycle: A graph that consists of a single cycle.
Clique: A subset of vertices such that every two distinct vertices are adjacent.

5. Connectivity:

A graph is connected if there is a path between any pair of nodes.
A graph is strongly connected (in directed graphs) if there is a directed path from any node to every other node.

6. Paths and Cycles:

Path: A sequence of edges that connect a sequence of vertices.
Cycle: A path that starts and ends at the same vertex without repeating any edges or vertices.

7. Subgraph:

A graph formed from a subset of the nodes and edges of another graph.

8. Graph Representations:

Adjacency Matrix: A 2D array where $A[i][j]$ represents the presence (and possibly the weight) of an edge between nodes $i$ and $j$ .
Adjacency List: An array or list where each element is a list of nodes adjacent to a given node.

II. Graph Traversal Algorithms.

These are techniques for visiting all the nodes and edges of a graph.

1. Breadth-First Search (BFS)

An algorithm which starts at a given node (often called the 'source' or 'root' node) and explores all of the neighbor nodes at the present depth prior to moving on to nodes at the next depth level.

Algorithm Flow:

Initialize: Start from the source node. Create a queue and enqueue the source node. Also, create a set or array to keep track of visited nodes.

Dequeue and Process: Dequeue a node from the queue. Process the node (e.g., print it, check for a condition, etc.).

Enqueue Neighbors: Enqueue all unvisited neighbor nodes of the dequeued node and mark them as visited.

Repeat: Repeat steps 2 and 3 until the queue is empty.

Sample Code : (in python)

def bfs(graph, start):
    visited = set()
    queue = []

    queue.append(start)
    visited.add(start)

    while queue:
        node = queue.pop(0)
        process(node) # Replace with actual processing logic

        for neighbor in graph[node]:
            if neighbor not in visited:
                queue.append(neighbor)
                visited.add(neighbor)

Important Terms:

Time Complexity:

O(V + E)

, where

V

is the number of vertices and

E

is the number of edges.

Space Complexity:

O(V)

for the queue and visited set/array.

2. Depth-First Search (DFS)

An algorithm which starts at the root (or an arbitrary node in the case of a graph) and explores as far as possible along each branch before backtracking.

Algorithm Flow :

Initialize: Start from the source node. Create a stack (or use recursion) and push the source node. Also, create a set or array to keep track of visited nodes.

Push and Process: Push a node onto the stack. Process the node (e.g., print it, check for a condition, etc.).

Explore Neighbors: For each unvisited neighbor, push it onto the stack and mark it as visited.

Backtrack: If no unvisited neighbors are left, backtrack by popping nodes from the stack.

Repeat: Repeat steps 2 to 4 until the stack is empty.

Important Terms :

Time Complexity:

O(V + E)

Space Complexity:

O(V)

for the stack and visited set/array. In the case of recursion, the call stack might also consume

O(V)

space.

Sample Code : (in python) --> Using recursion

def dfs_recursive(graph, node, visited):
    visited.add(node)
    process(node) # Replace with actual processing logic

    for neighbor in graph[node]:
        if neighbor not in visited:
            dfs_recursive(graph, neighbor, visited)

# To call the recursive DFS
visited = set()
dfs_recursive(graph, start_node, visited)

III. Shortest Path Algorithms.

To find the shortest path between nodes in a graph.

1. Dijkstra's Algorithm

To find the shortest path from a single source node to all other nodes in a graph with non-negative edge weights.

Algorithm Flow:

Initialize:
- Set the distance to the source node to 0 and the distance to all other nodes to infinity.
- Create a priority queue (or min-heap) and insert the source node with a distance of 0.
- Create a set to track visited nodes.
Process Nodes:
- Extract the node with the smallest distance from the priority queue.
- For each neighbor of the extracted node, calculate the tentative distance through the extracted node. If this distance is smaller than the current known distance to the neighbor, update the distance and insert the neighbor into the priority queue.
- Mark the extracted node as visited.
Repeat: Repeat step 2 until the priority queue is empty.

Important Features:

Time Complexity: $O(V^2)$ for the simple implementation, $O((V + E) \log V)$ with a priority queue.
Space Complexity: $O(V)$ for storing distances and the priority queue.
Limitations: Works only with graphs that have non-negative edge weights.

Sample Code : (in python)

import heapq

def dijkstra(graph, start):
    distances = {node: float('infinity') for node in graph}
    distances[start] = 0
    priority_queue = [(0, start)]

    while priority_queue:
        current_distance, current_node = heapq.heappop(priority_queue)

        if current_distance > distances[current_node]:
            continue

        for neighbor, weight in graph[current_node]:
            distance = current_distance + weight

            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(priority_queue, (distance, neighbor))

    return distances

2. Bellman-Ford Algorithm

To find the shortest path from a single source node to all other nodes in a graph, even with negative edge weights. It can also detect negative weight cycles.

Algorithm Flow :

Initialize:
- Set the distance to the source node to 0 and the distance to all other nodes to infinity.
Relax Edges:
- For each edge, if the distance to the destination node can be shortened by taking the edge, update the distance.
- Repeat this process $V - 1$ times, where $V$ is the number of vertices.
Check for Negative Weight Cycles:
- Check all edges again to see if any distance can be further shortened. If so, a negative weight cycle exists.

Important Features :

Time Complexity: $O(V \times E)$ .
Space Complexity: $O(V)$ for storing distances.
Limitations: Slower compared to Dijkstra's algorithm for graphs with non-negative weights.

Sample Code : (in python)

def bellman_ford(graph, start):
    distances = {node: float('infinity') for node in graph}
    distances[start] = 0

    for _ in range(len(graph) - 1):
        for node in graph:
            for neighbor, weight in graph[node]:
                if distances[node] + weight < distances[neighbor]:
                    distances[neighbor] = distances[node] + weight

    for node in graph:
        for neighbor, weight in graph[node]:
            if distances[node] + weight < distances[neighbor]:
                raise ValueError("Graph contains a negative weight cycle")

    return distances

IV. Minimum Spanning Tree Algorithms. (MST)

To find a subset of edges in a connected, undirected graph that connects all the vertices together without any cycles and with the minimum possible total edge weight.

1. Prim's Algorithm

It builds the MST by starting with an arbitrary node and growing the MST one edge at a time by adding the cheapest possible edge that expands the tree.

Algorithm Flow:

Initialize:
- Start with an arbitrary node.
- Initialize a priority queue to keep track of the edges with the smallest weights.
- Create a set to keep track of nodes that are included in the MST.
Expand MST:
- Add the starting node to the MST set.
- While the priority queue is not empty and the MST set does not include all nodes:
  - Extract the edge with the smallest weight from the priority queue.
  - If the edge connects a node not already in the MST set, add it to the MST set and add all edges from this node to the priority queue.
Repeat: Repeat until all nodes are included in the MST set.

Important Features:

Time Complexity: $O(E \log V)$ with a priority queue (binary heap).
Space Complexity: $O(V + E)$ for the priority queue and MST set.
Works Well: For dense graphs due to its priority queue implementation.

Sample Code : (in python)

import heapq

def prim(graph, start):
    mst = []
    visited = set()
    min_heap = [(0, start, None)] # (weight, node, parent)

    while min_heap:
        weight, node, parent = heapq.heappop(min_heap)

        if node in visited:
            continue

        visited.add(node)

        if parent is not None:
            mst.append((parent, node, weight))

        for neighbor, edge_weight in graph[node]:
            if neighbor not in visited:
                heapq.heappush(min_heap, (edge_weight, neighbor, node))

    return mst

2. Kruskal's Algorithm

It builds the MST by sorting all edges and adding them one by one to the MST set, ensuring no cycles are formed. It uses the Union-Find data structure to efficiently manage and merge disjoint sets.

Algorithm Flow:

Sort Edges: Sort all edges in the graph by their weight in non-decreasing order.
Initialize Union-Find: Initialize a Union-Find data structure to keep track of connected components.
Add Edges: Iterate through the sorted edges and for each edge:
- If the edge connects two nodes that are not already connected (i.e., they belong to different sets), add the edge to the MST and union the sets of the two nodes.
Repeat: Repeat until the MST contains $V - 1$ edges (where $V$ is the number of vertices).

Important Features:

Time Complexity: $O(E \log E)$ or $O(E \log V)$ with Union-Find.
Space Complexity: $O(V + E)$ for the Union-Find data structure and the MST set.
Works Well: For sparse graphs as it processes edges individually.

Sample Code : (in python)

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, u):
        if self.parent[u] != u:
            self.parent[u] = self.find(self.parent[u])
        return self.parent[u]

    def union(self, u, v):
        root_u = self.find(u)
        root_v = self.find(v)
        if root_u != root_v:
            if self.rank[root_u] > self.rank[root_v]:
                self.parent[root_v] = root_u
            elif self.rank[root_u] < self.rank[root_v]:
                self.parent[root_u] = root_v
            else:
                self.parent[root_v] = root_u
                self.rank[root_u] += 1

def kruskal(graph):
    edges = sorted(graph['edges'], key=lambda edge: edge[2]) # Sort by weight
    uf = UnionFind(graph['num_vertices'])
    mst = []

    for u, v, weight in edges:
        if uf.find(u) != uf.find(v):
            uf.union(u, v)
            mst.append((u, v, weight))

    return mst

V. Combinatorics.

The branch of mathematics focused on counting, arrangement, and combination of objects. Two fundamental concepts in combinatorics are permutations and combinations.

1. Permutations

Permutations are arrangements of objects in a specific order. The order of objects is significant in permutations.

The number of permutations of $n$ distinct objects is $n!$ (n factorial).

P(n) = n! = n \times (n-1) \times (n-2) \times \ldots \times 1

Permutations of $n$ objects taken $r$ at a time: When choosing

r

objects from

n

distinct objects and arranging them, the number of permutations is given by:

P(n, r) = \frac{n!}{(n-r)!}

Combinations

Combinations are selections of objects where the order does not matter.

The number of combinations of

n

distinct objects taken

r

at a time is given by:

C(n, r) = \binom{n}{r} = \frac{n!}{r!(n-r)!}

Permutations: Used when order matters.

P(n, r) = \frac{n!}{(n-r)!}

Combinations: Used when order does not matter.

C (n, r) = (\binom{n}{r}) = \frac{n!}{r! (n - r)!}

VI. Pigeonhole Principle.

This combines concepts of combinatorics and mathematics. It states that if you have more pigeons than pigeonholes and you try to place each pigeon into a hole, at least one pigeonhole must contain more than one pigeon.

Statement

If $n$ items are put into $m$ containers, with $n > m$ , then at least one container must contain more than one item.

The Pigeonhole Principle can be used in a variety of problems in mathematics and computer science, such as proving the existence of certain properties, solving problems in number theory, and addressing issues in data structures.

Proof by Contradiction Using Pigeonhole Principle

Eg: Prove that among any 51 positive integers not exceeding 100, there are two that are relatively prime (i.e., their greatest common divisor is 1).

Proof:
Consider the set of integers from 1 to 100.
Group these integers into pairs: (1,2), (3,4), (5,6), ..., (99,100). There are 50 such pairs.
If we select 51 integers from these 100 integers, by the Pigeonhole Principle, at least one pair must be selected.
Each pair (a, a+1) consists of two consecutive integers. Consecutive integers are always relatively prime.
Therefore, in any selection of 51 integers from 1 to 100, there must be at least one pair of consecutive integers, which are relatively prime.

VII. Recurrence Relations.

A recurrence relation is an equation that recursively defines a sequence, meaning each term of the sequence is defined as a function of its preceding terms.

Types of Recurrence Relations

Linear Recurrence Relations: A recurrence relation is linear if each term is a linear combination of its previous terms.
Example: $a_n = c_1 a_{n-1} + c_2 a_{n-2} + \ldots + c_k a_{n-k} + f(n)$
Homogeneous: $a_{n} = c_{1} a_{n - 1} + c_{2} a_{n - 2} + \dots + c_{k} a_{n - k}$
Non-Homogeneous: $a_n = c_1 a_{n-1} + c_2 a_{n-2} + \ldots + c_k a_{n-k} + f(n)$
Non-Linear Recurrence Relations: The recurrence relation is non-linear if it involves non-linear combinations of previous terms.
Example: $a_{n} = a_{n - 1}^{2} + a_{n - 2}$

Eg: Solving Recurrence Relations

Geometric Sequence:

Recurrence Relation:

a_n = r a_{n-1}

with initial condition

a_{0} = a

Solution:

a_{n} = a \cdot r^{n}

Solving Recurrence Relations Using Generating Functions

Generating functions are a powerful tool for solving recurrence relations, especially when the characteristic equation method is difficult to apply.

Define the Generating Function:
$A(x) = \sum_{n=0}^{\infty} a_n x^n$
Transform the Recurrence Relation:
Multiply the recurrence relation by $x^n$ and sum over all $n$ .
Solve for the Generating Function:
Rearrange terms to find $A(x)$ .
Extract Coefficients:
Use partial fractions or other techniques to find the coefficients of $x^n$ .

Example: Solve $a_n = a_{n-1} + 2^n$ with $a_0 = 0$ .

Define the generating function: $A (x) = \sum_{n = 0}^{\infty} a_{n} x^{n}$
Transform the recurrence relation:
$A(x) - a_0 = x (A(x) - a_0) + \sum_{n=0}^{\infty} 2^n x^n$
Since $a_{0} = 0$ , this simplifies to:
$A(x) = x A(x) + \sum_{n=0}^{\infty} (2x)^n = x A(x) + \frac{1}{1-2x}$
Solve for $A(x)$ :
$A(x) - x A(x) = \frac{1}{1-2x}$ $A(x) (1 - x) = \frac{1}{1-2x}$ $A(x) = \frac{1}{(1-x)(1-2x)}$
Use partial fractions to decompose $A(x)$ :
$A(x) = \frac{1}{(1-x)(1-2x)} = \frac{A}{1-x} + \frac{B}{1-2x}$
Solving for $A$ and $B$ :
$1 = A(1-2x) + B(1-x)$
Setting $x = 0$ :
$1 = A \Rightarrow A = 1$
Setting $x = 1$ :
$1 = B(1 - 1) + 1(-1) \Rightarrow B = 1$
Thus:
$A(x) = \frac{1}{1-x} + \frac{1}{1-2x}$
The generating function $A(x)$ corresponds to:
$A(x) = \sum_{n=0}^{\infty} x^n + \sum_{n=0}^{\infty} (2x)^n$
Extract coefficients:
$a_n = 1 + 2^n$

So, the solution to the recurrence relation $a_n = a_{n-1} + 2^n$ with $a_0 = 0$ is $a_n = 1 + 2^n$ .

technotes.

Search This Blog