Other Key Data Structures
This chapter delves into two fundamental data structures: hash tables and graph representations. A thorough understanding of their principles, implementation details, and complexity analysis is critical for designing efficient algorithms and is a recurring focus in CMI examinations.
---
Chapter Contents
| # | Topic |
|---|-------|
| 1 | Hash Tables |
| 2 | Graph Representations |

---
We begin with Hash Tables.
Part 1: Hash Tables
Hash tables are fundamental data structures that store key-value pairs, providing efficient average-case performance for insertion, deletion, and search operations. We use hash functions to map keys to indices within an array, enabling direct access.
---
Core Concepts
1. Hash Functions
A hash function $h$ maps a key $k$ to an index $h(k)$ in a hash table. A good hash function is deterministic, efficiently computable, and distributes keys uniformly across the table to minimize collisions.
Worked Example: Division Method
Consider a hash table of size $m = 7$. We want to insert keys $10, 22, 31, 4, 15$ using $h(k) = k \bmod 7$.
Step 1: Calculate hash for key $10$.
> $h(10) = 10 \bmod 7 = 3$
Step 2: Calculate hash for key $22$.
> $h(22) = 22 \bmod 7 = 1$
Step 3: Calculate hash for key $31$.
> $h(31) = 31 \bmod 7 = 3$
Step 4: Calculate hash for key $4$.
> $h(4) = 4 \bmod 7 = 4$
Step 5: Calculate hash for key $15$.
> $h(15) = 15 \bmod 7 = 1$
Answer: The keys $10, 22, 31, 4, 15$ are mapped to indices $3, 1, 3, 4, 1$ respectively. Note that keys $10$ and $31$ share index $3$: a collision, which motivates the resolution strategies below.
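The division method can be sketched in a couple of lines of Python. The function name and the sample keys/table size here are illustrative choices for this sketch, not part of the method itself:

```python
def division_hash(key: int, m: int) -> int:
    """Map an integer key to a bucket index using the division method."""
    return key % m

# Indices for a handful of sample keys with a table of size 7.
example_indices = [division_hash(k, 7) for k in (10, 22, 31, 4, 15)]
```

Note that any two keys that differ by a multiple of $m$ land in the same bucket, which is why the choice of $m$ matters.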
:::question type="MCQ" question="Given a hash table of size $m = 7$, which of the following keys will collide with key $50$ using the division method hash function $h(k) = k \bmod 7$?" options=["22","23","46","59"] answer="22" hint="Calculate $k \bmod 7$ for each option and compare with $h(50)$." solution="Step 1: Calculate the hash for the given key $50$.
> $h(50) = 50 \bmod 7 = 1$
Step 2: Calculate the hash for each option.
> $h(22) = 22 \bmod 7 = 1$
> $h(23) = 23 \bmod 7 = 2$
> $h(46) = 46 \bmod 7 = 4$
> $h(59) = 59 \bmod 7 = 3$
Step 3: Identify the key that hashes to the same index as $50$. Key $22$ hashes to index $1$, which is the same as $h(50) = 1$, so $22$ collides with $50$.
"
:::
Worked Example: Multiplication Method
Consider a hash table of size $m = 10$. We want to insert key $k = 45$. Use $A = 0.618$ (close to $(\sqrt{5} - 1)/2$).
Step 1: Calculate $kA$.
> $45 \times 0.618 = 27.81$
Step 2: Calculate $kA \bmod 1$. This is the fractional part.
> $27.81 \bmod 1 = 0.81$
Step 3: Multiply by $m$ and take the floor.
> $\lfloor 10 \times 0.81 \rfloor = \lfloor 8.1 \rfloor = 8$
Answer: The key $45$ is mapped to index $8$.
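The steps above translate directly to code. A minimal Python sketch (function name ours; `A = 0.618` is the conventional choice near $(\sqrt{5}-1)/2$):

```python
import math

def multiplication_hash(key: int, m: int, A: float = 0.618) -> int:
    """Multiplication method: h(k) = floor(m * frac(k * A))."""
    frac = (key * A) % 1.0   # fractional part of k * A
    return math.floor(m * frac)
```

A nice property of this method is that $m$ need not be prime; the fractional part scrambles the key regardless of the table size.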
:::question type="NAT" question="Using the multiplication method with table size $m = 100$ and constant $A = 0.618$, what is the hash index for the key $k = 2024$?" answer="83" hint="Follow the formula $h(k) = \lfloor m \cdot (kA \bmod 1) \rfloor$. Calculate $kA$, then its fractional part, then multiply by $m$ and take the floor." solution="Step 1: Calculate $kA$.
> $2024 \times 0.618 = 1250.832$
Step 2: Calculate $kA \bmod 1$. This is the fractional part of $1250.832$.
> $1250.832 \bmod 1 = 0.832$
Step 3: Multiply by $m$ and take the floor.
> $\lfloor 100 \times 0.832 \rfloor = \lfloor 83.2 \rfloor = 83$
"
:::
---
2. Collision Resolution
When two different keys hash to the same index, a collision occurs. Collision resolution strategies determine how to handle these situations.
2.1. Separate Chaining
Each slot in the hash table points to a linked list (or other dynamic data structure) that contains all keys hashing to that slot.
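A minimal separate-chaining table can be sketched in Python using plain lists as the chains. The class name and the restriction to integer keys are simplifying assumptions for this sketch:

```python
class ChainedHashTable:
    """Minimal hash table with separate chaining; each slot is a Python list."""

    def __init__(self, size: int = 7):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key: int) -> int:
        return key % self.size           # division-method hash

    def insert(self, key: int) -> None:
        bucket = self.buckets[self._index(key)]
        if key not in bucket:            # keys are unique; skip duplicates
            bucket.append(key)

    def contains(self, key: int) -> bool:
        # Search only the one chain the key can live in.
        return key in self.buckets[self._index(key)]
```

Search cost is proportional to the length of one chain, which is why a uniform hash function keeps operations close to constant time on average.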
Worked Example: Separate Chaining
Insert keys $16, 12, 23, 10, 4, 30$ into a hash table of size $m = 7$ using the division method $h(k) = k \bmod 7$.
Step 1: Insert $16$. $h(16) = 16 \bmod 7 = 2$. Place $16$ at index $2$.
Step 2: Insert $12$. $h(12) = 12 \bmod 7 = 5$. Place $12$ at index $5$.
Step 3: Insert $23$. $h(23) = 23 \bmod 7 = 2$. Collision at index $2$. Append $23$ to the linked list at index $2$.
Step 4: Insert $10$. $h(10) = 10 \bmod 7 = 3$. Place $10$ at index $3$.
Step 5: Insert $4$. $h(4) = 4 \bmod 7 = 4$. Place $4$ at index $4$.
Step 6: Insert $30$. $h(30) = 30 \bmod 7 = 2$. Collision at index $2$. Append $30$ to the linked list.
Final table:
Index 0: (empty)
Index 1: (empty)
Index 2: 16 → 23 → 30
Index 3: 10
Index 4: 4
Index 5: 12
Index 6: (empty)
Answer: The final state of the hash table with separate chaining is shown above; all keys colliding at index $2$ are chained in a single list.
:::question type="MCQ" question="A hash table of size $m = 4$ uses separate chaining and the hash function $h(k) = k \bmod 4$. If keys $2, 6, 10, 14, 18$ are inserted in that order, which bucket will contain the most elements?" options=["Bucket 0","Bucket 1","Bucket 2","Bucket 3"] answer="Bucket 2" hint="Calculate $h(k)$ for each key and observe which index accumulates the most." solution="Step 1: Calculate hash values for each key:
> $h(2) = 2 \bmod 4 = 2$
> $h(6) = 6 \bmod 4 = 2$
> $h(10) = 10 \bmod 4 = 2$
> $h(14) = 14 \bmod 4 = 2$
> $h(18) = 18 \bmod 4 = 2$
Step 2: All keys hash to Bucket 2.
Step 3: Therefore, Bucket 2 will contain all 5 elements, which is the most.
"
:::
---
2.2. Open Addressing
All elements are stored directly in the hash table itself. When a collision occurs, we systematically probe for an empty slot. The probe sequence is defined by a probe function $h(k, i)$, where $i$ is the probe number (starting from $0$).
2.2.1. Linear Probing
The probe sequence is linear: $h(k, i) = (h'(k) + i) \bmod m$, where $h'$ is an auxiliary hash function. This leads to primary clustering.
Worked Example: Linear Probing
Insert keys $19, 27, 37, 45, 17$ into a hash table of size $m = 10$ using $h'(k) = k \bmod 10$ and linear probing.
Step 1: Insert $19$. $h(19, 0) = 9$. Place at index $9$.
Step 2: Insert $27$. $h(27, 0) = 7$. Place at index $7$.
Step 3: Insert $37$. $h(37, 0) = 7$. Collision at index $7$.
Try $i = 1$: $h(37, 1) = 8$. Slot $8$ is empty. Place at index $8$.
Step 4: Insert $45$. $h(45, 0) = 5$. Slot $5$ is empty. Place at index $5$.
Step 5: Insert $17$. $h(17, 0) = 7$. Collision at index $7$.
Try $i = 1$: $h(17, 1) = 8$. Collision at index $8$.
Try $i = 2$: $h(17, 2) = 9$. Collision at index $9$.
Try $i = 3$: $h(17, 3) = 0$. Slot $0$ is empty. Place at index $0$.
Final table: index $0$: $17$, index $5$: $45$, index $7$: $27$, index $8$: $37$, index $9$: $19$.
Answer: The final state of the hash table using linear probing.
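The probing loop above is easy to mistranslate, so here is a short Python sketch (function name ours; `None` marks an empty slot, and the table is assumed not to be full on entry):

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key into an open-addressing table using linear probing.

    Returns the index where the key was placed."""
    m = len(table)
    home = key % m                       # h'(k)
    for i in range(m):                   # probe h(k, i) = (h'(k) + i) mod m
        idx = (home + i) % m
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("hash table is full")
```

The wrap-around `% m` in the probe is the step most often forgotten; without it, keys hashing near the end of the table cannot be placed.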
:::question type="MCQ" question="A hash table of size $m = 10$ uses linear probing and the hash function $h'(k) = k \bmod 10$. If keys $13, 23, 33, 43$ are inserted in that order, what is the final position of key $43$?" options=["3","4","5","6"] answer="6" hint="Trace each insertion step. When a collision occurs, increment the probe counter $i$ and re-calculate the index using $h(k, i) = (h'(k) + i) \bmod 10$." solution="Step 1: Initialize an empty table of size 10.
Step 2: Insert $13$. $h(13, 0) = 3$. Position $3$ is empty.
Step 3: Insert $23$. $h(23, 0) = 3$. Position $3$ is occupied.
Probe $i = 1$: $h(23, 1) = 4$. Position $4$ is empty.
Step 4: Insert $33$. $h(33, 0) = 3$. Position $3$ is occupied.
Probe $i = 1$: $h(33, 1) = 4$. Position $4$ is occupied.
Probe $i = 2$: $h(33, 2) = 5$. Position $5$ is empty.
Step 5: Insert $43$. $h(43, 0) = 3$. Position $3$ is occupied.
Probe $i = 1$: $h(43, 1) = 4$. Position $4$ is occupied.
Probe $i = 2$: $h(43, 2) = 5$. Position $5$ is occupied.
Probe $i = 3$: $h(43, 3) = 6$. Position $6$ is empty.
Step 6: The final position of key $43$ is $6$.
"
:::
---
2.2.2. Quadratic Probing
The probe sequence is quadratic: $h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$. This helps to reduce primary clustering but can suffer from secondary clustering.
Worked Example: Quadratic Probing
Insert keys $22, 1, 12, 7, 23$ into a hash table of size $m = 11$ using $h'(k) = k \bmod 11$ and quadratic probing with $c_1 = 0, c_2 = 1$ (i.e., $h(k, i) = (h'(k) + i^2) \bmod 11$).
Step 1: Insert $22$. $h(22, 0) = 0$. Place at index $0$.
Step 2: Insert $1$. $h(1, 0) = 1$. Place at index $1$.
Step 3: Insert $12$. $h(12, 0) = 1$. Collision at index $1$.
Try $i = 1$: $h(12, 1) = (1 + 1) \bmod 11 = 2$. Slot $2$ is empty. Place at index $2$.
Step 4: Insert $7$. $h(7, 0) = 7$. Slot $7$ is empty. Place at index $7$.
Step 5: Insert $23$. $h(23, 0) = 1$. Collision at index $1$.
Try $i = 1$: $h(23, 1) = 2$. Collision at index $2$.
Try $i = 2$: $h(23, 2) = (1 + 4) \bmod 11 = 5$. Slot $5$ is empty. Place at index $5$.
Final table: index $0$: $22$, index $1$: $1$, index $2$: $12$, index $5$: $23$, index $7$: $7$.
Answer: The final state of the hash table using quadratic probing.
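Changing the probe step from $i$ to $i^2$ is a one-line difference in code. A sketch with $c_1 = 0, c_2 = 1$ (function name ours; `None` marks empty slots):

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key using quadratic probing h(k, i) = (k mod m + i^2) mod m."""
    m = len(table)
    home = key % m
    for i in range(m):
        idx = (home + i * i) % m         # quadratic step
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("no free slot found in probe sequence")
```

Unlike linear probing, this loop can fail to find an empty slot even when one exists; choosing a prime $m$ and keeping the load factor below $1/2$ guarantees the first $\lceil m/2 \rceil$ probes are distinct.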
:::question type="NAT" question="A hash table of size $m = 11$ (prime) uses quadratic probing with $c_1 = 0$ and $c_2 = 1$, and $h'(k) = k \bmod 11$. If keys $12, 23, 34$ are inserted in that order, what is the final position of key $34$?" answer="5" hint="Trace each insertion. For collisions, use $h(k, i) = (h'(k) + i^2) \bmod 11$." solution="Step 1: Initialize an empty table of size 11.
Step 2: Insert $12$. $h(12, 0) = 1$. Position $1$ is empty.
Step 3: Insert $23$. $h(23, 0) = 1$. Position $1$ is occupied.
Probe $i = 1$: $h(23, 1) = 2$. Position $2$ is empty.
Step 4: Insert $34$. $h(34, 0) = 1$. Position $1$ is occupied.
Probe $i = 1$: $h(34, 1) = 2$. Position $2$ is occupied.
Probe $i = 2$: $h(34, 2) = (1 + 4) \bmod 11 = 5$. Position $5$ is empty.
Step 5: The final position of key $34$ is $5$.
"
:::
---
2.2.3. Double Hashing
Uses two auxiliary hash functions, $h_1$ and $h_2$, to compute the probe sequence $h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$. This effectively eliminates both primary and secondary clustering, since the step size itself depends on the key.
Worked Example: Double Hashing
Insert keys $15, 16, 22, 31, 12$ into a hash table of size $m = 7$ using $h_1(k) = k \bmod 7$ and $h_2(k) = 5 - (k \bmod 5)$.
Step 1: Insert $15$. $h_1(15) = 1$. Place at index $1$.
Step 2: Insert $16$. $h_1(16) = 2$. Place at index $2$.
Step 3: Insert $22$. $h_1(22) = 1$. Collision at index $1$.
Calculate $h_2(22) = 5 - (22 \bmod 5) = 5 - 2 = 3$.
Try $i = 1$: $(1 + 1 \cdot 3) \bmod 7 = 4$. Slot $4$ is empty. Place at index $4$.
Step 4: Insert $31$. $h_1(31) = 3$. Slot $3$ is empty. Place at index $3$.
Step 5: Insert $12$. $h_1(12) = 5$. Slot $5$ is empty. Place at index $5$.
Final table: index $1$: $15$, index $2$: $16$, index $3$: $31$, index $4$: $22$, index $5$: $12$.
Answer: The final state of the hash table using double hashing.
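A Python sketch of double hashing, using the same pair of hash functions as the worked example ($h_1(k) = k \bmod m$, $h_2(k) = 5 - (k \bmod 5)$, which is always in $\{1, \ldots, 5\}$ and hence never zero). The function name and the `None`-for-empty convention are ours:

```python
def double_hash_insert(table: list, key: int) -> int:
    """Insert key using double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m."""
    m = len(table)
    h1 = key % m
    h2 = 5 - (key % 5)                   # step size; never 0 by construction
    for i in range(m):
        idx = (h1 + i * h2) % m
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("no free slot found in probe sequence")
```

The crucial design rule is that $h_2(k)$ must never be $0$ (or the probe would never move) and should be relatively prime to $m$ so the sequence visits every slot.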
:::question type="MCQ" question="A hash table of size $m = 7$ uses double hashing with $h_1(k) = k \bmod 7$ and $h_2(k) = 5 - (k \bmod 5)$. If keys $10, 17, 24$ are inserted in that order, what is the final position of key $24$?" options=["0","1","2","4"] answer="4" hint="Calculate $h_1(k)$ and $h_2(k)$ for each key. For collisions, use $h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod 7$." solution="Step 1: Initialize an empty table of size 7.
Step 2: Insert $10$.
$h_1(10) = 10 \bmod 7 = 3$. Position $3$ is empty.
Step 3: Insert $17$.
$h_1(17) = 3$. Position $3$ is occupied.
$h_2(17) = 5 - (17 \bmod 5) = 3$.
Probe $i = 1$: $(3 + 1 \cdot 3) \bmod 7 = 6$. Position $6$ is empty.
Step 4: Insert $24$.
$h_1(24) = 3$. Position $3$ is occupied.
$h_2(24) = 5 - (24 \bmod 5) = 1$.
Probe $i = 1$: $(3 + 1 \cdot 1) \bmod 7 = 4$. Position $4$ is empty.
Step 5: The final position of key $24$ is $4$.
"
:::
---
3. Load Factor
The load factor $\alpha = n/m$ of a hash table is the ratio of the number of stored elements $n$ to the table size $m$. It indicates how full the hash table is and directly impacts performance.
Worked Example: Load Factor and Resizing
A hash table of size $m = 10$ currently holds $n = 7$ elements. We want to insert two more elements. What is the load factor after insertion, and should the table be resized if the threshold is $0.75$?
Step 1: Calculate initial load factor.
> $\alpha = 7 / 10 = 0.7$
Step 2: Determine number of elements after insertion.
> $n = 7 + 2 = 9$
Step 3: Calculate new load factor.
> $\alpha = 9 / 10 = 0.9$
Step 4: Compare with threshold.
Since $0.9 > 0.75$, the table should be resized.
Answer: The new load factor is $0.9$. The table should be resized.
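The resize check is a one-line computation; a Python sketch (function names and the $0.75$ default threshold are our illustrative choices):

```python
def load_factor(n: int, m: int) -> float:
    """Load factor alpha = n / m for n elements in a table of size m."""
    return n / m

def needs_resize(n: int, m: int, threshold: float = 0.75) -> bool:
    """True when the load factor exceeds the resize threshold."""
    return load_factor(n, m) > threshold
```

In practice the resize itself roughly doubles $m$ (often to the next prime) and re-inserts every element under the new size.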
:::question type="NAT" question="A hash table with separate chaining has a size of $m = 13$. After inserting $n = 20$ elements, what is the load factor? (Provide answer as a decimal rounded to two places)." answer="1.54" hint="The load factor is $\alpha = n/m$. For separate chaining, it can be greater than 1." solution="Step 1: Identify the number of elements ($n = 20$) and table size ($m = 13$).
Step 2: Calculate the load factor $\alpha = n/m$.
> $\alpha = 20 / 13 = 1.5384\ldots$
Step 3: Round to two decimal places.
> $\alpha \approx 1.54$
"
:::
---
4. Universal Hashing
A universal family of hash functions ensures that for any two distinct keys, the probability of collision is at most $1/m$, where $m$ is the table size. This guarantees good average-case performance even against an adversary choosing keys.
A family $H$ of hash functions is universal if for any two distinct keys $x, y \in U$ (the universe of keys), the number of hash functions $h \in H$ for which $h(x) = h(y)$ is at most $|H|/m$.
Worked Example: Constructing a Universal Hash Function
Consider a prime number $p$ larger than any possible key, and a hash table of size $m$. A common universal family is $h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$, where $a \in \{1, \ldots, p-1\}$ and $b \in \{0, \ldots, p-1\}$.
Step 1: Choose $p = 97$ (a prime number) and $m = 10$. We want to find a hash function for key $k = 45$.
Let's arbitrarily pick $a = 3, b = 4$.
Step 2: Calculate the inner modulo operation.
> $ak + b = 3 \times 45 + 4 = 139$
> $139 \bmod 97 = 42$
Step 3: Calculate the outer modulo operation.
> $42 \bmod 10 = 2$
Answer: The hash function $h_{3,4}$ maps key $45$ to index $2$.
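The family $h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$ can be sketched in Python by drawing $a$ and $b$ at random at construction time. The function name is ours, and $p$ is assumed to be a prime larger than every key:

```python
import random

def make_universal_hash(p: int, m: int, a=None, b=None):
    """Return h_{a,b}(k) = ((a*k + b) mod p) mod m.

    When a, b are not supplied, draw them uniformly at random,
    which is what gives the universal collision guarantee."""
    if a is None:
        a = random.randrange(1, p)       # a in {1, ..., p-1}
    if b is None:
        b = random.randrange(0, p)       # b in {0, ..., p-1}
    return lambda k: ((a * k + b) % p) % m
```

The random draw is the whole point: an adversary who knows the family but not the chosen $(a, b)$ cannot construct a pathological key set in advance.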
:::question type="MCQ" question="Which of the following is a key characteristic of a universal hash family?" options=["It guarantees no collisions for any set of keys.","It ensures that the worst-case lookup time is $O(1)$.","It provides a probabilistic guarantee on collision frequency, independent of the input distribution.","It requires all keys to be prime numbers."] answer="It provides a probabilistic guarantee on collision frequency, independent of the input distribution." hint="Universal hashing aims to minimize the probability of collisions on average, not eliminate them. It provides resilience against malicious inputs." solution="Step 1: Analyze the properties of universal hashing.
- Universal hashing does not guarantee no collisions; it makes collisions unlikely on average.
- It does not guarantee $O(1)$ worst-case lookup time; in the worst case, all keys could still collide, leading to $O(n)$ behavior if not handled well.
- The core strength of universal hashing is that by randomly selecting a hash function from a universal family, the probability of collision for any two distinct keys is low, regardless of which keys are presented. This makes it robust against adversarial inputs.
- It does not require keys to be prime numbers.
Option C accurately describes the probabilistic guarantee that universal hashing offers, making it independent of the input distribution.
"
:::
---
Advanced Applications
Hash tables are widely used in various applications, including symbol tables in compilers, database indexing, caching systems, and implementing sets and maps.
Worked Example: Implementing a Frequency Counter
We want to count the frequency of words in a given text using a hash table. Assume a hash table of size $m = 10$ with separate chaining and $h(w) = (\sum_{c \in w} \text{ASCII}(c)) \bmod 10$, the sum of the word's character codes modulo the table size.
Text: "apple banana apple orange banana"
Step 1: Initialize an empty hash table.
Step 2: Process "apple".
$h(\text{apple}) = 530 \bmod 10 = 0$.
Insert "apple" with count 1 into bucket 0.
Table: `[ (apple, 1) ]`
Step 3: Process "banana".
$h(\text{banana}) = 609 \bmod 10 = 9$.
Insert "banana" with count 1 into bucket 9.
Table: `[ (apple, 1), ..., (banana, 1) ]`
Step 4: Process "apple".
$h(\text{apple}) = 0$. Collision at bucket 0. Search for "apple" in the list. Found. Increment count.
Table: `[ (apple, 2) ]`
Step 5: Process "orange".
$h(\text{orange}) = 636 \bmod 10 = 6$.
Insert "orange" with count 1 into bucket 6.
Table: `[ (apple, 2), ..., (orange, 1), ..., (banana, 1) ]`
Step 6: Process "banana".
$h(\text{banana}) = 9$. Collision at bucket 9. Search for "banana" in the list. Found. Increment count.
Table: `[ (apple, 2), ..., (orange, 1), ..., (banana, 2) ]`
Answer: The final frequency counts are: apple: 2, banana: 2, orange: 1.
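The frequency counter above fits in a few lines of Python. The function names are ours; the hash is the same character-code sum modulo the table size used in the worked example:

```python
def char_sum_hash(word: str, m: int = 10) -> int:
    """Hash a word by summing its ASCII codes modulo the table size."""
    return sum(ord(c) for c in word) % m

def word_frequencies(text: str, m: int = 10) -> list:
    """Count word frequencies using a chained hash table of m buckets."""
    buckets = [[] for _ in range(m)]
    for word in text.split():
        bucket = buckets[char_sum_hash(word, m)]
        for entry in bucket:
            if entry[0] == word:         # word already in this chain
                entry[1] += 1
                break
        else:                            # word not seen before
            bucket.append([word, 1])
    return buckets
```

In production code one would simply use a built-in dictionary, but spelling out the buckets makes the collision handling visible.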
:::question type="MSQ" question="A cache system uses a hash table to store frequently accessed data. Which of the following design choices would generally improve the performance of such a cache?" options=["Using a very small table size to save memory.","Employing a hash function that distributes keys uniformly.","Implementing separate chaining over open addressing for better worst-case performance.","Increasing the load factor significantly beyond 1.0 for open addressing." ] answer="Employing a hash function that distributes keys uniformly.,Implementing separate chaining over open addressing for better worst-case performance." hint="Consider the impact of each choice on collision rates, average access time, and memory usage." solution="Step 1: Evaluate each option.
- 'Using a very small table size to save memory.': A small table size increases the likelihood of collisions, leading to longer probe sequences or linked lists, thus degrading performance (higher average access time). So, this is incorrect.
- 'Employing a hash function that distributes keys uniformly.': A uniform hash function minimizes collisions, which is crucial for good hash table performance (closer to $O(1)$ average access time). This is correct.
- 'Implementing separate chaining over open addressing for better worst-case performance.': While open addressing can offer better cache locality, separate chaining generally has better worst-case performance for search, insert, and delete operations because it gracefully handles high load factors (can exceed 1.0) and avoids situations where an open addressing table becomes completely full or suffers from severe clustering leading to $O(n)$ operations. So, this is correct.
- 'Increasing the load factor significantly beyond 1.0 for open addressing.': Open addressing requires a load factor strictly less than 1.0. A load factor greater than or equal to 1.0 means the table is full or potentially in an infinite loop for insertion, which is incorrect and leads to failure. So, this is incorrect.
The correct options are "Employing a hash function that distributes keys uniformly." and "Implementing separate chaining over open addressing for better worst-case performance."
"
:::
---
Problem-Solving Strategies
When choosing a hash function, prioritize uniformity and efficiency. For integer keys, the division method ($h(k) = k \bmod m$) is simple and effective, especially if $m$ is a prime number. For string keys, polynomial hashing (e.g., $h(s) = (\sum_i s[i] \cdot p^i) \bmod m$ for a small prime $p$ such as 31) often provides good distribution. Always consider the data distribution if known.
For small, fixed-size datasets or when memory is extremely tight, open addressing (especially quadratic or double hashing) can be efficient due to better cache performance. For dynamic datasets with unpredictable load factors or when worst-case performance guarantees are critical, separate chaining is generally more robust as it handles high load factors gracefully.
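The polynomial string hash mentioned above can be sketched as follows. The function name and the constants $p = 31$ and $m = 10^9 + 7$ are conventional illustrative choices, not fixed by the method:

```python
def poly_hash(s: str, p: int = 31, m: int = 10**9 + 7) -> int:
    """Polynomial hash: (sum of s[i] * p^i) mod m, accumulated left to right."""
    h, power = 0, 1
    for ch in s:
        h = (h + ord(ch) * power) % m    # add s[i] * p^i
        power = (power * p) % m          # advance to p^(i+1)
    return h
```

Because each character is weighted by a different power of $p$, anagrams such as "ab" and "ba" hash differently, which a plain character-sum hash cannot achieve.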
---
Common Mistakes
❌ Using a hash function that groups many keys to a few specific indices (e.g., $h(k) = k \bmod 10$ for keys that are always multiples of 10) leads to excessive collisions and degrades performance to $O(n)$.
✅ Design hash functions that distribute keys as uniformly as possible across the entire table range. Test with representative data if feasible.
❌ Not handling a full hash table in open addressing can lead to infinite loops during insertion (if no empty slot is found).
✅ Always ensure a load factor $\alpha < 1$ for open addressing. Implement resizing (rehashing) to a larger table when the load factor approaches a predefined threshold (e.g., $0.7$ or $0.75$). For quadratic probing, if $m$ is not prime, the probe sequence might not cover all slots, so an insertion can fail even when the table isn't full.
---
Practice Questions
:::question type="NAT" question="A hash table of size $m = 10$ uses linear probing and $h'(k) = k \bmod 10$. If keys $15, 25, 35, 45, 55$ are inserted in that order, how many probes (total, including the initial probe $i = 0$) are required for the insertion of key $55$?" answer="5" hint="Trace the insertion process for each key, counting probes for $55$. Remember that $i = 0$ is the first probe." solution="Step 1: Initialize an empty table of size 10.
Step 2: Insert $15$. $h(15, 0) = 5$. Position $5$ is empty. (1 probe)
Step 3: Insert $25$. $h(25, 0) = 5$. Position $5$ is occupied.
Probe $i = 1$: $h(25, 1) = 6$. Position $6$ is empty. (2 probes)
Step 4: Insert $35$. $h(35, 0) = 5$. Position $5$ is occupied.
Probe $i = 1$: $h(35, 1) = 6$. Position $6$ is occupied.
Probe $i = 2$: $h(35, 2) = 7$. Position $7$ is empty. (3 probes)
Step 5: Insert $45$. $h(45, 0) = 5$. Position $5$ is occupied.
Probe $i = 1$: position $6$ is occupied.
Probe $i = 2$: position $7$ is occupied.
Probe $i = 3$: $h(45, 3) = 8$. Position $8$ is empty. (4 probes)
Step 6: Insert $55$. $h(55, 0) = 5$. Position $5$ is occupied.
Probe $i = 1$: position $6$ is occupied.
Probe $i = 2$: position $7$ is occupied.
Probe $i = 3$: position $8$ is occupied.
Probe $i = 4$: $h(55, 4) = 9$. Position $9$ is empty. (5 probes)
Step 7: The total number of probes for key $55$ is $5$.
"
:::
:::question type="MCQ" question="Which collision resolution technique is most effective at mitigating both primary and secondary clustering?" options=["Linear Probing","Quadratic Probing","Separate Chaining","Double Hashing"] answer="Double Hashing" hint="Consider how each technique generates its probe sequence or handles collisions for different keys." solution="Step 1: Analyze each technique:
- Linear Probing: Suffers from primary clustering, where long runs of occupied slots are created.
- Quadratic Probing: Eliminates primary clustering but can suffer from secondary clustering, where keys that hash to the same initial position follow the same probe sequence.
- Separate Chaining: Does not use probing; it stores colliding elements in linked lists (or other structures) at the hash index. It avoids clustering in the table array itself.
- Double Hashing: Uses two independent hash functions to generate a distinct probe sequence for each key. This effectively prevents both primary and secondary clustering by making probe sequences highly key-dependent.
Double hashing uses a key-dependent step size $h_2(k)$, making each key's probe sequence effectively independent of other keys that share the same initial index $h_1(k)$, provided $h_1$ and $h_2$ are well-chosen.
"
:::
:::question type="MSQ" question="A hash table is used to store records of students identified by their unique roll numbers (integers). Which of the following statements are generally true regarding the performance of this hash table?" options=["The worst-case search time can be $O(n)$ regardless of the hash function.","A higher load factor always leads to better memory utilization without performance degradation.","Resizing the table (rehashing) involves re-calculating hash values for all existing elements.","Collisions are less likely to occur if the hash table size is a prime number."] answer="The worst-case search time can be $O(n)$ regardless of the hash function.,Resizing the table (rehashing) involves re-calculating hash values for all existing elements.,Collisions are less likely to occur if the hash table size is a prime number." hint="Consider average vs. worst-case performance, load factor implications, and properties of prime table sizes." solution="Step 1: Evaluate each statement:
- 'The worst-case search time can be $O(n)$ regardless of the hash function.': This is true. In the worst case, all keys could hash to the same bucket (separate chaining) or require $n$ probes (open addressing), leading to $O(n)$ search time. Even with a well-designed hash function, an adversarial input can force this worst case.
- 'A higher load factor always leads to better memory utilization without performance degradation.': This is false. While a higher load factor means less empty space, it also increases the average number of collisions and thus the average search/insertion/deletion time. Performance degrades significantly as the load factor increases.
- 'Resizing the table (rehashing) involves re-calculating hash values for all existing elements.': This is true. When a hash table is resized (usually to a larger size), a new hash function might be used (or the same function with a new $m$), requiring all existing key-value pairs to be re-inserted into the new, larger table.
- 'Collisions are less likely to occur if the hash table size is a prime number.': This is generally true, especially for the division method $h(k) = k \bmod m$. Using a prime $m$ tends to distribute keys more uniformly than composite numbers, particularly if keys share common factors with $m$.
The correct statements are the first, third, and fourth options.
"
:::
:::question type="NAT" question="A hash table with separate chaining has $m = 10$ buckets. We insert $n = 15$ elements. What is the average number of elements per bucket?" answer="1.5" hint="Average elements per bucket is the definition of load factor." solution="Step 1: The average number of elements per bucket is equivalent to the load factor $\alpha = n/m$.
Step 2: Calculate $\alpha$.
> $\alpha = 15 / 10 = 1.5$
"
:::
:::question type="MCQ" question="Which of the following scenarios is most likely to benefit from using a hash table over a balanced binary search tree (BST)?" options=["Storing data that requires frequent range queries (e.g., all elements between $a$ and $b$).","When strict worst-case $O(\log n)$ time complexity for all operations is required.","Implementing a symbol table where $O(1)$ average lookup is critical and exact ordering is not needed.","When memory usage is extremely constrained and $n$ is very large."] answer="Implementing a symbol table where $O(1)$ average lookup is critical and exact ordering is not needed." hint="Consider the strengths and weaknesses of hash tables vs. BSTs regarding operation complexity and data ordering." solution="Step 1: Compare hash tables and balanced BSTs.
- Hash Tables: Offer $O(1)$ average time for insert, delete, and search. Worst-case can be $O(n)$. Do not inherently maintain order. Memory overhead can be higher due to empty slots or linked lists.
- Balanced BSTs (e.g., AVL, Red-Black Trees): Offer $O(\log n)$ worst-case for insert, delete, search. Maintain elements in sorted order, allowing efficient range queries.
- 'Storing data that requires frequent range queries (e.g., all elements between $a$ and $b$).': This is a strength of balanced BSTs, not hash tables.
- 'When strict worst-case $O(\log n)$ time complexity for all operations is required.': This is a strength of balanced BSTs, not hash tables (which have $O(n)$ worst-case).
- 'Implementing a symbol table where $O(1)$ average lookup is critical and exact ordering is not needed.': This is a classic application where hash tables excel. The $O(1)$ average performance beats $O(\log n)$ for most lookups, and symbol tables often don't require elements to be kept in sorted order.
- 'When memory usage is extremely constrained and $n$ is very large.': Hash tables can have significant memory overhead due to empty slots or pointers in separate chaining. Balanced BSTs might be more memory-efficient in some scenarios, especially if the keys are large. This is not a primary benefit of hash tables.
Option C directly aligns with the primary advantage of hash tables.
"
:::
---
Summary
| # | Formula/Concept | Expression |
|---|-----------------|------------|
| 1 | Division Method Hash | $h(k) = k \bmod m$ |
| 2 | Multiplication Method Hash | $h(k) = \lfloor m \cdot (kA \bmod 1) \rfloor$ |
| 3 | Linear Probing | $h(k, i) = (h'(k) + i) \bmod m$ |
| 4 | Quadratic Probing | $h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$ |
| 5 | Double Hashing | $h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$ |
| 6 | Load Factor | $\alpha = n / m$ |
| 7 | Separate Chaining | Each table slot points to a linked list of colliding elements. |
| 8 | Open Addressing | Colliding elements are placed in other empty slots within the table using probing. |
| 9 | Universal Hashing | A family of hash functions offering probabilistic collision guarantees. |

---
What's Next?
This topic connects to:
- Disjoint Set Union (DSU): While not directly related in function, DSU structures often use hash tables internally to map arbitrary elements to integer indices for efficient set operations.
- Caching Algorithms: Hash tables are the underlying data structure for many cache implementations (e.g., LRU cache) to provide fast lookups for cached items.
- Bloom Filters: These probabilistic data structures use multiple hash functions to check for set membership efficiently, with a small probability of false positives.
- Cryptographic Hashing: While standard hash tables use non-cryptographic hash functions, the concept of mapping data to a fixed-size output is shared with cryptographic hashing, which requires additional properties like collision resistance.
---
Proceeding to Graph Representations.
---
Part 2: Graph Representations
Graphs are fundamental data structures used to model pairwise relationships between objects. We examine various methods for representing graphs, which significantly impact the efficiency of graph algorithms.
---
Core Concepts
1. Adjacency Matrix
An adjacency matrix represents a graph as a square matrix where rows and columns correspond to vertices. An entry $A[i][j]$ is 1 if an edge exists from vertex $i$ to vertex $j$, and 0 otherwise. For weighted graphs, $A[i][j]$ stores the weight of the edge $(i, j)$.
Worked Example:
Consider an undirected graph with $V = \{0, 1, 2, 3\}$ and $E = \{(0,1), (0,2), (1,2), (2,3)\}$. We construct its adjacency matrix.
Step 1: Initialize a $4 \times 4$ matrix with all zeros.
Step 2: For each edge $(u, v)$, set $A[u][v] = 1$ and $A[v][u] = 1$ (since it's undirected).
> $A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$
Answer: The adjacency matrix is as shown; it is symmetric because the graph is undirected. For a directed graph, we would only set $A[u][v] = 1$.
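Building the matrix from an edge list is a short loop in Python (the function name is ours; vertices are assumed to be numbered $0$ to $n-1$):

```python
def build_adjacency_matrix(n: int, edges: list, directed: bool = False):
    """Build an n x n adjacency matrix from a list of (u, v) edge pairs."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        if not directed:
            A[v][u] = 1                  # mirror the entry for undirected graphs
    return A
```

Edge lookup is then a single indexing operation, `A[u][v]`, which is the main advantage of this representation.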
:::question type="MCQ" question="What is the space complexity to represent a graph with $V$ vertices and $E$ edges using an adjacency matrix?" options=["$O(V + E)$","$O(V^2)$","$O(E)$","$O(V \cdot E)$"] answer="$O(V^2)$" hint="The matrix size depends only on the number of vertices." solution="An adjacency matrix requires a $V \times V$ grid. Each entry stores a constant amount of information (0, 1, or a weight). Therefore, the total space required is proportional to $V^2$, i.e., $O(V^2)$."
:::
---
2. Adjacency List
An adjacency list represents a graph as an array (or hash table) of lists. Each entry `Adj[u]` in the array contains a list of all vertices $v$ such that there is an edge $(u, v)$. For weighted graphs, each item in the list can be a pair $(v, w)$.
Worked Example:
Consider the same illustrative undirected graph, with V = {1, 2, 3, 4} and E = {(1, 2), (1, 3), (2, 4)}. We construct its adjacency list.
Step 1: Create an array (or hash map) where each index corresponds to a vertex.
> Adj[1] = []
> Adj[2] = []
> Adj[3] = []
> Adj[4] = []
Step 2: For each edge (u, v), add v to u's list and u to v's list (for undirected graphs).
> Adj[1] = [2, 3]
> Adj[2] = [1, 4]
> Adj[3] = [1]
> Adj[4] = [2]
Answer: The adjacency list representation is as shown. If the graph were directed, say with edges 1 → 2, 1 → 3, and 2 → 4, then Adj[1] = [2, 3], Adj[2] = [4], Adj[3] = [], Adj[4] = [].
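The same construction works in code. A minimal sketch (the helper name `adjacency_list` and the example edges are illustrative):

```python
from collections import defaultdict

def adjacency_list(edges, directed=False):
    """Map each vertex to the list of its neighbors."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # each undirected edge is stored twice
    # Note: vertices with no incident edges do not appear here;
    # seed them explicitly if isolated vertices matter.
    return dict(adj)

adj = adjacency_list([(1, 2), (1, 3), (2, 4)])
print(adj)  # {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
```

With `directed=True`, only the tail's list grows, matching the directed variant described above.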
:::question type="MSQ" question="Which of the following statements are true regarding adjacency lists for a simple graph with V vertices and E edges?" options=["The space complexity is O(V + E).","Checking if an edge (u, v) exists can take O(V) time in the worst case.","Iterating over all neighbors of a vertex u takes O(deg(u)) time.","It is generally preferred for dense graphs over adjacency matrices."] answer="The space complexity is O(V + E).,Checking if an edge (u, v) exists can take O(V) time in the worst case.,Iterating over all neighbors of a vertex u takes O(deg(u)) time." hint="Consider the structure and operations for adjacency lists. For edge existence, a linear scan of a list might be needed. For dense graphs, E can be close to V^2." solution="1. The space complexity is O(V + E): Correct. Each vertex stores a list head (or pointer), giving O(V), and each edge appears once in u's list (directed) or twice (undirected, once in u's list and once in v's list), giving O(E) total space for the lists. Thus, O(V + E).
2. Checking if an edge (u, v) exists can take O(V) time in the worst case: Correct. We must scan Adj[u], which may contain up to V - 1 entries.
3. Iterating over all neighbors of a vertex u takes O(deg(u)) time: Correct. The list Adj[u] contains exactly the neighbors of u.
4. It is generally preferred for dense graphs over adjacency matrices: Incorrect. For dense graphs (E close to V^2), the adjacency matrix uses comparable space and offers O(1) edge checks, so it is usually preferred."
:::
---
3. Incidence Matrix
An incidence matrix represents a graph with rows corresponding to vertices and columns corresponding to edges. An entry M[i][j] is 1 if vertex i is incident to edge e_j, and 0 otherwise. For directed graphs, M[i][j] can be 1 if i is the tail of e_j, -1 if i is the head, and 0 otherwise.
Worked Example:
Consider, for illustration, an undirected graph with V = {1, 2, 3, 4} and E = {e1 = (1, 2), e2 = (1, 3), e3 = (2, 4)}. We construct its incidence matrix.
Step 1: Initialize a 4 × 3 matrix M with all zeros, where rows are vertices and columns are edges.
> M[i][j] = 0 for all i, j
Step 2: For each edge e_j = (u, v), set M[u][j] = 1 and M[v][j] = 1.
> M = [[1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1]], with rows for vertices 1–4 and columns for e1, e2, e3
Answer: The incidence matrix is as shown.
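A small sketch makes both conventions concrete. The function name and the 0-based example graph are illustrative; the directed case uses the tail = 1, head = -1 convention described above:

```python
def incidence_matrix(num_vertices, edges, directed=False):
    """Rows correspond to vertices, columns to edges."""
    M = [[0] * len(edges) for _ in range(num_vertices)]
    for j, (u, v) in enumerate(edges):
        if directed:
            M[u][j] = 1    # u is the tail of edge j
            M[v][j] = -1   # v is the head of edge j
        else:
            M[u][j] = 1
            M[v][j] = 1
    return M

# Illustrative undirected graph on vertices 0..3 with three edges.
M = incidence_matrix(4, [(0, 1), (0, 2), (1, 3)])
for row in M:
    print(row)
```

For an undirected simple graph, every column contains exactly two 1s, one per endpoint.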
:::question type="NAT" question="For a directed graph with V = 5 vertices and E = 7 edges, what is the exact number of entries in its incidence matrix?" answer="35" hint="The incidence matrix has dimensions V × E." solution="An incidence matrix has V rows (for vertices) and E columns (for edges).
The total number of entries is V × E.
Given V = 5 and E = 7, the number of entries is 5 × 7 = 35."
:::
---
4. Comparison of Representations
The choice of graph representation depends on the specific graph properties (dense vs. sparse) and the operations most frequently performed. Adjacency matrices are suitable for dense graphs and constant-time edge lookups. Adjacency lists are better for sparse graphs and efficient traversal of neighbors. Incidence matrices are rarely used for general graph algorithms due to their higher space complexity and less direct access to neighbor information.
| Feature | Adjacency Matrix | Adjacency List | Incidence Matrix |
|---------------------|------------------|----------------|------------------|
| Space | O(V^2) | O(V + E) | O(V · E) |
| Edge check (u, v) | O(1) | O(deg(u)) | O(E) (scan columns) |
| Find neighbors of u | O(V) | O(deg(u)) | O(V · E) (scan row, then columns) |
| Add edge | O(1) | O(1) | O(V · E) (add column) |
| Remove edge (u, v) | O(1) | O(deg(u)) | O(V · E) (remove column) |
Worked Example:
A graph algorithm requires frequent checks for the existence of an edge between any two given vertices. The graph is known to be dense, meaning E = Θ(V^2). Determine the most efficient representation.
Step 1: Analyze the primary operation: checking for edge existence.
For an adjacency matrix, this is an O(1) operation.
For an adjacency list, this is O(deg(u)) in the worst case, which can be O(V) for dense graphs.
Step 2: Analyze graph density and space complexity.
For a dense graph, E = Θ(V^2).
Adjacency matrix space: O(V^2).
Adjacency list space: O(V + E) = O(V^2).
Incidence matrix space: O(V · E) = O(V^3).
Step 3: Compare efficiency for the required operation and space.
Both the adjacency matrix and the adjacency list have comparable space complexity for dense graphs. However, the adjacency matrix provides O(1) edge lookup, which is superior to the worst-case O(V) for adjacency lists. The incidence matrix is too space-inefficient.
Answer: The adjacency matrix is the most efficient representation for frequent edge existence checks in a dense graph.
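The trade-off in this worked example can be seen directly in code. The two helper functions below are illustrative sketches (names and the tiny example graph are our own):

```python
def has_edge_matrix(A, u, v):
    # Adjacency matrix: a single array access, O(1).
    return A[u][v] == 1

def has_edge_list(adj, u, v):
    # Adjacency list: linear scan of u's neighbors, O(deg(u)),
    # which approaches O(V) in a dense graph.
    return v in adj.get(u, [])

A = [[0, 1], [1, 0]]      # vertices 0 and 1 joined by one edge
adj = {0: [1], 1: [0]}    # the same graph as an adjacency list
print(has_edge_matrix(A, 0, 1), has_edge_list(adj, 0, 1))  # True True
```

Both calls give the same answer; only the cost per query differs as the graph grows denser.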
:::question type="MCQ" question="Which graph representation is generally most suitable for performing Breadth-First Search (BFS) on a sparse graph, where E = O(V)?" options=["Adjacency Matrix","Incidence Matrix","Adjacency List","Edge List"] answer="Adjacency List" hint="BFS involves exploring neighbors of a vertex efficiently. Consider the time complexity for finding neighbors for each representation." solution="BFS requires efficiently finding all neighbors of a given vertex.
For a sparse graph, E is much smaller than V^2. The time complexity of BFS using an adjacency list is O(V + E), superior to the O(V^2) (adjacency matrix) or O(V · E) (incidence matrix) required by the other representations."
:::
---
Advanced Applications
Choosing the right representation is critical for algorithm performance, especially when dealing with specific graph types or operations. For instance, some algorithms rely on matrix operations, while others benefit from direct neighbor access.
Worked Example:
We need to implement an algorithm that frequently calculates the degree of all vertices in an undirected graph. The graph can be either sparse or dense. Which representation allows for the most efficient degree calculation for all vertices?
Step 1: Consider the definition of degree. For an undirected graph, deg(v) is the number of edges incident to v.
Step 2: Evaluate degree calculation for each representation:
* Adjacency Matrix: To find deg(v), we must sum the entries in row v (or column v). This takes O(V) time for one vertex. For all vertices, it takes O(V^2) time.
* Adjacency List: To find deg(v), we simply count the elements in Adj[v]. This takes O(deg(v)) time for one vertex. For all vertices, the total time is O(V + E) (since each edge is listed twice).
* Incidence Matrix: To find deg(v), we must sum the entries in row v. This takes O(E) time for one vertex. For all vertices, it takes O(V · E) time.
Step 3: Compare efficiencies.
The adjacency list allows calculating all degrees in O(V + E) time, which is generally faster than O(V^2) or O(V · E), especially for sparse graphs. Even for dense graphs where E = Θ(V^2), O(V + E) is comparable to O(V^2) but potentially with a smaller constant factor.
Answer: The adjacency list is the most efficient representation for calculating the degrees of all vertices, achieving O(V + E) time complexity.
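The adjacency-list approach is a one-liner in practice. A minimal sketch (the function name and sample graph are illustrative); note that in Python, `len()` on a list is O(1), so the loop itself is O(V) once the adjacency list exists:

```python
def all_degrees(adj):
    # O(V) over an existing adjacency list; building the list
    # already cost O(V + E), so the total is O(V + E).
    return {u: len(neighbors) for u, neighbors in adj.items()}

degrees = all_degrees({1: [2, 3], 2: [1, 4], 3: [1], 4: [2]})
print(degrees)                # {1: 2, 2: 2, 3: 1, 4: 1}
print(sum(degrees.values()))  # 6, i.e. twice the number of edges (3)
```

The final sum illustrates the Handshaking Lemma: each undirected edge contributes to two lists.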
:::question type="NAT" question="A directed graph has 10 vertices and 15 edges. If it is represented by an adjacency matrix, how many entries will be non-zero?" answer="15" hint="Each directed edge corresponds to exactly one non-zero entry in the adjacency matrix." solution="In an adjacency matrix for a directed graph, an edge (u, v) is represented by a single non-zero entry A[u][v].
The number of non-zero entries in the adjacency matrix is equal to the number of directed edges in the graph.
Given E = 15 edges, there will be 15 non-zero entries."
:::
---
Problem-Solving Strategies
When faced with a graph problem, first identify:
- Graph Density: Is it sparse (E = O(V)) or dense (E = Θ(V^2))?
- Key Operations: What are the most frequent operations (edge lookup, neighbor traversal, adding/removing vertices/edges)?
- Space Constraints: Are there memory limitations?
For sparse graphs and neighbor-centric algorithms (BFS, DFS, Dijkstra), adjacency lists are usually preferred. For dense graphs and quick edge existence checks (Floyd-Warshall, some matrix-based algorithms), adjacency matrices are often better. Incidence matrices are rarely optimal for general graph algorithms.
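As a concrete instance of the "adjacency lists for neighbor-centric algorithms" rule, here is a minimal BFS sketch running in O(V + E) time. The function name and sample graph are illustrative assumptions:

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from source; each vertex and edge is touched O(1) times."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:          # first visit fixes the BFS distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

adj = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
print(bfs_distances(adj, 1))  # {1: 0, 2: 1, 3: 1, 4: 2}
```

With an adjacency matrix instead, the inner loop would scan a full row of V entries per dequeued vertex, giving O(V^2) overall.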
---
Common Mistakes
❌ Mistake: Treating directed graph representations the same as undirected. For example, setting A[u][v] = 1 and A[v][u] = 1 for a directed edge (u, v) in an adjacency matrix.
✅ Correct approach: For a directed graph, an edge (u, v) (from u to v) only sets A[u][v] = 1 in an adjacency matrix, or adds v to u's adjacency list. A[v][u] remains 0, and u is not added to v's list, unless there is also a directed edge (v, u). For undirected graphs, the matrix is symmetric, and entries are added to both lists.
❌ Mistake: Assuming adjacency list space complexity is always less than adjacency matrix space complexity without considering graph density.
✅ Correct approach: Adjacency list space is O(V + E), while adjacency matrix space is O(V^2). If the graph is dense (i.e., E = Θ(V^2)), then O(V + E) becomes O(V^2), making their space complexities asymptotically equivalent. The constant factors might differ, but the asymptotic behavior is the same.
---
Practice Questions
:::question type="MCQ" question="Consider an undirected graph with V vertices and E edges. Which of the following statements is true about its adjacency matrix A?" options=["A is always symmetric.","The sum of all entries in A equals E.","The diagonal entries represent self-loops.","If the graph is connected, all entries in A must be 1."] answer="A is always symmetric." hint="Recall the definition of an undirected graph and how edges are represented in its adjacency matrix." solution="1. A is always symmetric: Correct. For an undirected edge (u, v), we set A[u][v] = 1 and A[v][u] = 1. Thus, A[i][j] = A[j][i] for all i, j, making the matrix symmetric.
2. The sum of all entries in A equals E: Incorrect. Each undirected edge (u, v) contributes two entries (A[u][v] and A[v][u]), so the sum is 2E.
3. The diagonal entries represent self-loops: A self-loop at v would set A[v][v] = 1, but in a simple graph there are no self-loops, so the diagonal is all zeros; this is not a property that holds in general.
4. If the graph is connected, all entries in A must be 1: Incorrect. Connectivity requires a path between every pair of vertices, not a direct edge."
:::
:::question type="NAT" question="A complete undirected graph K_N has N vertices. How many elements are in its adjacency list representation (sum of lengths of all lists)?" answer="N*(N-1)" hint="In a complete graph, every vertex is connected to every other vertex. Each edge appears twice in the adjacency lists." solution="In a complete undirected graph K_N, every vertex is connected to the other N - 1 vertices.
The degree of each vertex is N - 1.
The sum of the lengths of all adjacency lists is N × (N - 1).
For an undirected graph, this sum is 2E.
The number of edges in K_N is N(N - 1)/2.
Therefore, the sum of the lengths of all lists is 2 × N(N - 1)/2 = N(N - 1)."
:::
:::question type="MSQ" question="Which of the following operations are generally more efficient using an adjacency matrix compared to an adjacency list for a dense graph?" options=["Checking if an edge exists between two specific vertices.","Adding a new vertex to the graph.","Iterating over all neighbors of a specific vertex.","Finding the transpose of a directed graph."] answer="Checking if an edge exists between two specific vertices.,Finding the transpose of a directed graph." hint="Consider the time complexity of each operation for both representations, especially for dense graphs where E = Θ(V^2)." solution="1. Checking if an edge exists between two specific vertices: Correct. Adjacency matrix: O(1). Adjacency list: O(V) in the worst case for a dense graph.
2. Adding a new vertex to the graph: Incorrect. The matrix must be reallocated to size (V + 1) × (V + 1), costing O(V^2), while an adjacency list simply adds an empty list in O(1).
3. Iterating over all neighbors of a specific vertex: Incorrect. The matrix requires scanning an entire row in O(V); the list touches only the deg(u) neighbors.
4. Finding the transpose of a directed graph: Correct. The matrix transpose is a simple swap of A[i][j] and A[j][i], doable in place in O(V^2); an adjacency list must be rebuilt into a new structure with extra allocation overhead."
:::
:::question type="MCQ" question="An algorithm needs to frequently determine if a vertex has any outgoing edges. Which representation allows for the fastest check for a directed graph?" options=["Adjacency Matrix","Adjacency List","Incidence Matrix","Edge List"] answer="Adjacency List" hint="Consider how outgoing edges are stored in each representation." solution="1. Adjacency Matrix: To check if v has any outgoing edges, we must scan row v. This takes O(V) time.
2. Adjacency List: We only check whether Adj[v] is non-empty, which takes O(1) time.
3. Incidence Matrix: We must scan row v across all E columns, taking O(E) time.
4. Edge List: We must scan all E edges, taking O(E) time.
Therefore, checking if a vertex has any outgoing edges is fastest with an adjacency list, taking O(1) time."
:::
:::question type="NAT" question="A graph has 9 edges. If it is an undirected graph with no self-loops, what is the sum of degrees of all vertices?" answer="18" hint="The sum of degrees in an undirected graph is related to the number of edges." solution="For any undirected graph, the sum of the degrees of all vertices is equal to twice the number of edges. This is known as the Handshaking Lemma.
Sum of degrees = 2E.
Given E = 9, the sum of degrees = 2 × 9 = 18."
:::
---
Summary
|   | Formula/Concept | Expression |
|---|-----------------|------------|
| 1 | Adjacency Matrix Space | O(V^2) |
| 2 | Adjacency List Space (Directed) | O(V + E) |
| 3 | Adjacency List Space (Undirected) | O(V + E), each edge stored twice |
| 4 | Incidence Matrix Space | O(V · E) |
| 5 | Edge Check (Adj. Matrix) | O(1) |
| 6 | Edge Check (Adj. List) | O(deg(u)), worst case O(V) |
| 7 | Sum of Degrees (Undirected) | Σ deg(v) = 2E |
---
What's Next?
This topic connects to:
- Graph Traversal Algorithms (BFS, DFS): These algorithms directly utilize graph representations, especially adjacency lists, for efficient neighbor exploration.
- Shortest Path Algorithms (Dijkstra, Bellman-Ford): The performance of these algorithms is heavily influenced by the chosen graph representation, particularly for accessing edge weights and neighbors.
- Minimum Spanning Tree Algorithms (Prim, Kruskal): These algorithms also rely on efficient graph representation for managing edges and connectivity.
---
Chapter Summary
- Hash Tables: Offer O(1) average time complexity for insertions, deletions, and lookups by mapping keys to array indices via a hash function.
- Hashing Functions: Critical for distributing keys uniformly across the table, minimizing collisions, and ensuring efficient performance.
- Collision Resolution: Strategies such as chaining (using linked lists) or open addressing (e.g., linear probing, quadratic probing, double hashing) are employed to handle multiple keys mapping to the same hash index.
- Load Factor: Defined as α = n/m, where n is the number of stored keys and m is the table size. A high load factor significantly degrades hash table performance, often necessitating rehashing to a larger table.
- Graph Representations: Graphs are primarily represented using either adjacency matrices or adjacency lists, each with distinct space and time complexity trade-offs.
- Adjacency Matrix: A V × V matrix where A[i][j] = 1 indicates an edge between vertex i and vertex j. Space complexity is O(V^2), making it suitable for dense graphs or when edge existence checks are paramount.
- Adjacency List: An array of lists, where Adj[u] contains all vertices adjacent to vertex u. Space complexity is O(V + E), making it generally more memory-efficient for sparse graphs.
---
Chapter Review Questions
:::question type="MCQ" question="A hash table uses chaining for collision resolution and has a size of 10. The hash function is h(k) = k mod 10. Keys are inserted in the following order: 12, 23, 34, 45, 56, 13, 24, 35, 46. Assuming a worst-case search for a key that exists, what is the maximum number of key comparisons required to find any key currently in the table?" options=["1","2","3","4"] answer="2" hint="Trace the insertions and observe the length of the longest chain. The number of comparisons for a key in a chain is its position in the chain." solution="Let's trace the insertions and the state of the hash table with chaining:
* h(12) = 2: chain at index 2 becomes [12]
* h(23) = 3: chain at index 3 becomes [23]
* h(34) = 4: chain at index 4 becomes [34]
* h(45) = 5: chain at index 5 becomes [45]
* h(56) = 6: chain at index 6 becomes [56]
* h(13) = 3: chain at index 3 becomes [23, 13]
* h(24) = 4: chain at index 4 becomes [34, 24]
* h(35) = 5: chain at index 5 becomes [45, 35]
* h(46) = 6: chain at index 6 becomes [56, 46]
The final state of the relevant chains:
* Index 2: [12] (1 comparison for 12)
* Index 3: [23, 13] (1 comparison for 23, 2 for 13)
* Index 4: [34, 24] (1 comparison for 34, 2 for 24)
* Index 5: [45, 35] (1 comparison for 45, 2 for 35)
* Index 6: [56, 46] (1 comparison for 56, 2 for 46)
The maximum number of comparisons for any existing key is 2."
:::
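The chaining trace in the question above can be checked mechanically. This short simulation (variable names are our own) rebuilds the chains and reports the worst-case comparison count:

```python
TABLE_SIZE = 10
keys = [12, 23, 34, 45, 56, 13, 24, 35, 46]

# Chaining: each slot holds a Python list acting as its chain.
table = [[] for _ in range(TABLE_SIZE)]
for k in keys:
    table[k % TABLE_SIZE].append(k)  # h(k) = k mod 10

# Searching for an existing key costs one comparison per chain position.
worst = max(chain.index(k) + 1 for chain in table for k in chain)
print(table[3], worst)  # [23, 13] 2
```

The longest chains have length 2, confirming the answer of 2 comparisons.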
:::question type="NAT" question="Consider an undirected graph with V = 100 vertices and E = 500 edges. If an adjacency matrix requires V^2 units of space and an adjacency list requires V + 2E units of space (for an undirected graph), what is the approximate ratio (rounded to the nearest integer) of the space required by the adjacency matrix to that required by the adjacency list?" answer="9" hint="Calculate the space for each representation using the given formulas and then find their ratio." solution="Space for Adjacency Matrix = V^2 = 100^2 = 10000 units.
Space for Adjacency List = V + 2E = 100 + 2 × 500 = 1100 units.
Ratio = 10000 / 1100 ≈ 9.09.
Rounded to the nearest integer, the ratio is 9."
:::
:::question type="MCQ" question="Which of the following scenarios most strongly suggests that an adjacency list representation would be more appropriate for a graph than an adjacency matrix?" options=["The graph is dense, with E approaching V^2.","Frequent checks for the existence of an edge between any two specific vertices are a critical requirement.","The graph is sparse, and memory efficiency for storing edges is a primary concern.","The number of vertices is very small, making V^2 a manageable constant." ] answer="The graph is sparse, and memory efficiency for storing edges is a primary concern." hint="Consider the space complexity of each representation relative to the graph's density and the time complexity of common operations." solution="* The graph is dense, with E approaching V^2: An adjacency matrix (O(V^2) space) is generally preferred for dense graphs because its space usage is proportional to the number of edges, and it offers O(1) edge lookup.
* Frequent checks for the existence of an edge between any two specific vertices are a critical requirement: An adjacency matrix provides O(1) edge existence checks, which is superior to an adjacency list's O(V) worst case.
* The graph is sparse, and memory efficiency for storing edges is a primary concern: An adjacency list uses O(V + E) space, which is significantly more memory-efficient than O(V^2) for sparse graphs where E is much smaller than V^2. This is the strongest reason to prefer an adjacency list.
* The number of vertices is very small, making V^2 a manageable constant: While a small V makes an adjacency matrix feasible, it doesn't necessarily make it *more* appropriate than an adjacency list, especially if the graph is sparse. The choice still depends on density and operation priorities.
Therefore, for sparse graphs where memory efficiency is key, an adjacency list is preferred."
:::
---
What's Next?
Building upon the foundational understanding of hash tables and graph representations from this chapter, your CMI preparation should next focus on the algorithms that leverage these structures. For hash tables, delve into their applications in data structures like sets and maps, and advanced hashing techniques. For graphs, explore core algorithms such as Breadth-First Search (BFS), Depth-First Search (DFS), shortest path algorithms (e.g., Dijkstra's, Bellman-Ford), and Minimum Spanning Tree (MST) algorithms (e.g., Prim's, Kruskal's). A solid grasp of these algorithms and their complexity analysis is paramount.