Basic Data Structures
Overview
The study of data structures forms the bedrock upon which efficient algorithms are built. In this chapter, we shall explore the fundamental constructs used to organize and store data, which are indispensable for solving a wide array of computational problems. Our focus will be on the abstract properties, concrete implementations, and performance characteristics of these structures. A deep understanding of how to select and manipulate the appropriate data structure is not merely an academic exercise; it is a critical skill for success in the Graduate Aptitude Test in Engineering (GATE), where questions frequently test the trade-offs between different data organization schemes.
We will begin by examining linear structures such as linked lists, stacks, and queues, each offering unique mechanisms for data access and modification. Subsequently, our investigation will proceed to non-linear structures, namely trees and hash tables. For each structure, we will rigorously analyze the time and space complexity of its core operations, often expressed using Big-O notation, such as O(1), O(log n), or O(n). This analytical approach is central to the GATE syllabus and is essential for developing the ability to design algorithms that are not only correct but also optimally performant.
---
Chapter Contents
| # | Topic | What You'll Learn |
|---|-------|-------------------|
| 1 | Linked Lists | Dynamic data storage and pointer manipulation. |
| 2 | Stacks | The LIFO principle and its applications. |
| 3 | Queues | The FIFO principle for scheduling tasks. |
| 4 | Trees | Hierarchical data representation and traversal algorithms. |
| 5 | Hash Tables | Efficient key-value mapping and collision resolution. |
---
Learning Objectives
After completing this chapter, you will be able to:
- Implement and analyze the fundamental operations (insertion, deletion, traversal) for linked lists, stacks, and queues.
- Differentiate between various tree structures, including binary trees and binary search trees, and perform standard traversal operations.
- Explain the principles of hashing, hash functions, and common collision resolution techniques.
- Evaluate the asymptotic time and space complexity of operations for each data structure and justify the selection of a specific structure for a given problem scenario.
---
We now turn our attention to Linked Lists...
Part 1: Linked Lists
Introduction
In our study of data structures, we first encounter static structures such as arrays, where the size is fixed at compile time. While efficient for certain applications, this rigidity presents a significant limitation when the volume of data is unpredictable or varies dynamically during program execution. To address this, we turn our attention to dynamic data structures, of which the linked list is the most fundamental.
A linked list is a linear collection of data elements, called nodes, where the linear order is not given by their physical placement in memory. Instead, each element points to the next, forming a chain-like structure. This approach provides remarkable flexibility in memory allocation and management, making it an indispensable tool in a programmer's arsenal for tasks ranging from implementing other abstract data types like stacks and queues to managing memory in operating systems.
A linked list is a linear data structure consisting of a sequence of nodes, where each node contains two fields: a data field and a reference (or link) to the next node in the sequence. The first node in the list is referred to as the `head`, and the last node's link points to `NULL`, signifying the end of the list.
---
Key Concepts
1. Structure of a Node
The fundamental building block of any linked list is the node. A node is a self-referential structure that encapsulates both the data it holds and the mechanism to access the subsequent node.
We can visualize a single node as follows:
In programming, this structure is typically implemented using a `struct` in C/C++ or a `class` in Python/Java.
```c
// A node structure in C
struct Node {
int data;
struct Node* next;
};
```
```python
# A node class in Python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
```
A sequence of these nodes, linked together, forms the list. The `head` pointer is a special pointer that always points to the first node, serving as the entry point to the list.
2. Types of Linked Lists
While the fundamental concept remains the same, linked lists can be categorized based on their traversal capabilities and structure.
A. Singly Linked List
This is the simplest form, where each node contains data and a single pointer to the next node. Traversal is unidirectional, from head to tail.
B. Doubly Linked List
In this variant, each node possesses two pointers: one to the `next` node and another to the `previous` node. This structure permits bidirectional traversal, which can simplify certain operations like deletion, at the cost of additional memory for the second pointer.
C. Circular Linked List
A circular linked list is a variation where the `next` pointer of the last node points back to the `head` node instead of `NULL`. This creates a continuous loop, which is useful in applications requiring round-robin scheduling.
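The three variants differ only in how the node's pointer fields are wired. As an illustrative sketch (the class name `DNode` is ours, not a standard API), a doubly linked node simply adds a `prev` field, and a circular list arises when the tail's `next` is pointed back at the head:

```python
# A doubly linked node carries an extra 'prev' pointer (illustrative sketch).
class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

# Linking two nodes in both directions enables bidirectional traversal:
a, b = DNode(1), DNode(2)
a.next, b.prev = b, a

# A one-node circular list: the tail's 'next' points back to the head.
c = DNode(3)
c.next = c
```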
3. Basic Operations
The utility of a linked list is realized through its fundamental operations. Let us consider the common operations for a singly linked list.
A. Traversal
To traverse a list, we begin at the `head` and follow the `next` pointers until we reach a `NULL` pointer.
```python
def traverse(head):
    current = head
    while current is not None:
        print(current.data)
        current = current.next
```
B. Insertion
A new node can be inserted at three locations: the beginning, the end, or a specific position. Let us examine insertion at the beginning.
Worked Example:
Problem: Given a linked list with head pointer `head`, insert a new node with data `5` at the beginning of the list.
Solution:
Step 1: Allocate memory for the new node and assign its data.
Step 2: Point the `next` pointer of the new node to the current head of the list. This links the new node to the rest of the list.
Step 3: Update the main `head` pointer to point to the new node, making it the new first node.
Answer: The list is now updated with the new node at the front. The time complexity for this operation is O(1).
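The three steps above can be sketched in Python as follows (the helper name `insert_at_beginning` is illustrative):

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def insert_at_beginning(head, data):
    """Return the new head after prepending a node -- an O(1) operation."""
    new_node = Node(data)   # Step 1: allocate the node and assign its data
    new_node.next = head    # Step 2: link the new node to the current head
    return new_node         # Step 3: the new node becomes the head

head = None
for value in (3, 4, 5):
    head = insert_at_beginning(head, value)
# The list is now 5 -> 4 -> 3.
```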
C. Deletion
Deletion also has three cases: removing the first node, the last node, or a specific node. Deleting the first node is an O(1) operation, while deleting from the end or a specific position requires traversal, leading to O(n) complexity in the worst case for a singly linked list.
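A sketch of the two boundary cases (helper names are illustrative); note how deleting the last node forces a walk to the second-to-last node, which is what makes it O(n):

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def delete_first(head):
    """Remove the first node and return the new head -- O(1)."""
    return head.next if head is not None else None

def delete_last(head):
    """Remove the last node -- O(n): we must find the second-to-last node."""
    if head is None or head.next is None:
        return None
    current = head
    while current.next.next is not None:
        current = current.next
    current.next = None
    return head

head = Node(1, Node(2, Node(3)))
head = delete_first(head)   # list is now 2 -> 3
head = delete_last(head)    # list is now 2
```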
---
Problem-Solving Strategies
For problems involving finding the middle element, detecting a cycle, or other relational position tasks in a linked list, the "fast and slow pointer" approach is exceptionally efficient.
- Slow Pointer: Moves one step at a time (`slow = slow.next`).
- Fast Pointer: Moves two steps at a time (`fast = fast.next.next`).
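As a sketch of the technique, here is the classic middle-element search (names are illustrative): when the fast pointer reaches the end, the slow pointer is at the middle, in O(n) time and O(1) extra space.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def find_middle(head):
    """Fast pointer moves two steps per iteration, slow moves one."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    return slow

# 1 -> 2 -> 3 -> 4 -> 5: the middle node holds 3.
head = Node(1, Node(2, Node(3, Node(4, Node(5)))))
middle = find_middle(head)
```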
---
Common Mistakes
- ❌ Dereferencing a NULL pointer. Attempting to access `current->next` when `current` is `NULL` will cause a runtime error. Always check for `NULL` before accessing a pointer's members.
- ❌ Losing the head pointer. Modifying the `head` pointer during traversal without a temporary variable to hold its original value will result in losing access to the beginning of the list.
---
Practice Questions
:::question type="MCQ" question="In a singly linked list, which of the following operations has a time complexity that is not dependent on the length of the list, i.e., O(1)?" options=["Insertion at the end of the list","Deletion of the last node","Searching for an element","Insertion at the beginning of the list"] answer="Insertion at the beginning of the list" hint="Consider which operations do not require traversing the list." solution="Insertion at the end, deletion of the last node, and searching all require traversing the list to reach the target position, resulting in O(n) time complexity in the worst case. However, insertion at the beginning only requires updating the head pointer and the new node's next pointer, which is a constant time, O(1), operation."
:::
:::question type="NAT" question="Consider a singly linked list with n nodes. What is the minimum number of pointer modifications required to insert a new node between the third and fourth nodes?" answer="2" hint="Think about which pointers need to be re-linked to accommodate the new node." solution="Let the third node be N3 and the fourth node be N4. Originally, N3 points to N4. To insert a new node, X, we must perform two pointer modifications: first, set X's `next` pointer to N4; second, update N3's `next` pointer to point to X.
Therefore, a minimum of 2 pointer modifications are required."
:::
:::question type="MSQ" question="Which of the following are advantages of a doubly linked list over a singly linked list? Select ALL that apply." options=["It allows traversal in both forward and backward directions.","It requires less memory per node.","Deletion of a node is more efficient if a pointer to the node to be deleted is given.","It is simpler to implement."] answer="A,C" hint="Evaluate each option based on the structural differences between singly and doubly linked lists." solution="A is correct because the `previous` pointer allows backward traversal. C is correct because with a pointer to the node to be deleted, we can access its `previous` node in O(1) time to update pointers, whereas in a singly linked list, we would need to traverse from the head to find the previous node. B is incorrect; a doubly linked list requires more memory due to the extra `previous` pointer. D is incorrect; the management of two pointers makes its implementation more complex than a singly linked list."
:::
:::question type="MCQ" question="What is the primary characteristic of a circular linked list?" options=["It must be sorted.","The last node points to NULL.","The last node points to the first node.","It does not have a head pointer."] answer="The last node points to the first node." hint="Recall the structural definition that distinguishes a circular linked list from a standard linear one." solution="The defining feature of a circular linked list is that the `next` reference of the tail node points back to the `head` node, forming a closed loop. This is in contrast to a standard linear linked list where the last node's `next` reference is `NULL`."
:::
---
Summary
- Dynamic Sizing: Linked lists are dynamic data structures, allowing for efficient insertion and deletion without the need for resizing, unlike arrays.
- Non-Contiguous Memory: Nodes in a linked list can be stored anywhere in memory; the order is maintained by pointers, not by physical location.
- Time Complexity: Insertion/deletion at the beginning of a singly linked list is O(1). Traversal, searching, and insertion/deletion at the end or at a specific position are O(n).
- Pointer Management: Correctly managing pointers, especially the `head` pointer and checking for `NULL` conditions, is critical to avoid errors.
---
What's Next?
This foundational understanding of linked lists is crucial for mastering more complex data structures.
- Stacks & Queues: These abstract data types can be efficiently implemented using linked lists. A stack's LIFO (Last-In, First-Out) behavior maps directly to insertion and deletion at the head of a list (O(1) operations).
- Trees & Graphs: These are non-linear data structures where nodes (or vertices) are connected by pointers (or edges). The concept of a node with data and pointers is a direct extension of the linked list node.
Mastering these connections will provide a comprehensive understanding of how fundamental structures build into more advanced concepts frequently tested in GATE.
---
Now that you understand Linked Lists, let's explore Stacks which builds on these concepts.
---
Part 2: Stacks
Introduction
In the study of data structures, we encounter fundamental organizational principles that govern how data is stored and accessed. The stack is one of the most elementary yet powerful of these structures. It is a linear data structure that operates on a principle of constrained access, meaning elements can only be added or removed from one end, conventionally referred to as the "top" of the stack. This access mechanism is known as Last-In, First-Out (LIFO).
The LIFO paradigm is not merely a theoretical construct; it mirrors numerous processes in computing. For instance, the management of function calls within a program's execution is handled by a call stack. Similarly, parsing arithmetic expressions, implementing undo functionality in software, and navigating graph or tree structures via depth-first search all rely on the stack's unique properties. For the GATE examination, a thorough understanding of stack operations and the ability to trace algorithms that utilize them is of paramount importance.
A stack is an Abstract Data Type (ADT) that represents a collection of elements with two principal operations: Push, which adds an element to the collection, and Pop, which removes the most recently added element. This behavior ensures that the element to be removed is always the one that has been in the collection for the shortest amount of time, a principle known as Last-In, First-Out (LIFO).
---
Key Concepts
1. The LIFO Principle: Last-In, First-Out
The defining characteristic of a stack is its LIFO access protocol. We can visualize this concept using an analogy of a stack of plates. A new plate is placed on top of the stack, and when a plate is needed, it is the one from the top that is removed. It is impractical, if not impossible, to remove a plate from the bottom or middle without disturbing the entire stack.
This principle dictates that the last element to be inserted (pushed) onto the stack will be the first element to be removed (popped). This behavior is fundamental to its application in various algorithms. Let us consider the primary operations that enforce this principle.
2. Core Stack Operations
A stack ADT is characterized by a specific set of operations. For GATE, proficiency with these is non-negotiable.
a) Push
The `Push` operation adds an element to the top of the stack. If the stack is implemented with a fixed-size array and is already full, this operation results in a condition known as Stack Overflow.
```python
# Pseudocode for Push
procedure Push(stack, element):
    if stack is full:
        report Overflow Error
    else:
        increment top_pointer
        stack[top_pointer] = element
```
b) Pop
The `Pop` operation removes the element at the top of the stack and returns it. If an attempt is made to pop from an empty stack, a Stack Underflow error occurs.
```python
# Pseudocode for Pop
procedure Pop(stack):
    if stack is empty:
        report Underflow Error
    else:
        element = stack[top_pointer]
        decrement top_pointer
        return element
```
c) Peek (or Top)
The `Peek` operation returns the value of the top element without removing it from the stack. This is useful for inspecting the top element before deciding on an action.
```python
# Pseudocode for Peek
procedure Peek(stack):
    if stack is empty:
        report Stack is Empty
    else:
        return stack[top_pointer]
```
d) IsEmpty
This boolean operation checks if the stack contains any elements. It returns true if the stack is empty and false otherwise. It is crucial for preventing underflow errors.
```python
# Pseudocode for IsEmpty
procedure IsEmpty(stack):
    return top_pointer == -1  # Assuming -1 indicates an empty stack
```
For a stack implemented using either an array or a linked list, the time complexity of the core operations (`Push`, `Pop`, `Peek`, `IsEmpty`) is O(1).
Variables:
- None; the complexity is independent of the number of elements, n, in the stack.
When to use: This is a fundamental property. Always assume O(1) complexity for these operations unless specified otherwise by a problem's context.
3. Implementations: Array vs. Linked List
While the stack is an abstract data type, it requires a concrete underlying implementation. The two primary methods are using an array or a linked list.
* Array Implementation: A fixed-size array is used to store elements. A variable, often named `top`, is used as an index to keep track of the top-most element. This implementation is simple and memory-efficient (no pointers to store), but it suffers from a fixed capacity. An attempt to push onto a full stack causes an overflow.
* Linked List Implementation: Each element is a node containing data and a pointer to the next node. The `top` of the stack is simply the head of the linked list. Pushing an element involves adding a new node at the head, and popping involves removing the head node. This implementation is dynamic and can grow as needed, limited only by available memory.
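The array-based variant can be sketched in Python as follows (the class name `ArrayStack` is illustrative); it mirrors the pseudocode above, with `top` initialized to -1 to mark an empty stack:

```python
class ArrayStack:
    """Fixed-capacity stack backed by an array; top == -1 means empty."""
    def __init__(self, capacity):
        self.items = [None] * capacity
        self.top = -1

    def is_empty(self):
        return self.top == -1

    def push(self, element):
        if self.top == len(self.items) - 1:
            raise OverflowError("stack overflow")  # fixed capacity reached
        self.top += 1
        self.items[self.top] = element

    def pop(self):
        if self.is_empty():
            raise IndexError("stack underflow")
        element = self.items[self.top]
        self.top -= 1
        return element

    def peek(self):
        if self.is_empty():
            raise IndexError("stack is empty")
        return self.items[self.top]

s = ArrayStack(3)
s.push(10)
s.push(20)
top_value = s.pop()   # LIFO: 20 comes out before 10
```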
---
Worked Example: Tracing a Stack-Based Algorithm
Many GATE questions require the manual tracing of an algorithm's execution. Let us work through an example that mirrors this style.
Problem:
Consider the following pseudocode that operates on an empty stack S and an integer variable accumulator. What is the final value of accumulator after the code finishes execution?
```pseudocode
Create empty stack S
accumulator = 0
i = 1
while (i <= 6) {
Push(S, i)
if (i % 2 == 0) {
val1 = Pop(S)
val2 = Pop(S)
Push(S, val1 * val2)
}
i = i + 1
}
while (S is not empty) {
accumulator = accumulator + Pop(S)
}
Output accumulator
```
Solution:
We will trace the state of the stack S and the variable accumulator through each iteration of the first `while` loop.
Iteration i = 1:
- `Push(S, 1)`: Stack is `[1]`. (Bottom to top)
- `i % 2 == 0` is false.
Iteration i = 2:
- `Push(S, 2)`: Stack is `[1, 2]`.
- `i % 2 == 0` is true.
- `val1 = Pop(S)`: val1 becomes 2. Stack is `[1]`.
- `val2 = Pop(S)`: val2 becomes 1. Stack is `[]`.
- `Push(S, val1 * val2)`: `Push(S, 2)`. Stack is `[2]`.
Iteration i = 3:
- `Push(S, 3)`: Stack is `[2, 3]`.
- `i % 2 == 0` is false.
Iteration i = 4:
- `Push(S, 4)`: Stack is `[2, 3, 4]`.
- `i % 2 == 0` is true.
- `val1 = Pop(S)`: val1 becomes 4. Stack is `[2, 3]`.
- `val2 = Pop(S)`: val2 becomes 3. Stack is `[2]`.
- `Push(S, val1 * val2)`: `Push(S, 12)`. Stack is `[2, 12]`.
Iteration i = 5:
- `Push(S, 5)`: Stack is `[2, 12, 5]`.
- `i % 2 == 0` is false.
Iteration i = 6:
- `Push(S, 6)`: Stack is `[2, 12, 5, 6]`.
- `i % 2 == 0` is true.
- `val1 = Pop(S)`: val1 becomes 6. Stack is `[2, 12, 5]`.
- `val2 = Pop(S)`: val2 becomes 5. Stack is `[2, 12]`.
- `Push(S, val1 * val2)`: `Push(S, 30)`. Stack is `[2, 12, 30]`.
The first loop terminates. The final state of stack S (from bottom to top) is `[2, 12, 30]`.
Now, we execute the second `while` loop.
Step 1: accumulator is initially 0. Stack is `[2, 12, 30]`.
- `accumulator = accumulator + Pop(S)`: accumulator becomes 30. Stack is `[2, 12]`.
Step 2:
- `accumulator = accumulator + Pop(S)`: accumulator becomes 30 + 12 = 42. Stack is `[2]`.
Step 3:
- `accumulator = accumulator + Pop(S)`: accumulator becomes 42 + 2 = 44. Stack is `[]`.
The stack is now empty, and the loop terminates.
Answer: The final value of accumulator is 44.
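The manual trace can be cross-checked by transcribing the pseudocode into Python, using a list's `append` and `pop` as Push and Pop:

```python
def run_trace():
    """Direct transcription of the traced pseudocode."""
    S = []
    accumulator = 0
    for i in range(1, 7):          # while (i <= 6)
        S.append(i)                # Push(S, i)
        if i % 2 == 0:
            val1 = S.pop()
            val2 = S.pop()
            S.append(val1 * val2)  # Push(S, val1 * val2)
    while S:                       # while (S is not empty)
        accumulator += S.pop()
    return accumulator

result = run_trace()   # → 44, matching the manual trace
```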
---
Problem-Solving Strategies
For GATE questions involving pseudocode and stacks, the most reliable method is to trace the execution manually using a table. Do not try to solve it mentally, as this can lead to errors.
Create columns for:
- The loop counter or current instruction.
- The state of the stack (write it out explicitly, e.g., `[10, 5, 20]`).
- The values of all relevant variables (e.g., `i`, `val1`, `accumulator`).
Update the table row-by-row for each step in the algorithm. This systematic approach minimizes confusion and ensures accuracy, especially under exam pressure.
---
Common Mistakes
- ❌ Confusing Pop and Peek: Assuming `Pop()` only reads the top value.
- ❌ Incorrect Underflow/Overflow Handling: Performing a `Pop()` or `Peek()` on an empty stack, or a `Push()` on a full, array-based stack.
- ❌ Reversing Stack Order: When listing the contents of a stack after several operations, listing them in the order they were pushed (First-In, First-Out).
---
Practice Questions
:::question type="NAT" question="An empty stack S and an integer v = 10 are given. The following operations are performed in sequence: `Push(S, 10)`, `Push(S, 5)`, `Push(S, 20)`, `v = v + Pop(S)`, `Push(S, 15)`, `v = v + Pop(S)`, `v = v + Pop(S)`. What is the final value of v?" answer="50" hint="Trace the operations one by one, keeping track of the stack's state and the value of v." solution="
Step 1: Initialize S as empty, v = 10.
Step 2: `Push(S, 10)`.
S is `[10]`.
Step 3: `Push(S, 5)`.
S is `[10, 5]`.
Step 4: `Push(S, 20)`.
S is `[10, 5, 20]`.
Step 5: `v = v + Pop(S)`.
The popped value is 20.
v = 10 + 20 = 30.
S is `[10, 5]`.
Step 6: `Push(S, 15)`.
S is `[10, 5, 15]`.
Step 7: `v = v + Pop(S)`.
The popped value is 15.
v = 30 + 15 = 45.
S is `[10, 5]`.
Step 8: `v = v + Pop(S)`.
The popped value is 5.
v = 45 + 5 = 50.
S is `[10]`.
Result: The final value of v is \boxed{50}.
"
:::
:::question type="MCQ" question="A stack is implemented using an array of size 10 (indices 0 to 9). The `top` pointer is initialized to -1. After performing 5 `Push` operations, the `top` pointer will have the value:" options=["5", "6", "4", "3"] answer="4" hint="The 'top' pointer stores the index of the last element inserted. If the first element is at index 0, what will be the index of the fifth element?" solution="
Step 1: Initially, top = -1.
Step 2: After the 1st `Push`, top becomes 0.
Step 3: After the 2nd `Push`, top becomes 1.
Step 4: After the 3rd `Push`, top becomes 2.
Step 5: After the 4th `Push`, top becomes 3.
Step 6: After the 5th `Push`, top becomes 4.
Result: The top pointer will point to index \boxed{4}.
"
:::
:::question type="MSQ" question="Which of the following applications correctly use a stack data structure?" options=["Implementing function calls in a programming language", "Implementing a First-In-First-Out (FIFO) queue", "Evaluating a postfix arithmetic expression", "Implementing a breadth-first search (BFS) in a graph"] answer="Implementing function calls in a programming language,Evaluating a postfix arithmetic expression" hint="Think about which applications require a Last-In-First-Out (LIFO) access pattern." solution="
- Implementing function calls: This uses a call stack. When a function is called, its activation record is pushed onto the stack. When it returns, the record is popped. This is a classic LIFO application. So, this option is correct.
- Implementing a FIFO queue: A queue is inherently First-In, First-Out, which is the opposite of a stack's LIFO behavior. While a queue can be implemented using two stacks, a single stack itself is not a FIFO structure. So, this option is incorrect.
- Evaluating a postfix expression: When scanning a postfix expression, operands are pushed onto a stack. When an operator is encountered, the required number of operands are popped, the operation is performed, and the result is pushed back. This is a standard stack application. So, this option is correct.
- Implementing BFS: Breadth-first search explores a graph level by level, which requires a FIFO data structure, namely a queue, to store the nodes to visit. Depth-first search (DFS), in contrast, uses a stack. So, this option is incorrect."
:::
:::question type="NAT" question="Consider the following pseudocode operating on an empty stack S. What is the value returned at the end?
```pseudocode
procedure CalculateValue():
    Create empty stack S
    sum = 0
    for i from 1 to 5:
        Push(S, i)
        if (IsEmpty(S) == false and Peek(S) > 3):
            sum = sum + Pop(S)
    return sum
```
" answer="9" hint="Trace the for loop from i = 1 to 5. The sum is only updated when the top element of the stack is greater than 3." solution="
Step 1: Initialize S as empty, sum = 0.
Loop i = 1:
- `Push(S, 1)`. S is `[1]`.
- `Peek(S)` is 1, which is not > 3. sum remains 0.
Loop i = 2:
- `Push(S, 2)`. S is `[1, 2]`.
- `Peek(S)` is 2, which is not > 3. sum remains 0.
Loop i = 3:
- `Push(S, 3)`. S is `[1, 2, 3]`.
- `Peek(S)` is 3, which is not > 3. sum remains 0.
Loop i = 4:
- `Push(S, 4)`. S is `[1, 2, 3, 4]`.
- `Peek(S)` is 4, which is > 3.
- `sum = sum + Pop(S)`. `Pop(S)` returns 4. sum = 0 + 4 = 4.
- S is now `[1, 2, 3]`.
Loop i = 5:
- `Push(S, 5)`. S is `[1, 2, 3, 5]`.
- `Peek(S)` is 5, which is > 3.
- `sum = sum + Pop(S)`. `Pop(S)` returns 5. sum = 4 + 5 = 9.
- S is now `[1, 2, 3]`.
Step 2: The loop finishes. The function returns sum = 9.
Result: The final value of sum is \boxed{9}.
"
:::
---
Summary
- LIFO is Paramount: All stack-related problems are built upon the Last-In, First-Out principle. Every operation (`Push`, `Pop`) must be interpreted through this lens.
- Master Algorithm Tracing: The ability to systematically trace pseudocode involving a stack is the most frequently tested skill. Use a table to track the stack's state and all variables to avoid errors.
- Know the Core Operations: Understand the precise function of `Push`, `Pop`, `Peek`, and `IsEmpty`, including their constant time complexity and the error conditions of Underflow and Overflow.
---
What's Next?
The stack is a building block for more complex topics. A solid foundation here will aid in understanding related data structures and algorithms.
- Queues: Contrast the stack's LIFO behavior with the queue's First-In, First-Out (FIFO) principle. Many problems involve using both or comparing their properties.
- Recursion: Every recursive function call is managed internally by the program's call stack. Understanding stacks demystifies how recursion works and helps in analyzing its space complexity.
- Expression Evaluation: Stacks are central to algorithms for converting expressions from infix to postfix or prefix notation, and for evaluating these notations directly. This is a classic and important application area.
---
Now that you understand Stacks, let's explore Queues which builds on these concepts.
---
Part 3: Queues
Introduction
In the study of data structures, the Queue represents a fundamental concept of ordered data collection. Analogous to a queue of people waiting for a service, this linear data structure adheres to a strict principle of operation known as First-In, First-Out (FIFO). Elements are added at one end, referred to as the rear, and removed from the opposite end, known as the front. This behavior makes queues indispensable in a multitude of computing scenarios, ranging from managing requests on a single shared resource, such as a printer, to implementing task scheduling algorithms in operating systems and traversing graph or tree structures.
Our examination of queues will proceed from the foundational principles to their practical implementations. We will explore the core operations that define a queue's behavior and analyze the common methods of implementation using both arrays and linked lists. Furthermore, we will investigate important variations, such as the circular queue, which optimizes space utilization, and the double-ended queue (deque), which provides enhanced flexibility by allowing operations at both ends. A thorough understanding of these concepts is essential, as they frequently form the basis for more complex algorithms and systems-level problems encountered in the GATE examination.
A Queue is an abstract data type (ADT) that functions as a linear collection of elements, characterized by the First-In, First-Out (FIFO) principle. In a queue, insertion of a new element, known as enqueue, occurs at one end, termed the rear (or tail), while the removal of an existing element, known as dequeue, occurs at the other end, termed the front (or head).
---
Key Concepts
1. Basic Queue Operations
A queue is primarily defined by two operations: enqueue and dequeue. To manage these operations, we typically maintain two pointers or indices: `front` and `rear`.
- `front`: Points to the first element in the queue. This is the element that will be removed next.
- `rear`: Points to the last element in the queue. This is where the next new element will be inserted.
- `enqueue(element)`: Adds `element` to the `rear` of the queue.
- `dequeue()`: Removes and returns the element from the `front` of the queue.
- `peek()` (or `front()`): Returns the element at the `front` of the queue without removing it.
- `isEmpty()`: Returns true if the queue contains no elements.
- `isFull()`: Returns true if the queue has reached its maximum capacity (relevant for static, array-based implementations).
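In Python, `collections.deque` offers these operations with O(1) cost at both ends; a minimal usage sketch mapping its methods onto the queue terminology above:

```python
from collections import deque

queue = deque()
queue.append(5)            # enqueue(5): insert at the rear
queue.append(12)           # enqueue(12)
first = queue.popleft()    # dequeue(): removes 5, the oldest element
front = queue[0]           # peek(): 12 is now at the front
empty = len(queue) == 0    # isEmpty(): False, one element remains
```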
Worked Example:
Problem: An initially empty queue undergoes the following sequence of operations: `enqueue(5)`, `enqueue(12)`, `dequeue()`, `enqueue(8)`, `enqueue(9)`, `dequeue()`. What are the contents of the queue after these operations?
Solution:
Let Q be the queue, with the front on the left.
Step 1: Initial state
The queue is empty.
Q = `[]`, front = NULL, rear = NULL
Step 2: `enqueue(5)`
The element 5 is added. front and rear both point to it.
Q = `[5]`, front = 5, rear = 5
Step 3: `enqueue(12)`
The element 12 is added to the rear. rear is updated.
Q = `[5, 12]`, front = 5, rear = 12
Step 4: `dequeue()`
The element at the front (5) is removed. front is updated.
Q = `[12]`, front = 12, rear = 12
Step 5: `enqueue(8)`
The element 8 is added to the rear. rear is updated.
Q = `[12, 8]`, front = 12, rear = 8
Step 6: `enqueue(9)`
The element 9 is added to the rear. rear is updated.
Q = `[12, 8, 9]`, front = 12, rear = 9
Step 7: `dequeue()`
The element at the front (12) is removed. front is updated.
Q = `[8, 9]`, front = 8, rear = 9
Answer: The final contents of the queue are `[8, 9]` (front to rear).
---
2. Circular Queue
A naive array-based implementation of a queue can be inefficient. As elements are dequeued, the `front` pointer moves forward, leaving unused space at the beginning of the array. This "drifting" effect means the queue can become "full" even when there are empty slots in the array.
The circular queue elegantly solves this problem by treating the array as a circle. When the `front` or `rear` pointer reaches the end of the array, it wraps around to the beginning. This is achieved using the modulo operator.
Given a queue implemented with an array of size N:
To enqueue an element: rear = (rear + 1) mod N
To dequeue an element: front = (front + 1) mod N
Variables:
- N = capacity of the array
- front = index of the front element
- rear = index at which the next element will be inserted
When to use: For efficient, fixed-size queue implementations using arrays.
A critical aspect of circular queues is determining the `isFull` and `isEmpty` conditions. A common convention is to keep one slot empty to distinguish between a full and an empty queue.
- Empty Condition: front == rear
- Full Condition: (rear + 1) mod N == front
❌ A common error is to fill the entire array. If `rear` becomes equal to `front` after an enqueue operation, this state is indistinguishable from the initial empty state.
✅ To avoid this ambiguity, a circular queue of size N is considered full when it contains N - 1 elements. The condition (rear + 1) mod N == front correctly identifies this state.
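A minimal Python sketch of this scheme (the class name `CircularQueue` is illustrative), using the one-slot-empty convention, with `rear` holding the index of the next free slot:

```python
class CircularQueue:
    """Circular array queue; one slot is kept empty to distinguish full from empty."""
    def __init__(self, capacity):
        self.items = [None] * capacity
        self.N = capacity
        self.front = 0   # index of the front element
        self.rear = 0    # index where the next element will be inserted

    def is_empty(self):
        return self.front == self.rear

    def is_full(self):
        return (self.rear + 1) % self.N == self.front

    def enqueue(self, element):
        if self.is_full():
            raise OverflowError("queue is full")
        self.items[self.rear] = element
        self.rear = (self.rear + 1) % self.N   # wrap around the array end

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        element = self.items[self.front]
        self.front = (self.front + 1) % self.N
        return element

q = CircularQueue(5)
for item in ("A", "B", "C"):
    q.enqueue(item)
q.dequeue()            # removes "A"; front advances to 1
q.enqueue("D")
q.enqueue("E")         # rear wraps from 4 to 0
full = q.is_full()     # True: 4 elements in a size-5 array
```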
Worked Example:
Problem: Consider a circular queue of size 5, initially empty (front = 0, rear = 0). Trace the state of front and rear after the following operations: `enqueue(A)`, `enqueue(B)`, `enqueue(C)`, `dequeue()`, `enqueue(D)`, `enqueue(E)`.
Solution:
Let the array size be N = 5 and let Arr be the array; rear is the index where the next element will be inserted.
Step 1: Initial state
front = 0, rear = 0. Queue is empty.
Step 2: `enqueue(A)`
Arr[0] = A. rear = (0 + 1) mod 5 = 1.
front = 0, rear = 1.
Step 3: `enqueue(B)`
Arr[1] = B. rear = (1 + 1) mod 5 = 2.
front = 0, rear = 2.
Step 4: `enqueue(C)`
Arr[2] = C. rear = (2 + 1) mod 5 = 3.
front = 0, rear = 3.
Step 5: `dequeue()`
The element at Arr[front], which is A, is removed.
front = (0 + 1) mod 5 = 1.
front = 1, rear = 3.
(A is removed)
Step 6: `enqueue(D)`
Arr[3] = D. rear = (3 + 1) mod 5 = 4.
front = 1, rear = 4.
Step 7: `enqueue(E)`
Arr[4] = E. rear = (4 + 1) mod 5 = 0.
front = 1, rear = 0.
Answer: The final state is front = 1, rear = 0. The queue is now full, as (rear + 1) mod 5 = 1, which is equal to front. The elements in order are B, C, D, E.
---
3. Double-Ended Queue (Deque)
A double-ended queue, or deque (pronounced "deck"), is a generalization of a queue that allows for insertion and deletion of elements at both the front and the rear. This makes it a more versatile data structure.
The fundamental operations of a deque are:
- `insertFirst(e)`: Inserts element `e` at the beginning.
- `insertLast(e)`: Inserts element `e` at the end (equivalent to `enqueue`).
- `removeFirst()`: Removes and returns the element from the beginning (equivalent to `dequeue`).
- `removeLast()`: Removes and returns the element from the end.
Worked Example:
Problem: An initially empty deque D undergoes the following sequence of operations. Trace the contents of D and determine the final value of the variable x.
`insertLast(20)`
`insertFirst(10)`
`insertLast(30)`
`x = removeLast()`
`insertFirst(5)`
`x = removeFirst()`
Solution:
Let us trace the state of the deque D, with the front on the left.
Step 1: Initial state
D is empty. D = `[]`
Step 2: `insertLast(20)`
20 is added to the rear. D = `[20]`
Step 3: `insertFirst(10)`
10 is added to the front. D = `[10, 20]`
Step 4: `insertLast(30)`
30 is added to the rear. D = `[10, 20, 30]`
Step 5: `x = removeLast()`
The last element, 30, is removed and assigned to x. D = `[10, 20]`
Step 6: `insertFirst(5)`
5 is added to the front. D = `[5, 10, 20]`
Step 7: `x = removeFirst()`
The first element, 5, is removed and assigned to x. D = `[10, 20]`
Answer: The final value of x is 5.
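Python's `collections.deque` supports all four deque operations in O(1), so a trace like the one above can be replayed directly; the method mapping (`appendleft`/`append`/`popleft`/`pop` for insertFirst/insertLast/removeFirst/removeLast) is the standard one:

```python
from collections import deque

D = deque()
D.append(20)        # insertLast(20)   -> [20]
D.appendleft(10)    # insertFirst(10)  -> [10, 20]
D.append(30)        # insertLast(30)   -> [10, 20, 30]
x = D.pop()         # x = removeLast(): x is 30
D.appendleft(5)     # insertFirst(5)   -> [5, 10, 20]
x = D.popleft()     # x = removeFirst(): x is 5
```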
---
Problem-Solving Strategies
When faced with a sequence of queue or deque operations, do not attempt to solve it mentally. This is highly error-prone.
- Draw It Out: On your rough sheet, draw a series of boxes representing the array or linked list. For an array of size N, draw N boxes and label indices 0 to N - 1.
- Maintain Pointers: Keep track of the front and rear indices. Write their current values and update them for each operation.
- Update Contents: Cross out removed elements and write in new elements as they are enqueued. For circular queues, be meticulous with the modulo arithmetic: rear = (rear + 1) mod N.
- Track Variables: If the question involves assigning dequeued values to variables (like x), maintain a separate section on your sheet to track the current value of each variable.
This methodical, visual approach minimizes calculation errors under exam pressure.
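The bookkeeping described above can be captured in a small array-based circular queue. The sketch below (class and method names are illustrative) uses this chapter's conventions: `front == rear` means empty, and an array of size N holds at most N − 1 elements so that full and empty states remain distinguishable.

```python
class CircularQueue:
    def __init__(self, n):
        self.n = n              # array size; holds at most n - 1 elements
        self.data = [None] * n
        self.front = 0          # index of the current front element
        self.rear = 0           # index of the next free slot

    def is_empty(self):
        return self.front == self.rear

    def is_full(self):
        return (self.rear + 1) % self.n == self.front

    def enqueue(self, x):
        if self.is_full():
            raise OverflowError("queue is full")
        self.data[self.rear] = x
        self.rear = (self.rear + 1) % self.n   # wrap around with modulo

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        x = self.data[self.front]
        self.front = (self.front + 1) % self.n
        return x
```

Tracing on paper and tracing in code follow the same two rules: only `rear` moves on `enqueue`, only `front` moves on `dequeue`, and every move is taken modulo `n`.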
---
Common Mistakes
- ❌ Confusing FIFO with LIFO: The most fundamental error is mixing up the behavior of a queue (First-In, First-Out) with that of a stack (Last-In, First-Out). Always remember: in a queue, the element that has been in the collection the longest is the first one to be removed.
- ❌ Incorrect Circular Queue Conditions: For a circular array of size `N` with `front` and `rear` pointers, it's easy to misidentify the full/empty conditions. Remember: the queue is empty when `front == rear` and full when `(rear + 1) % N == front`.
- ❌ Forgetting Deque Flexibility: Students sometimes treat a deque like a standard queue, only considering operations at opposite ends.
---
Practice Questions
:::question type="MCQ" question="A data structure allows insertion only at the rear and deletion from both the front and the rear. This structure is a specific type of:" options=["Input-restricted deque", "Output-restricted deque", "Stack", "Circular Queue"] answer="Input-restricted deque" hint="Consider which operations are restricted. 'Input' refers to insertion, and 'Output' refers to deletion." solution="A standard deque allows insertion and deletion at both ends. An 'input-restricted' deque restricts insertions to one end (in this case, the rear). An 'output-restricted' deque restricts deletions to one end. Since this structure allows deletion from both ends but insertion only at one, it is an input-restricted deque.
Answer: \boxed{Input-restricted deque}"
:::
:::question type="NAT" question="An initially empty circular queue of capacity 6 (indices 0 to 5) is managed with `front` and `rear` pointers, both initialized to 0. After the following sequence of operations, what is the value of the `front` pointer?
`enqueue(10)`, `enqueue(20)`, `enqueue(30)`, `dequeue()`, `enqueue(40)`, `enqueue(50)`, `dequeue()`, `enqueue(60)`" answer="2" hint="Trace the values of front and rear using modulo arithmetic with N=6. Remember that dequeue updates the front pointer." solution="
Initial: `front = 0`, `rear = 0`
1. enqueue(10): `rear = (0 + 1) % 6 = 1`.
2. enqueue(20): `rear = (1 + 1) % 6 = 2`.
3. enqueue(30): `rear = (2 + 1) % 6 = 3`.
`front=0`, `rear=3`. Contents: `[10, 20, 30]`
4. dequeue(): `front = (0 + 1) % 6 = 1`.
`front=1`, `rear=3`. Contents: `[20, 30]`
5. enqueue(40): `rear = (3 + 1) % 6 = 4`.
6. enqueue(50): `rear = (4 + 1) % 6 = 5`.
`front=1`, `rear=5`. Contents: `[20, 30, 40, 50]`
7. dequeue(): `front = (1 + 1) % 6 = 2`.
`front=2`, `rear=5`. Contents: `[30, 40, 50]`
8. enqueue(60): `rear = (5 + 1) % 6 = 0`.
`front=2`, `rear=0`. Contents: `[30, 40, 50, 60]`
Result: The final value of the `front` pointer is 2.
Answer: \boxed{2}"
:::
:::question type="MSQ" question="Which of the following statements about queue implementations are correct?" options=["A linked-list implementation of a queue has a worst-case time complexity of O(1) for both enqueue and dequeue operations.", "An array-based implementation of a simple (non-circular) queue can lead to wasted space.", "A deque can be used to implement a stack.", "In a circular queue of size N, the maximum number of elements that can be stored is always N."] answer="A,B,C" hint="Analyze each implementation's properties. Consider time complexity, space efficiency, and functional equivalence." solution="
- A: Correct. In a linked-list implementation with both `head` and `tail` pointers, enqueue (adding to tail) and dequeue (removing from head) are O(1) operations.
- B: Correct. In a simple array-based queue, as elements are dequeued, the `front` pointer moves forward, leaving the space at the beginning of the array unusable.
- C: Correct. A stack (LIFO) can be implemented using a deque by performing all insertions (`insertFirst`) and deletions (`removeFirst`) at the same end.
- D: Incorrect. To distinguish between a full and an empty state, a circular queue of size N can typically hold a maximum of N - 1 elements."
:::
:::question type="NAT" question="In an empty double-ended queue, the following operations are performed sequentially:
`insertFirst(5)`
`insertLast(15)`
`insertFirst(2)`
`v = removeLast()`
`insertLast(25)`
`v = removeFirst()`
`insertFirst(10)`
`v = removeLast()`
The final value of the variable `v` is _____.
" answer="25" hint="Carefully trace the state of the deque and the value of `v` after each removal operation." solution="
Step 1: Initial state
`D: []`
Step 2: `insertFirst(5)`
`D: [5]`
Step 3: `insertLast(15)`
`D: [5, 15]`
Step 4: `insertFirst(2)`
`D: [2, 5, 15]`
Step 5: `v = removeLast()`
`v` becomes 15. `D` becomes `[2, 5]`.
Step 6: `insertLast(25)`
`D: [2, 5, 25]`
Step 7: `v = removeFirst()`
`v` becomes 2. `D` becomes `[5, 25]`.
Step 8: `insertFirst(10)`
`D: [10, 5, 25]`
Step 9: `v = removeLast()`
`v` becomes 25. `D` becomes `[10, 5]`.
Result: The final value assigned to `v` is 25.
Answer: \boxed{25}"
:::
---
Summary
- FIFO Principle: The defining characteristic of a queue is its First-In, First-Out (FIFO) nature. This contrasts directly with the Last-In, First-Out (LIFO) principle of a stack.
- Core Operations: The fundamental operations are `enqueue` (add to rear) and `dequeue` (remove from front). Be proficient in tracing the state of a queue through a series of these operations.
- Circular Queue Efficiency: The circular queue is the standard array-based implementation. Master the modulo arithmetic for pointer updates (`(index + 1) % N`) and the conditions for empty (`front == rear`) and full (`(rear + 1) % N == front`).
- Deque Versatility: The double-ended queue (deque) is a powerful generalization, allowing insertion and deletion at both ends. Understand its four primary operations: `insertFirst`, `insertLast`, `removeFirst`, `removeLast`.
---
What's Next?
This topic connects to several other crucial areas in the GATE syllabus. Mastering queues is a prerequisite for understanding:
- Stacks: The stack is the logical counterpart to the queue (LIFO vs. FIFO). Questions often test the differences between them.
- Breadth-First Search (BFS): The BFS algorithm for traversing trees and graphs relies fundamentally on a queue to manage the nodes to be visited.
- Operating Systems: Queues are used extensively in OS for scheduling. For example, the First-Come, First-Served (FCFS) CPU scheduling algorithm is a direct implementation of a queue.
Master these connections for a comprehensive understanding of how data structures are applied in algorithms and systems.
---
Now that you understand Queues, let's explore Trees, which build on these concepts.
---
Part 4: Trees
Introduction
In the study of data structures, we transition from linear collections, such as arrays and linked lists, to more complex, non-linear arrangements of data. Among these, the tree stands as a structure of paramount importance. A tree is an abstract data type that simulates a hierarchy: a distinguished root node with zero or more subtrees of children, each child having exactly one parent, represented as a set of linked nodes. Its non-linear nature allows for the representation of data with intrinsic parent-child relationships, a feature not elegantly captured by linear structures.
The utility of trees is pervasive in computer science. They form the basis for file systems, where directories branch into subdirectories and files. They are fundamental to the organization of data for efficient searching, as seen in Binary Search Trees and their balanced variants. Furthermore, trees are instrumental in parsing expressions in compilers, representing decision processes in artificial intelligence, and as the underlying structure for more advanced data structures like heaps and tries. A firm grasp of tree terminology, properties, and traversal algorithms is therefore indispensable for a competitive examination such as GATE.
A tree is a finite set of one or more nodes such that there is a specially designated node called the root, and the remaining nodes are partitioned into $n \ge 0$ disjoint sets $T_1, T_2, \ldots, T_n$, where each of these sets is a tree. The sets $T_1, T_2, \ldots, T_n$ are called the subtrees of the root. Formally, a tree can be defined as a connected, acyclic, undirected graph. A rooted tree is a tree in which one vertex has been designated as the root.
---
Key Concepts
1. Fundamental Terminology of Rooted Trees
To discuss trees with precision, we must first establish a common vocabulary. Let us consider a general rooted tree to illustrate these fundamental concepts.
* Node: The basic entity in a tree that contains data and may link to other nodes.
* Edge: A connection between two nodes.
* Root: The topmost node in a tree, which has no parent. A tree has exactly one root.
* Parent: A node that has an edge to a child node. Every node except the root has exactly one parent.
* Child: A node that has an edge from a parent node.
* Siblings: Nodes that share the same parent.
* Leaf Node (or Terminal Node): A node with no children.
* Internal Node (or Non-Terminal Node): A node with at least one child.
* Path: A sequence of nodes and edges connecting a node with a descendant.
* Depth of a Node: The number of edges from the root to the node. The depth of the root is 0.
* Level of a Node: The level of a node is defined by $\text{level} = \text{depth} + 1$. The level of the root is 1. (Note: Some conventions start levels at 0).
* Height of a Node: The number of edges on the longest path from the node to a leaf. The height of a leaf node is 0.
* Height of a Tree: The height of its root node. The height of a tree with a single node is 0.
The following diagram illustrates these relationships within a sample rooted tree.
---
2. Binary Trees
A particularly important and frequently encountered type of tree is the binary tree.
A binary tree is a rooted tree in which each node has at most two children, which are referred to as the left child and the right child. A node's left child is the root of its left subtree, and its right child is the root of its right subtree.
Several special classes of binary trees are defined based on their structure.
* Full Binary Tree: A binary tree in which every node has either 0 or 2 children. There are no nodes with only one child.
* Complete Binary Tree: A binary tree in which all levels are completely filled except possibly the last level, and the last level has all its nodes as far left as possible. This structure is critical for implementing priority queues using heaps.
* Perfect Binary Tree: A binary tree in which all internal nodes have two children and all leaf nodes are at the same level. A perfect binary tree of height $h$ has exactly $2^{h+1} - 1$ nodes.
* Skewed Binary Tree: A binary tree where each node is connected to only one child (either left or right). This degenerates into a structure resembling a linked list.
---
3. Properties of Binary Trees
Several properties and inequalities relate the number of nodes, leaves, internal nodes, and height. These are frequently tested in competitive examinations. Let $n$ be the total number of nodes, $l$ be the number of leaf nodes, $i$ be the number of internal nodes, and $h$ be the height of the binary tree.

$$n = l + i$$

Variables:
- $n$ = Total number of nodes
- $l$ = Number of leaf nodes
- $i$ = Number of internal nodes
When to use: This fundamental identity holds for any non-empty tree and is useful for relating the different node counts.
A particularly powerful property relates the number of leaf nodes to the number of internal nodes in a full binary tree.

$$l = i + 1$$

Variables:
- $l$ = Number of leaf nodes
- $i$ = Number of internal nodes
When to use: This applies only to non-empty full binary trees. It provides a direct relationship between the tree's branching structure and its leaves. For any non-empty binary tree (not necessarily full), the inequality $l \le i + 1$ holds.
The height of a binary tree imposes strict bounds on the total number of nodes it can contain.

$$h + 1 \le n \le 2^{h+1} - 1$$

Variables:
- $h$ = Height of the tree (height of a single node is 0)
- $n$ = Total number of nodes
- The minimum number of nodes, $n = h + 1$, occurs in a skewed binary tree.
- The maximum number of nodes, $n = 2^{h+1} - 1$, occurs in a perfect binary tree.
When to use: This inequality is crucial for analyzing the efficiency of tree-based algorithms.
From these primary relationships, we can derive bounds for other quantities. For instance, the number of leaf nodes is bounded by $1 \le l \le 2^h$. The minimum of 1 leaf occurs in a skewed tree, while the maximum of $2^h$ leaves occurs in a perfect binary tree.
Worked Example:
Problem: A full binary tree has 25 leaf nodes. Calculate the total number of nodes in the tree.
Solution:
Step 1: Identify the given information and the type of tree.
We are given a full binary tree with $l = 25$.
Step 2: Apply the property relating leaf nodes and internal nodes for a full binary tree: $l = i + 1$.
Step 3: Solve for the number of internal nodes, $i$: $i = l - 1 = 25 - 1 = 24$.
Step 4: Calculate the total number of nodes, $n$, using the fundamental identity: $n = l + i = 25 + 24 = 49$.
Answer: The tree contains a total of 49 nodes.
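The same two identities can be packaged as a one-line helper. A trivial sketch (the function name is illustrative):

```python
def full_tree_total_nodes(leaves):
    """Total nodes in a non-empty full binary tree with the given leaf count."""
    internal = leaves - 1       # l = i + 1  =>  i = l - 1
    return leaves + internal    # n = l + i = 2l - 1

print(full_tree_total_nodes(25))  # 49
```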
---
4. Tree Traversal Algorithms
Traversing a tree means visiting every node in a specified order. For binary trees, the three most common depth-first traversal methods are Inorder, Preorder, and Postorder. The names are derived from the position of the root node's visit relative to its left and right subtrees.
Let us consider a node $N$, its left subtree $L$, and its right subtree $R$.
Inorder Traversal (LNR):
* Traverse the left subtree: Visit($L$)
* Visit the root: Visit($N$)
* Traverse the right subtree: Visit($R$)
Application: In a Binary Search Tree (BST), an inorder traversal visits the nodes in ascending sorted order.
Preorder Traversal (NLR):
* Visit the root: Visit($N$)
* Traverse the left subtree: Visit($L$)
* Traverse the right subtree: Visit($R$)
Application: Useful for creating a copy of the tree or for obtaining the prefix expression of an expression tree.
Postorder Traversal (LRN):
* Traverse the left subtree: Visit($L$)
* Traverse the right subtree: Visit($R$)
* Visit the root: Visit($N$)
Application: Essential for deleting nodes in a tree to ensure children are deleted before their parent, or for obtaining the postfix expression of an expression tree.
Worked Example:
Problem: Consider the binary tree whose root F has left child B and right child G; B has left child A and right child D; D has left child C and right child E; G has left child I and right child H. Determine the Inorder, Preorder, and Postorder traversals.
Solution:
1. Inorder Traversal (LNR):
- Start at root F. Go left to B.
- At B, go left to A. A is a leaf, visit A.
- Return to B, visit B.
- At B, go right to D.
- At D, go left to C. C is a leaf, visit C.
- Return to D, visit D.
- At D, go right to E. E is a leaf, visit E.
- Left subtree of F is done. Visit F.
- Go right to G.
- At G, go left to I. I is a leaf, visit I.
- Return to G, visit G.
- At G, go right to H. H is a leaf, visit H.
Result (Inorder): A, B, C, D, E, F, I, G, H
2. Preorder Traversal (NLR):
- Start at root F, visit F.
- Go left to B, visit B.
- At B, go left to A, visit A.
- Return to B, go right to D, visit D.
- At D, go left to C, visit C.
- Return to D, go right to E, visit E.
- Return to F, go right to G, visit G.
- At G, go left to I, visit I.
- Return to G, go right to H, visit H.
Result (Preorder): F, B, A, D, C, E, G, I, H
3. Postorder Traversal (LRN):
- Start at root F. Go left to B.
- At B, go left to A. A is a leaf, visit A.
- Return to B, go right to D.
- At D, go left to C. C is a leaf, visit C.
- Return to D, go right to E. E is a leaf, visit E.
- Return to D, both children visited, visit D.
- Return to B, both children visited, visit B.
- Return to F. Go right to G.
- At G, go left to I. I is a leaf, visit I.
- Return to G, go right to H. H is a leaf, visit H.
- Return to G, both children visited, visit G.
- Return to F, both children visited, visit F.
Result (Postorder): A, C, E, D, B, I, H, G, F
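Each traversal is a three-line recursive function. The sketch below builds the tree from this worked example (node keys as one-character strings) and collects a traversal into a list:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(t, out):    # LNR: left, node, right
    if t:
        inorder(t.left, out); out.append(t.key); inorder(t.right, out)

def preorder(t, out):   # NLR: node, left, right
    if t:
        out.append(t.key); preorder(t.left, out); preorder(t.right, out)

def postorder(t, out):  # LRN: left, right, node
    if t:
        postorder(t.left, out); postorder(t.right, out); out.append(t.key)

# The tree from the worked example: F(B(A, D(C, E)), G(I, H))
root = Node('F',
            Node('B', Node('A'), Node('D', Node('C'), Node('E'))),
            Node('G', Node('I'), Node('H')))

seq = []
inorder(root, seq)
print(seq)  # ['A', 'B', 'C', 'D', 'E', 'F', 'I', 'G', 'H']
```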
---
5. Reconstruction of Binary Trees
A common and important problem is to reconstruct a binary tree given its traversal sequences. The ability to do so depends on the specific traversals provided.
To uniquely reconstruct a general binary tree, a combination of Inorder traversal with either Preorder or Postorder traversal is required.
- The Preorder or Postorder traversal is used to identify the root of any subtree.
- The Inorder traversal is used to determine the nodes in the left and right subtrees of that root.
Reconstruction from Preorder and Inorder:
1. The first element of the Preorder sequence is the root.
2. Locate this root in the Inorder sequence. All elements to its left form the left subtree; all elements to its right form the right subtree.
3. Recursively apply the same procedure to each subtree, using the corresponding segments of both sequences.
Reconstruction from Postorder and Inorder:
The logic is analogous, except that the root of any subtree is the last element of its Postorder sequence.
The Ambiguity of Preorder and Postorder:
A combination of Preorder and Postorder traversals is not sufficient to uniquely reconstruct a general binary tree. This is because ambiguity arises when a node has only one child. We cannot determine if it is a left or a right child.
Special Case: Full Binary Trees
This ambiguity is resolved if the tree is known to be a full binary tree. In a full binary tree, every node has either 0 or 2 children. Therefore, if an internal node is known to have a child, it must have two. This constraint makes reconstruction from Preorder and Postorder traversals possible and unique.
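The Preorder-plus-Inorder procedure translates almost line for line into code. This illustrative sketch returns nested tuples `(root, left, right)` instead of node objects to stay short:

```python
def build(preorder, inorder):
    """Reconstruct a binary tree from its preorder and inorder sequences.
    Returns (root, left_subtree, right_subtree), or None for an empty tree.
    Assumes all keys are distinct."""
    if not preorder:
        return None
    root = preorder[0]            # preorder: the root comes first
    k = inorder.index(root)       # split the inorder sequence around the root
    left = build(preorder[1:k + 1], inorder[:k])
    right = build(preorder[k + 1:], inorder[k + 1:])
    return (root, left, right)

# Root A with left child B and right child C:
print(build(['A', 'B', 'C'], ['B', 'A', 'C']))
# ('A', ('B', None, None), ('C', None, None))
```

Note the `k + 1` split of the preorder list: the left subtree occupies exactly as many preorder positions as it has nodes in the inorder prefix.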
---
Problem-Solving Strategies
- Reconstruction Problems: Always identify the root first. For Preorder, it's the first element. For Postorder, it's the last. Use this root to partition the Inorder sequence. This divide-and-conquer approach is the fastest way to solve reconstruction problems.
- Property-based Inequalities: When faced with questions involving inequalities on $n$, $l$, $i$, and $h$, test the extreme cases:
Minimum Nodes / Maximum Height: A skewed tree ($n = h + 1$).
Maximum Nodes / Minimum Height: A perfect or complete binary tree ($n = 2^{h+1} - 1$ for a perfect tree).
By plugging in values for these two cases, you can often quickly validate or eliminate options.
---
Common Mistakes
- ❌ Applying Full Tree Properties to General Trees: The property $l = i + 1$ is only guaranteed for non-empty full binary trees. Do not assume it holds for all binary trees.
- ❌ Assuming Preorder + Postorder is Sufficient: For a general binary tree, Preorder and Postorder traversals are not enough for unique reconstruction. This combination is only sufficient if you are explicitly told the tree is a full binary tree.
- ❌ Confusing Height and Levels: Remember the standard convention: height is the number of edges on the longest path from the root to a leaf. A single-node tree has height 0. The number of levels is $h + 1$. Be mindful of the definitions used in the question.
---
Practice Questions
:::question type="MCQ" question="The inorder and preorder traversals of a binary tree are D, B, H, E, A, F, C, G and A, B, D, E, H, C, F, G respectively. What is the postorder traversal of the tree?" options=["D, H, E, B, F, G, C, A","D, H, E, B, G, F, C, A","H, E, D, B, F, G, C, A","H, D, E, B, G, F, C, A"] answer="D, H, E, B, F, G, C, A" hint="First, construct the tree. The root is the first element of the preorder traversal. Use the inorder traversal to find the left and right subtrees." solution="
Step 1: Identify the root.
From the preorder traversal (A, B, D, E, H, C, F, G), the root is 'A'.
Step 2: Partition the inorder traversal using the root.
The inorder traversal is (D, B, H, E, A, F, C, G).
Elements to the left of 'A': D, B, H, E. This is the left subtree.
Elements to the right of 'A': F, C, G. This is the right subtree.
Step 3: Reconstruct the left subtree.
Preorder for left subtree: B, D, E, H
Inorder for left subtree: D, B, H, E
The root of this subtree is 'B'.
Left of 'B' in inorder: D. Right of 'B': H, E.
The left child of 'B' is 'D'.
For the right subtree of 'B': Preorder is E, H; Inorder is H, E. The root is 'E'. Left of 'E' is 'H'. So 'H' is the left child of 'E'.
Step 4: Reconstruct the right subtree.
Preorder for right subtree: C, F, G
Inorder for right subtree: F, C, G
The root of this subtree is 'C'.
Left of 'C' in inorder: F. Right of 'C': G.
The left child of 'C' is 'F' and the right child is 'G'.
Step 5: The final tree structure is:
Root: A
A's left child: B
A's right child: C
B's left child: D
B's right child: E
E's left child: H
C's left child: F
C's right child: G
Step 6: Perform a postorder (LRN) traversal on the constructed tree.
Visit left subtree of A (subtree B):
Visit left subtree of B (D): visit D
Visit right subtree of B (subtree E):
Visit left subtree of E (H): visit H
Visit right subtree of E (null)
Visit E
Visit B
Visit right subtree of A (subtree C):
Visit left subtree of C (F): visit F
Visit right subtree of C (G): visit G
Visit C
Visit A
Result: The postorder traversal is D, H, E, B, F, G, C, A (the left child F of C is visited before the right child G).
Answer: \boxed{D, H, E, B, F, G, C, A}
"
:::
:::question type="NAT" question="What is the maximum number of nodes in a binary tree of height 5? (Assume the height of a tree with a single node is 0)." answer="63" hint="Use the formula that relates the maximum number of nodes to the height of a binary tree. This occurs when the tree is perfect." solution="
Step 1: Recall the formula for the maximum number of nodes in a binary tree of height $h$: $n_{max} = 2^{h+1} - 1$.
Step 2: Substitute the given height $h = 5$ into the formula: $n_{max} = 2^{5+1} - 1 = 2^6 - 1$.
Step 3: Calculate the final value: $64 - 1 = 63$.
Result: The maximum number of nodes is 63.
Answer: \boxed{63}
"
:::
:::question type="MSQ" question="Let a non-empty binary tree have $n$ nodes and $l$ leaf nodes. Which of the following statements is/are ALWAYS TRUE?" options=["If the tree is full, then $n$ must be odd.","The number of nodes with one child is at most 1.","The number of edges is $n - 1$.","If the tree is complete, $l = \lceil n/2 \rceil$."] answer="A,C,D" hint="Consider the properties of full and complete binary trees, and the basic definition of a tree as a graph." solution="
- Option A: In a full binary tree, every internal node has 2 children. Let $i$ be the number of internal nodes. We know $l = i + 1$. The total number of nodes is $n = l + i = 2i + 1$. Since $i$ is an integer, $2i$ is even, and $n$ must be odd. Thus, this statement is TRUE.
- Option B: A binary tree can have many nodes with one child. For example, a skewed tree where every node has one child. This statement is FALSE.
- Option C: A tree is a connected, acyclic graph. A fundamental property of any tree (not just binary) with $n$ nodes is that it has exactly $n - 1$ edges. This statement is TRUE.
- Option D: In a complete binary tree, the nodes on the last level are filled from left to right. The parent of any node at index $j$ (in a 1-indexed array representation) is at $\lfloor j/2 \rfloor$. The leaf nodes begin at index $\lfloor n/2 \rfloor + 1$. The number of leaf nodes is therefore $n - \lfloor n/2 \rfloor = \lceil n/2 \rceil$. The condition $l = \lceil n/2 \rceil$ is therefore always met. This statement is TRUE.
"
:::
:::question type="MCQ" question="Which of the following pairs of traversals is sufficient to uniquely reconstruct a full binary tree?" options=["Preorder and Level-order","Inorder and Level-order","Postorder and Preorder","Inorder and Postorder"] answer="Postorder and Preorder" hint="Recall the special case for tree reconstruction. While Inorder is generally required, a specific type of tree has a special property." solution="
For a general binary tree, Inorder traversal is required along with either Preorder or Postorder. This is because Inorder uniquely partitions the nodes into left and right subtrees.
However, the question specifies a full binary tree. In a full binary tree, every node has either 0 or 2 children. This property removes the ambiguity that normally exists when using only Preorder and Postorder. If a node in the traversal sequence implies a child, we know it must have two children (unless it's a leaf), allowing for unique reconstruction.
Therefore, for a full binary tree, the pair of Postorder and Preorder traversals is sufficient. Inorder combined with another traversal (like Level-order or Postorder) is also sufficient, but the question asks for a pair that becomes sufficient because the tree is full. The pair (Postorder, Preorder) uniquely fits this description.
Answer: \boxed{Postorder and Preorder}
"
:::
---
Summary
- Traversal Algorithms are Foundational: You must be able to derive Preorder (NLR), Inorder (LNR), and Postorder (LRN) traversals for any given binary tree quickly and accurately.
- Tree Reconstruction Rules: For a general binary tree, unique reconstruction requires (Inorder + Preorder) or (Inorder + Postorder). For a full binary tree, (Preorder + Postorder) is also sufficient.
- Key Property Formulas: Commit to memory the relationships between nodes and height:
* $n = l + i$ (always holds for non-empty trees).
* $l = i + 1$ (for non-empty full binary trees).
* $h + 1 \le n \le 2^{h+1} - 1$ (bounds on nodes for a given height $h$).
These are frequently used to solve numerical and property-based questions.
---
What's Next?
This topic provides the essential foundation for more specialized tree structures. Your preparation should now extend to:
- Binary Search Trees (BSTs): A crucial ordered tree structure where the Inorder traversal yields a sorted sequence of nodes. Understanding their properties, operations (search, insert, delete), and balancing mechanisms (like AVL trees) is the next logical step.
- Heaps: Heaps are a specific type of tree (usually a complete binary tree) that satisfy the heap property. They are fundamental for implementing priority queues and for algorithms like HeapSort.
- Graph Algorithms: Trees are a specific type of graph (connected and acyclic). The depth-first traversal methods learned here are a direct precursor to Depth-First Search (DFS) in graphs. Understanding trees first makes the transition to general graphs much more intuitive.
---
Now that you understand Trees, let's explore Hash Tables, which build on these concepts.
---
Part 5: Hash Tables
Introduction
In the study of data structures, the efficiency of data retrieval is a paramount concern. While structures like arrays offer constant-time access given an index, they are inefficient for searching for a specific value. Conversely, balanced binary search trees provide logarithmic-time search, insertion, and deletion, which is efficient but not optimal. We now turn our attention to a data structure that, under reasonable assumptions, can perform all these fundamental operations with an average-case time complexity of $O(1)$: the hash table.
A hash table is a powerful data structure that implements an associative array, also known as a dictionary or map. It maps keys to values, enabling rapid retrieval of a value when its corresponding key is known. This is achieved by using a special function, the hash function, to compute an index into an array of buckets or slots, from which the desired value can be found. The remarkable average-case efficiency of hash tables makes them indispensable in a vast array of applications, including database indexing, caching systems, symbol tables in compilers, and the implementation of sets and maps in modern programming languages. For the GATE examination, a firm grasp of their mechanics, collision resolution strategies, and performance analysis is essential.
A Hash Table is a data structure that stores key-value pairs. It uses a hash function to compute an index, or hash code, from a key, into an array of slots. The key is then stored at this index. Ideally, the hash function distributes keys uniformly across the array, allowing for an average-case time complexity of $O(1)$ for insertion, deletion, and search operations.
---
Key Components of a Hash Table
The efficacy of a hash table rests upon two primary components: the hash function and the underlying array structure.
1. The Hash Function
The hash function is the core of the hash table. It is a mathematical function that takes a key as input and transforms it into an integer index that falls within the bounds of the table's array.
Let $U$ be the universe of all possible keys and $m$ be the number of slots in the hash table, indexed $0$ to $m - 1$. A hash function $h$ is a mapping:

$$h : U \to \{0, 1, \ldots, m - 1\}$$
Properties of a Good Hash Function:
* Determinism: For a given key $k$, the hash function must always produce the same index.
* Uniform Distribution: The function should map the expected keys as evenly as possible over the available slots. This minimizes collisions.
* Efficiency: The function must be fast to compute.
A commonly used and simple method for hashing integer keys is the Division Method:

$$h(k) = k \bmod m$$

Variables:
- $k$ = The key to be hashed.
- $m$ = The size of the hash table (number of slots).
When to use: This is a general-purpose and frequently encountered hashing technique in problems. It is simple to implement and often effective, provided $m$ is chosen carefully (e.g., a prime number not too close to a power of 2).
2. The Table (Array of Slots)
The table is simply an array of size $m$ where the data is stored. Each position in the array is referred to as a slot or bucket. The index computed by the hash function directly corresponds to a slot in this array.
---
Collision Resolution Strategies
A collision occurs when two distinct keys, $k_1$ and $k_2$, are mapped to the same slot by the hash function, i.e., $h(k_1) = h(k_2)$ where $k_1 \ne k_2$. Since only one element can be stored in a single slot, we require a systematic method for handling such events. There are two primary strategies for collision resolution.
1. Separate Chaining
In separate chaining, each slot in the hash table does not hold an element directly, but rather a pointer to a data structure (typically a linked list) that stores all elements that have hashed to that slot.
The load factor, $\alpha$, is defined as $\alpha = n/m$, where $n$ is the number of elements and $m$ is the number of slots. For separate chaining, $\alpha$ can be less than, equal to, or greater than 1. The average length of a chain is $\alpha$.
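Separate chaining can be sketched with a list of buckets, each holding the key-value pairs that hash to that slot. In this minimal sketch (class name illustrative), Python lists stand in for the linked lists:

```python
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]   # one chain per slot

    def _hash(self, key):
        return key % self.m                     # division method

    def insert(self, key, value):
        chain = self.buckets[self._hash(key)]
        for idx, (k, _) in enumerate(chain):
            if k == key:                        # key already present: update
                chain[idx] = (key, value)
                return
        chain.append((key, value))              # new key: append to the chain

    def search(self, key):
        for k, v in self.buckets[self._hash(key)]:
            if k == key:
                return v
        return None                             # key not in the table
```

With `m = 10`, the keys 12 and 22 both hash to slot 2 and simply share a chain; unlike open addressing, the load factor may exceed 1 without breaking correctness, only lengthening the chains.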
2. Open Addressing
In open addressing, all elements are stored directly within the hash table itself. When a collision occurs, we systematically probe the table for an empty slot where the new key can be placed. The probe sequence depends on the key and the probe number $i$ (for $i = 0, 1, \ldots, m - 1$).
The hash function is extended to include the probe number: $h(k, i)$.
For open addressing, the load factor must always be less than 1 ($\alpha < 1$), as the table cannot store more elements than it has slots.
There are several common methods for computing the probe sequences.
a. Linear Probing
This is the simplest method. If the initial slot is occupied, we try the next slot, and then the next, and so on, wrapping around the table if necessary.
The probe sequence is defined by:

$$h(k, i) = (h'(k) + i) \bmod m$$

where $h'$ is the initial hash function (e.g., $h'(k) = k \bmod m$) and $i = 0, 1, \ldots, m - 1$.
A significant drawback of linear probing is primary clustering, where long runs of occupied slots build up, increasing the average search time.
Worked Example:
Problem: Consider a hash table of size $m = 10$ with the hash function $h'(k) = k \bmod 10$. Insert the keys 58, 28, 49, 18, 19 using linear probing.
Solution:
The hash table is initially empty: `[ , , , , , , , , , ]`
1. Insert 58:
$h(58, 0) = 58 \bmod 10 = 8$. Slot 8 is empty.
`[ , , , , , , , , 58, ]`
2. Insert 28:
$h(28, 0) = 8$. Slot 8 is occupied.
$h(28, 1) = (8 + 1) \bmod 10 = 9$. Slot 9 is empty.
`[ , , , , , , , , 58, 28]`
3. Insert 49:
$h(49, 0) = 9$. Slot 9 is occupied.
$h(49, 1) = (9 + 1) \bmod 10 = 0$. Slot 0 is empty.
`[49, , , , , , , , 58, 28]`
4. Insert 18:
$h(18, 0) = 8$. Slot 8 is occupied.
$h(18, 1) = 9$. Slot 9 is occupied.
$h(18, 2) = 0$. Slot 0 is occupied.
$h(18, 3) = 1$. Slot 1 is empty.
`[49, 18, , , , , , , 58, 28]`
5. Insert 19:
$h(19, 0) = 9$. Slot 9 is occupied.
$h(19, 1) = 0$. Slot 0 is occupied.
$h(19, 2) = 1$. Slot 1 is occupied.
$h(19, 3) = 2$. Slot 2 is empty.
`[49, 18, 19, , , , , , 58, 28]`
Answer: \boxed{Final Table State: `[49, 18, 19, , , , , , 58, 28]`}
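The trace above can be reproduced with a short linear-probing insertion routine. A sketch under the same assumptions (division-method hash, table of size 10); it omits deletion and resizing:

```python
def linear_probe_insert(table, key):
    """Insert key into an open-addressed table using linear probing.
    Returns the slot used, or raises if the table is full."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m      # h(k, i) = (h'(k) + i) mod m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")

table = [None] * 10
for k in (58, 28, 49, 18, 19):
    linear_probe_insert(table, k)
print(table)  # [49, 18, 19, None, None, None, None, None, 58, 28]
```

Note how the cluster at slots 8, 9, 0, 1, 2 grows with each collision; this is exactly the primary clustering mentioned above.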
b. Quadratic Probing
Quadratic probing aims to alleviate primary clustering. The probe sequence is defined by:

$$h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$$

where $c_1$ and $c_2$ are constants ($c_2 \ne 0$). A common choice is $c_1 = 0, c_2 = 1$. This method can lead to secondary clustering, where keys that initially hash to the same slot will follow the same probe sequence.
c. Double Hashing
Double hashing is one of the most effective open addressing methods. It uses a second hash function, $h_2$, to determine the step size for the probe sequence:

$$h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$$

The second hash function must be chosen carefully, such that $h_2(k)$ is relatively prime to $m$ for all keys $k$. This ensures that all slots in the table are eventually probed.
---
Performance Analysis of Open Addressing
The performance of open addressing is heavily dependent on the load factor $\alpha$. As $\alpha$ approaches 1, the table becomes full, and finding an empty slot becomes increasingly difficult. The analysis typically relies on the assumption of uniform hashing, an idealization where each key is equally likely to have any of the $m!$ permutations of $\langle 0, 1, \ldots, m - 1 \rangle$ as its probe sequence.
For an open-address hash table with $n$ elements and $m$ slots, the load factor $\alpha = n/m$ is the ratio of the number of elements to the number of slots.
It represents how full the table is. For open addressing, we must have $\alpha < 1$.
The key performance metric is the expected number of probes required for an operation.
Under the assumption of uniform hashing, the expected number of probes in an unsuccessful search is at most:
Variables:
- = The load factor .
When to use: Use this formula to calculate the average number of slots that must be examined to determine that a key is not in the table. This is also used for insertions.
Under the assumption of uniform hashing, the expected number of probes in a successful search is at most:
$$\frac{1}{\alpha} \ln \frac{1}{1 - \alpha}$$
Variables:
- $\alpha$ = the load factor, $n/m$.
When to use: Use this formula to calculate the average number of slots that must be examined to find a key that is in the table.
To insert a key into an open-address hash table, we must first search for an empty slot. This process is identical to an unsuccessful search for that key. If the key were already present, the insertion would not proceed (assuming no duplicates). The search for an empty slot terminates once one is found. Therefore, the expected number of probes required for an insertion is equal to the expected number of probes for an unsuccessful search.
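The two bounds above are easy to evaluate numerically, which helps build intuition for how sharply performance degrades near $\alpha = 1$. This is a small sketch; the function names and sample load factors are my own.

```python
import math

def probes_unsuccessful(alpha):
    """Upper bound on expected probes for an unsuccessful search (and insertion): 1/(1 - alpha)."""
    return 1 / (1 - alpha)

def probes_successful(alpha):
    """Upper bound on expected probes for a successful search: (1/alpha) * ln(1/(1 - alpha))."""
    return (1 / alpha) * math.log(1 / (1 - alpha))

# As the load factor climbs, the unsuccessful-search (and insertion)
# bound grows much faster than the successful-search bound.
for alpha in (0.5, 0.75, 0.9):
    print(f"alpha={alpha}: unsuccessful <= {probes_unsuccessful(alpha):.2f}, "
          f"successful <= {probes_successful(alpha):.2f}")
```

At $\alpha = 0.5$ an insertion needs at most 2 probes on average, but at $\alpha = 0.9$ it needs at most 10, illustrating why open-address tables are typically resized well before they fill.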
---
Problem-Solving Strategies
When asked to trace insertions with a given collision resolution scheme (like linear probing):
- Draw the table: Create a visual representation of the array slots, e.g., `[0, 1, 2, ..., m-1]`.
- Process one key at a time. For each key in the sequence:
  a. Calculate the initial hash index, e.g., $h(k) = k \bmod m$ for the division method.
  b. Check if the slot at that index is empty. If yes, place the key and move to the next key in the sequence.
  c. If the slot is occupied, a collision has occurred. Apply the probing formula for $i = 1, 2, 3, \ldots$ until an empty slot is found. For linear probing, this is simply $(h(k) + i) \bmod m$.
- Be meticulous: keep track of the table state after each insertion. A small calculation error can cascade.
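The tracing procedure can also be checked mechanically. The sketch below is a minimal implementation of that procedure, assuming the division-method hash $h(k) = k \bmod m$ used throughout the chapter; the function name and key order are my own (the key order is chosen to be consistent with the worked example earlier).

```python
def insert_all(keys, m):
    """Insert keys into a size-m table using h(k) = k mod m and linear probing.

    Assumes the table does not fill up (otherwise the probe loop would not terminate).
    """
    table = [None] * m
    for k in keys:
        idx = k % m                    # step (a): initial hash index
        while table[idx] is not None:  # step (c): collision -- probe onward
            idx = (idx + 1) % m        # next slot, wrapping with mod m
        table[idx] = k                 # step (b): place the key in the empty slot
    return table

print(insert_all([58, 28, 49, 18, 19], 10))
# → [49, 18, 19, None, None, None, None, None, 58, 28]
```

This reproduces the final table state from the worked trace, and the same helper can be used to verify answers to the practice questions below.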
For questions about the expected number of probes:
- Identify the operation: Is it a successful search, an unsuccessful search, or an insertion?
- Calculate the load factor: Find $n$ (number of elements) and $m$ (table size) to compute $\alpha = n/m$.
- Apply the correct formula:
  - For unsuccessful search or insertion, use $\frac{1}{1 - \alpha}$.
  - For successful search, use $\frac{1}{\alpha} \ln \frac{1}{1 - \alpha}$.
- Remember that these formulas are based on the idealization of uniform hashing.
---
Common Mistakes
- ❌ Forgetting to apply the modulo operator after each step in a probe sequence. For example, in linear probing, computing $h(k) + i$ without taking the modulus at each step, which can produce an index outside the table.
- ✅ The correct form is $(h(k) + i) \bmod m$. The result must always be a valid index from $0$ to $m - 1$.
- ❌ Calculating the load factor incorrectly as $m/n$.
- ✅ The load factor is always the number of elements divided by the number of slots: $\alpha = n/m$. It represents the "fullness" of the table.
- ❌ Assuming an insertion takes the same number of probes as a successful search.
- ✅ An insertion requires finding an empty slot, which is equivalent to an unsuccessful search. Therefore, its performance is governed by the formula for unsuccessful searches, not successful ones.
---
Practice Questions
:::question type="MCQ" question="A hash table of size 10 uses the hash function $h(k) = k \bmod 10$ and linear probing. After inserting the keys 2, 12, 22, 32, 5, 15 in that order, where is the key 25 subsequently placed?" options=["Slot 6","Slot 7","Slot 8","Slot 9"] answer="Slot 8" hint="Trace the insertion of each key one by one, resolving collisions with linear probing. Then, find the first empty slot for the final key, starting from its initial hash position." solution="
Step 1: Initialize the hash table of size 10. `T = [ , , , , , , , , , ]`
Step 2: Insert the key 2.
$h(2) = 2 \bmod 10 = 2$. Slot 2 is empty.
`T[2] = 2`.
Step 3: Insert the key 12.
$h(12) = 12 \bmod 10 = 2$. Slot 2 is occupied.
Probe 1: $(2 + 1) \bmod 10 = 3$. Slot 3 is empty.
`T[3] = 12`.
Step 4: Insert the key 22.
$h(22) = 22 \bmod 10 = 2$. Slot 2 is occupied.
Probe 1: $(2 + 1) \bmod 10 = 3$. Slot 3 is occupied.
Probe 2: $(2 + 2) \bmod 10 = 4$. Slot 4 is empty.
`T[4] = 22`.
Step 5: Insert the key 32.
$h(32) = 32 \bmod 10 = 2$. Slots 2, 3, 4 are occupied.
Probe 3: $(2 + 3) \bmod 10 = 5$. Slot 5 is empty.
`T[5] = 32`.
Step 6: Insert the key 5.
$h(5) = 5 \bmod 10 = 5$. Slot 5 is occupied.
Probe 1: $(5 + 1) \bmod 10 = 6$. Slot 6 is empty.
`T[6] = 5`.
Step 7: Insert the key 15.
$h(15) = 15 \bmod 10 = 5$. Slots 5, 6 are occupied.
Probe 2: $(5 + 2) \bmod 10 = 7$. Slot 7 is empty.
`T[7] = 15`.
Step 8: Insert the final key 25.
$h(25) = 25 \bmod 10 = 5$. Slots 5, 6, 7 are occupied.
Probe 3: $(5 + 3) \bmod 10 = 8$. Slot 8 is empty.
`T[8] = 25`.
Answer: \boxed{Slot 8}
"
:::
:::question type="NAT" question="An open-address hash table of size 1000 stores 750 elements. Assuming uniform hashing, what is the expected number of probes required for an insertion?" answer="4" hint="An insertion requires the same number of probes as an unsuccessful search. First, calculate the load factor." solution="
Step 1: Identify the given parameters.
Number of elements: $n = 750$.
Number of slots: $m = 1000$.
Step 2: Calculate the load factor: $\alpha = \frac{n}{m} = \frac{750}{1000} = 0.75$.
Step 3: Apply the formula for the expected number of probes for an insertion. The expected number of probes for an insertion is the same as for an unsuccessful search: $\frac{1}{1 - \alpha}$.
Step 4: Substitute the value of $\alpha$: $\frac{1}{1 - 0.75} = \frac{1}{0.25} = 4$.
Answer: \boxed{4}
"
:::
:::question type="MSQ" question="Which of the following statements about hash tables are correct?" options=["In open addressing, the load factor must be strictly less than 1.","Linear probing is susceptible to a phenomenon called primary clustering.","Separate chaining cannot have a load factor greater than 1.","Double hashing uses two different hash functions to resolve collisions."] answer="In open addressing, the load factor must be strictly less than 1.,Linear probing is susceptible to a phenomenon called primary clustering.,Double hashing uses two different hash functions to resolve collisions." hint="Evaluate each statement based on the fundamental properties of separate chaining and open addressing collision resolution methods." solution="
- Option A: Correct. In open addressing, elements are stored in the table itself. If $\alpha = 1$ (i.e., $n = m$), the table is full, and an insertion of a new element is impossible. Thus, the load factor must be less than 1.
- Option B: Correct. Primary clustering occurs when keys that hash to different initial slots begin to form a single large cluster. This happens in linear probing because any collision in the cluster's vicinity will add to the end of the cluster, making it longer.
- Option C: Incorrect. In separate chaining, each slot is a pointer to a linked list (or other data structure). The number of elements $n$ can exceed the number of slots $m$, resulting in a load factor $\alpha > 1$. This simply means the average length of the linked lists is greater than 1.
- Option D: Correct. Double hashing uses a primary hash function $h_1(k)$ to find the initial slot and a secondary hash function $h_2(k)$ to determine the step size for probing, resulting in a probe sequence of $h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$.
"
:::
:::question type="NAT" question="Consider a hash table of size 13 with hash function $h(k) = k \bmod 13$. Linear probing is used for collision resolution. After inserting the keys 14, 27, 2, 15 in that order, the key 15 is stored at which index?" answer="4" hint="Trace the insertions carefully. Note that 27, 2, and 15 will cause collisions." solution="
We simulate the insertions step-by-step into the hash table of size 13 (indices 0-12). The hash function is $h(k) = k \bmod 13$ and collision resolution is linear probing.
Step 1: Insert key 14.
$h(14) = 14 \bmod 13 = 1$. Slot 1 is empty.
`T[1] = 14`.
Step 2: Insert key 27.
$h(27) = 27 \bmod 13 = 1$. Slot 1 is occupied.
Probe 1: $(1 + 1) \bmod 13 = 2$. Slot 2 is empty.
`T[2] = 27`.
Step 3: Insert key 2.
$h(2) = 2 \bmod 13 = 2$. Slot 2 is occupied.
Probe 1: $(2 + 1) \bmod 13 = 3$. Slot 3 is empty.
`T[3] = 2`.
Step 4: Insert key 15.
$h(15) = 15 \bmod 13 = 2$. Slot 2 is occupied.
Probe 1: $(2 + 1) \bmod 13 = 3$. Slot 3 is occupied.
Probe 2: $(2 + 2) \bmod 13 = 4$. Slot 4 is empty.
`T[4] = 15`.
Answer: \boxed{4}
"
:::
---
Summary
- Fundamental Goal: Hash tables aim to provide $O(1)$ average-case time complexity for search, insertion, and deletion by mapping keys to array indices via a hash function.
- Collisions are Inevitable: When two keys map to the same index, a collision occurs. The strategy for resolving these collisions is critical.
- Open Addressing vs. Chaining: Understand the two main collision resolution paradigms. For open addressing (linear probing, etc.), all elements are stored in the table itself, so $\alpha \le 1$. For separate chaining, each slot can hold multiple elements (e.g., in a linked list), and $\alpha$ can exceed 1.
- Linear Probing Mechanics: For GATE problems, be prepared to manually trace insertions using linear probing: $h(k, i) = (h(k) + i) \bmod m$.
- Performance is Tied to Load Factor: The performance of open addressing degrades as the load factor $\alpha$ approaches 1. The expected number of probes for an insertion is the same as for an unsuccessful search, which is at most $\frac{1}{1 - \alpha}$.
---
What's Next?
This topic connects to:
- Trees (BSTs and Balanced Trees): Compare the performance guarantees. Hash tables offer better average-case performance ($O(1)$), but trees like AVL or Red-Black trees provide a worst-case guarantee of $O(\log n)$, which hash tables lack (their worst case is $O(n)$).
- Complexity Analysis: Hash tables are a classic case study in amortized and average-case analysis, showing how a structure can be highly efficient in practice despite a poor worst-case bound.
- Database Indexing: In database systems, hash indexing is a common method for building indices to allow for rapid retrieval of records based on a key value. Understanding hash tables is fundamental to understanding this aspect of database design.
Master these connections for a comprehensive understanding of how different data structures are chosen for specific computational problems.
---
Chapter Summary
In this chapter, we have undertaken a systematic study of the fundamental data structures that form the bedrock of efficient algorithm design. A thorough command of these concepts is not merely beneficial but essential for success in the GATE examination. From our detailed discussion, we distill the following critical points for retention:
- The Principle of Organization: The primary purpose of any data structure is to organize data in a manner that facilitates efficient access and modification. The choice of data structure is always a trade-off, primarily between the time complexity of operations and the space complexity of storage.
- Linear vs. Non-linear Structures: We have established a clear distinction between linear structures (Linked Lists, Stacks, Queues), where elements have a sequential relationship, and non-linear structures (Trees, Graphs), where elements can have multiple predecessor and successor relationships. Hash Tables represent a distinct category, organizing data by key-value pairs through a hash function.
- Asymptotic Performance is Key: For GATE, understanding the average-case and worst-case time complexities of core operations is paramount. We have seen that Stacks and Queues offer $O(1)$ performance for their primary operations. Balanced Binary Search Trees provide a reliable $O(\log n)$ for search, insertion, and deletion, whereas Hash Tables offer an average-case performance of $O(1)$ at the cost of a potential $O(n)$ worst case due to collisions.
- Abstract Data Types (ADTs) vs. Implementations: It is crucial to distinguish between an ADT and its underlying implementation. A Stack (LIFO) or a Queue (FIFO) is an abstract concept; it can be implemented using either arrays or linked lists, each with its own advantages regarding memory allocation and overflow handling.
- The Role of Pointers and Recursion: The implementation and manipulation of dynamic structures like Linked Lists and Trees are intrinsically linked to the concepts of pointers and memory allocation. Furthermore, many tree operations, such as traversals, are most elegantly expressed and understood through recursion, which itself implicitly uses a stack (the call stack).
- Collision Resolution in Hashing: A hash table's performance is dominated by its collision resolution strategy. We have examined the two primary methods: Separate Chaining (using linked lists) and Open Addressing (probing for the next available slot). Understanding the mechanisms and performance implications of each is a frequent subject of examination.
---
Chapter Review Questions
:::question type="MCQ" question="A process involves inserting a sequence of distinct keys into an initially empty Binary Search Tree (BST). Subsequently, the elements are dequeued one by one from a queue that was populated by a level-order traversal of the final BST. If the dequeued sequence is 25, 15, 40, 10, 20, 35, 50, which of the following insertion sequences could have produced this result?" options=["25, 15, 10, 20, 40, 35, 50", "25, 40, 50, 35, 15, 20, 10", "25, 15, 40, 10, 35, 20, 50", "25, 40, 15, 10, 20, 35, 50"] answer="25, 15, 10, 20, 40, 35, 50" hint="The first element of a level-order traversal is always the root of the tree. Use this fact to determine the root from the dequeued sequence. Then, for each subsequent key in a potential insertion sequence, verify that it adheres to the BST property relative to the nodes already in the tree." solution="
The problem requires us to work backward from a level-order traversal to identify a valid insertion sequence for a Binary Search Tree (BST).
- Level 0: 25
- Level 1: 15, 40 (15 must be the left child of 25, and 40 must be the right child)
- Level 2: 10, 20, 35, 50 (10 and 20 are children of 15; 35 and 50 are children of 40)
- Specifically, 10 < 15, so 10 is the left child of 15. 20 > 15, so 20 is the right child of 15.
- Similarly, 35 < 40, so 35 is the left child of 40. 50 > 40, so 50 is the right child of 40.
- Option A: 25, 15, 10, 20, 40, 35, 50
- `insert(25)`: root is 25.
- `insert(15)`: $15 < 25$, goes left. Tree: 25(L:15)
- `insert(10)`: $10 < 25$ and $10 < 15$, goes left. Tree: 25(L:15(L:10))
- `insert(20)`: $20 < 25$ and $20 > 15$, goes right. Tree: 25(L:15(L:10, R:20))
- `insert(40)`: $40 > 25$, goes right. Tree: 25(L:..., R:40)
- `insert(35)`: $35 > 25$ and $35 < 40$, goes left. Tree: 25(L:..., R:40(L:35))
- `insert(50)`: $50 > 25$ and $50 > 40$, goes right. Tree: 25(L:..., R:40(L:35, R:50))
This sequence correctly constructs the tree inferred from the level-order traversal.
- Option C: 25, 15, 40, 10, 35, 20, 50
- This sequence also produces the exact same final tree.
- Option D: 25, 40, 15, 10, 20, 35, 50
- This sequence also produces the exact same final tree.
Multiple insertion sequences can lead to the same BST. However, in a single-choice question, we select the most direct or commonly expected one. Option A follows a natural left-to-right, level-by-level insertion pattern for the children, which is often implied.
Answer: \boxed{25, 15, 10, 20, 40, 35, 50}
"
:::
:::question type="NAT" question="A hash table of size 11 is used to store integer keys, with the hash function $h(k) = k \bmod 11$. Collisions are resolved using linear probing. The keys 23, 45, 12, 34, 56, 24, 80 are inserted in this exact order into the initially empty table. What is the index (or slot) where the final key, 80, is stored?" answer="7" hint="Calculate the hash value for each key sequentially. If the target slot is occupied, probe the next slot ($(\text{index} + 1) \bmod 11$) cyclically until an empty slot is found. Keep track of the state of the hash table after each insertion." solution="
We simulate the insertions step-by-step into the hash table of size 11 (indices 0-10). The hash function is $h(k) = k \bmod 11$ and collision resolution is linear probing.
Step 1: Insert 23. $h(23) = 23 \bmod 11 = 1$. Slot 1 is empty, so `T[1] = 23`.
Step 2: Insert 45. $h(45) = 45 \bmod 11 = 1$. Slot 1 is occupied; slot 2 is empty, so `T[2] = 45`.
Step 3: Insert 12. $h(12) = 12 \bmod 11 = 1$. Slots 1 and 2 are occupied; slot 3 is empty, so `T[3] = 12`.
Step 4: Insert 34. $h(34) = 34 \bmod 11 = 1$. Slots 1, 2, 3 are occupied; slot 4 is empty, so `T[4] = 34`.
Step 5: Insert 56. $h(56) = 56 \bmod 11 = 1$. Slots 1 through 4 are occupied; slot 5 is empty, so `T[5] = 56`.
At this point, the table is: `[empty, 23, 45, 12, 34, 56, empty, empty, empty, empty, empty]`
Step 6: Insert 24. $h(24) = 24 \bmod 11 = 2$. Slots 2 through 5 are occupied; slot 6 is empty, so `T[6] = 24`.
The table is now: `[empty, 23, 45, 12, 34, 56, 24, empty, empty, empty, empty]`
Step 7: Insert 80. $h(80) = 80 \bmod 11 = 3$. Slots 3 through 6 are occupied; slot 7 is empty, so `T[7] = 80`.
Answer: \boxed{7}
"
:::
:::question type="MCQ" question="Consider a queue implemented using a singly linked list. The queue holds pointers `front` and `rear`. Which of the following represents the most efficient time complexities for the ENQUEUE and DEQUEUE operations, respectively, under the standard implementation?" options=["$O(1)$, $O(1)$","$O(1)$, $O(n)$","$O(n)$, $O(1)$","$O(n)$, $O(n)$"] answer="$O(1)$, $O(1)$" hint="Think about where new elements are added and where they are removed in a queue. How can the `front` and `rear` pointers be maintained to make these operations independent of the number of elements?" solution="
Let us analyze the implementation of a queue using a singly linked list. A queue follows a First-In, First-Out (FIFO) discipline.
- ENQUEUE: This operation adds a new element to the end (rear) of the queue.
- DEQUEUE: This operation removes an element from the start (front) of the queue.
By maintaining both `front` and `rear` pointers, we can perform both ENQUEUE (at the rear) and DEQUEUE (from the front) in constant time. Therefore, the respective complexities are $O(1)$ and $O(1)$.
Answer: \boxed{O(1), O(1)}
"
:::
---
What's Next?
Having completed this chapter on Basic Data Structures, you have established a firm foundation for a significant portion of the Computer Science and Information Technology syllabus. The concepts of organizing data for efficiency are not isolated; they are the fundamental building blocks for more complex topics.
Connections to Your Learning:
- Previous Chapters: Your understanding of programming fundamentals, particularly pointers, dynamic memory allocation, and recursion, has been directly applied in the implementation and analysis of linked lists and trees. The mathematical concepts of logarithms and summations are essential for the complexity analysis we have performed.
- Future Chapters: The mastery of these basic structures is a prerequisite for advancing in your preparation.
We will build upon this foundation immediately in the next chapter as we explore the design and analysis of algorithms that operate upon these very structures.