Basic Data Structures
Overview
The study of data structures forms the bedrock upon which efficient algorithms are built. In this chapter, we shall explore the fundamental constructs used to organize and store data, which are indispensable for solving a wide array of computational problems. Our focus will be on the abstract properties, concrete implementations, and performance characteristics of these structures. A deep understanding of how to select and manipulate the appropriate data structure is not merely an academic exercise; it is a critical skill for success in the Graduate Aptitude Test in Engineering (GATE), where questions frequently test the trade-offs between different data organization schemes.
We will begin by examining linear structures such as linked lists, stacks, and queues, each offering unique mechanisms for data access and modification. Subsequently, our investigation will proceed to non-linear structures, namely trees and hash tables. For each structure, we will rigorously analyze the time and space complexity of its core operations, often expressed using Big-O notation, such as O(1), O(log n), or O(n). This analytical approach is central to the GATE syllabus and is essential for developing the ability to design algorithms that are not only correct but also optimally performant.
---
Chapter Contents
| # | Topic | What You'll Learn |
|---|-------|-------------------|
| 1 | Linked Lists | Dynamic data storage and pointer manipulation. |
| 2 | Stacks | The LIFO principle and its applications. |
| 3 | Queues | The FIFO principle for scheduling tasks. |
| 4 | Trees | Hierarchical data representation and traversal algorithms. |
| 5 | Hash Tables | Efficient key-value mapping and collision resolution. |
---
Learning Objectives
After completing this chapter, you will be able to:
- Implement and analyze the fundamental operations (insertion, deletion, traversal) for linked lists, stacks, and queues.
- Differentiate between various tree structures, including binary trees and binary search trees, and perform standard traversal operations.
- Explain the principles of hashing, hash functions, and common collision resolution techniques.
- Evaluate the asymptotic time and space complexity of operations for each data structure and justify the selection of a specific structure for a given problem scenario.
---
We now turn our attention to Linked Lists...
Part 1: Linked Lists
Introduction
In our study of data structures, we first encounter static structures such as arrays, where the size is fixed at compile time. While efficient for certain applications, this rigidity presents a significant limitation when the volume of data is unpredictable or varies dynamically during program execution. To address this, we turn our attention to dynamic data structures, of which the linked list is the most fundamental.
A linked list is a linear collection of data elements, called nodes, where the linear order is not given by their physical placement in memory. Instead, each element points to the next, forming a chain-like structure. This approach provides remarkable flexibility in memory allocation and management, making it an indispensable tool in a programmer's arsenal for tasks ranging from implementing other abstract data types like stacks and queues to managing memory in operating systems.
A linked list is a linear data structure consisting of a sequence of nodes, where each node contains two fields: a data field and a reference (or link) to the next node in the sequence. The first node in the list is referred to as the `head`, and the last node's link points to `NULL`, signifying the end of the list.
---
Key Concepts
1. Structure of a Node
The fundamental building block of any linked list is the node. A node is a self-referential structure that encapsulates both the data it holds and the mechanism to access the subsequent node.
We can visualize a single node as follows:
In programming, this structure is typically implemented using a `struct` in C/C++ or a `class` in Python/Java.
```c
// A node structure in C
struct Node {
int data;
struct Node* next;
};
```
```python
# A node class in Python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
```
A sequence of these nodes, linked together, forms the list. The `head` pointer is a special pointer that always points to the first node, serving as the entry point to the list.
2. Types of Linked Lists
While the fundamental concept remains the same, linked lists can be categorized based on their traversal capabilities and structure.
A. Singly Linked List
This is the simplest form, where each node contains data and a single pointer to the next node. Traversal is unidirectional, from head to tail.
B. Doubly Linked List
In this variant, each node possesses two pointers: one to the `next` node and another to the `previous` node. This structure permits bidirectional traversal, which can simplify certain operations like deletion, at the cost of additional memory for the second pointer.
C. Circular Linked List
A circular linked list is a variation where the `next` pointer of the last node points back to the `head` node instead of `NULL`. This creates a continuous loop, which is useful in applications requiring round-robin scheduling.
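The three variants differ only in how the node's pointer fields are wired. As an illustrative sketch (the class name `DNode` is ours, not a standard API), a doubly linked node simply adds a `prev` field, and a circular list arises when the tail's `next` is pointed back at the head:

```python
# A doubly linked node carries an extra 'prev' pointer (illustrative sketch).
class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

# Linking two nodes in both directions enables bidirectional traversal:
a, b = DNode(1), DNode(2)
a.next, b.prev = b, a

# A one-node circular list: the tail's 'next' points back to the head.
c = DNode(3)
c.next = c
```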
3. Basic Operations
The utility of a linked list is realized through its fundamental operations. Let us consider the common operations for a singly linked list.
A. Traversal
To traverse a list, we begin at the `head` and follow the `next` pointers until we reach a `NULL` pointer.
```python
def traverse(head):
    current = head
    while current is not None:
        print(current.data)
        current = current.next
```
B. Insertion
A new node can be inserted at three locations: the beginning, the end, or a specific position. Let us examine insertion at the beginning.
Worked Example:
Problem: Given a linked list with head pointer `head`, insert a new node with data `5` at the beginning of the list.
Solution:
Step 1: Allocate memory for the new node and assign its data.
Step 2: Point the `next` pointer of the new node to the current head of the list. This links the new node to the rest of the list.
Step 3: Update the main `head` pointer to point to the new node, making it the new first node.
Answer: The list is now updated with the new node at the front. The time complexity for this operation is O(1).
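The three steps above can be sketched in Python as follows (the helper name `insert_at_beginning` is illustrative):

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def insert_at_beginning(head, data):
    """Return the new head after prepending a node -- an O(1) operation."""
    new_node = Node(data)   # Step 1: allocate the node and assign its data
    new_node.next = head    # Step 2: link the new node to the current head
    return new_node         # Step 3: the new node becomes the head

head = None
for value in (3, 4, 5):
    head = insert_at_beginning(head, value)
# The list is now 5 -> 4 -> 3.
```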
C. Deletion
Deletion also has three cases: removing the first node, the last node, or a specific node. Deleting the first node is an O(1) operation, while deleting from the end or a specific position requires traversal, leading to O(n) complexity in the worst case for a singly linked list.
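A sketch of the two boundary cases (helper names are illustrative); note how deleting the last node forces a walk to the second-to-last node, which is what makes it O(n):

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def delete_first(head):
    """Remove the first node and return the new head -- O(1)."""
    return head.next if head is not None else None

def delete_last(head):
    """Remove the last node -- O(n): we must find the second-to-last node."""
    if head is None or head.next is None:
        return None
    current = head
    while current.next.next is not None:
        current = current.next
    current.next = None
    return head

head = Node(1, Node(2, Node(3)))
head = delete_first(head)   # list is now 2 -> 3
head = delete_last(head)    # list is now 2
```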
---
Problem-Solving Strategies
For problems involving finding the middle element, detecting a cycle, or other relational position tasks in a linked list, the "fast and slow pointer" approach is exceptionally efficient.
- Slow Pointer: Moves one step at a time (`slow = slow.next`).
- Fast Pointer: Moves two steps at a time (`fast = fast.next.next`).
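As a sketch of the technique, here is the classic middle-element search (names are illustrative): when the fast pointer reaches the end, the slow pointer is at the middle, in O(n) time and O(1) extra space.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def find_middle(head):
    """Fast pointer moves two steps per iteration, slow moves one."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    return slow

# 1 -> 2 -> 3 -> 4 -> 5: the middle node holds 3.
head = Node(1, Node(2, Node(3, Node(4, Node(5)))))
middle = find_middle(head)
```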
---
Common Mistakes
- ❌ Dereferencing a NULL pointer. Attempting to access `current->next` when `current` is `NULL` will cause a runtime error. Always check for `NULL` before accessing a pointer's members.
- ❌ Losing the head pointer. Modifying the `head` pointer during traversal without a temporary variable to hold its original value will result in losing access to the beginning of the list.
---
Practice Questions
:::question type="MCQ" question="In a singly linked list, which of the following operations has a time complexity that is not dependent on the length of the list, i.e., O(1)?" options=["Insertion at the end of the list","Deletion of the last node","Searching for an element","Insertion at the beginning of the list"] answer="Insertion at the beginning of the list" hint="Consider which operations do not require traversing the list." solution="Insertion at the end, deletion of the last node, and searching all require traversing the list to reach the target position, resulting in O(n) time complexity in the worst case. However, insertion at the beginning only requires updating the head pointer and the new node's next pointer, which is a constant time, O(1), operation."
:::
:::question type="NAT" question="Consider a singly linked list with n nodes. What is the minimum number of pointer modifications required to insert a new node between the third and fourth nodes?" answer="2" hint="Think about which pointers need to be re-linked to accommodate the new node." solution="Let the third node be N3 and the fourth node be N4. Originally, N3 points to N4. To insert a new node, X, we must perform two pointer modifications: first, set X's `next` pointer to N4; second, update N3's `next` pointer to point to X.
Therefore, a minimum of 2 pointer modifications are required."
:::
:::question type="MSQ" question="Which of the following are advantages of a doubly linked list over a singly linked list? Select ALL that apply." options=["It allows traversal in both forward and backward directions.","It requires less memory per node.","Deletion of a node is more efficient if a pointer to the node to be deleted is given.","It is simpler to implement."] answer="A,C" hint="Evaluate each option based on the structural differences between singly and doubly linked lists." solution="A is correct because the `previous` pointer allows backward traversal. C is correct because with a pointer to the node to be deleted, we can access its `previous` node in O(1) time to update pointers, whereas in a singly linked list, we would need to traverse from the head to find the previous node. B is incorrect; a doubly linked list requires more memory due to the extra `previous` pointer. D is incorrect; the management of two pointers makes its implementation more complex than a singly linked list."
:::
:::question type="MCQ" question="What is the primary characteristic of a circular linked list?" options=["It must be sorted.","The last node points to NULL.","The last node points to the first node.","It does not have a head pointer."] answer="The last node points to the first node." hint="Recall the structural definition that distinguishes a circular linked list from a standard linear one." solution="The defining feature of a circular linked list is that the `next` reference of the tail node points back to the `head` node, forming a closed loop. This is in contrast to a standard linear linked list where the last node's `next` reference is `NULL`."
:::
---
Summary
- Dynamic Sizing: Linked lists are dynamic data structures, allowing for efficient insertion and deletion without the need for resizing, unlike arrays.
- Non-Contiguous Memory: Nodes in a linked list can be stored anywhere in memory; the order is maintained by pointers, not by physical location.
- Time Complexity: Insertion/deletion at the beginning of a singly linked list is O(1). Traversal, searching, and insertion/deletion at the end or at a specific position are O(n).
- Pointer Management: Correctly managing pointers, especially the `head` pointer and checking for `NULL` conditions, is critical to avoid errors.
---
What's Next?
This foundational understanding of linked lists is crucial for mastering more complex data structures.
- Stacks & Queues: These abstract data types can be efficiently implemented using linked lists. A stack's LIFO (Last-In, First-Out) behavior maps directly to insertion and deletion at the head of a list (O(1) operations).
- Trees & Graphs: These are non-linear data structures where nodes (or vertices) are connected by pointers (or edges). The concept of a node with data and pointers is a direct extension of the linked list node.
Mastering these connections will provide a comprehensive understanding of how fundamental structures build into more advanced concepts frequently tested in GATE.
---
Now that you understand Linked Lists, let's explore Stacks which builds on these concepts.
---
Part 2: Stacks
Introduction
In the study of data structures, we encounter fundamental organizational principles that govern how data is stored and accessed. The stack is one of the most elementary yet powerful of these structures. It is a linear data structure that operates on a principle of constrained access, meaning elements can only be added or removed from one end, conventionally referred to as the "top" of the stack. This access mechanism is known as Last-In, First-Out (LIFO).
The LIFO paradigm is not merely a theoretical construct; it mirrors numerous processes in computing. For instance, the management of function calls within a program's execution is handled by a call stack. Similarly, parsing arithmetic expressions, implementing undo functionality in software, and navigating graph or tree structures via depth-first search all rely on the stack's unique properties. For the GATE examination, a thorough understanding of stack operations and the ability to trace algorithms that utilize them is of paramount importance.
A stack is an Abstract Data Type (ADT) that represents a collection of elements with two principal operations: Push, which adds an element to the collection, and Pop, which removes the most recently added element. This behavior ensures that the element to be removed is always the one that has been in the collection for the shortest amount of time, a principle known as Last-In, First-Out (LIFO).
---
Key Concepts
1. The LIFO Principle: Last-In, First-Out
The defining characteristic of a stack is its LIFO access protocol. We can visualize this concept using an analogy of a stack of plates. A new plate is placed on top of the stack, and when a plate is needed, it is the one from the top that is removed. It is impractical, if not impossible, to remove a plate from the bottom or middle without disturbing the entire stack.
This principle dictates that the last element to be inserted (pushed) onto the stack will be the first element to be removed (popped). This behavior is fundamental to its application in various algorithms. Let us consider the primary operations that enforce this principle.
2. Core Stack Operations
A stack ADT is characterized by a specific set of operations. For GATE, proficiency with these is non-negotiable.
a) Push
The `Push` operation adds an element to the top of the stack. If the stack is implemented with a fixed-size array and is already full, this operation results in a condition known as Stack Overflow.
```python
# Pseudocode for Push
procedure Push(stack, element):
    if stack is full:
        report Overflow Error
    else:
        increment top_pointer
        stack[top_pointer] = element
```
b) Pop
The `Pop` operation removes the element at the top of the stack and returns it. If an attempt is made to pop from an empty stack, a Stack Underflow error occurs.
```python
# Pseudocode for Pop
procedure Pop(stack):
    if stack is empty:
        report Underflow Error
    else:
        element = stack[top_pointer]
        decrement top_pointer
        return element
```
c) Peek (or Top)
The `Peek` operation returns the value of the top element without removing it from the stack. This is useful for inspecting the top element before deciding on an action.
```python
# Pseudocode for Peek
procedure Peek(stack):
    if stack is empty:
        report Stack is Empty
    else:
        return stack[top_pointer]
```
d) IsEmpty
This boolean operation checks if the stack contains any elements. It returns true if the stack is empty and false otherwise. It is crucial for preventing underflow errors.
```python
# Pseudocode for IsEmpty
procedure IsEmpty(stack):
    return top_pointer == -1  # Assuming -1 indicates an empty stack
```
For a stack implemented using either an array or a linked list, the time complexity of the core operations (`Push`, `Pop`, `Peek`, `IsEmpty`) is O(1).
Variables:
- None; the complexity is independent of the number of elements, n, in the stack.
When to use: This is a fundamental property. Always assume O(1) complexity for these operations unless specified otherwise by a problem's context.
3. Implementations: Array vs. Linked List
While the stack is an abstract data type, it requires a concrete underlying implementation. The two primary methods are using an array or a linked list.
* Array Implementation: A fixed-size array is used to store elements. A variable, often named `top`, is used as an index to keep track of the top-most element. This implementation is simple and memory-efficient (no pointers to store), but it suffers from a fixed capacity. An attempt to push onto a full stack causes an overflow.
* Linked List Implementation: Each element is a node containing data and a pointer to the next node. The `top` of the stack is simply the head of the linked list. Pushing an element involves adding a new node at the head, and popping involves removing the head node. This implementation is dynamic and can grow as needed, limited only by available memory.
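The array-based variant can be sketched in Python as follows (the class name `ArrayStack` is illustrative); it mirrors the pseudocode above, with `top` initialized to -1 to mark an empty stack:

```python
class ArrayStack:
    """Fixed-capacity stack backed by an array; top == -1 means empty."""
    def __init__(self, capacity):
        self.items = [None] * capacity
        self.top = -1

    def is_empty(self):
        return self.top == -1

    def push(self, element):
        if self.top == len(self.items) - 1:
            raise OverflowError("stack overflow")  # fixed capacity reached
        self.top += 1
        self.items[self.top] = element

    def pop(self):
        if self.is_empty():
            raise IndexError("stack underflow")
        element = self.items[self.top]
        self.top -= 1
        return element

    def peek(self):
        if self.is_empty():
            raise IndexError("stack is empty")
        return self.items[self.top]

s = ArrayStack(3)
s.push(10)
s.push(20)
top_value = s.pop()   # LIFO: 20 comes out before 10
```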
---
Worked Example: Tracing a Stack-Based Algorithm
Many GATE questions require the manual tracing of an algorithm's execution. Let us work through an example that mirrors this style.
Problem:
Consider the following pseudocode that operates on an empty stack S and an integer variable accumulator. What is the final value of accumulator after the code finishes execution?
```pseudocode
Create empty stack S
accumulator = 0
i = 1
while (i <= 6) {
Push(S, i)
if (i % 2 == 0) {
val1 = Pop(S)
val2 = Pop(S)
Push(S, val1 * val2)
}
i = i + 1
}
while (S is not empty) {
accumulator = accumulator + Pop(S)
}
Output accumulator
```
Solution:
We will trace the state of the stack S and the variable accumulator through each iteration of the first `while` loop.
Iteration i = 1:
- `Push(S, 1)`: Stack is `[1]`. (Bottom to top)
- `i % 2 == 0` is false.
Iteration i = 2:
- `Push(S, 2)`: Stack is `[1, 2]`.
- `i % 2 == 0` is true.
- `val1 = Pop(S)`: val1 becomes 2. Stack is `[1]`.
- `val2 = Pop(S)`: val2 becomes 1. Stack is `[]`.
- `Push(S, val1 * val2)`: `Push(S, 2)`. Stack is `[2]`.
Iteration i = 3:
- `Push(S, 3)`: Stack is `[2, 3]`.
- `i % 2 == 0` is false.
Iteration i = 4:
- `Push(S, 4)`: Stack is `[2, 3, 4]`.
- `i % 2 == 0` is true.
- `val1 = Pop(S)`: val1 becomes 4. Stack is `[2, 3]`.
- `val2 = Pop(S)`: val2 becomes 3. Stack is `[2]`.
- `Push(S, val1 * val2)`: `Push(S, 12)`. Stack is `[2, 12]`.
Iteration i = 5:
- `Push(S, 5)`: Stack is `[2, 12, 5]`.
- `i % 2 == 0` is false.
Iteration i = 6:
- `Push(S, 6)`: Stack is `[2, 12, 5, 6]`.
- `i % 2 == 0` is true.
- `val1 = Pop(S)`: val1 becomes 6. Stack is `[2, 12, 5]`.
- `val2 = Pop(S)`: val2 becomes 5. Stack is `[2, 12]`.
- `Push(S, val1 * val2)`: `Push(S, 30)`. Stack is `[2, 12, 30]`.
The first loop terminates. The final state of stack S (from bottom to top) is `[2, 12, 30]`.
Now, we execute the second `while` loop.
Step 1: accumulator is initially 0. Stack is `[2, 12, 30]`.
- `accumulator = accumulator + Pop(S)`: accumulator becomes 30. Stack is `[2, 12]`.
Step 2:
- `accumulator = accumulator + Pop(S)`: accumulator becomes 30 + 12 = 42. Stack is `[2]`.
Step 3:
- `accumulator = accumulator + Pop(S)`: accumulator becomes 42 + 2 = 44. Stack is `[]`.
The stack is now empty, and the loop terminates.
Answer: The final value of accumulator is 44.
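The manual trace can be cross-checked by transcribing the pseudocode into Python, using a list's `append` and `pop` as Push and Pop:

```python
def run_trace():
    """Direct transcription of the traced pseudocode."""
    S = []
    accumulator = 0
    for i in range(1, 7):          # while (i <= 6)
        S.append(i)                # Push(S, i)
        if i % 2 == 0:
            val1 = S.pop()
            val2 = S.pop()
            S.append(val1 * val2)  # Push(S, val1 * val2)
    while S:                       # while (S is not empty)
        accumulator += S.pop()
    return accumulator

result = run_trace()   # → 44, matching the manual trace
```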
---
Problem-Solving Strategies
For GATE questions involving pseudocode and stacks, the most reliable method is to trace the execution manually using a table. Do not try to solve it mentally, as this can lead to errors.
Create columns for:
- The loop counter or current instruction.
- The state of the stack (write it out explicitly, e.g., `[10, 5, 20]`).
- The values of all relevant variables (e.g., `i`, `val1`, `accumulator`).
Update the table row-by-row for each step in the algorithm. This systematic approach minimizes confusion and ensures accuracy, especially under exam pressure.
---
Common Mistakes
- ❌ Confusing Pop and Peek: Assuming `Pop()` only reads the top value.
- ❌ Incorrect Underflow/Overflow Handling: Performing a `Pop()` or `Peek()` on an empty stack, or a `Push()` on a full, array-based stack.
- ❌ Reversing Stack Order: When listing the contents of a stack after several operations, listing them in the order they were pushed (First-In, First-Out).
---
Practice Questions
:::question type="NAT" question="An empty stack S and an integer v = 10 are given. The following operations are performed in sequence: `Push(S, 10)`, `Push(S, 5)`, `Push(S, 20)`, `v = v + Pop(S)`, `Push(S, 15)`, `v = v + Pop(S)`, `v = v + Pop(S)`. What is the final value of v?" answer="50" hint="Trace the operations one by one, keeping track of the stack's state and the value of v." solution="
Step 1: Initialize S as empty, v = 10.
Step 2: `Push(S, 10)`.
S is `[10]`.
Step 3: `Push(S, 5)`.
S is `[10, 5]`.
Step 4: `Push(S, 20)`.
S is `[10, 5, 20]`.
Step 5: `v = v + Pop(S)`.
The popped value is 20.
v = 10 + 20 = 30.
S is `[10, 5]`.
Step 6: `Push(S, 15)`.
S is `[10, 5, 15]`.
Step 7: `v = v + Pop(S)`.
The popped value is 15.
v = 30 + 15 = 45.
S is `[10, 5]`.
Step 8: `v = v + Pop(S)`.
The popped value is 5.
v = 45 + 5 = 50.
S is `[10]`.
Result: The final value of v is \boxed{50}.
"
:::
:::question type="MCQ" question="A stack is implemented using an array of size 10 (indices 0 to 9). The `top` pointer is initialized to -1. After performing 5 `Push` operations, the `top` pointer will have the value:" options=["5", "6", "4", "3"] answer="4" hint="The 'top' pointer stores the index of the last element inserted. If the first element is at index 0, what will be the index of the fifth element?" solution="
Step 1: Initially, top = -1.
Step 2: After the 1st `Push`, top becomes 0.
Step 3: After the 2nd `Push`, top becomes 1.
Step 4: After the 3rd `Push`, top becomes 2.
Step 5: After the 4th `Push`, top becomes 3.
Step 6: After the 5th `Push`, top becomes 4.
Result: The top pointer will point to index \boxed{4}.
"
:::
:::question type="MSQ" question="Which of the following applications correctly use a stack data structure?" options=["Implementing function calls in a programming language", "Implementing a First-In-First-Out (FIFO) queue", "Evaluating a postfix arithmetic expression", "Implementing a breadth-first search (BFS) in a graph"] answer="Implementing function calls in a programming language,Evaluating a postfix arithmetic expression" hint="Think about which applications require a Last-In-First-Out (LIFO) access pattern." solution="
- Implementing function calls: This uses a call stack. When a function is called, its activation record is pushed onto the stack. When it returns, the record is popped. This is a classic LIFO application. So, this option is correct.
- Implementing a FIFO queue: A queue is inherently First-In, First-Out, which is the opposite of a stack's LIFO behavior. While a queue can be implemented using two stacks, a single stack itself is not a FIFO structure. So, this option is incorrect.
- Evaluating a postfix expression: When scanning a postfix expression, operands are pushed onto a stack. When an operator is encountered, the required number of operands are popped, the operation is performed, and the result is pushed back. This is a standard stack application. So, this option is correct.
- Implementing BFS: Breadth-first search explores a graph level by level, which requires a FIFO data structure, namely a queue, to store the nodes to visit. Depth-first search (DFS), in contrast, uses a stack. So, this option is incorrect."
:::
:::question type="NAT" question="Consider the following pseudocode operating on an empty stack S. What is the value returned at the end?
```pseudocode
procedure CalculateValue():
    Create empty stack S
    sum = 0
    for i from 1 to 5:
        Push(S, i)
        if (IsEmpty(S) == false and Peek(S) > 3):
            sum = sum + Pop(S)
    return sum
```
" answer="9" hint="Trace the for loop from i = 1 to 5. The sum is only updated when the top element of the stack is greater than 3." solution="
Step 1: Initialize S as empty, sum = 0.
Loop i = 1:
- `Push(S, 1)`. S is `[1]`.
- `Peek(S)` is 1, which is not > 3. sum remains 0.
Loop i = 2:
- `Push(S, 2)`. S is `[1, 2]`.
- `Peek(S)` is 2, which is not > 3. sum remains 0.
Loop i = 3:
- `Push(S, 3)`. S is `[1, 2, 3]`.
- `Peek(S)` is 3, which is not > 3. sum remains 0.
Loop i = 4:
- `Push(S, 4)`. S is `[1, 2, 3, 4]`.
- `Peek(S)` is 4, which is > 3.
- `sum = sum + Pop(S)`. `Pop(S)` returns 4. sum = 0 + 4 = 4.
- S is now `[1, 2, 3]`.
Loop i = 5:
- `Push(S, 5)`. S is `[1, 2, 3, 5]`.
- `Peek(S)` is 5, which is > 3.
- `sum = sum + Pop(S)`. `Pop(S)` returns 5. sum = 4 + 5 = 9.
- S is now `[1, 2, 3]`.
Step 2: The loop finishes. The function returns sum = 9.
Result: The final value of sum is \boxed{9}.
"
:::
---
Summary
- LIFO is Paramount: All stack-related problems are built upon the Last-In, First-Out principle. Every operation (`Push`, `Pop`) must be interpreted through this lens.
- Master Algorithm Tracing: The ability to systematically trace pseudocode involving a stack is the most frequently tested skill. Use a table to track the stack's state and all variables to avoid errors.
- Know the Core Operations: Understand the precise function of `Push`, `Pop`, `Peek`, and `IsEmpty`, including their constant time complexity and the error conditions of Underflow and Overflow.
---
What's Next?
The stack is a building block for more complex topics. A solid foundation here will aid in understanding related data structures and algorithms.
- Queues: Contrast the stack's LIFO behavior with the queue's First-In, First-Out (FIFO) principle. Many problems involve using both or comparing their properties.
- Recursion: Every recursive function call is managed internally by the program's call stack. Understanding stacks demystifies how recursion works and helps in analyzing its space complexity.
- Expression Evaluation: Stacks are central to algorithms for converting expressions from infix to postfix or prefix notation, and for evaluating these notations directly. This is a classic and important application area.
---
Now that you understand Stacks, let's explore Queues which builds on these concepts.
---
Part 3: Queues
Introduction
In the study of data structures, the Queue represents a fundamental concept of ordered data collection. Analogous to a queue of people waiting for a service, this linear data structure adheres to a strict principle of operation known as First-In, First-Out (FIFO). Elements are added at one end, referred to as the rear, and removed from the opposite end, known as the front. This behavior makes queues indispensable in a multitude of computing scenarios, ranging from managing requests on a single shared resource, such as a printer, to implementing task scheduling algorithms in operating systems and traversing graph or tree structures.
Our examination of queues will proceed from the foundational principles to their practical implementations. We will explore the core operations that define a queue's behavior and analyze the common methods of implementation using both arrays and linked lists. Furthermore, we will investigate important variations, such as the circular queue, which optimizes space utilization, and the double-ended queue (deque), which provides enhanced flexibility by allowing operations at both ends. A thorough understanding of these concepts is essential, as they frequently form the basis for more complex algorithms and systems-level problems encountered in the GATE examination.
A Queue is an abstract data type (ADT) that functions as a linear collection of elements, characterized by the First-In, First-Out (FIFO) principle. In a queue, insertion of a new element, known as enqueue, occurs at one end, termed the rear (or tail), while the removal of an existing element, known as dequeue, occurs at the other end, termed the front (or head).
---
Key Concepts
1. Basic Queue Operations
A queue is primarily defined by two operations: enqueue and dequeue. To manage these operations, we typically maintain two pointers or indices: `front` and `rear`.
- `front`: Points to the first element in the queue. This is the element that will be removed next.
- `rear`: Points to the last element in the queue. This is where the next new element will be inserted.
- `enqueue(element)`: Adds `element` to the `rear` of the queue.
- `dequeue()`: Removes and returns the element from the `front` of the queue.
- `peek()` (or `front()`): Returns the element at the `front` of the queue without removing it.
- `isEmpty()`: Returns true if the queue contains no elements.
- `isFull()`: Returns true if the queue has reached its maximum capacity (relevant for static, array-based implementations).
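In Python, `collections.deque` offers these operations with O(1) cost at both ends; a minimal usage sketch mapping its methods onto the queue terminology above:

```python
from collections import deque

queue = deque()
queue.append(5)            # enqueue(5): insert at the rear
queue.append(12)           # enqueue(12)
first = queue.popleft()    # dequeue(): removes 5, the oldest element
front = queue[0]           # peek(): 12 is now at the front
empty = len(queue) == 0    # isEmpty(): False, one element remains
```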
Worked Example:
Problem: An initially empty queue undergoes the following sequence of operations: `enqueue(5)`, `enqueue(12)`, `dequeue()`, `enqueue(8)`, `enqueue(9)`, `dequeue()`. What are the contents of the queue after these operations?
Solution:
Let Q be the queue, with the front on the left.
Step 1: Initial state
The queue is empty.
Q = `[]`, front = NULL, rear = NULL
Step 2: `enqueue(5)`
The element 5 is added. front and rear both point to it.
Q = `[5]`, front = 5, rear = 5
Step 3: `enqueue(12)`
The element 12 is added to the rear. rear is updated.
Q = `[5, 12]`, front = 5, rear = 12
Step 4: `dequeue()`
The element at the front (5) is removed. front is updated.
Q = `[12]`, front = 12, rear = 12
Step 5: `enqueue(8)`
The element 8 is added to the rear. rear is updated.
Q = `[12, 8]`, front = 12, rear = 8
Step 6: `enqueue(9)`
The element 9 is added to the rear. rear is updated.
Q = `[12, 8, 9]`, front = 12, rear = 9
Step 7: `dequeue()`
The element at the front (12) is removed. front is updated.
Q = `[8, 9]`, front = 8, rear = 9
Answer: The final contents of the queue are `[8, 9]` (front to rear).
---
2. Circular Queue
A naive array-based implementation of a queue can be inefficient. As elements are dequeued, the `front` pointer moves forward, leaving unused space at the beginning of the array. This "drifting" effect means the queue can become "full" even when there are empty slots in the array.
The circular queue elegantly solves this problem by treating the array as a circle. When the `front` or `rear` pointer reaches the end of the array, it wraps around to the beginning. This is achieved using the modulo operator.
Given a queue implemented with an array of size N:
To enqueue an element: rear = (rear + 1) mod N
To dequeue an element: front = (front + 1) mod N
Variables:
- N = capacity of the array
- front = index of the front element
- rear = index at which the next element will be inserted
When to use: For efficient, fixed-size queue implementations using arrays.
A critical aspect of circular queues is determining the `isFull` and `isEmpty` conditions. A common convention is to keep one slot empty to distinguish between a full and an empty queue.
- Empty Condition: front == rear
- Full Condition: (rear + 1) mod N == front
❌ A common error is to fill the entire array. If `rear` becomes equal to `front` after an enqueue operation, this state is indistinguishable from the initial empty state.
✅ To avoid this ambiguity, a circular queue of size N is considered full when it contains N - 1 elements. The condition (rear + 1) mod N == front correctly identifies this state.
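A minimal Python sketch of this scheme (the class name `CircularQueue` is illustrative), using the one-slot-empty convention, with `rear` holding the index of the next free slot:

```python
class CircularQueue:
    """Circular array queue; one slot is kept empty to distinguish full from empty."""
    def __init__(self, capacity):
        self.items = [None] * capacity
        self.N = capacity
        self.front = 0   # index of the front element
        self.rear = 0    # index where the next element will be inserted

    def is_empty(self):
        return self.front == self.rear

    def is_full(self):
        return (self.rear + 1) % self.N == self.front

    def enqueue(self, element):
        if self.is_full():
            raise OverflowError("queue is full")
        self.items[self.rear] = element
        self.rear = (self.rear + 1) % self.N   # wrap around the array end

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        element = self.items[self.front]
        self.front = (self.front + 1) % self.N
        return element

q = CircularQueue(5)
for item in ("A", "B", "C"):
    q.enqueue(item)
q.dequeue()            # removes "A"; front advances to 1
q.enqueue("D")
q.enqueue("E")         # rear wraps from 4 to 0
full = q.is_full()     # True: 4 elements in a size-5 array
```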
Worked Example:
Problem: Consider a circular queue of size 5, initially empty (front = 0, rear = 0). Trace the state of front and rear after the following operations: `enqueue(A)`, `enqueue(B)`, `enqueue(C)`, `dequeue()`, `enqueue(D)`, `enqueue(E)`.
Solution:
Let the array size be N = 5 and let Arr be the array; rear is the index where the next element will be inserted.
Step 1: Initial state
front = 0, rear = 0. Queue is empty.
Step 2: `enqueue(A)`
Arr[0] = A. rear = (0 + 1) mod 5 = 1.
front = 0, rear = 1.
Step 3: `enqueue(B)`
Arr[1] = B. rear = (1 + 1) mod 5 = 2.
front = 0, rear = 2.
Step 4: `enqueue(C)`
Arr[2] = C. rear = (2 + 1) mod 5 = 3.
front = 0, rear = 3.
Step 5: `dequeue()`
The element at Arr[front], which is A, is removed.
front = (0 + 1) mod 5 = 1.
front = 1, rear = 3.
(A is removed)
Step 6: `enqueue(D)`
Arr[3] = D. rear = (3 + 1) mod 5 = 4.
front = 1, rear = 4.
Step 7: `enqueue(E)`
Arr[4] = E. rear = (4 + 1) mod 5 = 0.
front = 1, rear = 0.
Answer: The final state is front = 1, rear = 0. The queue is now full, as (rear + 1) mod 5 = 1, which is equal to front. The elements in order are B, C, D, E.
---
3. Double-Ended Queue (Deque)
A double-ended queue, or deque (pronounced "deck"), is a generalization of a queue that allows for insertion and deletion of elements at both the front and the rear. This makes it a more versatile data structure.
The fundamental operations of a deque are:
- `insertFirst(e)`: Inserts element `e` at the beginning.
- `insertLast(e)`: Inserts element `e` at the end (equivalent to `enqueue`).
- `removeFirst()`: Removes and returns the element from the beginning (equivalent to `dequeue`).
- `removeLast()`: Removes and returns the element from the end.
Worked Example:
Problem: An initially empty deque D undergoes the following sequence of operations. Trace the contents of D and determine the final value of the variable x.
`insertLast(20)`
`insertFirst(10)`
`insertLast(30)`
`x = removeLast()`
`insertFirst(5)`
`x = removeFirst()`
Solution:
Let us trace the state of the deque D, with the front on the left.
Step 1: Initial state
D is empty. D = `[]`
Step 2: `insertLast(20)`
20 is added to the rear. D = `[20]`
Step 3: `insertFirst(10)`
10 is added to the front. D = `[10, 20]`
Step 4: `insertLast(30)`
30 is added to the rear. D = `[10, 20, 30]`
Step 5: `x = removeLast()`
The last element, 30, is removed and assigned to x. D = `[10, 20]`
Step 6: `insertFirst(5)`
5 is added to the front. D = `[5, 10, 20]`
Step 7: `x = removeFirst()`
The first element, 5, is removed and assigned to x. D = `[10, 20]`
Answer: The final value of x is 5.
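Python's `collections.deque` supports all four deque operations in O(1), so a trace like the one above can be replayed directly; the method mapping (`appendleft`/`append`/`popleft`/`pop` for insertFirst/insertLast/removeFirst/removeLast) is the standard one:

```python
from collections import deque

D = deque()
D.append(20)        # insertLast(20)   -> [20]
D.appendleft(10)    # insertFirst(10)  -> [10, 20]
D.append(30)        # insertLast(30)   -> [10, 20, 30]
x = D.pop()         # x = removeLast(): x is 30
D.appendleft(5)     # insertFirst(5)   -> [5, 10, 20]
x = D.popleft()     # x = removeFirst(): x is 5
```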
---
Problem-Solving Strategies
When faced with a sequence of queue or deque operations, do not attempt to solve it mentally. This is highly error-prone.
- Draw It Out: On your rough sheet, draw a series of boxes representing the array or linked list. For an array of size N, draw N boxes and label indices 0 to N - 1.
- Maintain Pointers: Keep track of the front and rear indices. Write their current values and update them for each operation.
- Update Contents: Cross out removed elements and write in new elements as they are enqueued. For circular queues, be meticulous with the modulo arithmetic: rear = (rear + 1) mod N.
- Track Variables: If the question involves assigning dequeued values to variables (like x), maintain a separate section on your sheet to track the current value of each variable.
This methodical, visual approach minimizes calculation errors under exam pressure.
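The bookkeeping described above can be captured in a small array-based circular queue. The sketch below (class and method names are illustrative) uses this chapter's conventions: `front == rear` means empty, and an array of size N holds at most N − 1 elements so that full and empty states remain distinguishable.

```python
class CircularQueue:
    def __init__(self, n):
        self.n = n              # array size; holds at most n - 1 elements
        self.data = [None] * n
        self.front = 0          # index of the current front element
        self.rear = 0           # index of the next free slot

    def is_empty(self):
        return self.front == self.rear

    def is_full(self):
        return (self.rear + 1) % self.n == self.front

    def enqueue(self, x):
        if self.is_full():
            raise OverflowError("queue is full")
        self.data[self.rear] = x
        self.rear = (self.rear + 1) % self.n   # wrap around with modulo

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        x = self.data[self.front]
        self.front = (self.front + 1) % self.n
        return x
```

Tracing on paper and tracing in code follow the same two rules: only `rear` moves on `enqueue`, only `front` moves on `dequeue`, and every move is taken modulo `n`.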
---
Common Mistakes
- ❌ Confusing FIFO with LIFO: The most fundamental error is mixing up the behavior of a queue (First-In, First-Out) with that of a stack (Last-In, First-Out). Always remember: in a queue, the element that has been in the collection the longest is the first one to be removed.
- ❌ Incorrect Circular Queue Conditions: For a circular array of size `N` with `front` and `rear` pointers, it's easy to misidentify the full/empty conditions. Remember: the queue is empty when `front == rear` and full when `(rear + 1) % N == front`.
- ❌ Forgetting Deque Flexibility: Students sometimes treat a deque like a standard queue, only considering operations at opposite ends.
---
Practice Questions
:::question type="MCQ" question="A data structure allows insertion only at the rear and deletion from both the front and the rear. This structure is a specific type of:" options=["Input-restricted deque", "Output-restricted deque", "Stack", "Circular Queue"] answer="Input-restricted deque" hint="Consider which operations are restricted. 'Input' refers to insertion, and 'Output' refers to deletion." solution="A standard deque allows insertion and deletion at both ends. An 'input-restricted' deque restricts insertions to one end (in this case, the rear). An 'output-restricted' deque restricts deletions to one end. Since this structure allows deletion from both ends but insertion only at one, it is an input-restricted deque.
Answer: \boxed{Input-restricted deque}"
:::
:::question type="NAT" question="An initially empty circular queue of capacity 6 (indices 0 to 5) is managed with `front` and `rear` pointers, both initialized to 0. After the following sequence of operations, what is the value of the `front` pointer?
`enqueue(10)`, `enqueue(20)`, `enqueue(30)`, `dequeue()`, `enqueue(40)`, `enqueue(50)`, `dequeue()`, `enqueue(60)`" answer="2" hint="Trace the values of front and rear using modulo arithmetic with N=6. Remember that dequeue updates the front pointer." solution="
Initial: `front = 0`, `rear = 0`
1. enqueue(10): `rear = (0 + 1) % 6 = 1`.
2. enqueue(20): `rear = (1 + 1) % 6 = 2`.
3. enqueue(30): `rear = (2 + 1) % 6 = 3`.
`front=0`, `rear=3`. Contents: `[10, 20, 30]`
4. dequeue(): `front = (0 + 1) % 6 = 1`.
`front=1`, `rear=3`. Contents: `[20, 30]`
5. enqueue(40): `rear = (3 + 1) % 6 = 4`.
6. enqueue(50): `rear = (4 + 1) % 6 = 5`.
`front=1`, `rear=5`. Contents: `[20, 30, 40, 50]`
7. dequeue(): `front = (1 + 1) % 6 = 2`.
`front=2`, `rear=5`. Contents: `[30, 40, 50]`
8. enqueue(60): `rear = (5 + 1) % 6 = 0`.
`front=2`, `rear=0`. Contents: `[30, 40, 50, 60]`
Result: The final value of the `front` pointer is 2.
Answer: \boxed{2}"
:::
:::question type="MSQ" question="Which of the following statements about queue implementations are correct?" options=["A linked-list implementation of a queue has a worst-case time complexity of O(1) for both enqueue and dequeue operations.", "An array-based implementation of a simple (non-circular) queue can lead to wasted space.", "A deque can be used to implement a stack.", "In a circular queue of size N, the maximum number of elements that can be stored is always N."] answer="A,B,C" hint="Analyze each implementation's properties. Consider time complexity, space efficiency, and functional equivalence." solution="
- A: Correct. In a linked-list implementation with both `head` and `tail` pointers, enqueue (adding to tail) and dequeue (removing from head) are O(1) operations.
- B: Correct. In a simple array-based queue, as elements are dequeued, the `front` pointer moves forward, leaving the space at the beginning of the array unusable.
- C: Correct. A stack (LIFO) can be implemented using a deque by performing all insertions (`insertFirst`) and deletions (`removeFirst`) at the same end.
- D: Incorrect. To distinguish between a full and an empty state, a circular queue of size N can typically hold a maximum of N - 1 elements."
:::
:::question type="NAT" question="In an empty double-ended queue, the following operations are performed sequentially:
`insertFirst(5)`
`insertLast(15)`
`insertFirst(2)`
`v = removeLast()`
`insertLast(25)`
`v = removeFirst()`
`insertFirst(10)`
`v = removeLast()`
The final value of the variable `v` is _____.
" answer="25" hint="Carefully trace the state of the deque and the value of `v` after each removal operation." solution="
Step 1: Initial state
`D: []`
Step 2: `insertFirst(5)`
`D: [5]`
Step 3: `insertLast(15)`
`D: [5, 15]`
Step 4: `insertFirst(2)`
`D: [2, 5, 15]`
Step 5: `v = removeLast()`
`v` becomes 15. `D` becomes `[2, 5]`.
Step 6: `insertLast(25)`
`D: [2, 5, 25]`
Step 7: `v = removeFirst()`
`v` becomes 2. `D` becomes `[5, 25]`.
Step 8: `insertFirst(10)`
`D: [10, 5, 25]`
Step 9: `v = removeLast()`
`v` becomes 25. `D` becomes `[10, 5]`.
Result: The final value assigned to `v` is 25.
Answer: \boxed{25}"
:::
---
Summary
- FIFO Principle: The defining characteristic of a queue is its First-In, First-Out (FIFO) nature. This contrasts directly with the Last-In, First-Out (LIFO) principle of a stack.
- Core Operations: The fundamental operations are `enqueue` (add to rear) and `dequeue` (remove from front). Be proficient in tracing the state of a queue through a series of these operations.
- Circular Queue Efficiency: The circular queue is the standard array-based implementation. Master the modulo arithmetic for pointer updates (`(index + 1) % N`) and the conditions for empty (`front == rear`) and full (`(rear + 1) % N == front`).
- Deque Versatility: The double-ended queue (deque) is a powerful generalization, allowing insertion and deletion at both ends. Understand its four primary operations: `insertFirst`, `insertLast`, `removeFirst`, `removeLast`.
---
What's Next?
This topic connects to several other crucial areas in the GATE syllabus. Mastering queues is a prerequisite for understanding:
- Stacks: The stack is the logical counterpart to the queue (LIFO vs. FIFO). Questions often test the differences between them.
- Breadth-First Search (BFS): The BFS algorithm for traversing trees and graphs relies fundamentally on a queue to manage the nodes to be visited.
- Operating Systems: Queues are used extensively in OS for scheduling. For example, the First-Come, First-Served (FCFS) CPU scheduling algorithm is a direct implementation of a queue.
Master these connections for a comprehensive understanding of how data structures are applied in algorithms and systems.
---
Now that you understand Queues, let's explore Trees, which build on these concepts.
---
Part 4: Trees
Introduction
In the study of data structures, we transition from linear collections, such as arrays and linked lists, to more complex, non-linear arrangements of data. Among these, the tree stands as a structure of paramount importance. A tree is an abstract data type that simulates a hierarchy: a distinguished root node with zero or more subtrees of children, each child having exactly one parent, represented as a set of linked nodes. Its non-linear nature allows for the representation of data with intrinsic parent-child relationships, a feature not elegantly captured by linear structures.
The utility of trees is pervasive in computer science. They form the basis for file systems, where directories branch into subdirectories and files. They are fundamental to the organization of data for efficient searching, as seen in Binary Search Trees and their balanced variants. Furthermore, trees are instrumental in parsing expressions in compilers, representing decision processes in artificial intelligence, and as the underlying structure for more advanced data structures like heaps and tries. A firm grasp of tree terminology, properties, and traversal algorithms is therefore indispensable for a competitive examination such as GATE.
A tree is a finite set of one or more nodes such that there is a specially designated node called the root, and the remaining nodes are partitioned into $n \ge 0$ disjoint sets $T_1, T_2, \ldots, T_n$, where each of these sets is a tree. The sets $T_1, T_2, \ldots, T_n$ are called the subtrees of the root. Formally, a tree can be defined as a connected, acyclic, undirected graph. A rooted tree is a tree in which one vertex has been designated as the root.
---
Key Concepts
1. Fundamental Terminology of Rooted Trees
To discuss trees with precision, we must first establish a common vocabulary. Let us consider a general rooted tree to illustrate these fundamental concepts.
* Node: The basic entity in a tree that contains data and may link to other nodes.
* Edge: A connection between two nodes.
* Root: The topmost node in a tree, which has no parent. A tree has exactly one root.
* Parent: A node that has an edge to a child node. Every node except the root has exactly one parent.
* Child: A node that has an edge from a parent node.
* Siblings: Nodes that share the same parent.
* Leaf Node (or Terminal Node): A node with no children.
* Internal Node (or Non-Terminal Node): A node with at least one child.
* Path: A sequence of nodes and edges connecting a node with a descendant.
* Depth of a Node: The number of edges from the root to the node. The depth of the root is 0.
* Level of a Node: The level of a node is defined by $\text{level} = \text{depth} + 1$. The level of the root is 1. (Note: Some conventions start levels at 0).
* Height of a Node: The number of edges on the longest path from the node to a leaf. The height of a leaf node is 0.
* Height of a Tree: The height of its root node. The height of a tree with a single node is 0.
The following diagram illustrates these relationships within a sample rooted tree.
---
2. Binary Trees
A particularly important and frequently encountered type of tree is the binary tree.
A binary tree is a rooted tree in which each node has at most two children, which are referred to as the left child and the right child. A node's left child is the root of its left subtree, and its right child is the root of its right subtree.
Several special classes of binary trees are defined based on their structure.
* Full Binary Tree: A binary tree in which every node has either 0 or 2 children. There are no nodes with only one child.
* Complete Binary Tree: A binary tree in which all levels are completely filled except possibly the last level, and the last level has all its nodes as far left as possible. This structure is critical for implementing priority queues using heaps.
* Perfect Binary Tree: A binary tree in which all internal nodes have two children and all leaf nodes are at the same level. A perfect binary tree of height $h$ has exactly $2^{h+1} - 1$ nodes.
* Skewed Binary Tree: A binary tree where each node is connected to only one child (either left or right). This degenerates into a structure resembling a linked list.
---
3. Properties of Binary Trees
Several properties and inequalities relate the number of nodes, leaves, internal nodes, and height. These are frequently tested in competitive examinations. Let $n$ be the total number of nodes, $l$ be the number of leaf nodes, $i$ be the number of internal nodes, and $h$ be the height of the binary tree.

$$n = l + i$$

Variables:
- $n$ = Total number of nodes
- $l$ = Number of leaf nodes
- $i$ = Number of internal nodes
When to use: This fundamental identity holds for any non-empty tree and is useful for relating the different node counts.
A particularly powerful property relates the number of leaf nodes to the number of internal nodes in a full binary tree.

$$l = i + 1$$

Variables:
- $l$ = Number of leaf nodes
- $i$ = Number of internal nodes
When to use: This applies only to non-empty full binary trees. It provides a direct relationship between the tree's branching structure and its leaves. For any non-empty binary tree (not necessarily full), the inequality $l \le i + 1$ holds.
The height of a binary tree imposes strict bounds on the total number of nodes it can contain.

$$h + 1 \le n \le 2^{h+1} - 1$$

Variables:
- $h$ = Height of the tree (height of a single node is 0)
- $n$ = Total number of nodes
- The minimum number of nodes, $n = h + 1$, occurs in a skewed binary tree.
- The maximum number of nodes, $n = 2^{h+1} - 1$, occurs in a perfect binary tree.
When to use: This inequality is crucial for analyzing the efficiency of tree-based algorithms.
From these primary relationships, we can derive bounds for other quantities. For instance, the number of leaf nodes is bounded by $1 \le l \le 2^h$. The minimum of 1 leaf occurs in a skewed tree, while the maximum of $2^h$ leaves occurs in a perfect binary tree.
Worked Example:
Problem: A full binary tree has 25 leaf nodes. Calculate the total number of nodes in the tree.
Solution:
Step 1: Identify the given information and the type of tree.
We are given a full binary tree with $l = 25$.
Step 2: Apply the property relating leaf nodes and internal nodes for a full binary tree: $l = i + 1$.
Step 3: Solve for the number of internal nodes, $i$: $i = l - 1 = 25 - 1 = 24$.
Step 4: Calculate the total number of nodes, $n$, using the fundamental identity: $n = l + i = 25 + 24 = 49$.
Answer: The tree contains a total of 49 nodes.
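The same two identities can be packaged as a one-line helper. A trivial sketch (the function name is illustrative):

```python
def full_tree_total_nodes(leaves):
    """Total nodes in a non-empty full binary tree with the given leaf count."""
    internal = leaves - 1       # l = i + 1  =>  i = l - 1
    return leaves + internal    # n = l + i = 2l - 1

print(full_tree_total_nodes(25))  # 49
```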
---
4. Tree Traversal Algorithms
Traversing a tree means visiting every node in a specified order. For binary trees, the three most common depth-first traversal methods are Inorder, Preorder, and Postorder. The names are derived from the position of the root node's visit relative to its left and right subtrees.
Let us consider a node $N$, its left subtree $L$, and its right subtree $R$.
Inorder Traversal (LNR):
* Traverse the left subtree: Visit($L$)
* Visit the root: Visit($N$)
* Traverse the right subtree: Visit($R$)
Application: In a Binary Search Tree (BST), an inorder traversal visits the nodes in ascending sorted order.
Preorder Traversal (NLR):
* Visit the root: Visit($N$)
* Traverse the left subtree: Visit($L$)
* Traverse the right subtree: Visit($R$)
Application: Useful for creating a copy of the tree or for obtaining the prefix expression of an expression tree.
Postorder Traversal (LRN):
* Traverse the left subtree: Visit($L$)
* Traverse the right subtree: Visit($R$)
* Visit the root: Visit($N$)
Application: Essential for deleting nodes in a tree to ensure children are deleted before their parent, or for obtaining the postfix expression of an expression tree.
Worked Example:
Problem: Consider the binary tree whose root F has left child B and right child G; B has left child A and right child D; D has left child C and right child E; G has left child I and right child H. Determine the Inorder, Preorder, and Postorder traversals.
Solution:
1. Inorder Traversal (LNR):
- Start at root F. Go left to B.
- At B, go left to A. A is a leaf, visit A.
- Return to B, visit B.
- At B, go right to D.
- At D, go left to C. C is a leaf, visit C.
- Return to D, visit D.
- At D, go right to E. E is a leaf, visit E.
- Left subtree of F is done. Visit F.
- Go right to G.
- At G, go left to I. I is a leaf, visit I.
- Return to G, visit G.
- At G, go right to H. H is a leaf, visit H.
Result (Inorder): A, B, C, D, E, F, I, G, H
2. Preorder Traversal (NLR):
- Start at root F, visit F.
- Go left to B, visit B.
- At B, go left to A, visit A.
- Return to B, go right to D, visit D.
- At D, go left to C, visit C.
- Return to D, go right to E, visit E.
- Return to F, go right to G, visit G.
- At G, go left to I, visit I.
- Return to G, go right to H, visit H.
Result (Preorder): F, B, A, D, C, E, G, I, H
3. Postorder Traversal (LRN):
- Start at root F. Go left to B.
- At B, go left to A. A is a leaf, visit A.
- Return to B, go right to D.
- At D, go left to C. C is a leaf, visit C.
- Return to D, go right to E. E is a leaf, visit E.
- Return to D, both children visited, visit D.
- Return to B, both children visited, visit B.
- Return to F. Go right to G.
- At G, go left to I. I is a leaf, visit I.
- Return to G, go right to H. H is a leaf, visit H.
- Return to G, both children visited, visit G.
- Return to F, both children visited, visit F.
Result (Postorder): A, C, E, D, B, I, H, G, F
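Each traversal is a three-line recursive function. The sketch below builds the tree from this worked example (node keys as one-character strings) and collects a traversal into a list:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(t, out):    # LNR: left, node, right
    if t:
        inorder(t.left, out); out.append(t.key); inorder(t.right, out)

def preorder(t, out):   # NLR: node, left, right
    if t:
        out.append(t.key); preorder(t.left, out); preorder(t.right, out)

def postorder(t, out):  # LRN: left, right, node
    if t:
        postorder(t.left, out); postorder(t.right, out); out.append(t.key)

# The tree from the worked example: F(B(A, D(C, E)), G(I, H))
root = Node('F',
            Node('B', Node('A'), Node('D', Node('C'), Node('E'))),
            Node('G', Node('I'), Node('H')))

seq = []
inorder(root, seq)
print(seq)  # ['A', 'B', 'C', 'D', 'E', 'F', 'I', 'G', 'H']
```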
---
5. Reconstruction of Binary Trees
A common and important problem is to reconstruct a binary tree given its traversal sequences. The ability to do so depends on the specific traversals provided.
To uniquely reconstruct a general binary tree, a combination of Inorder traversal with either Preorder or Postorder traversal is required.
- The Preorder or Postorder traversal is used to identify the root of any subtree.
- The Inorder traversal is used to determine the nodes in the left and right subtrees of that root.
Reconstruction from Preorder and Inorder:
1. The first element of the Preorder sequence is the root.
2. Locate this root in the Inorder sequence. All elements to its left form the left subtree; all elements to its right form the right subtree.
3. Recursively apply the same procedure to each subtree, using the corresponding segments of both sequences.
Reconstruction from Postorder and Inorder:
The logic is analogous, except that the root of any subtree is the last element of its Postorder sequence.
The Ambiguity of Preorder and Postorder:
A combination of Preorder and Postorder traversals is not sufficient to uniquely reconstruct a general binary tree. This is because ambiguity arises when a node has only one child. We cannot determine if it is a left or a right child.
Special Case: Full Binary Trees
This ambiguity is resolved if the tree is known to be a full binary tree. In a full binary tree, every node has either 0 or 2 children. Therefore, if an internal node is known to have a child, it must have two. This constraint makes reconstruction from Preorder and Postorder traversals possible and unique.
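The Preorder-plus-Inorder procedure translates almost line for line into code. This illustrative sketch returns nested tuples `(root, left, right)` instead of node objects to stay short:

```python
def build(preorder, inorder):
    """Reconstruct a binary tree from its preorder and inorder sequences.
    Returns (root, left_subtree, right_subtree), or None for an empty tree.
    Assumes all keys are distinct."""
    if not preorder:
        return None
    root = preorder[0]            # preorder: the root comes first
    k = inorder.index(root)       # split the inorder sequence around the root
    left = build(preorder[1:k + 1], inorder[:k])
    right = build(preorder[k + 1:], inorder[k + 1:])
    return (root, left, right)

# Root A with left child B and right child C:
print(build(['A', 'B', 'C'], ['B', 'A', 'C']))
# ('A', ('B', None, None), ('C', None, None))
```

Note the `k + 1` split of the preorder list: the left subtree occupies exactly as many preorder positions as it has nodes in the inorder prefix.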
---
Problem-Solving Strategies
- Reconstruction Problems: Always identify the root first. For Preorder, it's the first element. For Postorder, it's the last. Use this root to partition the Inorder sequence. This divide-and-conquer approach is the fastest way to solve reconstruction problems.
- Property-based Inequalities: When faced with questions involving inequalities on $n$, $l$, $i$, and $h$, test the extreme cases:
Minimum Nodes / Maximum Height: A skewed tree ($n = h + 1$).
Maximum Nodes / Minimum Height: A perfect or complete binary tree ($n = 2^{h+1} - 1$ for a perfect tree).
By plugging in values for these two cases, you can often quickly validate or eliminate options.
---
Common Mistakes
- ❌ Applying Full Tree Properties to General Trees: The property $l = i + 1$ is only guaranteed for non-empty full binary trees. Do not assume it holds for all binary trees.
- ❌ Assuming Preorder + Postorder is Sufficient: For a general binary tree, Preorder and Postorder traversals are not enough for unique reconstruction. This combination is only sufficient if you are explicitly told the tree is a full binary tree.
- ❌ Confusing Height and Levels: Remember the standard convention: height is the number of edges on the longest path from the root to a leaf. A single-node tree has height 0. The number of levels is $h + 1$. Be mindful of the definitions used in the question.
---
Practice Questions
:::question type="MCQ" question="The inorder and preorder traversals of a binary tree are D, B, H, E, A, F, C, G and A, B, D, E, H, C, F, G respectively. What is the postorder traversal of the tree?" options=["D, H, E, B, F, G, C, A","D, H, E, B, G, F, C, A","H, E, D, B, F, G, C, A","H, D, E, B, G, F, C, A"] answer="D, H, E, B, F, G, C, A" hint="First, construct the tree. The root is the first element of the preorder traversal. Use the inorder traversal to find the left and right subtrees." solution="
Step 1: Identify the root.
From the preorder traversal (A, B, D, E, H, C, F, G), the root is 'A'.
Step 2: Partition the inorder traversal using the root.
The inorder traversal is (D, B, H, E, A, F, C, G).
Elements to the left of 'A': D, B, H, E. This is the left subtree.
Elements to the right of 'A': F, C, G. This is the right subtree.
Step 3: Reconstruct the left subtree.
Preorder for left subtree: B, D, E, H
Inorder for left subtree: D, B, H, E
The root of this subtree is 'B'.
Left of 'B' in inorder: D. Right of 'B': H, E.
The left child of 'B' is 'D'.
For the right subtree of 'B': Preorder is E, H; Inorder is H, E. The root is 'E'. Left of 'E' is 'H'. So 'H' is the left child of 'E'.
Step 4: Reconstruct the right subtree.
Preorder for right subtree: C, F, G
Inorder for right subtree: F, C, G
The root of this subtree is 'C'.
Left of 'C' in inorder: F. Right of 'C': G.
The left child of 'C' is 'F' and the right child is 'G'.
Step 5: The final tree structure is:
Root: A
A's left child: B
A's right child: C
B's left child: D
B's right child: E
E's left child: H
C's left child: F
C's right child: G
Step 6: Perform a postorder (LRN) traversal on the constructed tree.
Visit left subtree of A (subtree B):
Visit left subtree of B (D): visit D
Visit right subtree of B (subtree E):
Visit left subtree of E (H): visit H
Visit right subtree of E (null)
Visit E
Visit B
Visit right subtree of A (subtree C):
Visit left subtree of C (F): visit F
Visit right subtree of C (G): visit G
Visit C
Visit A
Result: The postorder traversal is D, H, E, B, F, G, C, A (the left child F of C is visited before the right child G).
Answer: \boxed{D, H, E, B, F, G, C, A}
"
:::
:::question type="NAT" question="What is the maximum number of nodes in a binary tree of height 5? (Assume the height of a tree with a single node is 0)." answer="63" hint="Use the formula that relates the maximum number of nodes to the height of a binary tree. This occurs when the tree is perfect." solution="
Step 1: Recall the formula for the maximum number of nodes in a binary tree of height $h$: $n_{max} = 2^{h+1} - 1$.
Step 2: Substitute the given height $h = 5$ into the formula: $n_{max} = 2^{5+1} - 1 = 2^6 - 1$.
Step 3: Calculate the final value: $64 - 1 = 63$.
Result: The maximum number of nodes is 63.
Answer: \boxed{63}
"
:::
:::question type="MSQ" question="Let a non-empty binary tree have $n$ nodes and $l$ leaf nodes. Which of the following statements is/are ALWAYS TRUE?" options=["If the tree is full, then $n$ must be odd.","The number of nodes with one child is at most 1.","The number of edges is $n - 1$.","If the tree is complete, $l = \lceil n/2 \rceil$."] answer="A,C,D" hint="Consider the properties of full and complete binary trees, and the basic definition of a tree as a graph." solution="
- Option A: In a full binary tree, every internal node has 2 children. Let $i$ be the number of internal nodes. We know $l = i + 1$. The total number of nodes is $n = l + i = 2i + 1$. Since $i$ is an integer, $2i$ is even, and $n$ must be odd. Thus, this statement is TRUE.
- Option B: A binary tree can have many nodes with one child. For example, a skewed tree where every node has one child. This statement is FALSE.
- Option C: A tree is a connected, acyclic graph. A fundamental property of any tree (not just binary) with $n$ nodes is that it has exactly $n - 1$ edges. This statement is TRUE.
- Option D: In a complete binary tree, the nodes on the last level are filled from left to right. The parent of any node at index $j$ (in a 1-indexed array representation) is at $\lfloor j/2 \rfloor$. The leaf nodes begin at index $\lfloor n/2 \rfloor + 1$. The number of leaf nodes is therefore $n - \lfloor n/2 \rfloor = \lceil n/2 \rceil$. The condition $l = \lceil n/2 \rceil$ is therefore always met. This statement is TRUE.
"
:::
:::question type="MCQ" question="Which of the following pairs of traversals is sufficient to uniquely reconstruct a full binary tree?" options=["Preorder and Level-order","Inorder and Level-order","Postorder and Preorder","Inorder and Postorder"] answer="Postorder and Preorder" hint="Recall the special case for tree reconstruction. While Inorder is generally required, a specific type of tree has a special property." solution="
For a general binary tree, Inorder traversal is required along with either Preorder or Postorder. This is because Inorder uniquely partitions the nodes into left and right subtrees.
However, the question specifies a full binary tree. In a full binary tree, every node has either 0 or 2 children. This property removes the ambiguity that normally exists when using only Preorder and Postorder. If a node in the traversal sequence implies a child, we know it must have two children (unless it's a leaf), allowing for unique reconstruction.
Therefore, for a full binary tree, the pair of Postorder and Preorder traversals is sufficient. Inorder combined with another traversal (like Level-order or Postorder) is also sufficient, but the question asks for a pair that becomes sufficient because the tree is full. The pair (Postorder, Preorder) uniquely fits this description.
Answer: \boxed{Postorder and Preorder}
"
:::
---
Summary
- Traversal Algorithms are Foundational: You must be able to derive Preorder (NLR), Inorder (LNR), and Postorder (LRN) traversals for any given binary tree quickly and accurately.
- Tree Reconstruction Rules: For a general binary tree, unique reconstruction requires (Inorder + Preorder) or (Inorder + Postorder). For a full binary tree, (Preorder + Postorder) is also sufficient.
- Key Property Formulas: Commit to memory the relationships between nodes and height:
* $n = l + i$ (always holds for non-empty trees).
* $l = i + 1$ (for non-empty full binary trees).
* $h + 1 \le n \le 2^{h+1} - 1$ (bounds on nodes for a given height $h$).
These are frequently used to solve numerical and property-based questions.
---
What's Next?
This topic provides the essential foundation for more specialized tree structures. Your preparation should now extend to:
- Binary Search Trees (BSTs): A crucial ordered tree structure where the Inorder traversal yields a sorted sequence of nodes. Understanding their properties, operations (search, insert, delete), and balancing mechanisms (like AVL trees) is the next logical step.
- Heaps: Heaps are a specific type of tree (usually a complete binary tree) that satisfy the heap property. They are fundamental for implementing priority queues and for algorithms like HeapSort.
- Graph Algorithms: Trees are a specific type of graph (connected and acyclic). The depth-first traversal methods learned here are a direct precursor to Depth-First Search (DFS) in graphs. Understanding trees first makes the transition to general graphs much more intuitive.
---
Now that you understand Trees, let's explore Hash Tables, which build on these concepts.
---
Part 5: Hash Tables
Introduction
In the study of data structures, the efficiency of data retrieval is a paramount concern. While structures like arrays offer constant-time access given an index, they are inefficient for searching for a specific value. Conversely, balanced binary search trees provide logarithmic-time search, insertion, and deletion, which is efficient but not optimal. We now turn our attention to a data structure that, under reasonable assumptions, can perform all these fundamental operations with an average-case time complexity of $O(1)$: the hash table.
A hash table is a powerful data structure that implements an associative array, also known as a dictionary or map. It maps keys to values, enabling rapid retrieval of a value when its corresponding key is known. This is achieved by using a special function, the hash function, to compute an index into an array of buckets or slots, from which the desired value can be found. The remarkable average-case efficiency of hash tables makes them indispensable in a vast array of applications, including database indexing, caching systems, symbol tables in compilers, and the implementation of sets and maps in modern programming languages. For the GATE examination, a firm grasp of their mechanics, collision resolution strategies, and performance analysis is essential.
A Hash Table is a data structure that stores key-value pairs. It uses a hash function to compute an index, or hash code, from a key, into an array of slots. The key is then stored at this index. Ideally, the hash function distributes keys uniformly across the array, allowing for an average-case time complexity of $O(1)$ for insertion, deletion, and search operations.
---
Key Components of a Hash Table
The efficacy of a hash table rests upon two primary components: the hash function and the underlying array structure.
1. The Hash Function
The hash function is the core of the hash table. It is a mathematical function that takes a key as input and transforms it into an integer index that falls within the bounds of the table's array.
Let $U$ be the universe of all possible keys and $m$ be the number of slots in the hash table, indexed $0$ to $m - 1$. A hash function $h$ is a mapping:

$$h : U \to \{0, 1, \ldots, m - 1\}$$
Properties of a Good Hash Function:
* Determinism: For a given key $k$, the hash function must always produce the same index.
* Uniform Distribution: The function should map the expected keys as evenly as possible over the available slots. This minimizes collisions.
* Efficiency: The function must be fast to compute.
A commonly used and simple method for hashing integer keys is the Division Method:

$$h(k) = k \bmod m$$

Variables:
- $k$ = The key to be hashed.
- $m$ = The size of the hash table (number of slots).
When to use: This is a general-purpose and frequently encountered hashing technique in problems. It is simple to implement and often effective, provided $m$ is chosen carefully (e.g., a prime number not too close to a power of 2).
2. The Table (Array of Slots)
The table is simply an array of size $m$ where the data is stored. Each position in the array is referred to as a slot or bucket. The index computed by the hash function directly corresponds to a slot in this array.
---
Collision Resolution Strategies
A collision occurs when two distinct keys, $k_1$ and $k_2$, are mapped to the same slot by the hash function, i.e., $h(k_1) = h(k_2)$ where $k_1 \ne k_2$. Since only one element can be stored in a single slot, we require a systematic method for handling such events. There are two primary strategies for collision resolution.
1. Separate Chaining
In separate chaining, each slot in the hash table does not hold an element directly, but rather a pointer to a data structure (typically a linked list) that stores all elements that have hashed to that slot.
The load factor, $\alpha$, is defined as $\alpha = n/m$, where $n$ is the number of elements and $m$ is the number of slots. For separate chaining, $\alpha$ can be less than, equal to, or greater than 1. The average length of a chain is $\alpha$.
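Separate chaining can be sketched with a list of buckets, each holding the key-value pairs that hash to that slot. In this minimal sketch (class name illustrative), Python lists stand in for the linked lists:

```python
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]   # one chain per slot

    def _hash(self, key):
        return key % self.m                     # division method

    def insert(self, key, value):
        chain = self.buckets[self._hash(key)]
        for idx, (k, _) in enumerate(chain):
            if k == key:                        # key already present: update
                chain[idx] = (key, value)
                return
        chain.append((key, value))              # new key: append to the chain

    def search(self, key):
        for k, v in self.buckets[self._hash(key)]:
            if k == key:
                return v
        return None                             # key not in the table
```

With `m = 10`, the keys 12 and 22 both hash to slot 2 and simply share a chain; unlike open addressing, the load factor may exceed 1 without breaking correctness, only lengthening the chains.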
2. Open Addressing
In open addressing, all elements are stored directly within the hash table itself. When a collision occurs, we systematically probe the table for an empty slot where the new key can be placed. The probe sequence depends on the key and the probe number $i$ (for $i = 0, 1, \ldots, m - 1$).
The hash function is extended to include the probe number: $h(k, i)$.
For open addressing, the load factor must always be less than 1 ($\alpha < 1$), as the table cannot store more elements than it has slots.
There are several common methods for computing the probe sequences.
a. Linear Probing
This is the simplest method. If the initial slot is occupied, we try the next slot, and then the next, and so on, wrapping around the table if necessary.
The probe sequence is defined by:

$$h(k, i) = (h'(k) + i) \bmod m$$

where $h'$ is the initial hash function (e.g., $h'(k) = k \bmod m$) and $i = 0, 1, \ldots, m - 1$.
A significant drawback of linear probing is primary clustering, where long runs of occupied slots build up, increasing the average search time.
Worked Example:
Problem: Consider a hash table of size $m = 10$ with the hash function $h'(k) = k \bmod 10$. Insert the keys 58, 28, 49, 18, 19 using linear probing.
Solution:
The hash table is initially empty: `[ , , , , , , , , , ]`
1. Insert 58:
$h(58, 0) = 58 \bmod 10 = 8$. Slot 8 is empty.
`[ , , , , , , , , 58, ]`
2. Insert 28:
$h(28, 0) = 8$. Slot 8 is occupied.
$h(28, 1) = (8 + 1) \bmod 10 = 9$. Slot 9 is empty.
`[ , , , , , , , , 58, 28]`
3. Insert 49:
$h(49, 0) = 9$. Slot 9 is occupied.
$h(49, 1) = (9 + 1) \bmod 10 = 0$. Slot 0 is empty.
`[49, , , , , , , , 58, 28]`
4. Insert 18:
$h(18, 0) = 8$. Slot 8 is occupied.
$h(18, 1) = 9$. Slot 9 is occupied.
$h(18, 2) = 0$. Slot 0 is occupied.
$h(18, 3) = 1$. Slot 1 is empty.
`[49, 18, , , , , , , 58, 28]`
5. Insert 19:
$h(19, 0) = 9$. Slot 9 is occupied.
$h(19, 1) = 0$. Slot 0 is occupied.
$h(19, 2) = 1$. Slot 1 is occupied.
$h(19, 3) = 2$. Slot 2 is empty.
`[49, 18, 19, , , , , , 58, 28]`
Answer: \boxed{Final Table State: `[49, 18, 19, , , , , , 58, 28]`}
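The trace above can be reproduced with a short linear-probing insertion routine. A sketch under the same assumptions (division-method hash, table of size 10); it omits deletion and resizing:

```python
def linear_probe_insert(table, key):
    """Insert key into an open-addressed table using linear probing.
    Returns the slot used, or raises if the table is full."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m      # h(k, i) = (h'(k) + i) mod m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")

table = [None] * 10
for k in (58, 28, 49, 18, 19):
    linear_probe_insert(table, k)
print(table)  # [49, 18, 19, None, None, None, None, None, 58, 28]
```

Note how the cluster at slots 8, 9, 0, 1, 2 grows with each collision; this is exactly the primary clustering mentioned above.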
b. Quadratic Probing
Quadratic probing aims to alleviate primary clustering. The probe sequence is defined by:

$$h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$$

where $c_1$ and $c_2$ are constants ($c_2 \ne 0$). A common choice is $c_1 = 0, c_2 = 1$. This method can lead to secondary clustering, where keys that initially hash to the same slot will follow the same probe sequence.
c. Double Hashing
Double hashing is one of the most effective open addressing methods. It uses a second hash function, $h_2$, to determine the step size for the probe sequence:

$$h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$$

The second hash function must be chosen carefully, such that $h_2(k)$ is relatively prime to $m$ for all keys $k$. This ensures that all slots in the table are eventually probed.
---
Performance Analysis of Open Addressing
The performance of open addressing is heavily dependent on the load factor $\alpha$. As $\alpha$ approaches 1, the table becomes full, and finding an empty slot becomes increasingly difficult. The analysis typically relies on the assumption of uniform hashing, an idealization where each key is equally likely to have any of the $m!$ permutations of $\langle 0, 1, \ldots, m - 1 \rangle$ as its probe sequence.
For an open-address hash table with $n$ elements and $m$ slots, the load factor $\alpha = n/m$ is the ratio of the number of elements to the number of slots.
It represents how full the table is. For open addressing, we must have $\alpha < 1$.
The key performance metric is the expected number of probes required for an operation.
Under the assumption of uniform hashing, the expected number of probes in an unsuccessful search is at most:
Variables:
- = The load factor .
When to use: Use this formula to calculate the average number of slots that must be examined to determine that a key is not in the table. This is also used for insertions.
Under the assumption of uniform hashing, the expected number of probes in a successful search is at most:
$$\frac{1}{\alpha} \ln \frac{1}{1 - \alpha}$$
Variables:
- $\alpha$ = the load factor, $n/m$.
When to use: Use this formula to calculate the average number of slots that must be examined to find a key that is in the table.
To insert a key into an open-address hash table, we must first search for an empty slot. This process is identical to an unsuccessful search for that key. If the key were already present, the insertion would not proceed (assuming no duplicates). The search for an empty slot terminates once one is found. Therefore, the expected number of probes required for an insertion is equal to the expected number of probes for an unsuccessful search.
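The two bounds above are easy to evaluate numerically, which helps build intuition for how sharply performance degrades near $\alpha = 1$. This is a small sketch; the function names and sample load factors are my own.

```python
import math

def probes_unsuccessful(alpha):
    """Upper bound on expected probes for an unsuccessful search (and insertion): 1/(1 - alpha)."""
    return 1 / (1 - alpha)

def probes_successful(alpha):
    """Upper bound on expected probes for a successful search: (1/alpha) * ln(1/(1 - alpha))."""
    return (1 / alpha) * math.log(1 / (1 - alpha))

# As the load factor climbs, the unsuccessful-search (and insertion)
# bound grows much faster than the successful-search bound.
for alpha in (0.5, 0.75, 0.9):
    print(f"alpha={alpha}: unsuccessful <= {probes_unsuccessful(alpha):.2f}, "
          f"successful <= {probes_successful(alpha):.2f}")
```

At $\alpha = 0.5$ an insertion needs at most 2 probes on average, but at $\alpha = 0.9$ it needs at most 10, illustrating why open-address tables are typically resized well before they fill.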
---
Problem-Solving Strategies
When asked to trace insertions with a given collision resolution scheme (like linear probing):
- Draw the table: Create a visual representation of the array slots, e.g., `[0, 1, 2, ..., m-1]`.
- Process one key at a time. For each key in the sequence:
  a. Calculate the initial hash index, e.g., $h(k) = k \bmod m$ for the division method.
  b. Check if the slot at that index is empty. If yes, place the key and move to the next key in the sequence.
  c. If the slot is occupied, a collision has occurred. Apply the probing formula for $i = 1, 2, 3, \ldots$ until an empty slot is found. For linear probing, this is simply $(h(k) + i) \bmod m$.
- Be meticulous: keep track of the table state after each insertion. A small calculation error can cascade.
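The tracing procedure can also be checked mechanically. The sketch below is a minimal implementation of that procedure, assuming the division-method hash $h(k) = k \bmod m$ used throughout the chapter; the function name and key order are my own (the key order is chosen to be consistent with the worked example earlier).

```python
def insert_all(keys, m):
    """Insert keys into a size-m table using h(k) = k mod m and linear probing.

    Assumes the table does not fill up (otherwise the probe loop would not terminate).
    """
    table = [None] * m
    for k in keys:
        idx = k % m                    # step (a): initial hash index
        while table[idx] is not None:  # step (c): collision -- probe onward
            idx = (idx + 1) % m        # next slot, wrapping with mod m
        table[idx] = k                 # step (b): place the key in the empty slot
    return table

print(insert_all([58, 28, 49, 18, 19], 10))
# → [49, 18, 19, None, None, None, None, None, 58, 28]
```

This reproduces the final table state from the worked trace, and the same helper can be used to verify answers to the practice questions below.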
For questions about the expected number of probes:
- Identify the operation: Is it a successful search, an unsuccessful search, or an insertion?
- Calculate the load factor: Find $n$ (number of elements) and $m$ (table size) to compute $\alpha = n/m$.
- Apply the correct formula:
  - For unsuccessful search or insertion, use $\frac{1}{1 - \alpha}$.
  - For successful search, use $\frac{1}{\alpha} \ln \frac{1}{1 - \alpha}$.
- Remember that these formulas are based on the idealization of uniform hashing.
---
Common Mistakes
- ❌ Forgetting to apply the modulo operator after each step in a probe sequence. For example, in linear probing, computing $h(k) + i$ without taking the modulus at each step, which can produce an index outside the table.
- ✅ The correct form is $(h(k) + i) \bmod m$. The result must always be a valid index from $0$ to $m - 1$.
- ❌ Calculating the load factor incorrectly as $m/n$.
- ✅ The load factor is always the number of elements divided by the number of slots: $\alpha = n/m$. It represents the "fullness" of the table.
- ❌ Assuming an insertion takes the same number of probes as a successful search.
- ✅ An insertion requires finding an empty slot, which is equivalent to an unsuccessful search. Therefore, its performance is governed by the formula for unsuccessful searches, not successful ones.
---
Practice Questions
:::question type="MCQ" question="A hash table of size 10 uses the hash function $h(k) = k \bmod 10$ and linear probing. After inserting the keys 2, 12, 22, 32, 5, 15 in that order, where is the key 25 subsequently placed?" options=["Slot 6","Slot 7","Slot 8","Slot 9"] answer="Slot 8" hint="Trace the insertion of each key one by one, resolving collisions with linear probing. Then, find the first empty slot for the final key, starting from its initial hash position." solution="
Step 1: Initialize the hash table of size 10. `T = [ , , , , , , , , , ]`
Step 2: Insert the key 2.
$h(2) = 2 \bmod 10 = 2$. Slot 2 is empty.
`T[2] = 2`.
Step 3: Insert the key 12.
$h(12) = 12 \bmod 10 = 2$. Slot 2 is occupied.
Probe 1: $(2 + 1) \bmod 10 = 3$. Slot 3 is empty.
`T[3] = 12`.
Step 4: Insert the key 22.
$h(22) = 22 \bmod 10 = 2$. Slot 2 is occupied.
Probe 1: $(2 + 1) \bmod 10 = 3$. Slot 3 is occupied.
Probe 2: $(2 + 2) \bmod 10 = 4$. Slot 4 is empty.
`T[4] = 22`.
Step 5: Insert the key 32.
$h(32) = 32 \bmod 10 = 2$. Slots 2, 3, 4 are occupied.
Probe 3: $(2 + 3) \bmod 10 = 5$. Slot 5 is empty.
`T[5] = 32`.
Step 6: Insert the key 5.
$h(5) = 5 \bmod 10 = 5$. Slot 5 is occupied.
Probe 1: $(5 + 1) \bmod 10 = 6$. Slot 6 is empty.
`T[6] = 5`.
Step 7: Insert the key 15.
$h(15) = 15 \bmod 10 = 5$. Slots 5, 6 are occupied.
Probe 2: $(5 + 2) \bmod 10 = 7$. Slot 7 is empty.
`T[7] = 15`.
Step 8: Insert the final key 25.
$h(25) = 25 \bmod 10 = 5$. Slots 5, 6, 7 are occupied.
Probe 3: $(5 + 3) \bmod 10 = 8$. Slot 8 is empty.
`T[8] = 25`.
Answer: \boxed{Slot 8}
"
:::
:::question type="NAT" question="An open-address hash table of size 1000 stores 750 elements. Assuming uniform hashing, what is the expected number of probes required for an insertion?" answer="4" hint="An insertion requires the same number of probes as an unsuccessful search. First, calculate the load factor." solution="
Step 1: Identify the given parameters.
Number of elements: $n = 750$.
Number of slots: $m = 1000$.
Step 2: Calculate the load factor: $\alpha = \frac{n}{m} = \frac{750}{1000} = 0.75$.
Step 3: Apply the formula for the expected number of probes for an insertion. The expected number of probes for an insertion is the same as for an unsuccessful search: $\frac{1}{1 - \alpha}$.
Step 4: Substitute the value of $\alpha$: $\frac{1}{1 - 0.75} = \frac{1}{0.25} = 4$.
Answer: \boxed{4}
"
:::
:::question type="MSQ" question="Which of the following statements about hash tables are correct?" options=["In open addressing, the load factor must be strictly less than 1.","Linear probing is susceptible to a phenomenon called primary clustering.","Separate chaining cannot have a load factor greater than 1.","Double hashing uses two different hash functions to resolve collisions."] answer="In open addressing, the load factor must be strictly less than 1.,Linear probing is susceptible to a phenomenon called primary clustering.,Double hashing uses two different hash functions to resolve collisions." hint="Evaluate each statement based on the fundamental properties of separate chaining and open addressing collision resolution methods." solution="
- Option A: Correct. In open addressing, elements are stored in the table itself. If $\alpha = 1$ (i.e., $n = m$), the table is full, and an insertion of a new element is impossible. Thus, the load factor must be less than 1.
- Option B: Correct. Primary clustering occurs when keys that hash to different initial slots begin to form a single large cluster. This happens in linear probing because any collision in the cluster's vicinity will add to the end of the cluster, making it longer.
- Option C: Incorrect. In separate chaining, each slot is a pointer to a linked list (or other data structure). The number of elements $n$ can exceed the number of slots $m$, resulting in a load factor $\alpha > 1$. This simply means the average length of the linked lists is greater than 1.
- Option D: Correct. Double hashing uses a primary hash function $h_1(k)$ to find the initial slot and a secondary hash function $h_2(k)$ to determine the step size for probing, resulting in a probe sequence of $h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$.
"
:::
:::question type="NAT" question="Consider a hash table of size 13 with hash function $h(k) = k \bmod 13$. Linear probing is used for collision resolution. After inserting the keys 14, 27, 2, 15 in that order, the key 15 is stored at which index?" answer="4" hint="Trace the insertions carefully. Note that 27, 2, and 15 will cause collisions." solution="
We simulate the insertions step-by-step into the hash table of size 13 (indices 0-12). The hash function is $h(k) = k \bmod 13$ and collision resolution is linear probing.
Step 1: Insert key 14.
$h(14) = 14 \bmod 13 = 1$. Slot 1 is empty.
`T[1] = 14`.
Step 2: Insert key 27.
$h(27) = 27 \bmod 13 = 1$. Slot 1 is occupied.
Probe 1: $(1 + 1) \bmod 13 = 2$. Slot 2 is empty.
`T[2] = 27`.
Step 3: Insert key 2.
$h(2) = 2 \bmod 13 = 2$. Slot 2 is occupied.
Probe 1: $(2 + 1) \bmod 13 = 3$. Slot 3 is empty.
`T[3] = 2`.
Step 4: Insert key 15.
$h(15) = 15 \bmod 13 = 2$. Slot 2 is occupied.
Probe 1: $(2 + 1) \bmod 13 = 3$. Slot 3 is occupied.
Probe 2: $(2 + 2) \bmod 13 = 4$. Slot 4 is empty.
`T[4] = 15`.
Answer: \boxed{4}
"
:::
---
Summary
- Fundamental Goal: Hash tables aim to provide $O(1)$ average-case time complexity for search, insertion, and deletion by mapping keys to array indices via a hash function.
- Collisions are Inevitable: When two keys map to the same index, a collision occurs. The strategy for resolving these collisions is critical.
- Open Addressing vs. Chaining: Understand the two main collision resolution paradigms. For open addressing (linear probing, etc.), all elements are stored in the table itself, so $\alpha \le 1$. For separate chaining, each slot can hold multiple elements (e.g., in a linked list), and $\alpha$ can exceed 1.
- Linear Probing Mechanics: For GATE problems, be prepared to manually trace insertions using linear probing: $h(k, i) = (h(k) + i) \bmod m$.
- Performance is Tied to Load Factor: The performance of open addressing degrades as the load factor $\alpha$ approaches 1. The expected number of probes for an insertion is the same as for an unsuccessful search, which is at most $\frac{1}{1 - \alpha}$.
---
What's Next?
This topic connects to:
- Trees (BSTs and Balanced Trees): Compare the performance guarantees. Hash tables offer better average-case performance ($O(1)$), but trees like AVL or Red-Black trees provide a worst-case guarantee of $O(\log n)$, which hash tables lack (their worst case is $O(n)$).
- Complexity Analysis: Hash tables are a classic case study in amortized and average-case analysis, showing how a structure can be highly efficient in practice despite a poor worst-case bound.
- Database Indexing: In database systems, hash indexing is a common method for building indices to allow for rapid retrieval of records based on a key value. Understanding hash tables is fundamental to understanding this aspect of database design.
Master these connections for a comprehensive understanding of how different data structures are chosen for specific computational problems.
---
Chapter Summary
In this chapter, we have undertaken a systematic study of the fundamental data structures that form the bedrock of efficient algorithm design. A thorough command of these concepts is not merely beneficial but essential for success in the GATE examination. From our detailed discussion, we distill the following critical points for retention:
- The Principle of Organization: The primary purpose of any data structure is to organize data in a manner that facilitates efficient access and modification. The choice of data structure is always a trade-off, primarily between the time complexity of operations and the space complexity of storage.
- Linear vs. Non-linear Structures: We have established a clear distinction between linear structures (Linked Lists, Stacks, Queues), where elements have a sequential relationship, and non-linear structures (Trees, Graphs), where elements can have multiple predecessor and successor relationships. Hash Tables represent a distinct category, organizing data by key-value pairs through a hash function.
- Asymptotic Performance is Key: For GATE, understanding the average-case and worst-case time complexities of core operations is paramount. We have seen that Stacks and Queues offer $O(1)$ performance for their primary operations. Balanced Binary Search Trees provide a reliable $O(\log n)$ for search, insertion, and deletion, whereas Hash Tables offer an average-case performance of $O(1)$ at the cost of a potential $O(n)$ worst case due to collisions.
- Abstract Data Types (ADTs) vs. Implementations: It is crucial to distinguish between an ADT and its underlying implementation. A Stack (LIFO) or a Queue (FIFO) is an abstract concept; it can be implemented using either arrays or linked lists, each with its own advantages regarding memory allocation and overflow handling.
- The Role of Pointers and Recursion: The implementation and manipulation of dynamic structures like Linked Lists and Trees are intrinsically linked to the concepts of pointers and memory allocation. Furthermore, many tree operations, such as traversals, are most elegantly expressed and understood through recursion, which itself implicitly uses a stack (the call stack).
- Collision Resolution in Hashing: A hash table's performance is dominated by its collision resolution strategy. We have examined the two primary methods: Separate Chaining (using linked lists) and Open Addressing (probing for the next available slot). Understanding the mechanisms and performance implications of each is a frequent subject of examination.
---
Chapter Review Questions
:::question type="MCQ" question="A process involves inserting a sequence of distinct keys into an initially empty Binary Search Tree (BST). Subsequently, the elements are dequeued one by one from a queue that was populated by a level-order traversal of the final BST. If the dequeued sequence is 25, 15, 40, 10, 20, 35, 50, which of the following insertion sequences could have produced this result?" options=["25, 15, 10, 20, 40, 35, 50", "25, 40, 50, 35, 15, 20, 10", "25, 15, 40, 10, 35, 20, 50", "25, 40, 15, 10, 20, 35, 50"] answer="25, 15, 10, 20, 40, 35, 50" hint="The first element of a level-order traversal is always the root of the tree. Use this fact to determine the root from the dequeued sequence. Then, for each subsequent key in a potential insertion sequence, verify that it adheres to the BST property relative to the nodes already in the tree." solution="
The problem requires us to work backward from a level-order traversal to identify a valid insertion sequence for a Binary Search Tree (BST).
- Level 0: 25
- Level 1: 15, 40 (15 must be the left child of 25, and 40 must be the right child)
- Level 2: 10, 20, 35, 50 (10 and 20 are children of 15; 35 and 50 are children of 40)
- Specifically, 10 < 15, so 10 is the left child of 15. 20 > 15, so 20 is the right child of 15.
- Similarly, 35 < 40, so 35 is the left child of 40. 50 > 40, so 50 is the right child of 40.
- Option A: 25, 15, 10, 20, 40, 35, 50
- `insert(25)`: root is 25.
- `insert(15)`: $15 < 25$, goes left. Tree: 25(L:15)
- `insert(10)`: $10 < 25$ and $10 < 15$, goes left. Tree: 25(L:15(L:10))
- `insert(20)`: $20 < 25$ and $20 > 15$, goes right. Tree: 25(L:15(L:10, R:20))
- `insert(40)`: $40 > 25$, goes right. Tree: 25(L:..., R:40)
- `insert(35)`: $35 > 25$ and $35 < 40$, goes left. Tree: 25(L:..., R:40(L:35))
- `insert(50)`: $50 > 25$ and $50 > 40$, goes right. Tree: 25(L:..., R:40(L:35, R:50))
This sequence correctly constructs the tree inferred from the level-order traversal.
- Option C: 25, 15, 40, 10, 35, 20, 50
- This sequence also produces the exact same final tree.
- Option D: 25, 40, 15, 10, 20, 35, 50
- This sequence also produces the exact same final tree.
Multiple insertion sequences can lead to the same BST. However, in a single-choice question, we select the most direct or commonly expected one. Option A follows a natural left-to-right, level-by-level insertion pattern for the children, which is often implied.
Answer: \boxed{25, 15, 10, 20, 40, 35, 50}
"
:::
:::question type="NAT" question="A hash table of size 11 is used to store integer keys, with the hash function $h(k) = k \bmod 11$. Collisions are resolved using linear probing. The keys 23, 45, 12, 34, 56, 24, 80 are inserted in this exact order into the initially empty table. What is the index (or slot) where the final key, 80, is stored?" answer="7" hint="Calculate the hash value for each key sequentially. If the target slot is occupied, probe the next slot ($(\text{index} + 1) \bmod 11$) cyclically until an empty slot is found. Keep track of the state of the hash table after each insertion." solution="
We simulate the insertions step-by-step into the hash table of size 11 (indices 0-10). The hash function is $h(k) = k \bmod 11$ and collision resolution is linear probing.
Step 1: Insert 23. $h(23) = 23 \bmod 11 = 1$. Slot 1 is empty, so `T[1] = 23`.
Step 2: Insert 45. $h(45) = 45 \bmod 11 = 1$. Slot 1 is occupied; slot 2 is empty, so `T[2] = 45`.
Step 3: Insert 12. $h(12) = 12 \bmod 11 = 1$. Slots 1 and 2 are occupied; slot 3 is empty, so `T[3] = 12`.
Step 4: Insert 34. $h(34) = 34 \bmod 11 = 1$. Slots 1, 2, 3 are occupied; slot 4 is empty, so `T[4] = 34`.
Step 5: Insert 56. $h(56) = 56 \bmod 11 = 1$. Slots 1 through 4 are occupied; slot 5 is empty, so `T[5] = 56`.
At this point, the table is: `[empty, 23, 45, 12, 34, 56, empty, empty, empty, empty, empty]`
Step 6: Insert 24. $h(24) = 24 \bmod 11 = 2$. Slots 2 through 5 are occupied; slot 6 is empty, so `T[6] = 24`.
The table is now: `[empty, 23, 45, 12, 34, 56, 24, empty, empty, empty, empty]`
Step 7: Insert 80. $h(80) = 80 \bmod 11 = 3$. Slots 3 through 6 are occupied; slot 7 is empty, so `T[7] = 80`.
Answer: \boxed{7}
"
:::
:::question type="MCQ" question="Consider a queue implemented using a singly linked list. The queue holds pointers `front` and `rear`. Which of the following represents the most efficient time complexities for the ENQUEUE and DEQUEUE operations, respectively, under the standard implementation?" options=["$O(1)$, $O(1)$","$O(1)$, $O(n)$","$O(n)$, $O(1)$","$O(n)$, $O(n)$"] answer="$O(1)$, $O(1)$" hint="Think about where new elements are added and where they are removed in a queue. How can the `front` and `rear` pointers be maintained to make these operations independent of the number of elements?" solution="
Let us analyze the implementation of a queue using a singly linked list. A queue follows a First-In, First-Out (FIFO) discipline.
- ENQUEUE: This operation adds a new element to the end (rear) of the queue.
- DEQUEUE: This operation removes an element from the start (front) of the queue.
By maintaining both `front` and `rear` pointers, we can perform both ENQUEUE (at the rear) and DEQUEUE (from the front) in constant time. Therefore, the respective complexities are $O(1)$ and $O(1)$.
Answer: \boxed{O(1), O(1)}
"
:::
---
What's Next?
Having completed this chapter on Basic Data Structures, you have established a firm foundation for a significant portion of the Computer Science and Information Technology syllabus. The concepts of organizing data for efficiency are not isolated; they are the fundamental building blocks for more complex topics.
Connections to Your Learning:
- Previous Chapters: Your understanding of programming fundamentals, particularly pointers, dynamic memory allocation, and recursion, has been directly applied in the implementation and analysis of linked lists and trees. The mathematical concepts of logarithms and summations are essential for the complexity analysis we have performed.
- Future Chapters: The mastery of these basic structures is a prerequisite for advancing in your preparation.
We will build upon this foundation immediately in the next chapter as we explore the design and analysis of algorithms that operate upon these very structures.