Operating System Storage Management

File Systems

Comprehensive study notes on File Systems for GATE CS preparation. This chapter covers key concepts, formulas, and examples needed for your exam.


Overview

In our study of operating systems, we now transition from the transient world of main memory and process management to the persistent domain of secondary storage. The primary challenge in this area is to manage vast amounts of information in a manner that is both efficient for the system and convenient for the user. The fundamental abstraction that the operating system provides for this purpose is the file system, which imposes a logical structure upon the physical chaos of storage devices. This chapter introduces the foundational concepts upon which all modern storage management is built.

We shall explore how the operating system views a file not as a collection of physical blocks on a disk, but as a logical unit of information with specific attributes and permissible operations. We will then examine how these individual files are organized into a coherent structure using directories. The design of the directory structure is critical, as it directly impacts usability, efficiency, and the ability to share information among users.

For the GATE examination, a mastery of these core concepts is non-negotiable. Questions frequently test the trade-offs between file allocation methods and the performance of different directory implementations. Furthermore, these principles serve as the essential prerequisite for the more complex topics of free space management and disk scheduling, which we will address in subsequent chapters. A firm grasp of the material presented here is therefore the first and most crucial step toward solving a significant class of problems in the Operating Systems section.

---

Chapter Contents

| # | Topic | What You'll Learn |
|---|-------|-------------------|
| 1 | File Allocation Methods | Contiguous, linked, and indexed allocation and their trade-offs |
| 2 | Directory Structure | Linear list, hash table, and tree-based directory implementations |

---

Learning Objectives

By the End of This Chapter

After completing this chapter, you will be able to:

  • Explain how the operating system maps a file's logical blocks onto the physical blocks of a disk.

  • Describe contiguous, linked, and indexed allocation, and the trade-offs each makes between space utilization and access performance.

  • Calculate the number of disk I/O operations required for file operations, such as block insertion, under each allocation scheme.

  • Compare linear-list, hash-table, and tree-based directory implementations and state the time complexity of their fundamental operations.

---

We now turn our attention to file allocation methods...

## Part 1: File Allocation Methods

Introduction

In the logical organization of a file system, a file is viewed as a sequence of logical blocks. However, the physical storage medium, the disk, is a collection of physical blocks. The operating system is responsible for mapping these logical blocks onto the physical blocks of the disk. The specific method by which this mapping is accomplished is known as the file allocation method. The choice of allocation method has significant implications for system performance, storage efficiency, and the types of file access (sequential, direct) that can be supported effectively.

We will examine the three principal file allocation strategies: contiguous, linked, and indexed. Each method presents a different set of trade-offs between space utilization and performance. Understanding these trade-offs is critical for analyzing file system behavior, a common task in the GATE examination. Our focus will be on the mechanisms, advantages, disadvantages, and performance characteristics, particularly concerning disk I/O operations, for each of these methods.

📖 File Allocation Method

A file allocation method is an operating system mechanism that manages how disk blocks are allocated to files. It defines the data structures used to track the mapping of a file's logical blocks to the physical blocks on a secondary storage device.

---

Key Concepts

We shall now delve into the specifics of each major allocation strategy. For each, we will consider its structure, performance implications, and characteristic problems such as fragmentation.

## 1. Contiguous Allocation

The simplest allocation strategy is contiguous allocation, where a file is stored in a contiguous sequence of disk blocks. The directory entry for a file needs to store only two pieces of information: the address of the starting block and the total length of the file (in blocks).



*Figure: Contiguous allocation. Disk blocks 0 through 7; File A occupies blocks 2, 3, and 4, so its directory entry stores Start: 2, Len: 3.*

Performance and Characteristics:

* Access: This method excels at both sequential and direct (random) access. To access the $i^{th}$ logical block of a file starting at block $b$, the physical block address is simply $b + i$. This calculation is trivial and requires only a single disk seek to the starting block for sequential reads.
* Disadvantages: The primary issue is external fragmentation. As files are created and deleted, free space on the disk becomes broken into small, non-contiguous holes. A new file may not be allocated even if there is enough total free space, simply because no single hole is large enough. Furthermore, file growth is problematic; if a file needs to expand, a new, larger contiguous space must be found, and the entire file must be copied, which is a very expensive operation.

Worked Example: Disk Access Analysis

Problem: A file occupies 50 contiguous blocks on disk, starting at block 100. A new block needs to be inserted after the 20th block of the file. Assume sufficient free space exists immediately following the file. Calculate the total number of disk I/O operations required.

Solution:

Step 1: Analyze the initial state and the required operation.
The file occupies blocks 100 through 149. We need to insert a new block after block 119 (the 20th block). This means all subsequent blocks must be shifted by one position.

Step 2: Calculate the number of blocks to be shifted.
The blocks from the 21st to the 50th must be moved.
Number of blocks to shift = $50 - 20 = 30$ blocks.
These are blocks 120 through 149.

Step 3: Calculate the disk I/O for shifting the blocks.
To shift these 30 blocks, each must be read from its original location and written to its new location.

  • Read blocks 120 to 149: 30 disk reads.

  • Write these blocks to locations 121 to 150: 30 disk writes.

Total I/O for shifting = $30 + 30 = 60$ operations.

Step 4: Account for writing the new block.
The new block of data (assumed to be in memory) must be written to the now-free position at block 120.

  • Write the new block: 1 disk write.


Step 5: Sum the total I/O operations.
Total I/O = (I/O for shifting) + (I/O for new block) = $60 + 1 = 61$.

Answer: 61 disk I/O operations are required.
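The counting in this worked example can be sketched as a small helper. This is an illustrative model only, under the same assumptions as the problem (the new block's data is already in memory, and free space exists immediately after the file):

```python
def contiguous_insert_io(file_blocks: int, insert_after: int) -> int:
    """Disk I/O operations to insert one block after the block at
    1-indexed position `insert_after` in a contiguously allocated
    file of `file_blocks` blocks. Illustrative model: assumes the
    new block's data is already in memory and free space exists
    immediately after the file."""
    to_shift = file_blocks - insert_after  # blocks that must move right
    reads = to_shift                       # read each shifted block once
    writes = to_shift + 1                  # rewrite them, plus write the new block
    return reads + writes

print(contiguous_insert_io(50, 20))  # worked example above: 30 + 30 + 1 = 61
```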

---

## 2. Linked Allocation

Linked allocation overcomes the problem of external fragmentation by storing a file as a linked list of disk blocks, which may be scattered anywhere on the disk. The directory entry contains a pointer to the first block of the file. Each block, in turn, contains a pointer to the next block in the sequence. The last block's pointer is null.



*Figure: Linked allocation. File B's directory entry stores Start: B1; block B1 points to B2, B2 points to B3, and B3's pointer is NULL.*
Performance and Characteristics:

* Space Utilization: Linked allocation solves the external fragmentation problem. Any free block can be used to satisfy an allocation request. However, it introduces a form of internal fragmentation. A portion of every data block must be used to store the pointer to the next block, which reduces the space available for user data.
* Access: Sequential access is feasible, though potentially slow if blocks are widely scattered, as each block read requires a separate disk seek. Direct access is extremely inefficient. To find the $i^{th}$ block, one must traverse the list from the beginning, performing $i$ disk reads.
* Reliability: This scheme is not robust. If a pointer in the chain is lost or damaged, the remainder of the file becomes inaccessible.

📐 Usable Space in Linked Allocation

$$S_{usable} = S_{block} - S_{pointer}$$

Variables:

    • $S_{usable}$ = Space available for data in a block

    • $S_{block}$ = Total size of a disk block

    • $S_{pointer}$ = Size of the pointer field


When to use: When calculating the number of blocks required for a file or determining the amount of space wasted due to pointer overhead (a form of internal fragmentation).

Worked Example: Internal Fragmentation Calculation

Problem: A file of size 2 MB is stored on a disk with a block size of 8 KB. The file system uses linked allocation, where each block reserves 4 bytes for a pointer. Calculate the total space wasted for this file, counting both the pointer overhead and the unused space in the last block. Assume $1 \text{ KB} = 2^{10}$ bytes and $1 \text{ MB} = 2^{20}$ bytes.

Solution:

Step 1: Convert all sizes to a common unit (bytes).
File size = $2 \times 2^{20}$ bytes.
Block size = $8 \times 2^{10} = 2^3 \times 2^{10} = 2^{13} = 8192$ bytes.
Pointer size = 4 bytes.

Step 2: Calculate the usable data space per block.
Using the formula $S_{usable} = S_{block} - S_{pointer}$:

$$S_{usable} = 8192 - 4 = 8188 \text{ bytes}$$

Step 3: Calculate the number of blocks required to store the file.
The number of blocks is the ceiling of the total file size divided by the usable space per block.

$$N_{blocks} = \left\lceil \frac{\text{File Size}}{S_{usable}} \right\rceil = \left\lceil \frac{2097152}{8188} \right\rceil = \lceil 256.12\ldots \rceil = 257 \text{ blocks}$$

Step 4: Calculate the total space allocated to the file.

$$\text{Total Allocated Space} = N_{blocks} \times S_{block} = 257 \times 8192 = 2105344 \text{ bytes}$$

Step 5: Calculate the total space wasted.
The wasted space consists of two parts: the pointer overhead in all allocated blocks and the unused space in the last block.
Total pointer overhead = $N_{blocks} \times S_{pointer} = 257 \times 4 = 1028$ bytes.
Data in the last block = $2097152 - (256 \times 8188) = 2097152 - 2096128 = 1024$ bytes.
Wasted space in the last block = $S_{usable} - 1024 = 8188 - 1024 = 7164$ bytes.

$$\text{Total Wasted Space} = 1028 + 7164 = 8192 \text{ bytes}$$

Alternatively, a simpler way to see this is that the total wasted space is the total allocated space minus the actual file size: $2105344 - 2097152 = 8192$ bytes.

Answer: 8192 bytes.
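The "simpler way" above can be expressed compactly in code. This is a sketch of the formula, not any particular file system's logic:

```python
import math

def linked_alloc_waste(file_size: int, block_size: int, ptr_size: int) -> int:
    """Bytes wasted (pointer overhead plus slack in the last block)
    when `file_size` bytes are stored under linked allocation.
    Equals total allocated space minus the actual file size."""
    usable = block_size - ptr_size             # data bytes per block
    n_blocks = math.ceil(file_size / usable)   # blocks required
    return n_blocks * block_size - file_size

print(linked_alloc_waste(2 * 2**20, 8 * 2**10, 4))  # worked example above: 8192
```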

---

## 3. Indexed Allocation

Indexed allocation resolves the direct access problem of linked allocation while retaining its solution to external fragmentation. It achieves this by collecting all the pointers for a file's data blocks into a single location: the index block. The directory entry for the file contains the address of this index block. The $i^{th}$ entry in the index block points to the $i^{th}$ block of the file.



*Figure: Indexed allocation. File C's directory entry stores Index Block: 19; block 19 holds pointers to data blocks 9, 16, 1, ..., which may be scattered anywhere on the disk.*

Performance and Characteristics:

* Access: Indexed allocation supports direct access without suffering from external fragmentation. To access the $i^{th}$ block, we use the $i^{th}$ entry in the index block. This requires two disk accesses: one for the index block and one for the data block.
* Space Utilization: The main overhead is the space consumed by the index block itself. For a very small file (e.g., one or two blocks), an entire block is wasted on the index. Conversely, for very large files, a single index block may not be sufficient to hold all the pointers. This leads to more complex schemes like linked index blocks or multi-level indexing.
* Internal Fragmentation: Like the other methods, indexed allocation can suffer from internal fragmentation in the last data block if the file size is not an exact multiple of the block size.
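A direct consequence of the single-index-block design is a hard cap on file size, since one index block holds only a limited number of pointers. The helper below illustrates this with hypothetical block and pointer sizes (4 KB and 4 bytes are assumed values, not figures from the text):

```python
def max_file_size_single_index(block_size: int, ptr_size: int) -> int:
    """Largest file (in bytes) addressable through a single index
    block: the index holds block_size // ptr_size pointers, each
    naming one data block of block_size bytes."""
    pointers_per_index = block_size // ptr_size
    return pointers_per_index * block_size

# Hypothetical example: 4 KB blocks, 4-byte pointers -> 1024 pointers -> 4 MB
print(max_file_size_single_index(4096, 4))  # 4194304 bytes = 4 MB
```

Files larger than this cap are exactly what motivates the linked index blocks and multi-level indexing mentioned above.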

---

Problem-Solving Strategies

A common category of GATE questions involves calculating the number of disk I/O operations for a given file operation under a specific allocation scheme. The key is to meticulously trace the reads and writes required.

💡 GATE Strategy: Analyzing Disk I/O

When asked to find the number of disk accesses for an operation (e.g., insertion, deletion, read), break it down into fundamental steps:

  • Locate: How many reads are needed to find the relevant block(s)?
    • Contiguous: 0 reads (the address is calculated directly).
    • Linked: must read the preceding blocks to follow the pointer chain.
    • Indexed: 1 read for the index block.

  • Read Data (if needed): How many blocks must be read from their old locations? (e.g., for shifting data).

  • Write Data (if needed): How many blocks must be written to new locations?

  • Update Pointers/Metadata: How many blocks containing pointers or metadata (e.g., index blocks, preceding blocks in linked allocation) must be read, modified, and written back? Each such update typically costs 1 read + 1 write, but if the block is already in memory from a previous step, you only need to account for the write.

Assume the File Control Block (FCB) or directory entry is already in memory unless stated otherwise.

---

Common Mistakes

⚠️ Avoid These Errors
    • Confusing Fragmentation Types: Do not mix up external and internal fragmentation. External fragmentation is about free space between files (contiguous allocation), while internal fragmentation is about wasted space within allocated blocks (linked and indexed allocation due to pointers, and in the last block of any file).
    • Miscalculating Insertion I/O in Linked Allocation: To insert a block between block $i$ and block $i+1$, you do not need to read block $i+1$. You only need to:
1. Read block $i$ to get its pointer. 2. Write the new block to a free location. 3. Write back the updated block $i$ with its new pointer. ✅ This correctly totals to 3 disk accesses (1 read, 2 writes).
    • Forgetting Pointer Overhead: When calculating the number of blocks for a file in linked allocation, always subtract the pointer size from the block size to get the usable data space per block.
✅ `Num_blocks = ceil(File_Size / (Block_Size - Pointer_Size))`

---

Practice Questions

:::question type="NAT" question="A file system uses linked allocation with a disk block size of 4096 bytes. Each block pointer is 8 bytes. A file of size 2,555,000 bytes is to be stored. What is the total amount of space (in bytes) that is wasted within the allocated blocks due to internal fragmentation (both from pointer overhead and in the last block)?" answer="5000" hint="First, find the usable space per block. Then, calculate the number of blocks needed. Finally, determine the difference between total allocated space and the actual file size." solution="
Step 1: Calculate the usable space per block.
The block size is 4096 bytes and the pointer size is 8 bytes.

$$S_{usable} = S_{block} - S_{pointer} = 4096 - 8 = 4088 \text{ bytes}$$

Step 2: Calculate the number of blocks required.
The file size is 2,555,000 bytes.

$$N_{blocks} = \left\lceil \frac{2555000}{4088} \right\rceil = 625 \text{ blocks}$$

In fact $625 \times 4088 = 2555000$ exactly, so the data area of the last block is completely full.

Step 3: Calculate the total space allocated to the file.

$$\text{Total Allocated Space} = N_{blocks} \times S_{block} = 625 \times 4096 = 2560000 \text{ bytes}$$

Step 4: Calculate the total wasted space.
This is the difference between the total allocated space and the actual file size.

$$\text{Wasted Space} = 2560000 - 2555000 = 5000 \text{ bytes}$$

Since the last block is full, this waste is entirely pointer overhead: $625 \times 8 = 5000$ bytes.

Result: 5000
"
:::

:::question type="MCQ" question="Which of the following is a primary advantage of contiguous file allocation?" options=["It is free from external fragmentation.", "File size can be easily extended.", "It offers efficient direct and sequential access.", "It does not suffer from internal fragmentation."] answer="It offers efficient direct and sequential access." hint="Consider the physical layout of the file on the disk and how that affects access patterns." solution="
In contiguous allocation, all blocks of a file are stored in a physically contiguous sequence on the disk.

  • Sequential Access: To read the file sequentially, the disk head can move from one block to the next with minimal seek time, making it very efficient.

  • Direct Access: The address of any block $i$ can be calculated directly from the starting address, requiring only one seek to the desired block.

  • Option A is incorrect because contiguous allocation is the primary cause of external fragmentation.

  • Option B is incorrect as extending a file is difficult; it may require moving the entire file if adjacent space is not free.

  • Option D is incorrect as the last block of the file can have internal fragmentation if the file size is not a multiple of the block size.

Therefore, the principal advantage is its performance for both direct and sequential access.
"
:::

:::question type="NAT" question="A file consisting of 200 blocks is stored using contiguous allocation (System A) and linked allocation (System B). A new block is to be inserted after the 75th block. Let $n_A$ and $n_B$ be the number of disk I/O operations required for the insertion in System A and System B, respectively. Assume the new block's data is already in main memory and there is free space available. The value of $n_A + n_B$ is ______." answer="254" hint="For contiguous allocation, calculate the cost of shifting the subsequent blocks. For linked allocation, calculate the cost of updating the pointers." solution="
Calculation for System A (Contiguous Allocation):

Step 1: Identify blocks to be shifted.
To insert a block after the 75th block, all blocks from the 76th to the 200th must be shifted one position to the right.
Number of blocks to shift = $200 - 75 = 125$ blocks.

Step 2: Calculate I/O for shifting.
Each of the 125 blocks must be read and then written to its new location.
Disk Reads = 125.
Disk Writes = 125.

Step 3: Calculate I/O for the new block.
The new block must be written into the now-vacant 76th position.
Disk Writes = 1.

Step 4: Calculate $n_A$.

$$n_A = 125 + 125 + 1 = 251$$

Calculation for System B (Linked Allocation):

Step 1: Identify necessary pointer updates.
To insert a new block after the 75th block, we must change the pointer of the 75th block to point to the new block. The new block's pointer must then point to the original 76th block.

Step 2: Calculate the required I/O operations.

  • Read the 75th block: This is necessary to get its current pointer (which points to the 76th block) and to modify it. (1 Read)

  • Write the new block: The new block is written to a free location on the disk. Its pointer field is set to point to the original 76th block. (1 Write)

  • Write back the 75th block: The 75th block, now with its pointer updated to point to the new block, must be written back to the disk. (1 Write)

Step 3: Calculate $n_B$.

$$n_B = 1 + 1 + 1 = 3$$

Final Calculation:

$$n_A + n_B = 251 + 3 = 254$$

Result: 254
"
:::
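The two per-scheme costs in this solution can be checked with a small counting model. The rules below mirror the solution's assumptions (new data already in memory, free space available); they are a sketch, not a general-purpose simulator:

```python
def insert_io(scheme: str, file_blocks: int, pos: int) -> int:
    """Disk I/Os to insert one block after block `pos` (1-indexed)
    of a `file_blocks`-block file, under a simplified counting model."""
    if scheme == "contiguous":
        shift = file_blocks - pos          # blocks shifted one position right
        return shift + shift + 1           # reads + writes + write of new block
    if scheme == "linked":
        return 3                           # read block pos, write new block,
                                           # write back updated block pos
    raise ValueError(f"unknown scheme: {scheme}")

n_a = insert_io("contiguous", 200, 75)
n_b = insert_io("linked", 200, 75)
print(n_a, n_b, n_a + n_b)  # 251 3 254
```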

:::question type="MSQ" question="Which of the following statements are TRUE regarding indexed file allocation?" options=["It suffers from external fragmentation.", "It provides efficient direct (random) access to file blocks.", "A single file may require more than one disk block to store its pointers.", "It is less reliable than linked allocation because loss of the index block makes the entire file inaccessible."] answer="B,C,D" hint="Analyze each property of indexed allocation concerning fragmentation, access patterns, and overhead." solution="

  • A: It suffers from external fragmentation. This is FALSE. Like linked allocation, indexed allocation can use any free block on the disk, thus avoiding external fragmentation.

  • B: It provides efficient direct (random) access to file blocks. This is TRUE. To access the $i^{th}$ block, the system reads the index block and uses the $i^{th}$ pointer to go directly to the data block. This is much faster than the sequential traversal required in linked allocation.

  • C: A single file may require more than one disk block to store its pointers. This is TRUE. If a file is very large, the number of pointers to its data blocks may exceed the capacity of a single index block. In such cases, schemes like linking index blocks or multi-level indexing are used, requiring more than one block for pointers.

  • D: It is less reliable than linked allocation because loss of the index block makes the entire file inaccessible. This is TRUE. In linked allocation, losing a pointer to one block only makes the rest of the file inaccessible. In indexed allocation, if the single index block is lost or corrupted, all pointers to all data blocks of the file are lost, rendering the entire file useless. This represents a single point of failure.

"
:::

---

Summary

Key Takeaways for GATE

  • Contiguous Allocation: Its key strength is performance (fast sequential and direct access). Its critical weakness is external fragmentation and difficulty with file growth.

  • Linked Allocation: It solves external fragmentation but introduces pointer overhead (internal fragmentation). It is efficient for sequential access but extremely inefficient for direct access.

  • Indexed Allocation: This is a hybrid approach that provides good direct access (unlike linked) and avoids external fragmentation (unlike contiguous). Its overhead is the index block itself, which can be a single point of failure.

  • I/O Cost Analysis: Be prepared to calculate the number of disk reads and writes for operations like insertion. For contiguous, this involves shifting blocks. For linked, it involves updating pointers.

---

What's Next?

💡 Continue Learning

This topic connects to:

  • Free Space Management: The methods for allocating blocks to files (contiguous, linked, indexed) are intrinsically tied to how the operating system keeps track of free blocks (e.g., using bit vectors, linked lists, or grouping).

  • File System Implementation: Understanding allocation methods is a prerequisite for studying entire file systems like FAT (File Allocation Table), which is a variation of linked allocation, and inode-based systems (like in UNIX/Linux), which use a sophisticated form of indexed allocation.

Master these connections for a comprehensive understanding of storage management in operating systems.

---

💡 Moving Forward

Now that you understand how files are allocated on disk, let's explore how directories are organized and implemented, which builds on these concepts.

---

## Part 2: Directory Structure

Introduction

In any modern operating system, the file system provides the mechanism for online storage and access to both data and programs. A critical component of the file system's architecture is its directory structure. The directory, in essence, is a symbolic data structure that contains information about a collection of files. It provides a mapping from human-readable file names to their corresponding entries on the storage device.

The choice of how a directory is implemented has profound implications for the performance, scalability, and efficiency of the entire file system. Operations such as file creation, deletion, searching, and renaming are all fundamentally dependent on the underlying data structure used to organize the directory. We shall explore the most common implementations, analyzing their respective advantages and disadvantages, with a particular focus on the performance characteristics relevant to competitive examinations like GATE.

📖 Directory

A directory is a file system cataloging structure which contains references to other computer files, and possibly other directories. On many systems, directories are treated as special files that can only be manipulated by the operating system. A directory entry typically associates a file name with its corresponding file control block (FCB) or inode, which contains the file's metadata (e.g., permissions, size, and pointers to data blocks).

---

Key Concepts

The implementation of a directory structure can be viewed as the selection of an appropriate data structure to manage the collection of file entries. We will examine the three principal methods: the linear list, the hash table, and the tree-based structure.

## 1. Linear List Implementation

The simplest method for implementing a directory is to use a linear list, such as an array or a linked list, of file names with pointers to their corresponding data blocks or metadata structures. When a new file is created, its entry is appended to the list.



*Figure: Linear-list directory for /usr/bin. Entries are stored in order — cat → inode 121, grep → inode 45, ls → inode 88, ... — followed by empty slots.*

Let us analyze the performance of fundamental directory operations under this scheme. Assume the directory contains $n$ files.

* File Creation: To create a new file, we must first verify that no existing file has the same name. This uniqueness check is critical. Consequently, we must scan the entire list of $n$ entries from beginning to end. Only after confirming the name is not in use can we add the new entry at the end of the list or in the first available empty slot. This operation, therefore, *necessarily* requires a full scan of the directory. The complexity is $O(n)$.

* File Deletion: To delete a file, we must first search for its entry. This requires a linear search through the list. In the average case, we expect to search $n/2$ entries. In the worst case, we search all $n$ entries. Once found, the entry is marked as unused or removed. The search dominates the complexity, which is $O(n)$.

* File Search (Opening/Accessing): Similar to deletion, locating a file involves a linear search. The complexity is $O(n)$. It is important to note that this operation does not *necessarily* require a full scan; the search terminates as soon as the file is found.

* Listing the Directory: To list all files, we simply traverse the entire list from start to finish, which takes time proportional to the number of entries, $O(n)$.

The primary disadvantage of the linear list implementation is its poor performance as the number of files in the directory grows. The linear search time makes it unsuitable for large directories.
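These costs can be illustrated with a toy in-memory linear-list directory. This is a sketch of the data structure only; in a real file system the entries live in disk blocks, so every scanned entry can cost a disk read:

```python
class LinearListDirectory:
    """Toy linear-list directory: every operation scans the list."""

    def __init__(self):
        self.entries = []  # list of (name, inode) pairs, O(n) to scan

    def create(self, name, inode):
        # Uniqueness check: must examine ALL n entries before inserting.
        for existing, _ in self.entries:
            if existing == name:
                return False                # name already in use
        self.entries.append((name, inode))  # append at the end
        return True

    def search(self, name):
        # Linear search: may stop early; worst case scans all n entries.
        for existing, inode in self.entries:
            if existing == name:
                return inode
        return None

d = LinearListDirectory()
d.create("cat", 121)
d.create("grep", 45)
print(d.search("grep"))     # 45
print(d.create("cat", 99))  # False: duplicate name rejected by the full scan
```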

## 2. Hash Table Implementation

To overcome the performance bottleneck of the linear list, a hash table can be employed. In this structure, a hash function is applied to the file name to compute an index into a hash table. This index points to a bucket, which is typically a linked list of directory entries that hash to the same value.



*Figure: Hash-table directory. hash(filename) mod M selects one of M buckets; each bucket chains the entries that hash to it — e.g. "cat" → inode 121 and "man" → inode 92 share bucket k, while "grep" → inode 45 occupies another bucket alone.*

Let us re-evaluate the performance of operations with this structure.

* File Creation: To create a file, we first compute the hash of its name to identify the correct bucket. We then search only this bucket (a short linked list) to check for name collisions. If the hash table is well-designed and the hash function provides good distribution, the length of each bucket's chain will be small and constant on average. Thus, the check for uniqueness and the subsequent insertion take, on average, $O(1)$ time. This is a significant improvement over the linear list's $O(n)$ requirement.

* File Deletion & Search: Both operations are similarly efficient. We compute the hash of the file name, go directly to the corresponding bucket, and perform a short linear search on that bucket's chain. The average time complexity is $O(1)$.

The main disadvantage of the hash table is its fixed size and the potential for performance degradation if many file names hash to the same bucket (collisions). However, for most general-purpose file systems, this implementation provides excellent average-case performance.
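A minimal sketch of a chained hash-table directory follows; the fixed table size M = 8 is an assumed value, and a real file system would store the buckets in disk blocks rather than Python lists:

```python
class HashDirectory:
    """Toy hash-table directory with chained buckets (fixed size M)."""

    def __init__(self, m=8):
        self.m = m
        self.buckets = [[] for _ in range(m)]  # M chains of (name, inode)

    def _bucket(self, name):
        return self.buckets[hash(name) % self.m]  # pick one bucket

    def create(self, name, inode):
        bucket = self._bucket(name)
        # Uniqueness check scans only this one short chain, not all n entries.
        if any(n == name for n, _ in bucket):
            return False
        bucket.append((name, inode))
        return True

    def search(self, name):
        for n, inode in self._bucket(name):
            if n == name:
                return inode
        return None

d = HashDirectory()
d.create("cat", 121)
d.create("man", 92)
print(d.search("cat"))  # 121
```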

## 3. Tree-Based Implementation

For extremely large directories, such as those found in high-performance computing or large database systems, even a hash table may not be sufficient. In such cases, a balanced tree structure, typically a B-tree or a B+ tree, is used to store the directory entries. The entries are kept sorted by file name.

This structure guarantees that the time to perform any fundamental directory operation—creation, deletion, or search—is logarithmic with respect to the number of files in the directory. That is, the complexity is $O(\log n)$. While the implementation is more complex than a linear list or hash table, its scalability and guaranteed worst-case performance make it ideal for demanding environments.
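Python's standard library has no B-tree, but the logarithmic search over sorted names can be illustrated with binary search via `bisect`. Note the caveat: unlike a real B-tree, inserting into a flat sorted list still costs O(n) shifting, so this sketch demonstrates only the search behavior:

```python
import bisect

class SortedDirectory:
    """Directory entries kept sorted by name; lookups use binary
    search, O(log n). A stand-in for a B-tree, which would also
    provide O(log n) insertion and deletion."""

    def __init__(self):
        self.names = []   # sorted file names
        self.inodes = []  # inodes[i] belongs to names[i]

    def create(self, name, inode):
        i = bisect.bisect_left(self.names, name)   # binary-search position
        if i < len(self.names) and self.names[i] == name:
            return False                           # duplicate name
        self.names.insert(i, name)                 # O(n) shift in a flat list
        self.inodes.insert(i, inode)
        return True

    def search(self, name):
        i = bisect.bisect_left(self.names, name)   # O(log n) comparisons
        if i < len(self.names) and self.names[i] == name:
            return self.inodes[i]
        return None

d = SortedDirectory()
for nm, ino in [("ls", 88), ("cat", 121), ("grep", 45)]:
    d.create(nm, ino)
print(d.search("cat"))  # 121
```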

    ---

    Problem-Solving Strategies

    When faced with a GATE question concerning directory structures, the primary task is to identify the underlying data structure being described and then apply knowledge of that data structure's performance characteristics.

    💡 GATE Strategy: Analyze the Operation's Core Requirement

    For any given directory operation, first determine its logical necessity.

    • Creation: The core requirement is to guarantee uniqueness. Ask yourself: "To be 100% certain this file name is unique, what is the minimum number of existing entries I must check?" For a linear list, the answer is all of them. For a hash table, it is only the entries in one bucket.

    • Search/Deletion/Rename: The core requirement is to find the entry. Ask yourself: "What is the search algorithm for this data structure?" For a linear list, it is a linear scan. For a hash table, it is a hash lookup.

    • Distinguish "Worst-Case" vs. "Necessary": An operation like searching a linear list has a worst-case of a full scan (if the item is at the end or not present). However, it does not necessarily require a full scan (the item could be at the beginning). In contrast, checking for uniqueness upon creation in a linear list necessarily requires a full scan, as stopping early provides no guarantee of uniqueness.
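    The distinction between a search that may stop early and a uniqueness check that must not can be made concrete with a short sketch (the directory entries are hypothetical; comparison counts are returned purely for illustration):

```python
# Hypothetical linear-list directory: (name, inode) pairs.
entries = [("grep", 45), ("cat", 121), ("man", 92)]

def search_linear(entries, name):
    """Search MAY stop early; returns (inode, comparisons used)."""
    comparisons = 0
    for entry, inode in entries:
        comparisons += 1
        if entry == name:          # early exit: best case 1 comparison
            return inode, comparisons
    return None, comparisons       # worst case: all n comparisons

def create_linear(entries, name, inode):
    """Creation MUST examine every entry: no early exit can prove uniqueness."""
    comparisons = 0
    for entry, _ in entries:
        comparisons += 1
        if entry == name:
            raise FileExistsError(name)
    entries.append((name, inode))  # safe only after checking all n entries
    return comparisons

print(search_linear(entries, "grep"))       # found at the front: 1 comparison
print(create_linear(entries, "vim", 300))   # always n comparisons (here 3)
```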

    ---

    Common Mistakes

    A frequent point of confusion is the subtle but critical difference between the worst-case behavior of an algorithm and an operation that must, by its very nature, examine all elements.

    ⚠️ Avoid These Errors
      • Confusing Worst-Case Search with Necessary Full Scan: Assuming that because searching a file in a linear list has a worst-case time of O(n), it always requires a full scan.
    Correct Approach: A search operation terminates as soon as the item is found. A uniqueness check for file creation, however, cannot terminate early; it must verify against every single entry to fulfill its requirement.
      • Ignoring the Data Structure: Applying the performance characteristics of one implementation (e.g., a hash table) to a problem that explicitly specifies another (e.g., a linear list).
    Correct Approach: Always base your analysis on the data structure given in the problem statement. The performance of `create`, `delete`, and `search` varies dramatically between a linear list and a hash table.

    ---

    Practice Questions

    :::question type="MCQ" question="A file system uses a hash table with chaining to implement its directories. Each bucket in the hash table points to a linked list of file entries. Which of the following statements is TRUE regarding the creation of a new file in a directory with nn entries?" options=["The operation necessarily requires O(n)O(n) time to check for name collisions.","The average-case time complexity for the operation is O(1)O(1).","The worst-case time complexity is always better than a linear list implementation.","The hash table must be resized every time a new file is added."] answer="The average-case time complexity for the operation is O(1)O(1)." hint="Consider how a hash table lookup works and what determines its efficiency. The key is the 'average case'." solution="
    Step 1: Analyze the file creation process in a hash-based directory. To create a new file, we compute `hash(filename)` to find the correct bucket.

    Step 2: We then traverse the linked list (chain) associated with that bucket to ensure no file with the same name already exists.

    Step 3: In a well-distributed hash table, the length of these chains is expected to be a small constant, independent of the total number of files, n. Therefore, the search and insertion take constant time on average. This makes the average-case time complexity O(1).

    Step 4: Evaluate the other options.

    • O(n) time is incorrect; we only search one bucket, not the entire set of n files.

    • The worst-case complexity occurs if all n files hash to the same bucket, making the hash table degenerate into a linear list. In this specific scenario, the worst case is O(n), which is not better than a linear list.

    • Resizing the hash table is not done on every insertion but only when the load factor exceeds a certain threshold.


    Result: The correct statement is that the average-case time complexity is O(1).
    "
    :::

    :::question type="NAT" question="A directory is implemented as a linear array of 1024 entries. Searching for a specific file name takes 2 microseconds per entry comparison. In the worst-case scenario, what is the time required, in microseconds, to successfully delete an existing file from this directory?" answer="2048" hint="Deletion requires two main actions: finding the file and then marking it as deleted. Consider the worst case for the 'find' operation." solution="
    Step 1: The operation is file deletion. This consists of two sub-operations: searching for the file entry and then marking it as deleted. The time for marking is negligible compared to the search time.

    Step 2: The implementation is a linear array of 1024 entries. The search is a linear scan.

    Step 3: The question asks for the worst-case scenario for a successful deletion. The worst case for finding an existing file in a linear list occurs when the file is the very last entry in the list.

    Step 4: To find the last entry, we must compare it with all 1024 entries.
    Total comparisons = 1024.

    Step 5: Calculate the total time.
    Time = (Number of comparisons) × (Time per comparison)

    \text{Time} = 1024 \times 2 \ \mu s
    \text{Time} = 2048 \ \mu s

    Result: The time required in the worst-case scenario is 2048 microseconds.
    "
    :::

    :::question type="MSQ" question="Consider a directory that contains a very large number of files (nn \to \infty). Which of the following directory implementations would provide a time complexity of O(logn)O(\log n) or better for searching for a file? (Select all that apply)" options=["A linear list sorted by file name","A B+ Tree","A hash table with a good hash function","An unsorted linear list"] answer="A B+ Tree,A hash table with a good hash function" hint="Analyze the search time complexity for each data structure mentioned. 'Better' than O(logn)O(\log n) includes O(1)O(1)." solution="

    • A linear list sorted by file name: A linear list is traversed sequentially; binary search requires random access, which a linked structure does not provide, so search remains O(n). Even if the entries were held in a sorted array, allowing O(log n) binary search, insertions and deletions would cost O(n), which is why file systems avoid this structure for large, dynamic directories. This option is not a correct answer.


    • A B+ Tree: B-Trees and B+ Trees are balanced tree structures specifically designed for storage systems. They guarantee that search, insertion, and deletion operations have a time complexity of O(log n). This option is correct.


    • A hash table with a good hash function: A well-designed hash table provides an average-case time complexity of O(1) for search operations. Since O(1) is better than O(log n), this option is correct.


    • An unsorted linear list: Searching an unsorted list requires a linear scan, which has a time complexity of O(n). This is not O(log n) or better.


    • In summary, only the structures that support sub-linear lookup qualify: the B+ Tree and the hash table.


    Result:
    • A B+ Tree provides O(log n) search time.

    • A hash table provides O(1) average-case search time, which is better than O(log n).

    Both are correct.
    "
    :::

    :::question type="MCQ" question="In a file system that uses a linear list to implement directories, which operation's successful completion is most likely to be the fastest, on average?" options=["Creating a new file.","Renaming an existing file.","Searching for a file that is located at the end of the list.","Deleting a file that is located at the beginning of the list."] answer="Deleting a file that is located at the beginning of the list." hint="Analyze the number of comparisons required for each operation on average or in the specific case mentioned." solution="
    Step 1: Analyze the complexity of each operation in a linear list directory with n entries.

    Step 2: Evaluate "Creating a new file". This necessarily requires scanning all n entries to check for duplicates. The time is proportional to n.

    Step 3: Evaluate "Renaming an existing file". This requires finding the file first (average n/2 comparisons) and then modifying its entry. The complexity is dominated by the search, O(n).

    Step 4: Evaluate "Searching for a file that is located at the end of the list". This is the worst-case search scenario, requiring n comparisons. The time is proportional to n.

    Step 5: Evaluate "Deleting a file that is located at the beginning of the list". This requires finding the file and then marking it deleted. Since the file is at the very beginning, the search takes only 1 comparison. This is the best-case scenario for search and delete.

    Step 6: Compare the scenarios. Deleting the first file requires a constant number of operations, while all other options require, on average or by necessity, a number of operations proportional to n.

    Result: Deleting a file located at the beginning of the list is the fastest operation.
    "
    :::

    ---

    Summary

    The choice of directory implementation is a classic trade-off between simplicity, performance, and scalability. Understanding these trade-offs is essential for analyzing file system behavior.

    Key Takeaways for GATE

    • Linear List: Simple to implement but inefficient for large directories. Search, Deletion, and Renaming are O(n). Critically, Creation is also O(n) because it necessarily requires a full scan to ensure name uniqueness.

    • Hash Table: Offers excellent average-case performance (O(1)) for all major operations (Create, Delete, Search). It is the most common implementation in modern general-purpose file systems. Its main weakness is the worst-case scenario where collisions degrade performance to O(n).

    • B-Tree / B+ Tree: Provides guaranteed logarithmic performance (O(log n)) for all operations, making it highly scalable and suitable for directories with a massive number of files. The implementation is the most complex of the three.

    • "Necessary" vs. "Worst-Case": Be precise. Searching for an item has a worst case of a full scan. Creating an item in a linear list has a necessary condition of a full scan.

    ---

    What's Next?

    💡 Continue Learning

    A directory structure defines how files are organized logically. This concept is intrinsically linked to how the files' data is physically stored on disk.

      • File Allocation Methods: Explore how the pointers within a directory entry (e.g., in an inode) lead to the actual data blocks on the disk. Study Contiguous, Linked, and Indexed allocation methods to understand the complete path from file name to file data.
      • Disk Scheduling Algorithms: Once the file system determines which blocks to access (a decision influenced by the directory and allocation method), the OS must schedule the disk I/O requests. Understanding algorithms like FCFS, SSTF, SCAN, and C-SCAN is the next logical step.
    Mastering these connections provides a comprehensive view of storage management in operating systems for GATE.

    ---

    Chapter Summary

    In this chapter, we have introduced the fundamental concepts of file systems, which form the logical interface between the user and the storage devices. We began by defining a file as an abstraction provided by the operating system to store information. Our discussion then proceeded to the various attributes, types, and operations that characterize a file. Subsequently, we explored the critical role of the directory in organizing these files. We analyzed several directory structures, from the rudimentary single-level system to the more sophisticated tree-structured and acyclic-graph models that are prevalent in modern operating systems. It is clear from our discussion that the choice of directory structure profoundly impacts the system's capabilities regarding naming, grouping, and sharing of files.

    📖 File Systems - Key Takeaways

    • File as an Abstraction: A file is a named collection of related information, recorded on secondary storage, that is managed by the operating system as a single unit. Its essential attributes include name, identifier, type, location, size, protection, and time/date stamps.

    • Core File Operations: The operating system provides a set of system calls to manipulate files. The most fundamental operations are `create`, `write`, `read`, `reposition` (seek), `delete`, and `truncate`. These operations form the basic file system interface for all user applications.

    • Directory's Role: A directory is a symbolic data structure that contains information about a collection of files. Its primary function is to map human-readable file names to their corresponding file control blocks or internal identifiers.

    • Hierarchy in Directory Structures: The tree-structured directory is the most common and intuitive model. It establishes a hierarchy of directories and files, enabling logical organization and preventing naming conflicts. This structure necessitates the concepts of an absolute path (from the root) and a relative path (from the current directory).

    • Sharing and Links: Acyclic-graph directory structures extend the tree model to permit file and directory sharing. This is primarily achieved through links. Understanding the distinction between a hard link and a symbolic (soft) link is crucial.

    • Hard vs. Symbolic Links: A hard link creates another directory entry that points to the same underlying file metadata (inode). In contrast, a symbolic link is a special file that contains the path to the target file. Consequently, deleting the original file will invalidate a symbolic link, but a file's data is only deallocated when its reference count (number of hard links) drops to zero.
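
    This distinction can be observed directly on a POSIX system using Python's standard `os.link` and `os.symlink` calls. This is a demonstration sketch; it assumes a file system that supports both link types.

```python
import os
import tempfile

# Demonstration sketch: requires a POSIX file system supporting both
# hard and symbolic links.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "report.txt")
    with open(target, "w") as f:
        f.write("data")

    hard = os.path.join(d, "hardlink")
    soft = os.path.join(d, "symlink")
    os.link(target, hard)      # new directory entry for the SAME inode
    os.symlink(target, soft)   # new file whose content is the target's path

    # The hard link raised the inode's link count; the symlink did not.
    nlink_after_hard_link = os.stat(target).st_nlink   # 2

    os.remove(target)          # drops the count back to 1; data survives
    hard_still_valid = os.path.exists(hard)            # True
    soft_still_valid = os.path.exists(soft)            # False: dangling

print(nlink_after_hard_link, hard_still_valid, soft_still_valid)
```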

    ---

    Chapter Review Questions

    :::question type="MCQ" question="Consider a Unix-like file system with a tree-structured directory. To access the file located at `/usr/programs/compiler`, the operating system must first access the root directory. Assuming that no directory information is cached in memory and each directory is stored in a single disk block, what is the minimum number of disk I/O operations required to read the first block of the `compiler` file?" options=["2","3","4","5"] answer="C" hint="Think about the number of directories that must be traversed to locate the file's metadata, and then the final access to the file's data itself." solution="To resolve the path `/usr/programs/compiler`, the file system must perform a sequence of directory lookups. Each lookup requires reading a disk block corresponding to that directory.

  • Read the root directory (`/`): The first step is to read the root directory to find the entry for `usr`. This requires 1 disk I/O.

  • Read the `usr` directory: Using the information from the root, the system reads the `usr` directory block to find the entry for `programs`. This is the second disk I/O.

  • Read the `programs` directory: Using the information from `usr`, the system reads the `programs` directory block to find the entry for `compiler`. This entry contains the location information (e.g., inode number) for the file `compiler`. This is the third disk I/O.

  • Read the `compiler` file block: Finally, using the location information obtained in the previous step, the system can read the first data block of the `compiler` file itself. This is the fourth disk I/O.
  • The total number of I/O operations is the sum of accesses for each component in the path plus the final access to the file's data.

    \text{Total I/O} = \text{I/O for root} + \text{I/O for usr} + \text{I/O for programs} + \text{I/O for compiler file}

    \text{Total I/O} = 1 + 1 + 1 + 1 = 4

    Therefore, a minimum of 4 disk I/O operations are required.
    "
    :::
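
    The counting argument in the solution above can be sketched as a tiny simulation (a hypothetical helper, assuming one disk read per directory on the path plus one read for the file's first data block, with no caching):

```python
# Hypothetical helper mirroring the solution's counting argument:
# one disk read per directory on the path, plus one read for the
# file's first data block; no caching assumed.
def count_disk_reads(path):
    components = [c for c in path.split("/") if c]
    # Each component lookup reads its parent directory's block
    # (root, then usr, then programs), so directory reads = len(components).
    directory_reads = len(components)
    return directory_reads + 1   # + the file's first data block

print(count_disk_reads("/usr/programs/compiler"))   # -> 4
```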

    :::question type="NAT" question="A file system uses a two-level directory structure. The Master File Directory (MFD) can accommodate up to 128 users. Each User File Directory (UFD) can hold a maximum of 64 file entries. A single directory entry, in either the MFD or a UFD, requires 32 bytes of storage. What is the total size, in kilobytes (KB), required to store all UFDs if every user utilizes their directory to its maximum capacity? (Assume 1 KB = 1024 bytes)" answer="256" hint="Calculate the total size for one fully utilized UFD, and then scale that for the total number of users supported by the MFD." solution="First, we determine the size of a single User File Directory (UFD) when it is at maximum capacity.

    • Maximum number of file entries per UFD = 64 entries
    • Size of each directory entry = 32 bytes
    The size of one full UFD can be calculated as:
    \text{Size per UFD} = (\text{Number of entries}) \times (\text{Size per entry})
    \text{Size per UFD} = 64 \times 32 \text{ bytes} = 2048 \text{ bytes}
    Next, we need to find the total size required for all UFDs. The system supports a maximum of 128 users, and each user has one UFD.
    • Total number of users (and thus UFDs) = 128
    The total size for all UFDs is:
    \text{Total UFD Size} = (\text{Number of UFDs}) \times (\text{Size per UFD})
    \text{Total UFD Size} = 128 \times 2048 \text{ bytes} = 262144 \text{ bytes}
    Finally, we must convert this size from bytes to kilobytes (KB), where 1 KB = 1024 bytes.
    \text{Total Size in KB} = \frac{262144 \text{ bytes}}{1024 \text{ bytes/KB}} = 256 \text{ KB}
    The total size required is 256 KB.
    "
    :::
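
    The arithmetic above can be verified with a few lines (variable names are illustrative):

```python
# Worked check of the two-level directory sizing (names are illustrative).
users_max       = 128   # MFD capacity
entries_per_ufd = 64    # file entries per UFD
entry_bytes     = 32    # bytes per directory entry

ufd_bytes   = entries_per_ufd * entry_bytes   # 2048 bytes per full UFD
total_bytes = users_max * ufd_bytes           # 262144 bytes
print(total_bytes // 1024)                    # -> 256 (KB)
```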

    :::question type="MCQ" question="Which of the following statements accurately distinguishes between hard links and symbolic links in a Unix-like file system?" options=["A hard link can span across different file systems, whereas a symbolic link cannot.","Deleting the original file makes a hard link invalid, but a symbolic link remains valid.","A hard link is a directory entry pointing to the file's inode, while a symbolic link is a file containing the path to the original file.","A file's reference count in its inode is incremented when a new symbolic link is created for it."] answer="C" hint="Recall the fundamental implementation difference. One is a pointer to the metadata, and the other is a pointer to a name." solution="Let us analyze each option:

    • A: This is incorrect. Hard links operate at the level of inodes, which are unique only within a single file system (partition). Therefore, hard links cannot span across different file systems. Symbolic links, which store a path name, can point to files on any mounted file system.

    • B: This is incorrect. Deleting the original file name (or any hard link to it) simply decrements the reference count in the inode. The file data is not removed until the count becomes zero. Thus, other hard links remain valid. Conversely, deleting the target file makes a symbolic link "dangling" or invalid, as the path it contains no longer points to an existing file.

    • C: This is correct. A hard link is essentially a new name (directory entry) that points directly to the same inode as the original file. The inode contains all metadata about the file, including the location of its data blocks. A symbolic link is a distinct file of a special type whose content is the string representing the path to the target file.

    • D: This is incorrect. The reference count (or link count) in an inode tracks the number of hard links pointing to it. Creating a symbolic link does not affect the target file's inode or its reference count.

    "
    :::

    :::question type="NAT" question="In a file system that supports hard links, a file named `report.txt` is created. Three additional hard links are then created for this file, named `final_report`, `draft_v3`, and `/home/guest/report_link`. Subsequently, the original file `report.txt` and the link `draft_v3` are deleted. How many hard links to the file's data must still be deleted before the disk blocks allocated to the file are freed?" answer="2" hint="Focus on the reference count stored in the file's inode. A file's data is deallocated only when this count reaches zero." solution="The reference count in a file's inode tracks the number of directory entries (hard links) pointing to it. The file's data blocks are deallocated only when this count becomes 0. Let's trace the value of the reference count.

  • File creation: When `report.txt` is created, it has one directory entry.
    - Reference Count = 1.

  • Creating three hard links: Three more hard links (`final_report`, `draft_v3`, `/home/guest/report_link`) are created. Each creation increments the reference count by 1.
    - Reference Count = 1 + 3 = 4.

  • Deleting `report.txt`: The `rm` command on a file name deletes the directory entry and decrements the inode's reference count.
    - Reference Count = 4 - 1 = 3.

  • Deleting `draft_v3`: Similarly, deleting this hard link removes another directory entry and decrements the count.
    - Reference Count = 3 - 1 = 2.

    At this point, the reference count is 2. The two remaining hard links are `final_report` and `/home/guest/report_link`. To free the disk blocks, the reference count must be reduced to 0. Therefore, both of these remaining links must be deleted.

    The number of links that still need to be deleted is 2.
    "
    :::
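
    The reference-count trace in the solution above can be checked mechanically (a trivial sketch mirroring those steps):

```python
# Mechanical trace of the inode reference count from the steps above.
ref_count = 1    # report.txt created
ref_count += 3   # final_report, draft_v3, /home/guest/report_link added
ref_count -= 1   # report.txt deleted
ref_count -= 1   # draft_v3 deleted
print(ref_count) # -> 2 hard links must still be removed to free the blocks
```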

    ---

    What's Next?

    💡 Continue Your GATE Journey

    Having completed File Systems, you have established a firm foundation for understanding how the operating system presents a logical view of storage to users and applications. This knowledge is a critical prerequisite for several advanced topics.

    Key connections:

      • Relation to Previous Learning: Our study of Process Management showed that processes often begin by loading an executable file from disk. The file system concepts we have just covered describe the logical structures used to locate and access that executable. Furthermore, the Memory Management unit's concept of a buffer cache is directly applicable here, as it is used to hold recently accessed file data in main memory to accelerate I/O.


      • Building Blocks for Future Chapters: The concepts from this chapter are foundational for the following topics:

    - File System Implementation: We have discussed what files and directories are. The next logical step is to understand how they are implemented on a physical disk. We will explore various allocation methods (Contiguous, Linked, and Indexed) and free-space management techniques.
    - Disk Scheduling: Since files reside on secondary storage devices like hard disks, understanding how the OS efficiently services a queue of I/O requests is crucial for performance. This chapter provides the context for why those I/O requests are generated.
    - I/O Systems and Protection: The file system is a major component of the OS I/O subsystem. We will later delve into the hardware and software layers of I/O. The protection attributes of a file, which we briefly touched upon, will be explored in greater depth in the context of system security and access control mechanisms.

    🎯 Key Points to Remember

    • Master the core concepts in File Systems before moving to advanced topics
    • Practice with previous year questions to understand exam patterns
    • Review short notes regularly for quick revision before exams
