Number Representations
Overview
While our previous discussions have centered on the representation of integers, the computational demands of modern science and engineering necessitate a system capable of handling numbers of vastly different magnitudes, including fractional values. The limitations of fixed-point arithmetic, with its static allocation of bits for integer and fractional parts, render it inadequate for applications requiring a wide dynamic range. In this chapter, we shall investigate the principles and standards governing the representation of real numbers in digital systems, a cornerstone of numerical computation.
We will systematically deconstruct the architecture of floating-point numbers, a representation analogous to scientific notation in the binary domain. This method partitions a bit string into three critical fields: the sign, the exponent, and the mantissa (or significand). By manipulating the exponent, the radix point can effectively "float," allowing the system to express both minuscule fractions and exceptionally large quantities with a finite and fixed number of bits. Our focus will be on the universally adopted IEEE 754 standard, which provides a robust and consistent framework for floating-point arithmetic across diverse computing platforms.
For the GATE examination, a deep and functional understanding of number representations is not merely academic but a practical necessity. Examiners frequently formulate problems that test the candidate's ability to convert numbers between decimal and IEEE 754 formats, to interpret the bit patterns of special values, and to analyze the precision and range limitations inherent in the system. A command of these concepts is therefore essential for success in questions spanning Digital Logic, Computer Organization, and Architecture.
---
Chapter Contents
| # | Topic | What You'll Learn |
|---|-------|-------------------|
| 1 | Floating-Point Representation | IEEE 754 standard for representing real numbers. |
---
Learning Objectives
After completing this chapter, you will be able to:
- Describe the structure of a floating-point number, including the sign, biased exponent, and mantissa fields.
- Represent a given decimal number in the IEEE 754 single-precision (32-bit) and double-precision (64-bit) formats.
- Convert binary representations in IEEE 754 format back into their decimal floating-point equivalents.
- Analyze the representation of special values such as $+\infty$, $-\infty$, and NaN (Not a Number), and understand the concepts of normalization and precision.
---
We now turn our attention to Floating-Point Representation...
Part 1: Floating-Point Representation
Introduction
In the domain of digital computation, representing real numbers with both fractional parts and a wide dynamic range is a fundamental challenge. While fixed-point representations are suitable for applications where the range of values is well-defined and limited, they are inadequate for scientific and engineering computations that involve very large or very small quantities. Floating-point representation addresses this limitation by encoding a number in a form analogous to scientific notation, separating its magnitude (as a significand) and its scale (as an exponent).
This chapter provides a comprehensive treatment of floating-point representation as specified by the IEEE 754 standard, which is the ubiquitous convention used in modern computing hardware. A thorough understanding of this standard is indispensable for GATE, as it forms the basis for questions involving number representation, conversion, and arithmetic. We shall explore the structure of these representations, the process of converting between decimal and floating-point formats, and the special values that handle exceptional cases such as infinity and undefined results.
A floating-point number is a representation of a real number in the form:
$(-1)^S \times M \times B^{e}$
where $S$ is the sign, $M$ is the significand (or mantissa), $B$ is the base (typically 2 in modern computers), and $e$ is the exponent. The IEEE 754 standard specifies precise formats for encoding $S$, $M$, and $e$ in binary.
---
Key Concepts
The most common format encountered in GATE is the IEEE 754 single-precision (32-bit) standard. We will focus our discussion on this format.
The 32 bits are allocated as follows:
- Sign (S): 1 bit. $S = 0$ for positive, $S = 1$ for negative.
- Exponent (E): 8 bits. Stored in a biased format.
- Fraction (F): 23 bits. Represents the fractional part of the significand.
1. The Biased Exponent
The 8-bit exponent field does not directly represent the exponent. Instead, it stores a value from which a bias must be subtracted to obtain the true exponent. This allows the representation of both positive and negative exponents without needing a separate sign bit for the exponent itself.
For single-precision, the bias is $127$. The 8-bit exponent field, $E$, can represent unsigned integers from $0$ to $255$.
The true exponent, $e$, is calculated as:
$e = E - 127$
The values $E = 0$ and $E = 255$ are reserved for special cases. For normal numbers, $E$ ranges from $1$ to $254$, which corresponds to a true exponent range of $-126$ to $+127$.
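The bias rule can be sketched in a few lines of Python (the helper name `true_exponent` is an illustrative choice, not a standard function):

```python
BIAS = 127  # exponent bias for IEEE 754 single precision

def true_exponent(E):
    """Map the stored 8-bit exponent field E to the true exponent e = E - BIAS."""
    if E == 0 or E == 255:
        raise ValueError("E = 0 and E = 255 are reserved for special values")
    return E - BIAS

# The normal range E = 1..254 maps to true exponents -126..+127.
print(true_exponent(1), true_exponent(254))  # -126 127
```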
2. Normalized Representation
Most floating-point numbers are stored in a normalized format. In binary scientific notation, any non-zero number can be written with a single '1' before the binary point (e.g., $110.1_2 = 1.101_2 \times 2^2$). Since this leading '1' is always present for normalized numbers, it does not need to be stored explicitly. This is known as the implicit or hidden leading bit.
The significand, $M$, is therefore $1.F$, where $F$ is the 23-bit fraction field. This provides an extra bit of precision. The value of a normalized single-precision number is:
$V = (-1)^S \times (1.F)_2 \times 2^{E - 127}$
Variables:
- $S$: The sign bit (0 or 1).
- $F$: The 23-bit fraction field, interpreted as a binary fraction.
- $E$: The 8-bit unsigned integer in the exponent field, where $1 \le E \le 254$.
When to use: For converting an IEEE 754 single-precision binary representation to its decimal value, when the exponent field is not all 0s or all 1s.
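The decoding procedure above can be expressed directly as code. A minimal sketch (the function name and its string-based fraction argument are illustrative choices, not a standard API):

```python
def decode_normalized(S, E, F_bits):
    """Decode a normalized IEEE 754 single-precision value from its fields.

    S      : sign bit (0 or 1)
    E      : exponent field as an unsigned integer (must be 1..254)
    F_bits : fraction bits as a string; trailing zeros may be omitted
    """
    assert 1 <= E <= 254, "formula applies to normalized numbers only"
    F_bits = F_bits.ljust(23, "0")  # pad to the full 23-bit field
    # Significand M = 1.F: fraction bit i carries weight 2^-(i+1).
    M = 1.0 + sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(F_bits))
    return (-1) ** S * M * 2.0 ** (E - 127)

# S=1, E=10000011 (131), F=0100...0 decodes to -1.25 * 2^4 = -20.0
print(decode_normalized(1, 0b10000011, "01"))  # -20.0
```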
Worked Example 1: IEEE 754 to Decimal
Problem: Convert the IEEE 754 single-precision number represented by the hexadecimal value `0xC1A00000` to its decimal equivalent.
Solution:
Step 1: Convert the hexadecimal representation to its 32-bit binary equivalent.
`C` = `1100`, `1` = `0001`, `A` = `1010`, `0` = `0000`
So `0xC1A00000` = `1100 0001 1010 0000 0000 0000 0000 0000`.
Step 2: Parse the binary string into its S, E, and F components.
- Sign (S): The first bit is $S = 1$, so the number is negative.
- Exponent (E): The next 8 bits are $E = 10000011_2$.
- Fraction (F): The remaining 23 bits are $F = 01000000000000000000000$.
Step 3: Calculate the decimal value of the exponent field and the true exponent $e$.
$E = 10000011_2 = 131$. The true exponent is $e = 131 - 127 = 4$.
Step 4: Construct the significand and convert it to decimal.
The significand is $1.F = 1.01_2 = 1 + 0.25 = 1.25$.
Step 5: Assemble the final value using the formula $V = (-1)^S \times 1.F \times 2^{e}$.
$V = (-1)^1 \times 1.25 \times 2^4 = -1.25 \times 16 = -20.0$
Answer: $-20.0$
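The worked example can be cross-checked with Python's standard `struct` module, which reinterprets raw bytes as an IEEE 754 single-precision value:

```python
import struct

# Reinterpret the 32-bit pattern 0xC1A00000 as a big-endian single-precision float.
value = struct.unpack(">f", bytes.fromhex("C1A00000"))[0]
print(value)  # -20.0
```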
---
3. Special Values and Denormalized Numbers
The reserved exponent values $E = 0$ and $E = 255$ are used to represent special quantities.
| Case | Exponent (E) | Fraction (F) | Represents | Value |
| :--- | :--- | :--- | :--- | :--- |
| Zero | 0 | 0 | Zero | $\pm 0$ |
| Denormalized | 0 | $\neq 0$ | Very small numbers | $(-1)^S \times 0.F \times 2^{-126}$ |
| Normalized | 1 to 254 | Any | Normal numbers | $(-1)^S \times 1.F \times 2^{E-127}$ |
| Infinity | 255 | 0 | Infinity | $\pm\infty$ |
| NaN | 255 | $\neq 0$ | Not a Number | NaN |
Denormalized numbers (or subnormal numbers) fill the gap between the smallest normalized number and zero. They use a modified formula where the implicit leading bit is 0, and the exponent is fixed at $-126$. This allows for gradual underflow.
Infinity is used to represent results of operations like division by zero.
NaN (Not a Number) represents the result of invalid operations, such as $0/0$ or $\infty - \infty$.
The smallest positive normalized number has $E = 1$ and $F = 0$.
The value is $1.0 \times 2^{-126} \approx 1.18 \times 10^{-38}$.
The largest positive normalized number has $E = 254$ and $F$ consisting of all 1s.
The value is $(2 - 2^{-23}) \times 2^{127} \approx 3.4 \times 10^{38}$.
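These boundary values and special cases can be verified with a short `struct`-based helper (the name `f32` is an illustrative choice):

```python
import math
import struct

def f32(hexstr):
    """Interpret an 8-digit hex string as an IEEE 754 single-precision value."""
    return struct.unpack(">f", bytes.fromhex(hexstr))[0]

print(f32("00000001"))              # smallest positive denormal, 2^-149
print(f32("00800000"))              # smallest positive normal, 2^-126
print(f32("7F7FFFFF"))              # largest finite value, (2 - 2^-23) * 2^127
print(math.isinf(f32("7F800000")))  # E=255, F=0  -> +infinity: True
print(math.isnan(f32("7FC00000")))  # E=255, F!=0 -> NaN: True
```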
---
Problem-Solving Strategies
For GATE questions, particularly those involving hexadecimal representations, speed and accuracy are paramount.
A 32-bit number is an 8-digit hexadecimal string. Let the hex string be $h_1 h_2 h_3 h_4 h_5 h_6 h_7 h_8$.
- Convert the first two hex digits ($h_1 h_2$) to binary. This 8-bit pattern gives you the sign bit and the first 7 bits of the exponent.
- Convert the third hex digit ($h_3$) to binary. The MSB of this 4-bit pattern is the last bit of the exponent. The other 3 bits are the start of the fraction.
- The remaining hex digits ($h_4$ to $h_8$) directly form the rest of the fraction.
- $S$ is the MSB of the binary form of $h_1$.
Example: `0xC1A00000`
- $h_1 h_2 = \text{C1} = 11000001_2$.
- $S = 1$. The first 7 bits of $E$ are $1000001$.
- $h_3 = \text{A} = 1010_2$.
- The MSB `1` is the last bit of $E$. So, $E = 10000011_2 = 131$.
- The remaining `010` are the start of $F$.
- Thus, $e = 131 - 127 = 4$, the significand is $1.01_2 = 1.25$, and the value is $-1.25 \times 2^4 = -20.0$. This is much faster than writing out all 32 bits.
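The same field extraction can be done programmatically with integer masks instead of hex-digit juggling; a sketch (the function name is an illustrative choice):

```python
def fields(hexstr):
    """Split an 8-digit hex pattern into its (S, E, F) fields by bit masking."""
    bits = int(hexstr, 16)
    S = bits >> 31           # bit 31
    E = (bits >> 23) & 0xFF  # bits 30..23
    F = bits & 0x7FFFFF      # bits 22..0
    return S, E, F

S, E, F = fields("C1A00000")
print(S, E, E - 127)  # 1 131 4
```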
To compare two positive floating-point numbers, you can often avoid full decimal conversion.
- Compare their exponent fields ($E$). The number with the larger exponent field is larger.
- If the exponents are equal, compare their fraction fields ($F$). The number with the larger fraction field is larger.
This works because the binary representations are ordered lexicographically, just like integers, for positive numbers. For negative numbers, the reverse is true.
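This ordering property is easy to demonstrate by reinterpreting the bit patterns as unsigned integers (the helper name `bits_of` is an illustrative choice):

```python
import struct

def bits_of(x):
    """Return the single-precision bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# For positive floats, bit patterns order exactly like unsigned integers.
print(bits_of(1.5) < bits_of(20.0))    # True, matching 1.5 < 20.0
# For negative floats the order reverses: the more negative number
# has the larger bit pattern.
print(bits_of(-20.0) > bits_of(-1.5))  # True, even though -20.0 < -1.5
```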
---
Common Mistakes
- ❌ Forgetting the implicit '1.': A common error is to calculate the value using $0.F$ instead of $1.F$ for normalized numbers.
- ❌ Ignoring the bias: Calculating the value using $2^{E}$ instead of $2^{E-127}$.
- ❌ Confusing denormalized and normalized formulas: Applying the normalized formula when $E = 0$.
- ❌ Errors in floating-point arithmetic: Adding exponents directly during addition. Exponents are added only during multiplication; addition first requires aligning the operands to a common exponent.
---
Practice Questions
:::question type="MCQ" question="A 32-bit single-precision IEEE 754 number is given by the hexadecimal representation `0x00000000`. What does this number represent?" options=["Smallest positive denormalized number","Positive zero","Smallest positive normalized number","NaN"] answer="Positive zero" hint="Analyze the exponent and fraction fields. What special case does $E = 0$ and $F = 0$ correspond to?" solution="
Step 1: Convert the hexadecimal representation to binary.
`0x00000000` = `00000000 00000000 00000000 00000000`, i.e., all 32 bits are 0.
Step 2: Parse the binary string into S, E, and F components.
- Sign (S): 0
- Exponent (E): 00000000
- Fraction (F): 00000000000000000000000
Step 3: Identify the case based on the values of E and F.
According to the IEEE 754 standard, when the exponent field is all zeros and the fraction field is also all zeros, the number represents zero.
Step 4: Determine the sign.
Since the sign bit is 0, the number represents positive zero.
Result: The number represents positive zero.
"
:::
:::question type="NAT" question="A number is represented in IEEE 754 single-precision format with Sign = 1, Exponent = 10000010, and Fraction = 11000000000000000000000. The decimal value of this number is ______." answer="-14.0" hint="Use the formula $V = (-1)^S \times 1.F \times 2^{E-127}$. First, calculate the decimal value of $E$ and the true exponent $e$." solution="
Step 1: Identify the given components.
- $S = 1$ (the number is negative), $E = 10000010_2$, $F = 11000000000000000000000$.
Step 2: Calculate the decimal value of the exponent field $E$.
$E = 10000010_2 = 130$
Step 3: Calculate the true exponent $e$.
$e = E - 127 = 130 - 127 = 3$
Step 4: Construct the significand and find its decimal value.
$1.F = 1.11_2 = 1 + 0.5 + 0.25 = 1.75$
Step 5: Calculate the final decimal value.
$V = (-1)^1 \times 1.75 \times 2^3 = -14.0$
Result: The decimal value is -14.0.
"
:::
:::question type="MSQ" question="Two single-precision IEEE 754 floating-point numbers are given by and . Which of the following statements is/are correct?" options=["","","",""] answer="A,B,C,D" hint="Convert both X and Y to their decimal representations. Observe the relationship between their binary patterns." solution="
Step 1: Analyze number .
- Binary:
- (Positive)
- . True exponent .
- . Significand is .
- Value of .
- So, statement C is correct.
Step 2: Analyze number .
- Binary:
- (Negative)
- . True exponent .
- . Significand is .
- Value of .
- So, statement D is correct.
Step 3: Evaluate the relationships between X and Y.
- From Step 1 and 2, we have and .
- It is clear that . Statement A is correct.
- It also follows that . Statement B is correct.
Result: All four statements A, B, C, and D are correct.
"
:::
:::question type="MCQ" question="What is the IEEE 754 single-precision representation of the decimal number $-6.5$?" options=["0xC0D00000","0x40D00000","0xC0B00000","0xC0E00000"] answer="0xC0D00000" hint="First, convert 6.5 to binary. Then, normalize it to the form $1.F \times 2^e$. Finally, find S, E, and F and assemble the 32-bit pattern." solution="
Step 1: Convert the absolute value of the number (6.5) to binary.
- Integer part: $6 = 110_2$.
- Fractional part: $0.5 = 0.1_2$.
- So, $6.5 = 110.1_2$.
Step 2: Normalize the binary number.
To normalize, we move the binary point to be after the first '1'.
$110.1_2 = 1.101_2 \times 2^2$
Step 3: Determine S, e, E, and F.
- The number is negative, so $S = 1$.
- The true exponent is $e = 2$.
- The biased exponent is $E = e + 127 = 2 + 127 = 129$.
- In binary, $E = 129 = 10000001_2$.
- The fraction part is the part of the significand after the binary point: $101$.
- We must pad this to 23 bits: $F = 10100000000000000000000$.
Step 4: Assemble the 32-bit representation.
- S: 1
- E: 10000001
- F: 10100000000000000000000
- Combined: `1 10000001 10100000000000000000000`
Step 5: Convert the binary representation to hexadecimal.
Group the bits into sets of four:
`1100 0000 1101 0000 0000 0000 0000 0000` = `0xC0D00000`
The hexadecimal representation is `0xC0D00000`.
Result: The correct option is `0xC0D00000`.
"
:::
---
Summary
- Master the Single-Precision Format: Know the 1-8-23 bit allocation for Sign, Exponent, and Fraction. The bias is always 127.
- Memorize the Core Formula: The value of a normalized number is $(-1)^S \times 1.F \times 2^{E-127}$. This is the most frequently used formula.
- Recognize Special Cases: Be able to instantly identify Zero, Infinity, NaN, and Denormalized numbers based on the exponent field ($E = 0$ or $E = 255$). This is crucial for eliminating options in MCQs.
- Practice Hexadecimal Conversion: Many questions provide numbers in hexadecimal. Be swift in converting hex to binary and parsing it into S, E, and F fields.
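When practicing hex conversions, Python's standard `struct` module provides the reference encoding to check against (the helper name `to_hex` is an illustrative choice):

```python
import struct

def to_hex(x):
    """Encode x as IEEE 754 single precision and return its 8-digit hex pattern."""
    return struct.pack(">f", x).hex().upper()

print(to_hex(-6.5))   # C0D00000
print(to_hex(-20.0))  # C1A00000
print(to_hex(-14.0))  # C1600000
```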
---
What's Next?
This topic is a cornerstone of computer arithmetic and has strong connections to other areas of the GATE syllabus.
- Computer Arithmetic: Floating-point representation is the foundation for understanding floating-point addition, subtraction, multiplication, and division algorithms, and the hardware that implements them.
- Computer Organization and Architecture: Understanding how floating-point numbers are stored in registers and processed by the Floating-Point Unit (FPU) is essential. This knowledge is relevant to topics like instruction sets and pipelining.
---
Chapter Summary
In our study of number representations, we have progressed from simple integer schemes to the more complex and powerful floating-point formats. For the GATE examination, a thorough command of the following principles is essential.
- The Fundamental Trade-off: We have established that fixed-point and floating-point representations embody a fundamental trade-off. While fixed-point offers uniform precision, its range is limited. Floating-point, conversely, provides a vast dynamic range at the cost of variable precision, where the gap between representable numbers increases with their magnitude.
- IEEE 754 Standard Structure: The IEEE 754 standard is the cornerstone of modern floating-point arithmetic. Its structure, comprising a sign bit ($S$), a biased exponent ($E$), and a fractional mantissa ($F$), must be thoroughly understood. The value of a normalized number is given by $(-1)^S \times 1.F \times 2^{E - \text{bias}}$.
- Biased Exponent: The use of a biased exponent is a critical design choice. It allows for the representation of both very large and very small magnitudes while enabling efficient comparison of exponents, as they can be treated as unsigned integers. The bias for single-precision is 127, and for double-precision, it is 1023.
- Normalization and the Implicit Bit: To maximize precision, floating-point numbers are typically stored in a normalized form, which mandates a single non-zero digit to the left of the radix point. In the binary system of IEEE 754, this leading digit is always '1'. Storing this bit is redundant; hence, it is made implicit, effectively granting an extra bit of precision to the mantissa.
- Special Values: The IEEE 754 standard reserves specific exponent patterns to represent special values. An exponent of all zeros signifies either zero (if the mantissa is also all zeros) or a denormalized number. An exponent of all ones represents either infinity (if the mantissa is all zeros) or Not-a-Number (NaN).
- Conversion Proficiency: Fluency in converting between decimal values and their IEEE 754 single-precision (32-bit) and double-precision (64-bit) binary or hexadecimal representations is a non-negotiable skill. This includes the process of normalization, bias calculation, and bit-pattern assembly.
- Limitations and Consequences: We must remain cognizant of the inherent limitations of finite-precision arithmetic. Concepts such as machine epsilon, rounding errors (e.g., round-to-nearest, ties-to-even), overflow (exceeding the largest representable number), and underflow (becoming too small to represent) are frequent sources of error in numerical computations and are important topics for examination.
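The single-precision machine epsilon and the ties-to-even rounding rule mentioned above can be observed by round-tripping a Python double through single precision (a sketch using the standard `struct` module):

```python
import struct

def round_to_f32(x):
    """Round a Python float (double precision) through single precision and back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 2^-23 is the single-precision machine epsilon: 1 + 2^-23 is representable...
print(round_to_f32(1.0 + 2**-23) == 1.0 + 2**-23)  # True
# ...but 1 + 2^-24 lies exactly halfway between 1.0 and 1 + 2^-23, and
# round-to-nearest, ties-to-even picks 1.0 (the neighbour with an even mantissa).
print(round_to_f32(1.0 + 2**-24) == 1.0)           # True
```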
---
Chapter Review Questions
:::question type="MCQ" question="A 32-bit single-precision floating-point number is represented in hexadecimal as `0xC1700000`. What is the decimal value represented by this bit pattern?" options=["-15.0", "-14.0", "-30.0", "-1.75"] answer="A" hint="First, convert the hexadecimal representation to binary. Then, partition the bits into sign, exponent, and mantissa fields according to the IEEE 754 single-precision format. Remember to account for the exponent bias and the implicit leading '1' of the mantissa." solution="
The hexadecimal representation is `0xC1700000`.
Step 1: Convert from Hexadecimal to Binary
We convert each hex digit to its 4-bit binary equivalent:
- C → 1100
- 1 → 0001
- 7 → 0111
- 0 → 0000
The full 32-bit binary pattern is:
`1100 0001 0111 0000 0000 0000 0000 0000`
Step 2: Deconstruct the IEEE 754 Fields
- Sign Bit (S): The first bit is $S = 1$, which indicates a negative number.
- Exponent (E): The next 8 bits are $10000010_2$. The decimal value is $E = 130$.
- Mantissa (F): The remaining 23 bits are $11100000000000000000000$.
Step 3: Calculate the Actual Exponent
For single-precision, the bias is 127.
The actual exponent is $e = 130 - 127 = 3$.
Step 4: Reconstruct the Value
The number is normalized, so the value is given by the formula $V = (-1)^S \times 1.F \times 2^{e}$.
The mantissa part is $1.F = 1.111_2 = 1 + 0.5 + 0.25 + 0.125 = 1.875$.
Now, we apply the exponent:
$V = -1.875 \times 2^3 = -15.0$
Therefore, the correct decimal value is $-15.0$.
"
:::
:::question type="NAT" question="Consider a hypothetical 12-bit floating-point representation with 1 bit for the sign, 5 bits for the exponent using a bias of 15, and 6 bits for the mantissa. Calculate the total number of distinct, positive, normalized floating-point numbers that can be represented in this format." answer="1920" hint="The number of representable values depends on the number of possible combinations of the exponent and mantissa fields. Remember that certain exponent values are reserved for special cases (zero, denormalized, infinity, NaN) and are not used for normalized numbers." solution="
Step 1: Analyze the Format
- Total bits = 12
- Sign bits = 1 (We are only considering positive numbers, so this is fixed to 0).
- Exponent bits = 5
- Mantissa bits = 6
Step 2: Determine the Range of Valid Exponent Fields for Normalized Numbers
The exponent field has 5 bits, so it can represent decimal values from 0 to $2^5 - 1 = 31$.
In any IEEE-like standard, the exponent field of all 0s and all 1s are reserved.
- Exponent `00000` (decimal 0) is reserved for zero and denormalized numbers.
- Exponent `11111` (decimal 31) is reserved for Infinity and NaN.
The range of biased exponent values for normalized numbers is therefore from 1 to 30, inclusive.
The number of valid exponent patterns is $30 - 1 + 1 = 30$.
Step 3: Determine the Number of Mantissa Combinations
The mantissa field has 6 bits. For each valid exponent, any combination of these 6 bits represents a unique fractional part.
The number of possible mantissa patterns is $2^6 = 64$.
Step 4: Calculate the Total Number of Normalized Positive Numbers
The total count is the product of the number of valid exponent patterns and the number of possible mantissa patterns.
$30 \times 64 = 1920$
The final answer is 1920.
"
:::
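The counting argument in the solution above can be confirmed by brute force over the hypothetical 1-5-6 format:

```python
# Count distinct positive normalized values in the hypothetical 12-bit format:
# 1 sign bit (fixed to 0 for positives), 5 exponent bits (bias 15), 6 mantissa bits.
count = 0
for E in range(2 ** 5):            # all possible 5-bit exponent fields
    if E == 0 or E == 2 ** 5 - 1:
        continue                   # reserved: zero/denormals and infinity/NaN
    count += 2 ** 6                # every 6-bit mantissa pattern is distinct
print(count)  # 1920
```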
:::question type="MSQ" question="Which of the following statements regarding the IEEE 754 single-precision floating-point standard is/are correct? (This is a Multiple Select Question)" options=["The bit pattern for +0.0 is identical to the bit pattern for -0.0.", "The gap between any two consecutive representable numbers is uniform across the entire number line.", "The number of representable values between and (for valid ) is constant.", "Denormalized numbers allow for a 'gradual underflow' by representing values smaller than the smallest normalized number."] answer="C,D" hint="Consider the structure of the IEEE 754 representation. How does the sign bit function? How does the exponent affect the spacing of numbers? What is the specific purpose of denormalized numbers?" solution="
Let us evaluate each statement:
- A: Incorrect. The two patterns differ in the sign bit: +0.0 is `0x00000000` while -0.0 is `0x80000000`. (They compare as numerically equal, but the bit patterns are not identical.)
- B: Incorrect. The gap between consecutive representable numbers is $2^{e-23}$, which grows with the exponent; spacing is uniform only within a single binade, not across the entire number line.
- C: Correct. Every interval $[2^e, 2^{e+1})$ for a valid exponent $e$ contains exactly $2^{23}$ representable values, one per fraction pattern, so the count per binade is constant.
- D: Correct. Denormalized numbers use $E = 0$ with an implicit leading 0 and a fixed exponent of $-126$, filling the gap between zero and the smallest normalized number; this is gradual underflow.
Thus, the correct statements are C and D.
"
:::
:::question type="NAT" question="The decimal value $-90$ is represented in the IEEE 754 single-precision format. What is the decimal value of the 8-bit biased exponent field?" answer="133" hint="First, convert the absolute decimal value to binary. Then, normalize the binary number to the form $1.F \times 2^e$. Finally, calculate the biased exponent using the formula $E = e + 127$." solution="
Step 1: Convert the absolute decimal value to binary.
The integer part is $90 = 1011010_2$.
Since there is no fractional part, $|-90| = 1011010_2$.
Step 2: Normalize the binary number.
We need to express the number in the form $1.F \times 2^e$. We move the binary point 6 places to the left.
$1011010_2 = 1.011010_2 \times 2^6$
The actual exponent is $e = 6$.
Step 3: Determine the sign bit and mantissa.
The number is negative, so the sign bit $S = 1$.
The mantissa consists of the bits to the right of the binary point in the normalized form, padded to 23 bits: $F = 01101000000000000000000$.
Step 4: Calculate the biased exponent.
For the IEEE 754 single-precision format, the bias is 127.
The biased exponent is calculated as:
$E = e + 127 = 6 + 127 = 133$
The decimal value of the exponent field is 133.
For completeness, the 8-bit binary representation of the biased exponent is $10000101_2$. The full 32-bit pattern for $-90$ would be:
`1 10000101 01101000000000000000000`.
"
:::
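The exponent field worked out in the last solution can be sanity-checked in Python by packing -90.0 (the value whose full pattern `1 10000101 01101000000000000000000` appears above) and masking out bits 30..23:

```python
import struct

# Bit pattern of -90.0 in single precision, as an unsigned 32-bit integer.
bits = struct.unpack(">I", struct.pack(">f", -90.0))[0]
exponent_field = (bits >> 23) & 0xFF
print(f"{bits:08X}")   # C2B40000
print(exponent_field)  # 133
```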
---
What's Next?
Having completed our exploration of Number Representations, we have established a firm foundation in how data is encoded at the most fundamental level. These concepts are not isolated; rather, they are the essential prerequisites for understanding the hardware that manipulates this data.
Key Connections:
- Relation to Previous Learning: This chapter builds directly upon your knowledge of basic number systems (binary, hexadecimal) and integer representations (such as 2's complement). Floating-point representation is the logical and necessary extension required to handle real numbers, which are ubiquitous in scientific and engineering computation.
- Foundation for Digital Logic: The bit-level structures we have analyzed, particularly the IEEE 754 format, are directly implemented in hardware. Your understanding of how a number is partitioned into sign, exponent, and mantissa is crucial for the next chapters in Digital Logic, where you will study the design of arithmetic circuits. You will see how specialized logic is required to handle exponent addition, mantissa alignment, and normalization within an Arithmetic Logic Unit (ALU).
- Bridge to Computer Organization and Architecture: The principles of floating-point arithmetic form a cornerstone of Computer Organization. The performance of a processor, especially for scientific workloads, is heavily dependent on its Floating-Point Unit (FPU). Understanding representation errors like overflow, underflow, and precision loss provides context for concepts like instruction set design, pipelining of arithmetic operations, and the architectural differences between CPUs and GPUs.