Number Representations
Overview
While our previous discussions have centered on the representation of integers, the computational demands of modern science and engineering necessitate a system capable of handling numbers of vastly different magnitudes, including fractional values. The limitations of fixed-point arithmetic, with its static allocation of bits for integer and fractional parts, render it inadequate for applications requiring a wide dynamic range. In this chapter, we shall investigate the principles and standards governing the representation of real numbers in digital systems, a cornerstone of numerical computation.
We will systematically deconstruct the architecture of floating-point numbers, a representation analogous to scientific notation in the binary domain. This method partitions a bit string into three critical fields: the sign, the exponent, and the mantissa (or significand). By manipulating the exponent, the radix point can effectively "float," allowing the system to express both minuscule fractions and exceptionally large quantities with a finite and fixed number of bits. Our focus will be on the universally adopted IEEE 754 standard, which provides a robust and consistent framework for floating-point arithmetic across diverse computing platforms.
For the GATE examination, a deep and functional understanding of number representations is not merely academic but a practical necessity. Examiners frequently formulate problems that test the candidate's ability to convert numbers between decimal and IEEE 754 formats, to interpret the bit patterns of special values, and to analyze the precision and range limitations inherent in the system. A command of these concepts is therefore essential for success in questions spanning Digital Logic, Computer Organization, and Architecture.
---
Chapter Contents
| # | Topic | What You'll Learn |
|---|-------|-------------------|
| 1 | Floating-Point Representation | IEEE 754 standard for representing real numbers. |
---
Learning Objectives
After completing this chapter, you will be able to:
- Describe the structure of a floating-point number, including the sign, biased exponent, and mantissa fields.
- Represent a given decimal number in the IEEE 754 single-precision (32-bit) and double-precision (64-bit) formats.
- Convert binary representations in IEEE 754 format back into their decimal floating-point equivalents.
- Analyze the representation of special values such as $+\infty$, $-\infty$, and NaN (Not a Number), and understand the concepts of normalization and precision.
---
We now turn our attention to Floating-Point Representation...
Part 1: Floating-Point Representation
Introduction
In the domain of digital computation, representing real numbers with both fractional parts and a wide dynamic range is a fundamental challenge. While fixed-point representations are suitable for applications where the range of values is well-defined and limited, they are inadequate for scientific and engineering computations that involve very large or very small quantities. Floating-point representation addresses this limitation by encoding a number in a form analogous to scientific notation, separating its magnitude (as a significand) and its scale (as an exponent).
This chapter provides a comprehensive treatment of floating-point representation as specified by the IEEE 754 standard, which is the ubiquitous convention used in modern computing hardware. A thorough understanding of this standard is indispensable for GATE, as it forms the basis for questions involving number representation, conversion, and arithmetic. We shall explore the structure of these representations, the process of converting between decimal and floating-point formats, and the special values that handle exceptional cases such as infinity and undefined results.
A floating-point number is a representation of a real number in the form:
$(-1)^S \times M \times B^{e}$
where $S$ is the sign, $M$ is the significand (or mantissa), $B$ is the base (typically 2 in modern computers), and $e$ is the exponent. The IEEE 754 standard specifies precise formats for encoding $S$, $M$, and $e$ in binary.
---
Key Concepts
The most common format encountered in GATE is the IEEE 754 single-precision (32-bit) standard. We will focus our discussion on this format.
The 32 bits are allocated as follows:
- Sign (S): 1 bit. $S = 0$ for positive, $S = 1$ for negative.
- Exponent (E): 8 bits. Stored in a biased format.
- Fraction (F): 23 bits. Represents the fractional part of the significand.
1. The Biased Exponent
The 8-bit exponent field does not directly represent the exponent. Instead, it stores a value from which a bias must be subtracted to obtain the true exponent. This allows the representation of both positive and negative exponents without needing a separate sign bit for the exponent itself.
For single-precision, the bias is $127$. The 8-bit exponent field, $E$, can represent unsigned integers from $0$ to $255$.
The true exponent, $e$, is calculated as:
$e = E - 127$
The values $E = 0$ and $E = 255$ are reserved for special cases. For normal numbers, $E$ ranges from $1$ to $254$, which corresponds to a true exponent range of $-126$ to $+127$.
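The bias rule can be sketched in a few lines of Python (the helper name `true_exponent` is an illustrative choice, not a standard function):

```python
BIAS = 127  # exponent bias for IEEE 754 single precision

def true_exponent(E):
    """Map the stored 8-bit exponent field E to the true exponent e = E - BIAS."""
    if E == 0 or E == 255:
        raise ValueError("E = 0 and E = 255 are reserved for special values")
    return E - BIAS

# The normal range E = 1..254 maps to true exponents -126..+127.
print(true_exponent(1), true_exponent(254))  # -126 127
```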
2. Normalized Representation
Most floating-point numbers are stored in a normalized format. In binary scientific notation, any non-zero number can be written with a single '1' before the binary point (e.g., $110.1_2 = 1.101_2 \times 2^2$). Since this leading '1' is always present for normalized numbers, it does not need to be stored explicitly. This is known as the implicit or hidden leading bit.
The significand, $M$, is therefore $1.F$, where $F$ is the 23-bit fraction field. This provides an extra bit of precision. The value of a normalized single-precision number is:
$V = (-1)^S \times (1.F)_2 \times 2^{E - 127}$
Variables:
- $S$: The sign bit (0 or 1).
- $F$: The 23-bit fraction field, interpreted as a binary fraction.
- $E$: The 8-bit unsigned integer in the exponent field, where $1 \le E \le 254$.
When to use: For converting an IEEE 754 single-precision binary representation to its decimal value, when the exponent field is not all 0s or all 1s.
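The decoding procedure above can be expressed directly as code. A minimal sketch (the function name and its string-based fraction argument are illustrative choices, not a standard API):

```python
def decode_normalized(S, E, F_bits):
    """Decode a normalized IEEE 754 single-precision value from its fields.

    S      : sign bit (0 or 1)
    E      : exponent field as an unsigned integer (must be 1..254)
    F_bits : fraction bits as a string; trailing zeros may be omitted
    """
    assert 1 <= E <= 254, "formula applies to normalized numbers only"
    F_bits = F_bits.ljust(23, "0")  # pad to the full 23-bit field
    # Significand M = 1.F: fraction bit i carries weight 2^-(i+1).
    M = 1.0 + sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(F_bits))
    return (-1) ** S * M * 2.0 ** (E - 127)

# S=1, E=10000011 (131), F=0100...0 decodes to -1.25 * 2^4 = -20.0
print(decode_normalized(1, 0b10000011, "01"))  # -20.0
```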
Worked Example 1: IEEE 754 to Decimal
Problem: Convert the IEEE 754 single-precision number represented by the hexadecimal value `0xC1A00000` to its decimal equivalent.
Solution:
Step 1: Convert the hexadecimal representation to its 32-bit binary equivalent.
`C` = `1100`, `1` = `0001`, `A` = `1010`, `0` = `0000`
So `0xC1A00000` = `1100 0001 1010 0000 0000 0000 0000 0000`.
Step 2: Parse the binary string into its S, E, and F components.
- Sign (S): The first bit is $S = 1$, so the number is negative.
- Exponent (E): The next 8 bits are $E = 10000011_2$.
- Fraction (F): The remaining 23 bits are $F = 01000000000000000000000$.
Step 3: Calculate the decimal value of the exponent field and the true exponent $e$.
$E = 10000011_2 = 131$. The true exponent is $e = 131 - 127 = 4$.
Step 4: Construct the significand and convert it to decimal.
The significand is $1.F = 1.01_2 = 1 + 0.25 = 1.25$.
Step 5: Assemble the final value using the formula $V = (-1)^S \times 1.F \times 2^{e}$.
$V = (-1)^1 \times 1.25 \times 2^4 = -1.25 \times 16 = -20.0$
Answer: $-20.0$
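The worked example can be cross-checked with Python's standard `struct` module, which reinterprets raw bytes as an IEEE 754 single-precision value:

```python
import struct

# Reinterpret the 32-bit pattern 0xC1A00000 as a big-endian single-precision float.
value = struct.unpack(">f", bytes.fromhex("C1A00000"))[0]
print(value)  # -20.0
```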
---
3. Special Values and Denormalized Numbers
The reserved exponent values $E = 0$ and $E = 255$ are used to represent special quantities.
| Case | Exponent (E) | Fraction (F) | Represents | Value |
| :--- | :--- | :--- | :--- | :--- |
| Zero | 0 | 0 | Zero | $\pm 0$ |
| Denormalized | 0 | $\neq 0$ | Very small numbers | $(-1)^S \times 0.F \times 2^{-126}$ |
| Normalized | 1 to 254 | Any | Normal numbers | $(-1)^S \times 1.F \times 2^{E-127}$ |
| Infinity | 255 | 0 | Infinity | $\pm\infty$ |
| NaN | 255 | $\neq 0$ | Not a Number | NaN |
Denormalized numbers (or subnormal numbers) fill the gap between the smallest normalized number and zero. They use a modified formula where the implicit leading bit is 0, and the exponent is fixed at $-126$. This allows for gradual underflow.
Infinity is used to represent results of operations like division by zero.
NaN (Not a Number) represents the result of invalid operations, such as $0/0$ or $\infty - \infty$.
The smallest positive normalized number has $E = 1$ and $F = 0$.
The value is $1.0 \times 2^{-126} \approx 1.18 \times 10^{-38}$.
The largest positive normalized number has $E = 254$ and $F$ consisting of all 1s.
The value is $(2 - 2^{-23}) \times 2^{127} \approx 3.4 \times 10^{38}$.
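These boundary values and special cases can be verified with a short `struct`-based helper (the name `f32` is an illustrative choice):

```python
import math
import struct

def f32(hexstr):
    """Interpret an 8-digit hex string as an IEEE 754 single-precision value."""
    return struct.unpack(">f", bytes.fromhex(hexstr))[0]

print(f32("00000001"))              # smallest positive denormal, 2^-149
print(f32("00800000"))              # smallest positive normal, 2^-126
print(f32("7F7FFFFF"))              # largest finite value, (2 - 2^-23) * 2^127
print(math.isinf(f32("7F800000")))  # E=255, F=0  -> +infinity: True
print(math.isnan(f32("7FC00000")))  # E=255, F!=0 -> NaN: True
```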
---
Problem-Solving Strategies
For GATE questions, particularly those involving hexadecimal representations, speed and accuracy are paramount.
A 32-bit number is an 8-digit hexadecimal string. Let the hex string be $h_1 h_2 h_3 h_4 h_5 h_6 h_7 h_8$.
- Convert the first two hex digits ($h_1 h_2$) to binary. This 8-bit pattern gives you the sign bit and the first 7 bits of the exponent.
- Convert the third hex digit ($h_3$) to binary. The MSB of this 4-bit pattern is the last bit of the exponent. The other 3 bits are the start of the fraction.
- The remaining hex digits ($h_4$ to $h_8$) directly form the rest of the fraction.
- $S$ is the MSB of the binary form of $h_1$.
Example: `0xC1A00000`
- $h_1 h_2 = \text{C1} = 11000001_2$.
- $S = 1$. The first 7 bits of $E$ are $1000001$.
- $h_3 = \text{A} = 1010_2$.
- The MSB `1` is the last bit of $E$. So, $E = 10000011_2 = 131$.
- The remaining `010` are the start of $F$.
- Thus, $e = 131 - 127 = 4$, the significand is $1.01_2 = 1.25$, and the value is $-1.25 \times 2^4 = -20.0$. This is much faster than writing out all 32 bits.
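The same field extraction can be done programmatically with integer masks instead of hex-digit juggling; a sketch (the function name is an illustrative choice):

```python
def fields(hexstr):
    """Split an 8-digit hex pattern into its (S, E, F) fields by bit masking."""
    bits = int(hexstr, 16)
    S = bits >> 31           # bit 31
    E = (bits >> 23) & 0xFF  # bits 30..23
    F = bits & 0x7FFFFF      # bits 22..0
    return S, E, F

S, E, F = fields("C1A00000")
print(S, E, E - 127)  # 1 131 4
```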
To compare two positive floating-point numbers, you can often avoid full decimal conversion.
- Compare their exponent fields ($E$). The number with the larger exponent field is larger.
- If the exponents are equal, compare their fraction fields ($F$). The number with the larger fraction field is larger.
This works because the binary representations are ordered lexicographically, just like integers, for positive numbers. For negative numbers, the reverse is true.
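This ordering property is easy to demonstrate by reinterpreting the bit patterns as unsigned integers (the helper name `bits_of` is an illustrative choice):

```python
import struct

def bits_of(x):
    """Return the single-precision bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# For positive floats, bit patterns order exactly like unsigned integers.
print(bits_of(1.5) < bits_of(20.0))    # True, matching 1.5 < 20.0
# For negative floats the order reverses: the more negative number
# has the larger bit pattern.
print(bits_of(-20.0) > bits_of(-1.5))  # True, even though -20.0 < -1.5
```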
---
Common Mistakes
- ❌ Forgetting the implicit '1.': A common error is to calculate the value using $0.F$ instead of $1.F$ for normalized numbers.
- ❌ Ignoring the bias: Calculating the value using $2^{E}$ instead of $2^{E-127}$.
- ❌ Confusing denormalized and normalized formulas: Applying the normalized formula when $E = 0$.
- ❌ Errors in floating-point arithmetic: Adding exponents directly during addition. Exponents are added only during multiplication; addition first requires aligning the operands to a common exponent.
---
Practice Questions
:::question type="MCQ" question="A 32-bit single-precision IEEE 754 number is given by the hexadecimal representation `0x00000000`. What does this number represent?" options=["Smallest positive denormalized number","Positive zero","Smallest positive normalized number","NaN"] answer="Positive zero" hint="Analyze the exponent and fraction fields. What special case does $E = 0$ and $F = 0$ correspond to?" solution="
Step 1: Convert the hexadecimal representation to binary.
`0x00000000` = `00000000 00000000 00000000 00000000`, i.e., all 32 bits are 0.
Step 2: Parse the binary string into S, E, and F components.
- Sign (S): 0
- Exponent (E): 00000000
- Fraction (F): 00000000000000000000000
Step 3: Identify the case based on the values of E and F.
According to the IEEE 754 standard, when the exponent field is all zeros and the fraction field is also all zeros, the number represents zero.
Step 4: Determine the sign.
Since the sign bit is 0, the number represents positive zero.
Result: The number represents positive zero.
"
:::
:::question type="NAT" question="A number is represented in IEEE 754 single-precision format with Sign = 1, Exponent = 10000010, and Fraction = 11000000000000000000000. The decimal value of this number is ______." answer="-14.0" hint="Use the formula $V = (-1)^S \times 1.F \times 2^{E-127}$. First, calculate the decimal value of $E$ and the true exponent $e$." solution="
Step 1: Identify the given components.
- $S = 1$ (the number is negative), $E = 10000010_2$, $F = 11000000000000000000000$.
Step 2: Calculate the decimal value of the exponent field $E$.
$E = 10000010_2 = 130$
Step 3: Calculate the true exponent $e$.
$e = E - 127 = 130 - 127 = 3$
Step 4: Construct the significand and find its decimal value.
$1.F = 1.11_2 = 1 + 0.5 + 0.25 = 1.75$
Step 5: Calculate the final decimal value.
$V = (-1)^1 \times 1.75 \times 2^3 = -14.0$
Result: The decimal value is -14.0.
"
:::
:::question type="MSQ" question="Two single-precision IEEE 754 floating-point numbers are given by and . Which of the following statements is/are correct?" options=["","","",""] answer="A,B,C,D" hint="Convert both X and Y to their decimal representations. Observe the relationship between their binary patterns." solution="
Step 1: Analyze number .
- Binary:
- (Positive)
- . True exponent .
- . Significand is .
- Value of .
- So, statement C is correct.
Step 2: Analyze number .
- Binary:
- (Negative)
- . True exponent .
- . Significand is .
- Value of .
- So, statement D is correct.
Step 3: Evaluate the relationships between X and Y.
- From Step 1 and 2, we have and .
- It is clear that . Statement A is correct.
- It also follows that . Statement B is correct.
Result: All four statements A, B, C, and D are correct.
"
:::
:::question type="MCQ" question="What is the IEEE 754 single-precision representation of the decimal number $-6.5$?" options=["0xC0D00000","0x40D00000","0xC0B00000","0xC0E00000"] answer="0xC0D00000" hint="First, convert 6.5 to binary. Then, normalize it to the form $1.F \times 2^e$. Finally, find S, E, and F and assemble the 32-bit pattern." solution="
Step 1: Convert the absolute value of the number (6.5) to binary.
- Integer part: $6 = 110_2$.
- Fractional part: $0.5 = 0.1_2$.
- So, $6.5 = 110.1_2$.
Step 2: Normalize the binary number.
To normalize, we move the binary point to be after the first '1'.
$110.1_2 = 1.101_2 \times 2^2$
Step 3: Determine S, e, E, and F.
- The number is negative, so $S = 1$.
- The true exponent is $e = 2$.
- The biased exponent is $E = e + 127 = 2 + 127 = 129$.
- In binary, $E = 129 = 10000001_2$.
- The fraction part is the part of the significand after the binary point: $101$.
- We must pad this to 23 bits: $F = 10100000000000000000000$.
Step 4: Assemble the 32-bit representation.
- S: 1
- E: 10000001
- F: 10100000000000000000000
- Combined: `1 10000001 10100000000000000000000`
Step 5: Convert the binary representation to hexadecimal.
Group the bits into sets of four:
`1100 0000 1101 0000 0000 0000 0000 0000` = `0xC0D00000`
The hexadecimal representation is `0xC0D00000`.
Result: The correct option is `0xC0D00000`.
"
:::
---
Summary
- Master the Single-Precision Format: Know the 1-8-23 bit allocation for Sign, Exponent, and Fraction. The bias is always 127.
- Memorize the Core Formula: The value of a normalized number is $(-1)^S \times 1.F \times 2^{E-127}$. This is the most frequently used formula.
- Recognize Special Cases: Be able to instantly identify Zero, Infinity, NaN, and Denormalized numbers based on the exponent field ($E = 0$ or $E = 255$). This is crucial for eliminating options in MCQs.
- Practice Hexadecimal Conversion: Many questions provide numbers in hexadecimal. Be swift in converting hex to binary and parsing it into S, E, and F fields.
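When practicing hex conversions, Python's standard `struct` module provides the reference encoding to check against (the helper name `to_hex` is an illustrative choice):

```python
import struct

def to_hex(x):
    """Encode x as IEEE 754 single precision and return its 8-digit hex pattern."""
    return struct.pack(">f", x).hex().upper()

print(to_hex(-6.5))   # C0D00000
print(to_hex(-20.0))  # C1A00000
print(to_hex(-14.0))  # C1600000
```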
---
What's Next?
This topic is a cornerstone of computer arithmetic and has strong connections to other areas of the GATE syllabus.
- Computer Arithmetic: Floating-point representation is the foundation for understanding floating-point addition, subtraction, multiplication, and division algorithms, and the hardware that implements them.
- Computer Organization and Architecture: Understanding how floating-point numbers are stored in registers and processed by the Floating-Point Unit (FPU) is essential. This knowledge is relevant to topics like instruction sets and pipelining.
---
Chapter Summary
In our study of number representations, we have progressed from simple integer schemes to the more complex and powerful floating-point formats. For the GATE examination, a thorough command of the following principles is essential.
- The Fundamental Trade-off: We have established that fixed-point and floating-point representations embody a fundamental trade-off. While fixed-point offers uniform precision, its range is limited. Floating-point, conversely, provides a vast dynamic range at the cost of variable precision, where the gap between representable numbers increases with their magnitude.
- IEEE 754 Standard Structure: The IEEE 754 standard is the cornerstone of modern floating-point arithmetic. Its structure, comprising a sign bit ($S$), a biased exponent ($E$), and a fractional mantissa ($F$), must be thoroughly understood. The value of a normalized number is given by $(-1)^S \times 1.F \times 2^{E - \text{bias}}$.
- Biased Exponent: The use of a biased exponent is a critical design choice. It allows for the representation of both very large and very small magnitudes while enabling efficient comparison of exponents, as they can be treated as unsigned integers. The bias for single-precision is 127, and for double-precision, it is 1023.
- Normalization and the Implicit Bit: To maximize precision, floating-point numbers are typically stored in a normalized form, which mandates a single non-zero digit to the left of the radix point. In the binary system of IEEE 754, this leading digit is always '1'. Storing this bit is redundant; hence, it is made implicit, effectively granting an extra bit of precision to the mantissa.
- Special Values: The IEEE 754 standard reserves specific exponent patterns to represent special values. An exponent of all zeros signifies either zero (if the mantissa is also all zeros) or a denormalized number. An exponent of all ones represents either infinity (if the mantissa is all zeros) or Not-a-Number (NaN).
- Conversion Proficiency: Fluency in converting between decimal values and their IEEE 754 single-precision (32-bit) and double-precision (64-bit) binary or hexadecimal representations is a non-negotiable skill. This includes the process of normalization, bias calculation, and bit-pattern assembly.
- Limitations and Consequences: We must remain cognizant of the inherent limitations of finite-precision arithmetic. Concepts such as machine epsilon, rounding errors (e.g., round-to-nearest, ties-to-even), overflow (exceeding the largest representable number), and underflow (becoming too small to represent) are frequent sources of error in numerical computations and are important topics for examination.
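The single-precision machine epsilon and the ties-to-even rounding rule mentioned above can be observed by round-tripping a Python double through single precision (a sketch using the standard `struct` module):

```python
import struct

def round_to_f32(x):
    """Round a Python float (double precision) through single precision and back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 2^-23 is the single-precision machine epsilon: 1 + 2^-23 is representable...
print(round_to_f32(1.0 + 2**-23) == 1.0 + 2**-23)  # True
# ...but 1 + 2^-24 lies exactly halfway between 1.0 and 1 + 2^-23, and
# round-to-nearest, ties-to-even picks 1.0 (the neighbour with an even mantissa).
print(round_to_f32(1.0 + 2**-24) == 1.0)           # True
```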
---
Chapter Review Questions
:::question type="MCQ" question="A 32-bit single-precision floating-point number is represented in hexadecimal as `0xC1700000`. What is the decimal value represented by this bit pattern?" options=["-15.0", "-14.0", "-30.0", "-1.75"] answer="A" hint="First, convert the hexadecimal representation to binary. Then, partition the bits into sign, exponent, and mantissa fields according to the IEEE 754 single-precision format. Remember to account for the exponent bias and the implicit leading '1' of the mantissa." solution="
The hexadecimal representation is `0xC1700000`.
Step 1: Convert from Hexadecimal to Binary
We convert each hex digit to its 4-bit binary equivalent:
- C → 1100
- 1 → 0001
- 7 → 0111
- 0 → 0000
The full 32-bit binary pattern is:
`1100 0001 0111 0000 0000 0000 0000 0000`
Step 2: Deconstruct the IEEE 754 Fields
- Sign Bit (S): The first bit is $S = 1$, which indicates a negative number.
- Exponent (E): The next 8 bits are $10000010_2$. The decimal value is $E = 130$.
- Mantissa (F): The remaining 23 bits are $11100000000000000000000$.
Step 3: Calculate the Actual Exponent
For single-precision, the bias is 127.
The actual exponent is $e = 130 - 127 = 3$.
Step 4: Reconstruct the Value
The number is normalized, so the value is given by the formula $V = (-1)^S \times 1.F \times 2^{e}$.
The mantissa part is $1.F = 1.111_2 = 1 + 0.5 + 0.25 + 0.125 = 1.875$.
Now, we apply the exponent:
$V = -1.875 \times 2^3 = -15.0$
Therefore, the correct decimal value is $-15.0$.
"
:::
:::question type="NAT" question="Consider a hypothetical 12-bit floating-point representation with 1 bit for the sign, 5 bits for the exponent using a bias of 15, and 6 bits for the mantissa. Calculate the total number of distinct, positive, normalized floating-point numbers that can be represented in this format." answer="1920" hint="The number of representable values depends on the number of possible combinations of the exponent and mantissa fields. Remember that certain exponent values are reserved for special cases (zero, denormalized, infinity, NaN) and are not used for normalized numbers." solution="
Step 1: Analyze the Format
- Total bits = 12
- Sign bits = 1 (We are only considering positive numbers, so this is fixed to 0).
- Exponent bits = 5
- Mantissa bits = 6
Step 2: Determine the Range of Valid Exponent Fields for Normalized Numbers
The exponent field has 5 bits, so it can represent decimal values from 0 to $2^5 - 1 = 31$.
In any IEEE-like standard, the exponent field of all 0s and all 1s are reserved.
- Exponent `00000` (decimal 0) is reserved for zero and denormalized numbers.
- Exponent `11111` (decimal 31) is reserved for Infinity and NaN.
The range of biased exponent values for normalized numbers is therefore from 1 to 30, inclusive.
The number of valid exponent patterns is $30 - 1 + 1 = 30$.
Step 3: Determine the Number of Mantissa Combinations
The mantissa field has 6 bits. For each valid exponent, any combination of these 6 bits represents a unique fractional part.
The number of possible mantissa patterns is $2^6 = 64$.
Step 4: Calculate the Total Number of Normalized Positive Numbers
The total count is the product of the number of valid exponent patterns and the number of possible mantissa patterns.
$30 \times 64 = 1920$
The final answer is 1920.
"
:::
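The counting argument in the solution above can be confirmed by brute force over the hypothetical 1-5-6 format:

```python
# Count distinct positive normalized values in the hypothetical 12-bit format:
# 1 sign bit (fixed to 0 for positives), 5 exponent bits (bias 15), 6 mantissa bits.
count = 0
for E in range(2 ** 5):            # all possible 5-bit exponent fields
    if E == 0 or E == 2 ** 5 - 1:
        continue                   # reserved: zero/denormals and infinity/NaN
    count += 2 ** 6                # every 6-bit mantissa pattern is distinct
print(count)  # 1920
```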
:::question type="MSQ" question="Which of the following statements regarding the IEEE 754 single-precision floating-point standard is/are correct? (This is a Multiple Select Question)" options=["The bit pattern for +0.0 is identical to the bit pattern for -0.0.", "The gap between any two consecutive representable numbers is uniform across the entire number line.", "The number of representable values between and (for valid ) is constant.", "Denormalized numbers allow for a 'gradual underflow' by representing values smaller than the smallest normalized number."] answer="C,D" hint="Consider the structure of the IEEE 754 representation. How does the sign bit function? How does the exponent affect the spacing of numbers? What is the specific purpose of denormalized numbers?" solution="
Let us evaluate each statement:
- A: Incorrect. The two patterns differ in the sign bit: +0.0 is `0x00000000` while -0.0 is `0x80000000`. (They compare as numerically equal, but the bit patterns are not identical.)
- B: Incorrect. The gap between consecutive representable numbers is $2^{e-23}$, which grows with the exponent; spacing is uniform only within a single binade, not across the entire number line.
- C: Correct. Every interval $[2^e, 2^{e+1})$ for a valid exponent $e$ contains exactly $2^{23}$ representable values, one per fraction pattern, so the count per binade is constant.
- D: Correct. Denormalized numbers use $E = 0$ with an implicit leading 0 and a fixed exponent of $-126$, filling the gap between zero and the smallest normalized number; this is gradual underflow.
Thus, the correct statements are C and D.
"
:::
:::question type="NAT" question="The decimal value $-90$ is represented in the IEEE 754 single-precision format. What is the decimal value of the 8-bit biased exponent field?" answer="133" hint="First, convert the absolute decimal value to binary. Then, normalize the binary number to the form $1.F \times 2^e$. Finally, calculate the biased exponent using the formula $E = e + 127$." solution="
Step 1: Convert the absolute decimal value to binary.
The integer part is $90 = 1011010_2$.
Since there is no fractional part, $|-90| = 1011010_2$.
Step 2: Normalize the binary number.
We need to express the number in the form $1.F \times 2^e$. We move the binary point 6 places to the left.
$1011010_2 = 1.011010_2 \times 2^6$
The actual exponent is $e = 6$.
Step 3: Determine the sign bit and mantissa.
The number is negative, so the sign bit $S = 1$.
The mantissa consists of the bits to the right of the binary point in the normalized form, padded to 23 bits: $F = 01101000000000000000000$.
Step 4: Calculate the biased exponent.
For the IEEE 754 single-precision format, the bias is 127.
The biased exponent is calculated as:
$E = e + 127 = 6 + 127 = 133$
The decimal value of the exponent field is 133.
For completeness, the 8-bit binary representation of the biased exponent is $10000101_2$. The full 32-bit pattern for $-90$ would be:
`1 10000101 01101000000000000000000`.
"
:::
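The exponent field worked out in the last solution can be sanity-checked in Python by packing -90.0 (the value whose full pattern `1 10000101 01101000000000000000000` appears above) and masking out bits 30..23:

```python
import struct

# Bit pattern of -90.0 in single precision, as an unsigned 32-bit integer.
bits = struct.unpack(">I", struct.pack(">f", -90.0))[0]
exponent_field = (bits >> 23) & 0xFF
print(f"{bits:08X}")   # C2B40000
print(exponent_field)  # 133
```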
---
What's Next?
Having completed our exploration of Number Representations, we have established a firm foundation in how data is encoded at the most fundamental level. These concepts are not isolated; rather, they are the essential prerequisites for understanding the hardware that manipulates this data.
Key Connections:
- Relation to Previous Learning: This chapter builds directly upon your knowledge of basic number systems (binary, hexadecimal) and integer representations (such as 2's complement). Floating-point representation is the logical and necessary extension required to handle real numbers, which are ubiquitous in scientific and engineering computation.
- Foundation for Digital Logic: The bit-level structures we have analyzed, particularly the IEEE 754 format, are directly implemented in hardware. Your understanding of how a number is partitioned into sign, exponent, and mantissa is crucial for the next chapters in Digital Logic, where you will study the design of arithmetic circuits. You will see how specialized logic is required to handle exponent addition, mantissa alignment, and normalization within an Arithmetic Logic Unit (ALU).
- Bridge to Computer Organization and Architecture: The principles of floating-point arithmetic form a cornerstone of Computer Organization. The performance of a processor, especially for scientific workloads, is heavily dependent on its Floating-Point Unit (FPU). Understanding representation errors like overflow, underflow, and precision loss provides context for concepts like instruction set design, pipelining of arithmetic operations, and the architectural differences between CPUs and GPUs.