Database Design and Modeling

Overview

The design of a database is the foundational blueprint upon which all subsequent data management operations are built. Before any data can be stored, queried, or managed, a logical and coherent structure must be established. This chapter is dedicated to the principles of database design and modeling, which form the critical bridge between real-world requirements and a functional database system. We shall explore the methodologies for abstracting complex information into structured models, focusing first on the conceptual level and then proceeding to the logical implementation.

For the GATE aspirant, a mastery of these concepts is non-negotiable. Questions derived from this topic frequently appear and are designed to test a candidate's core competency in translating problem descriptions into viable database schemas. The ability to correctly interpret requirements, construct an Entity-Relationship (ER) diagram, and subsequently map it to a set of relations is a recurring theme in the examination. A thorough understanding of the principles discussed herein is therefore paramount for achieving a high score.

---

Chapter Contents

| # | Topic | What You'll Learn |
|---|-------|-------------------|
| 1 | ER-Model | Conceptual modeling with entities and relationships. |
| 2 | Relational Model | Structuring data into tables with constraints. |

---

Learning Objectives

❗ By the End of This Chapter

After completing this chapter, you will be able to:

Construct Entity-Relationship (ER) diagrams to model real-world scenarios.

Identify and classify entities, attributes, relationships, and participation constraints.

Translate a conceptual ER-Model into a corresponding relational schema.

Define and apply integrity constraints, such as primary and foreign keys, in the Relational Model.

---

We now turn our attention to the ER-Model...

Part 1: ER-Model

Introduction

The Entity-Relationship (ER) Model serves as a foundational tool in conceptual database design. It provides a high-level, abstract view of the data that is to be stored in a database, focusing on the entities involved, the attributes that describe them, and the relationships that exist among them. By representing these components graphically, the ER model facilitates communication between database designers and end-users, ensuring that the proposed database structure accurately reflects the real-world domain it aims to model.

For the GATE examination, a proficient understanding of the ER model is essential for problems related to database schema design and the subsequent conversion of a conceptual model into a logical, relational model. We will explore the fundamental constructs of the ER model, including entities, attributes, relationships, and the constraints that govern them. This conceptual clarity is the first and most critical step toward designing a robust and efficient database system.

📖 Entity-Relationship Model

The Entity-Relationship (ER) Model is a high-level conceptual data model that describes the structure of a database. It represents real-world objects as entities, the properties of these entities as attributes, and the associations between entities as relationships. The graphical representation of this model is known as an ER diagram.

---

Key Concepts

The ER model is built upon three primary building blocks: entities, attributes, and relationships. Let us examine each in detail.

1. Entities and Entity Sets

An entity is a distinguishable real-world object or concept. For instance, in a university database, specific students, professors, or courses would be entities. An entity set is a collection of similar entities. The entity set `Student` would contain all student entities in the university.

Representation: An entity set is represented by a rectangle in an ER diagram.

2. Attributes

Attributes are the properties that describe an entity. For a `Student` entity, attributes might include `RollNo`, `Name`, and `DateOfBirth`. We classify attributes into several types.

Simple vs. Composite: A simple attribute is atomic and cannot be further subdivided (e.g., `RollNo`). A composite attribute can be divided into smaller sub-parts (e.g., `Name` can be composed of `FirstName` and `LastName`).
Single-Valued vs. Multi-Valued: A single-valued attribute holds a single value for an entity (e.g., `DateOfBirth`). A multi-valued attribute can hold multiple values (e.g., `PhoneNumber`, as a student may have more than one). A multi-valued attribute is depicted by a double-lined oval.
Stored vs. Derived: A stored attribute is one whose value is explicitly stored in the database (e.g., `DateOfBirth`). A derived attribute is one whose value can be calculated from other attributes (e.g., `Age` can be derived from `DateOfBirth`). A derived attribute is shown with a dashed oval.
Key Attribute: A key attribute (or primary key) is an attribute whose value uniquely identifies each entity in an entity set. It is represented by an oval with the attribute name underlined.

Key

Multi-valued

Derived

Composite

Sub1

Sub2

3. Relationships and Relationship Sets

A relationship is an association among two or more entities. For example, `Enrolls_In` could be a relationship associating a `Student` entity with a `Course` entity. A relationship set is a collection of similar relationships.

Representation: A relationship set is represented by a diamond shape.
Degree of Relationship: The number of participating entity sets. A binary relationship has a degree of two, a ternary relationship has a degree of three, and so on. Binary relationships are the most common.

4. Cardinality Ratios and Participation Constraints

These constraints define the rules of association between entities.

Cardinality Ratio: Specifies the maximum number of relationship instances an entity can participate in. The common ratios for a binary relationship between entity sets A and B are:

- One-to-One (1:1): An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A. - One-to-Many (1:N): An entity in A can be associated with any number of entities in B, but an entity in B can be associated with at most one entity in A. - Many-to-Many (M:N): An entity in A can be associated with any number of entities in B, and an entity in B can be associated with any number of entities in A.

Participation Constraint: Specifies whether the existence of an entity depends on its being related to another entity via the relationship.

- Total Participation: Every entity in the entity set must participate in at least one relationship instance. This is represented by a double line from the entity set to the relationship. - Partial Participation: Some entities in the entity set may not participate in any relationship instance. This is represented by a single line.

Entity A

Entity B

Total

Partial

M
N

5. Weak Entity Sets

A weak entity set is one whose existence is dependent on another entity set, called the identifying or owner entity set. A weak entity does not have a primary key of its own; it has a partial key (discriminator) that, when combined with the primary key of the owner entity, uniquely identifies a weak entity instance.

Representation: A weak entity set is shown by a double-lined rectangle, and its identifying relationship is shown by a double-lined diamond. The partial key of the weak entity is underlined with a dashed line.

Worked Example:

Problem: Design an ER diagram for a university database where a `Professor` can teach multiple `Courses`. Each `Professor` has a unique `Prof_ID`, a `Name`, and a `Salary`. Each `Course` has a unique `Course_ID`, a `Title`, and `Credits`. The relationship is named `Teaches`. A professor must teach at least one course, but a course might not have a professor assigned yet.

Solution:

Step 1: Identify Entity Sets
We have two entity sets: `Professor` and `Course`.

Step 2: Identify Attributes

`Professor`: `Prof_ID` (key), `Name`, `Salary`.

`Course`: `Course_ID` (key), `Title`, `Credits`.

Step 3: Identify the Relationship
The relationship is `Teaches` between `Professor` and `Course`.

Step 4: Determine Cardinality and Participation

Cardinality: A `Professor` can teach multiple `Courses` (N), and a `Course` is taught by one `Professor` (1). Thus, the cardinality is 1:N from `Professor` to `Course`.

Participation: A professor must teach at least one course (Total Participation of `Professor` in `Teaches`). A course might not have a professor assigned (Partial Participation of `Course` in `Teaches`).

Step 5: Draw the ER Diagram

Professor

Course

Teaches

---

Problem-Solving Strategies

A common GATE question related to the ER model involves converting an ER diagram into a relational schema (a set of tables). The number of tables required is a frequent point of inquiry.

💡 ER to Relational Mapping: Table Count

To determine the minimum number of tables required for an ER diagram:

Strong Entity Sets: Each strong entity set maps to its own table.

Multi-valued Attributes: Each multi-valued attribute requires a separate table, consisting of the primary key of the parent entity and the attribute itself.

Weak Entity Sets: A weak entity set is typically merged with its owner entity set if the relationship is one-to-one. More commonly, it requires its own table that includes the primary key of the owner entity set as a foreign key.

Relationships:

- 1:1 Relationship: Can be merged into one of the entity tables by placing the primary key of one as a foreign key in the other. No new table is needed.
- 1:N Relationship: No new table. The primary key of the entity on the '1' side is placed as a foreign key in the table for the entity on the 'N' side.
- M:N Relationship: Always requires a new table. This table's primary key is a composite of the primary keys of the participating entity sets.

---

Common Mistakes

⚠️ Avoid These Errors

❌ Confusing Cardinality and Participation: Cardinality refers to the maximum number of associations (1 or Many), while participation refers to the minimum (0 for partial, 1 for total). Do not use them interchangeably.
❌ Incorrectly Mapping M:N Relationships: Attempting to map an M:N relationship without creating a new table is a fundamental error. You cannot simply place a foreign key in one of the entity tables, as this would violate normalization principles and fail to capture the relationship correctly.
❌ Ignoring Weak Entities: Forgetting that a weak entity's primary key is a composite key, formed from its partial key and the primary key of its owner entity. This is crucial for correct relational schema conversion.

---

Practice Questions

:::question type="MCQ" question="In an ER diagram, an attribute that can be calculated from other related attributes is known as a:" options=["Composite attribute","Multi-valued attribute","Derived attribute","Key attribute"] answer="Derived attribute" hint="Consider the attribute 'Age' derived from 'Date of Birth'." solution="A derived attribute is one whose value is not stored directly but is computed from other attributes. For example, Age can be derived from the stored attribute Date of Birth. A composite attribute can be broken down (e.g., Name into FirstName, LastName). A multi-valued attribute can hold multiple values (e.g., PhoneNumbers). A key attribute uniquely identifies an entity."
:::

:::question type="NAT" question="An ER model consists of two strong entity sets, E1 and E2, which are related through a many-to-many relationship R. Entity set E1 has exactly one multi-valued attribute, and E2 has only simple, single-valued attributes. What is the minimum number of tables required to convert this ER model into a relational schema?" answer="4" hint="Count the tables needed for each strong entity, the M:N relationship, and the multi-valued attribute separately." solution="
Step 1: Map the strong entity set E1. A table is created for E1 containing its primary key and all simple attributes. This accounts for 1 table.

Step 2: Map the strong entity set E2. A second table is created for E2, containing its primary key and all its attributes. This accounts for 1 table.

Step 3: Map the many-to-many relationship R. An M:N relationship always requires a new table to store the association. This table's primary key is the combination of the primary keys of E1 and E2. This accounts for 1 table.

Step 4: Map the multi-valued attribute of E1. A multi-valued attribute requires its own separate table. This table will contain the primary key of E1 (as a foreign key) and the values of the multi-valued attribute. This accounts for 1 table.

Step 5: Sum the total number of tables.
Total tables = 1 (for E1) + 1 (for E2) + 1 (for R) + 1 (for multi-valued attribute) = 4.

Result: \boxed{4}
"
:::

:::question type="MSQ" question="Which of the following statements about the ER model are correct?" options=["A weak entity set always has total participation in its identifying relationship.","A many-to-many relationship is mapped to a new relation (table) in the relational model.","A derived attribute must be physically stored in the database table.","The degree of a relationship is the number of attributes it possesses."] answer="A,B" hint="Evaluate each statement based on the core definitions of ER model components and their mapping to the relational model." solution="

A: This is correct. By definition, a weak entity cannot exist without its owner entity. Therefore, every weak entity instance must be associated with an owner entity instance, implying total participation.

B: This is correct. An M:N relationship cannot be represented by just placing foreign keys in the entity tables. It requires a separate junction or bridge table whose primary key is a composite of the primary keys of the participating entities.

C: This is incorrect. A derived attribute is, by definition, not stored. Its value is calculated on-the-fly from other stored attributes to save space and avoid redundancy.

D: This is incorrect. The degree of a relationship is the number of participating entity sets, not the number of attributes. For example, a binary relationship involves two entity sets.

"
:::

---

Summary

❗ Key Takeaways for GATE

Core Components: The ER model is fundamentally composed of entities (rectangles), attributes (ovals), and relationships (diamonds). Master their graphical representations.

Constraints are Crucial: Cardinality ratios (1:1, 1:N, M:N) and participation constraints (total/partial) define the business rules of the database. Pay close attention to the double line for total participation.

ER-to-Relational Mapping: The most testable concept is converting an ER diagram to a set of tables. Remember the key rules: a new table is required for each M:N relationship and for each multi-valued attribute.

---

What's Next?

💡 Continue Learning

This topic connects to:

Relational Model: The ER model is the conceptual blueprint. The Relational Model is the logical implementation. Understanding how ER constructs (entities, relationships) are translated into relational constructs (tables, primary keys, foreign keys) is a critical skill.

Normalization: After converting an ER model to a set of tables, the next step is to ensure the schema is well-designed by applying normalization principles ( $1NF$ , $2NF$ , $3NF$ , $BCNF$ ) to eliminate redundancy and prevent data anomalies.

Master these connections for comprehensive GATE preparation!

---

💡 Moving Forward

Now that you understand ER-Model, let's explore Relational Model which builds on these concepts.

---

Part 2: Relational Model

Introduction

The Relational Model provides the theoretical foundation for the vast majority of modern database systems. It is a model for database management based on first-order predicate logic, first formulated by E. F. Codd. Its primary virtue lies in its simplicity and rigorous mathematical underpinnings, which allow for data to be represented in a structured, intuitive manner. In this model, data is organized into tables, referred to as relations, which consist of rows, or tuples, and columns, or attributes.

Understanding the relational model is not merely an academic exercise; it is fundamental to designing, querying, and managing relational databases effectively. For the GATE examination, a firm grasp of its core components—relations, keys, and integrity constraints—is essential for tackling more advanced topics such as relational algebra, SQL, and normalization. We shall now proceed to formally define these constituent elements.

📖 Relation

A relation is a two-dimensional table used to store a collection of related data. Mathematically, a relation is a subset of the Cartesian product of a list of domains. A relation consists of a schema and an instance. The schema defines the structure (column names and data types), while the instance is the set of data present in the relation at a given point in time.

---

Key Concepts

The relational model is characterized by a precise set of terminologies. Let us consider a sample relation, `STUDENTS`, to illustrate these concepts.

`STUDENTS` Relation Instance:

| StudentID | Name | Dept | CGPA |
| :--- | :--- | :--- | :--- |
| 101 | Anjali | CS | 8.5 |
| 102 | Vikram | EE | 9.1 |
| 103 | Priya | CS | 8.9 |
| 104 | Rohan | ME | 7.8 |

1. Basic Terminology

Attribute: An attribute is a named column of a relation that represents a property of the entity. In our `STUDENTS` relation, ` $StudentID$ `, ` $Name$ `, ` $Dept$ `, and ` $CGPA$ ` are attributes.

Domain: The domain of an attribute is the set of all possible atomic (indivisible) values that the attribute can take. For example, the domain of ` $CGPA$ ` might be the set of all real numbers between 0.0 and 10.0. The domain of ` $Dept$ ` could be {'CS', 'EE', 'ME', 'CE'}.

Tuple: A tuple is a row in a relation. It represents a single data record or an entity. In the `STUDENTS` relation, the row `(102, 'Vikram', 'EE', 9.1)` is a tuple.

Relation Schema: The schema specifies the name of the relation and the set of its attributes and their corresponding domains. The schema for our example is `STUDENTS( $StudentID$ : Integer, $Name$ : String, $Dept$ : String, $CGPA$ : Real)`.

Relation Instance: A relation instance is a finite set of tuples. The table shown above is an instance of the `STUDENTS` relation. Importantly, a relation is a set of tuples, which implies that no two tuples can be identical and the ordering of tuples is immaterial.

2. Properties of a Relation

We can quantify the size and structure of a relation using two key metrics: degree and cardinality.

📐 Degree of a Relation

The degree of a relation is the number of attributes (columns) in its schema.

Degree = |\{A_1, A_2, \dots, A_n\}| = n

Variables:

$\{A_1, A_2, \dots, A_n\}$ is the set of attributes in the relation schema.

When to use: To determine the "width" or complexity of the relation's structure.

📐 Cardinality of a Relation

The cardinality of a relation is the number of tuples (rows) in its instance.

Cardinality = |\{t_1, t_2, \dots, t_m\}| = m

Variables:

$\{t_1, t_2, \dots, t_m\}$ is the set of tuples in the relation instance.

When to use: To determine the "size" or number of records in the relation at a specific time.

Worked Example:

Problem: For the `STUDENTS` relation instance provided above, determine its degree and cardinality.

Solution:

Step 1: Identify the attributes of the relation.
The attributes are ` $StudentID$ `, ` $Name$ `, ` $Dept$ `, and ` $CGPA$ `.

Step 2: Count the number of attributes to find the degree.
There are 4 attributes.

Degree = 4

Step 3: Identify the tuples in the relation instance.
The tuples are (101, ...), (102, ...), (103, ...), and (104, ...).

Step 4: Count the number of tuples to find the cardinality.
There are 4 tuples.

Cardinality = 4

Answer: The degree of the `STUDENTS` relation is 4, and its cardinality is 4.

3. Keys in the Relational Model

Keys are a fundamental concept used to uniquely identify tuples within a relation and to establish links between relations.

Superkey: A superkey is a set of one or more attributes that, taken collectively, can uniquely identify a tuple in a relation. For the `STUDENTS` relation, ` $\{StudentID\}$ ` is a superkey. Similarly, ` $\{StudentID, Name\}$ ` is also a superkey, as is ` $\{StudentID, Name, Dept, CGPA\}$ `.

Candidate Key: A candidate key is a minimal superkey. This means it is a superkey from which no attribute can be removed without it losing its unique identification property. In our example, ` $\{StudentID\}$ ` is a candidate key. If we assume student names are not unique, then ` $\{Name\}$ ` is not a candidate key.

Primary Key: The primary key is one of the candidate keys chosen by the database designer to be the principal means of uniquely identifying a tuple. It is a matter of convention and design choice. In the `STUDENTS` relation, ` $StudentID$ ` would be the natural choice for the primary key.

Foreign Key: A foreign key is an attribute or a set of attributes in one relation that refers to the primary key of another relation (or, in some cases, the same relation). It is the mechanism used to enforce relationships between tables. For instance, if we had another relation `COURSES_ENROLLED( $CourseID$ , $StudentID$ )`, the ` $StudentID$ ` attribute in this new relation would be a foreign key referencing the ` $StudentID$ ` primary key in the `STUDENTS` relation.

4. Integrity Constraints

Integrity constraints are rules that the data in a database must adhere to, ensuring data accuracy and consistency.

Entity Integrity Constraint: This constraint states that no attribute of a primary key can have a `NULL` value. This is because the primary key is used to identify individual tuples, and a `NULL` value is undefined, making unique identification impossible.

Referential Integrity Constraint: This constraint is specified between two relations and is used to maintain consistency among tuples. It dictates that if a relation `

R1

` has a foreign key that references the primary key of a relation `

R2

`, then for every tuple in `

R1

`, the value of the foreign key must either be `NULL` or match a value in the primary key of some tuple in `

R2

Domain Constraints: These are the most basic constraints, specifying that the value of every attribute in every tuple must be from the domain of that attribute. For example, the `

CGPA

` value must be a real number between 0.0 and 10.0.

---

Problem-Solving Strategies

💡 GATE Strategy: Key Identification

When asked to identify keys from a relation schema:

Look for attributes that are explicitly stated or implied to be unique (e.g., ID, RollNumber, SSN). These are strong candidates for the primary key.

A superkey is any set of attributes containing a candidate key.

A candidate key is a minimal superkey. To test if a superkey is minimal, check if removing any single attribute from it makes it no longer a superkey. If so, it is minimal.

---

Common Mistakes

⚠️ Avoid These Errors

❌ Confusing degree and cardinality. Degree is the number of columns (attributes), which is fixed by the schema. Cardinality is the number of rows (tuples), which changes as data is added or removed.
❌ Assuming a superkey must be a single attribute. A superkey is a set of attributes and can contain multiple attributes.
❌ Believing all candidate keys are single attributes. A candidate key can also be composite (consisting of multiple attributes).
❌ Forgetting that a primary key cannot be `NULL`. This is the essence of the entity integrity constraint.

---

Practice Questions

:::question type="MCQ" question="A relation R has a degree of 5 and a cardinality of 20. If 3 tuples are deleted and 2 attributes are added, what will be the new degree and cardinality?" options=["Degree = 7, Cardinality = 17", "Degree = 5, Cardinality = 17", "Degree = 7, Cardinality = 20", "Degree = 2, Cardinality = 23"] answer="Degree = 7, Cardinality = 17" hint="Recall the definitions of degree and cardinality. Degree relates to the schema (columns), and cardinality relates to the instance (rows)." solution="
Step 1: Analyze the initial state.
Initial Degree = 5 (number of attributes)
Initial Cardinality = 20 (number of tuples)

Step 2: Analyze the effect of deleting tuples.
Deleting 3 tuples affects the cardinality.
New Cardinality = 20 - 3 = 17

Step 3: Analyze the effect of adding attributes.
Adding 2 attributes (columns) affects the degree. This is a schema modification.
New Degree = 5 + 2 = 7

Result: The new degree is 7 and the new cardinality is 17.
Answer: \boxed{\text{Degree = 7, Cardinality = 17}}
"
:::

:::question type="NAT" question="Consider a relation `EMPLOYEE(EmpID, Name, DeptID, ManagerID)`. The `EmpID` is the primary key. `DeptID` is a foreign key referencing the `DEPARTMENT` table. `ManagerID` is a foreign key that references `EmpID` within the same `EMPLOYEE` table. What is the degree of this relation?" answer="4" hint="The degree of a relation is the count of its attributes or columns." solution="
Step 1: Identify the attributes in the relation schema.
The schema is `EMPLOYEE(EmpID, Name, DeptID, ManagerID)`.

Step 2: Count the number of attributes.
The attributes are:

EmpID

Name

DeptID

ManagerID

Result: There are 4 attributes. Therefore, the degree of the relation is 4.
Answer: \boxed{4}
"
:::

:::question type="MSQ" question="Which of the following statements about keys in the relational model are correct?" options=["Every superkey is a candidate key.", "Every candidate key is a superkey.", "A primary key must be a candidate key.", "A relation can have multiple primary keys."] answer="Every candidate key is a superkey.,A primary key must be a candidate key." hint="Consider the definitions of superkey, candidate key, and primary key. Think about minimality and the process of selection." solution="

Option A: "Every superkey is a candidate key." This is incorrect. A candidate key is a minimal superkey. For example, if `{ID}` is a candidate key, then `{ID, Name}` is a superkey but not a candidate key.

Option B: "Every candidate key is a superkey." This is correct by definition. A candidate key is a specific type of superkey (one that is minimal).

Option C: "A primary key must be a candidate key." This is correct. The primary key is chosen by the database designer from the set of available candidate keys.

Option D: "A relation can have multiple primary keys." This is incorrect. A relation can have multiple candidate keys, but only one is designated as the primary key.

Answer: \boxed{\text{Every candidate key is a superkey.,A primary key must be a candidate key.}}
"
:::

:::question type="MCQ" question="The constraint that ensures that no primary key attribute can have a NULL value is known as:" options=["Domain Constraint", "Entity Integrity Constraint", "Referential Integrity Constraint", "Functional Dependency"] answer="Entity Integrity Constraint" hint="Recall the three main types of integrity constraints and their specific purposes." solution="

Domain Constraint: Ensures values are from a specified domain (e.g., CGPA between 0 and 10).

Entity Integrity Constraint: Specifically states that no part of a primary key can be NULL.

Referential Integrity Constraint: Governs the relationship between a foreign key and a primary key.

Functional Dependency: A concept related to normalization, not a basic integrity constraint in this context.

Therefore, the correct answer is the Entity Integrity Constraint.
Answer: \boxed{\text{Entity Integrity Constraint}}
"
:::

---

Summary

❗ Key Takeaways for GATE

Structure: Data is stored in relations (tables), which are composed of tuples (rows) and attributes (columns). A relation is formally a set of tuples, implying no duplicates and no inherent order.

Metrics: The degree is the number of attributes (columns) and is a property of the schema. The cardinality is the number of tuples (rows) and is a property of the instance.

Keys are Critical: A superkey uniquely identifies a tuple. A candidate key is a minimal superkey. The primary key is the chosen candidate key to uniquely identify tuples and cannot be NULL (Entity Integrity). A foreign key creates relationships between tables (Referential Integrity).

---

What's Next?

💡 Continue Learning

This topic provides the essential vocabulary for more advanced database concepts.

Relational Algebra & Calculus: The relational model is the structure upon which the query languages of relational algebra and calculus operate. Understanding tuples and attributes is crucial for writing and optimizing queries.

Functional Dependencies & Normalization: The concept of keys, particularly candidate keys, is the starting point for understanding functional dependencies, which are the basis for database normalization (1NF, 2NF, 3NF, BCNF).

---

Chapter Summary

📖 Database Design and Modeling - Key Takeaways

In this chapter, we have laid the groundwork for all subsequent topics in database management by exploring the principles of conceptual and logical database design. A thorough understanding of these foundational concepts is paramount for success. The essential points to retain are as follows:

ER Model as a Blueprint: We have established the Entity-Relationship (ER) model as a high-level, semantic tool for conceptual database design. Its core constructs—entities (strong and weak), attributes (simple, composite, multi-valued, derived), and relationships—serve as the abstract blueprint before any implementation is considered.

The Importance of Constraints: The precision of a database model is determined by its constraints. We have seen that cardinality ratios (1:1, 1:N, M:N) and participation constraints (total and partial) are critical for accurately capturing real-world business rules. These constraints directly dictate the structure of the resulting relational schema.

Mapping ER to Relational Schema: A central skill developed in this chapter is the algorithmic translation of an ER diagram into a set of relational tables. The minimum number of tables required is a function of the entities and the types of relationships between them. A many-to-many relationship, for instance, will always necessitate a distinct relation in the schema.

Weak Entities and Identifying Relationships: We have distinguished weak entities, which lack a primary key of their own and are existence-dependent on a strong (owner) entity. In the relational model, the relation corresponding to a weak entity will have a primary key composed of the primary key of the owner entity and the partial key of the weak entity itself.

Role of Keys in the Relational Model: We have defined the hierarchy of keys—superkey, candidate key, and primary key—which are fundamental to ensuring data uniqueness. Furthermore, the foreign key acts as the logical pointer that implements relationships and enforces referential integrity, preventing data inconsistencies between related tables.

Advanced Modeling with Generalization/Specialization: The ISA hierarchy (generalization/specialization) allows for the modeling of entity subtypes. The choice of mapping this structure to relations—whether using one table with nulls, a table for each subclass, or a table for each class—involves a trade-off between performance, storage efficiency, and ease of constraint enforcement.

---

Chapter Review Questions

:::question type="MCQ" question="Consider an ER model with a many-to-one relationship $R$ from entity set $E_1$ to entity set $E_2$ . The participation of $E_1$ in $R$ is total, while the participation of $E_2$ is partial. What is the minimum number of tables required to represent this scenario and what is the most appropriate design?" options=["A. Three tables: one for $E_1$ , one for $E_2$ , and one for the relationship $R$ .","B. Two tables: The schemas for $E_2$ and $R$ are merged into a single table.","C. Two tables: The schemas for $E_1$ and $R$ are merged into a single table.","D. One table: The schemas for $E_1$ , $E_2$ , and $R$ are all merged into a single table."] answer="C" hint="Recall the rules for mapping relationships based on cardinality. Merging is typically performed on the 'many' side of a relationship to avoid data redundancy and multi-valued attributes." solution="
The problem describes a many-to-one relationship from $E_1$ to $E_2$ . This can be represented as:

E_1 \xrightarrow{N:1} E_2

Standard Mapping: A non-optimized approach would create a table for each entity and each relationship. This would result in three tables:

T_{E1}

for entity set

E_1

.
*

T_{E2}

for entity set

E_2

.
*

T_R

for relationship

R

, containing the primary keys of

E_1

and

E_2

Optimized Mapping: For one-to-many or many-to-one relationships, we can optimize the design by merging the relationship table with one of the entity tables. The rule is to merge on the 'many' side.

* In this case,

E_1

is on the 'many' side. We can merge the relationship table

T_R

into the table for

E_1

.
* This involves adding the primary key of

E_2

as a foreign key to the table for

E_1

. Let

PK(E_1)

and

PK(E_2)

be the primary keys.
* The resulting schema would be:

T_{E1\_R}(PK(E_1), \text{attributes of } E_1, PK(E_2), \text{attributes of } R)

T_{E2}(PK(E_2), \text{attributes of } E_2)

* This reduces the total number of tables to two.

Evaluating the Total Participation Constraint: The participation of

E_1

R

is total. This means every entity in

E_1

must participate in the relationship. In our merged schema for

T_{E1\_R}

, this translates to the foreign key column

PK(E_2)

having a `NOT NULL` constraint, which is an efficient and valid design.

Why not merge on the 'one' side? If we tried to merge

R

into the table for

E_2

(the 'one' side), a single entity from

E_2

could be related to many entities from

E_1

. To store this in the

T_{E2}

table, we would need to store multiple values of

PK(E_1)

for a single tuple in

T_{E2}

, which violates the First Normal Form (1NF).

Therefore, the optimal design requires two tables, with the relationship information merged into the table for the entity on the 'many' side, which is $E_1$ .
Answer: \boxed{\text{C}}
"
:::

:::question type="NAT" question="A relational schema

R(A, B, C, D, E, F)

is defined with the following set of functional dependencies:

F = \{ AB \rightarrow C, C \rightarrow D, D \rightarrow E, E \rightarrow F, F \rightarrow A \}

What is the total number of candidate keys for the relation

R

?" answer="5" hint="First, identify any essential attributes that must be part of every candidate key. An attribute is essential if it does not appear on the right-hand side of any functional dependency." solution="

Identify Essential Attributes: We first examine the set of functional dependencies

F

to find attributes that are never on the right-hand side (RHS).

* RHS Attributes:

\{C, D, E, F, A\}

* All attributes

\{A, B, C, D, E, F\}

except for

B

appear on the RHS of some FD.
* Therefore,

B

is an essential attribute and must be a part of every candidate key.

Compute Closures with the Essential Attribute: We will compute the closure of attribute sets that include

B

to check if they form a superkey.

(B)^+ = \{B\}

. This is not a superkey.
* Let's check two-attribute sets containing

B

Check $(AB)^+$ :

(AB)^+ = \{A, B\}

* Using

AB \rightarrow C

\{A, B, C\}

* Using

C \rightarrow D

\{A, B, C, D\}

* Using

D \rightarrow E

\{A, B, C, D, E\}

* Using

E \rightarrow F

\{A, B, C, D, E, F\}

. This is a superkey.
* Since neither

(A)^+

nor

(B)^+

is a superkey,

AB

is a candidate key.

Check $(BC)^+$ :

(BC)^+ = \{B, C\}

* Using

C \rightarrow D \rightarrow E \rightarrow F \rightarrow A

\{B, C, D, E, F, A\}

. This is a superkey.
* Since neither

(B)^+

nor

(C)^+

is a superkey,

BC

is a candidate key.

Check $(BD)^+$ :

(BD)^+ = \{B, D\}

* Using

D \rightarrow E \rightarrow F \rightarrow A

\{B, D, E, F, A\}

* Now we have

A

and

B

, so we use

AB \rightarrow C

\{A, B, C, D, E, F\}

. This is a superkey.
* Since neither

(B)^+

nor

(D)^+

is a superkey,

BD

is a candidate key.

Check $(BE)^+$ :

(BE)^+ = \{B, E\}

* Using

E \rightarrow F \rightarrow A

\{B, E, F, A\}

* Using

AB \rightarrow C

\{A, B, C, E, F\}

* Using

C \rightarrow D

\{A, B, C, D, E, F\}

. This is a superkey.
* Since neither

(B)^+

nor

(E)^+

is a superkey,

BE

is a candidate key.

Check $(BF)^+$ :

(BF)^+ = \{B, F\}

* Using

F \rightarrow A

\{B, F, A\}

* Using

AB \rightarrow C

\{A, B, C, F\}

* Using

C \rightarrow D

\{A, B, C, D, F\}

* Using

D \rightarrow E

\{A, B, C, D, E, F\}

. This is a superkey.
* Since neither

(B)^+

nor

(F)^+

is a superkey,

BF

is a candidate key.

The candidate keys are $AB, BC, BD, BE, BF$ . The total number of candidate keys is 5.
Answer: \boxed{5}
"
:::

:::question type="MCQ" question="An ER model contains a generalization hierarchy where `Person` is a superclass and `Employee` and `Student` are subclasses. The generalization is specified as overlapping and partial. `Person` has primary key `person_id`. `Employee` has attribute `salary` and `Student` has attribute `gpa`. Which of the following relational schemas is the most suitable representation?" options=["A. A single table: `Person(person_id, type, salary, gpa)`.","B. Two tables: `Employee(person_id, salary)` and `Student(person_id, gpa)`.","C. Three tables: `Person(person_id)`, `Employee(person_id, salary)`, and `Student(person_id, gpa)`.","D. One table: `Person(person_id, salary, gpa)` with boolean flags `is_employee` and `is_student`."] answer="C" hint="Consider the implications of 'overlapping' (a person can be both an employee and a student) and 'partial' (a person can be neither) constraints on the schema design." solution="
Let's analyze the constraints and evaluate the options:
* Overlapping: A single `Person` entity can be both an `Employee` and a `Student` at the same time.
* Partial: A `Person` entity might be neither an `Employee` nor a `Student`.

Option A: `Person(person_id, type, salary, gpa)`: A `type` attribute cannot handle the 'overlapping' constraint, as a person can have multiple types (e.g., 'Employee' and 'Student'). This design is only suitable for disjoint generalizations.

Option B: `Employee(person_id, salary)` and `Student(person_id, gpa)`: This design loses the superclass information. We cannot store details for a person who is neither an employee nor a student, which violates the 'partial' constraint requirement. Also, attributes common to all persons (e.g., `name`, `address`) would have to be redundantly stored in both tables.

Option C: `Person(person_id)`, `Employee(person_id, salary)`, `Student(person_id, gpa)`: This is the most flexible and accurate design for this scenario.

* The `Person` table stores information for all persons, regardless of whether they are employees or students. This correctly handles the 'partial' participation constraint.
* The `Employee` table stores `person_id`s for only those persons who are employees.
* The `Student` table stores `person_id`s for only those persons who are students.
* A person who is both an employee and a student will have their `person_id` appear in all three tables. This correctly handles the 'overlapping' constraint.
* The `person_id` in `Employee` and `Student` tables would be a foreign key referencing `Person(person_id)`.

Option D: `Person(person_id, salary, gpa)` with boolean flags: This is a single-table approach. It would lead to many `NULL` values for `salary` and `gpa`. For a person who is neither an employee nor a student, both `salary` and `gpa` would be `NULL`. While it can represent the scenario, it is inefficient in terms of storage and can lead to data anomalies, especially if subclasses have many attributes. Option C is a more normalized and robust design.

Therefore, the three-table approach is the most suitable representation for an overlapping and partial generalization.
Answer: \boxed{\text{C}}
"
:::

:::question type="NAT" question="An ER diagram consists of 3 strong entity sets $E_1, E_2$ , and $E_3$ , and one weak entity set $E_4$ which is dependent on $E_3$ . The relationships are as follows:

$R_1$ is a many-to-many relationship between $E_1$ and $E_2$ .

$R_2$ is the one-to-many identifying relationship from $E_3$ to $E_4$ .

$R_3$ is a one-to-one relationship between $E_1$ and $E_3$ , with total participation from $E_1$ 's side.

Calculate the minimum number of tables required to represent this ER diagram in a relational database." answer="5" hint="Map each component according to standard rules. Remember that M:N relationships and weak entities always require their own tables, while 1:1 and 1:N relationships can often be optimized." solution="
We determine the number of tables by analyzing each component of the ER diagram:

Strong Entity Sets ( $E_1, E_2, E_3$ ): Each strong entity set is mapped to its own table.

* Table for

E_1

T_{E1}

* Table for

E_2

T_{E2}

* Table for

E_3

T_{E3}

This gives us 3 tables so far.

Weak Entity Set ( $E_4$ ): A weak entity set always requires a separate table.

* Table for

E_4

T_{E4}

. The primary key of

T_{E4}

will be a composite of the primary key of its owner (

E_3

) and its own partial key.
This adds 1 more table, for a total of 4.

Relationship $R_1$ (M:N between $E_1$ and $E_2$ ): A many-to-many relationship always requires a new table to store the primary keys of the participating entities.

* Table for

R_1

T_{R1}

.
This adds 1 more table, for a total of 5.

Relationship $R_2$ (1:N identifying from $E_3$ to $E_4$ ): Since

R_2

is the identifying relationship for the weak entity

E_4

, its mapping is already incorporated into the table

T_{E4}

(by including the primary key of

E_3

as a foreign key and part of the primary key). No separate table is needed for

R_2

Relationship $R_3$ (1:1 between $E_1$ and $E_3$ ): A 1:1 relationship does not require a separate table. It can be implemented by placing the primary key of one entity as a foreign key in the other's table. Since the participation of

E_1

is total, every

E_1

must be related to an

E_3

. To enforce this without allowing `NULL`s, we can place the primary key of

E_3

as a foreign key in the

T_{E1}

table and apply a `NOT NULL` constraint. This does not add any new tables.

Counting the required tables: $T_{E1}, T_{E2}, T_{E3}, T_{E4}, T_{R1}$ .
The minimum number of tables required is 5.
Answer: \boxed{5}
"
:::

---

What's Next?

💡 Continue Your GATE Journey

Having completed Database Design and Modeling, you have established a firm foundation for the subsequent, more procedural chapters in Database Management Systems. The relational schema, which we have meticulously learned to design, is the static structure upon which all dynamic database operations are performed.

Key connections to upcoming chapters include:

Relational Algebra and SQL: The tables, keys, and integrity constraints you design here are the very structures you will manipulate in the next chapters. A well-designed schema is a prerequisite for writing clear, correct, and efficient queries using Relational Algebra and SQL. Your understanding of foreign keys, for example, is essential for performing JOIN operations.

Normalization: In the chapter on Normalization, we will learn formal techniques (1NF, 2NF, 3NF, BCNF) to analyze and refine the relational schemas produced from ER diagrams. Normalization is the process of eliminating data redundancy and undesirable anomalies, thereby improving the quality of the designs we have practiced here.

Transactions and Concurrency Control: The integrity constraints we defined in this chapter, such as entity integrity (via primary keys) and referential integrity (via foreign keys), must be maintained by the database system, especially during concurrent operations. The study of transactions will show how the DBMS guarantees that these critical design rules are never violated.

Database Design and Modeling

Database Design and Modeling

Overview

Chapter Contents

Learning Objectives

Part 1: ER-Model

Introduction

Key Concepts

1. Entities and Entity Sets

2. Attributes

3. Relationships and Relationship Sets

4. Cardinality Ratios and Participation Constraints

5. Weak Entity Sets

Problem-Solving Strategies

Common Mistakes

Practice Questions

Summary

What's Next?

Part 2: Relational Model

Introduction

Key Concepts

1. Basic Terminology

2. Properties of a Relation

3. Keys in the Relational Model

4. Integrity Constraints

Problem-Solving Strategies

Common Mistakes

Practice Questions

Summary

What's Next?

Chapter Summary

Chapter Review Questions

What's Next?

🎯 Key Points to Remember

Related Topics in Database Management and Warehousing

Data Preparation and Transformation

Data Warehouse Modeling

Querying and Data Integrity

Physical Database Organization

More Resources

Study Notes

Short Notes

Test Series

Mock Tests

Previous Year Papers

Chapter-wise PYQs

Chapter Practice

Why Choose MastersUp?

AI-Powered Plans

15,000+ Questions

Smart Analytics

Bookmark & Revise