Data Normalisation

Anuradha Mohanty
6 min readJun 28, 2021

Before starting with Normalization, we need to understand the concept of anomalies, why do data anomalies occur in a database.

Many kinds of anomalies occur during updation, insertion, deletion.

Updation Anamolies occur when one of the data fields is updated, which causes the previous value to be deleted.

In the table above, for the first row, the instructor is changed from Mr. Ganguly to Mr. Virat. However, the information about Mr. Virat doesn't exist.

Deletion Anamolies occur when the deletion of data in one field causes the data in other fields to be deleted.

In the table above, the course Product Management was deleted, which caused the other fields in the same row to be deleted. Student Information is also deleted here.

This occurs when data about multiple entities are stored in the same table. If information about one entity is deleted, then the information about another entity in the same row gets deleted.

Insertion Anamolies occur when data about one field is added, but the details about other fields are not available.

In the table above, the course Cloud Computing is added. However, the details about the student or instructor are not available.

How to handle such Anomalies?

  • Break the table into smaller ones, such that each table represents a single entity.
  • The smaller tables created are related to using foreign keys.
  • The attributes in a table can be separated into different tables by finding the functional and transitive dependencies among the attributes.

Hence, Data Normalization is all the steps mentioned above, which are used to remove anomalies in the database.

Need for Normalization

  • It is the process of organizing data in a database such that a piece of data is not repeated at various places.
  • Different business entities can be stored in different tables.
  • The information about one business entity is stored in one table.

First Normalization Form

Criteria for 1NF:

For a table to be in 1NF,

  • The smaller tables created are related to using foreign keys. every field in the table must contain not more than one value. If you store multiple values in one field, separating them by commas, then the relational database(RDBMS) will consider the entire line to be one string, not a list. Hence, the application written to retrieve the data from the table will have to convert this string to a list to access each data element separately.
  • every table must have a primary key or a composite key to uniquely identify the table. Unique identification of each data record in a table is important for querying the database for the right information.

Converting a Table to 1NF: Example

Problems with the Table:

there are multi-value columns

So,

the unique identification of a row was not possible before

So,

First Method: Is it possible to have single attribute keys in the table?

CustomerID: One Customer can rent many cars,

Car Number Plate: One Car can be rented multiple times.

Date Of Transactions: There can be many transactions in one day.

Second Method: Is it possible to have multiple attribute keys in the table?

Customer ID, Car Number Plate: One Customer can rent many cars.

Customer ID, Date Of Transaction: One Customer can rent two cars on the same day.

Car Number Plate, Date Of Transaction: One car can be rented only once a day.

Car Number Plate, Date Of Transaction together form the composite key.

Second Normalization Form

The second normal form removes partial functional dependencies in the table.

For a table to be in 2NF, it must be in 1NF.

To understand the Second Normalization Form, we need to understand functional dependencies.

2 records for Virat, one corresponds to C14, another C12, here Customer Name is dependent on Customer ID.

For attributes A and B,

  • If for every value of attribute A, there is only one value of attribute B, then attribute A can determine the value of attribute B.
  • If attribute A can determine attribute B, then attribute B is functionally dependent on attribute A.
  • A ->B; B is functionally dependent on A.

Consider two attributes: Customer ID and Customer Name

  • For C12, there is only one unique value of Customer Name Virat.
  • Customer ID -> Customer Name
  • For Virat, there are two values of Customer ID: C12 and C14. thus, Customer Name cannot determine Customer ID.
  • Customer Name is functionally dependent on Customer ID, but Customer ID is not functionally dependent on Customer Name.

If the non-prime attribute A is functionally dependent on the complete composite key formed by Band C, then attribute A is fully functionally dependent on the composite keys B and C.

If attribute B and attribute C form the composite key for a table and attribute D can be determined by the value of attribute B, then attribute B is not fully dependent on the composite key but is partially dependent on a part of the composite key B.

From the figure above,

If only the value of B is known:

  • A cannot be determined.
  • D can be determined.

If only the value of C is known:

  • A cannot be determined,
  • E and F can be determined.

If the value of both B and C are unknown:

  • A can be determined.
  • D, E, and F can be determined.

Functional dependencies mapping for the table above is:

Criteria for 2NF:

  • For a table to be in 2NF, the table must be first in 1NF.
  • For a table to be in 2NF, all the non-prime attributes must be fully functionally dependent on the composite key or the table's primary key.
  • For a table to be in 2NF, there must be no partial dependencies on the composite key of the table.

Converting the table to 2NF

Car Name, OwnerID, Owner name are partially dependent on Car Number Plate. So we will consider them for a new table with Car Number Plate as Primary key.

next table,

Here, Both the Car Number Plate and the date of the transaction can determine Customer ID and Customer Name.

Car Number Plate and Date of Transaction together become the composite key of this table

Note: A partial functional dependency is the dependency of a non-prime attribute one of the prime attributes in case the table has a composite key.

Third Normalization Form

If we have 3 attributes, A being a prime attribute, B being Non-Prime Attribute, C being Non-Prime Attribute.

Attribute C is functionally dependent on attribute B, and attribute B is functionally dependent on attribute A. Both attribute B and attribute C are non-prime attributes. This is a case of transitive dependency.

Criteria for 2NF:

  • For a table to be in 3NF, the table must first be in 2NF.
  • For a table to be in 3NNF, there must be no transitive dependencies in the table.

Converting tables into 3NF

Reference: MSc Data Science curriculum.

--

--