This is the second in a series of posts teaching normalization.
The first post introduced database normalization, its importance, and the types of issues it solves.
In this article, we’ll explore the first normal form. For our examples, we’ll use the Sales Staff Information shown below as our starting point. As we described in the last article, there are several problems with storing the information in this form. Normalizing the data lets us eliminate duplicate data as well as modification anomalies.
1NF – First Normal Form Definition
The first step in constructing the right SQL table is to ensure that the information is in first normal form. When a table is in first normal form, searching, filtering, and sorting information is easier. The rules to satisfy first normal form are:
- The data is stored in a database table. The table stores information in rows and columns, where one or more columns, called the primary key, uniquely identify each row.
- Each column contains atomic values, and there are no repeating groups of columns.
Tables in first normal form cannot contain sub-columns. That is, you cannot list multiple cities in one column and separate them with a semicolon.
When a value is atomic, it cannot be further subdivided. For example, the value “Chicago” is atomic, whereas “Chicago; Los Angeles; New York” is not.
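To make the idea of atomic values concrete, here is a minimal sketch using Python's built-in sqlite3 module. The StaffCity table and its columns are hypothetical names invented for this illustration. They are not part of the article's example tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for illustration: one atomic city value per row.
conn.execute(
    "CREATE TABLE StaffCity ("
    " EmployeeID INTEGER NOT NULL,"
    " City TEXT NOT NULL,"
    " PRIMARY KEY (EmployeeID, City))"
)

# A non-atomic value, as it might appear in a single unnormalized column.
packed = "Chicago; Los Angeles; New York"

# Split the list so that each city becomes its own row.
cities = [city.strip() for city in packed.split(";")]
conn.executemany(
    "INSERT INTO StaffCity (EmployeeID, City) VALUES (?, ?)",
    [(1, city) for city in cities],
)

# With atomic values, an exact match and a plain ORDER BY are enough.
matches = conn.execute(
    "SELECT EmployeeID FROM StaffCity WHERE City = ?", ("Los Angeles",)
).fetchall()
sorted_cities = [
    row[0] for row in conn.execute("SELECT City FROM StaffCity ORDER BY City")
]
```

Against the packed value, the same lookup would need pattern matching such as LIKE '%Los Angeles%', which is slower and can match unintended text.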
Related to this requirement is the concept that a table should not contain repeating groups of columns such as Customer1Name, Customer2Name, and Customer3Name.
Once the repeating customer columns are moved into their own table, our example table is in first normal form, as shown below:
The repeating groups of columns now live in the Customer table, which is linked to SalesStaffInformation through the EmployeeID foreign key. As described in the Data Modeling lesson, a foreign key is a value that matches the primary key of another table.
In this case, the customer table contains the corresponding EmployeeID for the SalesStaffInformation row. Here is our data in the first normal form.
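The relationship between the two tables can be sketched as follows, again using Python's sqlite3 module. Only EmployeeID and the two table names come from the article. CustomerID, CustomerName, the SalesPerson column, and all sample rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE SalesStaffInformation (
    EmployeeID  INTEGER PRIMARY KEY,
    SalesPerson TEXT NOT NULL              -- assumed column
);
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,      -- assumed surrogate key
    EmployeeID   INTEGER NOT NULL
                 REFERENCES SalesStaffInformation (EmployeeID),
    CustomerName TEXT NOT NULL             -- assumed column
);
""")

# Made-up sample data: one salesperson with four customers,
# which the three-column design could not hold.
conn.execute("INSERT INTO SalesStaffInformation VALUES (1, 'Sample Person')")
conn.executemany(
    "INSERT INTO Customer (EmployeeID, CustomerName) VALUES (1, ?)",
    [("Customer A",), ("Customer B",), ("Customer C",), ("Customer D",)],
)
customer_count = conn.execute(
    "SELECT COUNT(*) FROM Customer WHERE EmployeeID = 1"
).fetchone()[0]

# A customer pointing at a nonexistent EmployeeID is rejected.
try:
    conn.execute(
        "INSERT INTO Customer (EmployeeID, CustomerName) VALUES (99, 'Orphan')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Other database systems declare the foreign key in much the same way, though most enforce it by default without a pragma.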
This design is superior to our original table in several ways:
- The original design limited each SalesStaffInformation entry to three customers. In the new design, the number of customers associated with each entry is practically unlimited.
- In the original design, the customer data is nearly impossible to sort. You could do it with a UNION statement, but it would be cumbersome. Now, it is simple to sort customers.
- The same holds true for filtering on the customer table. It is much easier to filter on one customer name column than on three.
- The insert and deletion anomalies for Customer have been eliminated. You can delete all the customers for a SalesPerson without having to delete the entire SalesStaffInformation row.
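The sorting, filtering, and deletion points above can be compared side by side in a short sketch. The SalesStaffWide table stands in for the original design, and its name, the CustomerID and CustomerName columns, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Original shape: a repeating group of customer columns.
CREATE TABLE SalesStaffWide (
    EmployeeID    INTEGER PRIMARY KEY,
    Customer1Name TEXT,
    Customer2Name TEXT,
    Customer3Name TEXT
);
INSERT INTO SalesStaffWide VALUES
    (1, 'Delta', 'Alpha', 'Charlie'),
    (2, 'Bravo', NULL, NULL);

-- First normal form: one customer per row.
CREATE TABLE SalesStaffInformation (EmployeeID INTEGER PRIMARY KEY);
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,
    EmployeeID   INTEGER NOT NULL
                 REFERENCES SalesStaffInformation (EmployeeID),
    CustomerName TEXT NOT NULL
);
INSERT INTO SalesStaffInformation VALUES (1), (2);
INSERT INTO Customer (EmployeeID, CustomerName) VALUES
    (1, 'Delta'), (1, 'Alpha'), (1, 'Charlie'), (2, 'Bravo');
""")

# Sorting the original design needs one SELECT per customer column.
wide_sorted = [row[0] for row in conn.execute("""
    SELECT Customer1Name AS CustomerName FROM SalesStaffWide
     WHERE Customer1Name IS NOT NULL
    UNION
    SELECT Customer2Name FROM SalesStaffWide
     WHERE Customer2Name IS NOT NULL
    UNION
    SELECT Customer3Name FROM SalesStaffWide
     WHERE Customer3Name IS NOT NULL
    ORDER BY CustomerName
""")]

# In first normal form, sorting and filtering touch a single column.
normalized_sorted = [row[0] for row in conn.execute(
    "SELECT CustomerName FROM Customer ORDER BY CustomerName"
)]
owner = conn.execute(
    "SELECT EmployeeID FROM Customer WHERE CustomerName = ?", ("Charlie",)
).fetchall()

# Deleting every customer of employee 1 leaves the staff row in place.
conn.execute("DELETE FROM Customer WHERE EmployeeID = 1")
staff_rows = conn.execute(
    "SELECT COUNT(*) FROM SalesStaffInformation"
).fetchone()[0]
customers_left = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
```

Both queries return the same sorted names, but the UNION version grows by one SELECT for every customer column the original design adds.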
Modification anomalies remain in both tables, but these are fixed once we reorganize the tables into second normal form.
More tutorials are to follow! Remember, if you have other questions you want answered, post a comment or tweet me. I’m here to help you. What other subjects would you like to read more about?