This is the third in a series of posts teaching normalization.
The second post focused on the first normal form, its definition, and examples to hammer it home.
Now it is time to take a look at the second normal form. I like to think the reason we place tables in 2nd normal form is to narrow them to a single purpose. Doing so bring’s clarity to the database design, makes it easier for us to describe and use a table, and tends to eliminate modification anomalies.
This stems from the primary key identifying the main topic at hand, such as identifying buildings, employees, or classes, and the columns, serving to add meaning through descriptive attributes.
An EmployeeID isn’t much on its own, but add a name, height, hair color and age, and now you’re starting to describe a real person.
So what is the definition of 2nd normal form?
2NF – Second Normal Form Definition
A table is in 2nd Normal Form if:
- The table is in 1st normal form, and
- All the non-key columns are dependent on the table’s primary key.
We already know about the 1st normal form, but what about the second requirement? Let me try to explain.
The primary key provides a means to uniquely identify each row in a table. When we talk about columns depending on the primary key, we mean, that in order to find a particular value, such as what color is Kris’ hair, you would first have to know the primary key, such as an EmployeeID, to look up the answer.
Once you identify a table’s purpose, then look at each of the table’s columns and ask yourself, “Does this column serve to describe what the primary key identifies?”
- If you answer “yes,” then the column is dependent on the primary key and belongs in the table.
- If you answer “no,” then the column should be moved different table.
When all the columns relate to the primary key, they naturally share a common purpose, such as describing an employee. That is why I say that when a table is in second normal form, it has a single purpose, such as storing employee information.
Issues with our Example Data Model
So far we have taken our example to the first normal form, and it has several issues.
The first issue is the SalesStaffInformation table has two columns which aren’t dependent on the EmployeeID. Though they are used to describe which office the SalesPerson is based out of, the SalesOffice and OfficeNumber columns themselves don’t serve to describe who the employee is.
The second issue is that there are several attributes which don’t completely rely on the entire Customer table primary key. For a given customer, it doesn’t make sense that you should have to know both the CustomerID and EmployeeID to find the customer.
It stands to reason you should only need to know the CustomerID. Given this, the Customer table isn’t in 2nd normal form as there are columns that aren’t dependent on the full primary key. They should be moved to another table.
These issues are identified below in red.
Fix the Model to 2NF Standards
Since the columns identified in red aren’t completely dependent on the table’s primary key, it stands to reason they belong elsewhere. In both cases, the columns are moved to new tables.
In the case of SalesOffice and OfficeNumber, a SalesOffice was created. A foreign key was then added to SalesStaffInformaiton so we can still describe in which office a sales person is based.
The changes to make Customer a second normal form table are a little trickier. Rather than move the offending columns CustomerName, CustomerCity, and CustomerPostalCode to new table, recognize that the issue is EmployeeID! The three columns don’t depend on this part of the key. Really this table is trying to serve two purposes:
- To indicate which customers are called upon by each employee
- To identify customers and their locations.
For the moment remove EmployeeID from the table. Now the table’s purpose is clear, it is to identify and describe each customer.
Now let’s create a table named SalesStaffCustomer to describe which customers a sales person calls upon. This table has two columns CustomerID and EmployeeID. Together, they form a primary key. Separately, they are foreign keys to the Customer and SalesStaffInformation tables respectively.
With these changes made the data model, in second normal form, is shown below.
To better visualize this, here are the tables with data.
As you review the data in the tables notice that the redundancy is mostly eliminated. Also, see if you can find any update, insert, or deletion anomalies. Those too are gone. You can now eliminate all the sales people, yet retain customer records. Also, if all the SalesOffices close, it doesn’t mean you have to delete the records containing sales people.
The SalesStaffCustomer table is a strange one. It’s just all keys! This type of table is called an intersection table. An intersection table is useful when you need to model a many-to-many relationship.
Each column is a foreign key. If you look at the data model you’ll notice that there is a one to many relationship to this table from SalesStaffInformation and another from Customer. In effect the table allows you to bridge the two tables together.
For all practical purposes this is a pretty workable database. Three out of the four tables are even in third normal form, but there is one table which still has a minor issue, preventing it from being so.
Check out our last post to learn about the third normal form.
More tutorials are to follow! Remember! I want to remind you all that if you have other questions you want answered, then post a comment or tweet me. I’m here to help you. What other topics would you like to know more about?