19

Database Second Normal Form Explained in Simple English

Second Normal Form

This is the third in a series of posts teaching normalization.

The second post focused on the first normal form, its definition, and examples to hammer it home.

Now it is time to take a look at the second normal form.  I like to think the reason we place tables in 2nd normal form is to narrow them to a single purpose.  Doing so bring’s clarity to the database design, makes it easier for us to describe and use a table, and tends to eliminate modification anomalies.

This stems from the primary key identifying the main topic at hand, such as identifying buildings, employees, or classes, and the columns, serving to add meaning through descriptive attributes.

An EmployeeID isn’t much on its own, but add a name, height, hair color and age, and now you’re starting to describe a real person.

So what is the definition of 2nd normal form?

2NF – Second Normal Form Definition

A table is in 2nd Normal Form if:

  • The table is in 1st normal form, and
  • All the non-key columns are dependent on the table’s primary key.

We already know about the 1st normal form, but what about the second requirement?  Let me try to explain.

The primary key provides a means to uniquely identify each row in a table. When we talk about columns depending on the primary key, we mean, that in order to find a particular value, such as what color is Kris’ hair, you would first have to know the primary key, such as an EmployeeID, to look up the answer.

Once you identify a table’s purpose, then look at each of the table’s columns and ask yourself, “Does this column serve to describe what the primary key identifies?”

  • If you answer “yes,” then the column is dependent on the primary key and belongs in the table.
  • If you answer “no,” then the column should be moved different table.

When all the columns relate to the primary key, they naturally share a common purpose, such as describing an employee.  That is why I say that when a table is in second normal form, it has a single purpose, such as storing employee information.

Issues with our Example Data Model

So far we have taken our example to the first normal form, and it has several issues.

The first issue is the SalesStaffInformation table has two columns which aren’t dependent on the EmployeeID.  Though they are used to describe which office the SalesPerson is based out of, the SalesOffice and OfficeNumber columns themselves don’t serve to describe who the employee is.

The second issue is that there are several attributes which don’t completely rely on the entire Customer table primary key.  For a given customer, it doesn’t make sense that you should have to know both the CustomerID and EmployeeID to find the customer.

It stands to reason you should only need to know the CustomerID.  Given this, the Customer table isn’t in 2nd normal form as there are columns that aren’t dependent on the full primary key.  They should be moved to another table.

These issues are identified below in red.

Issues Keeping Data Model From Second Normal Form

Fix the Model to 2NF Standards

Since the columns identified in red aren’t completely dependent on the table’s primary key, it stands to reason they belong elsewhere.  In both cases, the columns are moved to new tables.

In the case of SalesOffice and OfficeNumber, a SalesOffice was created.  A foreign key was then added to SalesStaffInformaiton so we can still describe in which office a sales person is based.

The changes to make Customer a second normal form table are a little trickier.  Rather than move the offending columns CustomerName, CustomerCity, and CustomerPostalCode to new table, recognize that the issue is EmployeeID!  The three columns don’t depend on this part of the key.  Really this table is trying to serve two purposes:

  1. To indicate which customers are called upon by each employee
  2. To identify customers and their locations.

For the moment remove EmployeeID from the table.  Now the table’s purpose is clear, it is to identify and describe each customer.

Now let’s create a table named SalesStaffCustomer to describe which customers a sales person calls upon.  This table has two columns CustomerID and EmployeeID.  Together, they form a primary key.  Separately, they are foreign keys to the Customer and SalesStaffInformation tables respectively.

With these changes made the data model, in second normal form, is shown below.

Data Model in Second Normal Form

To better visualize this, here are the tables with data.

Customer Table in Second Normal From

As you review the data in the tables notice that the redundancy is mostly eliminated.  Also, see if you can find any update, insert, or deletion anomalies.  Those too are gone.  You can now eliminate all the sales people, yet retain customer records.  Also, if all the SalesOffices close, it doesn’t mean you have to delete the records containing sales people.

Sales Staff Customer Data in Second Normal Form

The SalesStaffCustomer table is a strange one.  It’s just all keys!  This type of table is called an intersection table.   An intersection table is useful when you need to model a many-to-many relationship.

Each column is a foreign key.  If you look at the data model you’ll notice that there is a one to many relationship to this table from SalesStaffInformation and another from Customer.  In effect the table allows you to bridge the two tables together.

Sales Staff Information in Second Normal Form

For all practical purposes this is a pretty workable database.  Three out of the four tables are even in third normal form, but there is one table which still has a minor issue, preventing it from being so.

Check out our last post to learn about the third normal form.

More tutorials are to follow! Remember!  I want to remind you all that if you have other questions you want answered, then post a comment or tweet me.  I’m here to help you. What other topics would you like to know more about?

Kris Wenzel
 

Kris Wenzel has been working with databases over the past 28 years as a developer, analyst, and DBA.He has a BSE in Computer Engineering from the University of Michigan and a MBA from the University of Notre Dame.Kris has written hundreds of blog articles and many online courses. He loves helping others learn SQL.

  • Adnan says:

    Hey Kris,
    Thanks for such a detailed and easy to understand explanation of the normalization technique.. The way u elaborated the whole process is just stupendous…I visited a number of sites for better understanding of normalization, but no one matched your caliber seriously… Great job indeed, keep up the good work.. Thanks…. !!!!

    • You’re welcome. I’m glad you found the article informative and understandable.

      I find a lot of computer concepts are pretty easy, we just like to hide them behind jargon.

      Your comment made my day.

  • Sammani says:

    In the customer table if we consider {customerId,employeeId} as the candidate key then it is true that Customername,customerCity,PostalCode depends only partly on candidate key and hence not in 2NF. But if we consider the candidate key to be customerId alone then all columns in Customer table are fully dependent on customerId right?(Because employeeId is dependent on customerId ).If we select {customerId} to be a andidate key then {customerId,EmployeeId} cannot be candidate key as {customerId,EmployeeId} contains a candidate key.Therefore if the candidate key is customerId, isn’t the table that was developed in 1NF satisfies 2NF conditions?

    • Great question! It would seem to stand, that if CustomerID was the candidate key, then all the other fields, EmployeeID, CustomerName, CustomerCity, and CustomerPostalCode are fully dependent on item however, I don’t think the EmployeeID is dependent on the CustomerID in this case.
      The reason I see it this way is that it is the sales organization’s choice to assign a sales person to the customer. It isn’t an inherent property of the customer; therefore, no dependency exists.

      Also, more problematic, is that if the account is really large, the sales organization may want to assign more than one sales person to the customer’s account. They can’t do this if there is only one column per customer.

  • Tony says:

    I think your modification of SalesStaffInformation be an example of 3rd normal form. Since OfficeNumber depends on SalesOffice which depends on SalesPerson, that’s a transitive dependency, hence 3rd normal.
    My understanding is that 2nd normal only applies when we’re talking about composite keys: http://www.1keydata.com/database-normalization/second-normal-form-2nf.php

    • Hi Tony,

      I totally agree. When we fixed the 1st normal form issue in SalesStaffInformation we jumped right past 2nd normal form into 3rd. But, since the article focuses on 2nd normal form I didn’t want to go there.

      You must admit that if the table is in 3rd normal form that it is also in 2nd as the definition for 3rd normal form is:
      1. The table is in 2nd normal form
      2. It contains only columns that are non-transitively dependent on the primary key

      Thanks,

      Kris.

  • Aju Mohan says:

    Hey, Thank you so much.. Your efforts are appreciable .., many people are finding this useful . All the Best :-)

  • jagadeesh says:

    Explained very well,. Keep sharing ur knowledge.Thanq for this tutorial.

  • venkata says:

    I have questions.In the above example we created a separate table to remove the employee id from the customer table .Whereas in the employee table we are mapping the id for the sales office rather than creating a separate table which has the employee id and sales office id as columns .What is the reason for these two approaches

    • Hi,

      Each employee works in one sales office, so it makes sense to keep it as an attribute on the employee record. On the other hand, if an employee could work at more than one sales office (M-W in Chicago, Th-F in Detroit), then creating a separate table containing just EmployeeID and SalesOfficeID would make sense as this would support a many-to-many relationship.

  • Sriyan says:

    HI
    I have design below table(StudentResults) as 2nd normal form.Is it correct.

    Student(std_ID*,std_Name,std_address,std_race)
    Subject(subject_Id*,subject_Name)
    Exam(exam_Code*,exam_Name)

    StudentResults(exam_StdID*, exam_Code*, exam_Grade ,exam_subjectid*,exam_Marks)

    primary key is (exam_StdID*, exam_Code*, exam_Grade ,exam_subjectid)

    Thanks
    Sriyan

    • saravanakumar kandasamy says:

      Sriyan, according to your database, studentresults is in 2NF. Primary key need not be all the attributes. Primary key for your table is (exam_StdID*, exam_Code*, exam_subjectid*). exam_grade and exam_marks are non-key attributes. The non-key attributes are dependent on the primary key. Hence, your table in 2NF.

  • Albert says:

    SalesOffice and OfficeNumber aren’t dependent on the employee ID and therefore, they don’t belong in the same table as the employee information. So why are the customer’s city and postal code dependent on the customer?

    • I was thinking each customer has one location, but you could take it a step further and model many location for a customer. In that case you would want to separate the customer addresses from the customer record.

  • baiju says:

    Thanks for the simple & effective explanation. I have one query. I think in customer table customerID will be the primary key and EmployeeID which references to salesstaffinformation table. But in diagram EmployeeID is mentioned as primary key..

    • Hi! I’m glad you liked the article. I was looking at your questions regarding the Customer table’s primary key and couldn’t see the issue you describe.

      Can you give me another hint? :)

  • Raabb Ajam says:

    Hi, thank you for great step by step explanation. English is not my first language, so when I read the technical requirement for 1NF, 2NF, 3NF, it got confusing quickly. But your explanation is truly spy on. For example, when it says dependent, I never understand how it works. Same thing when it says identifying. Reading your article really clear those up.

  • >