Database Normalization – in Easy to Understand English

·

·

Database normalization is a process used to organize a database into tables and columns. There are three main forms: first normal form , second normal form, and third normal form. The main idea is each table should be about a specific topic and only supporting topics included. Take a spreadsheet containing the information as an example, where the data contains salespeople and customers serving several purposes:

  • Identify salespeople in your organization
  • List all customers your company calls upon to sell a product
  • Identify which salespeople call on specific customers.

By limiting a table to one purpose you reduce the number of duplicate data contained within your database. This eliminates some issues stemming from database modifications.

To achieve these objectives, we’ll use some established rules. This is called database normalization. As you apply these rules, new tables are formed. The progression from unruly to optimized passes through several normal forms: first, second, and third normal form.

As tables satisfy each successive database normalization form, they become less prone to database modification anomalies and more focused toward a sole purpose or topic. Before we move on be sure you understand the definition of a database table.

Reasons for Database Normalization

There are three main reasons to normalize a database.  The first is to minimize duplicate data, the second is to minimize or avoid data modification issues, and the third is to simplify queries. 

As we go through the various states of normalization we’ll discuss how each form addresses these issues, but to start, let’s look at some data which hasn’t been normalized and discuss some potential pitfalls. 

I think once you understand the issues, you better appreciate normalization. Consider the following table:

unnormalized data
Note: The primary key columns are underlined

The first thing to notice is this table serves many purposes including:

  1. Identifying the organization’s salespeople
  2. Listing the sales offices and phone numbers
  3. Associating a salesperson with an sales office
  4. Showing each salesperson’s customers

As a DBA this raises a red flag.  In general I like to see tables that have one purpose.  Having the table serve many purposes introduces many of the challenges; namely, data duplication, data update issues, and increased effort to query data.

Data Duplication and Modification Anomalies

Data Duplication and Modification Anomalies

Notice that for each SalesPerson we have listed both the SalesOffice and OfficeNumber. There are duplicate salesperson data. Duplicated information presents two problems:

  1. It increases storage and decrease performance.
  2. It becomes more difficult to maintain data changes.

For example:

Consider if we move the Chicago office to Evanston, IL. To properly reflect this in our table, we need to update the entries for all the SalesPersons currently in Chicago.  Our table is a small example, but you can see if it were larger, that potentially this could involve hundreds of updates.

These situations are modification anomalies. Database normalization fixes them. There are three modification anomalies that can occur:

Insert Anomaly

Insert Anomaly addressed with Database Normalization
Insert Anomaly

There are facts we cannot record until we know information for the entire row.  In our example we cannot record a new sales office until we also know the sales person. 

Why?  Because in order to create the record, we need provide a primary key.  In our case this is the EmployeeID.

Update Anomaly

Update Anomaly addressed with Database Normalization
Update Anomaly

In this case we have the same information in several rows.

For instance if the office number changes, then there are multiple updates that need to be made.  If we don’t update all rows, then inconsistencies appear.

Deletion Anomaly

Deletion Anomaly addressed with Database Normalization
Deletion Anomaly

Removal of a row causes removal of more than one set of facts.  For instance, if John Hunt retires, then deleting that row cause us to lose information about the New York office.

Database Normalization Helps with Search and Sort Issues

The last reason we’ll consider is making it easier to search and sort your data.  In the SalesStaff table if you want to search for a specific customer such as Ford, you would have to write a query like

SELECT SalesOffice
FROM SalesStaff
WHERE Customer1 = ‘Ford’ OR
      Customer2 = ‘Ford’ OR
      Customer3 = ‘Ford’

Clearly if the customer were somehow in one column our query would be simpler.  Also, consider if you want to run a query and sort by customer. 

Search and Sort Issues addressed with Database Normalization
Search and Sort Issues

Our current table makes this tough. You would have to use three separate UNION queries! You can eliminate or reduce these anomalies by separating the data into different tables. This puts the data into tables serving a single purpose.

The process to redesign the table is database normalization.

Database Normalization Definition

There are three common forms of database normalization: 1st, 2nd, and 3rd normal form. They are also abbreviated as 1NF, 2NF, and 3NF respectively. 

There are several additional forms, such as BCNF, but I consider those advanced, and not too necessary to learn in the beginning.

The forms are progressive, meaning that to qualify for 3rd normal form a table must first satisfy the rules for 2nd normal form, and 2nd normal form must adhere to those for 1st normal form. Before we discuss the various forms and rules in detail, let’s summarize the various forms:

  • First Normal Form – The information is stored in a relational table with each column containing atomic values. There are no repeating groups of columns.
  • Second Normal Form – The table is in first normal form and all the columns depend on the table’s primary key.
  • Third Normal Form – the table is in second normal form and all of its columns are not transitively dependent on the primary key

If the rules don’t make too much sense, don’t worry. we take a deep dive into them here!

Our Sample Data for Normalization

Before we go too much further let’s look at the sample table we’ll use to demonstrate database normalization.

Before we get into the definition see if you can find some potential problems with this table’s setup. At a glance, here is what worries me:

  • There are several “topics” covered in one table: salespeople, offices, and customers.
  • The Customers are repeated as columns.
  • The customer addresses are within on text field.

Let’s learn more about the first normal form of database normalization to see what we can do to make our table better.

First Normal Form (1NF) Database Normalization

The first step to constructing the right SQL table is to ensure that the information is in its first normal form. When a table is in its first normal form, searching, filtering, and sorting information is easier.

Issue with Unnormalized Data addressed with First Normal Form
Issue with Unnormalized Data

The rules to satisfy the 1st normal form are:

  • When the data is in a database table.  The table stores information in rows and columns where one or more columns, called the primary key, uniquely identify each row.
  • Each column has atomic values and should be not repeating groups of columns.

Tables cannot have sub-columns in the first normal form. That is, you cannot list multiple cities in one column and separate them with a semi-colon.

Atomic Values

When a value is atomic, we don’t further subdivide the value.  For example, the value “Chicago” is atomic; whereas “Chicago; Los Angeles; New York” is not.

Related to this requirement is the concept that a table should not have repeating groups of columns such as Customer1Name, Customer2Name, and Customer3Name.

Read More: Primary Key vs Foreign Key – Data Modeling Tips >>

First Normal Form Case Study

Let’s take a look at our sample data. Notice there are two “topics” covered within the table: sales staff and related information, and customer.

Placing Data into First Normal Form
Placing Data into First Normal Form

Let’s use those two topics as our basis to design our initial tables. If left as-is, and we don’t appply database normalization rules we end up with these issue:

Design Issues Addressed with First Normal Form
Design Issues Addressed with First Normal Form

Check out the example below.  Here you see we removed the offending repeating column name groups.  We replaced them with a new table to house the one or more customers.

First Normal Form Design
Tables in First Normal Form

The repeated column groups in the Customer Table are now linked to the EmployeeID Foreign Key.   As described in the Data Modeling lesson, a foreign key is a value that matches the primary key of another table.

In this case, the customer table holds the corresponding EmployeeID for the SalesStaffInformation row. Here is our data in the first normal form.

First Normal Form Example Data
1NF Table Data

First Normal Form (1NF) Improvements

This design is superior to our original table in several ways:

  • The original design limited each SalesStaffInformation entry to three customers.  In the new design, the number of customers associated to each design is practically unlimited.
  • The Customer, which is our original data, is nearly impossible to sort. You could, if you used the UNION statement, but it would be cumbersome.  Now, it is simple to sort customers.
  • The same holds true for filtering on the customer table.  It is much easier to filter on one customer name related column than three.
  • The design eliminates the insert and deletion anomalies for Customer.  You can remove all the customers for a SalesPerson without having to remove the entire SalesStaffInformation row.

Modification anomalies are still in both tables, but these are fixed once we reorganize them as 2nd normal form.

Second Normal Form (2NF) Database Normalization

Now it is time to look at the second normal form.  I like to think the reason we place tables in 2nd normal form is to narrow them to a single purpose.  Doing so brings clarity to the database design, makes it easier for us to describe and use a table, and tends to remove modification anomalies.

This stems from the primary key identifying the main topic at hand, such as identifying buildings, employees, or classes, and the columns, serving to add meaning through descriptive attributes.

An EmployeeID isn’t much on its own, but add a name, height, hair color and age, and now you’re starting to describe a real person.

So what is the definition of 2nd normal form?

Second Normal Form (2NF) Definition

A table is in 2nd Normal Form if:

  • The table is in 1st normal form, and
  • All the non-key columns are dependent on the table’s primary key.

We already know about the 1st normal form, but what about the second requirement?  Let me try to explain.

Second Normal Form Question To Ask About Table Column
Does the column fit within the table’s purpose?

The primary key provides a means to uniquely identify each row in a table. When we talk about columns depending on the primary key, we mean, that to find a particular value, such as what color is Kris’ hair, you would first have to know the primary key, such as an EmployeeID, to look up the answer.

Key Question to Ask Yourself

Once you identify a table’s purpose, then look at each of the table’s columns and ask yourself, “Does this column serve to describe what the primary key identifies?”

Decide Whether to move column to another table per second normal form rule.
Should I stay or should I go now?
  • If you answer “yes,” then the column is dependent on the primary key and belongs in the table.
  • If you answer “no,” then you should move the column to a different table.

When all the columns relate to the primary key, they naturally share a common purpose, such as describing an employee.  That is why I say that when a table is in second normal form, it has a single purpose, such as storing employee information.

Issues with our Example Data Model

So far, we have taken our example to the first normal form, and it has several issues.

The first issue is that there are several attributes which don’t completely rely on the entire Customer table primary key.

For a given customer, it doesn’t make sense that you should have to know both the CustomerID and EmployeeID to find the customer.

Issues address with Second Normal Form

It stands to reason you should only need to know the CustomerID.  Given this, the Customer table isn’t in 2nd normal form as there are columns that aren’t dependent on the full primary key.  We should move them to another table.

The second issue is the SalesStaffInformation table has two columns which aren’t dependent on the EmployeeID.  Though the SalesOffice and OfficeNumber columns describe which office the SalesPerson is based out of, they don’t serve to describe the employee.

You can see these issues outline in red below.

First Normal Form Design Issues Addressed with Second Normal Form
First Normal Form Design Issues addressed with Second Normal Form

Fix the Model to Second Normal Form (2NF) Standards

Since the columns outlined in red aren’t completely dependent on the table’s primary key, it stands to reason they belong elsewhere.  In both cases, move the columns to new tables.

In the case of SalesOffice and OfficeNumber, let’s create the SalesOffice table.  I added a foreign key to SalesStaffInformaiton so we can still describe in which office a salesperson is based.

The changes to make Customer a second normal form table are a little trickier.  Rather than move the offending columns CustomerName, CustomerCity, and CustomerPostalCode to new table, recognize that the issue is EmployeeID!

The three columns don’t depend on this part of the key.  Really this table is trying to serve two purposes:

  1. To show which customers each employee calls to make sales.
  2. To find customers and their locations.

For the moment remove EmployeeID from the table.  Now the table’s purpose is clear, it is to find and describe each customer.

Move Sample Data to Second Normal Form

Now let’s create a table named SalesStaffCustomer to describe which customers a salesperson calls upon.  This table has two columns CustomerID and EmployeeID.  Together, they form a primary key.  Separately, they are foreign keys to the Customer and SalesStaffInformation tables, respectively.

The data model is show below in second normal form.

Second Normal Form Design
Tables in Second Normal Form

To better visualize this, here are the tables with data.

Customer Table in Second Normal From
Customer Table in Second Normal From

Second Normal Form Benefits

As you review the data in the tables notice moving the data to second normal form mostly removed the redundancy.  Also, see if you can find any update, insert, or deletion anomalies.  Those too are gone.  You can now remove all the salespeople yet keep customer records.  Also, if all the SalesOffices close, it doesn’t mean you have to delete the records containing salespeople.

Example data in Second Normal Form
Intersection Table supporting Normalization

The SalesStaffCustomer table is a strange one.  It’s just all keys!  Database architects call this an intersection table.   An intersection table is useful when you need to model a many-to-many relationship.

Each column is a foreign key.  If you look at the data model, you’ll notice that there is a one-to-many relationship to this table from SalesStaffInformation and another from Customer.  In effect the table allows you to bridge the two tables together.

Sales Staff Information in Second Normal Form

For all practical purposes this is a workable database.  Three out of the four tables are even in third normal form, but there is one table which still has a minor issue, preventing it from being so.

Once a table is in second normal form, the design guarantees that every column is dependent on the primary key, or as I like to say, the table serves a single purpose.  But what about relationships among the columns?  Could there be dependencies between columns that could cause an inconsistency?

A table having both columns for an employee’s age and birth date is spelling trouble, there lurks an opportunity for data inconsistency!

How are these addressed?  By the third normal form.

Third Normal Form (3NF) Database Normalization

A table is in third normal form if:

  • A table is in 2nd normal form.
  • It contains only columns that are non-transitively dependent on the primary key

Wow!  That’s a mouthful.  What does non-transitively dependent mean?  Let’s break it down.

Transitive

When something is transitive, then a meaning or relationship is the same in the middle as it is across the whole.  If it helps think of the prefix trans as meaning “across.”  When something is transitive, then if something applies from the beginning to the end, it also applies from the middle to the end.

Transitive Property for Third Normal Form

Since ten is greater than five, and five is greater than three, you can infer that ten is greater than three.

In this case, the greater than comparison is transitive.  In general, if A is greater than B, and B is greater than C, then it follows that A is greater than C.

If you’re having a hard time wrapping your head around “transitive” I think for our purpose it is safe to think “through”  as we’ll be reviewing to see how one column in a table may be related to others, through a second column.

Dependence

An object has a dependence on another object when it relies upon it.  In the case of databases, when we say that a column has a dependence on another column, we mean that the value can be derived from the other.  For example, my age is dependent on my birthday.  Dependence also plays an important role in the definition of the second normal form.

Transitive Dependence

Now let’s put the two words together to formulate a meaning for transitive dependence that we can understand and use for database columns.

Transitive Dependence as it applies to Third Normal Form

I think it is simplest to think of transitive dependence to mean a column’s value relies upon another column through a second intermediate column.

Consider three columns:  AuthorNationality, Author, and Book.  Column values for AuthorNationality and Author rely on the Book; once the book is known, you can find out the Author or AuthorNationality.  But also notice that the AuthorNationality relies upon Author.  That is, once you know the Author, you can determine their nationality.  In this sense then, the AuthorNationality relies upon Book, via Author.  This is a transitive dependence.

This can be generalized as being three columns:  A, B and PK.  If the value of A relies on PK, and B relies on PK, and A also relies on B, then you can say that A relies on PK though B.  That is A is transitively dependent on PK.

Transitive Dependence Examples

Let’s look at some examples to understand further.

Key (PK)Column AColumn BTransitive Dependence?
PersonIDFirstNameLastNameNo, In Western cultures a person’s last name is based on their father’s LastName, whereas their FirstName is given to them.
PersonIDBodyMassIndexIsOverweightYes,  BMI over 25 is considered overweight.It wouldn’t make sense to have the value IsOverweight be true when the BodyMassIndex was < 25.
PersonIDWeightSexNo:There is no direct link between the weight of a person and their sex.
VehicleIDModelManufacturerYes:Manufacturers make specific models.  For instance, Ford creates the Fiesta; whereas, Toyota manufacturers the Camry.

To be non-transitively dependent, then, means that all the columns are dependent on the primary key (a criteria for 2nd normal form) and no other columns in the table.

Issues with our Example Data Model

Let’s review what we have done so far with our database.  You’ll see that I’ve found one transitive dependency:

Transitive Dependency Issues addressed with Third Normal Form

CustomerCity relies on CustomerPostalCode which relies on CustomerID

Generally speaking a postal code applies to one city.  Although all the columns are dependent on the primary key, CustomerID, there is an opportunity for an update anomaly as you could update the CustomerPostalCode without making a corresponding update to the CustomerCity.

We’ve identified this issue in red.

Second Normal Form Design Issues address with Third Normal Form
Transitive Dependency

Fix the Model to Third Normal Form (3NF) Standards

In order for our model to be in third normal form, we need to remove the transitive dependencies.  As we stated our dependency is:

CustomerCity relies on CustomerPostalCode which relies on CustomerID

It is OK that CustomerPostalCode relies on CustomerID; however, we break 3NF by including CustomerCity in the table.  To fix this we’ll create a new table, PostalCode, which includes PostalCode as the primary key and City as its sole column.

The CustomerPostalCode remains in the customer table.  The CustomerPostalCode can then be designated a foreign key.  In this way, through the relation, the city and postal code is still known for each customer.  In addition, we’ve eliminated the update anomaly.

Third Normal Form Design
Tables in Third Normal Form

To better visualize this, here are the Customer and PostalCode tables with data.

Sample Data in Third Normal Form

Now each column in the customer table is dependent on the primary key.  Also, the columns don’t rely on one another for values.  Their only dependency is on the primary key.

Data in Third Normal Form

The same holds true for the PostalCode table.

At this point our data model fulfills the requirements for the third normal form.  For most practical purposes this is usually sufficient; however, there are cases where even further data model refinements can take place.  If you are curious to know about these advanced normalization forms, I would encourage you to read about BCNF (Boyce-Codd Normal Form) and more!

Conclusion – Can Database Normalization Get out of Hand?

Can database normalization be taken too far?  You bet!  There are times when it isn’t worth the time and effort to fully normalize a database.  In our example you could argue to keep the database in second normal form, that the CustomerCity to CustomerPostalCode dependency isn’t a deal breaker.

ref: Wikipedia

I think you should normalize if you feel that introducing update or insert anomalies can severely impact the accuracy or performance of your database application.  If not, then determine whether you can rely on the user to recognize and update the fields together.

There are times when you’ll intentionally denormalize data.  If you need to present summarized or complied data to a user, and that data is very time consuming or resource intensive to create, it may make sense to maintain this data separately.

One of My Experiences with Database Normalization

Several years ago I developed a large engineering change control system which, on the home page, showed each engineer’s the parts, issues, and tasks requiring their attention.  It was a database wide task list.   The task list was rebuilt on-the-fly in real-time using views.  Performance was fine for a couple of years, but as the user base grew, more and more DB resources were being spent to rebuild the list each time the user visited the home page.

I finally had to redesign the DB.  I replaced the view with a separate table that was initially populated with the view data and then maintained with code to avoid anomalies.   We needed to create complicated application code to ensure it was always up-to-date.

For the user experience it was worth it.  We traded off complexity in dealing with update anomalies for improved user experience.

Next Steps.

If you’re looking to learn more about database design concepts. I would recommend the following articles:

Join the newsletter

Subscribe to get our latest content by email.

Powered by ConvertKit
64 responses to “Database Normalization – in Easy to Understand English”
  1. Christian

    Perfect

  2. Ben

    I think that it would be better to have all attributes included from the beginning of the case study.
    – Customer Postcode and City fields were added while introducing 1NF.
    – City Population field is added in 3NF.
    This can be a bit confusing for some.

  3. […] Wezel, K. (2022). Database normalization – in easy-to-understand english. https://www.essentialsql.com/database-normalization/ […]

  4. […] Also, my friend Kris over at Essential SQL has a great tutorial on normalization where he discusses the principles of Third Normal Form. Definitely check it out. […]

  5. […] Also, my friend Kris over at Essential SQL has a great tutorial on normalization where he discusses the principles of Second Normal Form. Definitely check it out. […]

  6. […] My friend Kris over at Essential SQL has a great tutorial on normalization where he discusses the principles of First Normal Form. Definitely check it out. […]

  7. JS Smith

    Thank you for the clear explanations. I’m a total novice trying to understand databases and design to improve things at my new job. Your site is my new school! =D

  8. All I can say is thank you, Kris. You have just elaborated and explained this concept in a way so easy to understand that i will use this explanation for my Jr. DBA and my students at “Valencia College”. I have 30+ yrs of experience in this field with a Master degree in “Information Systems” and humbly I express my respect and admiration to you as a professional and a colleague. Thanks again Kris and I encourage you to keep writing publications about Databases and their objects.
    Juan Galarza – DBA at Orange County Tax Collector.

    1. This makes my day! It is my pleasure to your students learn normalization. I appreciate the kind comments.

      Kris.

  9. Seth Wheeler

    really clear – well done

  10. Jia

    Please share the normalized form of above sales person table

  11. Adam

    Hello Kris, thanks for the tutorial. Could you post some examples of normalization, preferably complex ones from the real-world?

    1. abc

      Yes, will share soon

  12. Runs

    Question: Which of the following is not true about normalization?
    a.) produces relations with a single theme
    b.) may create referential integrity constraints
    c.) reduces the number of functional dependencies in the schema
    d.) reduces anomalies
    e.) splits a poorly structured relation into two or more well-structured relations

    1. Ghalia Riaz

      c.) reduces the number of functional dependencies in the schema

  13. Anja Des

    I compare two tables and take the results into one table. But issue is, in the result table the entries are repeated. I think the issue is a one table which we have taken to compare is not a fixed table. It is automatically filled with a RFID tag reader. So, can I solve this repeating issue in my resultant table from normalization methods???

    1. It sounds like your have a one to many join. That is something that happens when you normalize. Without knowing the table structures you’re comparing it is difficult to know the exact issue. I would suggest looking to see what is the join condition you’re using in the compare.

      Also, when you say you take the result into one table, are you combing columns from the two tables’ rows into one row (desired result) or are you trying to just compare to see if they are different and only keep those that don’t compare?

  14. Bradley P

    I need to normalize these sets of attributes to 3NF

    Owner #, Owner Name, Owner Phone, Pet # Pet Name, Walk #, Walk Date,Duration,Paid, Walk Notes

    While multiple pets can be walked at the same time, it is recorded as a separate walk for each pet so, that details of duration, payment and notes can be recorded separately.

    Normalise this table to the third normal form, clearly showing the stages of 0NF, 1NF, 2NF and 3NF.

    1. Hi Bradley,

      I would recommend joining the EssentialSQL learning group and asking your question there!

      https://www.facebook.com/groups/EssentialSQLLearning/

  15. Yohi

    does 3rd normal form resolve modification anomalies?

    1. Mostly, but it is not guaranteed. That is why we sometimes need BCNF.

  16. OKSANA WILLIAMS

    Thank You for this info! I am finding myself struggling with an on-line course :). I really want to learn this topic, but unfortunately my primary instructor had an emergency and one who took over is not very helpful, so your resource is a life saver! So I have a question, is I start with Person ( employee or customer ( inheritance)), then employees belong to different departments… would it be another step down inheritance or how is the best to tie it all up to get a third normal form?

  17. Tansu Aksu

    Thank you for the simple terms, I could share this with the customers :)

    Since you were asking what other topics readers were interested in, how about eliminating these parameter tables in the first place? :)

    Regards
    Tansu

  18. Mary

    This is really helpful to me, thank you very much. What part do I focus to be a new dba, Currently I am looking a job

    1. Hi,

      There are some core skills you should know an master in order to become a DBA. Top among them are being comfortable backingup and restoring databases.

      I have several resources you can review to help. Here are some DBA articles I’ve tagged.

      Let me know if you have further questions.

  19. Marie

    This was hugely helpful. I’ve been reading blogs and articles about this topic, but struggling to understand why the normalization rules are actually important and how they affect the database long term so I can make my own educated decision — thanks for this!

    1. Hi Marie,
      I’m glad you found the article useful. Spread the word! Let everyone know about EssentialSQL! :)

      I strive to help make SQL easy and understandable to all.

      I’m so glad I was able to help you get normalization. :)

  20. Kara

    This was a great article, but how do I subscribe?

  21. Well explained. Your other articles are already in my list :)

  22. Pradeep

    Hi Kris,
    Thanks for simple yet powerful explanation of Normalization. Great help for learner’s like me.
    Thanks
    Pradeep

    1. My Pleasure! Thanks for reading the article!

  23. Allan Helboe

    I strongly disagree with you that BCNF is “too advanced”. It essentially states that if an attribute determines value for other attribute(s) it must be a candidate key. As such I find it easier to understand and explain to others than 2NF and 3NF. I usually put only very little attention to 3NF and spend the time on BCNF.

    I also disagree in your first reason for normalization: duplicated data. It makes students belive that we should do a lot to save space (e.g. only storing 2 digits for the year which gave us the Y2K problem). It is the anamolies that is the sinners and as long we get rid of them (by normalizing) we also get rid of the duplicated data – so there is no need to fight that on its own.

    1. Hi, The reason to avoid duplicated data isn’t to save space, its is to avoid update issues. Why write to disk more than you need to do so? As for BCNF, most enjoy the simplicity my articles bring to the topic.

      For that, I’m less apt to tinker with the articles.

  24. Audrey Tetteh-Wayoe

    Hello Kris, thank you for your help in explaining normalization in a very SIMPLE manner however how will your database table for uml normalization fora bank be like.

    Thank you.

    1. Hi – check out this article. It should help!

      https://www.essentialsql.com/data-modeling-principles-action/

  25. Uriah Massey

    I’m an online student, trying to make sense of database design so I can make it work for me in the workplace. I just wanted to say thanks for the plain-talk on normalization. It helped me understand normalization before it was formally taught (assuming it will be); and I anticipate that to help me build a stronger relational structure, allowing me to actually build a project-database that I hope to deploy in the real world. Thanks

    1. Hi Uriah,

      I’m glad you like the articles.

      Kris.

  26. Paul

    Dear lord, thank you for a brilliant and easy to follow explanation. I’m taking a database course and their explanation of normalization makes it sound like quantum physics. Thanks 1000 times

    1. You’re welcome! I’m glad it makes sense to you!

  27. Thanks for another informative site. Where else may
    I am getting that type of info written in such an ideal way?
    I have a undertaking that I am just now running on, and I’ve been on the glance out for
    such info.

  28. special thanks to admin
    very useful to tutorial.

    1. You are welcome! I’m glad you like the article.

  29. Its very use full for me, thanks Manikandan

    1. I’m glad you like the article. If there are other topics you are interested in learning about, please let me know.

  30. sachin

    hi
    the post is very good and it is very easy to understand thanks for it
    if u have tutorials about jsf using visual web ice faces can u plzz send it

    1. Hi,
      I’m sorry but I don’t have anything about jsf. I do know tons of C#, so if you’re interested in seeing how c# and SQL can go together let me know.

      I was a developer for many many years (since 1987). Luckily I didn’t have to learn COBOL…

      1. Guddu

        Hello Kris,
        I want to build the frontend using C# and backend sql , here the ask is to create a Database inventory.

  31. Vuroborox

    Hi Kris,

    Great article series! It came in handy for a student like myself.

    1. Hi – I’m glad you liked the series. :)

  32. Newton Makeni

    Thanks for the job well done Kris, I have read several blogs and Tech-journals none has been this clear and simplified. Thanks again.

    1. Thanks for letting me know. I really appreciate that you like the articles. If there are other SQL topics you would like me to explore or explain, please let me know.

  33. dennis morgan

    Nicely done. As clear and succinct as one could ask for.

    1. Thanks Dennis. I appreciate the compliment. If there are other DB topics you think are confusing, and you would like me to write about, please let me know. You can email me from my Contact page (above on the menu) or leave a reply here.

      Thanks!

  34. I want you to explain concurrency control topic in simple words

    1. Hi Vishnu, I’ll add concurrency to my list of topics. It is a good one to cover. Be sure to subscribe to my email so you’ll know when it’s published.

  35. YWong

    Where’s Get Ready to Learn SQL. 7? Thanks

    1. Hi, You can find the 7th article here!

  36. Polo

    Note, please note, that a slight denormalisation **CAN** be a freaking speedup for your system…
    Lookup-tables tend to be logical-read hogs, even when they’re small. You’ll just need to lookup **a lot**!

    So your reason number 1 for “Data Duplication and Modification Anomalies” is not a rule written in marble.

    P.

    1. Hi,

      I agree that the rule shouldn’t be written in marble (good metaphor btw).

      There are definitely times when I take a step back and don’t follow strict normalization. I’ve done this many times in the applications I’ve written over the years; however, I think in general normalization does help performance more than it hinders it.

      Check out the end of my post on the third normal form. I talk about cases when normalization can get out of hand.

  37. Database normalization is fascinating topic, although somewhat disconnected from query. A typical introductory database course teaches students what Relational Algebra and SQL are, and then switches gears and talks about normalization with no leverage of skills developed at the first half of the course!

    Speaking of design with Customer1, Customer2, Customer3 attributes:
    http://www.orafaq.com/usenet/comp.databases.theory/2001/05/17/0244.htm

    1. Hi Vadmin,

      Thanks for your comments and I’m glad you are finding my topics fascinating.

      My target audience are people that are just starting to learn SQL. I’m trying to get across important principles using simple and easy to understand English. I’m trying to keep it non-technical, which is hard when you are dealing with a technical subject!

      Regarding normalization, the reason I did it before exploring joins is that I thought it would be good to give my readers an appreciation of why their data is spread across so many tables and why they have to go through the hassle to learn how to stitch the data back together.

      1. Werner

        Hi Kris,
        I think you’re right to talk about NFs right from the beginning. The NFs define the main principles on which relational databases are built and should be treated “with respect” :) As others have pointed out, there are times when it is possible and necessary to ignore some of the NFs, usually when building search tables, but in general, they are the most important thing to know about when learning about RDBMS.

  38. Hi Chris,
    Your hunch is correct. It would be better to have a person table to house customers, employees, and referrers.
    In this case normalization doesn’t directly speak towards having duplicate data in different tables. I think you could consider the table normalized, but of course it isn’t the design we’re looking for.
    We can look to generalization and specialization concepts to guide our design. Specifically, the common attributes can be place in a generalized entity, which in our case, is a person.
    The specialized attributes, are then placed in their own entities: customer, employee, and referrer.

    This is how I would lay out the tables.
    person: (personID [pk], last_name, first_name, middle_name, dob, gender)

    customer: (customerID [pk], last_purchase_date)
    • FK: customerID to person.personID

    employee: (employeeID [pk], date_hired)
    • FK: employeeID to person.personID

    referrer: (referrerID [pk], referral_date)
    • FK: referrerID to person.personID

    Notes:
    [pk] = primary key.

  39. Chris Null

    Question on normalization in SQL relational dbs.

    Let’s say I have an application where a given person can be included as a customer, a employee and/or a referrer .

    In the customer table you have last_name, first_name, Middle_name, dob, gender and staff id as pk identity.
    fk employee_id and fk referrer_id

    employee table has name, dob, gender, employee_id as pk identity

    referrer table has name, and referrer_id as pk identity

    It seems like it might be better to use the person table with a column for employee_id and referrer_id that link to a table that doesn’t have repetitive information in it like names or DOB so that when you query the data you don’t get a weird name or somehow conflicting data. Duplication isn’t covered under normalization but it seems like a bad idea to have duplicated data in a relational database. What is the rule/practice for removing duplicated data from the db?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

More from the blog


MySQL PostgreSQL SQLite SqlServer