Introduction to Database Normalization
Database normalization is a process used to organize a database into tables and columns. The main idea with this is that a table should be about a specific topic and only supporting topics included. Take a spreadsheet containing the information as an example, where the data contains salespeople and customers serving several purposes:
- Identify salespeople in your organization
- List all customers your company calls upon to sell a product
- Identify which salespeople call on specific customers.
By limiting a table to one purpose you reduce the number of duplicate data contained within your database. This eliminates some issues stemming from database modifications.
To achieve these objectives, we’ll use some established rules. As you apply these rules, new tables are formed. The progression from unruly to optimized passes through several normal forms.
There are three normal forms most databases adhere to using. As tables satisfy each successive database normalization form, they become less prone to database modification anomalies and more focused toward a sole purpose or topic. Before we move on be sure you understand the definition of a database table.
Reasons for Database Normalization
There are three main reasons to normalize a database. The first is to minimize duplicate data, the second is to minimize or avoid data modification issues, and the third is to simplify queries.
As we go through the various states of normalization we’ll discuss how each form addresses these issues, but to start, let’s look at some data which hasn’t been normalized and discuss some potential pitfalls.
I think once you understand the issues, you better appreciate normalization. Consider the following table:

The first thing to notice is this table serves many purposes including:
- Identifying the organization’s salespeople
- Listing the sales offices and phone numbers
- Associating a salesperson with an sales office
- Showing each salesperson’s customers
As a DBA this raises a red flag. In general I like to see tables that have one purpose. Having the table serve many purposes introduces many of the challenges; namely, data duplication, data update issues, and increased effort to query data.
Data Duplication and Modification Anomalies
Notice that for each SalesPerson we have listed both the SalesOffice and OfficeNumber. There are duplicate salesperson data. Duplicated information presents two problems:
- It increases storage and decrease performance.
- It becomes more difficult to maintain data changes.
For example:
Consider if we move the Chicago office to Evanston, IL. To properly reflect this in our table, we need to update the entries for all the SalesPersons currently in Chicago. Our table is a small example, but you can see if it were larger, that potentially this could involve hundreds of updates.
These situations are modification anomalies. Database normalization fixes them. There are three modification anomalies that can occur:
Insert Anomaly
There are facts we cannot record until we know information for the entire row. In our example we cannot record a new sales office until we also know the sales person. Why? Because in order to create the record, we need provide a primary key. In our case this is the EmployeeID.
Update Anomaly
In this case we have the same information in several rows. For instance if the office number changes, then there are multiple updates that need to be made. If we don’t update all rows, then inconsistencies appear.
Deletion Anomaly

Deletion of a row causes removal of more than one set of facts. For instance, if John Hunt retires, then deleting that row cause us to lose information about the New York office.
Search and Sort Issues
The last reason we’ll consider is making it easier to search and sort your data. In the SalesStaff table if you want to search for a specific customer such as Ford, you would have to write a query like
SELECT SalesOffice FROM SalesStaff WHERE Customer1 = ‘Ford’ OR Customer2 = ‘Ford’ OR Customer3 = ‘Ford’
Clearly if the customer were somehow in one column our query would be simpler. Also, consider if you want to run a query and sort by customer.
Our current table makes this tough. You would have to use three separate UNION queries! You can eliminate or reduce these anomalies by separating the data into different tables. This puts the data into tables serving a single purpose.
The process to redesign the table is database normalization.
Definition of Database Normalization
There are three common forms of database normalization: 1st, 2nd, and 3rd normal form. They are also abbreviated as 1NF, 2NF, and 3NF respectively.
There are several additional forms, such as BCNF, but I consider those advanced, and not too necessary to learn in the beginning.
The forms are progressive, meaning that to qualify for 3rd normal form a table must first satisfy the rules for 2nd normal form, and 2nd normal form must adhere to those for 1st normal form. Before we discuss the various forms and rules in detail, let’s summarize the various forms:
- First Normal Form – The information is stored in a relational table with each column containing atomic values. There are no repeating groups of columns.
- Second Normal Form – The table is in first normal form and all the columns depend on the table’s primary key.
- Third Normal Form – the table is in second normal form and all of its columns are not transitively dependent on the primary key
If the rules don’t make too much sense, don’t worry. I’ve linked to article to help you understand them.
For now it’s important to understand there are three rules for database normalization that upon each other. Some people make database normalization seem complicated.
but it doesn’t have to be, and once you understand it, it becomes intuitive.
More tutorials are to follow! Remember! I want to remind you all that if you have other questions you want answered, then post a comment or tweet me. I’m here to help you. What other topics would you like to know more about?
Question on normalization in SQL relational dbs.
Let’s say I have an application where a given person can be included as a customer, a employee and/or a referrer .
In the customer table you have last_name, first_name, Middle_name, dob, gender and staff id as pk identity.
fk employee_id and fk referrer_id
employee table has name, dob, gender, employee_id as pk identity
referrer table has name, and referrer_id as pk identity
It seems like it might be better to use the person table with a column for employee_id and referrer_id that link to a table that doesn’t have repetitive information in it like names or DOB so that when you query the data you don’t get a weird name or somehow conflicting data. Duplication isn’t covered under normalization but it seems like a bad idea to have duplicated data in a relational database. What is the rule/practice for removing duplicated data from the db?
Hi Chris,
Your hunch is correct. It would be better to have a person table to house customers, employees, and referrers.
In this case normalization doesn’t directly speak towards having duplicate data in different tables. I think you could consider the table normalized, but of course it isn’t the design we’re looking for.
We can look to generalization and specialization concepts to guide our design. Specifically, the common attributes can be place in a generalized entity, which in our case, is a person.
The specialized attributes, are then placed in their own entities: customer, employee, and referrer.
This is how I would lay out the tables.
person: (personID [pk], last_name, first_name, middle_name, dob, gender)
customer: (customerID [pk], last_purchase_date)
• FK: customerID to person.personID
employee: (employeeID [pk], date_hired)
• FK: employeeID to person.personID
referrer: (referrerID [pk], referral_date)
• FK: referrerID to person.personID
Notes:
[pk] = primary key.
Database normalization is fascinating topic, although somewhat disconnected from query. A typical introductory database course teaches students what Relational Algebra and SQL are, and then switches gears and talks about normalization with no leverage of skills developed at the first half of the course!
Speaking of design with Customer1, Customer2, Customer3 attributes:
http://www.orafaq.com/usenet/comp.databases.theory/2001/05/17/0244.htm
Hi Vadmin,
Thanks for your comments and I’m glad you are finding my topics fascinating.
My target audience are people that are just starting to learn SQL. I’m trying to get across important principles using simple and easy to understand English. I’m trying to keep it non-technical, which is hard when you are dealing with a technical subject!
Regarding normalization, the reason I did it before exploring joins is that I thought it would be good to give my readers an appreciation of why their data is spread across so many tables and why they have to go through the hassle to learn how to stitch the data back together.
Hi Kris,
I think you’re right to talk about NFs right from the beginning. The NFs define the main principles on which relational databases are built and should be treated “with respect” :) As others have pointed out, there are times when it is possible and necessary to ignore some of the NFs, usually when building search tables, but in general, they are the most important thing to know about when learning about RDBMS.
Note, please note, that a slight denormalisation **CAN** be a freaking speedup for your system…
Lookup-tables tend to be logical-read hogs, even when they’re small. You’ll just need to lookup **a lot**!
So your reason number 1 for “Data Duplication and Modification Anomalies” is not a rule written in marble.
P.
Hi,
I agree that the rule shouldn’t be written in marble (good metaphor btw).
There are definitely times when I take a step back and don’t follow strict normalization. I’ve done this many times in the applications I’ve written over the years; however, I think in general normalization does help performance more than it hinders it.
Check out the end of my post on the third normal form. I talk about cases when normalization can get out of hand.
Where’s Get Ready to Learn SQL. 7? Thanks
Hi, You can find the 7th article here!
I want you to explain concurrency control topic in simple words
Hi Vishnu, I’ll add concurrency to my list of topics. It is a good one to cover. Be sure to subscribe to my email so you’ll know when it’s published.
Nicely done. As clear and succinct as one could ask for.
Thanks Dennis. I appreciate the compliment. If there are other DB topics you think are confusing, and you would like me to write about, please let me know. You can email me from my Contact page (above on the menu) or leave a reply here.
Thanks!
Thanks for the job well done Kris, I have read several blogs and Tech-journals none has been this clear and simplified. Thanks again.
Thanks for letting me know. I really appreciate that you like the articles. If there are other SQL topics you would like me to explore or explain, please let me know.
Hi Kris,
Great article series! It came in handy for a student like myself.
Hi – I’m glad you liked the series. :)
hi
the post is very good and it is very easy to understand thanks for it
if u have tutorials about jsf using visual web ice faces can u plzz send it
Hi,
I’m sorry but I don’t have anything about jsf. I do know tons of C#, so if you’re interested in seeing how c# and SQL can go together let me know.
I was a developer for many many years (since 1987). Luckily I didn’t have to learn COBOL…
Hello Kris,
I want to build the frontend using C# and backend sql , here the ask is to create a Database inventory.
Its very use full for me, thanks Manikandan
I’m glad you like the article. If there are other topics you are interested in learning about, please let me know.
special thanks to admin
very useful to tutorial.
You are welcome! I’m glad you like the article.
Thanks for another informative site. Where else may
I am getting that type of info written in such an ideal way?
I have a undertaking that I am just now running on, and I’ve been on the glance out for
such info.
Dear lord, thank you for a brilliant and easy to follow explanation. I’m taking a database course and their explanation of normalization makes it sound like quantum physics. Thanks 1000 times
You’re welcome! I’m glad it makes sense to you!
I’m an online student, trying to make sense of database design so I can make it work for me in the workplace. I just wanted to say thanks for the plain-talk on normalization. It helped me understand normalization before it was formally taught (assuming it will be); and I anticipate that to help me build a stronger relational structure, allowing me to actually build a project-database that I hope to deploy in the real world. Thanks
Hi Uriah,
I’m glad you like the articles.
Kris.
Hello Kris, thank you for your help in explaining normalization in a very SIMPLE manner however how will your database table for uml normalization fora bank be like.
Thank you.
Hi – check out this article. It should help!
https://www.essentialsql.com/data-modeling-principles-action/
I strongly disagree with you that BCNF is “too advanced”. It essentially states that if an attribute determines value for other attribute(s) it must be a candidate key. As such I find it easier to understand and explain to others than 2NF and 3NF. I usually put only very little attention to 3NF and spend the time on BCNF.
I also disagree in your first reason for normalization: duplicated data. It makes students belive that we should do a lot to save space (e.g. only storing 2 digits for the year which gave us the Y2K problem). It is the anamolies that is the sinners and as long we get rid of them (by normalizing) we also get rid of the duplicated data – so there is no need to fight that on its own.
Hi, The reason to avoid duplicated data isn’t to save space, its is to avoid update issues. Why write to disk more than you need to do so? As for BCNF, most enjoy the simplicity my articles bring to the topic.
For that, I’m less apt to tinker with the articles.
Hi Kris,
Thanks for simple yet powerful explanation of Normalization. Great help for learner’s like me.
Thanks
Pradeep
My Pleasure! Thanks for reading the article!
Well explained. Your other articles are already in my list :)
This was a great article, but how do I subscribe?
This was hugely helpful. I’ve been reading blogs and articles about this topic, but struggling to understand why the normalization rules are actually important and how they affect the database long term so I can make my own educated decision — thanks for this!
Hi Marie,
I’m glad you found the article useful. Spread the word! Let everyone know about EssentialSQL! :)
I strive to help make SQL easy and understandable to all.
I’m so glad I was able to help you get normalization. :)
This is really helpful to me, thank you very much. What part do I focus to be a new dba, Currently I am looking a job
Hi,
There are some core skills you should know an master in order to become a DBA. Top among them are being comfortable backingup and restoring databases.
I have several resources you can review to help. Here are some DBA articles I’ve tagged.
Let me know if you have further questions.
Thank you for the simple terms, I could share this with the customers :)
Since you were asking what other topics readers were interested in, how about eliminating these parameter tables in the first place? :)
Regards
Tansu
Thank You for this info! I am finding myself struggling with an on-line course :). I really want to learn this topic, but unfortunately my primary instructor had an emergency and one who took over is not very helpful, so your resource is a life saver! So I have a question, is I start with Person ( employee or customer ( inheritance)), then employees belong to different departments… would it be another step down inheritance or how is the best to tie it all up to get a third normal form?
does 3rd normal form resolve modification anomalies?
Mostly, but it is not guaranteed. That is why we sometimes need BCNF.
I need to normalize these sets of attributes to 3NF
Owner #, Owner Name, Owner Phone, Pet # Pet Name, Walk #, Walk Date,Duration,Paid, Walk Notes
While multiple pets can be walked at the same time, it is recorded as a separate walk for each pet so, that details of duration, payment and notes can be recorded separately.
Normalise this table to the third normal form, clearly showing the stages of 0NF, 1NF, 2NF and 3NF.
Hi Bradley,
I would recommend joining the EssentialSQL learning group and asking your question there!
https://www.facebook.com/groups/EssentialSQLLearning/
I compare two tables and take the results into one table. But issue is, in the result table the entries are repeated. I think the issue is a one table which we have taken to compare is not a fixed table. It is automatically filled with a RFID tag reader. So, can I solve this repeating issue in my resultant table from normalization methods???
It sounds like your have a one to many join. That is something that happens when you normalize. Without knowing the table structures you’re comparing it is difficult to know the exact issue. I would suggest looking to see what is the join condition you’re using in the compare.
Also, when you say you take the result into one table, are you combing columns from the two tables’ rows into one row (desired result) or are you trying to just compare to see if they are different and only keep those that don’t compare?
Question: Which of the following is not true about normalization?
a.) produces relations with a single theme
b.) may create referential integrity constraints
c.) reduces the number of functional dependencies in the schema
d.) reduces anomalies
e.) splits a poorly structured relation into two or more well-structured relations
c.) reduces the number of functional dependencies in the schema
Hello Kris, thanks for the tutorial. Could you post some examples of normalization, preferably complex ones from the real-world?
Yes, will share soon
Please share the normalized form of above sales person table
really clear – well done
All I can say is thank you, Kris. You have just elaborated and explained this concept in a way so easy to understand that i will use this explanation for my Jr. DBA and my students at “Valencia College”. I have 30+ yrs of experience in this field with a Master degree in “Information Systems” and humbly I express my respect and admiration to you as a professional and a colleague. Thanks again Kris and I encourage you to keep writing publications about Databases and their objects.
Juan Galarza – DBA at Orange County Tax Collector.
This makes my day! It is my pleasure to your students learn normalization. I appreciate the kind comments.
Kris.
Thank you for the clear explanations. I’m a total novice trying to understand databases and design to improve things at my new job. Your site is my new school! =D