How to Write SQL – Understand Database Table Meanings and Relationships – Step 1
The first step to write SQL is to understand the database tables meanings and relationships. This is important, as complex queries often span more than one table. Knowing the purpose of the table and how they relate to one another is key. The series starts with this introductory article.
All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks2012 database. You can get started using these free tools using my Guide Getting Started Using SQL Server.
Understand your Database Table’s Meanings and their Relationships to write SQL
Understanding your database is more than just knowing it is built with tables, views, and relationships. In order to write meaningful queries you need to understand how real world data was decoded and stored in the database.
These major components provide clues we can use to translate real-world requests into actionable database queries. From understanding what content is contained within a table to understanding how to relate one table to another, knowing the database’s basic structure is key to creating queries.
When constructing queries it is important to understand a table’s purpose or subject. Is the table used to organize employee data or a list of classes taken? In general you can think of most tables as covering a subject, such as employees or classes.
Before writing a query look over your database’s table names. In many cases the names reveal the main topic or subjects of the tables. If you are looking for employees, then chances are the table will be named something akin to “Employee.”
What are some of the main topics captured in this excerpt of tables from the Adventureworks2012 database?
Just by reading the names you can see this database contains information about people and their jobs.
When I confronted in creating a query, I first rely on the table names to get my bearings. Usually I’ll notice tables that have promising names. To confirm if I’m on the right track, I’ll inspect their definition and look at their columns.
You’ll also find related tables in the same manner. A record of Employee Salaries may be in a table called EmployeeSalaries or EmpSal. Keep in mind that some programmers really like to abbreviate.
Developers may follow a naming convention. Here are some naming conventions. Be on the lookout for them, as they can hint towards a table’s purpose:
- Tables containing information relating to the same topic usually start with that subject’s name. Employee, EmployeeDepartmentHistory, and EmployeePayHistory are excellent examples.
- Table Prefixes, such as “Log” in the table name ErrorLog provide clues to a table purpose. Some common prefixes are: log, index, type, and code.
- Tables containing two subjects, such as EmployeeDepartmentHistory, indicate a table could be a bridge table (See Keys and Relationships).
A table’s columns give you a lot of information. Hopefully the designer gave the column a readable name. If so, then it is pretty easy to understand each column’s purpose. Listed below is the Employee table. Can you tell which column is used to store the date a person was born?
Of course! It is the column entitled BirthDate.
The column’s data type also gives you hints. In this table you can see BirthDate is a date column and can’t be null. That means, the column won’t hold numbers or text, just dates, so we’ll be able to use more advanced logic like finding birthdays 30 days from now and that the column will always have a value.
Here are some great tips you can use to learn more about columns:
- If a column is not null, it will always have a value.
- If you don’t know what values are in a column, you can select distinct values from the column to get a quick overview of all values. For instance, you can quickly find all the possible job titles in the Employee database by running the query
SELECT DISTINCT JobTitle FROM HumanResources.Employee
- Knowing if a field contains numeric or character data can help infer meaning. If a field named “WaitTime” is of character data type, then you may expect to find values such as “Long,” “Normal,” “None;” whereas, if it is numeric then WaitTime most likely represents some quantity of time such as hours, minutes, or seconds.
- Pay Attention to Naming Conventions. Many programmers add suffixes to their column names.
- columnID – This column is used to identify a row in this table or another. It is potentially a primary or foreign key (see below).
- columnNo – This field is some sort of number, perhaps an account number (e.g. AccountNo)
- columnNum – Another variant for number.
- Columns are usually CamelCasedInSQL. This is done to avoid having placing spaces in the column name, yet make the columns readable. Compare “AccountNumber” to “Account Number.”
- columnFlag – This is programmer jargon for On and Off or Yes and No.
- columnGUID – GUID stands for Globally Unique ID.
Keys and Relationships
In our example we talked about finding all the departments that an employee has worked in. By inspecting the table names it seems logical that we would want to look at the Employee and Department tables, but how are they related?
This is where it makes sense to review the relevant table’s primary keys to understand what values are used to identify the tables and to see if you can use foreign keys from other tables to make a relation.
Typically column names in tables are named the same. If the primary key is HardwareID in one table, and you know it is related, then a good start is to look for HardwareID in another table.
Check out the employee and Department Tables. How are they related?
First look at the Department table and identify the primary key. You’ll quickly see that it is named DepartmentID. Now go to the Employee table and look to see if that column or one similarly named is listed? Do you see a DepartmentID or DeptID column listed? Nope!
To me that is a huge hint. We have two islands of information and need a bridge to get between them. This is a common problem with writing SQL. Often we find the tables that contain the end results, but fail to initially find those that bridge the gap. Further digging is needed.
In our case, we are lucky that there are other tables that have employee in their name. If you look at the above database list you’ll see there is one named EmployeeDepartmentHistory. Here is a picture of all three tables:
EmployeeDepartmentHistory is commonly called a junction or bridge table as it contains information from an intersection of both Employee and Departments. These types of tables used to model many to many relationships (e.g. Many Employees work in One Department, and One Employee could have work in Many Department over his career).
By reading this, I now understand the EmployeeDepartmentHistory table is going to contain employees and the departments they worked in.
Sometimes a database designer will define foreign key relationships. Foreign key relationships are used to ensure that a foreign key value exists as a primary key in another table. For instance in our database there is a foreign key relationship on EmployeeDepartmentHistory ensuring DepartmentID is only assigned values found in the Department table.
Foreign key relationships are mainly put in place to ensure data integrity, but we can also use them to confirm the database designer intentions. What tables did they mean to relate to one another?
Views are a fancy way of saying shortcut! When I‘m writing queries I always look for views to see whether I can use them in my queries. If a view doesn’t give you all the columns you need you have two choices:
- Use the view in a query and join to other table to get the column you want.
- Look at the view’s definition and then copy that code into your query.
Unless the view covers all the columns I need, I typically don’t use option one. Going that route can cause your code to become inefficient and hard to read. This is especially so when you have views that refer to views. The SQL quickly becomes a tangled skein.
I typically look at the view’s definition, which is easy to do and then use that code as a starting point for my own queries.
Below you can see how I opened up the view definition for vEmployeeDepartmentHistory
Here is the select statement from that view. As you can see, it provides great hints on how employees and department are related.
SELECT e.[BusinessEntityID], p.[Title], p.[FirstName], p.[MiddleName], p.[LastName], p.[Suffix], s.[Name] AS [Shift], d.[Name] AS [Department], d.[GroupName], edh.[StartDate], edh.[EndDate] FROM [HumanResources].[Employee] AS e INNER JOIN [Person].[Person] AS p ON p.[BusinessEntityID] = e.[BusinessEntityID] INNER JOIN [HumanResources].[EmployeeDepartmentHistory] AS edh ON e.[BusinessEntityID] = edh.[BusinessEntityID] INNER JOIN [HumanResources].[Department] AS d ON edh.[DepartmentID] = d.[DepartmentID] INNER JOIN [HumanResources].[Shift] AS s ON s.[ShiftID] = edh.[ShiftID]
From this view you can see Employee is related to Department via the EmployeeDepartmentHistory table. When I see relations I can use, I’ll just copy those portions of the join statement into my own SQL.
This is one article in a series of four that explain how to organize and write SQL queries. All four articles, including this one, are listed below:
- Introduction to Three Simple Steps to Writing SQL
- Step 1 – Understand your Database Table’s Meanings and Relationships
- Step 2 – Formulate your Question
- Step 3 – Write the Query