Introduction to Database Joins
One of the biggest issues beginning SQL writers have is being able to write queries that use more than one table, that is to use database joins. In this series of articles we are going to show you how to write a query that combines, or joins, data from more than one table. Once you have gone through the examples you will understand how to write the basic commands to make this happen and why data is separated in the first place.
This first article introduces the concept of joining tables. The focus is going to be more on the type of joins, not necessarily their syntax. The later articles focus on the various types of joins. Through the narrative and examples you’ll become very comfortable with each one.
In my prior articles you learned about the need to normalize to make it easier to maintain the data. Though this makes it easier to maintain and update the data, it makes it very inconvenient to view and report information. Typically the information you need to see has to be cross referenced across several tables for you to see the full picture.
Left in separate tables it’s tough to put it all together and understand what the data means.
Through the use of joins we can stitch the data back together to make it easy for a person to use and understand.
So Why Combine Data?
Before we begin let’s look into why you have to combine data in the first place. SQLite and other databases such as Microsoft SQL server and MySQL are relational databases. These types of databases make it really easy to create tables of data and a facility to relate (join or combine) the data together.
When databases are developed, care is taken to ensure redundant data is minimized. The databases go through a process called normalization that help reduce each database table to a single meaning or purpose.
For instance, if I had a table containing all the students and their classes, then wanted to change a student’s name, I would have to change it multiple times, once for each class the student enrolled in.
Normalizing, separate the data into a Student and Classes table. This makes it really easy to update the student name, but the price for this is that we have to piece the data back together to answer most of the questions we ask the database.
That is exactly why we need joins.
Database joins are used to match rows between tables to allow us to stitch the database back together to make it easy to read and use. In most cases we’re matching a column value from one table with another.
Mechanics of a Join
When broken down the mechanics of a join are pretty straightforward. To perform a join you need two items: two tables and a join condition. The tables contain the rows to be combined, and the join condition the instructions to match rows together.
Take a look at the following Venn diagram. The circles represent the tables and where they overlap rows satisfying the join condition.
You may be wondering what makes up a join condition. In many cases a join condition is just matching one or more fields in one table to those in another. This type of join is sometimes called an equijoin since the only type of comparison used is the equals sign.
Joins aren’t limited to just exact matches such as the equijoin. You’ll see in later articles where it is useful to use other comparison operators such as the greater than sign.
A common join pattern you’ll come across is joining the primary key of one table to its foreign key. I suppose this is self-evident, but I think it is important to highlight it. Can you think why this is the case?
What process do we use to break up our data?
If you guessed normalization, you are correct. Through that process we break up dependencies within tables to eliminate update anomalies among other things, but in order to keep relationships, we introduce foreign keys.
Let’s take an example from the sample database. Consider the following data model involving the Employees and Orders table. In this model each employee can place zero or more orders.
The EmployeeID is the primary key in the Employees table and foreign key in the Orders table. For each employee there can be none, one, or perhaps many orders.
Here is a list of all the employees. To keep it simple only the LastName is shown.
In the sample database you could write the following statement to get these results:
SELECT EmployeeID, LastName FROM Employees
And here are the Orders.
You can see this data using this select statement
SELECT OrderID, EmployeeID, ShippedDate FROM Orders
To create a report of employee LastName and the ShippedDate of the orders they placed, we need to combine information from both tables. To do we would create a join condition between the two tables on EmployeeID.
When we work with select statements involving more than one table, we need a way to keep really clear which field is from which table. This is done by placing the table name in front of the column name. A period is used to separate the two.
Using this convention, the join condition used is
Employees.EmployeeID = Orders.EmployeeID
Check the following diagram. We join the table together we are looking for rows where the EmployeeID matches. So, for every order, where the EmployeeID = 4, the database will match to the Employees table and match to the corresponding row. In this case that is the employee whose last name is “Baker.”
This type of join is called an inner join and I’ll go through the explanation in detail in a later article, but if you are dying to try it now, here is the command!
SELECT Employees.LastName, Orders.ShippedDate FROM Employees INNER JOIN Orders ON Employees.EmployeeID = Orders.EmployeeID
There are several type of joins we can use to combine tables together. They are introduced in the next section.
Types of Database Joins
The joins we’ll cover in this section are characterized by which rows are returned when the join condition is met. We already know that an inner joins return rows from each table that meets the join table, but there are also joins that return rows from the tables even if the join condition isn’t met. These are called outer joins.
Cross joins return all combinations of rows from each table. So, if you’re looking to find all combinations of size and color, you would use a cross join. No join conditions are required.
Inner joins return rows where the join condition is met. This is the most common join. In its most common example, where a primary key is being matched to a foreign key, only combinations of rows matching the keys are returned.
Outer joins are used when, in addition to returning the results of the inner join, you want to return rows from one table that don’t match in another. In the example above, an employee can have placed zero or more orders. If we only use an inner join, then our results only include employees that have placed orders. Those that haven’t aren’t included in the results.
There may be cases where we want to list all employees regardless of whether they placed an order. In this case an outer join comes in handy. Regardless of the match with the Orders table, an employee record’s selected columns are listed in the results.
There are three types of outer joins: Left, Right, and Full outer joins.
- Left Outer Join – Return all rows from the “left” table, and matching rows from the “right” table. If there are no matches in the right table, return Null values for those columns.
- Right Outer Join – Return all rows from the “right” table, and matching rows from the “left” table. If there are no matches in the left table, return Null values for those columns.
- Full Join – Return all rows from an inner join, when no match is found, return nulls for that table.
In the following articles we dig into the various join types, explore joins involving more than one table, and further explain join conditions, especially what can be done with non-equijoin conditions.
Also, I think it is important to understand what happens under the covers. So as part of this series, we’ll explore the impact joins have on database performance, and why it is important to understand whether indexes can help reduce query times.