Database joins are important to master. As you progress from beginner to advanced beginner, you’ll soon need to combine data from more than one table. To do this, you’ll use database joins.
In this series of articles, I’ll show you how to write a query that combines, or joins, data from more than one table. By working through the examples, you’ll appreciate the problem and understand the basic join syntax.
This first article introduces the concept of joining tables. Its focus is on the types of joins, not their syntax.
In my prior articles, you learned why we normalize data: it makes the data easier to maintain and update. The trade-off is that normalized data is inconvenient to view and report on.
Typically, you need to cross-reference, that is, join, several tables to get the information you need.
Left in separate tables, the data is hard to put together and understand.
Through the use of database joins, we can stitch the data back together to make it easy for a person to use and understand.
So Why Combine Data with Database Joins?
Before we begin, let’s look at why you have to combine data in the first place. SQLite and other databases, such as Microsoft SQL Server and MySQL, are relational databases. These databases make it easy to create tables of data and provide a facility to relate (join or combine) the data.
As requirements are cast into table designs, they are checked against best practices to minimize data quality issues. This process is called normalization, and it helps each table achieve a singular meaning and purpose.
For instance, if I had a single table containing all the students and their classes and then wanted to change a student’s name, I would have to change it multiple times, once for each class the student is enrolled in.
A Normalized Database Is Not Human Readable
Normalizing separates the data into a Student table and a Classes table. This makes it easy to update the student name, but the price is that we have to piece the data back together to answer most of the questions we ask the database.
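To make the trade-off concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table names, column names (StudentID, Name, ClassID, ClassName), and sample rows are hypothetical; the text only says the data is split into a Student table and a Classes table. Because the name is stored once, renaming the student is a single-row UPDATE, but reading names alongside classes requires a join.

```python
import sqlite3

# In-memory database; the schema and rows are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (
        StudentID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL
    );
    -- Simplified as in the text: each row records one class a student takes.
    CREATE TABLE Classes (
        ClassID   INTEGER PRIMARY KEY,
        StudentID INTEGER NOT NULL REFERENCES Students(StudentID),
        ClassName TEXT NOT NULL
    );
    INSERT INTO Students VALUES (1, 'Sam Smith');
    INSERT INTO Classes VALUES (10, 1, 'Algebra'), (11, 1, 'Biology'), (12, 1, 'Chemistry');
""")

# The name lives in exactly one row, so one UPDATE changes it everywhere.
conn.execute("UPDATE Students SET Name = 'Sam Jones' WHERE StudentID = 1")

# Reading the data back in a human-friendly form requires a join.
rows = conn.execute("""
    SELECT Students.Name, Classes.ClassName
    FROM Students
    INNER JOIN Classes ON Students.StudentID = Classes.StudentID
    ORDER BY Classes.ClassName
""").fetchall()
print(rows)
```

In a single combined table, the same rename would have touched three rows, one per class.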
That is exactly why we need database joins.
Database joins stitch the data back together so it’s easy to read and use. They match rows between tables; in most cases, we’re matching a column value in one table with a column value in another.
Mechanics of Database Joins
When broken down, the mechanics of a join are straightforward. To perform a join, you need two items: two tables and a join condition. The tables contain the rows to combine, and the join condition provides the instructions for matching rows together.
Take a look at the following Venn diagram. The circles represent the tables, and the area where they overlap represents the rows satisfying the join condition.
You may be wondering what makes up a join condition. In many cases a join condition is just matching one or more fields in one table to those in another.
Its fancy name is equi-join. Why? Because of the equals sign…
Not so fancy after all!
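The two ingredients, two tables and a join condition, can be illustrated without a database at all. This is a minimal sketch in plain Python, assuming each table is a list of dictionaries; the function name nested_loop_join and the sample rows are hypothetical. It compares every pair of rows and keeps the pairs the condition accepts, which is the simplest way a join can be evaluated (real database engines often use indexes or hash joins instead).

```python
from typing import Callable

Row = dict[str, object]

def nested_loop_join(left: list[Row], right: list[Row],
                     condition: Callable[[Row, Row], bool]) -> list[tuple[Row, Row]]:
    """Return every (left_row, right_row) pair that satisfies the join condition."""
    result = []
    for left_row in left:
        for right_row in right:
            if condition(left_row, right_row):
                result.append((left_row, right_row))
    return result

# Hypothetical sample rows.
employees = [{"EmployeeID": 4, "LastName": "Baker"},
             {"EmployeeID": 5, "LastName": "Hall"}]
orders = [{"OrderID": 100, "EmployeeID": 4},
          {"OrderID": 101, "EmployeeID": 4}]

# An equi-join condition: the EmployeeID values must be equal.
pairs = nested_loop_join(employees, orders,
                         lambda e, o: e["EmployeeID"] == o["EmployeeID"])
print([(e["LastName"], o["OrderID"]) for e, o in pairs])
```

Swapping the lambda for a different comparison changes the join condition without touching the join machinery.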
Joins aren’t limited to exact matches. In later articles, you’ll see where it’s useful to use other comparison operators, such as the greater-than sign.
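As a preview of a non-equijoin, here is a hedged sketch using Python’s sqlite3 module. The Orders and ShippingBands tables, their columns, and the weight ranges are all hypothetical. Each order is matched to the band whose range contains its weight, using >= and < instead of =.

```python
import sqlite3

# Hypothetical tables: order weights and shipping bands defined by weight ranges.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Weight REAL);
    CREATE TABLE ShippingBands (Band TEXT, MinWeight REAL, MaxWeight REAL);
    INSERT INTO Orders VALUES (1, 0.5), (2, 3.0), (3, 12.0);
    INSERT INTO ShippingBands VALUES
        ('Light', 0, 1), ('Medium', 1, 10), ('Heavy', 10, 1000);
""")

# Non-equijoin: the join condition uses range comparisons rather than equality.
rows = conn.execute("""
    SELECT Orders.OrderID, ShippingBands.Band
    FROM Orders
    INNER JOIN ShippingBands
        ON Orders.Weight >= ShippingBands.MinWeight
       AND Orders.Weight <  ShippingBands.MaxWeight
    ORDER BY Orders.OrderID
""").fetchall()
print(rows)
```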
What process do we use to break up our data?
If you guessed normalization, you are correct. Through that process, we break up dependencies within tables to eliminate update anomalies, among other things. To preserve the relationships between the resulting tables, we introduce foreign keys.
Let’s take an example from the sample database. Consider the following data model involving the Employees and Orders tables. In this model, each employee can place zero or more orders.
EmployeeID is the primary key in the Employees table and a foreign key in the Orders table, so each order points back to the employee who placed it. For each employee, there can be none, one, or perhaps many orders.
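The relationship can be declared directly in SQL. This sketch uses Python’s sqlite3 module with a cut-down, hypothetical version of the two tables (only the columns used in this article, with made-up rows); the real sample database schema may have more columns and different types. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,   -- primary key
        LastName   TEXT NOT NULL
    );
    CREATE TABLE Orders (
        OrderID     INTEGER PRIMARY KEY,
        EmployeeID  INTEGER REFERENCES Employees(EmployeeID),  -- foreign key
        ShippedDate TEXT
    );
    INSERT INTO Employees VALUES (4, 'Baker');
    INSERT INTO Orders VALUES (100, 4, '2024-01-15');  -- valid: employee 4 exists
""")

try:
    # Hypothetical employee 99 does not exist, so the foreign key rejects this order.
    conn.execute("INSERT INTO Orders VALUES (101, 99, '2024-01-16')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```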
Here is a list of all the employees. To keep it simple, I’m only showing the LastName.
In the sample database, you could write the following statement to get these results:
SELECT EmployeeID, LastName FROM Employees
And here are the Orders.
You can see this data using this select statement:
SELECT OrderID, EmployeeID, ShippedDate FROM Orders
To create a report of employee LastName and the ShippedDate of the orders they placed, we need to combine information from both tables. To do so, we create a join condition between the two tables on EmployeeID.
When we work with select statements involving more than one table, we need a way to make it clear which field comes from which table. To do this, we qualify each column with its table name. The format is:
TableName.ColumnName
Using this convention, the join condition is:
Employees.EmployeeID = Orders.EmployeeID
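Why qualify column names at all? Both tables contain an EmployeeID column, so an unqualified reference is ambiguous once they are joined. This sketch, using Python’s sqlite3 module with hypothetical minimal tables and rows, shows SQLite rejecting the unqualified name and accepting the qualified one.

```python
import sqlite3

# Hypothetical minimal tables; both contain an EmployeeID column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER, ShippedDate TEXT);
    INSERT INTO Employees VALUES (4, 'Baker');
    INSERT INTO Orders VALUES (100, 4, '2024-01-15');
""")

try:
    # Unqualified: SQLite cannot tell which table's EmployeeID is meant.
    conn.execute("""
        SELECT EmployeeID, LastName
        FROM Employees INNER JOIN Orders
        ON Employees.EmployeeID = Orders.EmployeeID
    """)
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)  # SQLite reports an "ambiguous column name" error

# Qualified with the table name: unambiguous.
rows = conn.execute("""
    SELECT Employees.EmployeeID, Employees.LastName
    FROM Employees INNER JOIN Orders
    ON Employees.EmployeeID = Orders.EmployeeID
""").fetchall()
print(ambiguous_error, rows)
```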
Check the following diagram. When we join the tables, we look for rows where the EmployeeID matches. So, for every order where EmployeeID = 4, the database matches the order to the corresponding row in the Employees table. In this case, that is the employee whose last name is “Baker.”
This is called an inner join. Below is a sneak peek at the command; we’ll get into more detail in a later article.
SELECT Employees.LastName, Orders.ShippedDate
FROM Employees
INNER JOIN Orders
ON Employees.EmployeeID = Orders.EmployeeID
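To see an inner join produce the LastName and ShippedDate report, here is a runnable sketch using Python’s sqlite3 module. The rows are hypothetical stand-ins for the sample database; only Baker’s EmployeeID of 4 comes from the text. The hypothetical employee “Hall” has no orders, so an inner join leaves Hall out.

```python
import sqlite3

# Hypothetical rows standing in for the sample database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER, ShippedDate TEXT);
    INSERT INTO Employees VALUES (4, 'Baker'), (5, 'Hall');
    INSERT INTO Orders VALUES
        (100, 4, '2024-01-15'),
        (101, 4, '2024-02-03');
""")

rows = conn.execute("""
    SELECT Employees.LastName, Orders.ShippedDate
    FROM Employees
    INNER JOIN Orders
    ON Employees.EmployeeID = Orders.EmployeeID
    ORDER BY Orders.OrderID
""").fetchall()
print(rows)  # Hall has no orders, so only Baker's rows appear
```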
There are several types of database joins we can use to combine tables.
Types of Database Joins
There are several database joins to consider. In this section, we cover the most popular. What distinguishes the join types from one another is which rows are returned when a join condition is met or not met.
Cross joins return all combinations of rows from each table. So, if you’re looking to find all combinations of size and color, you would use a cross join. Join conditions aren’t used with cross joins. It’s pure combinatory joy.
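Here is the size-and-color case as a minimal sketch with Python’s sqlite3 module; the Sizes and Colors tables and their values are hypothetical. With three sizes and two colors, the cross join yields 3 × 2 = 6 rows.

```python
import sqlite3

# Hypothetical lookup tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sizes (Size TEXT);
    CREATE TABLE Colors (Color TEXT);
    INSERT INTO Sizes VALUES ('S'), ('M'), ('L');
    INSERT INTO Colors VALUES ('Red'), ('Blue');
""")

# No ON clause: every size is paired with every color.
rows = conn.execute("""
    SELECT Sizes.Size, Colors.Color
    FROM Sizes CROSS JOIN Colors
    ORDER BY Sizes.Size, Colors.Color
""").fetchall()
print(len(rows), rows)
```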
Inner joins return rows when the join condition is met. This is the most common database join. A common scenario is to join the primary key of one table to the foreign key of another.
This is used to perform lookups, such as getting an employee’s name from their EmployeeID.
Outer joins return all the rows from one table and, where the join condition is met, columns from the other. They differ from inner joins, which don’t include non-matching rows in the final result.
Consider an order entry system. There may be cases where we want to list all employees regardless of whether they placed a customer order. In this case an outer join comes in handy.
When using an outer join, all employees, even those without matching orders, are included in the result.
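The difference is easiest to see side by side. This sketch, using Python’s sqlite3 module with hypothetical rows, runs the same query as an inner join and as a left outer join. The hypothetical employee “Hall” has no orders: the inner join drops Hall, while the outer join keeps Hall with NULL (Python None) in the order column.

```python
import sqlite3

# Hypothetical rows: Baker has one order, Hall has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER, ShippedDate TEXT);
    INSERT INTO Employees VALUES (4, 'Baker'), (5, 'Hall');
    INSERT INTO Orders VALUES (100, 4, '2024-01-15');
""")

# Same query text; only the join type changes.
join_sql = """
    SELECT Employees.LastName, Orders.OrderID
    FROM Employees
    {join} Orders
    ON Employees.EmployeeID = Orders.EmployeeID
    ORDER BY Employees.EmployeeID
"""
inner = conn.execute(join_sql.format(join="INNER JOIN")).fetchall()
outer = conn.execute(join_sql.format(join="LEFT OUTER JOIN")).fetchall()
print(inner)  # Hall is dropped: no matching order
print(outer)  # Hall is kept, with None (SQL NULL) for the order column
```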
There are three types of outer joins: Left, Right, and Full outer joins.
- Left Outer Join – Return all rows from the “left” table, and matching rows from the “right” table. If there are no matches in the right table, return Null values for those columns.
- Right Outer Join – Return all rows from the “right” table, and matching rows from the “left” table. If there are no matches in the left table, return Null values for those columns.
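- Full Outer Join – Return all rows from both tables. Rows that satisfy the join condition are combined, as in an inner join; when no match is found, return Null values for the columns of the table without a match.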
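SQLite added native RIGHT and FULL OUTER JOIN support only in version 3.39.0, so older SQLite builds (possibly the one bundled with your Python) reject them. This hedged sketch emulates a full outer join portably with Python’s sqlite3 module: one LEFT JOIN returns all employees, and a second LEFT JOIN with the tables swapped, filtered to unmatched rows, adds orders that have no matching employee. The tables and rows are hypothetical. Swapping the table order in a LEFT JOIN is also how a right outer join can be written on those older versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER);
    INSERT INTO Employees VALUES (4, 'Baker'), (5, 'Hall');
    -- Hypothetical order 102 references employee 9, who is not in Employees.
    INSERT INTO Orders VALUES (100, 4), (102, 9);
""")

full_outer = conn.execute("""
    -- All employees, with matching orders or NULLs.
    SELECT Employees.LastName, Orders.OrderID
    FROM Employees
    LEFT JOIN Orders ON Employees.EmployeeID = Orders.EmployeeID
    UNION ALL
    -- Orders with no matching employee (the right-hand table's leftovers).
    SELECT Employees.LastName, Orders.OrderID
    FROM Orders
    LEFT JOIN Employees ON Employees.EmployeeID = Orders.EmployeeID
    WHERE Employees.EmployeeID IS NULL
""").fetchall()
print(full_outer)
```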
In the following articles, we dig into the various join types, explore database joins involving multiple tables, and further explain join conditions, especially what can be done with non-equijoin conditions.
Also, I think it is important to understand what happens under the covers. So as part of this series, we’ll explore the impact joins have on database performance, and why it is important to understand whether indexes can help reduce query times.
Here are some common questions that come up once you understand joins: