SQL Joins are important to master. As you progress from a beginner to advanced beginner, you’ll soon need to combine data from more than one table. To do this, you’ll one of several SQL join types.
In this series of articles I’ll show you how to write a query that combines, or joins, data from more than one table. By going through the examples, you appreciate the problem and understand the basic join syntax.
This first article introduces the concept of joining tables. Its focus is on SQL join types, not their syntax.
In my prior articles you learned about the need to normalize to make it easier to maintain the data. Though this makes it easier to maintain and update the data, it makes it very inconvenient to view and report information.
Typically you need to cross reference, that is join, several tables to get the information you need!
Left in separate tables it’s tough to put it all together and understand what the data means.
Through the use of database joins we can stitch the data back together to make it easy for a person to use and understand.
So Why Combine Data with SQL Joins?
Before we begin let’s look into why you have to combine data in the first place. SQLite and other databases such as Microsoft SQL server and MySQL are relational databases. These types of databases make it really easy to create tables of data and a facility to relate (join or combine) the data together.
As requirements are cast into table designs, they are laid up against some best practices to minimize data quality issues. This process is called normalization and it helps each table achieve singular meaning and purpose. Each table achieves singular meaning an purpose though the database normalization process.
For instance, if I had a table containing all the students and their classes, then wanted to change a student’s name, I would have to change it multiple times, once for each class the student enrolled in.
A Normalized Database Is Not Human Readable
Normalizing, separate the data into a Student and Classes table. This makes it really easy to update the student name, but the price for this is that we have to piece the data back together to answer most of the questions we ask the database.
That is exactly why we need database joins.
Joins are used to stitch the database back together to make it easy to read and use. They match rows between tables. In most cases we’re matching a column value from one table with another.
Mechanics of SQL Joins
When broken down the mechanics of a SQL join are pretty straightforward. To perform a join you need two items: two tables and a join condition. The tables contain the rows to combine, and the join condition the instructions to match rows together.
Take a look at the following Venn diagram. The circles represent the tables and where they overlap rows satisfying the join condition.
You may be wondering what makes up a join condition. In many cases a join condition is just matching one or more fields in one table to those in another.
A common join pattern used to join the primary key of one table to its foreign key. I suppose this is self-evident, but I think it is important to highlight it. Can you think why this is the case?
Example Data Model
If you guessed normalization, you are correct. Through that process we break up dependencies within tables to eliminate update anomalies among other things, but in order to keep relationships, we introduce foreign keys.
Let’s take an example from the sample database. Consider the following data model involving the Employees and Orders table. In this model each employee can place zero or more orders.
The EmployeeID is the primary key in the Employees table and foreign key in the Orders table. For each employee there can be none, one, or perhaps many orders.
Here is a list of all the employees. To keep it simple I’m only showing the LastName.
In the sample database you could write the following statement to get these results:
SELECT EmployeeID, LastName FROM Employees
And here are the Orders.
You can see this data using this select statement
SELECT OrderID, EmployeeID, ShippedDate
FROM Orders
To create a report of employee LastName and the ShippedDate of the orders they placed, we need to combine information from both tables. To do we would create a join condition between the two tables on EmployeeID.
When we work with select statements involving more than one table, we need a way to keep really clear which field is from which table. To do this the columns is qualified by the table name. The format is:
tableName.Columnn
Using this convention, the join condition is
Employees.EmployeeID = Orders.EmployeeID
SQL Join Explained
Check the following diagram. We join the table together we are looking for rows where the EmployeeID matches.
So, for every order, where the EmployeeID = 4, the database will match to the Employees table and match to the corresponding row.
In this case that is the employee whose last name is “Baker.”
This is an inner join. Below is a sneak peak of the command, later on, in another article, we get into more details.
SELECT Employees.LastName,
Orders.ShippedDate
FROM Employees
INNER JOIN Orders
ON Employees.EmployeeID = Orders.EmployeeID
There are several type of Database joins we can use to combine tables together.
SQL Join By Two or More Columns
You can use SQL to join on two or more columns. The mechanics to do so are just the same as when you join on one column. The main difference is now, rather than using SQL to join by one column and match those values, you have it match to two or more columns.
SELECT
e.employee_id,
e.first_name,
e.last_name,
d.dept_name,
d.dept_location
FROM
employees e
JOIN
departments d ON e.dept_id = d.dept_id AND e.dept_location = d.dept_location;
In this example we join on both the department and location.
SQL Join Types
There are several SQL join types to consider all of which allow you to use SQL to join by one or more columns. In this section we cover the most popular. You’ll find that the each join type is similar to the other. What distinguishes them from one another, is how each handles the join condition: whether the join returns rows or columns values when the conditions fail to match
Let’s take a look at each one now.
Cross Joins
SQL Cross joins return all combinations of rows from each table. So, if you’re looking to find all combinations of size and color, you would use a cross join. Cross joins do not use join conditions. It pure combinatory joy.
Inner Joins
SQL Inner Joins return rows when the join condition is met. This is the most common SQL join. A common scenario is to join the primary key of once table to the foreign key of another.
This is used to perform “lookup,” such are to get the employee’s name from their employeeID.
Left Join
SQL Left Joins return all rows from the “left” table, and matching rows from the “right” table. If there are no matches in the right table, they return Null values for those columns. You should get to know left outer joins so you’re able to deal with non-matching key values.
Right Join
SQL Right Joins return all rows from the “right” table, and matching rows from the “left” table. If there are no matches in the left table, return NULL values for those columns.
Full Join
A SQL FULL JOIN returns all rows from an inner join, when no match is found, return nulls for that table.
SELF Join
A SQL Self Join is a join between the same table.
What’s next?
In the following articles we dig into the various join types, explore Database joins involving more than one table, and further explain join conditions, especially what can be done with non-equijoin conditions.
Also, I think it is important to understand what happens under the covers. So as part of this series, we’ll explore the impact joins have on database performance, and why it is important to understand whether indexes can help reduce query times.
Here are some common that come up, once you understand joins:
Leave a Reply