I recently put together a lesson on non-equi joins. It is part of my Join Together Now course. Click here get the course at a discount. A non-equi join can be used to solve some interesting query problems.
You can use a non-equi join to check for duplicate value or when you need to compare one value in a table falls within a range of values within another.
You can learn more about this this video, it’s transcript follows:
What is a Non-Equi Join in SQL Transcript
Welcome back. In this lesson, we’re going to learn about non-equi joins.
Before we learn about non-equi joins, let’s do a quick review on what an equi join is.
An equi join is a join that uses equality operators, so recall that a equality is just a join that uses the equal sign.
Some examples of equi joins are join conditions where we’re matching first name from one table to first name from another table, or, for example, where we’re matching the customer ID from the customer table with the customer ID from the order table. Equi join is just a fancy way of saying that we’re matching values equally. We’re using the equal sign in our join condition.
I think it follows with a non-equi join that we’re not using the equal sign. Non-equi joins are joins whose join conditions use conditional operators other than equals.
An example would be where we are matching first name and then last name, but we are checking where one field from a table does not equal field from another table.
That’s what makes it a non-equi join.
Even though in this join condition, we are matching column for column, first name in one table to first name in another, last name in one table to last name in another, we are making sure that the business entity ID in one table does not equal the business entity ID in another table.
Therefore, our join condition is considered a non-equi join. You may be wondering, why would we do this?
Well, there’s a couple of reasons why we would want to use non-equi joins. Some of the common uses include:
- checking for duplicate data between tables
- matching against a range of values
- computing running totals.
In this lesson, we’re going to go look at how you can check for duplicate data in a table. We’re also going to look at how you can match against a range of values, and then we’ll save the computing running totals example for the last lesson because that one’s a little more advanced.
Use Non-Equi Join to Check From Duplicate Values
In this example, we’re going to check for duplicate values. We’ll use the person table as our example.
We’ll be looking at the first name and last name. What we’ll be doing is checking the table to see if there are any rows that have the same first name and last name across the table. We know that the business entity ID is a primary key in the person table, so that is unique.
Given that and knowing that the first name and last name could potentially be duplicate, what we’ll do is we will join the table on first name and last name to itself, and that will start making combinations of rows, and if you can imagine, we’re going to start seeing where the table will start matching to itself and, naturally, we’re going to see where some rows are going to match right to themselves, and so from one person, copy the person table, we’re going get the same business entity ID from the other copy, but there will be those cases where the business entity ID does not equal itself, and that is where non-equi join comes in, and that’s how we’re going to find duplicates.
Let’s go look and see what that query looks like right now.
SELECT P1.FirstName, P1.LastName, P1.Title, P1.MiddleName, P2.MiddleName, P2.Title FROM Person.Person P1 INNER JOIN Person.Person P2 ON P1.FirstName = P2.FirstName AND P1.LastName = P2.LastName AND P1.BusinessEntityID <> P2.BusinessEntityID
Rather than type this query out, I just paste it in, and what I want to show you here is that we are first joining from the person table and we are doing an inner join right back onto the person table. We are going to join the person table on first name and last name.
We’re going to be looking for matching Kris to Kris in one table and Wenzel to Wenzel in the other, and we’re going to check to see if that primary key matches, and, in fact, we’re really checking to see if it does not match because if they don’t match, then we know we have a duplicate, because if they matched, what would that mean?
If they matched, that means it’s the same record.
Let’s decompose this query so you can understand a little more what’s going on behind the scenes.
The first thing I’m going to do is add in the primary key so that you understand really what we’re bringing back here in terms of the values from each row. Whoops. We’ll bring back P1 Business Entity ID and then we’ll do P2 Business Entity ID. You can see where, for the first result, it brings in the entity ID where for PI, it’s 859, and for the second, for P2, it’s five, and this is for Gail Erickson, and you can see where for P2 or for P1, the middle name is null, and for the second copy, the middle name is A.
One thing I want to do is show you if I was to comment out the non-equi join part, what the query would look like, and so what I’m going to do is just put P1 Entity ID just so we can see a little better what these columns are and run this.
SELECT P1.FirstName, P1.LastName, P1.Title, P1.MiddleName, P2.MiddleName, P2.Title FROM Person.Person P1 INNER JOIN Person.Person P2 ON P1.FirstName = P2.FirstName AND P1.LastName = P2.LastName --AND P1.BusinessEntityID <> P2.BusinessEntityID
Now, you can see a little more clearly where the non-duplicated values are also coming back. Here, we matched the names from both tables, and I’m actually getting the unique values, too.
Here’s Gigi Matthew. This is entity ID 9 matches to itself, 11, and so on. Here, you can see where we have a duplicate for Suchtra Mohan, and what I’m going to do now is un-comment out this non-equi join and show you how it will now return this row, but to do that, I’m going to also add in a [inaudible 00:07:33] clause just so we can bring back any first names that begin with S-U, so I’m going to un-comment out that non-equi join and run it, and you can see it brings back a lot of entries. We’re not seeing Suchtra here, so we need to refine our query here. Then, I’ll do P1.firstname, like, and then I will do … and then run this, and you’ll see where we have Suchtra Mohan now showing up as a duplicate. Here’s a way where you can identify duplicate data in your table.
Let’s go on to our next example.
Compare A Range of Values using a non-Equi Join
For this next example, we’re going to match a range of values. Let’s assume we want to list every salesperson that is within one million dollars of year-to-date sales of one another.
One possible solution is to get every combination of salespeople, compute the year-to-date sales from those combinations, look to see if they are within one million dollars of one another, and if so, then keep the record. We’ll see how this would work in SQL as a query.
Again, here’s a query that I brought in.
SELECT S1.BusinessEntityID ,S1.TerritoryID ,S1.SalesYTD ,S1.SalesLastYear ,S2.BusinessEntityID ,S2.SalesLastYear FROM Sales.SalesPerson S1 INNER JOIN Sales.SalesPerson S2 ON S1.SalesLastYear Between S2.SalesLastYear - 500000 AND S2.SalesLastYear + 500000
Rather than typing, I’m pasting in the example. It goes quicker. Let me clean it up a little bit here so that it’s easier to read. Let’s take a look at this query. This query is taking each salesperson and comparing the salesperson to one another.
The salesperson is being joined on sales from last year, and we are doing it by a range, so it’s going to look very strange here. We’re not using an equal sign or even a not equal comparison. We’re actually using the between comparison operator, and we are looking for any salesperson who has their year-to-date sales from last year that are either within one million dollars of our sales.
If my sales from last year were, let’s say, $900,000, then I’m looking for another salesperson who sales were either $900,000 minus $500,000, so either $400,000 or $900,000 plus $500,000. I guess that would be $1,000, $400,000, within that range of sales, so from $400,000 to $1,400,000 in sales. If there’s any salesperson in that range, then I want to include them in my result, and we do that for each salesperson based on their sales from last year, and then we’ll compile a list.
What this list is going to show us is our sales and then the person’s sales from last year. You may be asking yourself why you would want to put a query like this together, and it could be that the sales manager is looking to find salespeople that had sales close to another salesperson, and he is looking to either team them up or set up a friendly competition between them, and this is one way to scan the whole database and find those pairings. Let’s run this query and see what we find.
You’ll see where I didn’t bring in names of salespeople just to keep it simple, but I have their ID. You can see where it’s bringing in their ID and then their sales year-to-date, and the person’s sales from last year. One thing that’s interesting is, do you notice how it’s bringing in the person again? We need to fix that because it really shouldn’t be bringing in the same person, so we can add an and … We can say, “The salesperson’s business entity ID does not equal sales to that business entity ID because we don’t really want to compare the same salesperson to themselves. Now, let’s run it.
The other silly thing is that we have some people that didn’t have sales last year. That may or may not be something that you would find it to be an issue, but what we could do is winnow those out as, say … We could make those zero. Actually, we’ll just say greater than zero.”
Now, we have the salesperson and the sales associates that have sales that are within a million dollars of their sales from last year. Salesperson 275 had sales and then salesperson 276 was a million dollars, and so on. It looks like there’s 106, some records that match this type of criteria. One thing we could do, if we wanted to make this even a little more restrictive is say that they have to be in the same territory. We could say, “And S1.territoryID = S2.territoryID,” so that we’re keeping the competition within the same territory. This way, I’m not getting a salesperson from Territory 1 competing against somebody from Territory 6. Let’s see what happens when we do this.
SELECT S1.BusinessEntityID ,S1.TerritoryID ,S1.SalesYTD ,S1.SalesLastYear ,S2.BusinessEntityID ,S2.SalesLastYear FROM Sales.SalesPerson S1 INNER JOIN Sales.SalesPerson S2 ON S1.SalesLastYear Between S2.SalesLastYear - 500000 AND S2.SalesLastYear + 500000 AND S1.TerritoryID = S2.TerritoryID
All right. Well, this really took it down to only two people. That was interesting, but I think now you can see where these join conditions really make a query start becoming more restrictive with the rows, and in a way, also, make the data more meaningful because, now, what we have, is these are our salespeople that have sales within a million dollars from last year’s sales year-to-date within the same territory. That’s a lot different than across the whole organization. This is a good example, and you can see how much more restrictive it is. Again, let me comment this out, and then run it, and you’ll see this as it expands to many more rows.
Many more sales combinations of salesmen come into play when I can go across territories. Also, notice with non-equi joins how many different operators I have in place here. I’m using a between, I’m using a not equals, I’m using a greater than, I’m using an equals. The non-equi join really has no restriction on what I can use. You can see that by me using these, I’m still able to create a very meaningful query. When I run this, this query has a lot of meaning in the sense that I am bringing back, I think, very useful information to a sales manager. He could use this to set up a really good competition between these two sales associates in Territory number 6, and it’s all based on last year’s sales year-to-date. Hopefully, you’re finding that this example really drives home non-equi joins can be used to really look at your data and do some good compare contrasts of rows, especially when you’re looking at rows from the same table.
If you have questions about this, please ask because I know this can get a little esoteric. It’s not just as straightforward as doing a inner join from one table to the next. It can get a little crazy, and I certainly will answer any questions you have. I’ll see you in the next lesson.