What is a Non-Equi Join in SQL and What is its Purpose?

·

·

A non-equi join can be used to solve some interesting query problems. You can use a non-equi join to check for duplicate value or when you need to compare one value in a table falls within a range of values within another.

You can learn more about this this video, it’s transcript follows:

What is a Non-Equi Join in SQL Transcript

Welcome back. In this lesson, we’re going to learn about non-equi joins.

Before we learn about non-equi joins, let’s do a quick on equi joins.

An equi join is a join that uses equality operators, so recall that a equality is just a join that uses the equal sign.

Check out more Join Type in my Ultimate Guide to SQL Joins.

Equi Join Examples

Some examples of equi joins are join conditions where we’re matching first name from one table to first name from another table, or, for example, where we’re matching the customer ID from the customer table with the customer ID from the order table.

Equi join is just a fancy way of saying that we’re matching values equally. We’re using the equal sign in our join condition.

I think it follows with a non-equi join that we’re not using the equal sign. Non-equi joins are joins whose join conditions use conditional operators other than equals.

An example would be where we are matching first name and then last name, but we are checking where one field from a table does not equal field from another table.

Non-Equi Join’s Secret Sauce

That’s what makes it a non-equi join.

Even though in this join condition, we are matching column for column, first name in one table to first name in another, last name in one table to last name in another, we are making sure that the business entity ID in one table does not equal the business entity ID in another table.

Therefore, our join condition is considered a non-equi join. You may be wondering, why would we do this?

Well, there’s a couple of reasons why we would want to use non-equi joins. Some of the common uses include:

  • checking for duplicate data between tables
  • matching against a range of values
  • computing running totals.

If it helps, both non-equi joins and inner joins are considered SQL inner joins!

In this lesson, we’re going to go look at how you can check for duplicate data in a table. We’re also going to look at how you can match against a range of values, and then we’ll save the computing running totals example for the last lesson because that one’s a little more advanced.

Use Non-Equi Join to Check From Duplicate Values

In this example, we’re going to check for duplicate values. We’ll use the person table as our example.

Use a non-equi join to check for duplicate values

We’ll be looking at the first name and last name. What we’ll be doing is checking the table to see if there are any rows that have the same first name and last name across the table. We know that the business entity ID is a primary key in the person table, so that is unique.

Given that and knowing that the first name and last name could potentially be duplicate, what we’ll do is we will join the table on first name and last name to itself, and that will start making combinations of rows.

Self Join

If you can imagine, we’re going to start seeing where the table will start matching to itself. This is called a self join.

Naturally, we’re going to see where some rows are going to match right to themselves.

There will be those cases where the business entity ID does not equal itself, and that is where non-equi join comes in, and that’s how we’re going to find duplicates.

Let’s go look and see what that query looks like right now.

SELECT P1.FirstName,
       P1.LastName,
       P1.Title,
       P1.MiddleName,
       P2.MiddleName,
       P2.Title
FROM Person.Person P1
INNER JOIN Person.Person P2 ON P1.FirstName = P2.FirstName
       AND P1.LastName = P2.LastName
       AND P1.BusinessEntityID <> P2.BusinessEntityID

Rather than type this query out, I just paste it in, and what I want to show you here is that we are first joining from the person table and we are doing an inner join right back onto the person table. We are going to join the person table on first name and last name.

We’re going to be looking for matching Kris to Kris in one table and Wenzel to Wenzel in the other.

We’re going to check to see if that primary key matches, because if they don’t match, then we know we have a duplicate, because if they matched, what would that mean?

If they matched, that means it’s the same record.

Query Breakdown

Let’s decompose this query so you can understand a little more what’s going on behind the scenes.

The first thing I’m going to do is add in the primary key so that you understand really what we’re bringing back here in terms of the values from each row.

We’ll bring back P1 Business Entity ID and then we’ll do P2 Business Entity ID. You can see where, for the first result, it brings in the entity ID where for PI, it’s 859, and for the second, for P2, it’s five, and this is for Gail Erickson, and you can see where for P2 or for P1, the middle name is null, and for the second copy, the middle name is A.

Query without Non-Equi Join

One thing I want to do is show you if I was to comment out the non-equi join part, what the query would look like, and so what I’m going to do is just put P1 Entity ID just so we can see a little better what these columns are and run this.

SELECT P1.FirstName,
       P1.LastName,
       P1.Title,
       P1.MiddleName,
       P2.MiddleName,
       P2.Title
FROM  Person.Person P1
      INNER JOIN Person.Person P2 ON P1.FirstName = P2.FirstName
                 AND P1.LastName = P2.LastName
                 --AND P1.BusinessEntityID <> P2.BusinessEntityID

Now, you can see a little more clearly where the non-duplicated values are also coming back. Here, we matched the names from both tables, and I’m actually getting the unique values, too.

Here’s Gigi Matthew. This is entity ID 9 matches to itself, 11, and so on. Here, you can see where we have a duplicate for Suchtra Mohan, and what I’m going to do now is un-comment out this non-equi join and show you how it will now return this row, but to do that,

I’m going to also add in a [inaudible 00:07:33] clause just so we can bring back any first names that begin with S-U, so I’m going to un-comment out that non-equi join and run it, and you can see it brings back a lot of entries.

We’re not seeing Suchtra here, so we need to refine our query here. Then, I’ll do P1.firstname, like, and then I will do … and then run this, and you’ll see where we have Suchtra Mohan now showing up as a duplicate. Here’s a way where you can identify duplicate data in your table.

Let’s go on to our next example.

Compare A Range of Values using a non-Equi Join

For this next example, we’re going to match a range of values. Let’s assume we want to list every salesperson that is within one million dollars of year-to-date sales of one another.

Use non-equi join to match range of values

One possible solution is to get every combination of salespeople, compute the year-to-date sales from those combinations, look to see if they are within one million dollars of one another, and if so, then keep the record. We’ll see how this would work in SQL as a query.

Again, here’s a query that I brought in.

SELECT S1.BusinessEntityID
       ,S1.TerritoryID
       ,S1.SalesYTD
       ,S1.SalesLastYear
       ,S2.BusinessEntityID
       ,S2.SalesLastYear
FROM   Sales.SalesPerson S1
       INNER JOIN Sales.SalesPerson S2
       ON S1.SalesLastYear Between S2.SalesLastYear - 500000 
       AND S2.SalesLastYear + 500000

Rather than typing, I’m pasting in the example. It goes quicker. Let me clean it up a little bit here so that it’s easier to read. Let’s take a look at this query. This query is taking each salesperson and comparing the salesperson to one another.

The salesperson is being joined on sales from last year, and we are doing it by a range, so it’s going to look very strange here. We’re not using an equal sign or even a not equal comparison. We’re actually using the between comparison operator, and we are looking for any salesperson who has their year-to-date sales from last year that are either within one million dollars of our sales.

Working Through The Ranges

If my sales from last year were, let’s say, $900,000, then I’m looking for another salesperson who sales were either $900,000 minus $500,000, so either $400,000 or $900,000 plus $500,000. I guess that would be $1,000, $400,000, within that range of sales, so from $400,000 to $1,400,000 in sales. If there’s any salesperson in that range, then I want to include them in my result, and we do that for each salesperson based on their sales from last year, and then we’ll compile a list.

What this list is going to show us is our sales and then the person’s sales from last year. You may be asking yourself why you would want to put a query like this together, and it could be that the sales manager is looking to find salespeople that had sales close to another salesperson, and he is looking to either team them up or set up a friendly competition between them, and this is one way to scan the whole database and find those pairings. Let’s run this query and see what we find.

You’ll see where I didn’t bring in names of salespeople just to keep it simple, but I have their ID. You can see where it’s bringing in their ID and then their sales year-to-date, and the person’s sales from last year. One thing that’s interesting is, do you notice how it’s bringing in the person again?

We need to fix that because it really shouldn’t be bringing in the same person, so we can add an and … We can say, “The salesperson’s business entity ID does not equal sales to that business entity ID because we don’t really want to compare the same salesperson to themselves. Now, let’s run it.

Dealing with Missing Sales Years

The other silly thing is that we have some people that didn’t have sales last year. That may or may not be something that you would find it to be an issue, but what we could do is winnow those out as, say … We could make those zero. Actually, we’ll just say greater than zero.”

Now, we have the salesperson and the sales associates that have sales that are within a million dollars of their sales from last year. Salesperson 275 had sales and then salesperson 276 was a million dollars, and so on. It looks like there’s 106, some records that match this type of criteria.

One thing we could do, if we wanted to make this even a little more restrictive is say that they have to be in the same territory. We could say, “And S1.territoryID = S2.territoryID,” so that we’re keeping the competition within the same territory. This way, I’m not getting a salesperson from Territory 1 competing against somebody from Territory 6. Let’s see what happens when we do this.

SELECT S1.BusinessEntityID
       ,S1.TerritoryID
       ,S1.SalesYTD
       ,S1.SalesLastYear
       ,S2.BusinessEntityID
       ,S2.SalesLastYear
FROM   Sales.SalesPerson S1
       INNER JOIN Sales.SalesPerson S2
       ON S1.SalesLastYear Between S2.SalesLastYear - 500000 
       AND S2.SalesLastYear + 500000
       AND S1.TerritoryID = S2.TerritoryID

All right. Well, this really took it down to only two people. That was interesting, but I think now you can see where these join conditions really make a query start becoming more restrictive with the rows, and in a way, also, make the data more meaningful because, now, what we have, is these are our salespeople that have sales within a million dollars from last year’s sales year-to-date within the same territory. That’s a lot different than across the whole organization.

This is a good example, and you can see how much more restrictive it is. Again, let me comment this out, and then run it, and you’ll see this as it expands to many more rows.

Non-Equi Join Clause Operators

Many more sales combinations of salesmen come into play when I can go across territories. Also, notice with non-equi joins how many different operators I have in place here. I’m using a between, I’m using a not equals, I’m using a greater than, I’m using an equals. The non-equi join really has no restriction on what I can use. You can see that by me using these, I’m still able to create a very meaningful query.

When I run this, this query has a lot of meaning in the sense that I am bringing back, I think, very useful information to a sales manager. He could use this to set up a really good competition between these two sales associates in Territory number 6, and it’s all based on last year’s sales year-to-date. Hopefully, you’re finding that this example really drives home non-equi joins can be used to really look at your data and do some good compare contrasts of rows, especially when you’re looking at rows from the same table.

Other Fun Reading!

Duplicate Rows in SQL – Find using Window Functions

How to use the SQL BETWEEN operator

If you have questions about this, please ask because I know this can get a little esoteric. It’s not just as straightforward as doing a inner join from one table to the next. It can get a little crazy, and I certainly will answer any questions you have. I’ll see you in the next lesson.

6 responses to “What is a Non-Equi Join in SQL and What is its Purpose?”
  1. Peggy B

    The example from the range has some misleading information (doesn't answer the required question). Within sales $M of each other would be …BETWEEN S2.SalesLastYear – 1000000 AND S2.SalesLastYear +1000000. Presenter answered sales within 500K (technically a range of 1M, but not what was asked).

  2. Andrii

    Thanks for the articles. I have a question regarding using ‘between’ in a join section. I mean joins work after Cartesian Product, (n*n) operations. Isn’t more natural to use simple selection from a single table with WHERE filter?

    1. Hi,

      If you can incorporate the the condition into the join clause I recommend doing so, as it may help with optimization, but to be honest, your best bet, if you’re worried about performance, is to view the query plan. Check out this article to learn more about plans: https://www.essentialsql.com/what-is-a-query-plan/

  3. Krishna

    If suppose salesperson X has sales 16 million dollars of sales and according to your question we want all the salesperson whose YTD sales should be within 1 million dollars to 16 million dollars then I guess the range should be (15,17). But according to your explanation, the range should be (15.5,16.5). Could you please clarify on this I am bit confused in this regard

  4. FAN ZHANG

    This is a very good tutorial! Not only I understand what non-equi joins are and most importantly I know two usages of it. I think with some practices I can find more usages. Thanks!

  5. El mundo del bolso maternal es algo que me fascina.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

More from the blog


MySQL PostgreSQL SQLite SqlServer