How to do an Impossible Join with String Split

·

·

,

In this article I’ll show you how to use STRING_SPLIT() to join two tables.  I call this the impossible join.

I’ve come across this situation several times and always when dealing with data imported from two different sources.  In most case it has to do with users inputting data inconsistently.

Let me show you.

Watch the Video

Don’t forget to check out the video. I go over the examples and provide more back ground for you to better understand it all!

The Impossible Join

In this example I have two table Project and Project Status.  I would like to join ProjectStatus to Project to display the Name, ProjectStatus, and AccountNumber in the same result.

If you look at ProjectStatus, ProjectName is a coder’s dream.  The codes look great.  The Project table, on the other hand, is a hot mess.  The codes are buried within the column value.

There is now way to do a direct join from one to the other.  In fact, only Project 9108 successfully joins.

Impossible Join Example

The Impossible Join Approach

The way we make the impossible join work is by splitting the ProjectName into pieces. This way we don’t have to join on “F1432 plastic” but can look to join on the parts “F1432” or “plastic.”

As you can see from the example, “F1432” is part of the good ProjectName and will readily join between the two tables.

The main steps are:

  1. Split the hot mess column into pieces.
  2. Join into this separate pieces to make a match.
  3. If a match is made, then join the the two tables together.

So now that we have an idea how this is happening, let’s make it happen.  To do so we’ll first learn about STRING_SPLIT(),

How STRING_SPLIT() Works

STRING_SPLIT() is used to break up a column into separate parts.  It is a special type of function which returns a table as it’s result. It is similar to a Table Valued Function.

You can learn more about STRING_SPLIT() in this article I wrote, but the general idea is it take the column you want to split, and the character, such as space that separates the values (aka words).

So STRING_SPLIT(‘F1432 plastic’, ‘ ‘) returns a table with two rows:

  • F1432
  • Plastic

In our example let look at what happen when you use string split in a query against Project:

Here is the tokenized table with account numbers.

select value ProjectName, AccountNumber
from Project
    cross apply string_split(ProjectName, ' ')

There are a couple of things to notice here.  First, since STRING_SPLIT() return a table, we can use it in our query.  You’ll see I’m using CROSS APPLY to join the STRING_SPLIT() result to the Project.  I do so I can return a result showing each ProjectName value along side the Account Number.

We now use this result in a final join to the ProjectStatus table to  obtain our final result.

Learn More: Check out SQL Joins – The Ultimate Guide

Using STRING_SPLIT() with INNER JOIN

At this point that hard part is over.  We have an intermediate result which include the project “codes” as well as account numbers.

That last thing to do is relate this to the project table.

I pull this all together in the following example:

Select s.ProjectName, s.ProjectStatus, p.AccountNumber
from ProjectStatus s
    left join (
        select value ProjectName, AccountNumber
        from Project cross apply string_split(ProjectName, ' ')
    ) p on s.ProjectName = p.ProjectName

IF you find your self needed to join to a “code” embedded within another string, then you may find that STRING_SPLIT() allows you to get to the “code” to make the join successful.

Join the newsletter

Subscribe to get our latest content by email.

Powered by ConvertKit

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

More from the blog


MySQL PostgreSQL SQLite SqlServer