In this article I’ll show you how to use STRING_SPLIT() to join two tables. I call this the impossible join.
I’ve come across this situation several times and always when dealing with data imported from two different sources. In most case it has to do with users inputting data inconsistently.
Let me show you.
Table of contents
Watch the Video
Don’t forget to check out the video. I go over the examples and provide more back ground for you to better understand it all!
The Impossible Join
In this example I have two table Project and Project Status. I would like to join ProjectStatus to Project to display the Name, ProjectStatus, and AccountNumber in the same result.
If you look at ProjectStatus, ProjectName is a coder’s dream. The codes look great. The Project table, on the other hand, is a hot mess. The codes are buried within the column value.
There is now way to do a direct join from one to the other. In fact, only Project 9108 successfully joins.
The Impossible Join Approach
The way we make the impossible join work is by splitting the ProjectName into pieces. This way we don’t have to join on “F1432 plastic” but can look to join on the parts “F1432” or “plastic.”
As you can see from the example, “F1432” is part of the good ProjectName and will readily join between the two tables.
The main steps are:
- Split the hot mess column into pieces.
- Join into this separate pieces to make a match.
- If a match is made, then join the the two tables together.
So now that we have an idea how this is happening, let’s make it happen. To do so we’ll first learn about STRING_SPLIT(),
How STRING_SPLIT() Works
STRING_SPLIT() is used to break up a column into separate parts. It is a special type of function which returns a table as it’s result. It is similar to a Table Valued Function.
You can learn more about STRING_SPLIT() in this article I wrote, but the general idea is it take the column you want to split, and the character, such as space that separates the values (aka words).
So STRING_SPLIT(‘F1432 plastic’, ‘ ‘) returns a table with two rows:
In our example let look at what happen when you use string split in a query against Project:
Here is the tokenized table with account numbers.
select value ProjectName, AccountNumber from Project cross apply string_split(ProjectName, ' ')
There are a couple of things to notice here. First, since STRING_SPLIT() return a table, we can use it in our query. You’ll see I’m using CROSS APPLY to join the STRING_SPLIT() result to the Project. I do so I can return a result showing each ProjectName value along side the Account Number.
We now use this result in a final join to the ProjectStatus table to obtain our final result.
Learn More: Check out SQL Joins – The Ultimate Guide
Using STRING_SPLIT() with INNER JOIN
At this point that hard part is over. We have an intermediate result which include the project “codes” as well as account numbers.
That last thing to do is relate this to the project table.
I pull this all together in the following example:
Select s.ProjectName, s.ProjectStatus, p.AccountNumber from ProjectStatus s left join ( select value ProjectName, AccountNumber from Project cross apply string_split(ProjectName, ' ') ) p on s.ProjectName = p.ProjectName
IF you find your self needed to join to a “code” embedded within another string, then you may find that STRING_SPLIT() allows you to get to the “code” to make the join successful.