Introduction to the Merge Statement

Introduction to the MERGE Statement and SQL Server Data Modification

The MERGE statement is used to make changes in one table based on values matched from anther.   It can be used to combine insert, update, and delete operations into one statement.  In this article, we’ll explore how to use the MERGE statement.  We discuss some best practices, limitations, and wrap-up with several examples.

This is the fifth article in a series of articles.  You can start at the beginning by reading Introduction to SQL Server Data Modification Statements.

All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks2012 database.  You can get started using these free tools using my Guide Getting Started Using SQL Server

Before we Begin

Though this article uses the AdventureWorks database for its examples, I’ve decided to create several example tables for use within the database to help better illustrate the concepts covered.  You can find the script you’ll need to run here.  Notice there is a special section pertaining to MERGE.

Basic Structure

The MERGE statement combines INSERT, DELETE, and UPDATE operations into one table.  Once you understand how it works, you’ll see it simplifies procedure with use all three statements separately to synchronize data.

Below is a generalized format for the merge statement.

MERGE targetTable
Using sourceTable
ON mergeCondition
WHEN MATCHED
THEN updateStatement
WHEN NOT MATCHED BY TARGET
THEN insertStatement
WHEN NOT MATCHED BY SOURCE
THEN deleteStatement

The merge statement works using two tables, the sourceTable and targetTable.  The targetTable is the table to be modified based in data contained within the sourceTable.

MERGE statement match conditions.

The two tables are compared using a mergeCondition.  This condition specifies how rows from the sourceTable are matched to the targetTable.  If your familiar with INNER JOINS, you can think of this as the join condition used to match rows.

Typically, you would match a unique identifier, such as a primary key.  If the source table was NewProduct and target ProductMaster and the primary key for both ProductID, then a good merge condition to use would be:

NewProduct.ProductID = ProductMaster.ProductID

A merge condition results in one of three states:  MATCHED, NOT MATCHED, or NOT MATCHED BY SOURCE.

Let’s go over what the various conditions mean:

MATCHED – these are rows satisfying the match condition.  They are common to both the source and target tables.  In our diagram, they are shown as green.  When you use this condition in a merger statement you; most like being updating the target row columns with sourceTable column values.

NOT MATCHED – This is also known as NOT MATCHED BY TARGET; these are rows from the source table that didn’t match any rows in the target table.  These rows are represented by the blue area above.  In most cases that can be used to infer that the source Rows should be added to the targetTable.

NOT MATCHED BY SOURCE – these are rows in the target table that were never match by a source record; these are the rows in the orange area.  If your aim is to completely synchronize the targetTable data with the source, then you’ll use this match condition to DELETE rows.

If you’re having trouble understanding how this works, consider the merge condition is like a join condition.   ROWS in the green section represent rows that match the merge condition, rows in the blue section are those rows found in the SourceTable, but not in the target.  The rows in the orange section are those rows found only in the target.

Give these matching scenarios, you’re able to easily incorporate add, remove, and update activities into a single statement to synchronize changes between two tables.

Let’s look at an Example.

MERGE Example

Let’s assume that our goal is to synchronize any changes made to esqlProductSource with esqlProductTarget.  Here is a diagram of these two tables:

MERGE statement sample exercises tables

Note:  For the sake of this example I ran the scripts I talked about in the introduction to create and populate two tables:  esqlProductSource, and esqlProductTarget.

Before we construct the MERGE statement, let’s look at how we would synchronize the table using UPDATE, INSERT, and DELETE statement to modify, add, and remove rows in the target table.

I think once you see how we do this individually, then seeing combined into a single operation makes more sense.

Using UPDATE to Synchronize Changes from One Table to the Next

To update the target table with the changed values in the product source, we can use an UPDATE statement.  Given the ProductID is both table’s primary key, it become our best choice match rows between the tables.

If we were going to update the column values in the target table using the source column’s we could do so using the following update statement

UPDATE esqlProductTarget
SET    Name = S.Name,
       ProductNumber = S.ProductNumber,
       Color = S.Color
FROM   esqlProductTarget T
       INNER JOIN esqlProductSource S
       ON S.ProductID = T.ProductID

This statement will update the column in esqlProductTarget with corresponding column values found in esqlProductSource for matching productID’s.

INSERT Rows Found in one Table but Not the Other

Now let’s look how we can identify the rows from the source table that we need to insert in the product target.   To do this we can use subquery to find rows in the source table that aren’t in the target.

INSERT INTO esqlProductTarget (ProductID, Name, ProductNumber, Color)
SELECT S.ProductID, S.Name, S.ProductNumber, S.Color
FROM   esqlProductSource S
WHERE NOT EXISTS (SELECT T.ProductID
                  FROM   esqlProductTarget T
                  WHERE  T.ProductID = S.ProductID)

Note:  I could also use an outer join to do the same.  If you’re interested in why, check out this article.

This statement will insert a new row into esqlProductTarget from all rows in esqlProductSource that aren’t found in esqlProductTarget.

Removing Rows

That last synchronization activity we need to do, it removes any rows in the target table that are not in SQL Source.  Like we did with the insert statement, we’ll use a subquery.  But this time we’ll idenfity rows in esqlProductTarget not found in esqlProductSource.  Here is the DELETE statement we can use:

DELETE esqlProductTarget
FROM   esqlProductTarget T
WHERE  NOT EXISTS (SELECT S.ProductID
                   FROM   esqlProductSource S
                   WHERE  T.ProductID = S.ProductID)

Now that you’ve seen how to do the various operation individually, lets see how they come together in the merge statement.

MERGE esqlProductTarget T
USING esqlProductSource S
ON (S.ProductID = T.ProductID)
WHEN MATCHED 
     THEN UPDATE
     SET    T.Name = S.Name,
            T.ProductNumber = S.ProductNumber,
            T.Color = S.Color
WHEN NOT MATCHED BY TARGET
THEN INSERT (ProductID, Name, ProductNumber, Color)
     VALUES (S.ProductID, S.Name, S.ProductNumber, S.Color)
WHEN NOT MATCHED BY SOURCE
THEN DELETE;

Notice that there most of the heavy lifting is done by the merge condition and its outcomes.  Rather than having to repeatedly set up the match, as we did int the delete statement, it is done once.

Compare again the Insert statement to the merge statement above.

INSERT INTO esqlProductTarget (ProductID, Name, ProductNumber, Color)
SELECT S.ProductID, S.Name, S.ProductNumber, S.Color
FROM   esqlProductSource S
WHERE NOT EXISTS (SELECT T.ProductID
                  FROM   esqlProductTarget T
                  WHERE  T.ProductID = S.ProductID)

Given the MERGE statement establishes the source and target table, as well as how they match, everything color coded in red is redundant; therefore, not in the insert portion of the merge.

Logging MERGE Changes using OUTPUT

You can use the OUTPUT clause to log any changes.  In this case the special variable $action can be used to log the merge action.  This variable will take one of three values:  “INSERT”, “UPDATE”, or “DELETE”.

We’ll continue to use our example, but this time we’ll log the changes and summarize the changes.

MERGE esqlProductTarget T
USING esqlProductSource S
ON (S.ProductID = T.ProductID)
WHEN MATCHED
THEN UPDATE
     SET    T.Name = S.Name,
            T.ProductNumber = S.ProductNumber,
            T.Color = S.Color
WHEN NOT MATCHED BY TARGET
THEN INSERT (ProductID, Name, ProductNumber, Color)
     VALUES (S.ProductID, S.Name, S.ProductNumber, S.Color)
WHEN NOT MATCHED BY SOURCE
THEN DELETE
OUTPUT S.ProductID, $action into @MergeLog;

SELECT MergeAction, count(*)
FROM   @MergeLog
GROUP BY MergeAction

If the above is run on fresh sample data, the following summary is generated:

MERGE statement OUTPUT clause summary