CREATE VIEW Side Effects

·

·

So what happens when you create a view, and then later someone else changes the underlying tables? In this article let’s explore a simple scenario, check out how the SQL view handles the change, and what we can learn from viewing the data dictionary.

Create View Scenario

Imagine we are accepting data from a table that is the same in three separate systems. We do some cleanup activities on the data, saved the cleaned up version, and finally to help out the user, create a view to pull it all together.

It’s a simple process which I diagramed for you below:

Source Systems used to CREATE VIEW.
Three source tables are combined a single view.

The general idea is that we create three tables and offer it up as a view.

I created the scenario using the following SQL:

-- setup scenario
create table T1 (ID int, City varchar(20));
insert into T1 (ID, City) values (1, 'Atlanta'), (2, 'Chicago'), (3, 'New York')

create table T2 (ID int, City varchar(20));
insert into T2 (ID, City) values (11, 'London'), (12, 'Frankfurt'), (13, 'Rome')


create table T3 (ID int, City varchar(20));
insert into T3 (ID, City) values (21, 'Tokyo'), (22, 'Sydney'), (23, 'Seoul')

-- make it easier on users, create a view!
create view TV
as
select * from T1 union
select * from T2 union
select * from T3

This works perfectly and it is easy to get the data. Just select from the view to see the results:

select *
from TV
Results from initial create view statement.

But when the source tables are changed by the developers, and we’re processing and cleaning data, but not explicitly using every view, we can get into a situation where some field changes go unnoticed and our CREATE VIEW definition is out of date.

CREATE VIEW issues due to Underlying Tables Modifications

To simulate a possible change, let’s drop the three tables, change some columns, and try the view:

drop table T1
create table T1 (ID int, Stat int, CityName varchar(20));
insert into T1 (ID, Stat, CityName) values (1, 0, 'Atlanta'), (2, 0, 'Chicago'), (3, 0, 'New York')

drop table T2
create table T2 (ID int, Stat int, CityName varchar(20));
insert into T2 (ID, Stat, CityName) values (11, 0, 'London'), (12, 0, 'Frankfurt'), (13, 0, 'Rome')

drop table T3
create table T3 (ID int, City varchar(20));
insert into T3 (ID, City) values (21, 'Tokyo'), (22, 'Sydney'), (23, 'Seoul')

with the tables changed, what do you think will happen when we try to run our view?

What result will you expect to see from

select * from TV

Surprisingly, it won’t throw an error. In fact, it is a bad situation. The query runs, but returns incorrect data!

Here are the results:

Incorrect SQL view results after underlying table definitions changed.

So what happened? Why are we seeing incorrect results?

Why CREATE VIEW is Causing Issues

In order to find an explanation, let’s go look at the VIEW’s SQL stored as the definition within the data dictionary.

We can lookup the columns used to create the view

select v.name, m.definition, c.*
from sys.views v
    inner join sys.sql_modules m on v.object_id = m.object_id
    inner join sys.columns c on v.object_id = c.object_id
where v.name = 'TV'

This query uses three tables to pull together SQL Server’s perspective. I’m using sys.views to get the view, sys.sql_modules to get the text used in the CREATE VIEW command, and finally sys.columns to see what columns are defined in the view.

There are a couple in interesting items to note:

  1. The view definition remain in the form “select * from T1…”
  2. The columns we see in sys.columns are from the original table definition.

So what is going on?

Clearly the whenever we execute the command

select * from TV

it isn’t looking at the view definition SQL and re-running that query. If the database did that, then select * would work fine.

Rather, the database is looking up the column definitions and incorporating those into the view. I’ll need to dig deeper to find the exact sql it is running, but from the information I have here, you can see view is ordinal based. It is based on the order of the two columns from sys.columns.

By looking at the final result, we can see the the query optimizer uses this out-of-date information to build the query.

Conclusion

In summary, it is best practice to not use “select *,” however, when you start working with data warehouses you quickly find that using full columns lists is a maintenance nightmare. You may find yourself using select * to cut down on coding. If you decide to go this route, be aware that any view you create using the select * method may become out of data as schemas in your source systems drift.

Related Topics

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

More from the blog


MySQL PostgreSQL SQLite SQL Server