Interview with Alessandro Alpi on Database Life Cycle Management
I recently had the opportunity to interview Alessandro Alpi, a SQL Server MVP, regarding SQL and ALM (Application Life cycle Management). In this interview we talk about the tool and importance of managing your code and static data with version control software such as Git. In addition we talk about testing.
All these topics struck a chord with me as I’ve been a developer for more than twenty years and I find each of these topics key to managing program source code. Though I have a really good handle on this in the programming world, I didn’t when it came to databases.
I really learned alot through this interview and I hope you do to!
When you first learned SQL, what were some of the frustrations you had and what would you recommend to today’s beginners to avoid them?
I started with SQL Server 6.5 and 7, but I remember my frustration especially in SQL Server 2000. One of the first was the usage of CURSORS. Many people used to apply them due to missing knowledge about the t-SQL (or SQL) best practices such as using JOINs, temporary tables and set based operations.
The second thing could be the nested stored procedures approach (“too high” level) which involved a very complex debug (think about debugging in SQL Server 2000) with unreadable code.
Another one is, for sure, the usage of @@IDENTITY instead of SCOPE_IDENTITY() in triggers. I found that behavior in every trigger I met during my first jobs.
I can go on with the massive usage of READ UNCOMMITTED isolation level (with NOLOCK HINT everywhere), with missing index plan, with missing backups, and so on.
I’ve used source code tools to manage programs before, and it sounds like an awesome idea for a DB’s DDL. What are some of the advantages you see to do so? Are there pitfalls we should avoid?
You can use Source Control also for your static data actually. The real question to pay attention for is “why we don’t treat databases as code?” IMHO. Databases are usually the most important resources, and not only for data persistence. If we analyze them deeper, we can say that our databases are sets of lines of code (programmability, DDLs, types). In addition, data are INSERTS. Therefore, code can be source controlled. It is straight.
There are many advantages when using database source control. We can make stored and labeled versions of our database code automatically, we can share the development lines (each line can have a set of related database under the same workspace), we will have a central point of deployment, and we can automate tests and deployment. Speaking about teamwork, another benefit could be to keep the databases synchronized between developers, to allow database changesets related to the application ones. Finally, we can branch and merge scripts, we can create easily different versions of our databases, we can simply prevent regressions, and we can work without affecting other developers.
One of my advice is to understand how your team works. In many cases, I have found teams that worked on a shared database, a “development database.” Actually, a developer should have its own copy of the database (isolation), in order to avoid any change that could affect the other developers’ work. In this case, we have to focus on choosing the right model (dedicated databases instead of central one). Source control will help us to manage conflicts and to merge different versions.
The second advice is “bring static data under source control.” Static data are the most important data to bring with us when deploying in our environments. Without them, we have to create ad-hoc scripts for populating tables. If we do not create them, foreign keys constraint will be always violated and, then, the deploy processes will fail.
The third advice is “use a solution that supports merge scripts.” Suppose to have a changeset that includes ALTER TABLE with NOT NULL column addition. In our development database the table we changed, for sure, it was empty (no constraint violation). However, in another environment (i.e. testing or staging, or, in the worst case, production) the table will be populated with records. If the source control manager does not provide any alert or any migration script capability, we can occur in failed deploy (or also in data loss). Use a solution that allows you to create (or that auto-generates) migrations.
What source code tools do you recommend a SQL Server DBA or developer investigate? Are there open source tools, akin to SVN or GIT?
I would like to split the reply in two parts.
First, the source control manager. We need to pay attention for distributed or centralized approach. Distributed, like Git, bring many benefits that centralized, such as VSOnline, cannot have. For instance, working offline is not well supported in a centralized scenario. Therefore, if you need to work offline with many local branches, use the distributed one.
Then, as a client, we can choose between Visual Studio based solution or third party tool (Red-Gate, ApexSQL). Visual Studio can be used with both Git and VSOnline (or TFS on premises). The third party tools can support many popular source control manager (git, VSO, TFS, Mercurial, SVN, etc.) using direct connection or working folder.
I’ve used NUINT over the years to test my data and business layer objects (c#), but always struggled with how to “reset” the database after each test. What are some techniques specifically used to unit test databases that ensure each test is run consistently and in isolation?
We can use t-SQL based framework or Visual Studio with Unit Test project templates in order to unit test our databases. Additionally, we can install SQL Test, a testing tool by Red-Gate, which is embedded on SQL Server Management Studio. The most popular t-SQL frameworks are tSQLt and tSQLUnit. While tSQLUnit is strongly suggested for SQL Server 2000, (just because tSQLt does not work on that SQL Server version) tSQLt is maybe the best testing framework now.
Both of them; however, have isolation capabilities, that allows us to “reset” the database state after each test. With those frameworks, we can fake tables, functions and procedures, and we can create stub records too. Automatic reset will take place at the end of each test.
I suggest using tSQLt and SQLCop together (that is a framework for testing health and code rules on our databases) or installing SQL Test, which provides a UI for using the same two frameworks.
Visual Studio allows us to create a PRE-TEST and a POST-TEST phase. This means that we can setup and cleanup the database as we need. It is not isolated actually, but is flexible with its designer.
Here are some handy resources:
- SQLCop: http://sqlcop.lessthandot.com/
- tSQLt: http://tsqlt.org/
- SQL Test: http://www.red-gate.com/products/sql-development/sql-test/
- tSQLUnit: http://sourceforge.net/projects/tsqlunit/
- I have created some code samples with tSQLUnit here https://code.msdn.microsoft.com/tSQLUnit-samples-Database-54b10e47
Many ALM tools are good at moving code from dev to production using transports (e.g. SAP), but many fail to move configuration information, such as lookup table entries. I think an issue is the identity keys in the two systems for these master data tables can’t be guaranteed to be the same. How have you managed promoting both DDL and configuration data across systems?
I am currently using tools from Red-Gate for comparing data and structures together with source control management system. I am storing static data with SQL Source Control and I am deploying it using some useful options on the compare projects.
By design, when merging data, SQL Data Compare will retain the same Ids. DDLs are promoted using SQL Compare. Actually, I have an issue yet, which is the “ignore indexes” capability. I would like to ignore indexes but not the index on primary keys constraint. With those tools, I cannot discriminate the behavior.
Another case is when the change is to move from a “FULL NAME” field to a two fields: “FirstName” and “LastName”. To promote this change there are several items to coordinate. Any suggestions on how to coordinate the DDL (drop a field and create a new one) and DML (populate two new fields from old one)? Would both be in source control or some type of transport?
You are speaking about database refactoring patterns. This method is called “Split Column”. You can follow two ways: “Schema Update” OR “Data Migration”.
With “Schema Update”, first, you need to create the new columns (or, if possible, you can use existing columns). Then, you have to add a trigger, which synchronize the columns with the source. This trigger must be fired for every change applied to the source column.
With “Data migration” you have to split with t-SQL statements (i.e. functions) the source column and to execute DML statements in order to change data.
Here is a link to the book “Refactoring Databases”: http://www.essentialsql.com/refactoringdatabases.
One last question! What advice would you give to someone just starting to explore SQL?
I suggest following at least a step-by-step course, which introduces the basic topics, like a walkthrough. Better if it is offline, because a good teacher can explain better concepts when it is near the audience.
Try to understand the paradigm, the model, and the logical concepts. This will allow you to go deeper and to familiarize with the set based approach, which is different to the programmatic one (many people used to write t-sql wrong just because they want to use loops).
Do not underestimate the shape in which you write queries. A good query structure, most of the times, means good performances. A poster like this can be helpful.
Use forums, do not stop to use the official documentation, the so-called “Books online” and do not stop to be well documented.
For Italian people, I suggest the following book: http://www.ateneonline.it/catlibro.asp?item_id=2986
I work in the IT area as a developer since 2000 to 2005 and as aDBA/DBDeveloper/.net Developer since 2005.
This passion started when I was 11 years old, with my wonderful Commodore64. I worked as a web solutions developer (between ASP to the most recent versions of ASP.NET), windows application and services developer with languages C# and VBNet. Starting from 2005 I begun to study the data layer, using SQL Server deeply. I actually work full time as DBA and software developer/architect.
I’m a SQL Server MVP since 2008 and also a trainer and speaker, taking sessions on object-oriented programming based on Framework and Microsoft SQL Server. I’m working on Microsoft Azure based environments (both Paas and Iaas) and I’m following the ALM processes of my team (applying SCRUM agile methodology).
I’m a Microsoft Certified Professional and Microsoft Certified Trainer.
I manage my two blogs on wordpress and www.dotnethell.itwww.dotnethell.it. I’m a staff member of this community and I reply in the discussions on dotnethell.it forums. You can find me also on SQL Server MSDN italian forums and on StackOverflow.