I thought I would get started with database change management and Tarantino, but after spending a day trying to figure it out, I eventually decided that it would go much faster to just roll my own.
What is database change management?
Most developers today are familiar with source version control. Even if some use it only for backups only, the main idea is for you to go back in history when needed or to try out different approaches in other branches. Using SVC (software version control) has become a standard in the industry, but very few apply the same idea on databases even though they also are a major part of most application development.
How not to version control databases
- Check in the database to source control It works as long as you are only one developer to check in the binary database file to your repository. You will however not be able to merge changes from different developers and you will have to do manual updates of your production environment when the time comes.
- Share database with team members on a central server The most common way to handle database development is not to version control at all, but to setup the database on a shared central server. This fails when you need to revert to a previous version, but it also makes it harder to work as a team. When you make changes to the database you will affect your team members because they don't have your code to support the database changes. In worst case the whole team will halt production until you've checked in your code to support the database change.
Why should I care?
If you manage to version control your database development, you will not only be able to revert to old revisions, work in branches but also automatically get old databases up to date within your release process. This is much more efficient than manually merging the databases at every release, and also less error prone.
- You bring stability to your release cycle
- You reduce hold ups in the production line for your team
- You bring greater control to your database development process
How do I get started?
If you use NAnt or MsBuild as build script you should probably head over to Tarantino and download their software since it has excellent support for your environment. If not, you may keep on reading. Since I'm using psake at the moment, I decided to implement my database change management in PowerShell v2. The concept is very simple. You keep a directory in your source control where you store database creation/update scripts. The key component is that you never ever change any of these scripts after their first commit. When you need to change something in the database you create a new script. That way you will always be able to create a new database and bring it up to head revision at any time.

The naming convention here is very important. Next thing you need is a database where you have a table and a column that tells you what version this database is. As you've already guessed, this is exactly up to what revision the change scripts has been applied.

All you need now is one function that can look at your database, decide what version it is and apply those changes that has not yet been applied. Actually, I do it in three functions and they look like this.
- Update-Database $connection_string $database_directory Connects to the database and get the version number. Finds all change scripts in the database directory that are above the specific version and applies them to the database. Last it updates the database with the current version number.
- Build-Database $sql_server $database_name Connects to the DBMS and creates an empty database. In this database it adds the necessary table and column to keep track of database versions.
- Drop-Database $sql_server $database_name Removes the database from the DBMS.
Example usage
In my current project I want to run integration tests on a fresh database every time I check in code. That means I will have the following build process executed.
- Compile
- Drop-Database "MAIA\SQLEXPRESS" "IntegrationTests"
- Build-Database "MAIA\SQLEXPRESS" "IntegrationTests"
- Update-Database "Data Source=MAIA\SQLEXPRESS;Initial Catalog=IntegrationTests;Integrated Security=True;" ".\Database"
- Run integration tests
When I check in code I will rebuild the database and execute my integration tests on it. That way I will not only test my code, but also verify that my change scripts are working. When it is time for release I will be able to run Update-Database on my production database, because I know that those build scripts has been thoroughly tested. Here is my PowerShell script if you're interested. I still see myself as a novice in PowerShell scripting, but it works and is quite minimalistic.

2 Comments
Emil Stenström said
Great post Mikael! I've dealt with database migrations in Rails and Django only, but it's nice to see that it's now starting to show up elsewhere too :)
Personally, I like the migration scripts to be at the ORM level, so I don't have to deal with SQL ever. That makes them a bit easier to write too, as you're in [your real language]-land, and can use classes, functions and so on, instead of the limited SQL syntax.
Another issue, that Django (or rather, an extension called south) has solved, is the problem when two developers work on a migration with the same number. If there's two 0007 scripts, which one should be run? Django solves this by saving a copy of the whole table definition, inside every migration file. This makes it possible to detect conflicts easily.
Anyway, good article!
Mikael Lundin said
Thank you Emil.
The .NET community is far behind when it comes to process. We're still trying to catch up when it comes to source control management, continuous integration and testing. You and I work in projects where these things taken for granted, where outside our protected walls is a world where people don't know the what, why and hows. There's not much to do but keep preaching. :)
I've not heard about migration scripts at the ORM level, but I've read about generating the database model from the OR mappings. At first sight it would seem like a good idea, but then I remember why the bridge between the relational world and object world exists. Object oriented thinking does not always work in a relational scheme where you need to denormalize for performance. In an application with 5 users I would not care (could almost be using MS Access for dbms), but when you work with very large datasets or a high performing database situation, I'm not sure if it would be possible to generate the schema from the OR mapping.
Of course there's always special situations that need special measures.
Microsoft at least thinks that generating database schema from the model is a good idea and brings it out with the next release of Entity Framework. Maybe I'm missing something in the big picture.
Mikael