Database Schema Versioning

Your database schema is an important part of your source code, and you will want to release it along with your application, especially if it is a web application. The other reason to version your database is continuous integration. Every time a team member changes the code, the build server picks up the change and rebuilds the whole application. If that change includes a database update, you want it applied on the build server automatically. This is where a database script runner comes in.

I looked for a tool like that on the internet but could not find a suitable open source script runner. The one that does the job is not open source, and the one that is open source does not do the job right. So I decided to write my own.

This is a class library that takes a connection string, a file name format and the path to the scripts directory. The connection string is, of course, the address of your database, and the file name format is a handy feature that lets you use any naming scheme for your SQL scripts. For instance, the format “my_%.sql” will read all files named my_<sequence number>.sql, where the sequence number is a zero-padded 4-digit integer (my_0000.sql, my_0001.sql and so on).
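To make the naming rule concrete, here is a minimal sketch of how such a format could be expanded into file names and walked in order. ScriptNames, Resolve and Pending are hypothetical helpers written for illustration, not part of the library itself. The exists delegate stands in for File.Exists so the logic can be tried without touching the disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only.
public static class ScriptNames
{
    // Replace the "%" placeholder with a zero-padded, four-digit sequence number.
    public static string Resolve(string format, int sequence)
    {
        return format.Replace("%", sequence.ToString("0000"));
    }

    // Yield script paths starting at 'first' and stop at the first gap in the sequence.
    public static IEnumerable<string> Pending(
        string directory, string format, int first, Func<string, bool> exists)
    {
        for (var sequence = first; ; sequence++)
        {
            var path = Path.Combine(directory, Resolve(format, sequence));
            if (!exists(path))
            {
                yield break;
            }
            yield return path;
        }
    }
}
```

With this sketch, ScriptNames.Resolve("my_%.sql", 7) gives "my_0007.sql", and Pending stops as soon as a number in the sequence has no matching file, so a gap in the numbering hides every script after it.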

Here is the code. I use it as a library and run it at startup in one of my services, but you can create a console application that references the assembly and use it as a standalone tool. I might write that console application later and will share it here if I do.

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    public class SchemaUpdater
    {
        private readonly string _connectionString;
        private readonly string _nameformat;
        private readonly string _scriptdir;

        public SchemaUpdater(string connectionstring, string format, string scriptpath)
        {
            _connectionString = connectionstring;
            _nameformat = format;
            _scriptdir = scriptpath;
        }

        public void Update()
        {
            using (var connection = new SqlConnection(_connectionString))
            {
                connection.Open();
                var scriptseparator = new[] {"\nGO"};

                // Make sure we have a schema versions table
                var scriptfile = string.Format("{0}\\{1}", _scriptdir, "versions-table.sql");
                var transaction = connection.BeginTransaction(IsolationLevel.Serializable);
                try
                {
                    Array.ForEach(File.ReadAllText(scriptfile).Split(scriptseparator, StringSplitOptions.RemoveEmptyEntries),
                        (sql => new SqlCommand(sql, connection, transaction).ExecuteNonQuery()));
                    transaction.Commit();
                }
                catch(Exception)
                {
                    transaction.Rollback();
                    throw;
                }

                // Now run the baseline
                scriptfile = string.Format("{0}\\{1}", _scriptdir, "base.sql");
                transaction = connection.BeginTransaction(IsolationLevel.Serializable);
                try
                {
                    Array.ForEach(File.ReadAllText(scriptfile).Split(scriptseparator, StringSplitOptions.RemoveEmptyEntries),
                        (sql => new SqlCommand(sql, connection, transaction).ExecuteNonQuery()));
                    transaction.Commit();
                }
                catch (Exception)
                {
                    transaction.Rollback();
                    throw;
                }

                // Now run all the files
                var command = new SqlCommand("SELECT Version FROM SchemaVersion", connection);
                var version = command.ExecuteScalar();
                command.Dispose();

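                // ExecuteScalar returns null when there is no row and DBNull when the column is NULL
                var start = (version == null || version == DBNull.Value) ? 0 : Convert.ToInt32(version) + 1;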

                var filename = _nameformat.Replace("%", start.ToString("0000"));
                scriptfile = string.Format("{0}\\{1}", _scriptdir, filename);

                transaction = connection.BeginTransaction(IsolationLevel.Serializable);

                try
                {
                    while (File.Exists(scriptfile))
                    {
                        Array.ForEach(File.ReadAllText(scriptfile).Split(scriptseparator, StringSplitOptions.RemoveEmptyEntries),
                            (sql => new SqlCommand(sql, connection, transaction).ExecuteNonQuery()));

                        start++;
                        filename = _nameformat.Replace("%", start.ToString("0000"));
                        scriptfile = string.Format("{0}\\{1}", _scriptdir, filename);
                    }

                    new SqlCommand(string.Format("Update SchemaVersion SET Version={0}", start-1), connection, transaction).ExecuteNonQuery();

                    transaction.Commit();
                }
                catch(Exception)
                {
                    transaction.Rollback();
                    throw;
                }
            }
        }
    }

This is a small piece of code, but it works for me. It handles the GO separator in the SQL scripts and runs each step (the version table script, the baseline and the numbered scripts) in its own transaction. It relies on two scripts of mine: “versions-table.sql”, which creates the version table in the database if it does not exist, and “base.sql”, which contains any SQL statements you want to run to create a baseline schema. Both are executed on every call to Update(), so they must be safe to run repeatedly.
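One caveat about the GO handling: splitting on the raw string "\nGO" is case-sensitive and also cuts a script at any line that merely starts with GO, such as a GOTO statement. A more tolerant splitter could look like the sketch below. SqlBatches is a hypothetical helper, not part of the class above, and it makes its own assumptions: GO must sit alone on a line, and the "GO <count>" form is not supported.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SqlBatches
{
    // A line that contains only GO (any case), optionally surrounded by spaces or tabs.
    private static readonly Regex GoLine = new Regex(
        @"^[ \t]*GO[ \t]*\r?$",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Split a script into batches and drop batches that are empty or whitespace only.
    public static List<string> Split(string script)
    {
        var batches = new List<string>();
        foreach (var part in GoLine.Split(script))
        {
            var batch = part.Trim();
            if (batch.Length > 0)
            {
                batches.Add(batch);
            }
        }
        return batches;
    }
}
```

Update() could then loop over SqlBatches.Split(File.ReadAllText(scriptfile)) instead of splitting the text directly. A GO line inside a multi-line string literal or comment would still be treated as a separator, so this remains a sketch and not a full parser.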

SQL Server clustering vs. Oracle RAC

I did some research on SQL Server clustering a while ago to find out what high-availability and fail-over options SQL Server provides. Below are my findings. In short, SQL Server has fail-over but no high availability comparable to Oracle RAC.

1. SQL Server 2008 Peer-to-Peer Replication

  • Multiple nodes running their own instances
  • Each node has its own copy of data
  • Every node is a publisher and subscriber at the same time
  • Not scalable because of complex architecture
  • Complex to modify schema
  • Conflicts may arise if two nodes update the same row
  • In case of conflict, the topology remains inconsistent until the conflict is resolved
  • Conflict is resolved manually using the method described in
    http://technet.microsoft.com/en-us/library/bb934199.aspx

2. SQL Server 2008 Mirroring

  • One principal server and one mirror server per database
  • Transaction log records are streamed continuously from the principal to the mirror, unlike log shipping, which copies log backups periodically
  • Failover process is manual by default
  • A separate ‘witness’ server can be deployed to automate the failover
  • The mirror server does not participate in any transaction and just waits for the failover
  • Comparable to Oracle standby database technology

3. SQL Server 2008 Failover Clustering

  • Two servers attached to shared storage
  • All data and logs reside on the SAN and are shared
  • One server performs all transactions and the other waits for the failover
  • Microsoft Cluster Server takes care of the fail over
  • Both instances have separate instance names and one cluster instance name
  • Clients connect to the cluster IP address and cluster instance name
  • Failover is transparent but a delay (in minutes) is required to mount the database on the failover instance and start it
  • There is an application blackout during fail over process
  • Reference (http://msdn.microsoft.com/en-us/library/aa479364.aspx)

4. SQL Server 2008 Active/Active Failover Clustering

  • Two instances running on a shared storage
  • Two different SQL Server databases set up on both servers
  • Active/Active Clustering is effectively two different failover clusters
  • Each node in the cluster is running one primary instance and one secondary instance of the other node
  • Both clusters run a synchronized copy of the database
  • Replication is set up between both clusters to keep them synchronized
  • Clients see two different databases available to connect to
  • In case of failure, one server runs both database instances which may cause performance overhead
  • Write cost may increase because of replication and database synchronization
  • Application blackout will only be for the clients connected to the failed instance
  • Peer-to-Peer replication has conflicts (See No. 1)

5. SQL Server 2008 Federated Database

  • Multiple instances running in a network connected to each other
  • Each instance carries part of the database
  • The complete table is always assembled using views and distributed SQL
  • Each instance has a view of the table, called a distributed partitioned view (DPV), built with UNION ALL across all instances
  • Complex to scale up and manage
  • Complex to modify the schema because of multiple databases
  • May suffer from hot-node syndrome when one node carries the most frequently used data
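As an illustration of the view described above, the following sketch builds the CREATE VIEW statement for a distributed partitioned view from a list of member tables. PartitionedView and BuildCreateView are hypothetical names, and the four-part table names in the usage note are made up; the sketch assumes the member tables are reachable from each instance, for example through linked servers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class PartitionedView
{
    // Build a CREATE VIEW statement that unions one member table per instance.
    public static string BuildCreateView(string viewName, IEnumerable<string> memberTables)
    {
        var selects = memberTables.Select(table => "SELECT * FROM " + table);
        return "CREATE VIEW " + viewName + " AS\n"
            + string.Join("\nUNION ALL\n", selects);
    }
}
```

Calling BuildCreateView("Customers", new[] { "Server1.Sales.dbo.Customers_1", "Server2.Sales.dbo.Customers_2" }) produces one SELECT per instance joined by UNION ALL. Every schema change has to be repeated on each member table and view, which is where the management complexity listed above comes from.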

6. Oracle 11g RAC

  • Multiple nodes running on shared storage
  • All nodes are participating
  • Nodes are connected to each other through an interconnect network
  • All nodes service a single database
  • Scalable because of single database
  • Entire cluster fails if SAN fails
  • A higher-performance interconnect is required for Cache Fusion as the number of nodes grows
  • Virtual IP Address is used to connect to all servers
  • In case of failure of one node, clients will connect to other nodes on the same IP address on subsequent requests
  • 30-60 seconds of delay required for failover
  • Application blackout will only be for the clients connected to the failed instance

 
