SQL Server 2016: Temporal Tables. Changing the Schema

In the previous blog post I introduced the basic concepts behind temporal tables. This time I would like to cover, in a bit more detail, one of the difficult topics of data warehousing – schema change: how it can be applied to temporal tables and what impact it has on the system.

Foreword

This post contains performance measurements of certain DDL commands. However, the absolute values of such metrics do not mean much on their own, because the execution environment cannot be reproduced on the reader's side. Therefore, I provide execution times not just for the schema change operations, but also for the data load, index creation and data manipulation commands, so that all timings can be compared with each other and give an overall sense of the impact.

The execution environment is a Hyper-V guest with 4 cores and 4 GB RAM, hosted on an HP ZBook (Windows 8.1, Core i7):

[Screenshot: server overview]

Performance measurements of the commands were taken by using:

SET STATISTICS TIME ON


Base table data

As the data sample, the Product table from AdventureWorks2014 is used, with a few additions:

  • An extra filler column of type CHAR(1000) is added in order to stretch the rows and put a bigger load on SQL Server.
  • The dataset is enlarged to 1 000 000 rows via a CROSS JOIN. This is essentially Adam Machanic’s script.
--  SQL Server Execution Times:
--   CPU time = 3 455 ms,  elapsed time = 32 109 ms.
SELECT TOP 1000000
	ISNULL(p.ProductID + (a.number * 1000),0) AS ProductID,
	p.Name + CONVERT(VARCHAR, (a.number * 1000)) AS Name,
	p.ProductNumber + '-' + CONVERT(VARCHAR, (a.number * 1000)) AS ProductNumber,
	p.MakeFlag,
	p.FinishedGoodsFlag,
	p.Color,
	p.SafetyStockLevel,
	p.ReorderPoint,
	p.StandardCost,
	p.ListPrice,
	p.Size,
	p.SizeUnitMeasureCode,
	p.WeightUnitMeasureCode,
	p.Weight,
	p.DaysToManufacture,
	p.ProductLine,
	p.Class,
	p.Style,
	p.ProductSubcategoryID,
	p.ProductModelID,
	p.SellStartDate,
	p.SellEndDate,
	p.DiscontinuedDate,
	cast('' as char(1000)) as DataFiller -- Dummy column to expand size of the table
INTO SQL2016Demo.dbo.Product
FROM AdventureWorks2014.Production.Product AS p
CROSS JOIN master..spt_values AS a
WHERE a.type = 'p'

Note that the CPU time is just 1/10 of the overall elapsed time. This is a sign that the operation is IO bound, which is expected since the virtual machine sits on a single, dedicated, but still non-SSD hard drive.

Temporal tables have one schema requirement: the base table must have a PRIMARY KEY defined. According to the documentation:

A temporal table must have a primary key defined in order to correlate records between the current table and the history table, and the history table cannot have a primary key defined.

So, another very IO-bound command to run:

--  SQL Server Execution Times:
--   CPU time = 4 016 ms,  elapsed time = 173 433 ms.
ALTER TABLE dbo.Product ADD CONSTRAINT PK_Product PRIMARY KEY CLUSTERED
(
	ProductID
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

Finally, the very last step in this section is enabling “System Versioning”:

--SQL Server Execution Times:
 --  CPU time = 1 593 ms,  elapsed time = 1 607 ms.
ALTER TABLE dbo.Product
ADD PERIOD FOR SYSTEM_TIME (Valid_From, Valid_Till),
Valid_From datetime2 GENERATED ALWAYS AS ROW START NOT NULL DEFAULT GETUTCDATE(),
Valid_Till datetime2 GENERATED ALWAYS AS ROW END NOT NULL DEFAULT CONVERT(DATETIME2, '9999.12.31');
ALTER TABLE dbo.Product SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Product_History, DATA_CONSISTENCY_CHECK = ON))
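
If needed, the result can be verified through the catalog views. The following check is not part of the original workflow, just an assumed way to confirm that dbo.Product is now system-versioned and linked to dbo.Product_History:

-- Assumed verification query, not part of the measured workflow
SELECT name, temporal_type_desc, OBJECT_NAME(history_table_id) AS history_table
FROM sys.tables
WHERE name IN ('Product', 'Product_History')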


History table data

Since the base table is loaded with 1 000 000 rows, it is time to trigger the filling of the history table.

The following script contains a sequence of UPDATE and DELETE commands. Each of the six commands affects 100 000 random rows.

-- SQL Server Execution Times:
 --   CPU time = 3 750 ms,  elapsed time = 10 467 ms.
UPDATE  dbo.Product
SET Weight = RAND()
WHERE ProductID in (
SELECT TOP (100000) ProductID FROM  dbo.Product
order by NEWID() )

 -- SQL Server Execution Times:
 --   CPU time =  4470 ms,  elapsed time = 12 842 ms.
UPDATE  dbo.Product
SET Weight = RAND()
WHERE ProductID in (
SELECT TOP (100000) ProductID FROM  dbo.Product
order by NEWID() )

 -- SQL Server Execution Times:
 --   CPU time = 3936 ms,  elapsed time = 5440 ms.
DELETE FROM  dbo.Product
WHERE ProductID in (
SELECT TOP (100000) ProductID FROM  dbo.Product
ORDER BY NEWID() )

-- SQL Server Execution Times:
--   CPU time = 4 640 ms,  elapsed time = 15 283 ms.
UPDATE  dbo.Product
SET Weight = RAND()
WHERE ProductID in (
SELECT TOP (100000) ProductID FROM  dbo.Product
order by NEWID() )

-- SQL Server Execution Times:
--   CPU time = 4 203 ms,  elapsed time = 9 773 ms.
UPDATE  dbo.Product
SET Weight = RAND()
WHERE ProductID in (
SELECT TOP (100000) ProductID FROM  dbo.Product
order by NEWID() )

--  SQL Server Execution Times:
--   CPU time = 3627 ms,  elapsed time = 2806 ms.
DELETE FROM  dbo.Product
WHERE ProductID in (
SELECT TOP (100000) ProductID FROM  dbo.Product
ORDER BY NEWID() )

Note that while “System Versioning” is enabled, there is neither a way nor a need to perform operations directly on the history table. Regular DML commands transparently do the internal work of maintaining the history table and keeping the Valid_From and Valid_Till columns up to date:

exec sp_spaceused 'dbo.Product'
exec sp_spaceused 'dbo.Product_History'

[Screenshot: sp_spaceused output for dbo.Product and dbo.Product_History]
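
To see the history in action without touching dbo.Product_History directly, a point-in-time query can be issued against the base table. The sketch below is not part of the original test run, and the AS OF timestamp is just a placeholder that has to be adjusted to a moment before the UPDATE/DELETE batch:

-- Hedged example: SQL Server pulls the older row versions from dbo.Product_History
SELECT ProductID, Name, Weight, Valid_From, Valid_Till
FROM dbo.Product
FOR SYSTEM_TIME AS OF '2016-01-01T00:00:00' -- placeholder, adjust to your run
WHERE ProductID = 1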

Applying changes to the schema

This is the most crucial part of the post; it shows the generic technique of how changes are propagated. Three common patterns are picked for testing:

  • Add a new column
  • Enlarge datatype of existing column
  • Drop a column

Add a new column

The first item covers the addition of a new nullable column. The following script contains just four lines of code, and its meaning is pretty clear from the syntax: first, versioning is turned off; then the change is applied to both tables, base and history; and finally, system versioning is enabled again.

The statistics output shows that the first three commands are pretty fast, because the changes are performed at the metadata level; the last one, however, requires some effort from the database engine to complete. That effort is the data consistency check:

-- SQL Server Execution Times:
--   CPU time = 0 ms,  elapsed time = 0 ms.

-- SQL Server Execution Times:
--   CPU time = 0 ms,  elapsed time = 0 ms.

-- SQL Server Execution Times:
--   CPU time = 15 ms,  elapsed time = 2 ms.

-- SQL Server Execution Times:
--   CPU time = 0 ms,  elapsed time = 1 ms.

-- SQL Server Execution Times:
--   CPU time = 2578 ms,  elapsed time = 3343 ms.

ALTER TABLE Product SET (SYSTEM_VERSIONING = OFF)

ALTER TABLE Product ADD ColNew BIGINT NULL;
ALTER TABLE Product_History ADD ColNew BIGINT NULL;

ALTER TABLE Product
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.Product_History
, DATA_CONSISTENCY_CHECK = ON));

According to the official documentation, that check is not required if the schema alteration is wrapped in a TRANSACTION:

Data check is unnecessary when the schema change is made within a transaction as no data changes can occur

The following script proves that: the new column is added by altering metadata only, and the schema change execution time is fantastic – about 1 ms:

-- SQL Server Execution Times:
--   CPU time = 0 ms,  elapsed time = 0 ms.

-- SQL Server Execution Times:
--   CPU time = 0 ms,  elapsed time = 1 ms.

-- SQL Server Execution Times:
--   CPU time = 0 ms,  elapsed time = 0 ms.

-- SQL Server Execution Times:
--   CPU time = 0 ms,  elapsed time = 0 ms.

BEGIN TRAN --!
ALTER TABLE Product SET (SYSTEM_VERSIONING = OFF)

ALTER TABLE Product ADD ColNew BIGINT NULL;
ALTER TABLE Product_History ADD ColNew BIGINT NULL;

ALTER TABLE Product
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.Product_History
, DATA_CONSISTENCY_CHECK = OFF)); -- DC turned off!!
COMMIT TRAN --!

Enlarge datatype of existing column

The second item in the exploration – data type enlargement – requires changes at the data page level, so the timings are expected to be different:

-- SQL Server Execution Times:
-- CPU time = 0 ms, elapsed time = 1 ms.

-- SQL Server Execution Times:
-- CPU time = 1969 ms, elapsed time = 3843 ms.

-- SQL Server Execution Times:
-- CPU time = 4625 ms, elapsed time = 5520 ms.

-- SQL Server Execution Times:
-- CPU time = 0 ms, elapsed time = 0 ms.

BEGIN TRAN
ALTER TABLE Product SET (SYSTEM_VERSIONING = OFF)

ALTER TABLE Product ALTER COLUMN ProductModelID BIGINT
ALTER TABLE Product_History ALTER COLUMN ProductModelID BIGINT

ALTER TABLE Product
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.Product_History
, DATA_CONSISTENCY_CHECK = OFF));
COMMIT TRAN

The column type is changed from INT to BIGINT. Both are fixed-width types, so the database engine has to do some work at the page level of both objects, one after the other. As you can see, the duration of the alteration still stays within reasonable limits.
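
A quick, assumed way to confirm that the enlargement reached both tables is to look at the column metadata (this check is not part of the measured workflow):

-- Assumed metadata check, not part of the measured workflow
SELECT OBJECT_NAME(c.object_id) AS table_name, c.name AS column_name, t.name AS data_type
FROM sys.columns AS c
JOIN sys.types AS t ON t.user_type_id = c.user_type_id
WHERE c.name = 'ProductModelID'
AND c.object_id IN (OBJECT_ID('dbo.Product'), OBJECT_ID('dbo.Product_History'))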

Drop a column

Finally, the third item to check is column deletion. This alteration also occurs at the metadata level and shows very fast timings:

-- SQL Server Execution Times:
--   CPU time = 0 ms,  elapsed time = 1 ms.

-- SQL Server Execution Times:
--   CPU time = 0 ms,  elapsed time = 1 ms.

-- SQL Server Execution Times:
--   CPU time = 0 ms,  elapsed time = 1 ms.

-- SQL Server Execution Times:
--   CPU time = 0 ms,  elapsed time = 0 ms.

BEGIN TRAN
ALTER TABLE Product SET (SYSTEM_VERSIONING = OFF)
ALTER TABLE Product drop column ProductModelID
ALTER TABLE Product_History drop column ProductModelID 

ALTER TABLE Product
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.Product_History
, DATA_CONSISTENCY_CHECK = OFF));
COMMIT TRAN


What does the information described above mean for the SQL Server DWH/BI community?

  • Schema changes can be applied in a consistent and very robust way by using DDL commands.
  • Altering the schema doesn’t require any alteration of the logic related to system versioning; likewise, it doesn’t require a table rebuild, data reload, checksum updates etc.
  • The base and history tables can have different indexing strategies, and both tables can be partitioned. A minimal sketch is given below; I am going to cover this point in detail in my next blog post.
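
For example, assuming the schema as it stands at the end of this post (ProductModelID already dropped), an extra index tailored to point-in-time lookups could be placed on the history table only. This is a minimal sketch, not a recommendation:

-- Minimal sketch: an index that exists only on the history table
CREATE NONCLUSTERED INDEX IX_Product_History_Period
ON dbo.Product_History (Valid_Till, Valid_From)
INCLUDE (ProductID)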