SQL Server 2016: Temporal Tables. Introduction

SQL Server 2016 CTP2 finally available for a public and it brings many new features, which are worth of checking.

However, one of those new initiatives, I believe, can bring some discussions in DWH/BI environments. The name of it is: Temporal table. It can be especially actual for a BENELUX area, where the DataVault methodology is highly adopted.

Foreword

Seems, Microsoft, according to the best company traditions, brings another naming confusion. Currently SQL Server ships  temporary tables and table variables. But, with a new release, temporal table is a new guy in a toolbox, however usage of the new tool is completely different:

A temporal table is a table for which a PERIOD definition exists and which contains system columns with a datatype of datetime2 into which the period of validity is recorded by the system, and which has an associated history table into which the system records all prior versions of each record with their period of validity. With a temporal table, the value of each record at any point in time can be determined, rather than just the current value of each record. A temporal table is also referred to as a system-versioned table. Read more in online documentation.

Playground preparation

First of all, new database has to be created with a compatibility level SQL 2016 (or 130):

-- step 1: Create database, tables and seed dummy data
CREATE DATABASE SQL2016Demo;
GO

USE SQL2016Demo;
CREATE TABLE dbo.Product
(
ProductID INT IDENTITY PRIMARY KEY,
ProductName VARCHAR(100) NOT NULL
)

CREATE TABLE dbo.Project
(
ProjectID INT IDENTITY PRIMARY KEY,
ProjectName VARCHAR(100) NOT NULL,
ProductID INT FOREIGN KEY REFERENCES dbo.Product(ProductID)
)
GO

INSERT INTO dbo.Product (ProductName) VALUES ('Product A');
INSERT INTO dbo.Project (ProjectName, ProductID)
VALUES ('Project A'
, ( SELECT ProductID FROM Product WHERE ProductName = 'Product A' )
);

SELECT * FROM dbo.Product;
SELECT * FROM dbo.Project;

GO

As a result, very simple data model prepared with Just one row per entity:

Temporal_tables_1

Meet new friend – “system versioning”

The most important part starts by executing following block of the code:

-- step 2: Enable system versionning
BEGIN TRAN

-- Product
ALTER TABLE dbo.Product
ADD PERIOD FOR SYSTEM_TIME (Valid_From, Valid_Till),
Valid_From datetime2 GENERATED ALWAYS AS ROW START NOT NULL DEFAULT GETUTCDATE(),
Valid_Till datetime2 GENERATED ALWAYS AS ROW END NOT NULL DEFAULT CONVERT(DATETIME2, '9999.12.31');

ALTER TABLE dbo.Product
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Product_History, DATA_CONSISTENCY_CHECK = ON))

-- Project
ALTER TABLE dbo.Project
ADD PERIOD FOR SYSTEM_TIME (Valid_From, Valid_Till),
Valid_From datetime2 GENERATED ALWAYS AS ROW START NOT NULL DEFAULT GETUTCDATE(),
Valid_Till datetime2 GENERATED ALWAYS AS ROW END NOT NULL DEFAULT CONVERT(DATETIME2, '9999.12.31');

ALTER TABLE dbo.Project
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Project_History, DATA_CONSISTENCY_CHECK = ON))

COMMIT TRAN

GO
-- check table content again:
SELECT * FROM dbo.Product;
SELECT * FROM dbo.Project;

As you can see, two system-maintained columns were added to both tables:

Temporal_tables_2

Yes, that is right, since columns are system-maintained they hold non-static values  and SQL Server will keep them actual on every DML operations. It can been shown in the following example:

-- step 3: update data in the tables
BEGIN TRAN

UPDATE dbo.Product
SET ProductName = 'Product B'
WHERE ProductID = 1

INSERT INTO dbo.Project (ProjectName, ProductID)
VALUES ('Project B', 1 );

COMMIT TRAN

SELECT * FROM dbo.Product;
SELECT * FROM dbo.Project;

Temporal_tables_3

As the result, name of the product changed to “Product B” and new project created. While data in a second dataset looks valid and both operations are reflected, data of the first dataset represents current state only, in other words, previous version of the row is not appearing. The reason is simple – by default, only current and active members are queryable. If previous version of the record has to be retrieved and shown, special construction can be used as a part of FROM clause:

-- Step 4: Query historic data
SELECT * FROM dbo.Product
FOR SYSTEM_TIME FROM '2000-01-01' to '9999-12-31'
Order by Valid_From;

Temporal_tables_4

This time, the result looks like expected and both versions of the row retrieved.

Make it working

Let’s put all information together and solve simple task: perform join of entities and retrieve historic information, so data should be valid for some particular point in time.

Generic query looks  little bit overwhelmed, since sub-queries to versioned data written as derived tables. The first dataset is retrieving state of data at moment of initial seeding:

DECLARE @dtPointInHistory DATETIME2 = '2015-05-31 18:13:10.09'

SELECT prj.ProjectID, prj.ProjectName, prd.ProductName
-- wrapper for a project
FROM (
SELECT ProjectID, ProjectName, ProductID FROM dbo.Project
FOR SYSTEM_TIME AS OF @dtPointInHistory
) prj
-- wrapper for a product
JOIN (
SELECT ProductID, ProductName FROM dbo.Product
FOR SYSTEM_TIME AS OF @dtPointInHistory
) prd ON prj.ProductID = prd.ProductID

Temporal_tables_6

The result looks correct, since only one row per entity had added and product still had name: “Product A”.

In order to retrieve actual state of data,  current timestamp assigned to the variable:

DECLARE @dtPointInHistory DATETIME2 = GETUTCDATE()

Temporal_tables_5

The result looks as expected again – product name changed and two projects assigned to that single product.

What can Temporal Tables mean for SQL Server DWH/BI community?

In short, this feature can serve as a foundation brick for DWH system. Just couple of simple steps required to make entire solution working:

  • build a plain, relational, but not over-normalized datamodel;
  • enable system versioning on every entity of the datamodel;
  • load data directly to entities by using MERGE statements, wrapped in a single transaction;

That is it! No need to model and create structures like links, satellites, then satellites for links, then PIT tables for links and their satellites just to store historical information of couple of entities and then build even more complex queries to retreive data. No need also to create specialized DWH feeding ETLs by itself, which are supposed to load all that variety of object in a consistent way.

In next posts I going to cover how temporal tables can deal with two other common painful points of  datawarehousing:  model schema change and performance.