In this post I would like to highlight a couple of behavioral aspects of temporal tables that I observed during test drives. The first is a demonstration of how temporal tables are queried, exposed through Live Query Statistics of a SELECT. The second concerns how system versioning tracks updates when values do not actually change.
The highlighted results are based on a pre-release version, so the behavior described here is subject to change. Version of SQL Server:
-- Microsoft SQL Server 2016 (CTP2.1) - 13.0.300.44 (X64)
Query plan of the SELECT and Live Query Statistics
A new SQL Server Management Studio feature, called Live Query Statistics, is used to capture the real flow of query execution. The following animated GIF shows that current and history data are retrieved sequentially within a query, even when the query plan uses parallelism.
The picture is clickable; the overall duration is about 20 seconds:
This behavior is not actually an issue, but it shows possible room for improvement.
False updates, or when column values are updated but keep their original values
During tests and benchmarks I noticed a default behavior that can be really undesirable in some scenarios: all updates are logged to the history table, even if columns were updated to the same values. Short demo:
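The effect can be reproduced with a minimal sketch along these lines (table and column names here are illustrative, not the original demo):

```sql
-- A minimal system-versioned table.
CREATE TABLE dbo.Product
(
    ProductID INT PRIMARY KEY,
    Price     MONEY NOT NULL,
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductHistory));

INSERT INTO dbo.Product (ProductID, Price) VALUES (1, 10.00);

-- A "false" update: the value does not actually change...
UPDATE dbo.Product SET Price = 10.00 WHERE ProductID = 1;

-- ...yet the previous row version is still written to the history table.
SELECT COUNT(*) AS HistoryRows FROM dbo.ProductHistory;
```

System versioning fires on the UPDATE statement itself, not on a value comparison, so the history row is written regardless of whether anything changed.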
Temporal Tables are an important and anticipated feature of the upcoming SQL Server 2016, and the feature is definitely worth the series of blog posts and articles that I and other bloggers have written recently. One of the aspects covered on this site in recent weeks is performance. This time I would like to go even further by enlarging the dataset tenfold and providing brief execution results.
Changes in benchmarking
The benchmarks are based on the approach used in previous posts, but with some key differences:
Number of rows in dbo.Product increased from 1 000 000 to 10 000 000
DataFiller column datatype narrowed from CHAR(1000) to CHAR(100)
DML Insert/Update batch size increased from 100 000 to 1 000 000 rows. Therefore, the history table holds rows from two UPDATE batches and two DELETE batches
The queries remained the same as in the previous benchmarks. They are also available on GitHub for test reproduction.
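To give an idea of the kind of query being benchmarked, here is an illustrative temporal query (the actual benchmark queries are in the linked GitHub repository; names and the timestamp are assumptions):

```sql
-- Point-in-time lookup against a system-versioned table:
-- the engine combines current and history data transparently.
SELECT ProductID, Price
FROM dbo.Product
    FOR SYSTEM_TIME AS OF '2015-07-01T00:00:00'
WHERE ProductID = 42;
```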
Table 1. Disk space usage per object type in megabytes:
Raw data (uncompressed rowstore)
Scenario 1: rowstore objects (default)
Scenario 2: rowstore current and columnstore history
“Scenario 1” has already been explored last time, when I measured the overall impact of using system versioning on top of classic clustered rowstore indexes. This post focuses mainly on using columnstore indexes with system versioning, and I will also wrap up the test results into comparison tables.
Dataset for testing
For the tests, I am going to reuse the data generation script used previously. However, depending on the scenario, various indexes have to be replaced by columnstore alternatives.
The dataset holds one million rows and has a size of about 1.3 GB of raw heap pages.
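The original generation script is not reproduced here; a minimal sketch of the dataset's shape might look like the following (all names except DataFiller are assumptions, and the row-count trick via `sys.all_columns` is just one common way to generate filler rows):

```sql
CREATE TABLE dbo.Product
(
    ProductID  INT IDENTITY PRIMARY KEY,
    Name       VARCHAR(50) NOT NULL,
    DataFiller CHAR(1000)  NOT NULL DEFAULT ('')  -- fixed-width padding drives the ~1.3 GB size
);

-- Load one million rows of filler data.
INSERT INTO dbo.Product (Name)
SELECT TOP (1000000)
       'Product #' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10))
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;
```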
While this post was being edited, Microsoft made an important and relevant update to the official documentation on the topic under discussion:
An optimal indexing strategy will include a clustered columns store index and / or a B-tree rowstore index on the current table and a clustered columnstore index on the history table for optimal storage size and performance. If you create / use your own history table, we strongly recommend that you create such an index that consists of the primary keys from the current table plus the period columns to speed up temporal querying as well as the queries that are part of the data consistency check. If your history table is a rowstore, we recommend a clustered rowstore index. The default history table has a clustered rowstore index created for you. At a minimum, a non-clustered rowstore index is recommended.
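Applied to the schema used here, the recommended setup for the columnstore-history scenario could be sketched as follows (a sketch, assuming a pre-created history table named dbo.ProductHistory with the same columns plus the period columns):

```sql
-- History table gets a clustered columnstore index for storage size and scan performance.
CREATE CLUSTERED COLUMNSTORE INDEX cci_ProductHistory
    ON dbo.ProductHistory;

-- Current table keeps its rowstore indexes; system versioning is then attached.
ALTER TABLE dbo.Product
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductHistory));
```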
As a result, there are a couple of scenarios to benchmark:
For each scenario, the time to load/update data and query performance were measured.
In the previous blog post, I gave an introduction to the basic concepts behind temporal tables. This time I would like to cover, in a bit more detail, one difficult topic of data warehousing, schema change: how it can be applied to temporal tables and what impact it can have on the system.
This post contains performance measurements of certain DDL commands. However, the absolute values of such metrics do not mean much on their own, since the execution environment cannot be reproduced on the reader's side. Therefore, I will provide command execution times not just for schema change operations, but also for data load, index creation, and data manipulation commands, so all timespans can be compared with each other to give an overall understanding of the impact.
As for the execution environment, it is a Hyper-V guest with 4 cores and 4 GB RAM on an HP ZBook / W8.1 / Core i7:
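To make the schema-change measurements concrete, here is the general shape of the DDL being timed (a sketch; in early CTP builds, system versioning generally had to be switched off around such changes, and the column name is illustrative):

```sql
-- Detach history tracking before the schema change.
ALTER TABLE dbo.Product SET (SYSTEM_VERSIONING = OFF);

-- The change must be applied to both the current and the history table.
ALTER TABLE dbo.Product        ADD Comment VARCHAR(100) NULL;
ALTER TABLE dbo.ProductHistory ADD Comment VARCHAR(100) NULL;

-- Re-attach versioning; the consistency check adds to the measured time.
ALTER TABLE dbo.Product
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductHistory,
                                 DATA_CONSISTENCY_CHECK = ON));
```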