SQL Server 2016: Temporal tables. Indexing. Part II. Columnstore tables

This is the final post related to temporal tables, and it covers columnstore index usage scenarios. The previous blog post covered the performance of rowstore indexes, as well as schema alteration and a general introduction to this new feature.

Remaining scenarios to test:

  • Scenario 2: current table is rowstore and history is columnstore
  • Scenario 3: current and history tables are columnstore based objects

“Scenario 1” was already explored last time – I measured the overall influence of system versioning on top of classical clustered rowstore indexes. This post mainly focuses on the usage of columnstore indexes together with system versioning, and it also wraps up the test results into comparison tables.

Dataset for testing

For the tests, I am going to reuse the data generation script used previously. However, depending on the scenario, various indexes have to be replaced by columnstore alternatives.

The dataset holds one million rows and occupies about 1.3 GB of raw heap pages.
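
For reference, a minimal sketch of the current table assumed throughout this series is shown below. Only ProductID, MakeFlag, StandardCost, Weight, the CHAR(1000) “filler” and the Valid_From period column are confirmed by the text; the remaining names, types and options (including Valid_To) are assumptions:

CREATE TABLE dbo.Product
(
    ProductID    INT        NOT NULL,
    MakeFlag     BIT        NOT NULL,
    StandardCost MONEY      NOT NULL,
    Weight       FLOAT      NULL,
    Filler       CHAR(1000) NOT NULL, -- the "filler" column mentioned below
    Valid_From   DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    Valid_To     DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL, -- assumed name
    PERIOD FOR SYSTEM_TIME (Valid_From, Valid_To),
    CONSTRAINT PK_Product PRIMARY KEY CLUSTERED ( ProductID )
)
WITH ( SYSTEM_VERSIONING = ON ( HISTORY_TABLE = dbo.Product_History ) )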

Scenario 2: Current table is rowstore and history is columnstore

This case is quite similar to scenario 1, since the current table remains unchanged and uses the classical rowstore storage format. The history table, however, gets a clustered columnstore index, which makes it more efficient both for ad-hoc querying and from a disk space perspective.

The history table is transformed from rowstore to columnstore format by replacing its clustered index:

-- drop the default rowstore clustered index of the history table
DROP INDEX [ix_Product_History]
ON [dbo].[Product_History] WITH ( ONLINE = OFF )

-- and replace it with a clustered columnstore index
CREATE CLUSTERED COLUMNSTORE INDEX [cci_Product_History]
ON [dbo].[Product_History] WITH (DROP_EXISTING = OFF)
ON [PRIMARY]

As a result, a new index is created, which is built on top of all columns, including the “filler” of type CHAR(1000):

EXEC sp_helpindex '[dbo].[Product_History]'
01. Scenario 2. Indexes

DML operations performance

This part measures the time necessary for massive DML operations. Every command modifies 100 000 rows. The difference between this scenario and the previous one is in the history table storage, which is now a columnstore index, so the timings are also going to differ:

-- 4 seconds
UPDATE dbo.Product
SET Weight = RAND()
WHERE ProductID
IN ( SELECT ProductID FROM dbo.Product_SET1 )

-- 3 seconds
DELETE FROM dbo.Product
WHERE ProductID
IN ( SELECT ProductID FROM dbo.Product_SET2 )

-- 2 seconds
UPDATE dbo.Product
SET Weight = RAND()
WHERE ProductID
IN ( SELECT ProductID FROM dbo.Product_SET3 )

-- 4 seconds
DELETE FROM dbo.Product
WHERE ProductID
IN ( SELECT ProductID FROM dbo.Product_SET4 )

The total execution time of the DML script is only about 13 seconds, which is just half of the time required to run the same data modifications in the previous scenario. Such a visible time reduction can be explained by the fact that the rowstore history table was based on a clustered index with PAGE compression, while in the current scenario, where a columnstore index is in use, modified data first lands in a “delta store” – a simple uncompressed rowstore structure that the Tuple Mover transforms into a real columnstore structure after it reaches a certain row threshold.
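
The state of the delta store can be observed through row group metadata. A minimal sketch using the SQL Server 2016 DMV sys.dm_db_column_store_row_group_physical_stats – run it before and after the rebuild below to watch OPEN delta store row groups become COMPRESSED:

-- row group states of the history table's columnstore index
SELECT state_desc, total_rows, size_in_bytes
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.Product_History')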

The Tuple Mover also runs in a scheduled way inside the database engine; however, an index rebuild has the same effect, so I can measure the performance of pure columnstore structures without waiting for the internal processes to finish:

ALTER INDEX [cci_Product_History] ON [dbo].[Product_History] REBUILD

SELECTs performance

For testing queries over historical data, I will use the timestamp of the second operation:

SELECT DISTINCT [Valid_From] FROM dbo.Product_History
--2015-06-20 20:32:06.4259515
--2015-06-20 20:30:10.8270000 -- selected
--2015-06-20 20:30:27.2485649
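
AS OF, used below, is only one of the temporal sub-clauses; FOR SYSTEM_TIME also accepts range forms. A quick sketch built on the timestamps above:

-- all row versions that were active at any point inside the interval
SELECT COUNT(*) FROM dbo.Product
FOR SYSTEM_TIME BETWEEN '2015-06-20 20:30:10' AND '2015-06-20 20:32:07'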

The very first command is non-historic, and its results are nearly the same as in the first scenario. It is a simple retrieval of the current version of rows with ProductID up to 100 000:

-- Table 'Product'. Scan count 5, logical reads 8955, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times:  CPU time = 32 ms,  elapsed time = 11 ms.

SELECT SUM([StandardCost]) FROM dbo.Product
WHERE ProductID BETWEEN 1 AND 100000

02. Scenario 2. Query 1. Execution plan

The second query performs the same aggregation; however, it retrieves the data that was actual at the moment of the second DML operation: ‘2015-06-20 20:30:10.8270000’:

-- Table 'Product_History'. Scan count 4, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 873, lob physical reads 0, lob read-ahead reads 0.
-- Table 'Product_History'. Segment reads 1, segment skipped 0.
-- Table 'Product'. Scan count 5, logical reads 8955, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

-- SQL Server Execution Times:   CPU time = 31 ms,  elapsed time = 19 ms.

SELECT SUM([StandardCost]) FROM dbo.Product
FOR SYSTEM_TIME AS OF '2015-06-20 20:30:10.8270000'
WHERE ProductID BETWEEN 1 AND 100000
03. Scenario 2. Query 2. Execution plan

According to the results, the execution time is also not very different from the corresponding result of scenario 1.

The next set of queries is supposed to trigger a clustered index scan, as no indexed columns are involved in the SELECT. First, the current data:

-- Table 'Product'. Scan count 5, logical reads 182852, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times:  CPU time = 406 ms,  elapsed time = 131 ms.

SELECT SUM([StandardCost]) FROM dbo.Product
WHERE Weight > 0.5

The query plan looks trivial again. However, the number of logical reads is even larger, which the CLUSTERED INDEX SCAN operator explains:

04. Scenario 2. Query 3. Execution plan

The last query repeats the logic of the previous SELECT, but the data to retrieve is historical. The one difference between the current scenario and the previous is that the columnstore index covers ALL columns, including the queried ones:

-- Table 'Product_History'. Scan count 4, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 111, lob physical reads 0, lob read-ahead reads 0.
-- Table 'Product_History'. Segment reads 1, segment skipped 0.
-- Table 'Product'. Scan count 5, logical reads 182855, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times:  CPU time = 427 ms,  elapsed time = 134 ms.

SELECT SUM([StandardCost]) FROM dbo.Product
FOR SYSTEM_TIME AS OF '2015-06-20 20:30:10.8270000'
WHERE Weight > 0.5
05. Scenario 2. Query 4. Execution plan

The execution plan looks simple again: both tables, base and historical, are scanned, so the STATISTICS output reflects the number of scanned pages of the corresponding objects. The difference in this scenario is that the history table is scanned in batch mode, and the overhead of querying history data drops from 5% to nearly zero according to the query plan.

Scenario 3: Current and history tables are columnstore based objects

This scenario differs most radically from the default one; however, it can be very efficient in BI/DWH cases, especially if loads are performed on a batch basis with possible “after-load” index rebuilds.

Changes at the index level

The first step is to create a clustered columnstore index:

CREATE CLUSTERED COLUMNSTORE INDEX [CCI_Product] ON [dbo].[Product] WITH (DROP_EXISTING = OFF)

The next step is to add a primary key, since it is a basic requirement of the feature.

Currently, in SQL Server 2016 CTP2, there is no way to create it using SSMS:

06. SSMS 2016 CTP2 doesn’t support primary key creation on top of columnstore clustered index

Nevertheless, such a constraint can be created via T-SQL:

-- 1 second
ALTER TABLE dbo.Product ADD CONSTRAINT PK_Product PRIMARY KEY NONCLUSTERED
(
    ProductID
) ON [PRIMARY]

When it comes to index alteration on the history table, it is unchanged from scenario 2 – the rowstore clustered index has to be replaced by its columnstore alternative:

DROP INDEX [ix_Product_History] ON [dbo].[Product_History] WITH ( ONLINE = OFF )

CREATE CLUSTERED COLUMNSTORE INDEX [cci_Product_History] ON [dbo].[Product_History] WITH (DROP_EXISTING = OFF) ON [PRIMARY]

As a result, new indexes are created and the data structure of both tables is columnstore; however, a rowstore non-clustered index is added to the current table as the primary key:

EXEC sp_helpindex '[dbo].[Product]'
EXEC sp_helpindex '[dbo].[Product_History]'
07. Scenario 3. Indexes overview

DML operations performance

The same workload as in scenarios 1 and 2 is performed:

-- 13 seconds
UPDATE  dbo.Product
SET Weight = RAND()
WHERE ProductID in ( SELECT ProductID FROM  dbo.Product_SET1 )

-- 6 seconds
DELETE FROM  dbo.Product
WHERE ProductID in ( SELECT ProductID FROM  dbo.Product_SET2 )

-- 9 seconds
UPDATE  dbo.Product
SET Weight = RAND()
WHERE ProductID in ( SELECT ProductID FROM  dbo.Product_SET3 )

-- 5 seconds
DELETE FROM  dbo.Product
WHERE ProductID in ( SELECT ProductID FROM  dbo.Product_SET4 )

This time the total execution of the DML script took 33 seconds, which is the longest duration across the scenarios.

The delta store and the Tuple Mover come into play here again. In order to make clean measurements and exclude other factors, the indexes are simply rebuilt:

ALTER INDEX ALL ON [dbo].[Product] REBUILD
ALTER INDEX ALL ON [dbo].[Product_History] REBUILD

SELECTs performance

The first query is trivial and non-historical; however, it demonstrates remarkable performance – its duration dropped from 31 ms to just 1 ms. The reason is the clustered columnstore index, which is tuned for aggregate operations.

-- Table 'Product'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 556, lob physical reads 0, lob read-ahead reads 0.
-- Table 'Product'. Segment reads 1, segment skipped 0.
-- SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.

SELECT SUM([StandardCost]) FROM dbo.Product
WHERE ProductID BETWEEN 1 AND 100000
08. Scenario 3. Query 1. Execution plan

The second query repeats the logic of the first; however, it is a historic one:

-- Table 'Product_History'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 291, lob physical reads 0, lob read-ahead reads 0.
-- Table 'Product'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 570, lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times:   CPU time = 0 ms,  elapsed time = 9 ms.

SELECT SUM([StandardCost]) FROM dbo.Product
FOR SYSTEM_TIME AS OF '2015-06-22 13:34:39.7962445'
WHERE ProductID BETWEEN 1 AND 100000
09. Scenario 3. Query 2. Execution plan

The next set of queries is supposed to trigger a clustered index scan, since no indexed columns are involved in the SELECT. However, that statement was completely true only in the previous scenarios, where the current table was based on a rowstore – not in this one, where all objects are columnstore based and all columns are indexed by default.
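
Because every column gets its own compressed segments with min/max metadata, entire row groups can be skipped regardless of which column the predicate touches. A sketch for inspecting that metadata – the catalog views exist as named, while the exact joins are an assumption for a clustered columnstore index:

-- per-column segment metadata used for segment elimination
SELECT c.name AS column_name, s.segment_id, s.row_count, s.min_data_id, s.max_data_id
FROM sys.column_store_segments s
JOIN sys.partitions p ON p.partition_id = s.partition_id
JOIN sys.columns c ON c.object_id = p.object_id AND c.column_id = s.column_id
WHERE p.object_id = OBJECT_ID('dbo.Product')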

Firstly, querying current data:

-- Table 'Product'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 17, lob physical reads 0, lob read-ahead reads 0.
-- Table 'Product'. Segment reads 1, segment skipped 0.
-- SQL Server Execution Times:  CPU time = 15 ms,  elapsed time = 30 ms.

SELECT SUM([StandardCost]), COUNT(*) FROM dbo.Product
WHERE Weight > 0.5

This query shows an order-of-magnitude improvement of the results in comparison with the rowstore based current tables; however, it is still not as good as the first query of this scenario, where the result was measured in 1 ms. The query plan shows the reason:

10. Scenario 3. Query 3. Original execution plan

The root of the issue is the Execution Mode – it is set to “Row” instead of “Batch”. Such behavior is unexpected, since starting with SQL Server 2014 the database engine can deal correctly with scalar aggregates and process them in batch mode. Let’s hope this will be solved by Microsoft in upcoming cumulative updates.
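
The per-operator execution mode can also be checked without the graphical plan. A sketch using the runtime showplan, where each operator carries an ActualExecutionMode attribute:

SET STATISTICS XML ON
SELECT SUM([StandardCost]) FROM dbo.Product WHERE Weight > 0.5
SET STATISTICS XML OFF
-- inspect RunTimeCountersPerThread/@ActualExecutionMode in the returned XML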

I am going to add a GROUP BY clause in order to run the aggregate in a non-scalar way:

-- Table 'Product'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 562, lob physical reads 0, lob read-ahead reads 0.
-- Table 'Product'. Segment reads 1, segment skipped 0.
-- SQL Server Execution Times:   CPU time = 0 ms,  elapsed time = 2 ms.
SELECT MakeFlag, SUM([StandardCost]) FROM dbo.Product
WHERE Weight > 0.5
GROUP BY MakeFlag

Query execution time dropped from 30 to 2 ms, which makes it up to 70x faster in the current scenario than in the previous two. The execution plan shows BATCH mode now:

11. Scenario 3. Query 3. Alternative execution plan
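
If a scalar result is still required, a common workaround is to wrap the grouped aggregate into an outer query – a sketch, not verified against CTP2:

SELECT SUM(s) AS TotalStandardCost
FROM (
    SELECT SUM([StandardCost]) AS s
    FROM dbo.Product
    WHERE Weight > 0.5
    GROUP BY MakeFlag -- the inner aggregate can run in batch mode
) t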

The last one is a copy of the previous query, but it retrieves historic data:

-- Table 'Product_History'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 37, lob physical reads 0, lob read-ahead reads 0.
-- Table 'Product'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 31, lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times:   CPU time = 0 ms,  elapsed time = 2 ms.

SELECT SUM([StandardCost]) FROM dbo.Product
FOR SYSTEM_TIME AS OF '2015-06-22 13:34:39.7962445'
WHERE Weight > 0.5

Just to make it clear: the query runs over a dataset which, in raw format, exceeds 1.0 GB for the current data and 0.5 GB for the history, with columnstore indexes created on top of both. A historical query over such a dataset took just 2 milliseconds to execute. Execution plan:

12. Scenario 3. Query 4. Execution plan

Interestingly, the query optimizer chose the correct execution mode for the historic version of the query, yet it used row mode for the non-historic one executed in the previous test.

Just in case, here is the number of rows retrieved by query 4:

-- 448 849
SELECT COUNT(*) FROM dbo.Product
FOR SYSTEM_TIME AS OF '2015-06-22 13:34:39.7962445'
WHERE Weight > 0.5

Numbers and conclusions

First, the numbers. The following information is split into three logical groups of measurements: disk space usage, duration of DML operations and query performance.

Disk space usage per object type in megabytes

The consolidated data set clearly shows that scenario 3 is the most efficient option in this category.

                                                       Current table   History table   Total
Raw data (uncompressed rowstore)                       1080            506             1586
Scenario 1: rowstore objects                           1080            30              1110
Scenario 2: rowstore current and columnstore history   1080            17              1097
Scenario 3: columnstore objects                        56              17              73
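
For reference, the per-table sizes can be obtained with the standard sp_spaceused procedure (the numbers above roughly correspond to its “data” column):

EXEC sp_spaceused '[dbo].[Product]'
EXEC sp_spaceused '[dbo].[Product_History]'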

Chart 01. Disk space usage

An interesting point: the rowstore history object gets PAGE compression by default, which makes its size comparable to the columnstore alternative.
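
The applied compression level can be confirmed through the partition catalog view; a minimal sketch:

SELECT p.index_id, p.partition_number, p.data_compression_desc
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('dbo.Product_History')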


Duration of DML operations per scenario in seconds

The following table shows the overall time spent generating changes to 400 000 rows in the current table and using that information to fill the history table.

                                                       Total time spent
Scenario 1: rowstore objects                           25
Scenario 2: rowstore current and columnstore history   13
Scenario 3: columnstore objects                        33

Chart 02. Duration of DML operations

This section holds no surprises: the second scenario is the most efficient.

SELECTs performance per scenario in milliseconds

This is the most dramatic part of the benchmark, since it shows how far the columnstore based third scenario can push overall query performance, especially when the queries aggregate over a non-predefined set of columns. Despite the dataset holding one million rows, execution times are still measured in milliseconds.

                                                       Query 1   Query 2   Query 3   Query 4
Scenario 1: rowstore objects                           9         11        131       175
Scenario 2: rowstore current and columnstore history   11        19        131       134
Scenario 3: columnstore objects                        1         9         30        2

Chart 03. SELECT operation performance

Again, the “scenario 3” queries are the obvious champions; nevertheless, even they could run faster. An example is query 3, which still used ROW execution mode, leading to a 20-30x degradation in comparison with the same query executed over history data. I hope the noted issue will be fixed by Microsoft in upcoming preview releases.

Conclusions

  • System versioning, or temporal tables, is a very useful and well optimized feature which can be used both as an audit tool right inside an OLTP database and as a foundation for building historic data warehouses.
  • Despite the fact that system versioning works perfectly well with the rowstore data format, columnstore indexes can improve performance by a couple of orders of magnitude – especially on large data sets and especially with a big number of historical records.
  • Columnstore indexes are not only good from a performance perspective, but also efficient in disk usage, since they provide a better compression ratio than rowstore PAGE data compression.
  • SQL Server 2016 sometimes uses ROW execution mode for queries with scalar aggregates over columnstore clustered index based tables. This limitation was supposed to have been eliminated back in SQL Server 2014; however, it was still observed during this benchmark.