Friday 11 May 2012

Bitemporal data and SQL2011

Pyrrho 4.6 is uploaded today (and updated 12 May), which contains an implementation of system and application versioning as now enshrined in the December 2011 version of the SQL Standard. In fact, the SQL2011 standard does not quite get as far as supporting bitemporal data, but Pyrrho 4.6 does, and it is worth spending a little time to explain what is happening.

In a bitemporal table each row has 4 timestamps associated with it: a logical starting time for the data, a logical finishing time, and the system transaction times for creation and deletion of the row. An update operation involves creation of a row with the new values and deletion of the old one. Most of the time, we are only concrened with current data (i.e. the deletion time is in the future), but there should be mechanisms for accessing the historical data.

First, SQL2011 supports the idea of Period types. A period consists of a pair of base table columns of the same date type, with the first column implicitly constrained to be less than the second. There is a set of period predicates (succeeds, immediately precedes etc). A base table can have two periods defined: one for system time, and one for application time. However, the system-time columns are required to be defined, using new syntax, as "generated always as row start" and "generated always as row end". A base table with the system_time period defined can be declared as "with system versioning".

In ordinary use a system-versioned table appears to behave just like an ordinary table. All visible rows have the deletion time set to the maximum date (e.g. 31 December 9999). But if the table is referenced in a query, a system period can be specified, using new syntax: FOR SYSTEM_TIME AS OF .. , FOR SYSTEM_TIME BETWEEN ..., and FOR SYSTEM TIME FROM ... This allows table data to be viewed as it was at a previous time (the only difference being that the deletion time will no longer be the maximum date). The system also works even if the table has no primary key. However, im all cases the SQL2011 syntax limits the visibility of historical rows to a single time period.

Pyrrho has combined these new features with its own temporal tables concept to support a full implementation of bitempotal tables, temporal joins etc. As a start, in Pyrrho if T is a system versioned table that does have a primary key, T FOR SYSTEM_TIME (without the AS OF/etc specifications) is a temporal table whose row start column is the temporal key component, and the other components of T FOR SYSTEM_TIME's key are the primary key of T.

Secondly, if T has a primary key, and an application time period P (S,E) and the E column is generated always as row next, then Pyrrho allows T to be declared as WITH APPLICATION VERSIONING, and application time periods can be specified similarly to system time periods, by syntax such as FOR P AS OF/BETWEEN/FROM. This allows historical rows to be retrieved according to the application-defined time concept. In addition, T FOR P (without the AS OF/BETWEEN/FROM specification) is a temporal table whse temporal key is the first column of P.

A table can have both system and application versioning. In this way, although a bitemporal table has four timestamps associated with a row, only one of these needs to be set by the application, and the others can be managed automatically. (If you want to retian full control of the values of timestamps, then don’t specify automatic versioning: you can still use the SQL2011’s period predicates and Pyrrho’s temporal machinery.)

For example, consider the following bitemporal table:

Create table employee(
    name char primary key, 
    post char, 
    salary int, 
    effective date, 
    till date generated always as row next, 
    entered timestamp generated always as row start, 
    superseded timestamp generated always as row end, 
    period for system_time(entered, superseded), 
    period for details(effective, till)) 
    with application versioning 
    with system versioning

Insert into employee(name,post,salary,effective) 
values ('Joe Bloggs','Cleaner',9600,date'2006-01-07'), 
       ('Mary Smith','HR Director',58005,date'2006-02-01')

Update employee set post='Janitor',salary=10416,effective=date'2007-07-11' where name='Joe Bloggs'

Update employee set salary=61050,effective=date'2009-11-27' where name='Mary Smith'

Insert into employee(name,post,salary,effective) values ('Chris Black','Secretary',12345,date'2009-12-03')

Delete from employee where name='Joe Bloggs'
Then "table employee" contains just the current employees and their current posts and salaries:
We can get similar data as the table stood at a past point in time (this is using transaction times, for this sample data looks a bit strange):
It is perhaps more useful to filter for a past transaction time:
Finally we can work with the complete historical record if we wish:
A subtle point is that UDATE statements that change nothing are silently ignored. This is not specifid in SQL2011 but seems harmless enough. These extensions in Pyrrho only work on base tables. In principle they should work also on subqueries and this will be tackled soon in Pyrrho.