Thursday, 31 January 2013

Optimising enumerations

Like most database engines, at the heart of Pyrrho is an implementation of B-Trees (not binary trees). In Pyrrho this is implemented by a generic abstract class ATree <K,V> for a tree of key-value pairs. There are lots of subclasses of ATree, some of which implement weakly-ordered or multilevel indexes.

The ATree method GetRowEnumerator returns a SlotEnumerator <K,V> that traverses the pairs of the tree in key order. There are two versions of this method, one of which supplies a Key for matching. This will enumerate all pairs where the key matches the given one. Now for a strongly-ordered tree (no key duplicates) the resulting enumeration will have 1 or zero entries (a TrivalEnumerator or an EmptyEnumerator) provided the key supplied will be a constant. By constant is meant "will not change during result-enumeration of any current query".

This is a very subtle and important point: Pyrrho uses partial evaluation so that a Column for values such as integers, shows just the current value, but this can change when an enumerator moves to the next row. Such values are obviously not constant, and so if the Key value supplied to GetRowEnumerator was such a value, while it would still be true that in each case there is either one or zero matching pairs in the tree, we need to check to find out which.

On the other hand, it is such an important optimisation to be able to replace an enumerator with a trivial or empty enumerator that it seems worth adding some machinery to the database engine to keep track of which expressions are constant. The illustration shows a code fragment from the database engine.

As a result of these considerations many structures (e.g. Column, TypedValue and all their subclasses) have an extra field or property with a name such as isConstant to speed up this determination.

Since the key K might be something very simple such as long or string, the IsConstant() method used in the illustration needs to be defined as an extension method. To my relief I find that Debian Squeeze supports the use of C# extension methods so henceforth Open-source Pyrrho OSP has moved back up to .NET 3.5. For Windows of course we currently use .NET 4.

Needless to say the above changes resulted in about 600 changes to the Pyrrho sources, and it is possible that some mistakes will need fixing. I have been doing quite a lot of testing and will continue to do so. For the next while there will be updates of Pyrrho roughly weekly.

Monday, 28 January 2013

Multisets and Arrays

This week's update makes a number of small alterations to Pyrrho's syntax, mostly to bring it better in line with the SQL standard. Specifically, array and multiset initialisation (if not using a subquery) uses [] instead of (). Subqueries are always introduced with a leading parenthesis as in (SELECT .., and are used in predicates such as IN, EXISTS, UNIQUE. The Open Source distribution has been moved back to .NET v2.0 so that it works with a larger set of Linux distributions.

Monday, 21 January 2013

Charting enhancements

This week's version of Pyrrho 4.8 has modified slightly the output flags that can be added to tables and columns for automatic charting. Table metadata can include Pie, Line, Points, Histogram. Column metadata can include X, Y and Caption. Legend can be usefully added to Pie and Histogram: if legend is not specified Pyrrho writes the captions directly into the chart. The colours are selected automatically. An example is shown here.
In this example the values in the table are added explicitly: in a real application this would be a presentation database and the table would be filled by some business analytics process. Feedback is always welcome on features of this sort.

Monday, 7 January 2013

Version 4.8 released

This brings standard SQL diagnostics management. Standard SQL exception handling includes three kinds of handlers for errors: CONTINUE, EXIT and UNDO. Pyrrho honours not_found if it is handled but otherwise does not report an error condition if DML operations do not affect any rows. Pyrrho's strict interpretation of correct transaction behaviour is supported by better prompts in the command line interface and a prohibition on using COMMIT inside a SQL routine (ROLLBACK is allowed but aborts the routine). The meaning of UNDO is very interesting: if this handler is activated, all of the database changes performed so far in the scope of the handler declaration are undone. This is a great improvement on ROLLBACK or EXIT as it allows for subtle and sophisticated error handling. Exceptions arising from most of Pyrrho's SQLSTATEs can be handled: you cannot handle such errors as damaged files, bad syntax, or attempts to defeat the transaction mechanism. In addition the keywords not_found and sqlexception etc are supported, and it is permissible to use your own signals (the usual rules for identifiers apply). The diagnostics area in the SQL standard is interesting. There is a set of keywords, including message_text, some of which have prescribed default values for each exception. You can override these using SET assignments in a SIGNAL statement, and collect their values by using GET DIAGNOSTICS, which allows assignment of such data identified by keywords to variables that you specify. In addition, a roundup of improvements results in better behaviour of such things as structured types. The philosophy nowadays is that client software should not need to know about how data types are implemented in the server, but that (as in XML or JSON) should simply give a full and readable description of structured data. The Pyrrho engine uses strong data types, so the server describes each part of the data by giving the nominal data type and the actual data subtype if it is different. For example, the actual type might include semantic information or metadata. Such actual types are visible to SQL and the user when they are explicitly assigned to the value using the RDF/OWL ^^ suffix or the SQL TREAT function. Unfortunately some previous cross-platform support for other APIs is now lost, although many of the interfaces in ADO.NET and JPA are still supported. For example, there seems to be no way for .NET's DataTable to support (nested) structured types, and so DataAdapters and some of the more complex JPA interfaces have been dropped in this edition. The venerable PyrrhoMgr tool is a casualty of this change. I still rather like the PyrrhoSQL tool but it needs someone other than me to improve its usability.

Wednesday, 5 December 2012

Version 4.7 published

This version improves transaction handling by allowing an explicit ROLLBACK to terminate an SQL routine such as a store procedure. (COMMIT is not allowed inside a routine since it would cause premature termination of the routine.) New error messages for this behaviour means that the client libraries OSPLink and PyrrhoLink have been updated. Transaction behaviour is more transparent in the command prompt tool PyrrhoCmd, for example by the use of a special prompt SQL-T> inside transactions. Other fixes help with tables constraints, parsing of error handlers and default values for strings.

Monday, 30 July 2012

CQRS: Command Query Responsibility Segregation

There is currently a lot of interest in the above concept, e.g. at CQRS info, which basically looks at separating database reads (queries) from writes (commands) when considering scaability and replication. A June 2012 Channel 9 video in MSDn by Grigori Melnik and Julian Dominguez included the first picture here:
In 2010, in a presentation on "A cloud database with transactions" at DB Tech, Helsinki, I included the second picture here on how Pyrrho's cloud version could do the same thing:
The really important aspect here is the convergence between the ideas involved in long-running transactions and eventual consistency with the optimistic transaction model. It offers some hope in the current nightmare world of no-transactions/no-SQL nonsense. For Pyrrho's approach see my 2010 blog post.

Friday, 11 May 2012

Bitemporal data and SQL2011

Pyrrho 4.6 is uploaded today (and updated 12 May), which contains an implementation of system and application versioning as now enshrined in the December 2011 version of the SQL Standard. In fact, the SQL2011 standard does not quite get as far as supporting bitemporal data, but Pyrrho 4.6 does, and it is worth spending a little time to explain what is happening.

In a bitemporal table each row has 4 timestamps associated with it: a logical starting time for the data, a logical finishing time, and the system transaction times for creation and deletion of the row. An update operation involves creation of a row with the new values and deletion of the old one. Most of the time, we are only concrened with current data (i.e. the deletion time is in the future), but there should be mechanisms for accessing the historical data.

First, SQL2011 supports the idea of Period types. A period consists of a pair of base table columns of the same date type, with the first column implicitly constrained to be less than the second. There is a set of period predicates (succeeds, immediately precedes etc). A base table can have two periods defined: one for system time, and one for application time. However, the system-time columns are required to be defined, using new syntax, as "generated always as row start" and "generated always as row end". A base table with the system_time period defined can be declared as "with system versioning".

In ordinary use a system-versioned table appears to behave just like an ordinary table. All visible rows have the deletion time set to the maximum date (e.g. 31 December 9999). But if the table is referenced in a query, a system period can be specified, using new syntax: FOR SYSTEM_TIME AS OF .. , FOR SYSTEM_TIME BETWEEN ..., and FOR SYSTEM TIME FROM ... This allows table data to be viewed as it was at a previous time (the only difference being that the deletion time will no longer be the maximum date). The system also works even if the table has no primary key. However, im all cases the SQL2011 syntax limits the visibility of historical rows to a single time period.

Pyrrho has combined these new features with its own temporal tables concept to support a full implementation of bitempotal tables, temporal joins etc. As a start, in Pyrrho if T is a system versioned table that does have a primary key, T FOR SYSTEM_TIME (without the AS OF/etc specifications) is a temporal table whose row start column is the temporal key component, and the other components of T FOR SYSTEM_TIME's key are the primary key of T.

Secondly, if T has a primary key, and an application time period P (S,E) and the E column is generated always as row next, then Pyrrho allows T to be declared as WITH APPLICATION VERSIONING, and application time periods can be specified similarly to system time periods, by syntax such as FOR P AS OF/BETWEEN/FROM. This allows historical rows to be retrieved according to the application-defined time concept. In addition, T FOR P (without the AS OF/BETWEEN/FROM specification) is a temporal table whse temporal key is the first column of P.

A table can have both system and application versioning. In this way, although a bitemporal table has four timestamps associated with a row, only one of these needs to be set by the application, and the others can be managed automatically. (If you want to retian full control of the values of timestamps, then don’t specify automatic versioning: you can still use the SQL2011’s period predicates and Pyrrho’s temporal machinery.)

For example, consider the following bitemporal table:

Create table employee(
    name char primary key, 
    post char, 
    salary int, 
    effective date, 
    till date generated always as row next, 
    entered timestamp generated always as row start, 
    superseded timestamp generated always as row end, 
    period for system_time(entered, superseded), 
    period for details(effective, till)) 
    with application versioning 
    with system versioning

Insert into employee(name,post,salary,effective) 
values ('Joe Bloggs','Cleaner',9600,date'2006-01-07'), 
       ('Mary Smith','HR Director',58005,date'2006-02-01')

Update employee set post='Janitor',salary=10416,effective=date'2007-07-11' where name='Joe Bloggs'

Update employee set salary=61050,effective=date'2009-11-27' where name='Mary Smith'

Insert into employee(name,post,salary,effective) values ('Chris Black','Secretary',12345,date'2009-12-03')

Delete from employee where name='Joe Bloggs'
Then "table employee" contains just the current employees and their current posts and salaries:
We can get similar data as the table stood at a past point in time (this is using transaction times, for this sample data looks a bit strange):
It is perhaps more useful to filter for a past transaction time:
Finally we can work with the complete historical record if we wish:
A subtle point is that UDATE statements that change nothing are silently ignored. This is not specifid in SQL2011 but seems harmless enough. These extensions in Pyrrho only work on base tables. In principle they should work also on subqueries and this will be tackled soon in Pyrrho.