Thursday, 31 January 2013

Optimising enumerations

Like most database engines, at the heart of Pyrrho is an implementation of B-Trees (not binary trees). In Pyrrho this is implemented by a generic abstract class ATree <K,V> for a tree of key-value pairs. There are lots of subclasses of ATree, some of which implement weakly-ordered or multilevel indexes.

The ATree method GetRowEnumerator returns a SlotEnumerator <K,V> that traverses the pairs of the tree in key order. There are two versions of this method, one of which supplies a Key for matching. This will enumerate all pairs where the key matches the given one. Now for a strongly-ordered tree (no key duplicates) the resulting enumeration will have 1 or zero entries (a TrivalEnumerator or an EmptyEnumerator) provided the key supplied will be a constant. By constant is meant "will not change during result-enumeration of any current query".

This is a very subtle and important point: Pyrrho uses partial evaluation so that a Column for values such as integers, shows just the current value, but this can change when an enumerator moves to the next row. Such values are obviously not constant, and so if the Key value supplied to GetRowEnumerator was such a value, while it would still be true that in each case there is either one or zero matching pairs in the tree, we need to check to find out which.

On the other hand, it is such an important optimisation to be able to replace an enumerator with a trivial or empty enumerator that it seems worth adding some machinery to the database engine to keep track of which expressions are constant. The illustration shows a code fragment from the database engine.

As a result of these considerations many structures (e.g. Column, TypedValue and all their subclasses) have an extra field or property with a name such as isConstant to speed up this determination.

Since the key K might be something very simple such as long or string, the IsConstant() method used in the illustration needs to be defined as an extension method. To my relief I find that Debian Squeeze supports the use of C# extension methods so henceforth Open-source Pyrrho OSP has moved back up to .NET 3.5. For Windows of course we currently use .NET 4.

Needless to say the above changes resulted in about 600 changes to the Pyrrho sources, and it is possible that some mistakes will need fixing. I have been doing quite a lot of testing and will continue to do so. For the next while there will be updates of Pyrrho roughly weekly.

Monday, 28 January 2013

Multisets and Arrays

This week's update makes a number of small alterations to Pyrrho's syntax, mostly to bring it better in line with the SQL standard. Specifically, array and multiset initialisation (if not using a subquery) uses [] instead of (). Subqueries are always introduced with a leading parenthesis as in (SELECT .., and are used in predicates such as IN, EXISTS, UNIQUE. The Open Source distribution has been moved back to .NET v2.0 so that it works with a larger set of Linux distributions.

Monday, 21 January 2013

Charting enhancements

This week's version of Pyrrho 4.8 has modified slightly the output flags that can be added to tables and columns for automatic charting. Table metadata can include Pie, Line, Points, Histogram. Column metadata can include X, Y and Caption. Legend can be usefully added to Pie and Histogram: if legend is not specified Pyrrho writes the captions directly into the chart. The colours are selected automatically. An example is shown here.
In this example the values in the table are added explicitly: in a real application this would be a presentation database and the table would be filled by some business analytics process. Feedback is always welcome on features of this sort.

Monday, 7 January 2013

Version 4.8 released

This brings standard SQL diagnostics management. Standard SQL exception handling includes three kinds of handlers for errors: CONTINUE, EXIT and UNDO. Pyrrho honours not_found if it is handled but otherwise does not report an error condition if DML operations do not affect any rows. Pyrrho's strict interpretation of correct transaction behaviour is supported by better prompts in the command line interface and a prohibition on using COMMIT inside a SQL routine (ROLLBACK is allowed but aborts the routine). The meaning of UNDO is very interesting: if this handler is activated, all of the database changes performed so far in the scope of the handler declaration are undone. This is a great improvement on ROLLBACK or EXIT as it allows for subtle and sophisticated error handling. Exceptions arising from most of Pyrrho's SQLSTATEs can be handled: you cannot handle such errors as damaged files, bad syntax, or attempts to defeat the transaction mechanism. In addition the keywords not_found and sqlexception etc are supported, and it is permissible to use your own signals (the usual rules for identifiers apply). The diagnostics area in the SQL standard is interesting. There is a set of keywords, including message_text, some of which have prescribed default values for each exception. You can override these using SET assignments in a SIGNAL statement, and collect their values by using GET DIAGNOSTICS, which allows assignment of such data identified by keywords to variables that you specify. In addition, a roundup of improvements results in better behaviour of such things as structured types. The philosophy nowadays is that client software should not need to know about how data types are implemented in the server, but that (as in XML or JSON) should simply give a full and readable description of structured data. The Pyrrho engine uses strong data types, so the server describes each part of the data by giving the nominal data type and the actual data subtype if it is different. For example, the actual type might include semantic information or metadata. Such actual types are visible to SQL and the user when they are explicitly assigned to the value using the RDF/OWL ^^ suffix or the SQL TREAT function. Unfortunately some previous cross-platform support for other APIs is now lost, although many of the interfaces in ADO.NET and JPA are still supported. For example, there seems to be no way for .NET's DataTable to support (nested) structured types, and so DataAdapters and some of the more complex JPA interfaces have been dropped in this edition. The venerable PyrrhoMgr tool is a casualty of this change. I still rather like the PyrrhoSQL tool but it needs someone other than me to improve its usability.