Thursday 1 July 2021

RowSet Review and PyrrhoV7alpha

(Revised 14th Jamuary 2022)

The query processing stage of most relational DBMS deals with Query Optimisation, and computes a strategy for query evaluation taking account of available indexes and so on. This was an important aspect in previous versions of PyrrhoDBMS too.

In the current redevelopment of PyrrhoDBMS the hope is to domething more ambitious, taking account of the previous entries in this blog, on compile-once views, triggers etc (3 December 2019), modifiable rowsets (20 Feb 2021) and lateral joins (31 July 2020). In this version the goal of parsing the SQL for CRUD operations is to construct RowSets. When such compiled objects are used there is often additional information such as filters, limiting snad skipping, and/or additional processing such as aggregation or ordering. Each RowSet produces a derived table, starting with one or more source rowsets, base tables or explicit value sets, which may in general have been computed using different (definer's) privileges.

Many such stages of traversal require intermediate rowsets to be built, for ordering, subqueries, distinct, grouping and window operations. Each such building operation provides an index whose properties can be used to oprimise the results of aggregations. Each join of a compiled rowset offers the opportunity to check for functional dependency between left and right operands. Each additional inequality where-condition raises the opportnity for truncating rowset traversal by means of an ad-hoc index. 

Applying aggregation, ordering and filtering to the top-level RowSet all offer the opportunity to pass filters down to earlier stages of the evaluation, and/or use different indexes from those that seemed most suitable at compile time. In addition, RowSet Review can be applied during merges and joins, using the additional intermediate results information. The review process can benefit from cardinality information obtained during build. 

And of course the Rowset Review process uses the fact that all of the RowSets are immutable, that that the process of review is non-destructive abd behaves like an evaluation stack.

All of this is a very different approach to rowset evaluation than was used in previous versions of Pyrrho, and will mean that the current alpha stage of Pyrrho redevelopment will extend further into the future. But I offer no apologies for this extension of the project, believing the the pointers it offers for future DBMS design might one day be of wider interest.

Some implementations (e.g. JPA) use the word Query for what we call a RowSet: it makes sense now to merge the two sets of classes in PyrrhoV7 and the syntax definitions are being updated to use syntax identifiers of form RowSetXX instead of QueryXX. I note that (a) the SQL standard calls the result of a SELECT a derived table, not a query, although it uses <query expression> etc in addition to "row set", like previous versions of Pyrrho, and (b) "Query optimisation" sounds as if it relates to a source-level rewriting of the SQL rather than an improved evaluation strategy, so that the term "RowSet review" captures the improvement mechanism rather better.

In the PyrrhoSvr code parsing of a select statement resulted in a Query structure for which a RowSet structure was later built. From now on, there is no separate Query class. The overall structure of a select statement is (very roughly) SELECT SelectList FROM TableExpression. The RowSet resulting from the parse cannot properly be constructed until the TableExpression has been parsed. But until that time, the SelectList has been defining the Domain of this |RowSet, replacing subexpressions as their references are resolved. This simplification brings several advantages to the implementation, as (a) a lot of DBObject reconstruction is avoided during parsing, and (b) it gets rid of ad hoc Domains. From now on, Domains are defined by syntax and start out with a lexical uid, and . Naturally, the SourceIntro document will be updated to explain this process.

The above new ideas will eventually see the light of day in the alpha code in the PyrrhoV7alpha folder at https://github.com/MalcolmCrowe/ShareableDataStructures