Thursday, 23 September 2021

Implementation: instancing and framing

(Updated: 14th January 2022: Implementation continues)

Tables and table references

During traversal of a rowset, the value associated with a column name changes from one row to the next. Database implementations manage this using the notion of a cursor. A cursor is a row value with the additional capability of advancing to hold the values for the next row, or stepping back to the previous row.
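A minimal sketch of this idea in C# (the names are illustrative, not Pyrrho's actual classes): in a shareable design, Next() and Previous() return a new cursor rather than mutating the current one.

```csharp
// Sketch: a cursor is a row value that can move to an adjacent row.
// Next() and Previous() return a new cursor (or null at either end)
// instead of modifying this one.
public abstract class Cursor
{
    public readonly object[] values;        // column values for the current row
    protected Cursor(object[] vs) { values = vs; }
    public abstract Cursor? Next();         // cursor for the following row
    public abstract Cursor? Previous();     // cursor for the preceding row
}
```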

Column names are local to a table, or more precisely to a table reference, since a table and its columns may be referenced more than once in an SQL statement, and the different references are then distinguished using aliases. Different references to the same table thus have different properties, and may be subject to different where-conditions etc.

To manage this, it is useful to have a process of instancing, so that the shared table remains immutable while its instances are placed in transient local storage (a heap). The instance and its columns are then referenced using heap uids instead of names or defining positions.

Each table reference thus has its own instance on the heap, and instances for its columns. At the end of execution of the current SQL statement, the heap can be forgotten.
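A sketch of instancing, with a hypothetical Heap class and an assumed uid range (not the actual source):

```csharp
using System.Collections.Generic;

// Sketch: the shared object stays immutable; each reference to it gets
// its own entry on a transient heap, keyed by a freshly allocated uid
// instead of a name or defining position.
class Heap
{
    long nextHeapUid = 0x4000_0000_0000_0000;        // assumed heap uid range
    public readonly Dictionary<long, object> items = new();

    // Place an instance of a shared (immutable) object on the heap
    // and return the uid by which it will now be referenced.
    public long Instance(object shared)
    {
        var uid = nextHeapUid++;
        items[uid] = shared;   // safe: the shared object cannot change
        return uid;
    }
}
```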

Compiled database objects

In standard SQL, several database object types (e.g. Procedure, Check, Trigger, View) define executable code. It is an ambition of Pyrrho V7 to compile such code once only: on database load, or on definition of a new compiled object. The compilation process creates many subsidiary objects (variable declarations, expressions, executable statements) in memory. These objects are then immutable, and are only replaced if the database object is altered (ALTER VIEW etc). In Pyrrho V7, these subsidiary objects have temporary uids and are stored in a field of the compiled object called framing.
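The shape of this might be sketched as follows, using the BTree and DBObject classes described elsewhere in this blog (the constructor and field layout here are assumptions):

```csharp
// Sketch: a compiled object keeps its subsidiary objects (declarations,
// expressions, executable statements) in an immutable map from
// temporary uid to object, built once when the object is compiled and
// replaced only if the object is altered (ALTER VIEW etc).
public class CompiledObject
{
    public readonly long defpos;                   // the object's own uid
    public readonly BTree<long, DBObject> framing; // the compiled subsidiaries
    public CompiledObject(long dp, BTree<long, DBObject> f)
    { defpos = dp; framing = f; }
}
```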

When the compiled object is referenced during execution of an SQL statement, the objects in the framing need to be copied to the top of the heap. However, for efficiency, even such a simple step should be carried out as few times as possible. Procedure code can be brought into the context just once, when required. Trigger code, and constraints for columns and domains, should be brought into the context before rowset traversal begins.

When views are used in a query, view references need to be instanced in the same way as table references: an instance is created for each reference, so that it can be equipped with where-conditions, grouping specifications etc.

Prepared Statements

Prepared statements are local to a connection, have query parameters, and can be handled in much the same way as compiled statements. An instance of a prepared statement has the formal parameters replaced with actual values before execution begins.
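A hedged sketch of that instancing step (Replace is a hypothetical method; Executable and TypedValue are the processing classes mentioned elsewhere in this blog):

```csharp
// Sketch: instancing a prepared statement substitutes the actual
// values for the formal parameter uids. Because the prepared statement
// is immutable, each Replace returns a new Executable, and the
// original remains available for the next execution.
Executable Instance(Executable prepared, long[] formals, TypedValue[] actuals)
{
    var e = prepared;
    for (var i = 0; i < formals.Length; i++)
        e = e.Replace(formals[i], actuals[i]);  // hypothetical method
    return e;
}
```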

The storage used for prepared statements is shared with successive transactions on the connection, and does not need to be saved in the database.



Friday, 3 September 2021

Implementing Shareable Data Structures

The concept of shareable data structures first appeared in this blog in 2018, and there has been a software repository on GitHub since that time. Several demonstration programs and the PyrrhoV7alpha source code can also be found there. In the nature of things, the concept has evolved.

The use of shareable data structures is fundamental to the implementation of Pyrrho V7. The following documentation is based on part of a tutorial given by me and Fritz Laux at DBKDA 2021.

These have a recursive definition: a data structure is shareable iff it is immutable, and all fields are in turn shareable. In all programming languages, values of primitive data types are shareable. In Java, Python and C#, strings are also shareable. Unfortunately, in any programming language that allows data structures to have subclasses, there is no way (other than declaring a class sealed or final) of preventing a subclass from adding a mutable field. If we want to ensure our data structures are shareable, we need to declare them as internal to our project: then at least we control what subclasses exist.

In a programming language based on references, such as Java, or C#, we can make all fields in our structure final, or readonly. Then any reference can be safely shared, and all fields might as well be public (unless there are confidentiality issues).

If all of the fields are, in turn, also known to be immutable, then there is no need to clone or copy fields: copying a pointer to the structure itself gives read-only access to the whole thing.

For example, if the Database class is shareable, and b is a database, then a:=b is a shareable copy of the whole database (we have just copied a single 64-bit pointer). Such an assignment (or snapshot) is also guaranteed to be thread-safe.

Pointers to shareable structures are never updated, but can be replaced when we have a new version to reference. If this new version has some changed fields, it is perfectly fine to continue to use the same pointers for all the unchanged fields.

Shareable tree structures: adding a node

When we add a field located deep in a shareable structure (e.g. a node to a shareable tree structure), we will need to construct a single new node at each level back to the top of the structure. But the magic is that all the other nodes remain shared between the old and new versions.


The above picture (from Krijnen and Mertens, Mathematics Centre, Amsterdam, 1987) shows a tree with 7 leaves (that is, a tree of size 7), and updating (that is, replacing) one leaf node has resulted in just 2 new inner nodes being added to the tree. This makes shareable B-Trees into extremely efficient storage structures.
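A self-contained C# sketch of the mechanism, using a plain immutable binary search tree rather than Pyrrho's B-tree: adding a key rebuilds only the nodes on the path from the root, and every other node is shared between the old and new versions.

```csharp
// Immutable binary search tree: Add returns a new root, rebuilding
// only the path to the inserted key; all other nodes are shared.
sealed class Node
{
    public readonly int key;
    public readonly Node? left, right;
    public Node(int k, Node? l, Node? r) { key = k; left = l; right = r; }

    public static Node Add(Node? n, int k) =>
        n == null ? new Node(k, null, null)
        : k < n.key ? new Node(n.key, Add(n.left, k), n.right) // right subtree shared
        : k > n.key ? new Node(n.key, n.left, Add(n.right, k)) // left subtree shared
        : n;                                                   // key already present
}
// var v2 = Node.Add(v1, 42);   // v1 is unchanged and still valid
```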

In tests, (such as DictDemo, in the github repository) we see that for a suitable minimum number of child nodes in the B-Tree, the number of new nodes required for a single update to a B-Tree of size N is O(logN), and experimentally, this means that for each tenfold increase in N, the number of new nodes per operation roughly doubles.

Note that we also get a new root node every time (this avoids wearing out flash memory).

The choice of programming language

It is helpful if the programming language supports:

  •        readonly directives (Java has final)
  •        Generics (Java has these)
  •        Customizing operators such as += (not Java)
     o     because a += b is safer than Add(a, b): it is easy to forget to use the return value in a = Add(a, b)
  •        Strong static typing (so not Java)
     o     many languages have “type erasure”
  •        It is also useful to have all references nullable

So I prefer C#, which has now been around for 19 years (Java and Python have been with us even longer).

However, as mentioned above, C# provides no syntax for requiring a class to be shareable: specifically, there is no way of requiring a subclass of a shareable class to be shareable. It will cease to be shareable if it has even one mutable field.

Shareable database objects

What data structures in the DBMS can be made shareable?

  •       the Database itself, and its subclass Transaction;
  •       database objects such as Table, Index, TableColumn, Procedure, Domain, Trigger, Check, View, Role;
  •       processing objects such as Query, Executable, RowSet, and their many subclasses;
  •       Cursor and most of its subclasses;
  •       TypedValue and all its subclasses.

All of these can be made shareable. But many things cannot be made shareable:

Context and Activation cannot be made shareable because in processing expressions we so often have intermediate values.

Also, something needs to access inherently non-shareable system structures such as FileStream and HttpRequest.

And Physical and its subclasses are used for preparing objects for the database file, so cursors that examine logs are not shareable.

An implementation library: first steps

A fundamental building block in many DBMS implementations is the B-tree. In Pyrrho, BTree<K,V> is a sort of unbalanced B-tree. It has a += operator to add a (key,value) pair, and a -= operator to remove a key.

BList<V> is a subscriptable subclass where K is int. It is much slower than BTree, because it partially reindexes the list starting from 0 on insertion and deletion.

Both of these structures can be traversed up and down using the shareable helper class ABookmark<K,V> and the methods First() and Last(). The ABookmark class implements key(), value(), Next() and Previous().
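A usage sketch, assuming an Empty starting value (the operators and methods are those just described, though details may differ from the actual source):

```csharp
var t = BTree<long, string>.Empty;        // assumed empty starting tree
t += (42L, "forty-two");                  // += yields a new tree
t += (7L, "seven");
t -= 42L;                                 // -= yields another new tree
for (var b = t.First(); b != null; b = b.Next())
    Console.WriteLine($"{b.key()} -> {b.value()}");  // prints: 7 -> seven
```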

These classes and their subclasses are used throughout the DBMS implementation. They are shareable provided K and V are shareable. If the classes K and V are not themselves shareable, for example if one or both is the object class, a tree will be shareable provided all of its contents (the nodes actually added) are shareable. At least, it is easy to ensure that all public constructors only have shareable parameters.

For convenience, Pyrrho uses a potentially non-shareable base class Basis, whose only field is a BTree<long,object> called mem. It has an abstract method New which can be used to implement the += and -= as effectively covariant operators on subclasses, and these can be used to change the properties on a database (of course, by creating a new one).
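A sketch of the pattern, simplified from the description above (the + on BTree is assumed to add one entry):

```csharp
// Basis: the only field is the property tree mem; New() lets each
// subclass rebuild itself, so + behaves covariantly on subclasses.
public abstract class Basis
{
    public readonly BTree<long, object> mem;
    protected Basis(BTree<long, object> m) { mem = m; }
    protected abstract Basis New(BTree<long, object> m);

    // "Change" a property by building a new object of the same subclass.
    public static Basis operator +(Basis b, (long, object) x) =>
        b.New(b.mem + x);
}
```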

DBObject and Database

DBObject is a subclass of Basis, with a public readonly field called defpos, of type long. The defpos (defining position) acts as a uid or unique identifier for every object in the database. As described in section 2.3 below, the range of values of long is divided into ranges for database objects depending on their lifetime: committed objects have the same lifetime as the database (e.g., the SQL delete statement can unlink one or more objects but they remain in the log). There are many subclasses of DBObject, described in this booklet.

Database is also a subclass of Basis, and it uses the mem field with positive uids to store all of the database objects it contains. (Entries in mem with negative uids are used for properties of the database, as described in section 3.5.2.) The Database class also has two static lists: databases, and dbfiles. The first is the set of databases indexed by database name, and the second is a corresponding list of File objects.

Then there is a method called Load() to build the database from the transaction log file, and Commit(), which writes Physical records durably to the log.

Transaction and B-Tree

We can use the B-Tree concept to model how the transaction Commit process works. The Commit process was described in terms of transformations in the previous posting in this blog.

  1.       Suppose we have a database in any starting state D0.
  2.       If we start a transaction T0 from this state, initially T0 is a copy of D0 (i.e. equal pointers).
  3.       As T0 is modified it becomes T1, which shares many nodes with D0, but not all.
  4.       D0 also evolves as some other transaction commits, so D1 has a new root and some new nodes.
  5.       When T1 commits, we get a new database D2 incorporating the latest versions of everything.
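These five steps can be tried with any shareable tree. A self-contained toy, using .NET's ImmutableDictionary in place of Pyrrho's Database class, shows the pointer discipline:

```csharp
using System.Collections.Immutable;

var D0 = ImmutableDictionary<long, string>.Empty;  // 1. any starting state
var T0 = D0;                                       // 2. start a transaction: equal pointers
var T1 = T0.Add(10, "T1's new row");               // 3. T1 shares most nodes with D0
var D1 = D0.Add(20, "another commit");             // 4. D0 evolves to D1 meanwhile
var D2 = D1.Add(10, "T1's new row");               // 5. T1 commits: its change re-applied to D1
```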


Thursday, 2 September 2021

Some Pyrrho V7 alpha documentation

At DBKDA 2021 in May, I presented a tutorial on the inner workings of Pyrrho V7. The PDF version, with the demonstrations, totalled 10MB, which was probably much too much. Material based on it will nevertheless be included in the introductory sections and appendices of the documentation "Introduction to the Source code of the PyrrhoDBMS", a version of which is available on my github site. The following extracts are from the very start of the tutorial, which went on to explain the use of shareable data structures and ended with a deep dive into remote updates using view-mediated web services. (All of this material will soon appear on the github site: email me at malcolm.crowe@uws.ac.uk if you are interested in getting it sooner.)

ACID and Serializable Transactions

Our starting point is that full isolation requires truly serializable transactions.

All database textbooks begin by saying how important serializability and isolation are, but very quickly settle for something much less.

If we agree that ACID transactions are good, then:

·         First, for atomicity and durability we should not write anything durable until (a) we are sure we wish to commit and (b) we are ready to write the whole transaction.

·         Second: before we write anything durable, we should validate our commit against the current database.

·         Third, for isolation, we should not allow any user to see transactions that have not yet been committed.

·         Fourth, for durability, we should use durable media – preferably write-once append storage.

From these observations, it seems clear that a database should record its durable transactions in a non-overlapping manner.

·         If transactions in progress overlap in time, they cannot both commit if they conflict: and if they don’t conflict, it does not matter which one is recorded first.

·         The simplest order for writing is that of the transaction commit.

·         If we are writing some changes that we prepared earlier, the validation step must ensure that they depend on nothing that has changed in the meantime, so that our change can seem to date from the time it was committed rather than first considered.

·         Effectively we need to reorder overlapping transactions as we commit them.

These few rules guarantee actual serialization of transactions for a single transaction log (sometimes called a single transaction master). It obviously does not matter where the transactions are coming from.

But if a transaction is trying to commit changes to more than one transaction log, things are very difficult (the notorious two-army problem). If messages between autonomous actors can get lost, then inconsistencies are inevitable. The best solution is to have one DBMS act as transaction master for each piece of data. For safety, any transaction should have at most one remote participant, and more complex transactions can be implemented as a sequence of such one-way dependency transactions.

The transaction log

The first demonstration was about the transaction log. In Pyrrho the transaction log defines the content of the database (it is the only thing stored on disk).

In this demonstration, we will see how every Transaction T consists of a sequence of elementary operations e1, e2, ..., en.

Each operation corresponds to a Physical object written to the transaction log.

Committing this transaction applies the sequence to the Database D. In reverse mathematical notation:

(D)T = (...(((D)e1)e2)...)en


If we think of a transaction commit as comprising a set of elementary operations e, then the transaction log is best implemented as a serialization of these events to the append storage. We can think of these serialized packets as objects in the physical database. In an object-oriented programming language, we naturally have a class of such objects, and we call this class Physical.
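The shape of such a class might be sketched like this (the method names here are assumptions, not the actual source):

```csharp
using System.IO;

// Each elementary operation e is a Physical: it can serialise itself
// to the append-only log, and install its effect, acting as the
// transformation (D)e on an immutable Database.
public abstract class Physical
{
    public abstract void Serialise(Stream log);    // append to durable storage
    public abstract Database Install(Database d);  // returns the transformed database
}
```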

So, at the commit point of a transaction, we have a list of these objects, and the commit has two effects: (a) appending them to the storage, and (b) modifying the database so that other users can see the changes. We can think of each of these elementary operations as a transformation on the database, to be applied in the order they are written to the database (so the ordering within each transaction is preserved).

And the transaction itself is the resulting transformation of the database.


Every Database D is the result of applying a sequence of transactions to the empty _system Database D0.

Any state of the database is thus the result of the entire sequence of committed transactions, starting from a known initial database state corresponding to an empty database.

The sequence of transactions and their operations is recorded in the log.

Shareable data

Many programming languages (including Java, Python and C#) currently have shareable implementations of strings.

Specifically, strings in Java, Python and C# are immutable: if you make a change, you get a new string, and anyone who had a copy of the previous version sees no change.

In these systems, it is illegal to modify a string by assigning to a character position: instead, you need to use a library function.

The addition operator can be used in these languages to create a sum of strings. This is basically the model for shareable data structures.

For a class to be shareable, all fields must be read-only and shareable. Constructors therefore need to perform deep initialisation, and any change to an existing structure needs another constructor. Inherited fields need to be initialised in the base (or super) constructor, maybe with the help of a static method.
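For example, a minimal sketch (not from the Pyrrho source):

```csharp
// Sketch of the rules above: every field readonly and shareable, deep
// initialisation in the constructor, and any "change" is a new object
// that reuses all the unchanged fields.
public sealed class Person
{
    public readonly string name;   // strings are shareable
    public readonly int born;      // primitives are shareable
    public Person(string n, int b) { name = n; born = b; }

    // a change needs another constructor (or a factory like this):
    public Person WithName(string n) => new Person(n, born);
}
```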

This is useful for databases because databases share so many things: predefined types, properties, system tables. For example, all databases can share the same starting state, by simply copying it from the _system database.

Even more importantly, all transactions can start with the current state of the database, without cloning or separately copying any internal structures. When a transaction starts, it starts with the shared database state: as it adds physicals, that state is transformed. Different transactions will in general start from different states of the shared database.


In the above picture, we know what the database DB’s state is. Each of the concurrent transaction steps T1, T2, and T3 will, if committed, create a new version of DB (for example (DB)T2). Because of isolation, none of these transactions can know whether DB has already been updated by another transaction (in which case they may no longer fit on the resulting database). In particular, after T2 commits, T1 and/or T3 may no longer be able to commit.

However, if T1 was able to commit before, then it will still be able to commit provided it has no conflict with T2’s changes.

Transaction conflict

The details of what constitutes a conflicting transaction are debatable. Most experts would agree with some version of the following rules:

·         T1 and T2 will not conflict if they affect different tables or other database objects

o    And only read from different tables

·         But we can allow them to change different rows in the same table

o    Provided they read different specified rows

·         Or even different columns in the same row

o    Provided they read different columns

The first rule is sound enough, although the condition on reading is very important: we would need to include things like aggregation in our definition of reading. The first rule is also very easy to implement, especially if tables are shareable structures, as a simple 64-bit comparison is sufficient!

For the other rules, we would need to be very clear on what is meant by a “specified row”, and the non-existence of a row might only be determined by reading the whole table. 

Transaction validation

At the start of Transaction Commit, there is a validation check, to ensure that the transaction still fits on the current shared state of the database, that is, that we have no conflict with any transaction that has committed since our transaction started.

We will see that during commit of a Transaction T, we do a validation check: it ensures that the elementary operations of T can be validly relocated to follow those of any transaction T´ that has committed since the start of T.

Relocation amounts to swapping the order of the elementary operations ei that comprise transactions T and T´. Two such operations cannot be swapped if they conflict, e.g. if they change the same object (a write/write conflict). The tests for write/write conflicts involve comparing our list of physicals with those of the other transactions.

There are also tests for read/write conflicts between T and T´. For checking read-write conflicts, we collect “read constraints” when we are making Cursors.
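A sketch of the write/write part of validation (Conflicts is a hypothetical test, e.g. comparing the defpos that each Physical affects):

```csharp
using System.Collections.Generic;

// T validates against T´ if no elementary operation of T conflicts
// with any operation that T´ has already committed.
static bool Validates(IList<Physical> ours, IList<Physical> committed)
{
    foreach (var e in ours)
        foreach (var f in committed)
            if (e.Conflicts(f))   // hypothetical: e.g. same defpos changed
                return false;     // write/write conflict: T must abort
    return true;                  // safe to relocate T after T´
}
```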



Thursday, 1 July 2021

RowSet Review and PyrrhoV7alpha

(Revised 14th January 2022)

The query processing stage of most relational DBMS deals with Query Optimisation, and computes a strategy for query evaluation taking account of available indexes and so on. This was an important aspect in previous versions of PyrrhoDBMS too.

In the current redevelopment of PyrrhoDBMS the hope is to do something more ambitious, taking account of the previous entries in this blog on compile-once views, triggers etc (3 December 2019), modifiable rowsets (20 February 2021) and lateral joins (31 July 2020). In this version the goal of parsing the SQL for CRUD operations is to construct RowSets. When such compiled objects are used there is often additional information such as filters, limiting and skipping, and/or additional processing such as aggregation or ordering. Each RowSet produces a derived table, starting with one or more source rowsets, base tables or explicit value sets, which may in general have been computed using different (definer's) privileges.

Many such stages of traversal require intermediate rowsets to be built, for ordering, subqueries, distinct, grouping and window operations. Each such building operation provides an index whose properties can be used to optimise the results of aggregations. Each join of a compiled rowset offers the opportunity to check for functional dependency between left and right operands. Each additional inequality where-condition raises the opportunity for truncating rowset traversal by means of an ad-hoc index.

Applying aggregation, ordering and filtering to the top-level RowSet all offer the opportunity to pass filters down to earlier stages of the evaluation, and/or use different indexes from those that seemed most suitable at compile time. In addition, RowSet Review can be applied during merges and joins, using the additional intermediate results information. The review process can benefit from cardinality information obtained during build. 

And of course the RowSet Review process uses the fact that all of the RowSets are immutable, and that the process of review is non-destructive and behaves like an evaluation stack.

All of this is a very different approach to rowset evaluation from that used in previous versions of Pyrrho, and will mean that the current alpha stage of Pyrrho redevelopment will extend further into the future. But I offer no apologies for this extension of the project, believing that the pointers it offers for future DBMS design might one day be of wider interest.

Some implementations (e.g. JPA) use the word Query for what we call a RowSet: it makes sense now to merge the two sets of classes in PyrrhoV7 and the syntax definitions are being updated to use syntax identifiers of form RowSetXX instead of QueryXX. I note that (a) the SQL standard calls the result of a SELECT a derived table, not a query, although it uses <query expression> etc in addition to "row set", like previous versions of Pyrrho, and (b) "Query optimisation" sounds as if it relates to a source-level rewriting of the SQL rather than an improved evaluation strategy, so that the term "RowSet review" captures the improvement mechanism rather better.

In the PyrrhoSvr code, parsing a select statement resulted in a Query structure for which a RowSet structure was later built. From now on, there is no separate Query class. The overall structure of a select statement is (very roughly) SELECT SelectList FROM TableExpression. The RowSet resulting from the parse cannot properly be constructed until the TableExpression has been parsed; until that time, the SelectList defines the Domain of this RowSet, replacing subexpressions as their references are resolved. This simplification brings several advantages to the implementation, as (a) a lot of DBObject reconstruction is avoided during parsing, and (b) it gets rid of ad hoc Domains. From now on, Domains are defined by syntax and start out with a lexical uid. Naturally, the SourceIntro document will be updated to explain this process.

The above new ideas will eventually see the light of day in the alpha code in the PyrrhoV7alpha folder at https://github.com/MalcolmCrowe/ShareableDataStructures




Sunday, 30 May 2021

Benchmarking PyrrhoV7alpha

It is natural to wonder whether Pyrrho V7's unusual data structures and approach to transactions are practical. Since 2018 PyrrhoDB has been tested using a modified version of the TPC-C benchmark.

The TPC-C benchmark simulates an OLTP system for ordering items by telephone from an e-business. The business operates a number of warehouses, with 30,000 customers at each, divided into 10 districts, and 100,000 items available. There are normal speed tests, e.g. what is the top speed for a sequence of transactions? Pyrrho's performance here on a PC is modest, at 10 new-order transactions per second.

If the specification is followed more closely, there is one clerk per warehouse to deal with a mix of tasks including new order, payment, stock enquiry, and delivery. The clerk's behaviour is simulated so that tasks take a realistic amount of real time (e.g. 23 seconds for a new order with between 5 and 15 different items). The specification sets these out in great detail. For one warehouse no DBMS following the rules can complete more than 17 new orders in 10 minutes.

To provide a greater challenge for the DBMS's transaction mechanism, Pyrrho sets the number of warehouses to 1, but increases the number of clerks for that warehouse to between 50 and 100. Pyrrho always requires serialisable transactions, so after 10 clerks the chances of transaction conflict rise sharply. All commercial DBMS have trouble with this test, and performance generally collapses well before 30 clerks.

With StrongDBMS the throughput continued to increase to 100 clerks, when CPU resources became a problem.  This link gives access not only to software for StrongDBMS's version of the test, but also versions for other DBMS we tried. The Pyrrhov7alpha folder in this repository contains the current alpha source code for PyrrhoV7, with the version of software used for the test described below.

We have analysed the reasons for Strong's unexpected performance, and it relates to the use of optimistic concurrency mechanisms. Pyrrho uses similar mechanisms but up to version 6.3 its results on this test were poor, leading to the development of PyrrhoV7, which seeks to learn lessons from StrongDBMS's implementation. A full account of this process was given in a tutorial at DBKDA 2021.

With PyrrhoV7 alpha, results with 50 clerks have always been impressive. At the current stage in development, however, Pyrrho's rowset review mechanism, still under development, does not yet carry out a good implementation of the TPCC StockLevel task, which is a relatively rare CPU-intensive task that conflicts with a great many ongoing transactions. Without this task, CPU and memory usage for server and clients remain low even at 100 clerks (1GB server memory, peak 6% CPU). Experimentally, peak throughput occurs with around 60 clerks.

The results below were obtained using the server version dated 30 May 2021.

Clerks   NewOrders Committed   Conflicts reported
     1                    17                    0
     2                    29                   15
     5                    62                   74
    10                    91                  254
    20                   108                  678
    50                   135                 2009
    75                   133                 3088
   100                   105                 4089

This page will be updated when the test includes the StockLevel task.

Saturday, 20 February 2021

Modifiable rowsets

In V7 of Pyrrho I plan to have a more systematic approach to modifiable rowsets, which are useful in view-mediated data integration. Comments on the following proposals are welcome. Update: (12 April 2021) Working implementations of these ideas (alpha code) have now been posted to GitHub.

V7 uses immutable, shareable rowsets (all typed values are also immutable and shareable). Compiled objects including table-valued functions and views have precompiled rowsets that are instantiated when accessed. Such instantiations are merged into the surrounding context, so that Pyrrho uses rowset review where other DBMS (and previous versions of Pyrrho) use query optimisation.

Some rowsets can be used to make changes to their base tables, and this feature is useful for views. As a rule of thumb, this requires rowsets whose results expose simple rows and columns, possibly via a monotonic invertible adapter function; all Yes entries in the table below depend on this kind of additional requirement, and on satisfying constraints and authorisation requirements. Such an operation adds to the transaction results a set of modifications for each of the individual tables involved. There are no entries below for rowsets with 0 or 1 base tables:

SubClass             Insert              Update      Delete
DistinctRowSet       No                  No          No
DocArrayRowSet       Yes                 Yes         Yes
EmptyRowSet
EvalRowSet           No                  No          No
ExplicitRowSet
GroupingRowSet       No                  No          Yes
IndexRowSet          (Yes)               (Yes)       (Yes)
JoinRowSet           Yes                 Yes         Yes
MergeRowSet          Intersection only   Yes         Yes
OldTableRowSet
OrderedRowSet        Yes                 Yes         Yes
RestRowSet           Yes                 Yes         Yes
RoutineCallRowSet
RowSetSection        No                  Yes         Yes
SelectRowSet         See below           See below   Yes
SelectedRowSet       Yes                 Yes         Yes
SqlRowSet
SystemRowSet
TableRowSet          (Yes)               (Yes)       (Yes)
TransitionRowSet
TrivialRowSet
ValueRowSet
WindowRowSet         No                  No          No

It will be obvious in most cases what these RowSet classes are for. TransitionRowSet and OldTableRowSet are used in trigger implementation and are not directly accessible. A SelectRowSet has columns that are expressions, and only certain expressions are invertible to retrieve the underlying column values, while a SelectedRowSet picks and rearranges columns from its source. RestRowSets are used in the implementation of RESTViews.