Thursday, 21 April 2022

PyrrhoV7alpha is on GitHub again

 As of today 21 April 2022, the version of PyrrhoDBMS that I have been working on since 2019 is available on GitHub once again. It is still alpha code, but has a number of features of interest which this article will briefly review. Go to https://github.com/MalcolmCrowe/ShareableDataStructures/tree/master/PyrrhoV7alpha .

All textbooks on Relational Databases emphasise the imprtance of having serializable transactions, to preserve correctness in a live database. They explain the two main ways of achieving this are "pessimistic" control using two-phase locking and  "optimistic" control where transactions are subject to a validation stage during commit. Both methods are stated as being over complex, so that all of the same textbooks immediately settle for something much weaker that serialzable transactions. Furthermore, the discussion of what serialisability means for business operations is based on schedules, where likely areas of conflict are considered and avoidance strategies planned.

The international standard for the database language SQL, and all commercial database products, offer a range of "isiolation levels" which can be specified for transaction control: and generally users settle for the REPEATABLE_READ level. If SERIALIZABLE is selected, many operations are prevented with an error message along the lines that serializable transactions cannot be guaranteed.

For years now, I have argued for an alternative strategy, based on the "optimistic" approach, but incorprating a full transaction log of every committed operation on the database. In fact, in a strong sense, the transaction log is the database: it is the only thing persisted on disk, and consists of a sequence of non-overlapping transaction records. All versions of Pyrrho have used this approach, which guarantees from the outset that the committed transactions were indeed serialisable, and the serialisation used is the one in the transaction log. When the server is running, all of the live data is in memory. This architecture is suited for many business databases where data (customers, accounts etc) accumulates, and less useful for use cases such as messaging hubs where data is not kept for long.

In 2019 I demonstrated with a simple DBMS called Strong (StrongDBMS) that such an approach could outperform commercial database products in situations of high concurrency and conflicting transactions.  The code for this DBMS is available in the above github repository. 

However, there were two important points in the demonstration: first, that full safety requires a style of programming based on immutable shareable data structures, and second, that my own PyrhhoDBMS was unable to commit even as many transactions as the competition, in an admittedly challenging test. Since that date I have been working to apply the approach of shareable data structures to the PyrrhoDBMS, so that it too can outperform the traditional two-phase locking products. The alpha code is now available for this version 7 of PyrrhoDBMS, and it will be presented at DBKDA 2022.

For the changes in programming techniques that are required, see previous posts in this blog. Just as strings are immutable in Java and C#, database objects such as rowsets and cursors are immutable and shareable. Queries are parsed to construct the implementing rowsets, which when traversed will use the current versions of the (immutable) base tables. The radical nature of the change in programming style can be seen in how structures are shared but never copied or cloned; taking a snapshot is simply recording the current root node, as each version has a different root node, while sharing nearly all the rest of the nodes in the structure. The server's list of database root nodes (1 per database) is briefly locked to install a new database root. Writing of data to the disk is the result of a successful commit (append storage): nothing on disk is ever overwritten.

PyrrhoV7alpha includes a set of tests that demonstrate stored procedures, constraints and triggers, user defined types and view-mediated virtual data warehousing.