Pyrrho DBMS: 2019

Tuesday, 3 December 2019

Progress with Pyrrho v7 alpha, contd

Today's version (28 March 2020) on https://GitHub.com/MalcolmCrowe/ShareableDataStructures/tree/master/PyrrhoV7alpha
shows check constraints, cascades and triggers working.
V7 documentation is provided with the source code.

There is a new section in the SourceIntro.pdf document about multi-threading, uids, and dynamic memory layout. This has been substantially improved since the December 2019 version to prepare
for the next stage of implementation, which includes Persistent Stored Modules.

The main technical changes are to make more of Pyrrho's data structures immutable and shareable: now including RowSets and Cursors which are now TypedValues.

I hope to have a new full v7 version for the pyrrhodb.com website in the near future. The following is taken from the SourceIntro document in the alpha distribution, under the heading Stored Persistent Modules.

This heading includes Trigger and sored Procedure definitions, which are parsed once only by the server[1]. Both use the SQL stored persistent modules language as described in the SQL standard, including the handling of conditions (exceptions). When such modules are invoked, they run in the definer’s role as specified by the SQL standard.

Following the design outlined in this document, although the transaction log contains only the source form of trigger and stored procedure code, while the in-memory database contains a compiled version. From version 7 parsing is done once only, and following parsing everything is referred to by uid, not by using string identifiers. As their name implies, uids are unique in the database, but they are private to the implementation, and are subject to change is later versions of the DBMS. The transaction log contains only the source code of these modules, and the format is forward and backward compatible.

There are differences in operation of the different versions, however. Up to version 6.3 of the DBMS (file format 5.1) the source code contained database object positions instead of the definer’s name for database objects. This approach is supported in version 7 of the DBMS for database files created with previous versions. Databases created with version 7 or later (file format >5.1) will contain the source code exactly as given by the definer. This is generally supported by previous versions of the DBMS, but objects will display differently in the Log$ system tables.

The in-memory data structures resulting from parsing include SqlValues, Queries, and Executables. Persistent modules exist in one of three forms, as follows:

Source code that has been deserialised from the database file on Load() is compiled into data structures that use uids based on lexical positions of source identifiers in the database file. If the server is restarted after committing the definition, the corresponding compiled code will be of this form.
For source code that has been defined by client interaction, and committed to the database, the compiled version will use uids allocated sequentially from within the range of file positions where the commit has occurred, until the server is restarted.
For source code that has not yet been committed, the compiled version will use temporary uids above 5×2⁶⁰ allocated according to the lexical position in the client input. This version is recursively converted to the above version on Commit.

Unless you are debugging the server, you should notice no difference between these versions. The uids that differ are not generally visible, and the in-memory compiled code structures are otherwise identical.

CheckConstraints are simpler as they consist merely of an SqlValueExpr that is tested to validate a given value.

[1] Up to version 6.3 of the DBMS source code was parsed on each use.

Thursday, 21 November 2019

Progress with Pyrrho v7 alpha

The alpha code for PyrrhoDB v7 on GitHub was updated on 20 November 2019, which probably continues to demonstrate the concurrency with the Tpcc benchmark reported previously in this blog.
The alpha version comes with a test application called PyrrhoTest, which will be extended in time to verify all sample code from pyrrhodb.com. The GitHub site contains updated documentation: the Pyrrho manual and an introduction to the source code.
Pyrrho v7 aims to be compatible with databases developed with previous versions, and, apart from using optimistic concurrency throughout, complies closely with the ISO SQL standard.
The current state of the v7 alpha implementation includes default values, generated columns, not-null and integrity constraints, alter and drop, and restrict/cascade behaviour for drop, update and delete in addition to joins and subqueries. Alters are role-dependent, laying the groundwork for a full implementation of role-based object renaming and the SQL security model.
The next phases of implementation will include check constraints and triggers.

Wednesday, 25 September 2019

Why Pyrrho performs so well in the TPC-C benchmark tests

I have been asked how it can be that commercial DBMS, and also PostgreSQL, show up so badly in the TPC-C benchmark tests that I have published on GitHub.

To begin with, the TPC-C benchmark normally has 1 clerk per warehouse, so that the conflict rate is around 4%. In my tests I deliberatiely increase the concurrency challenge by using multiple clerks for a single warehouse. When the number of clerks goes above 10, most New Order tasks will fail with a write-write conflict on NEXT_O_ID as this is set per district and there are only 10 districts. Worse, the single row in the WAREHOUSE table contains an amount W_YTD which is updated by the payment task, and fields from this row are read by all the NewOrder tasks and others so that a great many more tasks are aborted because of read/write conflicts. In all of the products tested, apart from Pyrrho and StrongDBMS, read/write conflicts are detected at the row level or wider.

Both Pyrrho and StrongDBMS see no conflict between the payment and NewOrder task because Payment is the only task that accesses W_YTD, and one of the available tests in the ReadConstraint for detecting read/write conflicts is a set of fields in a specific single row of a table.

There are actually three levels of read/write conflict detection in these DBMS. The following comment in the source code at ReadConstraint.cs dates from about 2005:

/// ReadConstraints record all of the objects that have been accessed in the current transaction
    /// so that this transaction will conflict with a transaction that changes any of them.
    /// However, for records in a table, we allow specific non-conflicting updates, as follows:
    /// (a) (CheckUpdate) If unique selection of specific records cannot be guaranteed, then
    /// we should report conflict if any column read is updated by another transaction.
    /// (b) (CheckSpecific) If we are sure the transaction has seen a small number of records of tb,
    /// selected by specific values of the primary or other unique key, then
    /// we can limit the conflict check to updates of the selected records (if any),
    /// or to updates of the key TableColumns.
    /// (c) (BlockUpdate) as (a) but it is known that case (b) cannot apply.

If the isolation level is reduced to repeatable-read or read-committed, most of the competing products achieve performance comparable with Pyrrho and StrongDBMS.

I remain very satisfied with the results of these tests since they show that Pyrrho and StrongDBMS achieve such high scores on concurrency tests despite, or even because of, using immutable data structures and optimistic concurrency.

Monday, 16 September 2019

TPCC benchmark with Pyrrho v7

At present, successive updates to PyrrhoDB v7 alpha are on GitHub . As of today, this location contains the 14 September 2019 version, and a version of TpccPyrrho. The TPC-C benchmark test is for OLTP for a warehouse, where the clerk works through a task sequence including new orders, with realistic time delays. In 10 minutes the clerk handles 16 new orders along with other tasks.

In order to demonstrate exceptional handling of concurrency, this version of the benchmark uses multiple clerks per warehouse. This introduces high levels of concurrency and many transactions should fail. With StrongDBMS I demonstrated performance superior to commercial databases, and now can do the same with the alpha version of PyrrhoDB. The GitHub repository includes versions of the benchmark for several popular DBMS so this claim can be verified by anyone interested.

The results for Pyrrho v7 alpha are as follows:

                Recreate DB: 1:02

                Fill stock: 2:02

                Fill districts: 6:15

                Cold start with initial warehouse: 1:30

F:\PyrrhoDB7\Pyrrho>tpccpyrrho

fid 1 loaded at 15/09/2019 12:04:32

Started at 15/09/2019 12:04:40 with 1 clerks

fid 2 loaded at 15/09/2019 12:04:40

At 15/09/2019 12:14:40 Commits 16, Conflicts 0 0

Last fid=2

F:\PyrrhoDB7\Pyrrho>tpccpyrrho

fid 1 loaded at 15/09/2019 12:17:56

Started at 15/09/2019 12:18:03 with 10 clerks

fid 11 loaded at 15/09/2019 12:18:03

At 15/09/2019 12:28:03 Commits 145, Conflicts 0 95

Last fid=11

F:\PyrrhoDB7\Pyrrho>tpccpyrrho

fid 1 loaded at 15/09/2019 12:32:41

Started at 15/09/2019 12:33:33 with 100 clerks

fid 101 loaded at 15/09/2019 12:35:01

At 15/09/2019 12:43:33 Commits 313, Conflicts 0 2920

Last fid=101

F:\PyrrhoDB7\Pyrrho>

During the benchmark test for 100 clerks my desktop machine reported the CPU utilisation was around 40% and the memory utilisation 50%.

PyrrhoDB v7 should reach beta version by December and include all of the usual database features as in previous versions of the DBMS.

Monday, 2 September 2019

Pyrrho v7 alpha available

The 2 September alpha code of PyrrhoDB v7 is now available
So far it can manage creation and CRUD operations on simple tables, but has a full set of data types and system tables. There is an updated introduction to the source code in the doc folder.
Work continues, comments welcome.

Tuesday, 20 August 2019

Pyrrho V7 progress

A draft version of the “Introduction to the Source Code” document for v7.0 is available here .

So far PyrrhoSvr.exe is under 800KB, and will probably stay below 1MB. It is quite different from version 6.3.

By contrast the user manual Pyrrho.pdf has almost no changes, apart from dropping the idea of multi-database connections.

Database files from version 6.3 (OSP and Pro) should work with PyrrhoSvr v7.0. The reverse will not be the case as I hope that Pyrrho v7.0 will allow the development of database files before users are defined and will enable databases to be renamed.

Almost all of the internal operation has changed:

Identifiers (and the Ident class) are used only for Parsing: uids are used everywhere else
BList has been added to BTree and both are easier to use
There is a Basis class for making work with immutable/shareable classes more efficient and intuitive
IO operations (Level1) have been remodelled following the StrongDBMS architecture
The Physical level 2 is not only for serialisation to the transaction log file (PhysBase has gone)
The logical database layer (“Level 3”) now includes Transaction and Database and their components, and everything at this level is immutable and shareable
All queries and SQL code fragments are parsed exactly once (and for each role on grant), are then immutable/shareable, and don’t need further analysis.

I will upload an alpha version soon. I plan to make Tpcc a first priority, then all of the examples in the Pyrrho manual, ending with the RESTView example probably sometime next year. The Embedded edition can wait for now.

Monday, 24 June 2019

Version 7 is coming (soon)

Starting in autumn 2018 I have been developing StrongDBMS (strongdbms.com, Twitter #StrongDBMS) which incorporates many of PyrrhoDB's ideas, but handles concurrency better than other DBMS I have managed to test. The reasons and source code are written up in GitHub, see shareabledata.org .These tests also showed up some weaknesses in Pyrrho.

Both Strong and Pyrrho use a minimal set of locks to manage concurrency and ensure ACID properties: one lock is for the DBMS itself, and one for each database file. Unfortunately, in versions up to 6.3, Pyrrho transactions can involve more than one database, and the database is stored in a sequence of files (with file names including sequence numbers 001 etc if required). Both of these "improvements" in Pyrrho make verification of ACID properties more difficult.

I will produce a cleaned up version 7 of Pyrrho that will be backwards compatible with Pyrrho 6.3, apart from lacking multi-database connections and supporting only one file per database.

Pyrrho DBMS