Pyrrho DBMS: 2020

Sunday, 4 October 2020

Mandatory Access Control in Pyrrho V7

From December 2018 Pyrrho has offered a simulation of Bell-LaPadula security, following the Orange Book protocols for mandatory access control. The development of Pyrrho v8 alpha has now reached completed this stage. The current state of the source code and associated documentation is on GitHub as usual. There is a document describing Mandatory Access Control, and this post repeats some of those details.

Mandatory access control is based on the concepts of (a) classification of information and database objects from levels D (public) to A (top secret), (b) clearance of individual users to access classified information, and (c) enforcement of access rules on database tables (Select. Insert. Update, Delete).

Classification can be applied at the level of database tables, table columns, and individual records, and both clearance and classification at levels above D can specify permitted groups and topics for access. The database has a security administrator (in Pyrrho this is the database owner) who can audit and modify any aspect of the security model. All access to classified information (i.e. above level D) is instantly recorded in the database's transaction log giving the user's identity, the time, the tables accessed, and the key if any, even if the user made no changes.

The document Detailed Mandatory Access Control includes a simple example. Here are some extracts:

A. Logged in with MALCOLM1\Malcolm (not the server account)

1. Starting with empty database mac

SQL> create table A(B int,C char)

SQL> create table D(E char primary key) security level D groups Army Navy references Defence scope read

SQL> create table F(G char primary key,H char security level C)

2. Create some users with and without clearance

SQL> grant "mac" to "MALCOLM1\Student"

SQL> grant "mac" to "MALCOLM1\Fred"

SQL> grant security level B groups Army references Defence Cyber to "MALCOLM1\Student"

SQL> table "Sys$User"

|---|----------------|-----------|-----------|-----------------------|

|---|----------------|-----------|-----------|-----------------------|

|26 |MALCOLM1\Malcolm| |mac | |

|366|MALCOLM1\Student| |mac | B{ARMY}[CYBER,DEFENCE]|

|416|MALCOLM1\Fred | |mac | |

|---|----------------|-----------|-----------|-----------------------|

3. Add some rows with and without classification

SQL> insert into A values(2,'Two')

1 records affected in mac

SQL> insert into A values(3,'Three') security level C

1 records affected in mac

SQL> insert into D values('Test')

1 records affected in mac

SQL> insert into F values('MI6','sis.gov.uk')

1 records affected in mac

SQL> table "Sys$Classification"

|---|-----------|----------------------|---------------|

|---|-----------|----------------------|---------------|

|553|Record | C |537 |

|154|Table | D{ARMY,NAVY}[DEFENCE]|138 |

|313|TableColumn| C |248 |

|---|-----------|----------------------|---------------|

Then Fred can see just one column in table F and cannot access table D, but can add new records to both tables A and F/

Student can see everything so can make changes just now only in table D, whose enforcement is only on read. But Student can add new records to all of the tables. The document gives the full story....

Saturday, 19 September 2020

ACID Transaction performance with Pyrrho v7

Pyrrho uses optimistic algorithms and yet ensures true serialization for ACID transactions, even in conditions of high concurrency. According to many database textbooks, this should be unlikely or even impossible. Nevertheless, Pyrrho v7 achieves this goal, with the help of some novel programming techniques and approaches, with ACID performance much better than standard commercial databases.

In this blog post I want to provide a brief overview of the evidence for this achievement, and the techniques that enable it.

Pyrrho is of course a relational SQL database. First, we assume here that the goal of concurrency algorithms is to serialize concurrent ACID transactions. Pyrrho demonstrates actual serialization, not just serializability, since the database file is constructed as a serial file, where each committed transaction is appended separately to the file, enabling easy verification of the serialization at any later time. (A consequence of this approach is that the database contains a full history of every commit, and many professionals dislike their mistakes to be visible for all time.) The many ways of accessing the log file for verification purposes are described fully in the Pyrrho manual. The current state of the database is maintained in memory, and normal SQL access methods for the current data are enhanced by many system tables.

Moving on from the desirability or otherwise of a full serialized transaction log, what is meant here by conditions of high concurrency? There is a standard benchmark for online transaction processing maintained for many years by the Transaction Processing Council, and full details of this benchmark are available from https://tpc.org . The particular benchmark I refer to here is TPC-C, whose specification is available on this link. It was developed over twenty years ago, and models a telephone based ordering system where warehouses take orders from customers, organise delivery, process payments etc. Each warehouse has a clerk to operate the system, and there is a standard set of tasks that the clerks carry out.

There is one SQL database for the whole enterprise. Clerks are not superhuman, and it takes 23 seconds at least for a clerks to take the details of an order over the telephone (orders are for quantities of between 5 and 15 different products), somewhat less time for payments etc. It has always been an interesting test for DBMS comparisons because the benchmark design includes some important aspects that cause some difficulty for a DBMS. The standard mix of tasks results in 4% of transaction concurrency between different warehouses, and the performance target for a DBMS is the completion rate of new orders for the whole system. The 23 second requirement above means that for one warehouse this number is 16 new orders in 10 minutes, but standard DBMS report thousands of new orders per second when the number of warehouses becomes large.

I have modified this test by having multiple clerks per warehouse, in order to create a greater challenge for the DBMS. The testing software is written in C# and is available for numerous DBMS at https:/github.com/MalcolmCrowe/ShareableDataStructures . It is fair to say that this modification really shows up the weaknesses of all DBMS! All DBMS tested with this modified benchmark show an eventual collapse in performance if there are 10 clerks or more for a single warehouse, as the DBMS is forced to abort transactions because of concurrency conflicts. The testing program records all commit requests made to the database (in the order they are sent to the DBMS).

But Pyrrho v7 can outperform them all, with performance on a single PC increasing up to 50 clerks for one warehouse. Try it yourself: the code is available at the above location. Full explanations and further details are forthcoming in DBKDA 2020, with previous bulletins in previous years of this conference. The screenshot below shows 50 clerks (see the full video), and 338 new orders in 10 minutes despite most commits failing. As mentioned above, the serialization of transactions is verified by the transaction log. The figures are explained in conference papers. (Another optimistic DBMS, StrongDBMS, is also documented, which last year performed well for 100 clerks, but it lacks many features of standard SQL.)

Friday, 31 July 2020

Lateral Derived Tables and Pyrrho v7

This post discusses a new feature in v7 of Pyrrho, namely a simple implementation of lateral derived tables, that does not require the LATERAL keyword. Lateral derived tables are specified in section 7.6 "Table Reference" of the SQL standard, but are well introduced in the MySQL 8.0 manual at https://dev.mysql.com/doc/refman/8.0/en/lateral-derived-tables.html .

That page gives an example problem where the concept is useful: "Given a table of people in a sales force (where each row describes a member of the sales force), and a table of all sales (where each row describes a sale: salesperson, customer, amount, date), determine the size and customer of the largest sale for each salesperson. "

Now a naive solution might look like this:

create table SalesPerson(pid int primary key)

create table Sales(sid int primary key, spid int, cust int, amount int)

select * from SalesPerson, (select cust, amount from Sales where spid = pid order by amount desc fetch first 1 rows only)

According to the MySQL manual page, this is not permitted because "A derived table cannot normally refer to (depend on) columns of preceding tables in the same FROM clause. " The page goes on to discuss a number of solutions which either are very complicated or require the LATERAL keyword.

Pyrrho v7 will automatically detect the need for a lateral derived table (or "lateral join") and will form the required join without using the LATERAL keyword. For example, the above naive solution is accepted and correctly computed by the current alpha build of Pyrrho at https://github.com/MalcolmCrowe/ShareableDataStructures . The feature is not discussed in the Pyrrho manual, because no special syntax is required.

I'd welcome comments on this feature, and a reference in the SQL standard for the rule that MySQL gives. Write to me at malcolm.crowe@uws.ac.uk .

Monday, 29 June 2020

Further progress with v7 alpha

(for latest post see: 2 January 2021)
on https://GitHub.com/MalcolmCrowe/ShareableDataStructures/tree/master/PyrrhoV7alpha

An important goal in v7 of Pyrrho is to compile SQL code once only, on definition by the user or on load on a database cold-start. Another is to use shareable immutable data structures in the server wherever possible, in the manner described in the above GitHub repository. By the end of 2019, Database, Query and Transaction had these properties. The current version extends this philosophy to RowSet and Cursor objects, and the current version passes a set of tests including stored procedures, triggers, constraints, cascade operations, and structured types.

The documentation in the repository includes full details of how this is done. The Pyrrho manual from previous versions applies almost unchanged to v7, and this version will reach beta stage when all examples in the Pyrrho manual have been implemented for v7. The next stage in this process will be to implement role-based security.

I look forward to demonstrating the performance of the current version at DBKDA 2020. Comments to malcolm.crowe@uws.ac.uk, including PyrrhoDB in the subject line.

Pyrrho DBMS

Sunday, 4 October 2020

Mandatory Access Control in Pyrrho V7

A. Logged in with MALCOLM1\Malcolm (not the server account)

Saturday, 19 September 2020

ACID Transaction performance with Pyrrho v7

Friday, 31 July 2020

Lateral Derived Tables and Pyrrho v7

Monday, 29 June 2020

Further progress with v7 alpha

Pyrrho DBMS

Followers

Blog Archive

About Me