Saturday 19 September 2020

ACID Transaction performance with Pyrrho v7

Pyrrho uses optimistic algorithms and yet ensures true serialization for ACID transactions, even in conditions of high concurrency. According to many database textbooks, this should be unlikely or even impossible. Nevertheless, Pyrrho v7 achieves this goal, with the help of some novel programming techniques and approaches, and with ACID performance much better than that of standard commercial databases.
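To make the optimistic approach concrete, here is a minimal sketch of validation-based commit processing, assuming a simple versioned store: a transaction records the version of everything it reads, and at commit time the engine checks that none of those versions has changed since; a conflicting transaction is aborted rather than blocked. The class name OptimisticStore and all of its details are invented for illustration and are not taken from the Pyrrho source.

// Minimal sketch of optimistic (validation-based) concurrency control.
// Illustrative only; not Pyrrho's actual implementation.
using System.Collections.Generic;

class OptimisticStore
{
    // Committed version number for each data item (key).
    private readonly Dictionary<string, long> _versions = new Dictionary<string, long>();
    private readonly object _commitLock = new object();

    // Reads are never blocked: just note the current committed version.
    public long ReadVersion(string key)
    {
        lock (_commitLock)
            return VersionOf(key);
    }

    // Validate the read set and, if still consistent, apply the writes.
    public bool TryCommit(Dictionary<string, long> readSet, IEnumerable<string> writeSet)
    {
        lock (_commitLock)              // only the commit step is serialized
        {
            foreach (var kv in readSet)
                if (VersionOf(kv.Key) != kv.Value)
                    return false;       // a conflicting commit happened: abort, don't block
            foreach (var key in writeSet)
                _versions[key] = VersionOf(key) + 1;
            return true;
        }
    }

    private long VersionOf(string key)
        => _versions.TryGetValue(key, out var v) ? v : 0;
}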

In this blog post I want to provide a brief overview of the evidence for this achievement, and the techniques that enable it.

Pyrrho is of course a relational SQL database. First, we assume here that the goal of concurrency algorithms is to serialize concurrent ACID transactions. Pyrrho demonstrates actual serialization, not just serializability, since the database file is constructed as a serial file in which each committed transaction is appended separately, enabling easy verification of the serialization at any later time. (A consequence of this approach is that the database contains a full history of every commit, and many professionals dislike having their mistakes visible for all time.) The many ways of accessing the log file for verification purposes are described fully in the Pyrrho manual. The current state of the database is maintained in memory, and normal SQL access to the current data is supplemented by many system tables.
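As an illustration of the serial-file idea only (the record layout below is invented for the sketch and is not Pyrrho's file format), each committed transaction can be appended to the file as one self-contained record, so that reading the file from the start reproduces the commit order and makes later verification straightforward:

// Sketch of an append-only transaction log: every commit is appended as one
// record, so the file itself is the serialization of the transaction history.
using System;
using System.Collections.Generic;
using System.IO;

static class SerialLog
{
    // Append one committed transaction as a self-contained record.
    public static void Append(string path, long transactionId, IReadOnlyList<string> changes)
    {
        using var w = new StreamWriter(path, append: true);
        w.WriteLine($"BEGIN {transactionId} {DateTime.UtcNow:O}");
        foreach (var c in changes)
            w.WriteLine($"  {c}");
        w.WriteLine($"COMMIT {transactionId}");
    }

    // Replay the log to recover the recorded commit order for verification.
    public static IEnumerable<long> CommittedTransactions(string path)
    {
        foreach (var line in File.ReadLines(path))
            if (line.StartsWith("COMMIT "))
                yield return long.Parse(line.Substring(7));
    }
}

Because records are only ever appended, the full history stays visible in the file while the current state is rebuilt and held in memory, which is the arrangement described above.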

Moving on from the desirability or otherwise of a fully serialized transaction log, what is meant here by conditions of high concurrency? There is a standard benchmark for online transaction processing, maintained for many years by the Transaction Processing Performance Council; full details are available from https://tpc.org . The particular benchmark I refer to here is TPC-C, whose specification is available from the same site. It was developed over twenty years ago, and models a telephone-based ordering system in which warehouses take orders from customers, organise delivery, process payments and so on. Each warehouse has a clerk to operate the system, and there is a standard set of tasks that the clerks carry out.

There is one SQL database for the whole enterprise. Clerks are not superhuman, and it takes at least 23 seconds for a clerk to take the details of an order over the telephone (orders are for quantities of between 5 and 15 different products), and somewhat less time for payments and the other tasks. The benchmark has always been an interesting test for DBMS comparisons because its design includes some aspects that cause real difficulty for a DBMS. The standard mix of tasks results in about 4% of transactions involving more than one warehouse, and the performance target for a DBMS is the completion rate of new orders for the whole system. The 23-second requirement above means that for one warehouse this figure is 16 new orders in 10 minutes (since each clerk's time is also taken up by payments and the other tasks), but standard DBMSs report thousands of new orders per second when the number of warehouses becomes large.
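To see how this human pacing bounds the workload, the sketch below shows what one clerk's loop might look like, using only the figures quoted above (at least 23 seconds to key a new order of between 5 and 15 lines); the payment and other tasks are left out, so this is an illustration rather than the TPC-C rules:

// Rough sketch of one clerk's pacing. Only the 23-second keying time and the
// 5-15 order lines come from the figures above; everything else is a placeholder.
using System;
using System.Threading;

class ClerkSimulation
{
    static readonly Random Rng = new Random();

    static void RunClerk(TimeSpan duration, Action<int> submitNewOrder)
    {
        var end = DateTime.UtcNow + duration;
        while (DateTime.UtcNow < end)
        {
            int lines = Rng.Next(5, 16);              // 5..15 distinct products
            Thread.Sleep(TimeSpan.FromSeconds(23));   // minimum keying time per order
            submitNewOrder(lines);                    // hand the order to the DBMS
            // Payments and the other tasks (which take somewhat less time) are omitted.
        }
    }
}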

I have modified this test by having multiple clerks per warehouse, in order to create a greater challenge for the DBMS. The testing software is written in C# and is available for numerous DBMSs at https://github.com/MalcolmCrowe/ShareableDataStructures . It is fair to say that this modification really shows up the weaknesses of all of them! Every DBMS tested with this modified benchmark shows an eventual collapse in performance once there are 10 or more clerks for a single warehouse, as the DBMS is forced to abort transactions because of concurrency conflicts. The testing program records all commit requests made to the database, in the order they are sent to the DBMS.
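The modification amounts to running several such clerk loops concurrently against a single warehouse and recording every commit request in the order it is sent. The sketch below illustrates the shape of such a driver; it is not the actual test program at the GitHub link above:

// Sketch of the modified benchmark driver: N concurrent clerks for one
// warehouse, with every commit request recorded in submission order.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class MultiClerkDriver
{
    static void Main()
    {
        int clerks = 10;                               // vary this to stress the DBMS
        var commitLog = new ConcurrentQueue<string>(); // commit requests in send order
        var tasks = new Task[clerks];
        for (int i = 0; i < clerks; i++)
        {
            int clerkId = i;
            tasks[i] = Task.Run(() =>
            {
                // Each clerk would run its paced loop of new orders, payments, etc.
                // Here we only record that a commit request was issued.
                commitLog.Enqueue($"clerk {clerkId} commit requested at {DateTime.UtcNow:O}");
            });
        }
        Task.WaitAll(tasks);
        foreach (var entry in commitLog)
            Console.WriteLine(entry);
    }
}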

But Pyrrho v7 can outperform them all, with performance on a single PC continuing to increase up to 50 clerks for one warehouse. Try it yourself: the code is available at the above location. Full explanations and further details are forthcoming at DBKDA 2020, following papers in previous years of the same conference. The screenshot below shows 50 clerks (see the full video), achieving 338 new orders in 10 minutes despite most commits failing. As mentioned above, the serialization of transactions is verified from the transaction log. The figures are explained in the conference papers. (The repository also documents another optimistic DBMS, StrongDBMS, which last year performed well with 100 clerks, but it lacks many features of standard SQL.)


