Wednesday 25 September 2019

Why Pyrrho performs so well in the TPC-C benchmark tests

I have been asked how it can be that commercial DBMSs, and also PostgreSQL, perform so badly in the TPC-C benchmark tests that I have published on GitHub.

To begin with, the TPC-C benchmark normally has 1 clerk per warehouse, so that the conflict rate is around 4%. In my tests I deliberately increase the concurrency challenge by using multiple clerks for a single warehouse. When the number of clerks goes above 10, most NewOrder tasks will fail with a write-write conflict on NEXT_O_ID, as this is set per district and there are only 10 districts. Worse, the single row in the WAREHOUSE table contains an amount W_YTD which is updated by the Payment task, and fields from this row are read by all the NewOrder tasks and others, so that a great many more tasks are aborted because of read/write conflicts. In all of the products tested, apart from Pyrrho and StrongDBMS, read/write conflicts are detected at the row level or wider.

Both Pyrrho and StrongDBMS see no conflict between the Payment and NewOrder tasks, because Payment is the only task that accesses W_YTD, and one of the available tests in ReadConstraint for detecting read/write conflicts applies to a set of fields in a specific single row of a table.
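To make the difference concrete, here is a minimal sketch in Python (not the C# of the actual source) contrasting row-level and field-level read/write conflict detection. The names Access, conflicts_row_level and conflicts_field_level are hypothetical and do not come from Pyrrho or StrongDBMS. W_TAX is used as an example of a WAREHOUSE column that NewOrder reads under the TPC-C schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One transaction's read or write of some columns in one row (hypothetical type)."""
    table: str
    key: tuple            # primary key value identifying the row
    columns: frozenset    # column names read or written


def conflicts_row_level(read: Access, write: Access) -> bool:
    # Row-level detection: any write to a row that was read is a conflict.
    return read.table == write.table and read.key == write.key


def conflicts_field_level(read: Access, write: Access) -> bool:
    # Field-level detection: same row AND at least one column in common.
    return conflicts_row_level(read, write) and bool(read.columns & write.columns)


# NewOrder reads W_TAX from warehouse 1; Payment updates W_YTD in the same row.
new_order_read = Access("WAREHOUSE", (1,), frozenset({"W_TAX"}))
payment_write = Access("WAREHOUSE", (1,), frozenset({"W_YTD"}))

row_conflict = conflicts_row_level(new_order_read, payment_write)
field_conflict = conflicts_field_level(new_order_read, payment_write)
```

Under these assumptions the row-level test reports a conflict, so one of the two transactions would be aborted, while the field-level test reports none because the column sets are disjoint.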

There are actually three levels of read/write conflict detection in these DBMS. The following comment in the source code in ReadConstraint.cs dates from about 2005:

    /// ReadConstraints record all of the objects that have been accessed in the current transaction
    /// so that this transaction will conflict with a transaction that changes any of them.
    /// However, for records in a table, we allow specific non-conflicting updates, as follows:
    /// (a) (CheckUpdate) If unique selection of specific records cannot be guaranteed, then
    /// we should report conflict if any column read is updated by another transaction.
    /// (b) (CheckSpecific) If we are sure the transaction has seen a small number of records of tb,
    /// selected by specific values of the primary or other unique key, then
    /// we can limit the conflict check to updates of the selected records (if any),
    /// or to updates of the key TableColumns.
    /// (c) (BlockUpdate) as (a) but it is known that case (b) cannot apply.
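The three cases in this comment can be sketched roughly as follows. This is my own Python reading of the comment, not a translation of the C# source: the class ReadConstraintSketch, its fields and the method conflicts_with_update are all hypothetical, and I assume that cases (a) and (c) behave identically for the purpose of the conflict test itself. The column names in the usage lines are taken from the TPC-C schema as examples.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Check(Enum):
    CHECK_UPDATE = auto()    # case (a): unique selection cannot be guaranteed
    CHECK_SPECIFIC = auto()  # case (b): a few records selected by unique key
    BLOCK_UPDATE = auto()    # case (c): as (a), and (b) is known not to apply


@dataclass(frozen=True)
class ReadConstraintSketch:
    table: str
    check: Check
    columns_read: frozenset
    keys_seen: frozenset = frozenset()    # used only by CHECK_SPECIFIC
    key_columns: frozenset = frozenset()  # used only by CHECK_SPECIFIC

    def conflicts_with_update(self, table, key, columns_updated):
        """Would another transaction's update of `columns_updated` in row `key` conflict?"""
        if table != self.table:
            return False
        if self.check is Check.CHECK_SPECIFIC:
            # An update to the key columns could change which rows were selected.
            if columns_updated & self.key_columns:
                return True
            # Otherwise only the selected records matter, and only the columns read.
            return key in self.keys_seen and bool(columns_updated & self.columns_read)
        # Cases (a) and (c): any row of the table, if a column that was read is updated.
        return bool(columns_updated & self.columns_read)


specific = ReadConstraintSketch(
    "WAREHOUSE", Check.CHECK_SPECIFIC,
    columns_read=frozenset({"W_TAX"}),
    keys_seen=frozenset({(1,)}),
    key_columns=frozenset({"W_ID"}),
)
table_wide = ReadConstraintSketch(
    "DISTRICT", Check.CHECK_UPDATE, columns_read=frozenset({"D_TAX"})
)
```

With these definitions, an update of W_YTD in warehouse 1 does not conflict with the specific constraint, whereas the table-wide constraint conflicts with an update of D_TAX in any district row.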

If the isolation level is reduced to repeatable-read or read-committed, most of the competing products achieve performance comparable with Pyrrho and StrongDBMS.

I remain very satisfied with the results of these tests since they show that Pyrrho and StrongDBMS achieve such high scores on concurrency tests despite, or even because of, using immutable data structures and optimistic concurrency.
