Friday 23 November 2018

Recording access to sensitive data

Today there is considerable interest in access auditing, and a requirement in some jurisdictions for companies to record use of sensitive data.
Pyrrho already indelibly records changes to all data together with the user/role and time of changes, and watches reading of data during transactions to construct simple constraints on a transaction being committed. 
As a matter of fact, few experts agree with this feature, since it means that including a read operation in a transaction will prevent the transaction from being committed if a concurrent transaction commits a modification of any of the data that has been read. Pyrrho enforces this approach, however, because the user whose transaction is prevented from committing is the same one who (presumably deliberately) included the read operation in the transaction.
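The commit rule just described can be illustrated with a small Java sketch. This is not Pyrrho's implementation: the classes Store and Txn, and the per-key version counters, are hypothetical names introduced only to show the idea that a transaction fails to commit when something it read has since been modified by another committed transaction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical versioned key/value store; a stand-in for the database.
final class Store {
    private final Map<String, Integer> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    synchronized Integer get(String key) { return values.get(key); }

    // A key that has never been written has version 0.
    synchronized long version(String key) { return versions.getOrDefault(key, 0L); }

    // Commit succeeds only if nothing the transaction read has changed since it was read.
    synchronized boolean commit(Txn t) {
        for (Map.Entry<String, Long> e : t.readVersions.entrySet()) {
            if (version(e.getKey()) != e.getValue()) return false; // read conflict
        }
        for (Map.Entry<String, Integer> e : t.writes.entrySet()) {
            values.put(e.getKey(), e.getValue());
            versions.merge(e.getKey(), 1L, Long::sum); // bump the version of each written key
        }
        return true;
    }
}

// Hypothetical transaction: remembers the version of everything it reads.
final class Txn {
    final Map<String, Long> readVersions = new HashMap<>();
    final Map<String, Integer> writes = new HashMap<>();
    private final Store store;

    Txn(Store store) { this.store = store; }

    Integer read(String key) {
        readVersions.putIfAbsent(key, store.version(key));
        return writes.containsKey(key) ? writes.get(key) : store.get(key);
    }

    void write(String key, int value) { writes.put(key, value); }
}

final class ReadCheckDemo {
    public static void main(String[] args) {
        Store store = new Store();
        Txn reader = new Txn(store);
        reader.read("a");          // the read becomes a constraint on commit
        reader.write("b", 10);
        Txn writer = new Txn(store);
        writer.write("a", 2);
        System.out.println(store.commit(writer)); // true
        System.out.println(store.commit(reader)); // false: "a" changed after it was read
    }
}
```

In the sketch, a transaction that only writes is never blocked by this rule; the conflict arises solely because the reader chose to include the read.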
The present discussion also considers read operations but in a different way. Here we want to distinguish sensitive data (say at the data type level) and immediately record access to it (by anyone other than the database owner), whether or not the current operation is ever committed.
These features [updated: 27 November] are available in Pyrrho version 6.3. The audit record is merged into the transaction log and a system table gives access to all the details.
From the manual:
Sec 1.5: 
Version 6.3 adds support for “sensitive” data, for which any access is auditable. Columns, domains and types can be declared SENSITIVE[1]. Sensitive values are not assignment-compatible with anything that is not sensitive, and there is a sensitive property inherited by any object that contains a sensitive data type. This means for example that the sum of sensitive data is still sensitive. The transaction log will contain a record of every access to sensitive values (apart from by the database owner), even if the transaction is rolled back. These details are visible in the Sys$Audit system table (see section 8.3.1).
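The two ideas in this paragraph, inheritance of the sensitive property and immediate auditing of access by anyone other than the owner, can be sketched in Java as follows. This is only an illustration under stated assumptions: Value, Session and AuditLog are hypothetical classes, and the in-memory list stands in for audit records in the transaction log. Because the record is written at the moment of reading, nothing done later (such as rolling back a transaction) removes it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for audit records written to the transaction log.
final class AuditLog {
    final List<String> entries = new ArrayList<>();

    void record(String user, String objectName) {
        entries.add(user + " accessed " + objectName);
    }
}

// Hypothetical immutable value carrying a sensitivity flag.
final class Value {
    final long amount;
    final boolean sensitive;

    Value(long amount, boolean sensitive) {
        this.amount = amount;
        this.sensitive = sensitive;
    }

    // The result is sensitive if either operand is: sensitivity is inherited.
    Value plus(Value other) {
        return new Value(amount + other.amount, sensitive || other.sensitive);
    }
}

// Hypothetical session: reads of sensitive values are audited unless the user is the owner.
final class Session {
    final String user;
    final boolean isOwner;
    final AuditLog log;

    Session(String user, boolean isOwner, AuditLog log) {
        this.user = user;
        this.isOwner = isOwner;
        this.log = log;
    }

    long read(Value v, String objectName) {
        if (v.sensitive && !isOwner) {
            log.record(user, objectName); // recorded immediately, not at commit
        }
        return v.amount;
    }
}

final class SensitiveDemo {
    public static void main(String[] args) {
        AuditLog log = new AuditLog();
        Value salary = new Value(100, true);
        Value bonus = new Value(5, false);
        Value total = salary.plus(bonus);           // still sensitive
        new Session("alice", false, log).read(total, "PAY.TOTAL");
        new Session("owner", true, log).read(total, "PAY.TOTAL");
        System.out.println(total.sensitive);        // true
        System.out.println(log.entries.size());     // 1
    }
}
```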
Sec 7.4:
Type = (StandardType | DefinedType | Domain_id | Type_id | REF '(' TableReference ')') [UriType] [SENSITIVE] .
Sec 8.3.1:

8.3.1 Sys$Audit

The location of this access record in the transaction log
The defining position of the accessing user
The defining position of the sensitive table or view object
The time of the access in ticks
Audit records are only for committed sensitive data. Entries come from physical Audit records, and are added immediately on access (they do not wait for transaction commit).

8.3.2 Sys$AuditKey

The location of the access record in the transaction log
The ordinal position of the key (0 based)
The defining position of the key column
A string representation of the key value at this position
Key information for audit records comes from the filters used to access a sensitive object. For example, if a record is inserted in a table, there is no applicable filter, the audit record will apply to the whole table, and there will be no key information here.
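The relationship between an access record and its key information can be sketched as below. This is a hypothetical Java model, not Pyrrho's record format: AuditEntry and AuditKey are invented names, and column and table names are used here in place of the defining positions that the system tables actually report. The point it shows is that each filter contributes one key row (numbered from 0), and an access with no filter yields no key rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one row of key information for an access record.
final class AuditKey {
    final int seq;        // ordinal position of the key, 0 based
    final String column;  // assumption: a name stands in for the column's defining position
    final String value;   // string representation of the key value

    AuditKey(int seq, String column, String value) {
        this.seq = seq;
        this.column = column;
        this.value = value;
    }
}

// Hypothetical model of one access record with its key rows.
final class AuditEntry {
    final String user;
    final String table;
    final long ticks;
    final List<AuditKey> keys;

    private AuditEntry(String user, String table, long ticks, List<AuditKey> keys) {
        this.user = user;
        this.table = table;
        this.ticks = ticks;
        this.keys = Collections.unmodifiableList(keys);
    }

    // Builds the key rows from the filters used in the access, in iteration order.
    static AuditEntry of(String user, String table, long ticks, Map<String, Object> filters) {
        List<AuditKey> keys = new ArrayList<>();
        int seq = 0;
        for (Map.Entry<String, Object> f : filters.entrySet()) {
            keys.add(new AuditKey(seq++, f.getKey(), String.valueOf(f.getValue())));
        }
        return new AuditEntry(user, table, ticks, keys);
    }
}

final class AuditKeyDemo {
    public static void main(String[] args) {
        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("ID", 42);
        AuditEntry select = AuditEntry.of("alice", "PATIENT", 1L, filter);
        AuditEntry insert = AuditEntry.of("alice", "PATIENT", 2L, new LinkedHashMap<>());
        System.out.println(select.keys.size()); // 1
        System.out.println(insert.keys.size()); // 0: the record applies to the whole table
    }
}
```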
Comments by email are welcome.

[1] SENSITIVE is a reserved word in SQL that normally applies to cursor sensitivity. The usage in Pyrrho described here is quite different, and the keyword comes at the end of a type clause (see section 7.4).

Wednesday 21 November 2018

Rethinking Shareable Data Structures

I have recently been developing a set of data structures following on from Okasaki's Purely Functional Data Structures. There is a lot to be gained by using immutable data structures, that is, where all the fields are public readonly in C# or public final in Java. Strings in C# and Java already have this property, and it turns out to be remarkably easy to develop all the usual data structure types with this property. Something of the kind had already started to happen in Pyrrho: many of Pyrrho's data structures are immutable, and Pyrrho uses Bookmarks instead of Iterators. Specifically, the benefits (as with strings) are:
  •  a snapshot is obtained by a simple assignment, so rollback is a breeze
  •  structures can be modified while a traversal continues with the previous state
  •  they are thread-safe and safe to pass as a parameter in C# or Java, so that these data structures never need to be locked
The price to be paid is extra work for the garbage collector: this is a reasonable trade-off.
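As an illustration of the benefits listed above, here is a minimal persistent list in Java. It is a sketch only: SList is a hypothetical class written for this example and is not the API of the ShareableDataStructures library. All fields are public final, "modification" returns a new list that shares its tail with the old one, and a snapshot is just another reference to the same object.

```java
// Hypothetical minimal immutable list: every field is public final.
final class SList<T> {
    public final T head;          // null only in the empty list
    public final SList<T> tail;   // null only in the empty list
    public final int length;

    private static final SList<Object> EMPTY = new SList<Object>(null, null, 0);

    private SList(T head, SList<T> tail, int length) {
        this.head = head;
        this.tail = tail;
        this.length = length;
    }

    // The single shared empty list; safe to share because it can never change.
    @SuppressWarnings("unchecked")
    static <T> SList<T> empty() {
        return (SList<T>) EMPTY;
    }

    // Returns a new list with item at the front; this list is left untouched.
    SList<T> push(T item) {
        return new SList<T>(item, this, length + 1);
    }

    boolean isEmpty() {
        return length == 0;
    }
}

final class SListDemo {
    public static void main(String[] args) {
        SList<String> current = SList.<String>empty().push("x").push("y");
        SList<String> snapshot = current;      // a snapshot is a simple assignment
        current = current.push("z");           // "modify" while the snapshot lives on

        // A traversal of the snapshot sees the previous state only.
        for (SList<String> p = snapshot; !p.isEmpty(); p = p.tail) {
            System.out.println(p.head);        // prints y then x
        }
        current = snapshot;                    // rollback is another assignment
        System.out.println(current.length);    // 2
    }
}
```

No locking appears anywhere in the sketch: since no list object is ever changed after construction, any number of threads can hold and traverse the same list. The cost is that each push allocates a new node, which is the extra garbage collector work mentioned above.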
So the time seemed ripe for a serious approach to #ShareableDataStructures and the fruits of these labours are emerging at . Eventually the classes will be rich enough to implement a DBMS, and the plan is to implement everything in C# and Java, and then Python later. Efforts in the DBMS direction are currently called #StrongDBMS . A lot will depend on the performance of the TPCC benchmark.
It is natural to ask what this might mean for Pyrrho. It does seem like a natural evolution (Pyrrho 7.0 maybe), but some of the Pyrrho code would be a real nuisance to transform. Time will tell.