Thursday, 8 September 2016

Improvements to the Object-Oriented API

For years now, Pyrrho has offered an object-oriented API for C#, and will generate a C# class definition for any base table for any given database and role. Tutorials offered demos of how these classes could be extended by adding in relational data, and over the years versioning was added so that all such C# classes are subclasses of Versioned.
The mechanisms have now been enhanced in the following important ways:
  1. Full class definitions are now available for Views and structured types as well as base tables,
  2. Deep structure is supported: Pyrrho will define nested classes for subjects in tables or views, and for user-defined types. (All such classes are Versioned too)
  3. The Versioned base class now includes a readCheck in addition to a row Check.
As discussed in this blog, for base tables, the row Check is a row version validator in the sense of Laiho and Laux's paper, and for other data may be an ETag in the sense of RFC 7232. In complex cases (e.g. results that do not correspond directly to base table rows) either or both may be empty.
The readCheck is for checking later whether the query used to obtain the data would get the same results.
Class definitions can be found in the "Role$Class" table.Such a class C can then be used with the database connection with the following methods added to the usual IDbConnection interface:
  • C FindOne()
  • C[] FindWith(string where)
  • C[] FindAll()
  • Post(C s)
  • Put(C s)
  • Delete(C s)
  • Check(string check)
  • Check(string check,string readCheck)
Triggers and generated columns apply, and Post and Put update affected fields of s along with the Versioned values.

Monday, 5 September 2016

ETags and Row version validation

Web resources often come with semantically useful information (starting with the URI), for example RDF [1] or OWL types [2]. HTTP now offers ETags (Fielding and Reschke, 2014, RFC7232) in responses to allow caching of results, and ETags can be used for validating a step in a transaction. ETags are very similar to the RVV concept[3] or MVCC [4], successfully used in PostGRES [5], SQL server [6] Oracle [7] etc to provide optimistic execution.  
The purposes of the two concepts, however, remain different, and become more clearly visible when we try to generalise their use to more complex situations such as composite databases, updatable views, joins, and general queries.
For Pyrrho in simple cases the ETag can coincide with the strong RVV validator (CHECK) for requests that return a row of a base table: so implementing this feature amounts to an extension of the Pyrrho API's Versioned mechanism (See also this discussion on REST in Spring).
In this blog I have previously suggested that RVV should be updated if related data changes, but I have been persuaded that the validator should just be for the contents of the row itself.
In that case, I want to extend the idea of RVV to updatable views and joins. In such objects, each row has a direct relationship with specific base table rows, so an RVV for the view will reflect changes to these rows.
It is a moot point whether the result of a query is a Web resource. It is possible that it contains nothing directly copied from any base table, but if an ETag was returned with the result it would contain the information that the result has not changed since the last time the it was computed.
Also, we should bget the same versioning information from a View that we would get with the associated SELECT statement.
So, consider the following:

(a) RVV

The RVV if present includes row version validators for any base table row whose values appear in the result row.
The RVV will be absent for a row that does not identify any single row in a contributing base table (e.g. some of the rows in select distinct or  group by). RVVs are computed for base tables using the Laiho/Laux algorithms, and indicate the transaction point that last modified the values of the row in the base table. For query results, an RVV is copied into a result row by the first contributing row and removed if more than one row contributes.
RVV can be used to identify a set of rows to be deleted or updated, and so have an immediate use in the API in an application-side object to identify corresponding individual base table row(s) in the database.

(b) ETag

The ETag if present indicates the latest transaction points of the values used to compute the values in the row.
The ETag will be absent if the row does not come from any database contents (e.g. the row consists of constants, current_time, current_user etc). There is an obvious algorithm for computing ETags from RVVs. For rows coming from base tables the ETag is the same as the RVV. When another row contributes, the ETag becomes the later transaction point.
ETags can be used for HTTP caching as described in RFC7232. However, we note that for complex queries the server would have to recompute the result to see if the ETag is valid.
If results come from more than one database, RVVs and ETags will be lists of values, at most one from each database.
From today, The CHECK pseudofunction in Pyrrho gives the RVV if defined for a row, and the ETag returned with a REST result is for the first result in the result set. If an ETag is supplied with a REST request, it is checked for validity.
The RVV is also used in the Pyrrho API, so that Find<C> and similar functions return Versioned objects that can be used with the API functions Put, Post, and Delete, as described in the Pyrrho manual.

[1] W3's RDF documentation
[2]W3's OWL documentation
[3 Laiho,M, Laux, F: (2010) Implementing Optimistic Concurrency Control for Persistence Middleware Using Row Version Verification
[4]MVCC
[5]Amit Kapila, (2015) Well-known Databases Use Different Approaches for MVCC
[6]Steve Howard (2012) SQL Server MVCC with read_committed_snapshot
[7]Oracle Database Concepts: Chapter 13 Data Concurrency and Consistency
[8]The concept of strong validator dates from sec 13 of the HTTP/1.1 definition (Fielding et al 1999)

Friday, 19 August 2016

Composite Databases

Composite (or Layered) Databases access data from other servers using REST. They open the possibility of integrating databases from multiple platforms incrementally using VIEWs. Data obtained from a remote system through a view is not stored locally, and if permissions have been appropriately set up, these views can be updatable.

The design should be interoperable across many platforms and DBMS, using the widely-implemented row-versioning mechanisms for DBMS and entity tags for HTTP (RFC7232). Both RVV and ETags can be strong validators that help with transaction management even in the absence of two-phase commit guarantees for transactions, and they also make updatable joins more intuitive. The mechanism offered in Pyrrho makes no assumptions on the format of these validators.

To make views work in this way in SQL, we need to be able to specify the resulting row type of the view and the URL to GET the source data. With some reasonable conventions on the REST interface this information can be sufficient to implement associated POST, PUT and DELETE operations (assuming that the remote data owners have made the necessary permissions available), and so the SQL view will be updatable using ordinary SQL INSERT, UPDATE and DELETE, with the  application of integrity and referential constraints, triggers etc.

The advantages for data integration are considerable, and I look forward to further work in this area.
For now, I offer a very simple example that you can try out. Suppose we have two databases A and B on different servers, and A offers a REST interface on port 8180 (see below). We suppose A is a generally-available list of employees, and B in the HR department gives details of their posts and salaries.

Let us define A and add some sample data:

create table D(eid int primary key, fname char, sname char)
insert into D values(1,'Joe','Soap'),(2,'Betty','Boop')

And B:

create table H(eid int primary key, job char, salary int)
insert into H values (1,'Cleaner',12500)
insert into H values (2,'Manager',31400)

HR will use the following mechamism to give a usable employee list:

create view W of (eid int, fname char, sname char) as get 'http://localhost:8180/A/A/D'
create view V as select * from W natural join H
select * from V

Creating the view copies no data!. And then the following changes work:
update v set fname='Elizabeth' where eid=2
insert into v(eid,fname,sname,job,salary) values(3,'Fred','Smith','Janitor',22160)
delete from v where sname='Soap'
delete from v where eid=2

The changes are applied to both databases as needed.
To set up the two servers on the local machine, you can use these commands:
For A: ospsvr
For B: ospsvr -p:5434 -s:0 -S:0
Then A uses Pyrrho's HTTP port 8180.
To set up the database A, use ospcmd A woth the above two lines.
To send the above commands for B use ospcmd -p:5434 B .

Of course, in practice things will not be quite so simple, but it is reasonable to assume some cooperation and suitable tuning from the REST providers, especially if they will be providing privileges to the composite database. In this way we can assure quality of the combined data without disturbing ownership of the data.

Wednesday, 10 August 2016

Updatable joins and views

Today's version has improvements for views and joins. Base tables referred to in views and joins can now be modified from the view or join using INSERT, UPDATE and DELETE, subject to the obvious restrictions.  To be modifable, the view and join definitions must not contain aggregations or distinct, and the join conditions must be composed of conjunctions of equality conditions.

I notice that some DBMS including Oracle only allow one base table to be modified in such an operation. Pyrrho does not have this restruction.

Thursday, 4 August 2016

Bookmarks replace Enumerators

In today's new version of Pyrrho, use of .NET Enumerator classes has been replaced with Bookmarks.
Throughout the source code in the new version you will see for statements of the form
for (var a=b.First();a!=null;a=a.Next()) {..} instead of foreach(var a in b).
This is because bookmarks cannot be used with foreach; but they bring other advantages which this posting will try to explain.
A quick reminder of how .NET's IEnumerators work: given a collection class C, GetEnumerator() returns an IEnumerator which initially does not have any value (it points before the first item in the collection). On such an IEnumerator, MoveNext() returns a boolean saying whether a move to the next item in the collection succeeds, Current gives the current member of the collection and Reset repositions the enumerator back to before the start. Evidently such enumerators contain mutable data, and MoveNext will modify the internal structure of the Enumerator class.
Pyrrho takes transactions and program state obsessively seriously. I like to control side effects of everything. Pyrrho uses immutable data structures wherever possible, e.g. for trees and even databases. Bookmarks can be implemented using immutable data (all fields are readonly) and this is a great advantage from my viewpoint.
So if b is a member of a collection class C, we can have an associated bookmark class (probably called C.Bookmark), with the following standard methods: 
  • a static method New(b). If the colection is empty New(b) will return null), otherwise it gives a bookmark for the first member of the collection. b.First() is implemented as New(b).
  • an instance method Next() which returns a bookmark for the next member of the collection, or null if there is none.
  • value() to access the current member of the collection.
I will get around to benchmarking the new version soon to see what effect on execution speed this change brings, and in the meantime I welcome comments!

Friday, 22 April 2016

Supporting transactions in REST

From the next version 5.5, Pyrrho's HTTP service and REST interface will support the RFC7232 ETag fetaure (Fielding and Reschke, 2014). The strong validator used will be compatible with Pyrrho's VERSIONED mechanism for row data, and will enable the development of layered databases using Pyrrho's REST interfaces and extended VIEWs.

I do hope other REST implementations will soon follow RFC7232 as this will make REST even more attractive for implementing linked data. There is clearly some interest in the Spring community.
ETags can be used for validating a step in a transaction.

On the other hand we need to do more to implement transaction coordination, when a Pyrrho server makes HTTP calls to another REST service. A number of systems have different mechanisms for maintaining an open transaction in such circumstances, e.g. Neo4j. But in view of Pyrrho’s rather pedantic approach to transaction isolation and durability it seems best for Pyrrho to use its own mechanism. A Pyrrho transaction coordinator will identify itself as a unique UserAgent in these circumstances, including the coordinator host, port, with a timestamp and the transaction ID. This information will enable the transaction partners to coordinate their work and it can then be used in a simple two-phase commit sequence. For example, we could make the User-Agent string contain a link with the information, such as

Pyrrho 5.5 http://abc.com:8133/635970209933434980/4652

so that a Pyrrho server participating in the transaction can keep an HTTP server thread open for operations on its local databases for this transaction, until (a) it receives a GET/POST/PUT/DELETE for / which it will interpret as Check, Prepare, Commit and Rollback respectively , or (b) the timestamp is older than 20 minutes. If anything goes wrong, it will report 409 Conflict or 410 Gone.


References

Fielding, R; Reschke, J, eds (2014): Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests, RFC7232, https://tools.ietf.org/html/rfc7232.

Tuesday, 15 March 2016

Transaction diagnostics

Most DBMS will report transaction conflicts if concurrent transactions attempt to update the same data (since this is the classic "lost update" fault). Certainly it is a necessary condition for transaction serialisation that such attempts are prevented.
There has recently been some discussion in the community about whether transaction conflicts should also be reported when data that has been read in the transaction has been changed before the transaction commits. In combination with the above condition, this is sufficient to ensure transaction serialisation (so-called view serialisability).
For over ten years, Pyrrho has enforced this very strong requirement (it is the only transaction mode it supports) although for practical reasons the full read set of a transaction is not always available, and Pyrrho will report a transaction conflict in all cases of doubt.
From today, Pyrrho provides some additional information in the standard SQL diagnostics area to identify a reason that a transaction conflict has been reported, provisionally using the keyword WITH. Because of Pyrrho's role-based data models, transactions do not necessarily agree on table names etc, and database objects in such messages are usually identified by their numerical identities.

Saturday, 5 March 2016

Threading and PyrrhoLink


The thread-safety issues when database connections are shared among threads have been well-documented for years, and as threading becomes more widespread a number of famous libraries have been affected. While it is good practice to close database connections as soon as possible, an application can have many database connections open, and connections can be shared across threads.

The following considerations are planned for the PyrrhoLink/OSPLink implementations in C# and Python from version 5.4. They provide some protection from unsafe coding, and operations will only block while the connection is busy in another thread.

ADO.Net


Many complications involving ADO.NET such as DependentTransactions are irrelevant in Pyrrho because of strong transaction isolation and serialisation.

1.       We need to enforce only one ongoing transaction per connection. The situation is really likely only to arise if the connection is shared between threads, so a call to BeginTransaction should block until the connection is available. If it is the same thread it is probably a programmer mistake (nested or parallel transactions are not supported) and an exception should be raised.

2.       We need to enforce only one open reader per connection. In the single threaded case (or on the same thread), this likely to be a misunderstanding by the programmer (ADO.NET allows only one active reader per connection), and an exception should be raised. Otherwise, the call should block until the connection is available.

3.       We should definitely disallow the sharing of commands between threads because IDbCommand methods can set up the command text, parameters etc and it is easy to see that conflicts could arise. Each IDbCommand should be specific to a given thread, but a connection can have many commands.

4.       When a command is executed, the call should block until the connection is available, as the connection might be busy in another thread.

5.       We should disallow the sharing of readers between threads, because of possible races with Read(), data fetching, and RdrClose(). We will associate each reader with the command’s thread, and cross-thread access should raise an exception.

Python


Similar blocking and safety measures will be added to the Python library OSPLink.

Act() and Versioned objects


At first sight, connection.Act(s) looks atomic and thread-safe, but the changes it makes could interfere with open Readers in the same connection. So its internal use of Command execution should make it comply with the same protocol.

Versioned objects do not use Command, but need to be subject to the same restrictions about the Connection being used. It is okay to use a versioned object in another Connection, but if that connection is busy in another thread, an operation such as Update or Delete will block until the connection is available. Versioned objects can easily be used in client-side cursor simulations.

REST


The REST API is unaffected by all the above because each call is effectively in a new Connection.

Comments from the community are welcome.