Friday, 19 August 2016

Composite Databases

Composite (or Layered) Databases access data from other servers using REST. They open the possibility of integrating databases from multiple platforms incrementally using VIEWs. Data obtained from a remote system through a view is not stored locally, and if permissions have been appropriately set up, these views can be updatable.

The design should be interoperable across many platforms and DBMS, using the widely-implemented row-versioning mechanisms for DBMS and entity tags for HTTP (RFC7232). Both RVV and ETags can be strong validators that help with transaction management even in the absence of two-phase commit guarantees for transactions, and they also make updatable joins more intuitive. The mechanism offered in Pyrrho makes no assumptions on the format of these validators.

To make views work in this way in SQL, we need to be able to specify the resulting row type of the view and the URL to GET the source data. With some reasonable conventions on the REST interface this information can be sufficient to implement associated POST, PUT and DELETE operations (assuming that the remote data owners have made the necessary permissions available). We should expect that the mechanisms allow for the application of integrity and referential constraints, the application of triggers etc.

The advantages for data integration are considerable, and I look forward to further work in this area.
For now, I offer a very simple example that you can try out. Suppose we have two databases A and B on different servers, and A offers a REST interface on port 8180 (see below). We suppose A is a generally-available list of employees, and B in the HR department gives details of their posts and salaries.

Let us define A and add some sample data:

create table D(eid int primary key, fname char, sname char)
insert into D values(1,'Joe','Soap'),(2,'Betty','Boop')

And B:

create table H(eid int primary key, job char, salary int)
insert into H values (1,'Cleaner',12500)
insert into H values (2,'Manager',31400)

HR will use the following mechamism to give a usable employee list:

create view W of (eid int, fname char, sname char) as get 'http://localhost:8180/A/A/D'
create view V as select * from W natural join H
select * from V

Creating the view copies no data!. And then the following changes work:
update v set fname='Elizabeth' where eid=2
insert into v(eid,fname,sname,job,salary) values(3,'Fred','Smith','Janitor',22160)
delete from v where sname='Soap'
delete from v where eid=2

The changes are applied to both databases as needed.
To set up the two servers on the local machine, you can use these commands:
For A: ospsvr
For B: ospsvr -p:5434 -s:0 -S:0
Then A uses Pyrrho's HTTP port 8180.
To set up the database A, use ospcmd A woth the above two lines.
To send the above commands for B use ospcmd -p:5434 B .

Of course, in practice things will not be quite so simple, and column renaming, roles, adapter functions etc can be used to overcome differences between the physical databases.

Wednesday, 10 August 2016

Updatable joins and views

Today's version has improvements for views and joins. Base tables referred to in views and joins can now be modified from the view or join using INSERT, UPDATE and DELETE, subject to the obvious restrictions.  To be modifable, the view and join definitions must not contain aggregations or distinct, and the join conditions must be composed of conjunctions of equality conditions.

I notice that some DBMS including Oracle only allow one base table to be modified in such an operation. Pyrrho does not have this restruction.

Thursday, 4 August 2016

Bookmarks replace Enumerators

In today's new version of Pyrrho, use of .NET Enumerator classes has been replaced with Bookmarks.
Throughout the source code in the new version you will see for statements of the form
for (var a=b.First();a!=null;a=a.Next()) {..} instead of foreach(var a in b).
This is because bookmarks cannot be used with foreach; but they bring other advantages which this posting will try to explain.
A quick reminder of how .NET's IEnumerators work: given a collection class C, GetEnumerator() returns an IEnumerator which initially does not have any value (it points before the first item in the collection). On such an IEnumerator, MoveNext() returns a boolean saying whether a move to the next item in the collection succeeds, Current gives the current member of the collection and Reset repositions the enumerator back to before the start. Evidently such enumerators contain mutable data, and MoveNext will modify the internal structure of the Enumerator class.
Pyrrho takes transactions and program state obsessively seriously. I like to control side effects of everything. Pyrrho uses immutable data structures wherever possible, e.g. for trees and even databases. Bookmarks can be implemented using immutable data (all fields are readonly) and this is a great advantage from my viewpoint.
So if b is a member of a collection class C, we can have an associated bookmark class (probably called C.Bookmark), with the following standard methods: 
  • a static method New(b). If the colection is empty New(b) will return null), otherwise it gives a bookmark for the first member of the collection. b.First() is implemented as New(b).
  • an instance method Next() which returns a bookmark for the next member of the collection, or null if there is none.
  • value() to access the current member of the collection.
I will get around to benchmarking the new version soon to see what effect on execution speed this change brings, and in the meantime I welcome comments!

Friday, 22 April 2016

Supporting transactions in REST

From the next version 5.5, Pyrrho's HTTP service and REST interface will support the RFC7232 ETag fetaure (Fielding and Reschke, 2014). The strong validator used will be compatible with Pyrrho's VERSIONED mechanism for row data, and will enable the development of layered databases using Pyrrho's REST interfaces and extended VIEWs.

I do hope other REST implementations will soon follow RFC7232 as this will make REST even more attractive for implementing linked data. There is clearly some interest in the Spring community.
ETags can be used for validating a step in a transaction.

On the other hand we need to do more to implement transaction coordination, when a Pyrrho server makes HTTP calls to another REST service. A number of systems have different mechanisms for maintaining an open transaction in such circumstances, e.g. Neo4j. But in view of Pyrrho’s rather pedantic approach to transaction isolation and durability it seems best for Pyrrho to use its own mechanism. A Pyrrho transaction coordinator will identify itself as a unique UserAgent in these circumstances, including the coordinator host, port, with a timestamp and the transaction ID. This information will enable the transaction partners to coordinate their work and it can then be used in a simple two-phase commit sequence. For example, we could make the User-Agent string contain a link with the information, such as

Pyrrho 5.5 http://abc.com:8133/635970209933434980/4652

so that a Pyrrho server participating in the transaction can keep an HTTP server thread open for operations on its local databases for this transaction, until (a) it receives a GET/POST/PUT/DELETE for / which it will interpret as Check, Prepare, Commit and Rollback respectively , or (b) the timestamp is older than 20 minutes. If anything goes wrong, it will report 409 Conflict or 410 Gone.


References

Fielding, R; Reschke, J, eds (2014): Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests, RFC7232, https://tools.ietf.org/html/rfc7232.

Tuesday, 15 March 2016

Transaction diagnostics

Most DBMS will report transaction conflicts if concurrent transactions attempt to update the same data (since this is the classic "lost update" fault). Certainly it is a necessary condition for transaction serialisation that such attempts are prevented.
There has recently been some discussion in the community about whether transaction conflicts should also be reported when data that has been read in the transaction has been changed before the transaction commits. In combination with the above condition, this is sufficient to ensure transaction serialisation (so-called view serialisability).
For over ten years, Pyrrho has enforced this very strong requirement (it is the only transaction mode it supports) although for practical reasons the full read set of a transaction is not always available, and Pyrrho will report a transaction conflict in all cases of doubt.
From today, Pyrrho provides some additional information in the standard SQL diagnostics area to identify a reason that a transaction conflict has been reported, provisionally using the keyword WITH. Because of Pyrrho's role-based data models, transactions do not necessarily agree on table names etc, and database objects in such messages are usually identified by their numerical identities.

Saturday, 5 March 2016

Threading and PyrrhoLink


The thread-safety issues when database connections are shared among threads have been well-documented for years, and as threading becomes more widespread a number of famous libraries have been affected. While it is good practice to close database connections as soon as possible, an application can have many database connections open, and connections can be shared across threads.

The following considerations are planned for the PyrrhoLink/OSPLink implementations in C# and Python from version 5.4. They provide some protection from unsafe coding, and operations will only block while the connection is busy in another thread.

ADO.Net


Many complications involving ADO.NET such as DependentTransactions are irrelevant in Pyrrho because of strong transaction isolation and serialisation.

1.       We need to enforce only one ongoing transaction per connection. The situation is really likely only to arise if the connection is shared between threads, so a call to BeginTransaction should block until the connection is available. If it is the same thread it is probably a programmer mistake (nested or parallel transactions are not supported) and an exception should be raised.

2.       We need to enforce only one open reader per connection. In the single threaded case (or on the same thread), this likely to be a misunderstanding by the programmer (ADO.NET allows only one active reader per connection), and an exception should be raised. Otherwise, the call should block until the connection is available.

3.       We should definitely disallow the sharing of commands between threads because IDbCommand methods can set up the command text, parameters etc and it is easy to see that conflicts could arise. Each IDbCommand should be specific to a given thread, but a connection can have many commands.

4.       When a command is executed, the call should block until the connection is available, as the connection might be busy in another thread.

5.       We should disallow the sharing of readers between threads, because of possible races with Read(), data fetching, and RdrClose(). We will associate each reader with the command’s thread, and cross-thread access should raise an exception.

Python


Similar blocking and safety measures will be added to the Python library OSPLink.

Act() and Versioned objects


At first sight, connection.Act(s) looks atomic and thread-safe, but the changes it makes could interfere with open Readers in the same connection. So its internal use of Command execution should make it comply with the same protocol.

Versioned objects do not use Command, but need to be subject to the same restrictions about the Connection being used. It is okay to use a versioned object in another Connection, but if that connection is busy in another thread, an operation such as Update or Delete will block until the connection is available. Versioned objects can easily be used in client-side cursor simulations.

REST


The REST API is unaffected by all the above because each call is effectively in a new Connection.

Comments from the community are welcome.

Tuesday, 15 September 2015

A Python API for Pyrrho

As of today, OSPLink.py is available in the distribution and enables the open-source Pyrrho server OSPSvr to be accessed from Python 3.4 clients. The API has similarities to Pyrrho’s version of ADO.NET.
To use OSPLink.py, place it in the same folder as your Python script.

(Update 30 Sept 2015: OSPLink.py now includes Python implementations of the AWebSvr classes from TAWQT.com and does not depend on .NET. There is a tutorial on the PyrrhoDB website containing a fully-worked-out sample.)

For example (assuming OSPSvr.exe is running on the local machine):

from OSPLink import *
conn = PyrrhoConnect("Files=Temp;User=Fred")
conn.open()
try:
    conn.act("create table a(b date)")
except DatabaseError as e:
    print(e.message)
conn.act("insert into a values(current_date)")
com = conn.createCommand()
com.commandText = 'select * from a'
rdr = com.executeReader()
while rdr.read():
    print(rdr.val(0))
rdr.close()
print("Done")


For full details, see section 6.8 of the PyrrhoDB Manual.