Friday, 19 August 2016

Composite Databases

Composite (or Layered) Databases access data from other servers using REST. They open the possibility of integrating databases from multiple platforms incrementally using VIEWs. Data obtained from a remote system through a view is not stored locally, and if permissions have been appropriately set up, these views can be updatable.

The design should be interoperable across many platforms and DBMS, using the widely-implemented row-versioning mechanisms for DBMS and entity tags for HTTP (RFC7232). Both RVV and ETags can be strong validators that help with transaction management even in the absence of two-phase commit guarantees for transactions, and they also make updatable joins more intuitive. The mechanism offered in Pyrrho makes no assumptions on the format of these validators.

To make views work in this way in SQL, we need to be able to specify the resulting row type of the view and the URL to GET the source data. With some reasonable conventions on the REST interface this information can be sufficient to implement associated POST, PUT and DELETE operations (assuming that the remote data owners have made the necessary permissions available), and so the SQL view will be updatable using ordinary SQL INSERT, UPDATE and DELETE, with the  application of integrity and referential constraints, triggers etc.

The advantages for data integration are considerable, and I look forward to further work in this area.
For now, I offer a very simple example that you can try out. Suppose we have two databases A and B on different servers, and A offers a REST interface on port 8180 (see below). We suppose A is a generally-available list of employees, and B in the HR department gives details of their posts and salaries.

Let us define A and add some sample data:

create table D(eid int primary key, fname char, sname char)
insert into D values(1,'Joe','Soap'),(2,'Betty','Boop')

And B:

create table H(eid int primary key, job char, salary int)
insert into H values (1,'Cleaner',12500)
insert into H values (2,'Manager',31400)

HR will use the following mechamism to give a usable employee list:

create view W of (eid int, fname char, sname char) as get 'http://localhost:8180/A/A/D'
create view V as select * from W natural join H
select * from V

Creating the view copies no data!. And then the following changes work:
update v set fname='Elizabeth' where eid=2
insert into v(eid,fname,sname,job,salary) values(3,'Fred','Smith','Janitor',22160)
delete from v where sname='Soap'
delete from v where eid=2

The changes are applied to both databases as needed.
To set up the two servers on the local machine, you can use these commands:
For A: ospsvr
For B: ospsvr -p:5434 -s:0 -S:0
Then A uses Pyrrho's HTTP port 8180.
To set up the database A, use ospcmd A woth the above two lines.
To send the above commands for B use ospcmd -p:5434 B .

Of course, in practice things will not be quite so simple, but it is reasonable to assume some cooperation and suitable tuning from the REST providers, especially if they will be providing privileges to the composite database. In this way we can assure quality of the combined data without disturbing ownership of the data.

Wednesday, 10 August 2016

Updatable joins and views

Today's version has improvements for views and joins. Base tables referred to in views and joins can now be modified from the view or join using INSERT, UPDATE and DELETE, subject to the obvious restrictions.  To be modifable, the view and join definitions must not contain aggregations or distinct, and the join conditions must be composed of conjunctions of equality conditions.

I notice that some DBMS including Oracle only allow one base table to be modified in such an operation. Pyrrho does not have this restruction.

Thursday, 4 August 2016

Bookmarks replace Enumerators

In today's new version of Pyrrho, use of .NET Enumerator classes has been replaced with Bookmarks.
Throughout the source code in the new version you will see for statements of the form
for (var a=b.First();a!=null;a=a.Next()) {..} instead of foreach(var a in b).
This is because bookmarks cannot be used with foreach; but they bring other advantages which this posting will try to explain.
A quick reminder of how .NET's IEnumerators work: given a collection class C, GetEnumerator() returns an IEnumerator which initially does not have any value (it points before the first item in the collection). On such an IEnumerator, MoveNext() returns a boolean saying whether a move to the next item in the collection succeeds, Current gives the current member of the collection and Reset repositions the enumerator back to before the start. Evidently such enumerators contain mutable data, and MoveNext will modify the internal structure of the Enumerator class.
Pyrrho takes transactions and program state obsessively seriously. I like to control side effects of everything. Pyrrho uses immutable data structures wherever possible, e.g. for trees and even databases. Bookmarks can be implemented using immutable data (all fields are readonly) and this is a great advantage from my viewpoint.
So if b is a member of a collection class C, we can have an associated bookmark class (probably called C.Bookmark), with the following standard methods: 
  • a static method New(b). If the colection is empty New(b) will return null), otherwise it gives a bookmark for the first member of the collection. b.First() is implemented as New(b).
  • an instance method Next() which returns a bookmark for the next member of the collection, or null if there is none.
  • value() to access the current member of the collection.
I will get around to benchmarking the new version soon to see what effect on execution speed this change brings, and in the meantime I welcome comments!