Friday 19 August 2016

Composite Databases

Composite (or Layered) Databases access data from other servers using REST. They open the possibility of integrating databases from multiple platforms incrementally using VIEWs. Data obtained from a remote system through a view is not stored locally, and if permissions have been appropriately set up, these views can be updatable.

The design should be interoperable across many platforms and DBMS, using the widely-implemented row-versioning mechanisms for DBMS and entity tags for HTTP (RFC7232). Both RVV and ETags can be strong validators that help with transaction management even in the absence of two-phase commit guarantees for transactions, and they also make updatable joins more intuitive. The mechanism offered in Pyrrho makes no assumptions on the format of these validators.

To make views work in this way in SQL, we need to be able to specify the resulting row type of the view and the URL to GET the source data. With some reasonable conventions on the REST interface this information can be sufficient to implement associated POST, PUT and DELETE operations (assuming that the remote data owners have made the necessary permissions available), and so the SQL view will be updatable using ordinary SQL INSERT, UPDATE and DELETE, with the  application of integrity and referential constraints, triggers etc.

The advantages for data integration are considerable, and I look forward to further work in this area.
For now, I offer a very simple example that you can try out. Suppose we have two databases A and B on different servers, and A offers a REST interface on port 8180 (see below). We suppose A is a generally-available list of employees, and B in the HR department gives details of their posts and salaries.

Let us define A and add some sample data:

create table D(eid int primary key, fname char, sname char)
insert into D values(1,'Joe','Soap'),(2,'Betty','Boop')

And B:

create table H(eid int primary key, job char, salary int)
insert into H values (1,'Cleaner',12500)
insert into H values (2,'Manager',31400)

HR will use the following mechamism to give a usable employee list:

create view W of (eid int, fname char, sname char) as get 'http://localhost:8180/A/A/D'
create view V as select * from W natural join H
select * from V

Creating the view copies no data!. And then the following changes work:
update v set fname='Elizabeth' where eid=2
insert into v(eid,fname,sname,job,salary) values(3,'Fred','Smith','Janitor',22160)
delete from v where sname='Soap'
delete from v where eid=2

The changes are applied to both databases as needed.
To set up the two servers on the local machine, you can use these commands:
For A: ospsvr
For B: ospsvr -p:5434 -s:0 -S:0
Then A uses Pyrrho's HTTP port 8180.
To set up the database A, use ospcmd A woth the above two lines.
To send the above commands for B use ospcmd -p:5434 B .

Of course, in practice things will not be quite so simple, but it is reasonable to assume some cooperation and suitable tuning from the REST providers, especially if they will be providing privileges to the composite database. In this way we can assure quality of the combined data without disturbing ownership of the data.

1 comment: