Thursday, 6 April 2017

Anonymous row types and monotonic functions

A previous blog post explained about the usefulness of mutually inverse function pairs. It is also useful to know if a function is monotonic. Many joins can be speeded up if the join condition uses a function that is known to be monotonic, so that the table operands are then automatically sorted before forming the join.
Most CASTs are monotonic, and Pyrrho now allows a user-defined function to be declared MONOTONIC through the use of metadata.
For example
create function f(b date) returns (int,int) monotonic return (extract(year from b),extract(month from b))
This function could be used to help join a table with a date primary key to one with a primary key containing year and month, using a join condition such as WHERE (y,m)=f(b) .
In this example we also two small innovations in the current version, as (a) the return type of the function is an anonymous row type, (b) no real distinction is made in Pyrrho now between ON conditions in inner joins, and WHERE conditions in cross joins.
In recent versions of the SQL standard ON conditions can be any boolean expression, so by default Pyrrho combines them and decides which is faster, and whether an index can be used to speed things up.

Friday, 31 March 2017

Adapter functions, updatable views and joins

With the introduction of REST Views it has become more interesting to support updatable views and joins, and more important to support adapter functions. The Beta version 5.7 of Pyrrho makes some important contributions in these aspects.
In the following sample, we model a situation where databases have columns that nearly match but do not quite do so. For example suppose Table A(B,C) has an integer primary key B, while table D(E,G) uses corresponding string values all prefixed by the letter H as its primary key E. It is easy to define a suitable adapter function F

create function f(x int) returns char return 'H'|| cast(x as char)

and create a view of A that makes sense in D's database:

create view v as select f(b) as fb,c from a

D, however, wishes to be able to update table A, so would like this view to be updatable. Pyrrho allows the declaration of an inverse of F:

create function f1(a char) returns int inverts f return cast(substring(a from 1) as int)

where the phrase "inverts f" is parsed as metadata for the new function f1 (otherwise everything so far is ordinary standard SQL). This makes f and f1 into mutually inverse functions, that is, declaring f1 as an inverse for f also declares f as an inverse of f1. The machinery works for multiple parameters using row types for the return values.

This makes v updatable and we can write

insert into v values('H91','This is new')

We can also use such adapter functions in referential constraints, e.g. a slight extension to SQL in Pyrrho allows us to declare the above relationship between tables A and D as follows:

create table d (e char references a using f1,g int)

The following SQL sample code demonstrating the above ideas is supported by Pyrrho 5.7 as of today.

create function f(x int) returns char return 'H'|| cast(x as char)
[create function f1(a char) returns int inverts f return cast(substring(a from 1) as int)]
select f(45) from static
select f1('B67') from static
create table a(b int primary key,c char)
insert into a values(23,'Twenty3'),(42,'Forty2')
create view v as select f(b) as fb,c from a
select * from v
insert into v values('H91','This is new')
table a
create table d (e char references a using f1,g int)
insert into d values('H23',234),('H91',567)
create view w as select c,e from a left join d on b=f1(e)
delete from w where c='Twenty3'
table a


Friday, 17 March 2017

Composite Database example with details

This post gives more details of the example contained in the posting Composite Databases in this blog.

The following transcripts use the beta version 5.7 of Pyrrho dated 16 March 2017, and using localhost instead of servA, with servers A and B using folders \A and \B respectively. I have set a debugging –D flag on server A so that that we can see the use of RVVs and ETags.
We begin by setting up the databases on servers A and B:
Database A:
create table D (e int primary key, f char, g char)


insert into D values (1,'Joe','Soap'), (2,'Betty','Boop')

Database B:

create table H (e int primary key, k char, m int)
insert into H values (1,'Cleaner',12500), (2,'Manager',31400)
[create view W of (e int, f char, g char) as get
'http://localhost:8180/A/A/D']

create view V as select * from W natural join H

The square brackets here are added because of the embedded newline added in the formatting of the page.
After setting up the databases on A and B with B’s views defined, we see the transaction log contents for A and B. Some of the numbers shown will be used in RVV and ETags in what follows.



We see at position 381 that the URL http://localhost:8180/A/A/D has been provided in metadata for the view W. W was defined in position 366 in terms of the anonymous structure with columns E, F, G declared at position 290.

The use of position numbers instead of identifiers in the definition of view V at position 439 is a standard feature of Pyrrho to allow renaming of objects.
In the blog post we now have the following operations on database B:
select e,f,m,check from V where e=1

Note that in normal use there is no need to request the check pseudocolumn: it is here so can show what is happening within the two databases. The database API uses it to implement the Versioned feature in client side “database model” classes.
Here, the check value was requested explicitly in the SELECT statement, and shows that this row of the join uses a row from A with defining position 209 placed there in transaction 193 (see the log for A) and a row from B with defining position 208 arising from transaction 192 (see the log for B).
The debug information for server A shows the REST request from B to A, and the ETag it constructed.

The ETag consists of an RVV for the first row of the result (mentioned above), and a readCheck for the read operation carried out by A. This was a specific row in table D (position 69) with key (1) .
The next operation is
update v set f='Elizabeth' where e=2


We see there is now an updated ETag supplied by A showing the new transaction that has updated the record defined at 241.
Also check B’s view using the join:

The next operation is an Insert into the View/RestView/Join combination:
[insert into v(e,f,g,k,m)
values(3,'Fred','Smith','Janitor',22160)]



This time the readCheck information indicates it will conflict with any read operation on table 69.
And again verifying the view from B:


Finally, we try a deletion from the View/RestView/Join combination:



Thursday, 8 September 2016

Improvements to the Object-Oriented API

For years now, Pyrrho has offered an object-oriented API for C#, and will generate a C# class definition for any base table for any given database and role. Tutorials offered demos of how these classes could be extended by adding in relational data, and over the years versioning was added so that all such C# classes are subclasses of Versioned.
The mechanisms have now been enhanced in the following important ways:
  1. Full class definitions are now available for Views and structured types as well as base tables,
  2. Deep structure is supported: Pyrrho will define nested classes for subjects in tables or views, and for user-defined types. (All such classes are Versioned too)
  3. The Versioned base class now includes a readCheck in addition to a row Check.
As discussed in this blog, for base tables, the row Check is a row version validator in the sense of Laiho and Laux's paper, and for other data may be an ETag in the sense of RFC 7232. In complex cases (e.g. results that do not correspond directly to base table rows) either or both may be empty.
The readCheck is for checking later whether the query used to obtain the data would get the same results.
Class definitions can be found in the "Role$Class" table.Such a class C can then be used with the database connection with the following methods added to the usual IDbConnection interface:
  • C FindOne()
  • C[] FindWith(string where)
  • C[] FindAll()
  • Post(C s)
  • Put(C s)
  • Delete(C s)
  • Check(string check)
  • Check(string check,string readCheck)
Triggers and generated columns apply, and Post and Put update affected fields of s along with the Versioned values.

Monday, 5 September 2016

ETags and Row version validation

Web resources often come with semantically useful information (starting with the URI), for example RDF [1] or OWL types [2]. HTTP now offers ETags (Fielding and Reschke, 2014, RFC7232) in responses to allow caching of results, and ETags can be used for validating a step in a transaction. ETags are very similar to the RVV concept[3] or MVCC [4], successfully used in PostGRES [5], SQL server [6] Oracle [7] etc to provide optimistic execution.  
The purposes of the two concepts, however, remain different, and become more clearly visible when we try to generalise their use to more complex situations such as composite databases, updatable views, joins, and general queries.
For Pyrrho in simple cases the ETag can coincide with the strong RVV validator (CHECK) for requests that return a row of a base table: so implementing this feature amounts to an extension of the Pyrrho API's Versioned mechanism (See also this discussion on REST in Spring).
In this blog I have previously suggested that RVV should be updated if related data changes, but I have been persuaded that the validator should just be for the contents of the row itself.
In that case, I want to extend the idea of RVV to updatable views and joins. In such objects, each row has a direct relationship with specific base table rows, so an RVV for the view will reflect changes to these rows.
It is a moot point whether the result of a query is a Web resource. It is possible that it contains nothing directly copied from any base table, but if an ETag was returned with the result it would contain the information that the result has not changed since the last time the it was computed.
Also, we should bget the same versioning information from a View that we would get with the associated SELECT statement.
So, consider the following:

(a) RVV

The RVV if present includes row version validators for any base table row whose values appear in the result row.
The RVV will be absent for a row that does not identify any single row in a contributing base table (e.g. some of the rows in select distinct or  group by). RVVs are computed for base tables using the Laiho/Laux algorithms, and indicate the transaction point that last modified the values of the row in the base table. For query results, an RVV is copied into a result row by the first contributing row and removed if more than one row contributes.
RVV can be used to identify a set of rows to be deleted or updated, and so have an immediate use in the API in an application-side object to identify corresponding individual base table row(s) in the database.

(b) ETag

The ETag if present indicates the latest transaction points of the values used to compute the values in the row.
The ETag will be absent if the row does not come from any database contents (e.g. the row consists of constants, current_time, current_user etc). There is an obvious algorithm for computing ETags from RVVs. For rows coming from base tables the ETag is the same as the RVV. When another row contributes, the ETag becomes the later transaction point.
ETags can be used for HTTP caching as described in RFC7232. However, we note that for complex queries the server would have to recompute the result to see if the ETag is valid.
If results come from more than one database, RVVs and ETags will be lists of values, at most one from each database.
From today, The CHECK pseudofunction in Pyrrho gives the RVV if defined for a row, and the ETag returned with a REST result is for the first result in the result set. If an ETag is supplied with a REST request, it is checked for validity.
The RVV is also used in the Pyrrho API, so that Find<C> and similar functions return Versioned objects that can be used with the API functions Put, Post, and Delete, as described in the Pyrrho manual.

[1] W3's RDF documentation
[2]W3's OWL documentation
[3 Laiho,M, Laux, F: (2010) Implementing Optimistic Concurrency Control for Persistence Middleware Using Row Version Verification
[4]MVCC
[5]Amit Kapila, (2015) Well-known Databases Use Different Approaches for MVCC
[6]Steve Howard (2012) SQL Server MVCC with read_committed_snapshot
[7]Oracle Database Concepts: Chapter 13 Data Concurrency and Consistency
[8]The concept of strong validator dates from sec 13 of the HTTP/1.1 definition (Fielding et al 1999)

Friday, 19 August 2016

Composite Databases

Composite (or Layered) Databases access data from other servers using REST. They open the possibility of integrating databases from multiple platforms incrementally using VIEWs. Data obtained from a remote system through a view is not stored locally, and if permissions have been appropriately set up, these views can be updatable.

The design should be interoperable across many platforms and DBMS, using the widely-implemented row-versioning mechanisms for DBMS and entity tags for HTTP (RFC7232). Both RVV and ETags can be strong validators that help with transaction management even in the absence of two-phase commit guarantees for transactions, and they also make updatable joins more intuitive. The mechanism offered in Pyrrho makes no assumptions on the format of these validators.

To make views work in this way in SQL, we need to be able to specify the resulting row type of the view and the URL to GET the source data. With some reasonable conventions on the REST interface this information can be sufficient to implement associated POST, PUT and DELETE operations (assuming that the remote data owners have made the necessary permissions available), and so the SQL view will be updatable using ordinary SQL INSERT, UPDATE and DELETE, with the  application of integrity and referential constraints, triggers etc.

The advantages for data integration are considerable, and I look forward to further work in this area.
For now, I offer a very simple example that you can try out. Suppose we have two databases A and B on different servers, and A offers a REST interface on port 8180 (see below). We suppose A is a generally-available list of employees, and B in the HR department gives details of their posts and salaries.

Let us define A and add some sample data:

create table D(eid int primary key, fname char, sname char)
insert into D values(1,'Joe','Soap'),(2,'Betty','Boop')

And B:

create table H(eid int primary key, job char, salary int)
insert into H values (1,'Cleaner',12500)
insert into H values (2,'Manager',31400)

HR will use the following mechamism to give a usable employee list:

create view W of (eid int, fname char, sname char) as get 'http://localhost:8180/A/A/D'
create view V as select * from W natural join H
select * from V

Creating the view copies no data!. And then the following changes work:
update v set fname='Elizabeth' where eid=2
insert into v(eid,fname,sname,job,salary) values(3,'Fred','Smith','Janitor',22160)
delete from v where sname='Soap'
delete from v where eid=2

The changes are applied to both databases as needed.
To set up the two servers on the local machine, you can use these commands:
For A: ospsvr
For B: ospsvr -p:5434 -s:0 -S:0
Then A uses Pyrrho's HTTP port 8180.
To set up the database A, use ospcmd A woth the above two lines.
To send the above commands for B use ospcmd -p:5434 B .

Of course, in practice things will not be quite so simple, but it is reasonable to assume some cooperation and suitable tuning from the REST providers, especially if they will be providing privileges to the composite database. In this way we can assure quality of the combined data without disturbing ownership of the data.

Wednesday, 10 August 2016

Updatable joins and views

Today's version has improvements for views and joins. Base tables referred to in views and joins can now be modified from the view or join using INSERT, UPDATE and DELETE, subject to the obvious restrictions.  To be modifable, the view and join definitions must not contain aggregations or distinct, and the join conditions must be composed of conjunctions of equality conditions.

I notice that some DBMS including Oracle only allow one base table to be modified in such an operation. Pyrrho does not have this restruction.