Monday 25 July 2011

REST and role-based data models

Despite the popularity of REST approaches, there is still no standard mechanism for automatically generating URL patterns appropriate for a database. The early implementations from Microsoft had a very complex URL syntax, and the later MVC3 approaches simply expect programmers to create routing URLs.
There is not even any agreed way of deciding whether a data value should be given as an attribute or a child element of a given entity. Many data modelling systems allow a distinction between data values and attributes, and this can help. Let us suppose that the data model system is extended by adding a small set of metadata flags to indicate which tables are entities and which columns are attributes or properties.
Then the following modest proposal is offered as a natural url model for a REST service:
All data segments are URL-encoded so that SQL identifiers never need to be enclosed in double quotes. Matching rules are applied in the order given, but can be disambiguated using the optional $ prefixes. White space is significant (e.g. the spaces in the rules below must be a single space, and the commas should not have adjacent spaces).

http://host:port/database/role{/Selector}{/Processing}

Selector matches:
[$table ]id
[$procedure ]id[(string{,string})]
[$where ]id=string
[$select ]id{,id}
[$key ]string
Appending another selector is used to restrict a list of data to match a given primary key value or named column values, to navigate to another list by following a foreign key, or to supply the current result as the parameters of a named procedure, function, or method.

Processing matches:
distinct
ascending id{,id}
descending id{,id}
skip Int_string
count Int_string

For example, with an obvious data model, GET http://Sales/Sales/Orders/1234 returns a single row from the Orders table, while GET http://Sales/Sales/Orders/1234/OrderItem returns a list of rows from the OrderItem table, and GET http://Sales/Sales/Orders/1234/CUST/Customer returns the corresponding row from the Customer table.
A URL can be used to GET a single item, a list of rows or single items, PUT an update to a single item, POST a new item to a list, or DELETE a single row.
For example, PUT http://Sales/Sales/Orders/1234/DeliveryDate with posted data of 2011-07-20 will update the DeliveryDate cell in a row of the Orders table.
POST http://Sales/Sales/Orders will create a new row in the Orders table: the posted data should contain the XML version of the row. In all cases the new primary key value will be contained in the HTTP response.

Generated columns and updatable views

When developing the conceptual level of a database, new structure can be added by using views and generated columns. The programming model for this in the SQL standard is theoretically complete, but in practical examples the standard machinery using triggers can seem over-complex.
Recent programming frameworks (e.g. .NET 4) have focussed instead on providing up to 4 pieces of SQL for the GET, PUT, POST and DELETE operations for any named structure. This suggests an extension to the CREATE VIEW syntax as follows:
CREATE VIEW id AS QueryExpression [UPDATE SqlStatement] [INSERT SqlStatement] [DELETE SqlStatement]
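As a sketch of how a definition using this proposed syntax might look (the CurrentOrders view, the Orders table and its columns are invented for illustration, and referring to the incoming row's values by qualifying them with the view name is only an assumed convention, since the proposal does not fix one):
-- hypothetical use of the proposed CREATE VIEW extension
CREATE VIEW CurrentOrders AS
  SELECT OrderId, CustId, DeliveryDate FROM Orders WHERE Shipped = 'N'
  UPDATE UPDATE Orders SET DeliveryDate = CurrentOrders.DeliveryDate
         WHERE OrderId = CurrentOrders.OrderId
  INSERT INSERT INTO Orders (OrderId, CustId, DeliveryDate, Shipped)
         VALUES (CurrentOrders.OrderId, CurrentOrders.CustId, CurrentOrders.DeliveryDate, 'N')
  DELETE DELETE FROM Orders WHERE OrderId = CurrentOrders.OrderId
Here the first keyword of each optional clause (UPDATE, INSERT, DELETE) introduces the clause, and what follows is the ordinary SqlStatement the data service would run for PUT, POST and DELETE requests respectively; the defining query itself serves GET.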
Similarly the syntax for a generated column can be extended:
GenerationRule = GENERATED ALWAYS AS '('Value')' [UPDATE '(' Assignments ')' ]
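A similarly hedged sketch for the extended generation rule, using an invented Invoice table (whether assigning a value to the generated column is what triggers the UPDATE assignments is an assumption here):
-- hypothetical use of the extended GenerationRule
CREATE TABLE Invoice (
  InvId INT PRIMARY KEY,
  Net DECIMAL(10,2),
  Vat DECIMAL(10,2),
  Gross DECIMAL(10,2) GENERATED ALWAYS AS (Net + Vat)
        UPDATE (Net = Gross - Vat) -- assignments to apply when a new Gross value is supplied
)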
The implementation of these ideas is extremely straightforward so they are good candidates for inclusion in the next version of Pyrrho.
More importantly, together with the following syntax extensions:
SqlStatement =| HTTP (POST|UPDATE|DELETE) url ['('Value')']
QueryExpression =| HTTP GET url
we can see how a local database can include or refer to data selected from an external database.
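For instance, a local database might contain definitions such as the following (the host saleshost, port 8180, and the remote Sales database and role are assumptions, as is the quoting of the url token):
-- remote data included as a local view via the proposed HTTP GET query expression
CREATE VIEW RemoteCustomers AS HTTP GET 'http://saleshost:8180/Sales/Sales/Customer'
-- and, mirroring the PUT example above, as an SqlStatement
HTTP UPDATE 'http://saleshost:8180/Sales/Sales/Orders/1234/DeliveryDate' (DATE '2011-07-20')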
It is quite interesting to consider the effect of such extensions on the transaction model. One reasonable approach would be to recommend that for long transactions database designers use Web-based transaction control as in JAX-RS. Another would be to tunnel such web requests in duplex streams established for each web host referenced in the transaction, which would be quite suitable for short transactions. For example, instead of the sequence
PUT http://myhost/mydb/url1 (+20)
PUT http://myhost/mydb/url2 (-20)
send
POST http://myhost/mydb/strtrans (PUT /url1 (+20) PUT /url2 (-20) COMMIT)

Saturday 9 July 2011

Reflection and Relationships

Relational databases need to support two-way relationships, especially now that such relationships are so easily navigated using REST-style URLs. This article proposes a simple approach to defining path navigation in the relational database at physical or conceptual level. The resulting generated columns (or properties) are of type ROW ARRAY and are quite difficult to use in SQL, but very easy from REST.
Interestingly, the Java Persistence Query Language provides query language support for such two-way relationships, but requires a lot of annotation infrastructure.
For example (inspired by the Java Tutorial), suppose we have tables
player: (pid int primary key, pname char)
team: (tid int primary key, tname char, capt int references player)
playerteam: (pid references player, tid references team)
This club has a number of teams for different purposes, and players can belong to (or even captain) more than one team. Then the syntax proposed here would allow us to write definitions as simple as
alter table player add captains reflects team
alter table player add playsin reflects playerteam
Then in the REST interface we can write
http://myclub.org/db/db/player(pname='Fred Bloggs')/captains/tname
to obtain the names of all the teams Fred captains.
The syntax is one of the alternatives for ColumnDefinition, and defines a generated-always column:
| id REFLECTS [Cols IN] Table_id [Cols]
Recall that ColumnDefinition is also used to define a property in a role-based data model, and of course in that context entity and property identifiers can be used in place of table and column identifiers.
The REFLECTS definition sets up an array-valued generated column as a reverse reference: e.g. if t.f references g(i) then “alter g add h reflects t” is a shorthand for “alter g add h row t array generated always as (select * from t where f=i)”. That is, each cell in the generated column contains a list of rows from the referenced table.
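Applying that rule to the player and team tables above (t=team, f=capt, g=player, i=pid), the captains definition would expand to something like the following sketch:
alter table player add captains reflects team
-- is shorthand for roughly
alter table player add captains row team array generated always as
  (select * from team where capt = pid)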
The optional [Cols] following the referenced Table_id allows selection of a particular foreign key if there is more than one.
For many-many relationships the referenced table will be a secondary table, so that the type of the generated column is defined only implicitly. Still the basic idea is simple enough. Suppose we have a many-many relationship R between tables A and B. In a relational database this will be implemented by means of a secondary table AB consisting of pairs of primary keys for A and B. In both A and B we can easily define additional generated columns for our convenience (e.g. for REST navigation):
alter table A add X reflects AB
alter table B add Y reflects AB
Then in each row of A, X will be a list of rows from table B; and in each row of B, Y will be a list of rows from table A. In practice we would probably name these properties B and A rather than X and Y to make the resulting REST-based navigation more intuitive.
Again, the optional column specification [Cols IN] allows selection of a particular foreign key in the secondary table. In the above player and team example we could define
alter table player add captains reflects team(capt)
alter table team add players reflects (pid) in playerteam(tid)
The full details of the [Cols IN] option are as follows. [Cols IN] is only permitted in relationships that navigate a secondary table. In a definition such as "alter S add C reflects (TF) in MM (SF)", MM will be a secondary table implementing a many-many relationship between table S and some other table T, containing (at least) foreign keys SF and TF, where TF references the target table T and SF references the source table S. Then the values of C will be lists of rows of T.
The final [Cols] syntax allows selection of a foreign key that references the source table, and can be used in any REFLECTS construct.
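So, as a rough sketch (not a definitive expansion), the players definition above would stand for a generated column along the lines of:
alter table team add players row player array generated always as
  (select p.* from playerteam pt join player p on p.pid = pt.pid
    where pt.tid = tid)
-- each cell holds the list of player rows reached through playerteam for that team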

Wednesday 6 July 2011

On concurrency tokens

Microsoft states for .NET Framework 4: "WCF Data Services supports the optimistic concurrency model by enabling you to define a concurrency token for an entity. This concurrency token, which includes one or more properties of the entity, is used by the data service to determine whether a change has occurred in the data that is being requested, updated, or deleted."
Now this is interesting as it suggests a growing realisation that more needs to be done about optimistic concurrency. But alas, all that is done here is to declare a property as useful for applications to test: it comes with no guarantees.
Pyrrho already has (mandatory) optimistic concurrency control, but still, it is an attractive notion to have a test to perform during a transaction ("is it already too late to commit?").
So from the next version (4.5) of Pyrrho, there will be a boolean property of transactions. Requesting the value of Transaction.Conflict will cause a server round trip to discover if other transaction commits have already made committing this transaction impossible.
Whatever the result of this test, the Transaction will remain open as it probably still contains useful information, until Commit or Rollback is called.

Tuesday 5 July 2011

Role-based Data Models

It does appear that Entity Frameworks loom ever larger in the data access world, whether they are used in a code-first or model-first way. But any database can support several data models.
It has been a common occurrence that Java programmers will use different data models (as defined by annotations in their program) to access the same database. This is because annotations can provide different entity and property names and map their relationships in different ways. Pyrrho (up to version 4.4) therefore regarded the data model in use as a thread-specific structure. This now seems quite the wrong thing to do.
Instead, Pyrrho now takes its cue from the business analytics and business modelling field, and considers that it will be best practice for each role to have a different view of the data in the database, and it is at such a conceptual model level that entity modelling should take place.
So, from version 4.5 of Pyrrho, a data model will be regarded as a property of a Role. The physical database defines a default data model, and an associated default Role. This default Role treats each base table with a primary key as an entity, and each column of such a table as a property, and allows navigation using foreign keys. Multi-column keys are handled as lists whose property name is the reserved word KEY.
[Updated 2 August: Users with admin access to a role can create new database objects for their role. Granting access to database objects for a Role copies the necessary entity and property definitions into the Role's data model. The data model that is applied is determined by the current Role: this is initially set in the connection but can be changed during the session to any other role that the user has been granted.]
The data model can be modified for a Role by hiding entity types or properties, or by defining entity types and properties that rename inherited ones (tables or views and their columns). Descriptive text can be associated with the entity and property definitions. An additional syntax for ALTER ROLE has been created to provide this facility.
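The exact ALTER ROLE syntax is not reproduced here; the following is a purely hypothetical sketch of the kind of statements intended, with invented keywords and names that should not be read as confirmed Pyrrho grammar:
-- hypothetical illustration only, not actual Pyrrho syntax
ALTER ROLE Sales RENAME TABLE ORD TO "Order"            -- friendlier entity name for this role
ALTER ROLE Sales RENAME COLUMN ORD.DT TO "DeliveryDate" -- friendlier property name
ALTER ROLE Sales HIDE COLUMN ORD.INTERNALFLAG           -- drop a property from this role's model
ALTER ROLE Sales COMMENT ON TABLE ORD IS 'Customer orders placed through the web site'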
This allows data models to have more user-friendly names and allows Roles to be localised for different languages. Pyrrho 4.5 enhances generated columns by introducing updatable generated columns in addition to updatable views, so that additional derived columns can be added to the database schema where required to make data models more natural. It is a feature of Pyrrho that such derived columns are only computed when required, so there is little overhead in using them.
[Added 2 August: The SQL contained within e.g. stored procedures executes with definer's rights. Although there is only one copy of the SQL in the database, a developer viewing it will see a version adapted to their current data model, even though it still executes with definer's rights. The resulting dependencies between roles and their data models are tracked by Pyrrho, and the usual restrict/cascade protocols apply to changes to the data model schema.]
The data model can be retrieved from the database in the usual way in EDMX format, and will determine the URL mapping used in Pyrrho's new secure REST service. SQL will of course continue to be applied at the level of base tables. Pyrrho's implementation of LINQ and Java Persistence will also use the new role-based data models: for these APIs the change will only be noticed if a Role is specified in the connection string.
In Pyrrho, strong types, including types with embedded metadata, will continue to be supported in the physical database. The new derived column machinery can present the same physical data as different types as appropriate for role-based data models.
Over the summer I hope to publish more articles on the version 4.5 goodies, including some better support for role-based security.