Monday, 5 September 2016

ETags and Row version validation

Web resources often come with semantically useful information (starting with the URI), for example RDF [1] or OWL types [2]. HTTP now offers ETags (Fielding and Reschke, 2014, RFC7232) in responses to allow caching of results, and ETags can be used for validating a step in a transaction. ETags are very similar to the RVV concept[3] or MVCC [4], successfully used in PostGRES [5], SQL server [6] Oracle [7] etc to provide optimistic execution.  
The purposes of the two concepts, however, remain different, and become more clearly visible when we try to generalise their use to more complex situations such as composite databases, updatable views, joins, and general queries.
For Pyrrho in simple cases the ETag can coincide with the strong RVV validator (CHECK) for requests that return a row of a base table: so implementing this feature amounts to an extension of the Pyrrho API's Versioned mechanism (See also this discussion on REST in Spring).
In this blog I have previously suggested that RVV should be updated if related data changes, but I have been persuaded that the validator should just be for the contents of the row itself.
In that case, I want to extend the idea of RVV to updatable views and joins. In such objects, each row has a direct relationship with specific base table rows, so an RVV for the view will reflect changes to these rows.
It is a moot point whether the result of a query is a Web resource. It is possible that it contains nothing directly copied from any base table, but if an ETag was returned with the result it would contain the information that the result has not changed since the last time the it was computed.
Also, we should bget the same versioning information from a View that we would get with the associated SELECT statement.
So, consider the following:

(a) RVV

The RVV if present includes row version validators for any base table row whose values appear in the result row.
The RVV will be absent for a row that does not identify any single row in a contributing base table (e.g. some of the rows in select distinct or  group by). RVVs are computed for base tables using the Laiho/Laux algorithms, and indicate the transaction point that last modified the values of the row in the base table. For query results, an RVV is copied into a result row by the first contributing row and removed if more than one row contributes.
RVV can be used to identify a set of rows to be deleted or updated, and so have an immediate use in the API in an application-side object to identify corresponding individual base table row(s) in the database.

(b) ETag

The ETag if present indicates the latest transaction points of the values used to compute the values in the row.
The ETag will be absent if the row does not come from any database contents (e.g. the row consists of constants, current_time, current_user etc). There is an obvious algorithm for computing ETags from RVVs. For rows coming from base tables the ETag is the same as the RVV. When another row contributes, the ETag becomes the later transaction point.
ETags can be used for HTTP caching as described in RFC7232. However, we note that for complex queries the server would have to recompute the result to see if the ETag is valid.
If results come from more than one database, RVVs and ETags will be lists of values, at most one from each database.
From today, The CHECK pseudofunction in Pyrrho gives the RVV if defined for a row, and the ETag returned with a REST result is for the first result in the result set. If an ETag is supplied with a REST request, it is checked for validity.
The RVV is also used in the Pyrrho API, so that Find<C> and similar functions return Versioned objects that can be used with the API functions Put, Post, and Delete, as described in the Pyrrho manual.

[1] W3's RDF documentation
[2]W3's OWL documentation
[3 Laiho,M, Laux, F: (2010) Implementing Optimistic Concurrency Control for Persistence Middleware Using Row Version Verification
[4]MVCC
[5]Amit Kapila, (2015) Well-known Databases Use Different Approaches for MVCC
[6]Steve Howard (2012) SQL Server MVCC with read_committed_snapshot
[7]Oracle Database Concepts: Chapter 13 Data Concurrency and Consistency
[8]The concept of strong validator dates from sec 13 of the HTTP/1.1 definition (Fielding et al 1999)

1 comment: