Thursday 10 December 2009

Pyrrho and the Cloud

A number of writers, considering the poor support for transactions and locking, have concluded that cloud storage might not be suitable as a data store for most line-of-business applications (Chantry 2009; Abadi 2009; Wei et al 2009). Brantner et al (2009), reviewing the support for databases in the cloud, note that although some services are quite powerful, they might not directly satisfy the requirements for a given task. In particular, storage providers usually give only relaxed consistency guarantees, and storage services are “orders of magnitude” slower than locally attached disks. Abadi (2009) discusses the limitations of the “simple databases” offered by cloud services such as Amazon and Yahoo, noting that they simply relax the ACID requirements for transactions and do not guarantee consistency.
The various cloud service providers differ most of all in their choice of implementation language. Microsoft’s Windows Azure platform, based on .NET and its language set, notably C#, provides three sorts of storage: Blobs, Tables, and Queues, which are accessible either using Web Roles (from IIS) or Worker Roles (from Azure). Azure hosted services can also be accessed using WCF and REST, but so far I haven’t been able to access Azure Storage by this route. Distressingly, the web pages generated to access Azure storage using the November 2009 samples seem to have a lot of hidden authorisation stuff, and I haven’t been able to generate my own client applications to consume Azure Storage data. It is all still clearly under development, and is only due for release in 2010.
Still, I am sure that before too long we will be able to use WCF Streaming from Azure storage, so I am assuming this in the following design ideas.
Many authors draw attention to the CAP theorem whereby a shared-data system can choose at most two out of the three properties: consistency, availability, and tolerance to partitions. DBMS designers usually insist on consistency at all costs. For this reason, we will end up with a solution that retains a transaction master server for each database, while taking advantage of the cloud for data replication and transaction routing. The paper by Wei et al (2009) presents essentially the same solution as the one proposed here.

Pyrrho’s Binary Storage Architecture

Pyrrho operates a layered architecture described in the documentation using the diagram shown. In this diagram the bottom two layers represent durable storage (e.g. file storage on the hard disk). It is natural to focus on using Cloud storage as an alternative form of durable storage, and the concept of a logical file consisting of binary segments, all but the last of which will never require updating, makes Pyrrho particularly attractive for such a design.

In fact, the original design of Pyrrho envisaged completely immutable file data, where each committed transaction would simply append new durable data to the file.
The current physical design of these two layers is not ideally suited to use with the cloud, however. In the first place, there is a tamper-proofing end-of-file marker which is overwritten by the next transaction commit. Secondly, the segment size used on the hard disk is measured in gigabytes, which is too large for current internet implementations. Finally, there is exactly one disk write for each transaction commit.
Taking this last point first, all of Pyrrho’s competitors, when committing a transaction finally to disk storage, write in many places in the disk file, so that multiple disk writes are required. Measurements of disk traffic show that under heavy transaction loading, Pyrrho uses fewer disk writes by a factor of around 70 compared to commercial DBMSs. Nevertheless, the default transaction mode used is that each SQL statement will autocommit, so that there is approximately one disk update for each INSERT/UPDATE/DELETE statement. If cloud storage is used, there will be one server round-trip corresponding to each storage change: this strongly motivates the use of explicit BEGIN TRANSACTION and COMMIT if many changes are to be made, as this would tend to group associated modifications in a single batch. It is an obvious first step towards implementation to modify Pyrrho’s autocommit policy so that by default, transactions are committed when a connection is closed rather than at completion of each statement. This will have the effect of adding to cloud storage in larger chunks, and this seems to point to a natural solution for the segment size issue mentioned in the second point.
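To make the round-trip argument concrete, here is a small sketch (in Python, with invented names, not Pyrrho's actual code) contrasting per-statement autocommit with an explicit transaction that batches all changes into a single append to durable storage:

```python
class Storage:
    """Stands in for durable (disk or cloud) storage; counts round-trips."""
    def __init__(self):
        self.round_trips = 0
        self.data = []

    def append(self, batch):
        self.round_trips += 1          # one network/disk round-trip per append
        self.data.extend(batch)

class Connection:
    def __init__(self, storage, autocommit=True):
        self.storage = storage
        self.autocommit = autocommit
        self.pending = []

    def execute(self, change):
        self.pending.append(change)
        if self.autocommit:
            self.commit()               # one round-trip per statement

    def commit(self):
        if self.pending:
            self.storage.append(self.pending)   # all changes in one batch
            self.pending = []

# Autocommit: three statements cost three round-trips.
s1 = Storage()
c1 = Connection(s1, autocommit=True)
for row in ("a", "b", "c"):
    c1.execute(("INSERT", row))

# Explicit transaction: the same three statements cost one round-trip.
s2 = Storage()
c2 = Connection(s2, autocommit=False)
for row in ("a", "b", "c"):
    c2.execute(("INSERT", row))
c2.commit()

print(s1.round_trips, s2.round_trips)   # 3 1
```

The same data reaches storage in both cases; only the number of round-trips differs, which is the whole point when each round-trip crosses the internet.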
For the segment size of cloud storage, there is obviously a balance to be struck between the practical size of a binary transmission and the number of connection requests managed by the network. Assuming an efficient streaming implementation, which we will come to in a moment, the natural thing to do would be to store the data from each transaction in a new segment to be committed to cloud storage.
This leaves the question of the end-of-file marker, which we no longer need. Its purpose was to ensure that the contents of the entire data file had not been changed since they were committed. Instead, we simply store the same marker data in the cloud storage along with the associated segment: the marker can be used to check the entire data file up to that point.
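One way to realise such a per-segment marker is a chained digest: each segment's marker incorporates the previous marker, so the latest marker authenticates the entire file up to that point. This is only an illustrative sketch (Pyrrho's actual marker format is not shown here):

```python
import hashlib

def marker(prev_marker: bytes, segment: bytes) -> bytes:
    # Chain the previous marker into the new one, so the most recent
    # marker authenticates every earlier segment as well.
    return hashlib.sha256(prev_marker + segment).digest()

segments = [b"txn-1 physicals", b"txn-2 physicals", b"txn-3 physicals"]
markers = []
m = b""
for seg in segments:
    m = marker(m, seg)
    markers.append(m)            # stored in cloud storage with each segment

def verify(segments, stored_marker):
    """Recompute the chain over the given segments and compare."""
    m = b""
    for seg in segments:
        m = marker(m, seg)
    return m == stored_marker

print(verify(segments, markers[-1]))                       # True
print(verify([b"tampered"] + segments[1:], markers[-1]))   # False
```

Because the chain is cumulative, checking the data file up to any segment only requires the marker stored with that segment.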

Following this step, the Pyrrho DBMS server can operate using local (disk) storage and remote (cloud) storage. So far, this model could be conveniently hosted by a simple REST/bytestream service such as Amazon S3. Older parts of the database can be replicated without loss of consistency. We consider next how we can ensure that the data is up-to-date.
Transaction Serialisation
Pyrrho’s transaction design is actually very well suited to sharing cloud storage between database engines, because its transaction serialisation handling is performed at the level of the binary records (“Physical”s) that are written to the durable storage when a transaction commits. Pyrrho uses optimistic locking, which is also very well suited to use with cloud storage, since it would be a bad idea to have an exclusive lock on the Cloud service.
For a proper understanding of the mechanism, it is necessary first to mention that the tree and queue data structures managed in the Logical Database layer have the property that all their elements are immutable: any change results in a new head structure, and following links through new components of the data structure usually leads to components of the original structure.
So, at the start of a transaction, new headers for the database(s) involved are created, which initially lead to the current committed state: if a few transactions are in progress, there will be several versions of the database(s), mostly coinciding (on unchanged portions). The changes associated with the transaction each have a Physical which will be written to durable storage along with a descriptor of the transaction itself.
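A minimal persistent tree in the spirit described above (a sketch, not Pyrrho's internal Tree classes): an insert returns a new root that shares unchanged subtrees with the old one, so each in-progress transaction sees its own consistent snapshot cheaply.

```python
class Node:
    __slots__ = ("key", "value", "left", "right")
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value, self.left, self.right = key, value, left, right

def insert(node, key, value):
    # Never mutate: build a new node on the path from root to the change,
    # reusing the untouched subtrees of the original structure.
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    return Node(key, value, node.left, node.right)   # replacement, as a new node

def lookup(node, key):
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None

committed = None
for k in (2, 1, 3):
    committed = insert(committed, k, f"row{k}")

# A transaction's snapshot: a new head that initially leads to the
# committed state, then diverges only where it makes changes.
txn = insert(committed, 3, "row3-updated")

print(lookup(committed, 3), lookup(txn, 3))   # row3 row3-updated
print(txn.left is committed.left)             # True: unchanged subtree is shared
```

Several such heads can coexist, mostly coinciding on unchanged portions, which is exactly the property the transaction design relies on.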
Whether such ongoing transactions conflict can be determined by examining just these Physicals. For example, two updates to the same row of a table will conflict, an insert in a table will conflict with dropping the table, etc. This means that transaction serialisation can be determined at the Physical Records level of the database.
There are two subtleties to the resulting design, which apply equally to local and remote storage. The first relates to relocation of Physicals, the second to transaction conflicts where reads conflict with changes.
The relocation issue arises because Pyrrho uses the (immutable) defining addresses of data in Physical records: so that for example a Column will refer to its parent Table by giving its defining address rather than its (mutable) name. For this reason a group of Physicals in a transaction are very likely to refer to each other using their positions in the data file. While transactions are in progress, nobody can tell which transaction will commit first, so the positions of Physicals in the data file are generally not known until the Commit succeeds. The physical layer of the DBMS therefore relocates the transaction to its final place in the data file.
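The relocation step can be sketched as follows (the record layout and field names here are invented for illustration): while the transaction is in progress its Physicals carry provisional positions; at Commit they are rebased to their final offsets at the end of the data file, and any intra-transaction references are patched to match.

```python
PROVISIONAL_BASE = 1 << 40   # provisional positions sit far above any real offset

def relocate(physicals, end_of_file):
    """Map provisional positions to final file offsets and patch references."""
    mapping = {}
    pos = end_of_file
    for p in physicals:
        mapping[p["pos"]] = pos
        pos += p["size"]
    out = []
    for p in physicals:
        q = dict(p, pos=mapping[p["pos"]])
        # A Column refers to its parent Table by defining address; if the
        # Table is part of the same transaction, that address is provisional
        # too and must be rebased.
        if q.get("parent") in mapping:
            q["parent"] = mapping[q["parent"]]
        out.append(q)
    return out

txn = [
    {"kind": "Table",  "pos": PROVISIONAL_BASE,     "size": 30},
    {"kind": "Column", "pos": PROVISIONAL_BASE + 1, "size": 20,
     "parent": PROVISIONAL_BASE},
]
final = relocate(txn, end_of_file=5000)
print([(p["pos"], p.get("parent")) for p in final])  # [(5000, None), (5030, 5000)]
```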
The read-conflict issue is rather pedantic. Pyrrho uses a very strict concept of transaction isolation, and notes all data that a transaction reads. If any of this data changes before the transaction commits, Pyrrho will roll back the transaction. For example, if a transaction computes a value to insert in the database, it should be invalidated by changes to any values used in the computation. Pyrrho calls these ReadConstraints.
Similarly, referential constraints (called IndexConstraints in the code) must also be checked again at Commit time, in case another transaction has inserted a record with a conflicting unique key, or deleted a record that our transaction intends to refer to.
For this reason, the data used in the Commit process consists of the list of proposed Physicals together with sets of ReadConstraints and IndexConstraints. For a remote database, it is this collection of data, not the final binary data, that must be sent to the remote master server for processing.
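The validation performed at Commit can be sketched like this (all the structures below are simplified stand-ins for Pyrrho's internal classes, not its actual code): the proposed Physicals and the transaction's ReadConstraints are checked against everything committed since the transaction started.

```python
def conflicts(proposed, read_set, committed_since_start):
    """Return a conflict description, or None if the commit may proceed."""
    changed = {(p["table"], p["row"]) for p in committed_since_start}
    # Write-write conflict: another transaction committed a change to a
    # row that one of our proposed Physicals also changes.
    for p in proposed:
        if (p["table"], p["row"]) in changed:
            return "write conflict"
    # Read conflict: data we read (a ReadConstraint) has since changed,
    # invalidating anything we computed from it.
    for r in read_set:
        if r in changed:
            return "read conflict"
    return None

proposed = [{"table": "T", "row": 7, "op": "update"}]
reads = {("T", 3)}

print(conflicts(proposed, reads, []))                         # None
print(conflicts(proposed, reads, [{"table": "T", "row": 7}])) # write conflict
print(conflicts(proposed, reads, [{"table": "T", "row": 3}])) # read conflict
```

A real implementation would also check IndexConstraints (conflicting unique keys, deleted referents), but the shape of the decision is the same: everything needed is expressible at the level of the Physicals and the constraint sets.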

Unfortunately, we are left with the situation that serialisation must ultimately be decided by a single DBMS server. The most we can use the cloud for in transaction management is as an enterprise system bus, routing transaction streams to the transaction master. To ensure that we have read the most up-to-date version, we simply call commit: if it is not up to date, our transaction will fail, and re-running the query will get its new value.
The Data Stream
The upper layers of the DBMS still need to construct and maintain the logical database for query processing, joins etc, and for this purpose, as with local databases, the complete data stream of Physicals must be read at some stage. For a large database this is a time-consuming process (roughly 1 minute per 200MB), so it should only be carried out when a cold-start of the DBMS occurs. Thus the DBMS itself is too stateful to be placed in the cloud.
On the other hand, once the DBMS is up to date with all of the Physicals that have been committed to the local or remote data file, it is a relatively easy matter to check for transactions committed by other servers, at the start of a transaction and again at transaction commit. The protocol for obtaining this Physical data must be able to deal with both small and large volumes of data, and take into account that Physicals can vary greatly in size. A binary large object (blob), for example, is committed in a single Physical.
Pyrrho already has excellent mechanisms for serialising such diverse binary data, with the AsyncStream class. This streaming mechanism can be used directly to deal with the Physical data from cloud storage. It is important to be able to use streamed responses from the cloud service, though, as it would be unacceptably slow to recreate repeated connections for downloading such data piecemeal. The AsyncStream class is also a useful mechanism for uploading to the cloud service the data described in the last section. It also supports a mechanism for reporting any exceptions thrown by the cloud service, which would most likely be transaction conflicts.
Nevertheless, a local initialisation device (such as an SD card) containing a copy of a database up to a certain date would seem an obvious way to save a lot of time, particularly if the remote database is often used.
Distributed Databases
Pyrrho maintains a single logical file for each database, which as we have seen might comprise a number of segments. This has the advantage that ordinary operating system utilities can be used for backing up, moving or renaming databases.
Pyrrho supports multi-database connections, where one DBMS can combine data from several databases in response to a query. A two-phase commit protocol must be used to ensure that all of the participating databases succeed in committing the data, or all roll it back. Pyrrho does not support multi-database relationships: it is not possible to have a foreign key relationship to a table in another database, since references to another database would be vulnerable to the sorts of file operations just mentioned. Whenever a multi-database connection is used, Pyrrho checks that all distributed transaction commits recorded in the databases are reflected in all of the databases in the connection.
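The two-phase commit requirement can be illustrated with a minimal sketch (simplified: no failure-recovery log, and the class names are invented): every participating database must vote yes in the prepare phase before any is told to commit, and otherwise all roll back.

```python
class Participant:
    """One database taking part in a distributed transaction."""
    def __init__(self, name, will_succeed=True):
        self.name, self.will_succeed = name, will_succeed
        self.state = "active"

    def prepare(self):
        # Phase 1: promise to commit (durably, in a real system).
        self.state = "prepared" if self.will_succeed else "aborted"
        return self.will_succeed

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):   # phase 1: all must vote yes
        for p in participants:
            p.commit()                           # phase 2: commit everywhere
        return True
    for p in participants:
        p.rollback()                             # any refusal aborts everywhere
    return False

dbs = [Participant("db1"), Participant("db2")]
print(two_phase_commit(dbs), [p.state for p in dbs])
# True ['committed', 'committed']

dbs = [Participant("db1"), Participant("db2", will_succeed=False)]
print(two_phase_commit(dbs), [p.state for p in dbs])
# False ['rolled back', 'rolled back']
```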
In the simplest cases the DBMS has at its disposal the logical database structure for all of the databases involved in the connection, either because the data comes from local durable files under its control, or because it has been synchronised from another DBMS or the cloud service. In the latter case, there may be a local file or device that contains a copy of the data for initialisation purposes. Another DBMS is involved in cluster computing scenarios, where only one member of the cluster is given write access to the data file for each database, but all members maintain all the logical databases. This mechanism can be used effectively to partition data among servers, so that transaction masters for each partition of the data collaborate to ensure the ACID properties of the transaction.
Remote Databases
With the introduction of cloud storage, and with the increasing use of smaller devices as servers, it becomes important to extend the storage modes available for remote data, to include the case where, although the local server needs to know the schema for all databases in a connection, it does not have row or index information for remote databases. The logical databases all need to exist somewhere, of course: in this case the remote database needs to be managed by a remote DBMS, not just by the cloud service. During query processing, the local server constructs a query for the remote DBMS which includes join, projection and selection aspects to minimise the amount of traffic required, and it is only when there is local data to be combined with the resulting rowset(s) that the local DBMS actually needs to do much work.
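Pushing selection and projection to the remote DBMS means only the needed rows and columns cross the network. A hedged sketch of the idea (the query builder below is hypothetical, not Pyrrho's actual planner):

```python
def remote_query(table, needed_columns, predicates):
    """Build the portion of a query that can be shipped to the remote DBMS."""
    cols = ", ".join(sorted(needed_columns))   # projection: only columns we need
    q = f"SELECT {cols} FROM {table}"
    if predicates:
        q += " WHERE " + " AND ".join(predicates)  # selection: filter remotely
    return q

# Suppose the local query joins a remote table with local data:
#   SELECT o.id, o.total FROM remote_orders o JOIN local_customers c ...
#   WHERE o.total > 100
# Only the remote portion is shipped; the join with local data happens here.
q = remote_query("remote_orders", {"id", "total"}, ["total > 100"])
print(q)   # SELECT id, total FROM remote_orders WHERE total > 100
```

The local DBMS then only has real work to do when combining the resulting rowset with local data, as described above.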

It will be seen that this is a rather different matter from simply using cloud storage. Which approach is preferable will depend on whether the memory of the local device is capable of holding the whole of the remote logical database, and how frequently consultation of the remote database is really required.
As a result of these developments, in the next edition of the Pyrrho DBMS, any server can have a configuration file which specifies which databases are available from specific servers or cloud storage (and how to access them); the cloud service needs to refer all transactions to their transaction masters. In addition, the configuration file should state, for remote databases, whether the local server maintains the full logical database, and if so, whether there is a local initialisation file.
“Eventual Consistency”
It seems reasonable for read-only transactions to appeal to the eventual consistency of cloud computing services. A client can always COMMIT a read-only transaction if they wish to be warned about a recent update that might have invalidated their work; but this is of limited value since a further update can always occur before they commit their next transaction. It is best for the reasons given above to transact reads with their resulting updates. For simplicity we have stayed with a single transaction master per data item, although a majority voting (Byzantine) system would be theoretically possible.
This paper has outlined a manifesto for a future version of Pyrrho DBMS supporting cloud storage and the use of small devices as local DBMS servers. We agree with other published studies that cloud storage on its own would violate several key database principles and so remains unsuitable for transactional data management. It is pleasing that this approach is corroborated by the work reported in Wei et al (2009).
Abadi, Daniel (2009): Data Management in the Cloud: Limitations and Opportunities, Bulletin of the IEEE Computer Society, 32 (2009), p.3-12.
Brantner, Matthias; Florescu, Daniela; Graf, David; Kossmann, Donald; Kraska, Tim (2009): Building a Database in the Cloud, Technical Report, ETH Zurich.
Chantry, Darryl (2009): Mapping Applications to the Cloud, The Architecture Journal, 19 (2009), p.2-9 (Microsoft).
Crowe, Malcolm (2007): The Pyrrho Database Management System, Computing and Information Systems Technical Reports No. 38 (University of the West of Scotland).
Wei, Zhou; Pierre, Guillaume; Chi, Chi-Hung (2009): Scalable transactions for web applications in the cloud, Proc 15th International Euro-Par Conference on Parallel Processing, LNCS 5704, p.442-453 (Springer).

Saturday 21 November 2009

Embedded Pyrrho

The main effect of today's updates is to make Embedded Pyrrho available again.

Saturday 7 November 2009


The 6 November updates dealt with three quite severe bugs. Two were in the fundamental data structures in the Pyrrho data engine - indexing and multi-level indexes, and seem to have crept in at different stages, fairly recently. They weren't detected in routine testing. An unrelated query processing problem was just days old, and should have been spotted.
The Embedded edition is temporarily unavailable because of a different problem.
Please send in reports any time you find anything incorrect.

Monday 2 November 2009

Updatable RowSets

Today's update fixes a number of issues with the updatability of RowSets. The basic idea is unchanged: a RowSet is updatable if it has been selected (via a view or directly) from a single base table and its columns include the primary key. Unfortunately this simple idea got overlaid with PyrrhoMgr trying to be too clever.
(Updated: 5 Nov 09)

Tuesday 20 October 2009

Sparql fix

Today's fix is to the Sparql engine which has obviously been broken since the last posting. I have also reformatted the Sparql tutorial as I noticed some unfortunate page and line breaks in the code samples that caused difficulties if people pasted the text into the Rdf tool without watching carefully.

Sunday 4 October 2009

Transaction Conflicts

Today's fix is to the mechanism for detecting and handling Transaction conflicts. (Updated: 5 Oct)

Friday 25 September 2009

Out and Inout Parameters

Yesterday's fix was about output parameters from procedures and methods.

Tuesday 22 September 2009

Procedure body syntax

Today's fix relates to procedure bodies.
According to SQL2003 (in section 02) procedure call statements must have the CALL keyword and (in section 04) assignment statements must have the SET keyword. For ages Pyrrho has not been enforcing this rule properly. It would be (strictly speaking) a breaking change to start enforcing it now.
So, both keywords had better be optional in Pyrrho procedure bodies, at least for call statements that have a possibly empty parameter list enclosed in () and for simple assignment statements, and this is yet another departure from SQL2003. I will confess to this in the next version of the user manual.

Monday 21 September 2009

Embedded Pyrrho etc

Today there is a fix for the Embedded edition, and for yet another type-checking bug in the server...

Saturday 19 September 2009

Triggers and DataGridView

Today's update is mainly about fixing some errors in the implementation of Triggers and DataGridViews. A serious issue about type checking has also been resolved.

For the improved functionality of DataGridView, please use the new version of PyrrhoLink.dll or OSPLink.dll. (Anything that worked previously should still work.)

Monday 7 September 2009


The ADO.NET API includes a method called GetDataTypeName on the IDataReader: this is supposed to return a string, given a field index. The documentation says

The data type information can differ from the type information returned by GetFieldType, especially where the underlying data types do not map one for one to the runtime types supported by the language. (for example, DataTypeName may be "integer", while Type.Name may be "Int32".)

Up to now, the DataTypeName was just the name of the closest native type, such as System.Decimal.

From today, and version 3.3 of Pyrrho, it will return the name of the SQL type. For example, NUMERIC(8,2) or a user defined type name.

The version number is changed since the new version of PyrrhoLink for this behaviour will then only work with server versions 3.3 or later. (In all other respects backwards compatibility is preserved.)

Sunday 30 August 2009

The SPARQL Tutorial

Today's update (30 August 2009) includes a version of the ARQ SPARQL Tutorial and some associated fixes to the SPARQL subsystem. Specifically, default data sets should work better for the RDF application program.

Saturday 22 August 2009

Better Numeric Types

Today's update (22 August) fixes a number of bugs mostly related to handling of numeric types. Two corrections to the syntax checking fix the BETWEEN predicate and named columns joins.
I really do like finding and fixing bugs!

Friday 14 August 2009

Subtype improvements

Today's update to the DBMS corrects the handling of the new URI-based subtypes of standard types. As mentioned in previous posts, these are a novelty in Pyrrho, allowing declarations such as
create type uk_numberplate as char with ''
As of today's version the representation clause ("as char" in this example) for such types is constrained to be a standard type. Subtypes of such types must be declared as follows:
create type sc_numberplate under uk_numberplate as char with ''
The semantics of IS OF for these subtypes have been corrected.

Saturday 11 July 2009

Provenance again

Today's version of the software aims to fix a number of issues.
(a) a bug in transaction handling that resulted in premature closure of transactions
(b) a number of corrections to provenance handling and subtypes.
(c) a new system table called Sys$Object which makes it easier to see the provenance of tables and types.

To try out the new ideas, here are some examples.

1. To have database objects or inserted rows with a particular provenance, say something like
begin transaction with provenance ''
Pyrrho refers to this internally as an import provenance, but of course it is more flexible than that.

2. To have rows in a table associated with a provenance, you can also say
insert with provenance '/data' into ...
and then the rows have a provenance consisting of either or both of the import and row provenance (if both are specified, the provenance is the result of concatenating them).

There is no way of modifying provenance (this would be heresy).

3. Subtypes can be associated with a uri: there is a WITH syntax defined in Pyrrho for this purpose, e.g.
create type abcint as integer with ''
and then if desired the subtype can be used for values you want to mark with the URI, by using (say) treat(11) as abcint instead of just 11.
Such a subtype behaves like an integer of course, but you can select on such subtype values using the OF predicate, e.g. select * from a where b is of(only abcint).

Write to me with any problems.

Thursday 11 June 2009

Bug fixed: SelectedColumnsRowSet

In previous releases of v3.2, the rowType was incorrectly reported as the original rowType, not the rowType with just the selected columns. This would have caused an error 22202 when the wrong rowType was used.
Similar fixes to other RowSet classes. Fixed as of 12 June.

Monday 8 June 2009

On importing data

I've started to provide better support for data import into Pyrrho. Version 3.x has been introducing ideas of provenance. Now I am enhancing the PyrrhoMgr application a bit. The changes will get published in the next few days. So far they amount to:
1. Supporting direct import from Access 2007 in addition to Access 2003 and SQL Server.
2. Supporting the "Percent" numeric format in Access. (Intriguingly in Access 2007 the default Percent format is for a long integer, and so validation changes every value to either 0.0% or 100.0% !)
3. Allowing the importer to specify the "From Culture" so that culture specific formats for numbers and dates can get converted into the culture used by the importing thread (the culture of the machine that PyrrhoMgr is running on).
As mentioned in the last posting, the server always uses an invariant culture.
As a result of some ongoing research, more changes in this area can be expected soon.

Saturday 6 June 2009

Why the version hasn’t changed

Changes like this are possible in Pyrrho because of the distinctive way that Pyrrho is designed. Nearly all DBMSs store data structures like indexes in permanent storage within the files making up the database. Pyrrho by contrast places in permanent storage only the data required for durability of each committed transaction. This approach minimises disk activity during normal operations, to about one-seventieth of the disk activity of rival systems.
The server’s data structures are private to the server and are initialised when the server starts up; the server brings its state up to date by re-reading the database file, which is just the transaction record. This is a time-consuming process for a large database, equivalent to a cold-start with resynchronisation step for other products.
The format of this data is as far as possible version- and platform-independent – there is not even any assumption about what “double precision” means, or how many bits make up a long integer. This means that every aspect of the organisation of data structures can evolve without breaking backward compatibility. Database files created by version 0.1 can still be used in every version of Pyrrho, so all Pyrrho databases can always be managed by the latest version of the server. New features, such as the recently introduced concept of provenance, or URI-based data types, introduce new or modified data formats which would not be understood by earlier versions, but those earlier versions really should not be in use. Upgrading to the latest version is always recommended, and is free of charge.

Wednesday 3 June 2009

Speeding up joins

The next version (still numbered 3.2) will have a new implementation of the matching code for joins. There will be situations where a significant performance improvement can be expected, for example where the join condition refers to a component of a primary or foreign key. In the source code this is called a TrivialJoin, but it is not necessarily quite so trivial where the join condition does not constrain all the components of the key...
I should be able to release this implementation this weekend following testing.
Update 4 June: so FDJoin is a better term than TrivialJoin, with consequential renaming of lots of things. I've introduced a new internal datatype called Wild for dealing with partial matches. And better handling of recursive traversal of multilevel indexes. It's all looking rather nice.

Wednesday 27 May 2009

Version 3.2 is done

I will release the next version 3.2 tomorrow 28th May.
It contains the promised improvements to the infrastructure.

Wednesday 13 May 2009

Towards stronger types internally

The next version of Pyrrho will use much stronger generics. The internal Tree structures will be Tree<Key,Value> instead of just Tree<Key> . But this is taking an age to sort out...
With any luck version 3.2 should be out next month (June 2009) though.
The change has flushed out a few obscure bugs, but unfortunately the changes to the code are extensive so no doubt there will be the occasional null reference that slips through. So if you get a bad message from the server, please let me know about it...

Sunday 3 May 2009

The REST service

Pyrrho's new REST service is similar to ADO.NET data services, from which it differs by providing a default data model. For a Pyrrho database file myfile, the default data service is myfile.svc. Custom data models can be supplied, and then the data service is given the same name as the data model, e.g. mymodel.pdm.

The service is hosted by default on port 8080 on the host running the Pyrrho DBMS server, e.g.

and the rest of the URI follows the rules for ADO.NET. As with ADO.NET, the detailed data model is obtained by adding $metadata to the end of the service URI.

The methods GET, PUT, POST, and DELETE are supported. If no resource path is specified, a GET on the service gives a synopsis of the service, and otherwise returns XML for the specified data. POST is used for inserting new entities, PUT for updating specified data in existing entities, and DELETE for deleting specified entities.

However, to make any changes to the database, you need to supply UserID and Role identifiers in the HTTP headers. This means that most browser access will support only GET,