Saturday, 28 December 2013
"Database as a Service" and the CAP Theorem
What Pyrrho does offer in the direction of DBaaS is distributed and (horizontally) partitioned databases. Each horizontal partition is its own transaction master (a replica of a horizontal partition will not be). The most (network) partition tolerant design is where the only distributed transactions are either read-only or for schema changes. In that case you have a good deal of (network) partition tolerance:
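As a rough illustration of why this design tolerates network partitions, here is a minimal sketch in Python. It assumes range-based horizontal partitions; the names `HorizontalPartition`, `masters_needed` and `can_commit` are hypothetical and are not part of Pyrrho. A read-write transaction whose keys all fall in one horizontal partition needs only that partition's transaction master, so it can still commit when other servers are unreachable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HorizontalPartition:
    """Hypothetical description of one horizontal partition (not Pyrrho's API)."""
    master: str  # server acting as transaction master for this partition
    low: int     # inclusive lower bound of the key range
    high: int    # exclusive upper bound of the key range

    def holds(self, key: int) -> bool:
        return self.low <= key < self.high


def masters_needed(partitions, keys):
    """Return the set of transaction masters a read-write transaction must reach."""
    needed = set()
    for key in keys:
        owner = next((p for p in partitions if p.holds(key)), None)
        if owner is None:
            raise KeyError(f"no partition holds key {key}")
        needed.add(owner.master)
    return needed


def can_commit(partitions, keys, reachable):
    """A write can commit only if every master it needs is reachable."""
    return masters_needed(partitions, keys) <= set(reachable)


partitions = [HorizontalPartition("A", 0, 100), HorizontalPartition("B", 100, 200)]

# Server B is unreachable: a transaction confined to A's range still commits,
# while one that spans both ranges does not.
local_ok = can_commit(partitions, [5, 42], reachable={"A"})
spanning_ok = can_commit(partitions, [5, 150], reachable={"A"})
```

Keeping read-write work inside a single partition is what removes the dependency on the other masters in this sketch.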
Monday, 25 November 2013
Distributed and Partitioned DB Tutorials
Those postings have been updated slightly to match the new version. The main difference is that reconfiguration and repartition of databases are now transactioned. You can say begin transaction before you start either process and then no changes are made to disks on any of the servers involved until the transaction is committed. Recall that Pyrrho is very fussy about transaction isolation, so that while a transaction is ongoing the connection that started the transaction is the only participant that can examine the progress being made.
The advantage of doing this is more than theoretical. Defining a partition includes specifying a set of conditions for including records in the new partition, and autocommits during this process would not be helpful.
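The idea can be sketched as follows. This is only an illustration in Python under simplifying assumptions (rows are dictionaries, the inclusion condition is a predicate function), and the class `RepartitionTransaction` is hypothetical rather than anything in Pyrrho. Rows selected by the condition are staged, and the stored data is left untouched until commit.

```python
class RepartitionTransaction:
    """Hypothetical staging area: moves are buffered and applied only on commit."""

    def __init__(self, base_rows):
        self.base_rows = base_rows  # rows currently stored in the base database
        self.partition_rows = []    # rows stored in the new partition
        self._staged = []           # rows selected but not yet moved

    def define_partition(self, condition):
        """Stage every row that satisfies the inclusion condition."""
        self._staged = [row for row in self.base_rows if condition(row)]

    def commit(self):
        """Apply all staged moves together."""
        for row in self._staged:
            self.base_rows.remove(row)
            self.partition_rows.append(row)
        self._staged = []

    def rollback(self):
        """Discard staged moves; stored rows were never touched."""
        self._staged = []


base = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
]
tx = RepartitionTransaction(base)
tx.define_partition(lambda row: row["region"] == "EU")
rows_before_commit = len(base)  # still 3: nothing has been moved yet
tx.commit()
```

With autocommit, each row move would become visible separately, which is exactly what the transactioned approach avoids.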
The new version comes with a much more robust 3-phase distributed transaction protocol than was provided in previous versions, and there are slight differences.
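The post does not spell out the phases, so the following is a sketch of the textbook three-phase commit (can-commit, pre-commit, do-commit) in Python, not of Pyrrho's actual protocol. All names are hypothetical, and timeouts and failure recovery, which matter a great deal in practice, are left out.

```python
class Participant:
    """Hypothetical participant in a textbook three-phase commit."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "initial"

    def can_commit(self):
        self.state = "ready" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def pre_commit(self):
        self.state = "precommitted"

    def do_commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def three_phase_commit(participants):
    """Coordinator logic; returns True if the transaction committed."""
    # Phase 1: ask every participant whether it is able to commit.
    votes = [p.can_commit() for p in participants]
    if not all(votes):
        for p in participants:
            p.abort()
        return False
    # Phase 2: tell everyone that the decision to commit has been reached.
    for p in participants:
        p.pre_commit()
    # Phase 3: make the changes durable on every participant.
    for p in participants:
        p.do_commit()
    return True


servers = [Participant("A"), Participant("B"), Participant("C")]
committed = three_phase_commit(servers)
```

The extra pre-commit phase is what separates this from two-phase commit: once a participant has reached it, it knows that every other participant voted yes.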
For the purpose of explaining the internal operation of Pyrrho for distributed and partitioned data, a tutorial mode -T has been implemented that exposes all of the server-server protocols and commit steps. Sections have been added to the tutorials to explain some of what is going on. The Pyrrho manual and the SourceIntro document in the open source distribution have been updated with full details. I am happy to explain the internals to anyone who is interested and plan to add more comments to the source code.
Future developments for Pyrrho will develop these facilities a little further, offering behaviour closer to scatter-gather (Hadoop). In connection with a related project at UWS, I am also planning to provide internal support for a BSON data type. As usual, what distinguishes Pyrrho from other database initiatives is that all databases are consistent and relational, and support full SQL and optimistic concurrency.
Thursday, 15 August 2013
Pyrrho DBMS architecture
The bottom two blocks are referred to in the source code as Level 1 (datafiles). Level 2 (physical records) is where transaction serialization occurs, Level 3 is for database objects, and Level 4 is where the SQL mechanisms, rowsets, OLAP, multi-database connections etc. occur. Since approximately v4.5, the strong data type mechanism operates in Level 2, and this now facilitates binary server-server communications.
With v5.0 the asynchronous client-server implementation (AsyncStream) is adapted for server-server communications, and applies for communication at the different levels in the following way:
- At Level 1 server-server communications support the notion of remote storage
- At Level 2 we use server-server comms for remote transaction master
- At Level 3 we support database partitioning, where the asynchronous comms enable traversal of partitioned table segments
- At Level 4 server-server comms will support mobile clients accessing remote databases
The v5.0 implementation of partitioning has brought a subtle and important change to Pyrrho's internal operation. Up to now objects in data and schema records have uniformly been identified using their defining or actual addresses in the transaction log files. However, from v5.0, schema objects are always identified by places in the base database's transaction log, and a special physical record (Partition) is used to wrap and import these into the partition's log. These Partition records are created and pushed to the partition as needed whenever a communication channel is opened between them. (They should probably be transacted in the partition under the identity of whoever made the schema change to the base database: but currently they sit outside of the transaction mechanism, as they are not part of the current transaction. I would like such changes to be asynchronous as I don't require all partitions to be online for a schema change.)
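The following Python sketch models this arrangement in a deliberately simplified way. The class and function names (`SchemaRecord`, `PartitionRecord`, `PartitionLog`, `push_schema`) are hypothetical and the real record layouts are not shown in this post. Schema objects are identified by their position in the base database's log, and a wrapper record imports each one into the partition's log when a channel is opened.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaRecord:
    base_position: int  # position in the base database's transaction log
    definition: str


@dataclass(frozen=True)
class PartitionRecord:
    """Hypothetical wrapper importing a base schema record into a partition's log."""
    wrapped: SchemaRecord


@dataclass
class PartitionLog:
    records: list = field(default_factory=list)

    def known_positions(self):
        """Base-log positions of the schema objects already imported."""
        return {r.wrapped.base_position
                for r in self.records if isinstance(r, PartitionRecord)}


def push_schema(base_schema, partition_log):
    """On opening a channel, push any base schema records the partition lacks."""
    known = partition_log.known_positions()
    pushed = []
    for rec in base_schema:
        if rec.base_position not in known:
            wrapper = PartitionRecord(rec)
            partition_log.records.append(wrapper)
            pushed.append(wrapper)
    return pushed


base_schema = [SchemaRecord(100, "table T"), SchemaRecord(140, "column T.A")]
log = PartitionLog()
first_push = push_schema(base_schema, log)   # both records are new to the partition
second_push = push_schema(base_schema, log)  # nothing left to push
```

Because the push happens when the channel opens, a partition that was offline during a schema change simply catches up the next time it connects.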
I plan to add more comments to the source code to explain things and make the structure clearer. There are some attempts to explain the internal workings of Pyrrho in the SourceIntro.doc and Classes spreadsheet in the distribution.
Wednesday, 14 August 2013
Partitioned Database Implementation
A simple example
See a full version of this tutorial, here.
Configuring Database Servers
A simple configuration example
This will show in the _ database as follows:
Server | Client | User | Password
A | B | M\me | *******
B | C | M\me | *******

Name | Server | ServerRole | Password | Remote | RemoteUser | RemotePassword
D | A | 7 | | | |
D | B | 4 | ******* | A | M\me | *******
D | C | 2 | ******* | B | M\me | *******
At this stage the Log$ files for _ on A and _ on B show different Records, but table “_Database” and table “_Client” will agree on the two servers, showing that the configuration database is shared.
pyrrhocmd -h:C _
See a full version of this tutorial, with full explanations of the protocols used, here.
Thursday, 31 January 2013
Optimising enumerations
The ATree method GetRowEnumerator returns a SlotEnumerator<K,V> that traverses the pairs of the tree in key order. There are two versions of this method, one of which takes a Key to match and enumerates all pairs whose key matches it. For a strongly-ordered tree (no key duplicates) the resulting enumeration will have one or zero entries (a TrivalEnumerator or an EmptyEnumerator), provided the key supplied is a constant. By constant is meant "will not change during result-enumeration of any current query".
This is a subtle and important point: Pyrrho uses partial evaluation, so that a Column holding values such as integers shows just the current value, which can change when an enumerator moves to the next row. Such values are obviously not constant, so if the Key value supplied to GetRowEnumerator were such a value, it would still be true that in each case there is either one or zero matching pairs in the tree, but we would need to check each time to find out which.
On the other hand, it is such an important optimisation to be able to replace an enumerator with a trivial or empty enumerator that it seems worth adding some machinery to the database engine to keep track of which expressions are constant. The illustration shows a code fragment from the database engine.
As a result of these considerations many structures (e.g. Column, TypedValue and all their subclasses) have an extra field or property with a name such as isConstant to speed up this determination.
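To make the optimisation concrete, here is a small Python sketch. It is not the engine's C# code: a dictionary stands in for a strongly-ordered tree, and `is_constant`, `RowColumn` and `get_row_enumerator` are hypothetical names chosen for illustration. With a constant key the lookup is decided once and yields an empty or single-entry enumeration; with a non-constant key the check has to be deferred until the enumeration is actually used.

```python
def is_constant(key):
    """Hypothetical stand-in for an IsConstant() check: plain values count as
    constant; objects can declare otherwise through an is_constant attribute."""
    return getattr(key, "is_constant", True)


class RowColumn:
    """A value that changes as an outer enumerator advances, so not constant."""
    is_constant = False

    def __init__(self, rows):
        self._rows = rows
        self.index = 0

    @property
    def value(self):
        return self._rows[self.index]


def get_row_enumerator(tree, key):
    """tree: a dict with unique keys, standing in for a strongly-ordered tree."""
    if is_constant(key):
        # Decided once, up front: a single-entry or an empty enumeration.
        return iter([(key, tree[key])]) if key in tree else iter(())

    def deferred():
        # Non-constant key: read its current value only when enumeration starts.
        current = key.value
        if current in tree:
            yield (current, tree[current])

    return deferred()


tree = {1: "a", 2: "b"}
hit = list(get_row_enumerator(tree, 1))   # constant key present in the tree
miss = list(get_row_enumerator(tree, 9))  # constant key absent from the tree

column = RowColumn([2, 9])
pending = get_row_enumerator(tree, column)
column.index = 1               # the outer enumerator moves to the next row
late_result = list(pending)    # evaluated against the value current at that point
```

In this sketch the saving comes from the first branch, where no further lookups are needed however many times the surrounding query advances.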
Since the key K might be something very simple such as long or string, the IsConstant() method used in the illustration needs to be defined as an extension method. To my relief I find that Debian Squeeze supports the use of C# extension methods so henceforth Open-source Pyrrho OSP has moved back up to .NET 3.5. For Windows of course we currently use .NET 4.
Needless to say the above changes resulted in about 600 changes to the Pyrrho sources, and it is possible that some mistakes will need fixing. I have been doing quite a lot of testing and will continue to do so. For the next while there will be updates of Pyrrho roughly weekly.