Wednesday, 14 August 2013

Configuring Database Servers

The new version of Pyrrho has a completely approach to server configuration, based on a peer-to-peer database. In this article, we look at the configuration of distributed databases, addressing the "Pyrrho and the Cloud" scenario from 2009, and return to issues of partitioned databases in a forthcoming article.

Each server has its own folder for holding databases (although full pathnames can be specified for databases). The configuration database (if any) is found in this folder and is called _ ,. It contains tables with names starting with _. The basic schema of _ is hardwired in the server engine so that _ is created the first time the server owner alters a configuration table.

By default the configuration tables are empty: databases that are not mentioned in the configuration tables are local databases, and we know about no other servers. In this way we already have an API for configuration (namely SQL/ADO.NET) and we conform to Codd’s rule that SQL should be the DDL for the DBMS.

Configuration tables holding server data are automatically partitioned so that each _ database only holds information for data managed by that server instance, and is the transaction master for its own data. Thus any change to properties of remote servers is handled as a remote update – the Pyrrho protocol already supports this. It turns out that the provision for verification of distributed transactions is exactly what is required to guarantee consistency of the information held by different servers.

When the server starts up it should load the _ database if present. What will be new is a more elegant handling of the situations where servers are unavailable: obviously this should mean that their databases and partitions are unavailable. Code exists in the server already for deferring distributed transaction verification and this is exactly what is needed for if and when the remote server comes online later. See also “Database dependency” below.

As of v 5.0, we will use the following schema for _:

_Client: (Server, Client, User, Password). We locally store entries for which Server=us. This table is maintained automatically: don’t update it. Passwords use Pyrrho’s password type so are not readable. The user and password information enables access to the configuration database on the client.

_Database: (Name, Server, Password, ServerRole, Remote, RemoteUser, RemotePassword). (Remote, Server) references _Client. We locally store entries for which Server=us. Passwords use Pyrrho’s password type so are not readable, but they must be supplied when this table is updated to enable server-server communication. The remoteuser and remotepassword information enable server-server access to the named database on the remote server. The password information enables access to the configuration database on the server.

_Partition: (Base, BaseServer, BasePassword, Table, PartitionName, PartitionServer PartitionServerConfigUser, PartitionServerConfigPassword, SegmentName, Column, MinValue, MaxValue, IncludeMin, IncludeMax). Table and Column are positions in the base database: we understand that the partition database has appropriate entries in its schema and a mapping from the base positions. We locally store entries where BaseServer=us.  If you create such an entry, Pyrrho automatically defines the table in the partition database, and makes a Partition1 record in the partition database that gives the server details. If the PartitionServer field is null, the partition is marked as dropped. The partition user and password are for server-server access from the base database: the base server password is a security precaution: it must match the base server’s account. The PartName field enables a number of tableparts to be specified as being in the partition: each part can specify a number of column conditions.

Remember that _ is for distributed configuration information, and is not affected by connection-specific things such as creating or accessing non-configured databases. The current server’s serverRole for non-configured databases will be 7 (for local databases: Query+Master+Storage) and 0 (remote), but they are not entered in the configuration tables. No server is obliged to have complete information.

In the _Database table, Remote is null if ServerRole has bit 1 (Master), and otherwise references cause distributed transactions: e.g. a server providing query and/or storage only will usually list the master server as remote. If no master server is configured the database is read-only (this is not an error). However, at most one server can have the master role.

To change configuration information, the owner of the server account simply connects to the local _ database and modifies it. The information about other servers is read-only, although selects, joins etc show the overall information to the extent required for server-server communication (server roles and partitions). When changes are committed, affected databases are unloaded and a background process is started to reload them. Pyrrho does try to notify affected servers, using the same mechanism as for any update to a distributed database: this will be an area for continuing improvements.

Note, however, that the server passwords in the configuration tables are for enabling access from one server to another, using standard operating system security, for the accounts under which the servers have been configured to run. The partition base credentials enable access to the base database of the partition being configured. On a single server or in a single domain, it is quite likely that a single account and password will be used in all or most of the tables.

As a special case, the entry in the _Database table for a partition will have all 3 server roles and the remote information will be for the base server.

A simple configuration example


Three servers A  , B,  C  initially with no configuration databases. Database D on server A has a table E(F int). We will configure these servers so that both A and C serve queries for D, but C will access storage for D on server B, while A will retain its roles of Query+Master+Storage. 

This will show in the _ database as follows:

_Client table

Server
Client
User
Password
A
B
M\me
*******
B
C
M\me
*******

 _Database table

Name
Server
ServerRole
Password
Remote
RemoteUser
RemotePassword
D
A
7
 
 
 
 
D
B
4
*******
A
M\me
*******
D
C
2
*******
B
M\me
*******

As we get there we can keep track of developments using Pyrrho's system and log tables. Log tables show the contents of physical database files (so are available when the server has Query+Storage role).

Step 1: Create _ on B and configure storage for database D on A.

 pyrrhocmd –h:B _

SQL> [insert into "_Database" values('D','B',4,passwd,'A',user,passwd)]

The right user name (in Windows Domain\User form) and password should be supplied, but the password won’t be displayed. These two steps have the side effect of getting A to create a _ database and fill in the  _Client entry for B and _Database entry for D, so if A is offline this step should fail. This also creates a D.pfl file in B’s folder.

At this stage the Log$ files for _ on A and _ on B show different Records, but table “_Database” and table “_Client” will agree on the two servers, showing that the configuration database is shared.  

Step 2: Create _ on C and configure the Query ServerRole for database D on B.

pyrrhocmd –h:C _

SQL> [insert into "_Database" values('D','C',2,passwd,'B',user,passwd)]

Pyrrho discovers A's master role. The _Database table shows as above on server B at this stage, (A and C show 2 rows each) although “Sys$Table” and table “Log$” on each server shows just one record in _DatabaseTable E shows in D on A and C,  and E should be listed in Sys$Table on D but the Log$ table on D should not be visible since there is no D.pfl file in C’s folder. div>

This completes the configuration. On C delete an entry in E, and check for expected behaviour.  On C, alter table E to make F a primary key. Check that the Sys$Index tables on both A and C show the change.  We can read information from database D on C even if A is offline. We can’t change anything though.

See a full version of this tutorial, with full explanations of the protocols used,  here .


No comments:

Post a Comment