Wednesday 14 August 2013

Partitioned Database Implementation

If a database becomes too large or too busy for the server(s) on which it is provided, an option is to partition it to create a “distributed database” (Ceri and Pelagatti, 1984). When tables are partitioned horizontally by selecting ranges of values of fields in one or more columns of a single table, the result is nowadays called “shared-nothing replication”. Table partitioning can be combined with other distribution mechanisms such as ordinary database replication, additional query processors and remote clients. A (distributed) configuration database is used to specify the relationship between databases, partition files, and the partitioning mechanism, as described in Configuring Database Servers.

All operations on partitioned tables are effected by SQL statements sent to the base database or any of the partitions, so that the partitioning is transparent to clients. Operations on a single, local partition will generally be faster than those involving multiple or remote partitions: in particular the use of cross-partition constraints or triggers should be discouraged. Schema changes can only be made on the base database.

When table partitions are used, the complete database consists of the base database and a set of database partitions. A database partition contains a copy of the schema obtained from the base database, and records for its table partitions. A database partition can contain named segments from a number of partitioned tables. Each named segment is stored in its entirety in a named partition is defined by a set of specifying a range of values for columns. The columns do not have to be part of an index for the table (the table does not even need a primary key, and in specifying the range, either or both of the maximum and minimum values can be included or not. Segments do not overlap, and any data not in any segment will remain in the base database.

The server hosting the partition is the transaction master for that partition: this allows updates to be made to the rest of the database even when one partition is offline. Indexes are also partitioned, with each partition holding indexes for the data it contains. Cross-partition operations will be distributed transactions, and by default the server will try to verify coherence of the participating versions as the database is loaded. A good partitioning scheme for a particular scenario will minimise the number of distributed transactions that result.

Partitions are created by adding a row to the _Partition table on the base database server (see section 8.2). The base server creates or connects to the partition database using the credentials provided, creates the schema for the partition, and moves any existing partition data.

Certain types of partition changes can be carried out on a live database by means of transacted groups of changes to the Partition table: a new segment, moving a segment, splitting a segment, deleting a segment (data in the partition is moved to the base database).

In partitions, schema information uses defining positions in the base database. Each partition maintains a binary copy of the entire schema in Partition records in the database: a further Partition record is supplied by the base server if the schema has changed on the next commit of data to the partition.

When the partition information changes this probably involves multiple rows in the _Partition table, so these should occur in transacted groups, with validation (non-overlapping) enforced at the end of each group, when we should see what records need to transfer between partitions or to/from the base database. Table segments should move in their entirety. We support four situations (for at most one base database at a time):

a.       A new partition (or new table or segment for an existing partition) has been added without any changes to other partitions, examine the base database for records for it.

(AddRecords only: this is NewPartition())

b.       One or more configuration rows for a partition P has been deleted. No updates to other rows: records affected will move to the base database. (Deletion of records for P: this is OldPartition())

c.       Changes to other partitions are only to receive tables or segments from P. (Changes to partition names from a single P: this is OldPartition())

d.       Split of a single TableSegment: no records need to move. (change to ValueBounds, and Add another)

It should be possible to design any change in partitioning as a sequence of these four steps. Note that movement of records happens after the validation step, so the order of operations only has to match one of the above.

For implementing constraints, there is an extra stage of checking at the base database level (e.g. uniqueness across all partitions). We will add a PartitonedIndex class to help with this, and a service to look up (a set of) entries in a local index.

A simple example

Two servers A  and B both running, each in its own empty folder (no configuration databases).

0: No _ databases. Create database D on server A and create table E, with some sample data:

                E: (F int, G char). (1,’One’),(2,’Two’).

1: On A, use pyrrhocmd _ to autocreate the configuration database, and define a new partition D2 on B containing a single table segment E2:

Insert into “_Partition” values (‘D2’, 'B', ’me’, ’password’, ‘D’, 'A', ‘password’, ’E’,’E2’, ’F’, 2, 2, true, true)

The credentials supplied must be sufficient to create or open database D2 on server B and create objects in D2.

2. Inspect the logfiles to see what has been done. There should be a configuration database on A with the above information, but on B we see an entry in _Database for a database called D2, and an entry in the _Client table for the base server as a client of B.

On A, the log for D should show the deletion of one record in E as part of a distributed transaction involving _ and D2.

On B, the log for D2 on B should show the creation of E with the same owner as in A, the partition information for E, and the addition of a single record.

3. From A, verify that the table E still shows both records.

4. From A insert some more records in E: (3,’Three’) (2,’Deux’). Check both show up in E, and verify from the logfiles that one new record each has been added to databases D and D2.

5. On A, delete the partition by deleting the single partition record in _ .

See a full version of this tutorial, here.

No comments:

Post a Comment