All
operations on partitioned tables are effected by SQL statements sent to the
base database or any of the partitions, so that the partitioning is transparent
to clients. Operations on a single, local partition will generally be faster
than those involving multiple or remote partitions: in particular the use of
cross-partition constraints or triggers should be discouraged. Schema changes
can only be made on the base database.
When
table partitions are used, the complete database consists of the base database
and a set of database partitions. A database partition contains a copy of the
schema obtained from the base database, and records for its table partitions. A
database partition can contain named segments from a number of partitioned
tables. Each named segment is stored in its entirety in a named partition is
defined by a set of specifying a range of values for columns. The columns do
not have to be part of an index for the table (the table does not even need a
primary key, and in specifying the range, either or both of the maximum and
minimum values can be included or not. Segments do not overlap, and any data
not in any segment will remain in the base database.
The
server hosting the partition is the transaction master for that partition: this
allows updates to be made to the rest of the database even when one partition
is offline. Indexes are also partitioned, with each partition holding indexes
for the data it contains. Cross-partition operations will be distributed
transactions, and by default the server will try to verify coherence of the
participating versions as the database is loaded. A good partitioning scheme
for a particular scenario will minimise the number of distributed transactions
that result.
Partitions
are created by adding a row to the _Partition table on the base database server
(see section 8.2). The base server creates or connects to the partition
database using the credentials provided, creates the schema for the partition,
and moves any existing partition data.
Certain
types of partition changes can be carried out on a live database by means
of transacted groups of changes to the Partition table: a new segment, moving a
segment, splitting a segment, deleting a segment (data in the partition is
moved to the base database).
In
partitions, schema information uses defining positions in the base database. Each
partition maintains a binary copy of the entire schema in Partition records in
the database: a further Partition record is supplied by the base server if the
schema has changed on the next commit of data to the partition.
When the partition information changes this probably
involves multiple rows in the _Partition table, so these should occur in
transacted groups, with validation (non-overlapping) enforced at the end of
each group, when we should see what records need to transfer between partitions
or to/from the base database. Table segments should move in their entirety. We
support four situations (for at most one base database at a time):
a.
A new partition (or new table or segment for an
existing partition) has been added without any changes
to other partitions, examine the base database for records for it.
(AddRecords only: this is NewPartition())
b.
One or more configuration rows for a partition P
has been deleted. No updates to other rows: records affected will move to the
base database. (Deletion of records for P: this
is OldPartition())
c.
Changes to other partitions are only to receive
tables or segments from P. (Changes to partition names
from a single P: this is OldPartition())
d.
Split of a single
TableSegment: no records need to move. (change to ValueBounds, and Add
another)
It should be possible to design any change in partitioning
as a sequence of these four steps. Note that movement of records happens after
the validation step, so the order of operations only has to match one of the
above.
For implementing constraints, there is an extra stage of
checking at the base database level (e.g. uniqueness across all partitions). We
will add a PartitonedIndex class to help with this, and a service to look up (a
set of) entries in a local index.
A simple example
Two servers A and B both
running, each in its own empty folder (no configuration databases).
0: No _ databases. Create database D on server A and create
table E, with some sample data:
E: (F
int, G char). (1,’One’),(2,’Two’).
1: On A, use pyrrhocmd _ to autocreate the configuration
database, and define a new partition D2 on B containing a single table segment
E2:
Insert into “_Partition”
values (‘D2’, 'B', ’me’,
’password’, ‘D’, 'A', ‘password’, ’E’,’E2’, ’F’, 2, 2, true,
true)
The credentials supplied must be sufficient to create or
open database D2 on server B and create objects in D2.
2. Inspect the logfiles to see what has been done. There
should be a configuration database on A with the above information, but on B we
see an entry in _Database for a database called D2, and an entry in the _Client
table for the base server as a client of B.
On A, the log for D should show the deletion of one record
in E as part of a distributed transaction involving _ and D2.
On B, the log for D2 on B should show the creation of E with
the same owner as in A, the partition information for E, and the addition of a
single record.
3. From A, verify that the table E still shows both records.
4. From A insert some more records in E: (3,’Three’)
(2,’Deux’). Check both show up in E, and verify from the logfiles that one new
record each has been added to databases D and D2.
5. On A, delete the partition by deleting the single
partition record in _ .
See a full version of this tutorial, here.
See a full version of this tutorial, here.
No comments:
Post a Comment