Sunday, 3 May 2015

MongoDB and the Document standard type

The Document Type

A new feature of Pyrrho (from version 5.1) is the Document standard type. It is intended for so-called Big Data applications of Pyrrho. The Document type is inspired by MongoDB and is very useful for ad hoc data. Documents contain strongly typed named fields, but without a schema: the only field that every Document is required to have is _id, which must be unique per document and is generated for you if you do not supply it.
Documents can be provided in SQL using Json syntax: Pyrrho's SQL is extended to allow Json objects to be written directly as in
select * from a where b={'C':23}

Note that field names are always case-sensitive. Document fields can be embedded documents, arrays or regular expressions: for example a regular expression /a?b.*c/i can be written without quotes inside a Json object. As discussed further below, the equals sign here only tests for the fields mentioned in the document literal.
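Because the equals sign tests only the fields mentioned in the literal, a stored document can contain extra fields and still match. Something like the following should illustrate the point (a sketch: table a and column b are as above, and the second field is made up for the example):
insert into a(b) values ({'C': 23, 'E': 'green'})
select * from a where b={'C': 23}

The select finds the new row because the literal's only field, C, matches; the extra field E (and the generated _id) are simply ignored by the comparison.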
The client library also supports the Document type: PyrrhoDocument has conversions to and from Json, Bson and byte[].

Indexing Documents

Any table can have columns that contain Documents, and fields inside documents can be accessed from SQL using the a.b.c syntax. The usual SQL:2011 case-sensitivity rules apply, so the selector b.c will obtain values such as 23 from the table above. In fact the simple query above can be written
select * from a where b.c=23

Document field selectors can also be used where SQL allows column lists:
select b.c, d from a

and
create table f(g char, h document, primary key(g, h.i as int))

though here we note that Pyrrho needs to be told what type the field i has. Extra document indexes can be specified as usual in SQL using unique or references. Such indexes add restrictions to the creation of new documents, as fields used in index keys must not be null.
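For instance, a unique index on a document field should look something like this (a sketch only: the table and field names are made up, and I assume the same "field as type" phrasing used for the primary key above):
create table j(k document, unique(k.m as int))

With such an index in place, a document inserted into j without an m field would be rejected, since index key fields must not be null.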
These new features also work well for other forms of structured data.

Queries

MongoDB has $ operators for use in creating templates for queries and updates. These are also available in Pyrrho, and provide alternative ways of writing queries on ordinary data. For example, the query
select * from t where x>100

(even where table t contains no Document columns) can be rewritten using a document literal as
select * from t where x={'$gt': 100}

Such a constant equality-match condition can be used very efficiently on remote data.
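Operators can also be combined in a single template. For example, a range test might be written something like this (a sketch: I assume $gte and $lt carry their usual Mongo meanings, just as $gt does above):
select * from t where x={'$gte': 100, '$lt': 200}

which is equivalent to x>=100 and x<200.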

Document Updates

Documents are almost never replaced in their entirety. Instead document fields are modified using templates that contain $ operators, and Pyrrho's transaction log contains only the update templates used: the actual binary value of the document is maintained in memory.
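So an update statement supplies a template rather than a whole document. A sketch, reusing table a and column b from above (the incremented field is made up for the example):
update a set b={'$set': {'C': 24}, '$inc': {'N': 1}} where b={'C': 23}

Only this template goes into the transaction log; the full document with its new field values is recomputed in memory.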

Update May 2015

Development of this service has been disappointingly slow: it does not work well with the MongoDB 3.0 shell. Work will resume when the MongoDB meta protocol documentation appears.

Friday, 24 April 2015

The Pyrrho Book

This is now available, free of charge.

Tuesday, 14 April 2015

On Row Versioning

By popular request (!), Pyrrho now has row versioning. This enables two things: (1) during a long transaction you can check if someone else has changed or deleted a row that you have read; (2) at any subsequent time (using the same or another connection) you can see if the row you have read has been changed or deleted.

The manual says:

Pyrrho supplies a pseudocolumn in all base tables called CHECK. The value is a string that includes the transaction log name, defining position and current offset of the row version. When retrieved it refers to the version valid at the start of the transaction, but it can be used at any time subsequently (inside or outside the transaction) to see if the row has been updated by this or any other transaction (this is the only violation of transaction isolation in Pyrrho).

The check is performed by a method of the PyrrhoConnect class (which implements IDbConnection):

bool Check(string ch)
Check to see if a given CHECK pseudocolumn value is still current, i.e. the row has not been modified by a later transaction.
The Open Source Pyrrho OSPDemo folder includes a demo of this CHECK feature.
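For example, retrieving the pseudocolumn should look something like this (a sketch with made-up table and column names, assuming CHECK can simply be named in the select list):
select check, id, surname from members where id=7

The string returned in the CHECK column is the value to hand to Check(ch) later on: true means the row is still current, false means some transaction has since updated or deleted it.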

Tuesday, 31 March 2015

Optimistic Partitioned Databases

The Partitioned Database Tutorial is now also updated for today's version of the distribution.
For comments on safety of these so-called "optimistic approaches", see my previous blog post.
Being optimistic does not mean skipping the three-phase commit!

Tuesday, 24 March 2015

Optimistic transaction safety

Having just updated the Distributed Database tutorial for Pyrrho with all its three-phase commit stuff, I thought it might be a good time to state once again that while Pyrrho uses optimistic concurrency control, it is totally transaction-safe. If you are using explicit transactions you can use the system "Log$" table to view the proposed changes for the current transaction. Each connection will have its own, and it is easy to see that they are isolated: the only entries with known positions are the ones that predated the start of the transaction.
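For example, in pyrrhocmd something like the following shows the idea (a sketch: the table t is made up, and I assume the usual begin transaction syntax for an explicit transaction):
begin transaction
update t set a=42 where id=1
select * from "Log$"

The proposed update appears in "Log$" for this connection only, and without a known file position; everything with a fixed position predates the start of the transaction.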


Accordingly the transaction commit protocol is in 4 or 5 phases controlled by locks on the transaction log file (which in Pyrrho is the durable version of the database):
1. Verify that the transaction does not conflict with anything written since the start of the transaction.
1.5. Lock the database and repeat this test.
2. Prepare the binary package to be written.
3. Write it to the disk file and unlock the database.
4. Discard the local transaction and allow the client to see the database as it now is.
If multiple servers or databases are active then step 3 here requires three-phase commit during which time the proposed changes are written to temporary files. If all is well, these temporary files do not need to be read, and can be removed once all participants have acknowledged the commit request.

Friday, 27 February 2015

Version 5.2 enhancements

The 5.2 version of Pyrrho has some extra metadata features, for example, to enable navigation through ERM relationships when generating XML, and to perform data visualisations for views and functions. For better compliance with the SQL standard, metadata is only added or managed in ALTER statements.

A number of SQL features have been improved, for example IN, OUT, INOUT and RESULT for parameters, and the operation of triggers. Pyrrho continues to relax SQL restrictions on constraints, generated always, and default values: all of these can contain arbitrary SQL.
Pyrrho's REST service also generates Json.
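For instance, a generated column or a default can now use an ordinary SQL expression. Something like this should work (a sketch with made-up names, using the standard generated always as phrasing):
create table emp(sal int, bonus int generated always as (sal/10), hired date default current_date)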

The open source 5.2 release includes a set of unit tests based on the samples on the Pyrrho web site and the Pyrrho manual. The Tpcc benchmark is also in the distribution.
In the open source distribution, the SourceIntro document and the spreadsheet of C# classes have both been updated.

As usual when there have been many changes, there will doubtless be many bugs to sort out, and as usual I ask people to let me know about anything that does not seem to work.

As time allows I will continue testing Pyrrho's Mongo service. In the future I want to add another abstraction layer to Pyrrho's type system, as I don't think the physical database should maintain names of columns of structures since these are role-dependent in Pyrrho. I would like to add syntax to allow the multiplicity of foreign-key relationships to be specified (SQL only supports [0..1,0..]).

An interesting wrinkle relates to when a database can be written to by PUBLIC. Pyrrho will in future record the login identities of any such users.

SQL's CURRENT_USER and CURRENT_ROLE keywords only refer to the first database in the current connection (Pyrrho allows connection to a list of databases) but Pyrrho's Sys$KnownRoles system table gives details for all databases in the current connection.
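So to see the roles available across every database in the connection, a query along these lines should do (quoting the system table name, as for "Log$" elsewhere in this blog):
select * from "Sys$KnownRoles"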

Monday, 12 January 2015

The MongoDB service

It is noticeable that many companies are now using MongoDB documents alongside traditional SQL databases. But it is a very bad idea to require connections to different DBMS in a single application, since transactional consistency is made even more difficult if not impossible. Pyrrho already allows a single connection to involve several databases, and MongoDB has a similar facility. But if an application is going to work with both database types it is really essential to have a single connection to both the SQL and NoSQL databases. Pyrrho has added support for MongoDB documents (and accessibility to these documents via SQL), at least for academic interest, by making DOCUMENT (and DOCARRAY) into primitive types in the same way that SQL added support for XML. Pyrrho’s SQL syntax in chapter 7 of the Pyrrho manual allows {} to delimit Json documents (and, where the context allows, [ ] to delimit Json arrays). Within such documents the Mongo operators have their usual meanings.

By default Pyrrho starts up a MongoDB-like wire protocol service on port 27017, so that the mongo shell can work with Pyrrho although the database file formats are completely different.
In application programming where SQL tables and documents are both needed, it is better to use the above SQL extensions on a single TCP connection rather than opening a second connection on port 27017. The mongo shell’s document collections are implemented as database tables with a single column “doc” of type DOCUMENT, although SQL can access more general layouts. Comparison of documents has been hacked so that a WHERE “doc”= condition in SQL behaves similarly to a Mongo find() (the resulting concept of equality of documents is admittedly strange but works well).

When working with such mixed systems the biggest surprise is around case-sensitivity. Json collection and field names are case sensitive, while SQL by default converts unquoted identifiers to upper case.
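For example, with the orders collection created by the mongo shell below, something like the following behaviour is to be expected (a sketch):
select * from orders      -- fails: the unquoted name is read as ORDERS
select * from "orders"    -- finds the collection, since quoting preserves case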

The MongoDB documentation for update contains the following example:
{
  $set: { status: "D" },
  $inc: { quantity: 2 }
}
Here are complete examples for updates using Pyrrho DB. The first uses the mongo shell to access Pyrrho, the second, the Pyrrhocmd client:
C:\Program Files\MongoDB 2.6 Standard\bin>mongo
MongoDB shell version: 2.6.6
connecting to: test
:0> db.runCommand({insert:"orders",inserts: [{product:"coffee",status:"E",quantity:1}]
})
{ "ok" : true, "n" : 1 }
:0> db.orders.find()
{ "product" : "coffee", "status" : "E", "quantity" : 1, "_id" : ObjectId("0000000039150006003d1909") }
:0> db.runCommand({update:"orders",updates:
[{query: {product:"coffee"},update:{$set:{status:"D"},$inc:{quantity:2}}}]
})
{ "ok" : true, "n" : 1 }
:0> db.orders.find()
{ "product" : "coffee", "status" : "D", "quantity" : 3, "_id" : ObjectId("0000000039150006003d1909") }
:0>

Microsoft Windows [Version 6.3.9600]
(c) 2013 Microsoft Corporation. All rights reserved.

C:\PyrrhoDB\OSP\OSP>pyrrhocmd test
SQL> update "orders" set "doc"={$set: {status:"G"},$inc: {quantity: 4}} where "doc"={product:"coffee"}
1 records affected

SQL> table "orders"
|--------------------------------------------------------------------------------------|
|doc                                                                                    |
|--------------------------------------------------------------------------------------|
|{"product": "coffee", "status": "G", "quantity": 7, "_id": "000000003915000400752379"}|
|--------------------------------------------------------------------------------------|
SQL>



Update May 2015

Development of this service continues disappointingly slowly. It does not work well with MongoDB 3+ tools, and work is effectively suspended until MongoDB releases more documentation of their meta protocol.