Sunday, 3 May 2015

MongoDB and the Document standard type

The Document Type

A new feature of Pyrrho (from version 5.1) is the Document standard type, which is intended for so-called Big Data applications of Pyrrho. The Document type is inspired by MongoDB and is well suited to ad hoc data. Documents contain strongly typed named fields but have no schema: the only field that every Document must have is called _id, which is supposed to be unique per document and is generated for you if you do not supply it.
Documents can be provided in SQL using Json syntax: Pyrrho's SQL is extended to allow Json objects to be written directly, as in
select * from a where b={'C':23}

Note that field names are always case-sensitive. Document fields can be embedded documents, arrays or regular expressions: for example, a regular expression /a?b.*c/i can be written without quotes inside a Json object. As discussed further below, the equals sign here tests only the fields mentioned in the document literal, so any other fields in the stored document are ignored.
The client library also supports the Document type: PyrrhoDocument has conversions to and from Json, Bson and byte[].
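
The matching rule described above can be modelled in a few lines of Python. This is only a sketch of the semantics as stated in this post, not Pyrrho's implementation: the helper names with_id and matches are hypothetical, and generating _id with uuid4 is an assumption made purely for illustration.

```python
import uuid


def with_id(doc):
    """Return a copy of doc that is guaranteed to have an _id field.

    Assumption: a random hex string stands in for the generated _id.
    """
    if "_id" in doc:
        return dict(doc)
    return {"_id": uuid.uuid4().hex, **doc}


def matches(doc, literal):
    """True if every field named in the literal has the same value in doc.

    Fields of doc that the literal does not mention are ignored, and
    field names are compared case-sensitively.
    """
    return all(name in doc and doc[name] == value
               for name, value in literal.items())


stored = with_id({"C": 23, "D": "extra"})

print(matches(stored, {"C": 23}))   # the extra field D does not matter
print(matches(stored, {"c": 23}))   # lower-case c is a different field
```

The literal acts as a template rather than as a complete value, which is why a stored document with extra fields still satisfies the condition.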

Indexing Documents

Any table can have columns that contain Documents, and fields inside documents can be accessed from SQL using the a.b.c syntax. The usual SQL2011 case-sensitivity rules apply to these selectors, so an unquoted b.c refers to the field C and will obtain values such as 23 from the above table. In fact the simple query above can be written
select * from a where b.c=23

Document field selectors can also be used where SQL allows column lists:
select b.c, d from a

and
create table f(g char, h document, primary key(g, h.i as int))

though here we note that Pyrrho needs to be told what type the field I has. Extra document indexes can be specified as is usual in SQL using unique or references. Such indexes restrict the creation of new documents, since fields used in index keys must not be null.
These new features also work well for other forms of structured data.
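
As a rough illustration of how a selector such as b.c walks into a document column, here is a small Python model. It is a sketch under stated assumptions rather than a description of Pyrrho's internals: select_path is a hypothetical helper, unquoted selector parts are folded to upper case in line with the usual SQL rule for regular identifiers, and a missing field is treated as null (None).

```python
def select_path(row, selector):
    """Resolve a dotted selector such as 'b.c' against a row.

    The row maps column names to values; a Document column is a dict.
    """
    current = row
    for part in selector.split("."):
        # Assumption: unquoted identifiers fold to upper case.
        name = part.upper()
        if not isinstance(current, dict) or name not in current:
            return None  # missing field, modelled here as null
        current = current[name]
    return current


row = {"B": {"C": 23, "X": {"Y": "deep"}}, "D": "plain"}

print(select_path(row, "b.c"))    # field C inside document column B
print(select_path(row, "b.x.y"))  # field of an embedded document
print(select_path(row, "b.z"))    # no such field
```

Under this model a field used in an index key, such as h.i above, would have to resolve to a non-null value for every new document.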

Queries

MongoDB has $ operators for use in creating templates for queries and in updates. These are also available in Pyrrho, where they provide alternative ways of writing queries on ordinary data. For example, the query
select * from t where x>100

(even where table t contains no Document columns) can be rewritten using a document literal as
select * from t where x={'$gt': 100}

Such a constant equality-match condition can be used very efficiently on remote data.
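
To show how an equality test against a template can stand in for a comparison, the following Python sketch evaluates a small set of MongoDB-style $ operators. The function name satisfies and the particular operators chosen are assumptions for illustration; which operators Pyrrho accepts is not specified here.

```python
import operator

# A few MongoDB-style comparison operators mapped to Python functions.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
}


def satisfies(value, template):
    """Evaluate `value = template`, where template may hold $ operators."""
    is_operator_template = (
        isinstance(template, dict)
        and template
        and all(key.startswith("$") for key in template)
    )
    if is_operator_template:
        # Every operator in the template must hold for the value.
        return all(OPERATORS[op](value, operand)
                   for op, operand in template.items())
    return value == template  # plain literal: ordinary equality


rows = [{"X": 50}, {"X": 100}, {"X": 150}, {"X": 250}]

# Equivalent in this model to: select * from t where x>100
selected = [r for r in rows if satisfies(r["X"], {"$gt": 100})]
print(selected)
```

Because the whole condition is a single constant, it can be shipped to wherever the data lives and evaluated there, which is the efficiency point made above.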

Document Updates

Documents are almost never replaced in their entirety. Instead document fields are modified using templates that contain $ operators, and Pyrrho's transaction log contains only the update templates used: the actual binary value of the document is maintained in memory.
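
The idea of logging only the update templates can be sketched in Python as follows. This is a simplified model, not Pyrrho's log format: apply_update and replay are hypothetical helpers, and $set, $inc and $unset are used as familiar MongoDB update operators without any claim about which ones Pyrrho supports.

```python
import copy


def apply_update(doc, template):
    """Return a new document with one update template applied."""
    result = copy.deepcopy(doc)
    for op, fields in template.items():
        for name, arg in fields.items():
            if op == "$set":
                result[name] = arg
            elif op == "$inc":
                result[name] = result.get(name, 0) + arg
            elif op == "$unset":
                result.pop(name, None)
            else:
                raise ValueError(f"unsupported operator: {op}")
    return result


def replay(initial, log):
    """Rebuild the current document value from the logged templates."""
    current = initial
    for template in log:
        current = apply_update(current, template)
    return current


# The log holds only the templates, never the full document value.
log = [
    {"$set": {"status": "open"}},
    {"$inc": {"visits": 1}},
    {"$inc": {"visits": 2}},
    {"$unset": {"status": ""}},
]

doc = replay({"_id": 1, "visits": 0}, log)
print(doc)
```

Replaying the templates in order reproduces the in-memory value, so the log stays small even when documents are large.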

Update May 2015

Work on developing this service has been disappointingly slow: it does not work well with the MongoDB 3.0 shell. Work on it will resume when the MongoDB meta protocol documentation appears.
