2.5 Data Definition Language

This section describes the Sedna Data Definition Language (DDL), which is used to create and manage the database structures that hold data.

Most parameters of Sedna DDL statements are computable and are specified as XQuery expressions. The expected type of all parameters is xs:string. All parameters are evaluated and atomized. If the atomized value is not of type xs:string, a dynamic error is raised.

2.5.1 Managing Standalone Documents
CREATE DOCUMENT doc-name-expr

The CREATE DOCUMENT statement creates a new standalone document with the name that is the result of doc-name-expr.

DROP DOCUMENT doc-name-expr

The DROP DOCUMENT statement drops the standalone document with the name that is the result of doc-name-expr.

For example, the following statement:

CREATE DOCUMENT "report"

creates a document named "report", while:

DROP DOCUMENT "report"

drops it.

The system document $documents lists all available documents and collections. For details on retrieving metadata, see section 2.5.6.
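
Because doc-name-expr is an XQuery expression, the name does not have to be a string literal. As a minimal sketch, the statement below builds the name with the standard fn:concat function; the name report-2024 is only an illustration.

```xquery
CREATE DOCUMENT fn:concat("report-", "2024")
```

By the typing rule stated at the beginning of this section, an expression that atomizes to a non-string value, such as the integer literal 42, would be expected to raise a dynamic error instead.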

2.5.2 Managing Collections

Sedna provides a mechanism for organizing multiple documents into a collection. A collection provides a uniform way of writing XQuery and XML update statements that address multiple documents at once. Collections are preferable when a group of documents is to be queried or updated not by referring to individual document names, but according to conditions over their content.

In a Sedna database, a document is either standalone (it does not belong to any collection) or a member of some collection. Unlike standalone documents, all documents within a given collection share a common descriptive schema. The common descriptive schema (which can be considered the union of the individual descriptive schemas of all documents in the collection) allows XQuery and XML update queries to be addressed to all members of the collection.

Below is the specification of the syntax and semantics of the statements that manage collections.

CREATE COLLECTION coll-name-expr

The CREATE COLLECTION statement creates a new collection with the name that is the result of coll-name-expr.

For example, CREATE COLLECTION "mails" creates a collection named "mails".

XML documents can be loaded into the collection, as previously described in section 2.4.

To access a single document from a collection in an XQuery or XML update query, the doc function accepts a collection name as its optional second argument:

doc($doc as xs:string,  
    $col as xs:string) as document-node()

The function returns the document with the $doc name that belongs to the collection named $col.

For example, the query doc('mail-01', 'mails') returns the document named mail-01 from the collection mails.

fn:collection($col as xs:string?) as document-node()*

The collection function can be called anywhere within an XQuery or XML update query where a function call is allowed. It returns the sequence of all documents that belong to the collection named $col. The relative order of documents in the returned sequence is currently undefined in Sedna.

Conventional XQuery predicates can be used to filter the sequence returned by the collection function, selecting the documents that satisfy a desired condition.
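
A minimal sketch of such filtering is shown below. It assumes that each document in the "mails" collection has a root element mail with a child element from; these element names and the address are hypothetical and are not defined by Sedna.

```xquery
(: Keep only the documents whose sender matches the given address. :)
fn:collection("mails")[mail/from = "alice@example.org"]
```

The predicate is evaluated against each document node in turn, so the result is a sequence of whole documents rather than of individual elements.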

CREATE DOCUMENT doc-name-expr IN COLLECTION coll-name-expr

This statement creates a new document named doc-name-expr in a collection named coll-name-expr.

For example, the following statement:

CREATE DOCUMENT 'mail' IN COLLECTION 'mails'

creates a document named "mail" in the collection named "mails".

DROP DOCUMENT doc-name-expr IN COLLECTION coll-name-expr

The DROP DOCUMENT IN COLLECTION statement drops the document named doc-name-expr located in the collection named coll-name-expr.

DROP COLLECTION coll-name-expr

The DROP COLLECTION statement drops the collection named coll-name-expr from the database. If the collection contains any documents, these documents are dropped as well.

RENAME COLLECTION old-name-expr INTO new-name-expr

The RENAME COLLECTION statement renames the collection whose name is the result of old-name-expr. The new name is the result of new-name-expr. After atomization, both results must be either of type xs:string (or a type derived from it) or promotable to xs:string.
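
For illustration, assuming that the "mails" collection created earlier exists and that no collection named "mails-archive" does (the new name is hypothetical), the collection could be renamed as follows.

```xquery
RENAME COLLECTION "mails" INTO "mails-archive"
```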

The system document $collections lists all available collections. For details on retrieving metadata, see section 2.5.6.

2.5.3 Managing Value Indices

Sedna supports value indices over XML element content and attribute values. An index can be based on one of two structures: a B+ tree or a BST (Block String Trie, experimental). Below is the description of the statements that manage indices.

CREATE INDEX title-expr  
ON Path1 BY Path2  
AS type  
[USING tree-type]

The CREATE INDEX statement creates an index on nodes (specified by Path1) by keys (specified by Path2).

Path1 is an XPath expression without any filter expressions that identifies the nodes of a document or a collection that are to be indexed. Path2 is an XPath expression without any filter expressions that specifies the relative path to the nodes whose string-values are used as keys to identify the nodes returned by the Path1 expression. The Path2 expression should not start with ’/’ or ’//’. The full path from the root of documents (that may be in a collection) to the key nodes is Path1/Path2.

For instance, let Path1 be doc("a")/b/c and Path2 be d. Let X be a node returned by the Path1 expression, and Y be one of the nodes returned by the doc("a")/b/c/d expression. If Y is a descendant of X, then the value of Y is used as a key for searching the node X.

title-expr is the title of the index created. It should be unique for each index in the database.

type is the atomic type to which the key values are cast. The following types are supported for B+ tree: xs:string, xs:integer, xs:float, xs:double, xs:date, xs:time, xs:dateTime, xs:yearMonthDuration, xs:dayTimeDuration. Note that BST supports xs:string only.

tree-type defines the structure used for index storage. This argument is optional. By default, an index is stored in a B+ tree, which is a good choice in the general case. The alternative structure, BST, can save a significant amount of disk space in certain situations. BST is based on the concept of a prefix tree (trie). Its main distinguishing features are:

  1. BST can handle strings of any length.
  2. BST is a good choice for indexing fields that contain strings with repeating prefixes (e.g., URLs or URIs).

When used appropriately, a BST index can be up to 4 times more compact than a B+ tree index at the same search speed.

The following tree-type values are supported: "btree" for B+ tree (default), "bstrie" for BST.

Note 3 The BST feature is currently experimental; do not use it in production or in any critical application.

In the following example, people are indexed by the names of their cities. To generate the index keys, city names are cast to the xs:string type. B+ tree is used for index storage.

CREATE INDEX "people"  
ON doc("auction")/site//person BY address/city  
AS xs:string

The following example is the same as the previous one but uses BST for index storage:

CREATE INDEX "people"  
ON doc("auction")/site//person BY address/city  
AS xs:string  
USING "bstrie"

To remove an index, use the following statement:

DROP INDEX title-expr

The DROP INDEX statement removes the index named title-expr from the database.

Note 4 In the current version of Sedna, the query executor does not use indices automatically. You can force the executor to use an index by calling the XQuery index-scan functions specified in section 2.2.2.
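
The sketch below shows how the "people" index defined above might be queried explicitly. It assumes that the index-scan function takes the index title, a key value, and a comparison mode such as "EQ"; this signature and the city name are assumptions made here and should be checked against section 2.2.2.

```xquery
(: Look up the person nodes whose address/city key equals "Seattle". :)
index-scan("people", "Seattle", "EQ")
```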

2.5.4 Managing Full-Text Indices

Sedna supports building full-text indices so that XQuery can be combined with full-text search. The resulting indices must be used explicitly via the full-text search functions (see section 2.2.3); the XQuery Full Text extensions are not supported.

Sedna can be integrated with dtSearch [15], a commercial text retrieval engine that provides full-text indices. Because dtSearch is a third-party commercial product, it is not included with Sedna. If you are interested in using Sedna with dtSearch, please contact us. Below is the description of the statements that manage full-text indices in Sedna.

CREATE FULL-TEXT INDEX title-expr  
ON path  
TYPE type  
[  
WITH OPTIONS options  
]

The CREATE FULL-TEXT INDEX statement indexes nodes (specified by path) by a text representation of the nodes. The text representations of the nodes are constructed according to the value of the type parameter.

title-expr is the title of the index created. It should be unique for each full-text index in the database.

path is an XPath expression without any filter expressions that identifies the nodes of a document or a collection that are to be indexed. An example of a path expression is doc("foo")/library//article.

type specifies how the text representations of nodes are constructed when the nodes are indexed. type can have one of the following values:

  • "xml" – the XML representations of the nodes are used;
  • "string-value" – the string-values of the nodes are used, as obtained with the standard XQuery fn:string function. The string-value of a node is the concatenated contents of all its descendant text nodes, in document order;
  • "delimited-value" – the same as "string-value" but blank characters are inserted between text nodes;
  • "customized-value" ((element-qname, type) , ... (element-qname, type)) – this option allows specifying types for particular element nodes. Here element-qname is a QName of an element, type can have one of the values listed above (i.e. "xml", "string-value", "delimited-value"). For those elements that are not specified in the list, the "xml" type is used by default.

options is a string of the following form: "option=value{,option=value}". It specifies the options used for index construction. The following options are available:

  • backend – specifies which implementation of full-text indexes to use. Allowed values are native and dtsearch; the latter is only available in dtSearch-enabled builds. The default backend is dtsearch for dtSearch-enabled builds and native for all other builds.

Options for native backend:

  • stemming – specifies stemming language to use.
  • stemtype – set stemtype=both to be able to search for both stemmed and original words; otherwise, stemming is always applied when it is enabled.

In the following example, articles are indexed by their contents represented as XML.

CREATE FULL-TEXT INDEX "articles"  
ON doc("foo")/library//article  
TYPE "xml"

The example below illustrates the use of "customized-value" type.

CREATE FULL-TEXT INDEX "messages"  
ON doc("foo")//message  
TYPE "customized-value"  
     (("b", "string-value"),  
     ("a", "delimited-value"))
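
The WITH OPTIONS clause can be sketched in the same way. The statement below explicitly selects the native backend using the option=value form described above; the index name "articles-native" is hypothetical.

```xquery
CREATE FULL-TEXT INDEX "articles-native"
ON doc("foo")/library//article
TYPE "xml"
WITH OPTIONS "backend=native"
```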

To remove a full-text index, use the following statement:

DROP FULL-TEXT INDEX title-expr

The DROP FULL-TEXT INDEX statement removes the full-text index named title-expr from the database.

Note 5 In the current version of Sedna, the query executor does not use full-text indices automatically. You can force the executor to use them by calling the XQuery full-text search functions specified in section 2.2.3.

2.5.5 Managing Modules

XQuery allows putting functions in library modules, so that they can be shared and imported by any query. A library module contains a module declaration followed by variable and/or function declarations. The module declaration specifies its target namespace URI which is used to identify the module in the database. For more information on modules see the XQuery specification [3].

Before a library module can be imported into a query, it must be loaded into the database. To load a module, use the following statement.

LOAD MODULE "path_to_file", ..., "path_to_file"

Each path_to_file specifies a path to the file. If only one parameter is supplied, it refers to the file which contains the module definition. The module definition can also be divided into several files. In this case all files must have a module declaration with the same target namespace URI (otherwise an error is raised).

For example, suppose that you have the following module stored in math.xqlib.

module namespace math = "http://example.org/math";  
 
declare variable $math:pi as xs:decimal := 3.1415926;  
 
declare function math:increment($num as xs:decimal) as xs:decimal {  
    $num + 1  
};  
 
declare function math:square($num as xs:decimal) as xs:decimal {  
    $num * $num  
};

You can load this module as follows.

LOAD MODULE "math.xqlib"

Once a library module is loaded into the database, it can be imported into a query using the conventional XQuery module import [3]. For example, you can import the above module as follows.

import module namespace math = "http://example.org/math";  
 
math:increment(math:square($math:pi))  

To replace an already loaded module with a new one, use the following statement.

LOAD OR REPLACE MODULE "path_to_file", ..., "path_to_file"

To remove a library module from the database, use the following statement.

DROP MODULE "target_namespace_URI"

It results in removing the library module with the given target namespace URI from the database.
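
Continuing the math.xqlib example, the two statements below sketch the rest of the module life cycle. They are separate statements and would be submitted one at a time. Note that LOAD OR REPLACE MODULE takes a file path, whereas DROP MODULE takes the target namespace URI declared inside the module.

```xquery
LOAD OR REPLACE MODULE "math.xqlib"
```

```xquery
DROP MODULE "http://example.org/math"
```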

You can obtain information about the modules loaded into the database by querying the system collection named $modules as follows: collection("$modules").

2.5.6 Retrieving Metadata

You can retrieve various metadata about database objects (such as documents, collections, indexes, etc.) by querying system documents and collections listed below.

Names of the system documents and collections start with the $ symbol. The system documents and collections (except those marked with the * symbol) are not persistent but are generated on the fly. You can query these documents as usual, but you cannot update them. In addition, these documents are not listed in the $documents system document.

  • $documents document – list of all stand-alone documents, collections and in-collection documents (except system meta-documents and collections, like the $documents document itself);
  • $collections document – list of all collections;
  • $modules document – contains the list of loaded modules with their names;
  • $modules (*) collection – contains documents with precompiled definitions of XQuery modules;
  • $indexes document – list of indexes with information about them;
  • $ftindexes document – list of full-text indexes with information about them (this document is available if Sedna is built with SE_ENABLE_FTSEARCH enabled);
  • $triggers document – list of triggers with information about them (this document is available if Sedna is built with SE_ENABLE_TRIGGERS enabled);
  • $db_security_data (*) document – list of users and privileges on database objects;
  • $schema document – descriptive schema of all documents and collections with some schema-related information;
  • $errors document – list of all errors with descriptions;
  • $version document – version and build numbers;
  • $schema_<name> document – the descriptive schema of the document or collection named <name>;
  • $document_<name> document – statistical information about the document named <name>;
  • $collection_<name> document – statistical information about the collection named <name>.
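
As a brief sketch, the system documents in this list can be read with the ordinary doc function. The query below returns the list of collections followed by the statistics for a standalone document; it assumes that a document named "report" exists, so that its statistics are exposed as $document_report.

```xquery
(: The first item lists all collections; the second follows the
   $document_<name> naming pattern for a document named "report". :)
doc("$collections"),
doc("$document_report")
```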

The statistical information in $document_<name> and $collection_<name> documents contains the following elements:

  • total_schema_nodes – the number of the nodes of the descriptive schema;
  • total_schema_text_nodes – the number of the attribute and text nodes of the descriptive schema;
  • total_nodes – the number of the nodes of the document (or collection);
  • schema_depth – the maximal depth of the document (or collection);
  • total_desc_blk – the number of the descriptor blocks occupied by document (or collection);
  • total_str_blk – the number of the text blocks occupied by document (or collection);
  • saturation – fill factor of the blocks (in percents);
  • total_innr_blk – the number of the descriptor blocks occupied by document (or collection) except first and last blocks in each chain of blocks;
  • total_innr_size – the size of the inner blocks;
  • innr_blk_saturation – fill factor of the inner blocks (in percents);
  • strings – the share of the string blocks (in percents);
  • descriptors – the share of the descriptor blocks (in percents);
  • nid – the share of the long labeling numbers’ (> 11) total size (in percents);
  • indirection – the share of the indirection records’ total size (in percents);
  • total_size – the total size of the document (or collection), in MBs;
  • string_size – the total size of the string blocks, in MBs;
  • descriptor_size – the total size of the descriptor blocks, in MBs;
  • nids_size – the total size of the long labeling numbers, in MBs;
  • free_space_in_str_blocks – the total size of the free space in the string blocks;
  • indirection_size – the total size of the indirection records, in MBs;
  • nids_size – the share of the indirection records’ total size (in percents);
  • STRINGS – the histogram of the XML data by its size;
  • NID – the histogram of the labeling numbers by their length;