2.4 Bulk Load

To bulk load a stand-alone document, use the following statement:

LOAD "path_to_file" "document_name"

The first parameter is the path to the file that contains the document to be loaded. The second parameter is the name the document will have in the database.

For example,

LOAD "/opt/test.xml" "test"

loads the file /opt/test.xml into the database as a stand-alone document named test.
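When LOAD statements are generated by a script rather than typed by hand, the arguments must be quoted. The following Python sketch builds the stand-alone form of the statement. It assumes that LOAD arguments follow XQuery string-literal rules, where an embedded double quote is written twice. Other special characters (for example `&`, which XQuery string literals treat as the start of an entity reference) are not handled. The function names are hypothetical and not part of any Sedna API.

```python
def quote_literal(value: str) -> str:
    """Wrap value in double quotes for use as a LOAD argument.

    Assumption: a double quote inside the argument is escaped by doubling it,
    as in XQuery string literals.
    """
    return '"' + value.replace('"', '""') + '"'


def load_statement(path: str, document_name: str) -> str:
    """Build a LOAD statement for a stand-alone document (hypothetical helper)."""
    return f"LOAD {quote_literal(path)} {quote_literal(document_name)}"


print(load_statement("/opt/test.xml", "test"))  # prints: LOAD "/opt/test.xml" "test"
```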

To load a document into a collection, use the following statement:

LOAD "path_to_file" "document_name" "collection_name"

The first parameter is the path to the file that contains the document to be loaded. The second parameter is the name the document will have in the database. The third parameter is the name of the collection into which the document is loaded.

For example,

LOAD "/opt/mail-01.xml" "mail-01" "mails"

loads the file /opt/mail-01.xml into the collection mails as a document named mail-01.
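Loading many files into one collection is a natural candidate for scripting. The Python sketch below produces one three-argument LOAD statement per `*.xml` file in a directory. Naming each document after its file (so mail-01.xml becomes mail-01) is a hypothetical convention chosen for the example, and the quoting again assumes XQuery-style doubling of embedded double quotes.

```python
from __future__ import annotations

from pathlib import Path


def quote_literal(value: str) -> str:
    # Assumption: embedded double quotes are doubled, as in XQuery string literals.
    return '"' + value.replace('"', '""') + '"'


def collection_load_statements(directory: str, collection: str) -> list[str]:
    """Return one LOAD statement per *.xml file in directory, sorted by file name.

    Hypothetical convention: the document name is the file name without its
    extension, e.g. mail-01.xml -> "mail-01".
    """
    statements = []
    for path in sorted(Path(directory).glob("*.xml")):
        args = (str(path), path.stem, collection)
        statements.append("LOAD " + " ".join(quote_literal(a) for a in args))
    return statements
```

On a POSIX system, calling `collection_load_statements("/opt", "mails")` on a directory that contains mail-01.xml would include `LOAD "/opt/mail-01.xml" "mail-01" "mails"` in its result; non-XML files are skipped.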

To bulk load from an input stream rather than from a file, use the following statements (the first loads a stand-alone document, the second loads a document into a collection):

LOAD STDIN "document_name"

LOAD STDIN "document_name" "collection_name"

Compared to the statements above, the "path_to_file" parameter is replaced by the keyword STDIN, which denotes that the document is read from the input stream. The characters in the input stream must form a well-formed XML document; it is loaded into the database under the name given by "document_name". If "collection_name" is specified, the document is loaded into that collection.
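Because the stream must contain a well-formed XML document, a client that assembles the stream itself may want to check it before sending. The Python sketch below builds the two `LOAD STDIN` forms and uses the standard library's SAX parser to test well-formedness (not validity against a DTD or schema). The pre-check buffers the whole document, so it suits small inputs or debugging rather than large streamed loads. The helper names are hypothetical, and the quoting assumes XQuery-style doubling of embedded double quotes.

```python
from __future__ import annotations

import xml.sax


def load_stdin_statement(document_name: str, collection_name: str | None = None) -> str:
    """Build LOAD STDIN "doc" or LOAD STDIN "doc" "collection" (hypothetical helper)."""
    args = [document_name] if collection_name is None else [document_name, collection_name]
    # Assumption: embedded double quotes are escaped by doubling them.
    quoted = ['"' + a.replace('"', '""') + '"' for a in args]
    return "LOAD STDIN " + " ".join(quoted)


def is_well_formed(data: bytes) -> bool:
    """Return True if data parses as a well-formed XML document."""
    try:
        # A bare ContentHandler ignores all events; only parse errors matter here.
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True
```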

By default, the standard input stream is used, but a different input stream can be redirected to serve as the input for bulk load. For example, an XML document that some program produces as its output can be loaded into a Sedna database in a stream-wise fashion. When working from the command line, you can redirect the input with the facilities provided by your operating system. The Java and Scheme APIs provide additional wrappers for bulk loading from a stream, in which the input stream is passed as an additional argument to a function call.
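The stream-wise scenario described above, where one program's output becomes the bulk-load input, can be sketched in Python with `subprocess`: the XML is written to the consumer's standard input chunk by chunk as it is generated, so the whole document never has to be held in memory. The consumer command is deliberately a parameter, because the exact command line for running `LOAD STDIN` through a Sedna client is not given in this section; substitute the invocation for your installation. The producer `generate_items` is hypothetical.

```python
from __future__ import annotations

import subprocess
from typing import Iterable, Iterator, Sequence


def generate_items(n: int) -> Iterator[bytes]:
    """Hypothetical producer: yields an XML document in small chunks."""
    yield b"<items>"
    for i in range(n):
        yield f'<item id="{i}"/>'.encode("utf-8")
    yield b"</items>"


def stream_to_command(cmd: Sequence[str], chunks: Iterable[bytes]) -> int:
    """Write chunks to cmd's standard input as they are produced; return its exit code."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    try:
        for chunk in chunks:
            proc.stdin.write(chunk)
    except BrokenPipeError:
        pass  # the consumer exited early; its exit code reports why
    finally:
        try:
            proc.stdin.close()  # signals end of input to the consumer
        except BrokenPipeError:
            pass
    return proc.wait()
```

The consumer's standard output is not captured here; it goes wherever the calling process's output goes, which avoids the deadlock that can occur when both pipes are filled without being drained.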

By default, Sedna strips boundary whitespace according to the boundary-space policy defined in [3]. To control how boundary whitespace is processed, use a boundary-space declaration [3] in the prolog of the LOAD statement. The following example shows a boundary-space declaration that instructs Sedna to preserve whitespace while loading the auction.xml document:

declare boundary-space preserve;
LOAD "auction.xml" "auction"
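To see what the default policy removes, the following Python sketch approximates boundary-whitespace stripping on a parsed document: text that consists only of XML whitespace characters and sits between tags is dropped, while text containing anything else is kept intact. This is an illustrative model of the XQuery rule built on `xml.etree.ElementTree`, not Sedna's implementation. It cannot tell apart whitespace written as character references or inside CDATA sections, which XQuery does not count as boundary whitespace.

```python
import xml.etree.ElementTree as ET

XML_WHITESPACE = " \t\r\n"  # the four XML whitespace characters


def is_boundary_whitespace(text):
    """True for a non-empty run made only of XML whitespace characters."""
    return text is not None and text != "" and text.strip(XML_WHITESPACE) == ""


def strip_boundary_whitespace(xml_text: str) -> str:
    """Approximate the default (strip) boundary-space policy on a whole document."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if is_boundary_whitespace(elem.text):  # after the start tag
            elem.text = None
        if is_boundary_whitespace(elem.tail):  # after the end tag, before the next markup
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


doc = "<site>\n  <item>\n    <name> Gold ring </name>\n  </item>\n</site>"
print(strip_boundary_whitespace(doc))  # prints: <site><item><name> Gold ring </name></item></site>
```

Note that the spaces inside `<name> Gold ring </name>` survive because that text node also contains non-whitespace characters; with `declare boundary-space preserve;` the whitespace-only runs would be kept as well.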

Note that heavy bulk loads can be significantly sped up by setting the SEDNA_LOG_AMOUNT connection attribute to SEDNA_LOG_LESS (see Section 1.2.3 for more information).

2.4.1 CDATA section preserving

The formatting of continuous CDATA sections can be preserved with the cdata-section-preserve option:

declare option se:bulk-load "cdata-section-preserve=yes";
LOAD "auction.xml" "auction"
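When a loader script needs either or both of these prolog declarations, it can prepend them to an existing LOAD statement, as in the Python sketch below. The boundary-space declaration is placed before the option declaration, which is the order the XQuery 1.0 prolog grammar requires (setters before option declarations). Whether Sedna accepts both declarations together in one LOAD prolog is an assumption not confirmed in this section, and `with_load_prolog` is a hypothetical name.

```python
def with_load_prolog(load_stmt: str, preserve_whitespace: bool = False,
                     preserve_cdata: bool = False) -> str:
    """Prefix a LOAD statement with optional prolog declarations."""
    prolog = []
    if preserve_whitespace:
        # Setter declarations such as boundary-space come first in an XQuery 1.0 prolog.
        prolog.append("declare boundary-space preserve;")
    if preserve_cdata:
        prolog.append('declare option se:bulk-load "cdata-section-preserve=yes";')
    return "\n".join(prolog + [load_stmt])


print(with_load_prolog('LOAD "auction.xml" "auction"', preserve_cdata=True))
```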

The cdata-section-preserve=yes option causes text nodes that come from CDATA sections to be serialized within CDATA sections. CDATA formatting is recorded only for a text node as a whole, and this property is inherited when one text node is appended to another. For example, in the following document fragment the CDATA section will be serialized exactly as it appears in the document:

The next fragment, however, will not be saved with its mixed formatting.

Instead, it will be serialized in the same way as the previous one, i.e. the whole text will be in a CDATA section.
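The rule that the CDATA property belongs to a whole text node can be modeled in a few lines. The Python sketch below is a hypothetical model, not Sedna code: adjacent character data is merged into one text node, and the node keeps the CDATA flag once any appended piece came from a CDATA section. That is one reading consistent with the behavior described above, where mixed content ends up entirely inside a CDATA section. The serializer also splits any `]]>` in the text, since that sequence cannot appear inside a single CDATA section.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextNode:
    value: str
    cdata: bool  # hypothetical flag: serialize the whole node as CDATA


def append_text(node: TextNode | None, text: str, from_cdata: bool) -> TextNode:
    """Append character data to a text node, or start a new one."""
    if node is None:
        return TextNode(text, from_cdata)
    # Assumption: once set, the CDATA flag is inherited by the merged node.
    return TextNode(node.value + text, node.cdata or from_cdata)


def serialize(node: TextNode) -> str:
    if node.cdata:
        # "]]>" would end the section early, so split it across two sections.
        return "<![CDATA[" + node.value.replace("]]>", "]]]]><![CDATA[>") + "]]>"
    return node.value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


# Hypothetical mixed fragment: plain text followed by a CDATA section.
node = None
for text, from_cdata in [("if a ", False), ("< b", True)]:
    node = append_text(node, text, from_cdata)
print(serialize(node))  # prints: <![CDATA[if a < b]]>
```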
