Aggregation of documents in Manuscriptorium: basic information

The Manuscriptorium system is conceived as a digital library of manuscripts, old printed books and other rare documents. Like any other library, it comprises a catalogue and the digital documents themselves. The catalogue assembles descriptive metadata about the physical documents in the form of so-called identification records in XML format, and it directs users to the digital documents, which exist in Manuscriptorium as so-called complex digital documents (CDD).

The Manuscriptorium system collects the available metadata centrally and provides access to digitised data distributed across the on-line environment, held both in the operator's data storage facility and in the remote storage facilities of other contributors.

Incorporation of documents in Manuscriptorium – the content aspect

The content aspect of the document selection method is described in detail in Document selection and preparation of descriptions (Manuscriptorium v. 1.0). Below we specify the properties required for the acceptance of digital documents.

Crucial requirements:
 

  1. the importance of the document and its suitability for incorporation in Manuscriptorium,
  2. technical applicability of the document information (metadata) and information representing the document, i.e. images, sound, full texts (data),
  3. quality and veracity of the data.

The National Library of the Czech Republic supervises the incorporation of documents in Manuscriptorium and defines its remit: to make available the written cultural heritage up to the year 1800, with the primary focus on book-type documents. On a secondary level, however, Manuscriptorium is also open to documents of an archival nature (in particular books, deeds and maps).

As a matter of principle, every digital document in Manuscriptorium is identified by reference to the original physical document, whether that document is still extant or no longer exists.

Technical requirements

To enable individual contributors to contribute both identification records and associated digital data to the Manuscriptorium system, fundamental technical conditions and parameters for data entry must be established. Contributors must ensure that the data and metadata they contribute are compatible with the Manuscriptorium system.

Documents can be aggregated into Manuscriptorium directly if they are directly compatible. Otherwise, compatibility may be achieved by means of import connectors, which perform a dedicated conversion of the metadata from its original format during import.

The basic form of information storage in Manuscriptorium is XML files whose structure corresponds to the TEI P5 standard. For storing metadata in this format, the specific TEI P5 ENRICH schema is used; it was developed specially for projects focusing on the description of manuscripts, old printed books and other historic documents. The schema was developed under the leadership of Oxford University Computing Services, in co-operation with experts from a number of heritage institutions, within the ENRICH project.
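
To make the storage format more concrete, the following Python sketch builds a skeletal TEI P5 record containing only an msIdentifier inside msDesc and a list of images given as graphic elements in a facsimile section. The element names are standard TEI P5, but their exact arrangement under the ENRICH schema, the function name build_minimal_record and all values used below are assumptions for illustration; the output has not been validated against the ENRICH DTD or XSD.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialise TEI elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_minimal_record(country: str, settlement: str, repository: str,
                         idno: str, image_urls: list[str]) -> str:
    """Build a skeletal TEI P5 record: identification data plus an image list.

    Hypothetical helper; the layout follows general TEI P5 conventions and
    is not guaranteed to satisfy the ENRICH schema.
    """
    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = f"{repository}, {idno}"
    publication_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    # Placeholder text; a TEI header always contains a publicationStmt.
    ET.SubElement(publication_stmt, tei("p")).text = "Sketch record, not published"
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ms_identifier = ET.SubElement(ET.SubElement(source_desc, tei("msDesc")),
                                  tei("msIdentifier"))
    # Country, location of storage, library and shelfmark, in the order
    # used by TEI P5's msIdentifier.
    for tag, value in (("country", country), ("settlement", settlement),
                       ("repository", repository), ("idno", idno)):
        ET.SubElement(ms_identifier, tei(tag)).text = value
    # One surface per image, each pointing at the on-line data file.
    facsimile = ET.SubElement(root, tei("facsimile"))
    for url in image_urls:
        surface = ET.SubElement(facsimile, tei("surface"))
        ET.SubElement(surface, tei("graphic"), url=url)
    return ET.tostring(root, encoding="unicode")
```

For example, build_minimal_record("Czech Republic", "Praha", "Example Library", "MS 1", ["https://example.org/images/0001.jpg"]) returns a single-line XML string in the TEI namespace; the library name, shelfmark and URL are hypothetical.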

Documents are considered compatible:

  1. where their metadata, created with any tool, is XML valid against the TEI P5 ENRICH schema (in its DTD or XSD form),
  2. where their metadata contains at least the requisite minimum information enabling automated processing in Manuscriptorium,
  3. where their data is accessible on-line via the HTTP protocol (directly or via a script) on the basis of information available in the metadata,
  4. where their image data is available in formats directly supported by current popular web browsers (currently JPEG, GIF and PNG).
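
Requirements 3 and 4 can be checked mechanically. The Python sketch below, using only the standard library, resolves file names from the metadata against a base URL, confirms that the result is an HTTP(S) address, and recognises JPEG, GIF and PNG data by their leading bytes. The helper names and the use of an HTTP Range request are assumptions for illustration; Manuscriptorium's own checks are not described here and may work differently.

```python
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

# Leading bytes ("magic numbers") of the three image formats named in requirement 4.
MAGIC_NUMBERS = {
    b"\xff\xd8\xff": "JPEG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"\x89PNG\r\n\x1a\n": "PNG",
}


def resolve_data_urls(base_url: str, file_names: list[str]) -> list[str]:
    """Resolve file names listed in metadata against a base URL.

    The base URL should end with "/" so that names are appended to it
    rather than replacing its last path segment.
    """
    return [urljoin(base_url, name) for name in file_names]


def is_http_url(url: str) -> bool:
    """Return True if the URL uses the http or https scheme and names a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def detect_image_format(data: bytes):
    """Return "JPEG", "GIF" or "PNG" for recognised leading bytes, else None."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return None


def fetch_leading_bytes(url: str, count: int = 16, timeout: float = 10) -> bytes:
    """Download the first bytes of a remote file over HTTP(S).

    The Range header asks the server for a partial response; if a server
    ignores it, only the first `count` bytes of the full body are read.
    """
    request = Request(url, headers={"Range": f"bytes=0-{count - 1}"})
    with urlopen(request, timeout=timeout) as response:
        return response.read(count)
```

A check of a remote file could combine these as detect_image_format(fetch_leading_bytes(url)); a result of None means the file is not in one of the three formats listed above.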

The technical minimum of information that the metadata must contain is currently considered to be:

  • bibliographical description comprising the basic identification information, i.e. the following information about the specimen: country and location of storage, library, and unique shelfmark within the library system,
  • list of data files (typically images) enabling access to data available on-line.
The Manuscriptorium technical minimum does not take account of the content aspect and may be unacceptable for other projects!
Information on this scale represents only the technical minimum. The practical minimum that should be observed for a digital document to be properly usable ought to include considerably broader information, whose structure naturally varies with the properties of the physical object. Manuscriptorium therefore does not attempt to prescribe any minimum extent of information beyond the technical minimum.
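
Because the technical minimum is purely formal, it lends itself to an automated check. This Python sketch parses a TEI record and reports which minimum items are missing. It assumes that the identification data sits in msDesc/msIdentifier (country in country, location of storage in settlement, library in repository, shelfmark in idno) and that data files are listed as graphic/@url within facsimile; this mapping and the function name missing_minimum are illustrative, not Manuscriptorium's published rules. Uniqueness of the shelfmark cannot be verified from a single record and is not checked.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Assumed mapping of the minimum identification items onto TEI elements
# inside msIdentifier (hypothetical, for illustration only).
REQUIRED_IDENTIFIERS = {
    "country": "tei:country",
    "location of storage": "tei:settlement",
    "library": "tei:repository",
    "shelfmark": "tei:idno",
}


def missing_minimum(xml_text: str) -> list[str]:
    """List the items of the technical minimum that a TEI record lacks."""
    root = ET.fromstring(xml_text)
    ms_identifier = root.find(".//tei:msDesc/tei:msIdentifier", NS)
    if ms_identifier is None:
        # Without an msIdentifier none of the identification items is present.
        missing = list(REQUIRED_IDENTIFIERS)
    else:
        missing = []
        for label, path in REQUIRED_IDENTIFIERS.items():
            element = ms_identifier.find(path, NS)
            if element is None or not (element.text or "").strip():
                missing.append(label)
    # The data files must be listed so that they can be reached on-line.
    facsimile = root.find(".//tei:facsimile", NS)
    graphics = [] if facsimile is None else facsimile.findall(".//tei:graphic", NS)
    if not any((g.get("url") or "").strip() for g in graphics):
        missing.append("list of data files")
    return missing
```

An empty result means only that the record meets this assumed technical minimum; it says nothing about whether the description is rich enough to be useful.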

 

In cases where conversion of existing information is difficult, or where the contributor has no reason to change its form but still wishes to publish it in Manuscriptorium, tools can be created for routine conversion to achieve compatibility. Manuscriptorium adds a so-called import connector to its system, and the contributor thereafter manages its data independently, offering it for aggregation in its original form. Prior consultation is necessary before a connector is created.
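
At its core, an import connector maps the contributor's own fields onto the information Manuscriptorium needs. The following Python sketch shows only that mapping step for a hypothetical CSV export: the column names in FIELD_MAP, the Image files column and its semicolon separator are invented for illustration. A real connector would also serialise the result as TEI P5 ENRICH and would follow whatever source format is agreed during the prior consultation.

```python
import csv
import io

# Hypothetical column names in a contributor's export, mapped onto the
# identification fields of the technical minimum.
FIELD_MAP = {
    "Holding country": "country",
    "Holding city": "settlement",
    "Library name": "repository",
    "Call number": "idno",
}
IMAGE_COLUMN = "Image files"  # hypothetical column listing the data files


def convert_rows(csv_text: str, separator: str = ";") -> list[dict]:
    """Map each CSV row onto a dictionary of minimum fields plus image names."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Missing or empty columns become empty strings rather than errors.
        record = {target: (row.get(source) or "").strip()
                  for source, target in FIELD_MAP.items()}
        images = row.get(IMAGE_COLUMN) or ""
        record["images"] = [name.strip() for name in images.split(separator)
                            if name.strip()]
        records.append(record)
    return records
```

Keeping the mapping in a single table means that adapting the sketch to another contributor's export touches only FIELD_MAP and IMAGE_COLUMN, while the contributor continues to maintain its data in the original form.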

Connectors are typically built at the Manuscriptorium end, but they can alternatively be built at the contributor's end. The use of existing connectors or the building of new ones is subject to individual negotiations between the future partner, the National Library of the Czech Republic and the Manuscriptorium manager.

The National Library of the Czech Republic supports and finances the import of documents compatible with Manuscriptorium. Documents for which a connector already exists are also considered compatible.

 

Manuscriptorium offers freely accessible on-line tools facilitating creation and quality screening of digital document metadata.