Basic concepts of the model
In the context of META-SHARE, the term metadata refers to descriptions of language resources (LRs), encompassing both data (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) and the technologies (tools/services) used for their processing. Together, these are also referred to in the literature as Language Resources and Technologies (LRTs).
The mechanism we have adopted is the component-based mechanism (Component MetaData Infrastructure, CMDI), according to which semantically coherent elements are grouped together to form components [Broeder et al., 2008].
More specifically, elements are used to encode specific descriptive features of the LRs. To ensure semantic consistency with other related schemas and models, links are provided to conceptually identical or similar elements in the Dublin Core (DC, www.dublincore.org) and the ISO Data Category Registry (ISO DCR, [ISO 12620, 2009]).
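As a rough illustration of the grouping described above, the following Python sketch models an element that may carry links to external data categories, and a component that groups such elements. The class names, field names, the component and element names, and the link values are all hypothetical; they are not taken from the META-SHARE schema or from CMDI itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """One descriptive feature of a language resource (hypothetical structure)."""
    name: str
    value: str
    dc_link: Optional[str] = None   # placeholder for a Dublin Core equivalent, if any
    dcr_link: Optional[str] = None  # placeholder for an ISO DCR equivalent, if any


@dataclass
class Component:
    """A group of semantically coherent elements."""
    name: str
    elements: List[Element] = field(default_factory=list)

    def element_names(self) -> List[str]:
        """Return the names of the elements grouped in this component."""
        return [e.name for e in self.elements]

    def linked_to_dc(self) -> List[str]:
        """Return the names of elements that declare a Dublin Core link."""
        return [e.name for e in self.elements if e.dc_link is not None]


# Illustrative names and mappings only (assumptions, not the real inventory).
identification = Component(
    name="identification",
    elements=[
        Element("resourceName", "Example Corpus", dc_link="title"),
        Element("description", "A small example corpus", dc_link="description"),
        Element("internalId", "ex-001"),
    ],
)
```

Here `identification.element_names()` lists all three elements, while `identification.linked_to_dc()` lists only the two that declare an external equivalent.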
In addition, the notion of relations has been introduced to encode links between resources. Relations hold between the various forms of an LR (e.g. a raw and an annotated resource), between different LRs (e.g. a language resource and the tool that has been used to create it), irrespective of whether these are included in the META-SHARE repository or not, as well as between LRs and peripheral resources (e.g. standards used, related documentation). Relations are also represented as elements.
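The three kinds of relation mentioned above could be sketched as follows. This is a minimal sketch only: the `Relation` class, its fields, the relation type labels and the resource identifiers are all hypothetical and do not reproduce the relation elements actually defined in the model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relation:
    """A typed link from a described resource to another resource (hypothetical)."""
    source: str          # identifier of the resource being described
    relation_type: str   # illustrative label, not an official relation name
    target: str          # identifier, name or file name of the related resource
    target_in_repository: bool = True  # the target may lie outside the repository


def related_to(relations: List[Relation], source: str) -> List[str]:
    """Return the targets of every relation that starts at `source`."""
    return [r.target for r in relations if r.source == source]


relations = [
    # Two forms of the same LR: annotated versus raw.
    Relation("corpus-annotated", "isAnnotatedVersionOf", "corpus-raw"),
    # An LR and the tool used to create it (here assumed to be external).
    Relation("corpus-annotated", "wasCreatedWith", "tagger-x",
             target_in_repository=False),
    # An LR and a peripheral resource such as documentation.
    Relation("corpus-annotated", "isDocumentedIn", "annotation-guidelines.pdf",
             target_in_repository=False),
]
```

Calling `related_to(relations, "corpus-annotated")` returns the three targets in the order they were declared.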
The set of all the components and elements describing a specific LR type or subtype represents the profile of that type. Certain components include information common to all types of resources (e.g. identification, contact, licensing information) and are thus used for all LRs, while others (e.g. components including information on the contents, annotation, etc.) differ across types.
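One way to picture a profile is as the common components plus those specific to a resource type. The sketch below assumes two resource types, "corpus" and "tool", and a hypothetical "inputOutput" component; only the identification, contact, licensing, content and annotation component names echo the examples given in the text.

```python
from typing import Dict, List

# Components named in the text as common to all resource types.
COMMON_COMPONENTS: List[str] = ["identification", "contact", "licensing"]

# Hypothetical type-specific components; the type names are illustrative only.
TYPE_SPECIFIC_COMPONENTS: Dict[str, List[str]] = {
    "corpus": ["content", "annotation"],
    "tool": ["inputOutput"],
}


def profile_for(resource_type: str) -> List[str]:
    """Return the component names that make up the profile of a resource type."""
    if resource_type not in TYPE_SPECIFIC_COMPONENTS:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    # A new list is built, so the shared constants are never modified.
    return COMMON_COMPONENTS + TYPE_SPECIFIC_COMPONENTS[resource_type]
```

Under these assumptions, the profiles for "corpus" and "tool" share their first three components and differ only in the type-specific ones.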
To allow for flexibility, the elements are organized into two basic levels of description:
- an initial level providing the basic elements for the description of a resource (minimal schema), and
- a second level with a higher degree of granularity (maximal schema), providing detailed information on a resource and covering all stages of LR production and use.
The minimal schema contains those elements considered indispensable for LR description (from the provider's perspective) and identification (from the consumer's perspective). It takes into account the views expressed in the user survey conducted in the framework of WP7 (see [Federmann et al., 2011]) concerning which features are considered sufficient to give a sound "identity" to a resource; discussions within the extended metadata group concerning the need for specific features have also fed into the specifications of the minimal schema.
These two levels contain four classes of elements: the first level contains mandatory and condition-dependent mandatory elements (i.e. to be filled in when specific conditions are met), while the second level includes recommended and optional elements.
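The four classes of elements and their distribution over the two levels can be sketched as a simple completeness check. This is an illustration under stated assumptions, not the project's validation logic: the `ElementSpec` class, the element names, the flat key/value record and the condition used for the condition-dependent mandatory element are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]  # assumed flat representation of a metadata record


class Obligation(Enum):
    MANDATORY = "mandatory"                              # minimal schema
    CONDITIONALLY_MANDATORY = "conditionally mandatory"  # minimal schema
    RECOMMENDED = "recommended"                          # maximal schema
    OPTIONAL = "optional"                                # maximal schema


@dataclass
class ElementSpec:
    name: str
    obligation: Obligation
    # Only used for conditionally mandatory elements: when does the rule apply?
    condition: Optional[Callable[[Record], bool]] = None


def missing_minimal_elements(record: Record, specs: List[ElementSpec]) -> List[str]:
    """Return the names of minimal-schema elements that are required but empty."""
    missing = []
    for spec in specs:
        if spec.obligation is Obligation.MANDATORY:
            required = True
        elif spec.obligation is Obligation.CONDITIONALLY_MANDATORY:
            required = spec.condition is not None and spec.condition(record)
        else:
            required = False  # recommended and optional elements are never enforced
        if required and not record.get(spec.name):
            missing.append(spec.name)
    return missing


# Hypothetical element inventory, for illustration only.
SPECS = [
    ElementSpec("resourceName", Obligation.MANDATORY),
    ElementSpec("licence", Obligation.MANDATORY),
    ElementSpec("annotationType", Obligation.CONDITIONALLY_MANDATORY,
                condition=lambda r: r.get("isAnnotated") == "yes"),
    ElementSpec("creationTool", Obligation.RECOMMENDED),
    ElementSpec("fundingProject", Obligation.OPTIONAL),
]

record = {"resourceName": "Example Corpus", "isAnnotated": "yes"}
print(missing_minimal_elements(record, SPECS))  # ['licence', 'annotationType']
```

The example record lacks a licence and, because it is declared as annotated, also an annotation type; the recommended and optional elements are absent but are not reported.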
- Broeder, D., T. Declerck, E. Hinrichs, S. Piperidis, L. Romary, N. Calzolari and P. Wittenburg, “Foundation of a Component-based Flexible Registry for Language Resources and Technology”, Proceedings of the 6th International Conference on Language Resources and Evaluation, 2008. Available at: http://www.lrec-conf.org/proceedings/lrec2008/
- Federmann, C., B. Georgantopoulos, R. Del Gratta, O. Hamon, B. Magnini, D. Mavroeidis, S. Piperidis, M. Schroeder and M. Speranza, META-NET Deliverable D7.1.1 – META-SHARE Functional and Technical Specification, 2011. Available at: http://t4me.dfki.de/intranet/document_repository/deliverables/wp07-infrastructure-functional-and-technical-specification/meta-net-d7.1.1-final.pdf/view
- ISO 12620, Terminology and other language and content resources – Specification of data categories and management of a Data Category Registry for language resources, 2009. Available at: http://www.isocat.org