ePrints UK
by Andy Powell, Michael Day and Peter Cliff
UKOLN, University of Bath
Version 1.2
This document provides some recommendations for the use of simple Dublin Core metadata [1] to describe eprints in eprint archives.
These recommendations are not primarily targeted at the end-users of eprint archives. Rather, they are intended to guide 'best-practice' for the maintainers of eprint archives so that such systems can be configured to maximise the benefits of a shared approach to metadata disclosure. The intention is to facilitate more consistent results when 'cross-searching' and browsing metadata records gathered from multiple eprint archives. Nonetheless, we hope that these guidelines will have some impact on the cataloguing guidelines and help sub-systems offered to end-users of institutional eprint archives.
As can be seen from the recommendations for dc:type (below), this document uses a fairly wide working definition of 'eprint':
"an electronic copy of an academic paper" [2].
These recommendations are specific to the use of simple (unqualified) Dublin Core. The intention is to develop a separate set of guidelines for the use of qualified Dublin Core (including the use of element refinements and encoding schemes) to describe eprints.
These recommendations draw on three documents:
All eprint archives must support the 'oai_dc' record format (this is mandated by the OAI-PMH [6]).
All eprint archives are strongly recommended to support the following minimal set of DC elements within the 'oai_dc' record format.
The values of these elements must not contain any HTML (or XML) markup. They may contain LaTeX [7] commands if desired but it is worth remembering that there is no mechanism for explicitly indicating that LaTeX is being used.
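As an illustration of the markup rule above, the following is a minimal sketch (not part of the original guidelines) of building an 'oai_dc' record with Python's standard `xml.etree.ElementTree` module. The helper name `build_record` is hypothetical, and the sample values are borrowed from separate examples in this document, so together they do not describe a single real eprint. Values are stored as plain text, and the serialiser escapes characters such as `&`, so the element content never carries markup of its own.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the OAI-PMH 'oai_dc' container and of simple Dublin Core.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Return an oai_dc:dc element built from (element name, value) pairs.

    Hypothetical helper for illustration only.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value  # plain text; escaped when the record is serialised
    return root


# Sample values taken from unrelated examples in this document.
record = build_record([
    ("title", "Initial sequencing and analysis of the human genome"),
    ("publisher", "John Wiley & Sons, Inc. (US)"),
    ("format", "application/pdf"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Building records through an XML library rather than by string concatenation is one way to keep stray markup out of element values; other approaches are equally valid.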
This section lists each of the Dublin Core elements. For each element, an eprint-specific recommendation is provided followed by the authoritative definitions and comments from the Dublin Core Metadata Initiative.
(*) indicates that an element is part of the minimal set.
| Eprint-specific Recommendation: |
The title of the eprint. Preserve the original wording, order and spelling of the eprint title. Only capitalize proper nouns. Punctuation need not reflect the usage of the original. Subtitles should be separated from the title by a colon. For example:
<dc:title>Initial sequencing and analysis of the human genome</dc:title>
<dc:title>The new nationalism and the old history: perspectives on the West German Historikerstreit</dc:title>
If necessary, repeat this element for multiple titles. |
|
DC Label: |
Title |
|
DC Name: |
title |
|
DC Definition: |
A name given to the resource. |
|
DC Comment: |
Typically, a Title will be a name by which the resource is formally known. |
| Eprint-specific Recommendation: |
An author of the eprint. Personal names should be listed surname or family name first, followed by forename or given name or initial followed by a full stop. Separate the surname (or family name) from the forenames, given names or initials with a comma. Titles (Dr., Prof., etc.) should precede the forenames; generational suffixes (Jr., Sr., etc.) should follow the family name. When in doubt, give the name as it appears, and do not invert. For example:
<dc:creator>Sulston, John E.</dc:creator>
<dc:creator>Evans, R.J.</dc:creator>
<dc:creator>Ng, Tze Beng</dc:creator>
<dc:creator>Walker Jnr., Dr. John</dc:creator>
In the case of organizations where there is clearly a hierarchy present, list the parts of the hierarchy from largest to smallest, separated by full stops. If it is not clear whether there is a hierarchy present, or unclear which is the larger or smaller portion of the body, give the name as it appears in the eprint. For example:
<dc:creator>International Human Genome Sequencing Consortium</dc:creator>
<dc:creator>Loughborough University. Department of Computer Science</dc:creator>
Only encode organisations in this element to indicate corporate authorship, not to indicate the affiliation of an individual. The inclusion of personal and corporate name headings from authority lists constructed according to AACR2 [8], e.g. the Library of Congress Name Authority File (LCNA), is also acceptable. In cases of lesser responsibility, other than authorship, use dc:contributor. If the nature of the responsibility is ambiguous, recommended best practice is to use dc:publisher for organizations, and dc:creator for individuals. If necessary, repeat this element for multiple authors. |
|
|
DC Label: |
Creator |
|
|
DC Name: |
creator |
|
|
DC Definition: |
An entity primarily responsible for making the content of the resource. |
|
|
DC Comment: |
Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity. |
|
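The name conventions recommended for dc:creator can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `personal_name` and `corporate_name` are hypothetical, and the code assumes the archive already holds the parts of a name as separate fields. It does not attempt to split a free-text name, which is where the "when in doubt, do not invert" advice applies.

```python
def personal_name(family, given, title=None, suffix=None):
    """Format a personal name in inverted order, family name first.

    The generational suffix follows the family name and the title
    precedes the forenames, as in 'Walker Jnr., Dr. John'.
    """
    left = f"{family} {suffix}" if suffix else family
    right = f"{title} {given}" if title else given
    return f"{left}, {right}"


def corporate_name(parts):
    """Join an organisational hierarchy, largest body first, with full stops."""
    return ". ".join(parts)


print(personal_name("Sulston", "John E."))
print(personal_name("Walker", "John", title="Dr.", suffix="Jnr."))
print(corporate_name(["Loughborough University", "Department of Computer Science"]))
```

The same helpers would serve for dc:publisher and dc:contributor values, since the recommendations for those elements use the same name forms.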
| Eprint-specific Recommendation: |
The topic of the eprint. In general, choose the most significant and unique words for keywords, avoiding those too general to describe a particular eprint. If the subject of the eprint is a person or an organization, use the same form of the name as you would if the person or organization were an author, but do not repeat the name in the dc:creator element. For free-text keywords, either encode multiple terms in one element with a semi-colon separating each keyword, or repeat the element for each term. There are no requirements regarding the capitalization of keywords, though internal (within archive) consistency is recommended. Where terms are taken from a standard classification scheme, encode each term in a separate element. Encode the complete subject descriptor according to the relevant scheme. Use the capitalisation and punctuation used in the original scheme. Where subject terms are taken from LCSH, the subfields of the subject heading should be separated by a double dash (--) and spaces should be omitted. For example (using free-text keywords and LCSH):
<dc:subject>polar oceanography; boundary current; mass transport; water masses; halocline; mesoscale eddies</dc:subject>
<dc:subject>World War, 1939-1945--Germany</dc:subject>
<dc:subject>Germany--History--1933-1945</dc:subject>
<dc:subject>Hitler, Adolf, 1889-1945</dc:subject> |
|
|
DC Label: |
Subject and Keywords |
|
|
DC Name: |
subject |
|
|
DC Definition: |
The topic of the content of the resource. |
|
|
DC Comment: |
Typically, a Subject will be expressed as
keywords, key phrases or classification codes that describe a
topic of the resource. |
|
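Because free-text keywords may arrive either as one semi-colon separated value or as repeated elements, a service that gathers records from several archives will probably want to normalise both forms. The sketch below shows one way to do that, together with the double-dash convention for LCSH subfields. The function names are hypothetical and are not taken from any eprint software.

```python
def split_keywords(value):
    """Split a semi-colon separated dc:subject value into individual terms."""
    return [term.strip() for term in value.split(";") if term.strip()]


def lcsh_heading(subfields):
    """Join LCSH subfields with a double dash and no surrounding spaces."""
    return "--".join(part.strip() for part in subfields)


print(split_keywords("polar oceanography; boundary current; mass transport"))
print(lcsh_heading(["Germany", "History", "1933-1945"]))
```

Note that splitting on semi-colons is only appropriate for free-text keywords. Terms from a classification scheme are encoded one per element and should be left intact.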
| Eprint-specific Recommendation: |
A summary of the content of the eprint, typically in the form of an abstract. |
|
|
DC Label: |
Description |
|
|
DC Name: |
description |
|
|
DC Definition: |
An account of the content of the resource. |
|
|
DC Comment: |
Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content |
|
| Eprint-specific Recommendation: |
The publisher of the eprint, typically either the author's institution or a commercial publisher. In the case of organizations where there is clearly a hierarchy present, list the parts of the hierarchy from largest to smallest, separated by full stops. If it is not clear whether there is a hierarchy present, or unclear which is the larger or smaller portion of the body, give the name as it appears in the eprint. For example:
<dc:publisher>Loughborough University. Department of Computer Science</dc:publisher>
<dc:publisher>University of Cambridge. Department of Earth Sciences</dc:publisher>
<dc:publisher>University of Oxford. Museum of the History of Science</dc:publisher>
<dc:publisher>University of Reading. Rural History Centre</dc:publisher>
<dc:publisher>University of Exeter. Institute of Cornish Studies</dc:publisher>
<dc:publisher>European Bioinformatics Institute</dc:publisher>
<dc:publisher>John Wiley & Sons, Inc. (US)</dc:publisher>
Personal names should be listed surname or family name first, followed by forename or given name or initial followed by a full stop. Separate the surname (or family name) from the forenames, given names or initials with a comma. Titles (Dr., Prof., etc.) should precede the forenames; generational suffixes (Jr., Sr., etc.) should follow the family name. When in doubt, give the name as it appears, and do not invert. For example:
<dc:publisher>Sulston, John E.</dc:publisher>
<dc:publisher>Evans, R.J.</dc:publisher>
<dc:publisher>Ng, Tze Beng</dc:publisher>
<dc:publisher>Walker Jnr., Dr. John</dc:publisher>
The inclusion of personal and corporate name headings from authority lists constructed according to AACR2 [8], e.g. the Library of Congress Name Authority File (LCNA), is also acceptable. |
|
|
DC Label: |
Publisher |
|
|
DC Name: |
publisher |
|
|
DC Definition: |
An entity responsible for making the resource available. |
|
|
DC Comment: |
Examples of a Publisher include a person, an
organisation, or a service. |
|
| Eprint-specific Recommendation: |
A contributor to the eprint (but not one of the primary authors). For example, a supervisor, editor, technician or data collector. Personal names should be listed surname or family name first, followed by forename or given name or initial followed by a full stop. Separate the surname (or family name) from the forenames, given names or initials with a comma. Titles (Dr., Prof., etc.) should precede the forenames; generational suffixes (Jr., Sr., etc.) should follow the family name. When in doubt, give the name as it appears, and do not invert. For example:
<dc:contributor>Sulston, John E.</dc:contributor>
<dc:contributor>Evans, R.J.</dc:contributor>
<dc:contributor>Ng, Tze Beng</dc:contributor>
<dc:contributor>Walker Jnr., Dr. John</dc:contributor>
In the case of organizations where there is clearly a hierarchy present, list the parts of the hierarchy from largest to smallest, separated by full stops. If it is not clear whether there is a hierarchy present, or unclear which is the larger or smaller portion of the body, give the name as it appears in the eprint. For example:
<dc:contributor>International Human Genome Sequencing Consortium</dc:contributor>
<dc:contributor>Loughborough University. Department of Computer Science</dc:contributor>
Only encode organisations in this element to indicate a corporate contribution, not to indicate the affiliation of an individual. The inclusion of personal and corporate name headings from authority lists constructed according to AACR2 [8], e.g. the Library of Congress Name Authority File (LCNA), is also acceptable. |
|
|
DC Label: |
Contributor |
|
|
DC Name: |
contributor |
|
|
DC Definition: |
An entity responsible for making contributions to the content of the resource. |
|
|
DC Comment: |
Examples of a Contributor include a person,
an organisation, or a service. |
|
| Eprint-specific Recommendation: |
The 'last-modified' date of the eprint and/or the date of its accession into the archive. The date should be formatted according to the W3C encoding rules for dates and times [9] (a profile based on ISO 8601 known as W3C-DTF), for example:
<dc:date>2000-12-25</dc:date>
<dc:date>1999</dc:date>
<dc:date>2003-01</dc:date>
If necessary, repeat this element to provide both the last-modified date and the date of accession. The last-modified date will be assumed to be the more recent of the two dates. If only one date is provided, it will be assumed that the last-modified date and the date of accession are the same. |
|
|
DC Label: |
Date |
|
|
DC Name: |
date |
|
|
DC Definition: |
A date associated with an event in the life cycle of the resource. |
|
|
DC Comment: |
Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 and follows the YYYY-MM-DD format. |
|
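The date rules above lend themselves to a small worked example. The following is a sketch under stated assumptions: it handles only the three date-only granularities shown in the examples (YYYY, YYYY-MM and YYYY-MM-DD), not the W3C-DTF forms that include a time, and the function names `parse_w3cdtf_date` and `last_modified` are hypothetical. Missing months and days are treated as zero purely so that values of different granularity can be compared.

```python
import datetime
import re

# Date-only W3C-DTF granularities: YYYY, YYYY-MM, YYYY-MM-DD.
_W3CDTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_w3cdtf_date(value):
    """Return (year, month, day), with 0 standing in for a missing part."""
    match = _W3CDTF_DATE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a date-only W3C-DTF value: {value!r}")
    year, month, day = (int(part) if part else None for part in match.groups())
    # Constructing a date rejects impossible values such as 2003-02-30.
    datetime.date(year, month or 1, day or 1)
    return (year, month or 0, day or 0)


def last_modified(dates):
    """Pick the more recent of the repeated dc:date values."""
    return max(dates, key=parse_w3cdtf_date)


print(parse_w3cdtf_date("2000-12-25"))
print(last_modified(["2003-01", "2000-12-25"]))
```

A harvester applying the rule in the recommendation would treat the value returned by `last_modified` as the last-modified date and the other value, if present, as the date of accession.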
| Eprint-specific Recommendation: |
The type of eprint. Recommended best practice is to take the value of this element from the following list:
For example: <dc:type>JournalArticle</dc:type> If necessary, repeat this element to encode multiple types. If necessary, repeat this element to indicate the peer-reviewed status of the eprint, using one of the following values:
For example: <dc:type>PeerReviewed</dc:type> |
|
|
DC Label: |
Resource Type |
|
|
DC Name: |
type |
|
|
DC Definition: |
The nature or genre of the content of the resource. |
|
|
DC Comment: |
Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the working draft list of Dublin Core Types). To describe the physical or digital manifestation of the resource, use the FORMAT element. |
|
| Eprint-specific Recommendation: |
The media-type of the eprint. Recommended best practice is to select a term from the IANA registered list of Internet Media Types (MIME types) [10]. For example: <dc:format>application/pdf</dc:format> Repeat this element if the eprint is available in multiple formats. |
|
|
DC Label: |
Format |
|
|
DC Name: |
format |
|
|
DC Definition: |
The physical or digital manifestation of the resource. |
|
|
DC Comment: |
Typically, Format may include the media-type or dimensions of the resource. Format may be used to determine the software, hardware or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types defining computer media formats). |
|
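Where an archive does not already record the media type of each file, one possible approach is to derive the dc:format value from the file name. The sketch below uses Python's standard `mimetypes` module. The function name `format_for` is hypothetical, and guessing from a file extension is an assumption of this example rather than something the guidelines prescribe. An archive that knows the actual media type should use that instead.

```python
import mimetypes


def format_for(filename):
    """Guess an Internet Media Type for dc:format from a file name.

    Returns None when the extension is not recognised.
    """
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type


for name in ["1097.pdf", "1097.html"]:
    print(name, format_for(name))
```

One dc:format element would then be emitted per available file, matching the advice to repeat the element when the eprint exists in multiple formats.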
| Eprint-specific Recommendation: |
A URI or bibliographic citation for the eprint, typically the URI of the 'jump-off page' for the eprint, as served by the archive. For example:
<dc:identifier>http://eprints.bath.ac.uk/archive/00000003/</dc:identifier>
If possible, repeat this element to provide a full bibliographic citation for the eprint. For example:
<dc:identifier>Heery, R. (2000). "Information gateways: collaboration on content." Online Information Review, 24 (1), 40-45.</dc:identifier>
If possible, also repeat this element to provide an OpenURL [11] for the eprint, using the form below. For example:
<dc:identifier>openurl:?sid=ukoln:&genre=article
&sid=ukoln:&atitle=Information%20gateways:%20collaboration%20on%20content
&title=Online%20Information%20Review&issn=1468-4527&volume=24
&spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel</dc:identifier>
(Note that lines in these two examples have been wrapped for readability.) |
|
|
DC Label: |
Resource Identifier |
|
|
DC Name: |
identifier |
|
|
DC Definition: |
An unambiguous reference to the resource within a given context. |
|
|
DC Comment: |
Recommended best practice is to identify the
resource by means of a string or number conforming to a formal
identification system. |
|
| Eprint-specific Recommendation: |
The URI, title or bibliographic citation for a resource from which the eprint is derived. In general, this element should not be used. |
|
|
DC Label: |
Source |
|
|
DC Name: |
source |
|
|
DC Definition: |
A Reference to a resource from which the present resource is derived. |
|
|
DC Comment: |
The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system |
|
| Eprint-specific Recommendation: |
The language in which the eprint is written. Use the language codes defined in RFC 3066 [12], for example: <dc:language>en-GB</dc:language> If necessary, repeat this element to indicate multiple languages. |
|
|
DC Label: |
Language |
|
|
DC Name: |
language |
|
|
DC Definition: |
A language of the intellectual content of the resource. |
|
|
DC Comment: |
Recommended best practice is to use RFC 3066 which, in conjunction with ISO 639, defines two- and three-letter primary language tags with optional subtags. Examples include "en" or "eng" for English, "akk" for Akkadian, and "en-GB" for English used in the United Kingdom. |
|
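A simple syntax check can catch values such as "English" or "en_GB" before they reach dc:language. The sketch below tests only the shape of a tag as defined by the RFC 3066 grammar: a primary subtag of one to eight letters, followed by optional subtags of one to eight letters or digits, separated by hyphens. It does not check that the subtags are registered ISO 639 or country codes, and the function name `is_wellformed_language_tag` is hypothetical.

```python
import re

# RFC 3066 grammar: 1*8ALPHA, then zero or more "-" 1*8(ALPHA / DIGIT).
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_wellformed_language_tag(tag):
    """Return True if the tag matches the RFC 3066 syntax."""
    return _RFC3066_TAG.fullmatch(tag) is not None


for candidate in ["en-GB", "eng", "akk", "en_GB", "English (UK)"]:
    print(candidate, is_wellformed_language_tag(candidate))
```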
| Eprint-specific Recommendation: |
The URI of each available format of the eprint. If necessary, repeat this element for multiple formats. Also repeat this element if the eprint is available from other locations, for example from the publisher's Web site. For example:
<dc:relation>http://eprints.bath.ac.uk/archive/00000003/01/1097.pdf</dc:relation>
<dc:relation>http://eprints.bath.ac.uk/archive/00000003/01/1097.html</dc:relation>
<dc:relation>http://www10.org/cdrom/posters/1097.pdf</dc:relation> |
|
|
DC Label: |
Relation |
|
|
DC Name: |
relation |
|
|
DC Definition: |
A reference to a related resource. |
|
|
DC Comment: |
Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system. |
|
| Eprint-specific Recommendation: |
The geographic location or temporal period that the eprint is about. Recommended best practice is to select the value from a controlled vocabulary (for example, the Getty Thesaurus of Geographic Names [13] or TGN) and that, where appropriate, named places or time periods be used in preference to numeric identifiers such as sets of co-ordinates or date ranges. If necessary, repeat this element to encode multiple locations or periods. |
|
|
DC Label: |
Coverage |
|
|
DC Name: |
coverage |
|
|
DC Definition: |
The extent or scope of the content of the resource. |
|
|
DC Comment: |
Coverage will typically include spatial
location (a place name or geographic co-ordinates), temporal
period (a period label, date, or date range) or jurisdiction
(such as a named administrative entity). |
|
| Eprint-specific Recommendation: |
A human-readable statement about the rights held in and over the eprint, the URI of a Creative Commons [14] licence or the URI of a machine-readable statement. For example:
<dc:rights>(c) University of Bath, 2003</dc:rights>
<dc:rights>(c) Andrew Smith, 2003</dc:rights>
<dc:rights>http://creativecommons.org/licenses/by-nd-nc/1.0</dc:rights>
<dc:rights>http://eprints.bath.ac.uk/archive/00000003/01/1097-odrl.xml</dc:rights> |
|
|
DC Label: |
Rights Management |
|
|
DC Name: |
rights |
|
|
DC Definition: |
Information about rights held in and over the resource. |
|
|
DC Comment: |
Typically, a Rights element will contain a
rights management statement for the resource, or reference a
service providing such information. Rights information often
encompasses Intellectual Property Rights (IPR), Copyright, and
various Property Rights. |
|
Thanks to Tim Brody, Les Carr, Thom Hickey, Diane Vizine-Goetz, Andrew Houghton, Simon Jennings, Pete Johnston, Rachel Heery, Ruth Martin and Monica Duke for commenting on previous drafts of this document.
|
| Page last modified: 17-Mar-2004 | Maintained by rdn-support@rdn.ac.uk |