Summary
This policy describes a pragmatic, working solution for managing information about item types where standardisation is lacking.
The current process involves:
- Harvesting the source item type for an item from the repository
- Applying mapping rules to programmatically assign an appropriate IRUS item type to the item
- Incremental bibliographic metadata harvesting to reflect changes at source
This process provides an effective solution in the UK and for services internationally. We continue to examine existing guidelines, evaluate our policy on an ongoing basis, and will respond to changes in the environment as appropriate.
IRUS item types
There are currently 31 IRUS item types which are listed and defined in the table below.
IRUS Type ID | IRUS Type | IRUS Type Description |
---|---|---|
23 | {Unmappable} | The information provided in a type element is contradictory and cannot be mapped to a single IRUS type, e.g. 'books/reports' |
0 | {Unspecified} | No element describing the type of resource was found in the source metadata |
26 | Abstract | An abstract provides a short representation of information contained within a research paper, journal article, conference paper, book or book section. |
1 | Art/Design Item | Works including artist's books, visual artworks, sound design, set design, choreography and theatre scenery props, all developed as a result of artistic practice. |
2 | Article | An article on a particular topic and published in a journal issue. |
3 | Audio | A resource primarily intended to be heard. Examples include a music playback file format, an audio recording, and recorded speech or sounds. |
4 | Book | A complete, stand-alone work published in one or more volumes, often identified with an ISBN. |
5 | Book Section | A defined chapter or section of a book, or anthology/collected work and usually with a separate title or number. |
30 | Chemical Structures | The description or graphical representation of the arrangement of chemical bonds between atoms in a molecule |
6 | Conference Item | Includes a conference paper that is submitted to a conference, presented to the audience and published in proceedings; a display poster, typically containing text with illustrative figures and/or tables, submitted for acceptance to and/or presented at a conference, seminar, symposium, workshop or similar event; conference proceedings forming the official record of a conference meeting, i.e. a collection of documents corresponding to the presentations given at the conference; and all kinds of digital resources contributed to a conference, such as conference presentations (slides) to a specific group within the conference, conference lectures, abstracts, workshops and demonstrations. |
7 | Dataset | A collection of structured data-related (measurements, observations) facts and data encoded in a defined structure. |
28 | Design/Plan | Plans, blueprints, or a drawing or set of drawings showing how something (e.g. a building or product) is to be made and how it will work and look. |
8 | Exam Paper | Resources used to support teaching and learning and evaluating including assessment materials, exam papers, previous papers and tests, both internal and external. |
9 | Image | A visual representation other than text, of types of still image. This class of image includes diagrams, drawings, graphs, graphic designs, photographs and prints. |
22 | Journal Issue | A journal is a scholarly publication containing articles written by researchers, professors and other experts. Journals focus on a specific discipline or field of study. Unlike newspapers and magazines, journals are intended for an academic or technical audience, not general readers. |
10 | Learning Object | A collection of educational resources put together, rather than independent items, used to support teaching and learning. They should be reusable, interoperable and manageable in multiple contexts and may include curricula and syllabuses, course materials, learning resources, lecture notes, learning exercises, instructional problems, course validation/assessment documents, including tests. |
25 | Lecture | Transcription of an oral presentation/talk intended to present information or teach people about a particular subject, for example by a university or college teacher. |
27 | Map | A representation normally to scale and on a flat medium, of a selection of material or abstract features on, or in relation to, the surface of the earth or of another celestial body. |
11 | Moving Image | A moving display, either generated dynamically by a computer program or formed from a series of pre-recorded still images imparting an impression of motion when shown in succession. It also includes film and video footage. |
12 | Music/Musical Composition | Musical composition can refer to an original piece of music, the structure of a musical piece, or the process of creating a new piece of music. Musical notation is any system used to visually represent aurally perceived music through the use of written symbols, including ancient or modern musical symbols. |
13 | Newspaper or Newsletter | Textual content published in a newspaper or newsletter. |
14 | Other | Any resources that do not fall into any of the other item types. |
15 | Patent | A patent or patent application is a document that grants the rights of a piece of work/invention to its originator. |
16 | Performance | A collection of information records representing performance outputs of dramatic or musical entertainment. Works may be produced alone or collaboratively. |
17 | Report | A report is a record of research findings, research still in progress, or other technical findings, usually bearing a report number and sometimes a grant number assigned by the funding agency. It includes internal, research and technical reports. |
18 | Show/Exhibition | A collection of sculptures, models, art paintings, installations (works of art) exhibited under the direction of a curator, artist or as a graduation exhibition. |
24 | Software | A computer program in source code (text) or compiled form. |
29 | Text | A resource consisting primarily of words for reading. |
19 | Thesis or Dissertation | A thesis or dissertation is a document submitted in completion of a course of study at an institution for higher education. It may be submitted in support of candidature for an academic degree or professional qualification presenting the author's research and findings. |
20 | Website Content | Website content contributes to a set of related web pages where multiple types of information on a specific theme are available via a URL. |
21 | Working Paper | A working paper or preprint is a report on research that is still ongoing, or which has not yet been accepted for publication. They are usually published by the author's own institution. |
How were these chosen and defined?
The original set of IRUS item types was defined in 2012 in consultation with our then community advisory group. The list has since been through three iterations in 2015, 2020, and 2022.
The current set of IRUS item types was selected and defined in July 2022 following an in-depth review and analysis of over 4 million source item types. Many item types were changed, some were added, and some renamed or redefined. An explanation of previous iterations can be found in the Appendix at the end of this policy.
How we map a source item type to an IRUS item type
IRUS harvests the item type information, along with other metadata, from repositories via an OAI interface. We look in the DC Type field(s) and capture this as the 'Source Item Type'. If there are no entries, this becomes '{Unspecified}'.
This Source Item Type is held in our database and then mapped to an 'IRUS item type' as part of the daily and monthly processing.
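The capture step described above can be sketched as follows. This is a minimal illustration, not the IRUS implementation: it assumes the repository exposes unqualified Dublin Core (`oai_dc`) records, and the sample record and the function name `source_item_types` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements used in oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record, for illustration only.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example item</dc:title>
  <dc:type>Article</dc:type>
  <dc:type>PeerReviewed</dc:type>
</oai_dc:dc>
"""


def source_item_types(record_xml: str) -> list[str]:
    """Return the non-empty dc:type values of a record.

    If the record has no dc:type entries, return ['{Unspecified}'],
    mirroring the fallback described in this policy.
    """
    root = ET.fromstring(record_xml)
    values = [(el.text or "").strip() for el in root.iter(f"{{{DC_NS}}}type")]
    values = [v for v in values if v]
    return values or ["{Unspecified}"]
```

Calling `source_item_types(SAMPLE_RECORD)` yields both type values from the sample record; a record with no `dc:type` element yields the single placeholder value.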
Mapping rules
The processing applies a number of rules to determine the most appropriate IRUS item type. There are around 40 rules, which act as a series of sieves to identify the 'most applicable' IRUS item type. These are based on empirical evidence from an analysis of pre-existing item types (over 4 million records).
- Firstly, if a single type element contains contradictory information, the item is assigned '{Unmappable}'.
- Where a repository is sending us 'standard' controlled item types (e.g. COAR resource type) we map from those item types to IRUS item types.
- We use a variety of techniques, from identification of words to identification of sub-strings, to extract the salient information. For example:
  - We ignore item type elements that indicate a format (e.g. PDF) and work with the remaining information.
  - We look for single words such as 'article' or 'thesis'.
  - If there are multiple item types, we look at the combination and under certain conditions we give precedence to one type (e.g. 'Text' and 'Image' together will be assigned to 'Text', as the item is most likely an image of text).
- Finally, anything that is left is mapped to 'Other'.
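The "series of sieves" idea above can be sketched in a few lines. This is a much-simplified illustration under stated assumptions, not the actual set of around 40 rules: the lookup tables, the function names, and the fallback to the first mapped type are all hypothetical.

```python
UNMAPPABLE = "{Unmappable}"
UNSPECIFIED = "{Unspecified}"

# Illustrative lookup tables (assumptions, not the real IRUS rules).
FORMAT_TOKENS = {"pdf"}
CONTROLLED = {"journal article": "Article", "dataset": "Dataset"}
KEYWORDS = {
    "article": "Article",
    "thesis": "Thesis or Dissertation",
    "book": "Book",
    "report": "Report",
    "text": "Text",
    "image": "Image",
}


def map_element(value):
    """Map one source type element; return None if nothing applies."""
    text = value.strip().lower()
    if text in FORMAT_TOKENS:
        return None  # formats are ignored
    # Sub-string matching against known keywords.
    matches = {irus for kw, irus in KEYWORDS.items() if kw in text}
    if len(matches) > 1:
        return UNMAPPABLE  # contradictory, e.g. 'books/reports'
    if text in CONTROLLED:
        return CONTROLLED[text]  # 'standard' controlled item types
    return matches.pop() if matches else None


def map_item(source_types):
    """Apply the sieves in order and return a single IRUS item type."""
    if not source_types or source_types == [UNSPECIFIED]:
        return UNSPECIFIED
    mapped = []
    for value in source_types:
        irus = map_element(value)
        if irus == UNMAPPABLE:
            return UNMAPPABLE
        if irus is not None and irus not in mapped:
            mapped.append(irus)
    if not mapped:
        return "Other"  # anything left over
    if set(mapped) == {"Text", "Image"}:
        return "Text"  # precedence rule from the example above
    return mapped[0]  # assumption: otherwise take the first mapped type
```

For example, `map_item(["Article", "PDF"])` returns `'Article'`, `map_item(["books/reports"])` returns `'{Unmappable}'`, and `map_item(["Image", "Text"])` returns `'Text'`.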
Handling changes to source item type
IRUS carries out monthly incremental harvesting of metadata to check for changes in the source item type.
If a repository makes changes to their source item types that would impact mappings, they can also request that we re-harvest and re-map by contacting the helpdesk.
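As an illustration of what an incremental harvest request might look like, the sketch below builds a selective-harvesting URL. It assumes the OAI interface is OAI-PMH and that the `from` argument of `ListRecords` is used to limit the harvest to records changed since the last run; the policy does not state this, and the base URL and function name are hypothetical.

```python
from datetime import date
from urllib.parse import urlencode


def incremental_harvest_url(base_url: str, last_harvest: date) -> str:
    """Build an OAI-PMH ListRecords URL for records changed since last_harvest."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": "oai_dc",  # unqualified Dublin Core
        "from": last_harvest.isoformat(),  # day-granularity datestamp
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, for illustration only.
url = incremental_harvest_url("https://repository.example.org/oai", date(2024, 1, 1))
```

The resulting URL requests only records whose datestamp is on or after the given date, so source item types that have changed since the previous harvest can be re-mapped.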
Use of controlled vocabularies and standards
As part of the most recent review in 2022, we considered adopting a recognised list such as COAR's Resource Type vocabulary, the DataCite Metadata Schema, or the DCMI Type Vocabulary. Whilst these informed our choice of IRUS item types, we concluded that it was still appropriate for us to maintain an IRUS-defined list, for the following reasons:
- There was still patchy uptake and use of these vocabularies – none had been widely adopted
- There were gaps and conflicts within the lists – some have too much granularity, others not enough
- None of the available options covered the requirements for the breadth of repositories we work with
We strongly support standardisation, whilst acknowledging individual requirements of institutions and repositories. We recommend that repositories put in place policies and controlled vocabularies (either internally or externally defined) to standardise entry into the item types field wherever possible to support interoperability with other systems. This will help ensure we are able to consistently map to appropriate, meaningful, and useful item types.
Current iteration
The manual mapping process was unsustainable, so we moved to a rule-based approach in July 2022, as described above in this policy. We also introduced incremental harvesting, which enables us to check for changes in source item type (and other bibliographic metadata) on a regular basis.
At this time, we also carried out an in-depth review and analysis of incoming source types to inform revisions to the IRUS item types list. We considered a combination of:
- Volume and frequency: Are we seeing enough examples for a type of item?
- Applicability: Can it be applied to a wide range of repositories?
- Value: Is it useful or interesting to identify usage of this scholarly output?
- Does it describe a scholarly output and not the format or subject of the content?
Following this, we added eight new IRUS item types:
- {Unmappable}
- Software
- Lecture
- Abstract
- Map
- Design/Plan
- Text
- Chemical Structures
We renamed two IRUS item types:
- 'Journal' became 'Journal Issue'
- 'Website' became 'Website Content'
Appendix: Previous iterations
This appendix sets out previous iterations and changes to the IRUS item types policy and provides an explanation of the rationale behind these changes.
2012
From the beginning of IRUS in 2012, research was undertaken into the item types in regular use. This work sought to compare existing policies, practices, and guidelines concerning the use of item types by institutional repositories (IRs) in order to support a decision around the item types IRUS would use. This was a key area for the service in being able to offer support to its participating IRs, including for any cross-repository comparisons.
The research indicated a lack of standardisation in the use of item types when looking across repositories. Reasons for this included where individual IRs had developed their own lists of item types, or where default lists of item types depended on a chosen software platform. The IRUS team took a decision to define a list of item types for use by IRUS, mapping all IR item types in use to a defined list.
A defined list of 25 IRUS item types was created from over 700 item types that were identified in use in repositories, many of which were remarkably similar. This was achieved with feedback from the IRUS Community Advisory Group (CAG). For full details of the report including initial research undertaken on item types, a detailed breakdown of the methodology used, the many UK and international policies that were reviewed, and detailed appendices of all their guidelines, see IRUS item type report v3.3 [PDF].
2015
In 2015, following proposals outlined in this report, a few changes were made to the defined list of item types. The 'Journal' item type was removed because the instances where it was used were not of journals but rather the image from the front page of a journal; these were remapped as 'Image'. The 'Review' item type was removed with these instances being remapped to 'Article', and the 'Conference or Workshop Item' type was broken down into three constituent parts.
2020
Between 2015 and 2020 there were 23 IRUS item types in use. The approach taken had functioned well and been appropriate. It had been effectively used through the international services and projects in Australia, New Zealand, and the US (United States), indicating this was fit for purpose.
In 2020, aligned with the move to the new IRUS COUNTER R5 (Release 5) service, the item types were amended to map more closely with data types used by COUNTER.
Manual mapping of item types was performed monthly. For each item new to IRUS, the designated item type, obtained via metadata from the relevant participating repository, was mapped to one of the 23 agreed IRUS item types.