How do you gather data?
We gather data using tracker code plugins and patches. We have patches available for DSpace (4.x, 5.x and 6.x). There are plug-ins available for EPrints (works with 3.2 or greater) and Haplo, and a Ruby Gem for Hydra/Samvera (Fedora) repositories. Figshare, Pure, Worktribe and Esploro have also implemented tracker functionality for their repository software platforms. It's also possible to add tracker functionality to Equella.
We use a "push" mechanism whereby a notification is sent to the IRUS server as an OpenURL key-value pair string every time a file is downloaded from a repository.
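As an illustration only, the sketch below shows how a download event could be encoded as an OpenURL-style key-value string using Python's standard library. The endpoint, the choice of keys and all values are assumptions made for this example; the tracker documentation for your platform defines the parameters that are actually sent.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical endpoint; the real tracker URL is not given in this FAQ.
TRACKER_ENDPOINT = "https://tracker.example.org/counter/"


def build_download_notification(client_ip, user_agent, oai_identifier, file_url):
    """Encode one download event as an OpenURL-style key-value string."""
    params = {
        "url_ver": "Z39.88-2004",         # OpenURL version identifier
        "req_id": f"urn:ip:{client_ip}",  # assumed key for the requester's IP
        "req_dat": user_agent,            # assumed key for the user agent
        "rft.artnum": oai_identifier,     # assumed key for the item identifier
        "svc_dat": file_url,              # assumed key for the downloaded file
    }
    return urlencode(params)


query = build_download_notification(
    "192.0.2.10",
    "Mozilla/5.0",
    "oai:repo.example.org:1234",
    "https://repo.example.org/1234/1/paper.pdf",
)
print(f"{TRACKER_ENDPOINT}?{query}")

# Decoding the string recovers the original values.
print(parse_qs(query)["rft.artnum"])
```

Because the notification is pushed at download time, the repository does not need to export or share its own log files.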
When do you gather data?
Data is stored in daily log files which are usually processed the following day. The data for each item are consolidated into daily statistics and are filtered to remove robots and double clicks. These daily statistics are then consolidated into monthly statistics, the traditional COUNTER granularity.
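The consolidation step can be pictured with a small sketch. It assumes the daily statistics have already been filtered and are held as a mapping from (item, date) to a count; the function name and data layout are hypothetical and chosen only for this example.

```python
from collections import Counter


def consolidate_monthly(daily_counts):
    """Sum filtered daily counts per (item, month).

    daily_counts maps (item_id, "YYYY-MM-DD") to a download count.
    """
    monthly = Counter()
    for (item_id, day), count in daily_counts.items():
        month = day[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        monthly[(item_id, month)] += count
    return dict(monthly)


daily = {
    ("item-1", "2024-03-01"): 4,
    ("item-1", "2024-03-02"): 1,
    ("item-1", "2024-04-01"): 2,
    ("item-2", "2024-03-15"): 7,
}
print(consolidate_monthly(daily))
```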
What statistics and reports are available?
IRUS provides reports and visualisations through its website which can be viewed on screen or downloaded in CSV, JSON and TSV formats. Further information can be found in our Guides. IRUS also has an API and widget, and relevant details can be found in our Embed area.
Why are there differences between the usage figures shown in IRUS R5 and R4?
The R5 Total_Item_Requests metric type is the most similar to the previous IRUS R4 service "number of successful downloads" metric. However, counts will differ due to changes in processing. In the R5 service we are working with a broader range of data from across the family of IRUS services. This means it is easier for us to spot more anomalies such as robots and rogue usage, and our exclusions have been updated. We are continually improving our filtering. As a result, some of the recorded usage figures may appear lower than they did in the IRUS R4 service. For further information see our guide Comparing IRUS COUNTER Release 4 (R4) and Release 5 (R5) figures.
Can I view downloads by school/department?
The only practical way for IRUS to obtain school/department metadata related to individual items for each repository is to use the 'ListSets' mechanism which is included in the OAI-PMH. Having reviewed our current IRUS repositories, some repositories do map their schools and departments to OAI Sets but many do not. In fact there are over 28,000 different 'sets' in use, and many of those define the status/subject/type of items rather than their affiliation. It's not feasible to meet this requirement currently but we will continue to explore options to share downloads by set. In the meantime, if you would like to use this functionality in future we would recommend you check your repository configuration to ensure that you map your schools and departments into corresponding OAI Sets.
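To check which sets your own repository exposes, you can issue an OAI-PMH ListSets request and inspect the response. The sketch below builds the request URL and parses a response; the base URL and the sample XML are made up for the example, and a real response may also include resumption tokens and set descriptions, which are not handled here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_sets_url(base_url):
    """Build an OAI-PMH ListSets request URL for a repository endpoint."""
    return f"{base_url}?{urlencode({'verb': 'ListSets'})}"


def parse_sets(xml_text):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        (
            s.findtext("oai:setSpec", namespaces=OAI_NS),
            s.findtext("oai:setName", namespaces=OAI_NS),
        )
        for s in root.iterfind(".//oai:set", OAI_NS)
    ]


# Hand-written sample response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>dept_physics</setSpec><setName>Department of Physics</setName></set>
    <set><setSpec>type_article</setSpec><setName>Article</setName></set>
  </ListSets>
</OAI-PMH>"""

print(list_sets_url("https://repo.example.org/oai"))
for spec, name in parse_sets(SAMPLE):
    print(spec, "-", name)
```

In this sample only the first set describes an affiliation; the second describes an item type, which is the kind of mix that makes reporting by school or department difficult.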
Can I view information and trends about article deposits?
IRUS only captures information on items that have been used, and we do not have a mechanism for capturing information about deposits. The focus of IRUS is on usage, so deposit information falls outside its scope.
How does IRUS handle robot and rogue usage?
IRUS excludes robots and rogue usage from usage statistics by using:
- the COUNTER Robot Exclusion List
- IP addresses of known robots that do not declare themselves in their user agent
- algorithms that identify usage that appears robotic in behaviour
Work on filtering robots is ongoing.
For further information see "How IRUS excludes robots and rogue usage".
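The first two checks in the list above can be sketched as follows. The patterns and the IP address are placeholders invented for this example; they are not taken from the COUNTER Robot Exclusion List or from any IRUS configuration, and the behavioural algorithms are not shown.

```python
import re

# Illustrative patterns only; not the COUNTER Robot Exclusion List.
ROBOT_UA_PATTERNS = [r"bot", r"crawl", r"spider"]

# Hypothetical address of a robot that does not declare itself.
KNOWN_ROBOT_IPS = {"203.0.113.7"}

_ROBOT_RE = re.compile("|".join(ROBOT_UA_PATTERNS), re.IGNORECASE)


def is_robot(ip, user_agent):
    """Flag an event if its IP is listed or its user agent matches a pattern."""
    return ip in KNOWN_ROBOT_IPS or bool(_ROBOT_RE.search(user_agent))


print(is_robot("198.51.100.1", "Googlebot/2.1"))
print(is_robot("203.0.113.7", "Mozilla/5.0"))
print(is_robot("198.51.100.1", "Mozilla/5.0 (Windows NT 10.0)"))
```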
How does the data in IRUS compare with that reported by other statistics products such as Google Analytics?
Different repository usage statistics packages produce data for different purposes. For example, IRStats provides information on what is downloaded from a repository and who is doing the downloads, whereas Google Analytics aims to present a picture of visitor traffic. Each package has its own criteria for how it treats accesses by robots and “double clicks”, and hence is likely to give different figures for total downloads.
IRUS provides usage statistics on items downloaded from participating repositories in accordance with COUNTER guidelines. The data provided by IRUS are, therefore, comparable across all participating repositories.
How do IRUS and IRStats2 differ?
EPrints stores accesses to its resources in a database table. IRStats2 processes the contents of this table. Typically a script is run each night which processes the new entries to the access table and updates the data used to produce reports.
IRStats2 has a set of filter plugins which are applied when processing the stats. The 'Repeat' filter measures multiple hits on the same resource and determines whether these are likely to be a 'double-click' or similar. This is based on the same IP address requesting the same record within a configurable window; the default is 1 hour, although some repositories set it to 24 hours. The 'Robots' filter in the current release matches the user agent against a large static set of patterns, and filters out the accesses accordingly.
Recent work in this area now means there is a version which can also filter by a set of IP addresses, and an additional set of user agent patterns to match against.
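The idea behind a repeat filter of this kind can be sketched in a few lines. This is not the IRStats2 plugin code: the function name and data layout are hypothetical, and the sketch assumes the window is measured from the last access that was counted, which may differ from the real implementation.

```python
from datetime import datetime, timedelta


def filter_repeats(accesses, window=timedelta(hours=1)):
    """Drop accesses that repeat an (ip, record) pair within `window`.

    accesses: iterable of (timestamp, ip, record_id), in any order.
    """
    last_counted = {}
    kept = []
    for ts, ip, record_id in sorted(accesses):
        key = (ip, record_id)
        previous = last_counted.get(key)
        # Count the access if the pair is new or the window has elapsed.
        if previous is None or ts - previous >= window:
            kept.append((ts, ip, record_id))
            last_counted[key] = ts
    return kept


t0 = datetime(2024, 1, 1, 12, 0)
accesses = [
    (t0, "192.0.2.1", "rec-1"),
    (t0 + timedelta(minutes=10), "192.0.2.1", "rec-1"),  # repeat, dropped
    (t0 + timedelta(minutes=70), "192.0.2.1", "rec-1"),  # outside 1 hour, kept
    (t0 + timedelta(minutes=5), "192.0.2.2", "rec-1"),   # different IP, kept
]
print(len(filter_repeats(accesses)))
print(len(filter_repeats(accesses, window=timedelta(hours=24))))
```

Widening the window from 1 hour to 24 hours removes more repeat accesses, which is one reason two installations can report different totals for the same raw data.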
IRUS processes daily logs of raw download events gathered from participating repositories and generates COUNTER-conformant statistics.
When ingesting data:
- We follow the COUNTER rules to exclude a list of known robotic user agents and apply the COUNTER double click filter.
- We additionally exclude usage data from:
  - Known IP ranges of bots, such as Baidu Spider, that masquerade as ordinary users
  - Usage events with fake Google referrers
  - IP addresses with 40 or more downloads in a single day
  - IP addresses/User Agents with 10 or more downloads of a single item in a single day
  - IP address ranges, grouped by the first three octets, that have 300 or more downloads in a day
- The thresholds for these additional exclusions have been derived empirically by analysing our (extensive) logs.
During an audit review, the COUNTER auditors agreed that these appeared to be reasonable extra measures to remove robotic/rogue activity from our stats.
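The three numeric thresholds listed above can be illustrated with a short sketch. It is a simplification written for this FAQ, not the IRUS ingest code: it works on one day of events, considers IPv4 addresses only, ignores user agents in the per-item rule, and uses hypothetical function and constant names.

```python
from collections import Counter

# Thresholds as stated in the list above.
PER_IP_DAILY = 40
PER_IP_ITEM_DAILY = 10
PER_SUBNET_DAILY = 300


def subnet_of(ip):
    """Return the first three octets of an IPv4 address."""
    return ".".join(ip.split(".")[:3])


def apply_threshold_exclusions(events):
    """Keep only events that stay below all three daily thresholds.

    events: list of (ip, item_id) tuples for a single day.
    """
    by_ip = Counter(ip for ip, _ in events)
    by_ip_item = Counter(events)
    by_subnet = Counter(subnet_of(ip) for ip, _ in events)
    return [
        (ip, item)
        for ip, item in events
        if by_ip[ip] < PER_IP_DAILY
        and by_ip_item[(ip, item)] < PER_IP_ITEM_DAILY
        and by_subnet[subnet_of(ip)] < PER_SUBNET_DAILY
    ]


events = (
    [("10.0.0.1", f"item-{i}") for i in range(40)]  # 40 downloads from one IP
    + [("10.0.1.2", "item-a")] * 10                 # 10 downloads of one item
    + [("10.0.2.3", "item-b")] * 9                  # below every threshold
)
print(len(apply_threshold_exclusions(events)))
```

Note that an IP or range that reaches a threshold has all of its events for the day excluded, not only those beyond the threshold.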
It should be noted that IRUS benefits from having download data available from nearly two hundred repositories, making it possible to see the activity of an IP across all of those repositories; a handful of downloads may look legitimate when considering one repository, but can quickly become suspicious activity when viewed across dozens of repositories. As more repositories take part in IRUS, the accuracy and reliability of the statistics provided continue to improve.
What are the IRUS item types?
Over many years, IRUS has researched the item types in regular use and compared existing guidelines to produce the IRUS item types. The item types are reviewed on an ongoing basis.
For further information on the item types and their definitions see the IRUS Item types policy.
What do I do if I need to take down an item?
The IRUS takedown policy applies to the rare occasions when an item in a participating institutional repository has been uploaded in error or has been deemed to be commercially or otherwise sensitive and the repository itself has received a takedown request for that item.
Takedown requests to IRUS must be made by a designated repository manager/administrator.
What are the best practices for cataloguing identifiers to expose them in OAI-PMH?
Item identifiers play an important role in allowing IRUS to produce accurate statistics and we have produced a brief document outlining recommended methods.
For further information see IRUS best practices for cataloguing identifiers and exposing them in OAI-PMH.
If our repository is participating in IRUS, do I need to provide our repository usage statistics to OpenAIRE?
No, you do not need to provide these, as OpenAIRE harvests usage statistics from IRUS. If you also send usage statistics separately, OpenAIRE will double count them.
Does IRUS have an API?
Yes, we have an API. Further information is available in the Embed area. Please contact our helpdesk at firstname.lastname@example.org if you wish to explore its use.
How can I give feedback?
We are a community resource and welcome your comments and suggestions at any time. Please get in touch with our helpdesk at email@example.com.