Comparing IRUS COUNTER Release 4 (R4) and Release 5 (R5) figures
We follow the guidance in the Processing Rules for Underlying COUNTER Reporting Data in the COUNTER Release 5 Code of Practice. The R5 Total_Item_Requests Metric_Type is the closest equivalent to the "number of successful downloads" metric in the previous IRUS R4 service. However, counts will differ because of changes in the R5 processing rules and continued improvements in how we handle exclusions, e.g., anomalous robot activity and rogue usage. As a result, some recorded usage figures may appear lower or higher than they did in the IRUS R4 service. Several factors contribute to these differences.
Reasons why counts could be lower
What has changed: R4 only recorded and processed file downloads, whereas R5 processes both page views and file downloads. This gives R5 a more accurate overview of the activity of "users", enabling IRUS to detect excessive usage that went undetected in R4.
Effect: This will tend to reduce Requests (file downloads) counts in R5. In most cases the effect will be small, but in some cases it can result in a significant reduction in R5 counts compared with R4.
What has changed: R4 exclusions operated within individual IRUS service silos, whereas R5 exclusions operate across the much broader, integrated dataset of the whole family of IRUS services. This gives R5 a more accurate overview of the activity of "users", enabling IRUS to detect excessive usage that went undetected in R4.
Effect: This will tend to reduce the counts in R5. In most cases the effect will be small, but in some cases it can result in a moderate reduction in R5 counts compared with R4.
What has changed: R4 statistics included deleted items, whereas R5 statistics do not.
Note: Deleted items must be excluded from R4 figures before comparing with R5 figures.
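As a minimal sketch of the adjustment described in the note above, the snippet below drops deleted items from a set of R4 download counts before totalling them for comparison with R5. The function name, the item identifiers and the counts are all hypothetical, and the data layout (a mapping of item ID to download count) is an assumption, not an IRUS export format.

```python
def comparable_r4_total(r4_downloads, deleted_item_ids):
    """Sum R4 downloads, skipping items that have since been deleted.

    r4_downloads: dict mapping item ID -> R4 download count (hypothetical layout)
    deleted_item_ids: set of item IDs known to have been deleted
    """
    return sum(
        count
        for item_id, count in r4_downloads.items()
        if item_id not in deleted_item_ids
    )


# Illustrative figures only; these are not real IRUS data.
r4_downloads = {"item-1": 120, "item-2": 45, "item-3": 10}
deleted_item_ids = {"item-2"}

r4_total_for_comparison = comparable_r4_total(r4_downloads, deleted_item_ids)
```

With these illustrative figures, the 45 downloads recorded against the deleted item are left out of the R4 total before it is set against the R5 figure.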
Reasons why counts could be higher
What has changed: R4 screened out a handful of 'nonsense' user agents, such as "ahfo" and "chlzgh", whereas R5 does not. The R4 screening was not scalable, as there are unlimited 'nonsense' possibilities. We have seen no evidence of malicious or suspicious behaviour from such user agents, and it is most likely that they are genuine, harmless internet users.
Effect: This has a minimal, generally insignificant effect on overall numbers but may slightly increase the counts in R5.
Client IP addresses
What has changed: R4 screened out several ranges of client IP addresses associated with known 'bad actors', including those belonging to Baidu and Shenzhen, whereas R5 does not. The improved IRUS exclusion thresholds in R5 should screen out most of the excessive activity from such sources.
Effect: This may slightly increase the counts in R5.
Double click processing
What has changed: Both R4 and R5 process double clicks using a combination of IP address and user agent as a proxy for a user. However, R5 processes double clicks within hourly 'sessions', which R4 did not. This change is needed to conform with COUNTER R5.
Effect: This will tend to slightly increase the counts in R5.
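The sketch below illustrates why hourly sessions can slightly raise counts. It is not the IRUS implementation. It assumes the 30-second double-click window from the COUNTER R5 Code of Practice, and it assumes that a session is keyed on IP address, user agent and the date plus hour of the request. The function name and the sample events are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: the 30-second double-click window from the COUNTER R5 Code of Practice.
DOUBLE_CLICK_WINDOW = timedelta(seconds=30)


def count_requests(events, hourly_sessions):
    """Count requests after collapsing double clicks.

    events: iterable of (timestamp, ip, user_agent, item_id) tuples.
    hourly_sessions: if True, the date and hour are part of the user key,
    so clicks in different clock hours are never treated as a double click.
    """
    last_seen = {}  # key -> timestamp of the most recent click for that key
    count = 0
    for ts, ip, user_agent, item_id in sorted(events, key=lambda e: e[0]):
        key = (ip, user_agent, item_id)
        if hourly_sessions:
            key += (ts.strftime("%Y-%m-%d %H"),)
        previous = last_seen.get(key)
        # A click within the window of the previous one is a double click:
        # it does not add to the count, but it does restart the window.
        if previous is None or ts - previous > DOUBLE_CLICK_WINDOW:
            count += 1
        last_seen[key] = ts
    return count


# Hypothetical events: one user clicks the same item three times, 15 seconds
# apart, with the first click just before the hour and the rest just after.
events = [
    (datetime(2021, 3, 1, 10, 59, 50), "192.0.2.1", "ExampleBrowser/1.0", "item-1"),
    (datetime(2021, 3, 1, 11, 0, 5), "192.0.2.1", "ExampleBrowser/1.0", "item-1"),
    (datetime(2021, 3, 1, 11, 0, 20), "192.0.2.1", "ExampleBrowser/1.0", "item-1"),
]

without_sessions = count_requests(events, hourly_sessions=False)  # 1
with_sessions = count_requests(events, hourly_sessions=True)      # 2
```

In this example the clicks at 10:59:50 and 11:00:05 fall into different hourly sessions, so they are counted separately when sessions are used, even though they are only 15 seconds apart. Such hour-boundary cases are rare, which is consistent with the slight increase described above.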
Referrer exclusion list
What has changed: R4 (IRUS-UK only) filtered out usage with the HTTP Referrer value 'None', whereas R5 does not. Previous analysis indicated that this value was associated with robotic/rogue usage, but more recent analysis showed that it occurs in both legitimate and rogue usage. This is most notable with Figshare, which explicitly sets an empty referrer to 'None' in the tracker messages.
Effect: This will tend to increase counts in R5. Differences can range from very minor to significant depending on the repository.