Being transparent & privacy aware: ditching third-party trackers in Strathprints
George Macgregor Scholarly Publications & Research Data, University of Strathclyde
Like a lot of websites, Strathprints has historically made use of third-party integrations. Some of these integrations have provided us, and Strathprints users, with useful functionality over the years. But because these integrations involve the implementation of tracking code within Strathprints, they have also entailed third-party cookies being set in our users' browsers. This is most notable in our use of Google Analytics and AddThis, the former providing analytics on web traffic and the latter providing convenient social sharing buttons and web analytics. In fact, the Google Analytics Tracking Code (GATC) also sets the DoubleClick cookie used to enable remarketing for products like Google Ads, while AddThis engages in browser fingerprinting.
Avoiding Google Analytics or AddThis on the web these days is difficult because so many websites use them, from The Guardian to InsideHigherEd. But users shouldn’t be worrying whether they are being tracked when using open knowledge commons. And remember, you can always configure your browser to reject third-party cookies, or use private tabs, to avoid being tracked when you aren’t visiting Strathprints.
Samvera: New Strathclyde digital thesis repository platform selected
We are pleased to announce that a new digital thesis project, pioneered by our umbrella unit, Scholarly Research Communications – involving Cataloguing & Metadata and our team (Scholarly Publications & Research Data), as well as Library Systems – has selected the open-source digital repository platform, Samvera, for a new digital thesis repository. Work has been percolating away for many months but has been kept under wraps until now. Technical work also involves input from colleagues at the University of London (CoSector).
The repository will use the Hyrax framework for the repository front-end and will aim to increase the visibility and impact of University of Strathclyde thesis content. The move to Samvera will also facilitate a suite of exciting digital preservation activity, including integrations with Archivematica, and enable the exposure of valuable non-thesis based digital content too.
The project team will continue to beaver away but it is hoped that something publicly available will be released during summer 2021. Watch this space for updates! In the meantime we can report that the Strathprints institutional repository has recently been upgraded – a blog post on this topic was recently published.
As many of you will be aware, Strathprints is powered by EPrints, a free and open-source software
repository software platform. I am pleased to report that Strathprints has
recently been upgraded from EPrints version 3.3.13 to 3.4.2. This brings with
it many improvements, although most of these are under the bonnet and are of
more interest to me than you! Nevertheless, there are some user-facing changes
that are worth highlighting, some of which are a product of the upgrade, while
others are simply changes made while the upgrade was taking place. These
changes can be categorised into the following: better internal search, better
use of screen real estate, better metrics, and better usage statistics.
Better internal search
In general, repositories acquire the majority of their usage
from so-called ‘horizontal’ information seekers. These are users who, for
example, are literature searching on, say, Google Scholar. These are therefore
users who seek scholarly information across dozens of open repositories and dip
in and out of potentially relevant results, and the repositories that contain
that potentially relevant content. Even so, a surprising volume of content
usage results from internally initiated searches of Strathprints, using either
the quick search or advanced search options. The good news for these users is that
internal search has been greatly enhanced in Strathprints.
Strathprints now uses an improved version of Xapian to power its searches (Xapian is an Open
Source probabilistic information retrieval library). This makes quick searches of
Strathprints far more effective, allowing searches to be executed across all
metadata elements and full-text content, with associated improvements in result
quality and thus potential relevance.
The results page for a quick search (or the SERP
– Search Engine Results Page) also provides significant enhancements, most
notably the inclusion of facets (or filters) to simplify the browsing of
results or to initiate faceted
searches (highlighted in the screen snippet below). These facets are
displayed in the left-hand column in the SERP and include facets such as
subject, Strathclyde department / organisational unit, year of publication,
item type, document format and full-text availability.
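For illustration, the facet logic can be sketched in a few lines of Python. This is a hedged sketch rather than actual EPrints or Xapian code: the field names and result records below are invented for the example, standing in for the facets listed above.

```python
from collections import Counter

# Hypothetical result set: each hit carries the metadata elements
# exposed as facets (field names here are illustrative, not EPrints').
results = [
    {"subject": "Physics", "year": 2019, "item_type": "Article", "full_text": True},
    {"subject": "Physics", "year": 2020, "item_type": "Article", "full_text": False},
    {"subject": "Chemistry", "year": 2020, "item_type": "Thesis", "full_text": True},
]

def facet_counts(hits, field):
    """Count how many hits fall under each value of a facet field."""
    return Counter(hit[field] for hit in hits)

def apply_facet(hits, field, value):
    """Narrow the result set to hits matching a selected facet value."""
    return [hit for hit in hits if hit[field] == value]

print(facet_counts(results, "year"))                  # Counter({2020: 2, 2019: 1})
print(len(apply_facet(results, "full_text", True)))   # 2
```

Selecting a facet value simply filters the result set and the counts are recomputed, which is all a faceted SERP is doing behind the scenes.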
Better use of screen real estate
Over recent years it has been difficult to balance the needs
of users with the overwhelming urge to use every pixel of repository screen space
for additional scholarly features, whether this be bibliometrics or alternative
metrics, ORCIDs, recommender systems, and so forth. Many repository user
interfaces (UIs) have therefore become overwhelming to the uninitiated user and
can detract from the open content itself which is, after all, the principal
reason for visiting a repository. Strathprints was no exception. The upgrade to
3.4 provided a good opportunity to update the abstract pages in order to make
better use of screen space, prioritising full-text content and any related
content (via the CORE recommender).
The new, updated abstract pages look almost identical to the
‘old’ ones, but with item metadata, metrics and export options now hidden
within expandable menus. Authors’ ORCIDs are
also given greater prominence, such is the importance of author PIDs.
Item metadata, though very important, consumes a lot of screen space – and for
something that is of interest to only a minority of users.
But by hiding this content we, ironically, have an
opportunity to use more space, such as providing additional metadata or
additional metrics! So, for example, additional metadata, such as publication
status dates, are now displayed (see screen snippet below) when the ‘item
metadata’ tab is expanded.
Larger, clearer bibliometric and alternative metric
badges, with additional data, can now also be displayed. Which leads to…
Better (alternative) metrics
In addition to improving the quality of the information
displayed in relation to DS Dimensions
citation metrics and Altmetric impact,
PlumX metrics have also been
incorporated into Strathprints, providing a more holistic overview of the
impact of specific repository deposits. PlumX provides an additional set of
alternative metrics alongside Altmetric but also provides data on Scopus and CrossRef citations, to
be considered alongside DS Dimensions. Where such metrics are unavailable (e.g.
for a recently published journal article), the ‘citations and altmetrics’ tab
will not be displayed to users.
Better usage statistics
Statistical reporting of Strathprints usage now enjoys an
overhauled report form, a form which is a little more flexible and intuitive,
particularly in the reporting of usage over temporal periods. But the principal
change resides in the data quality, which now supports an improved blacklist
of robots/crawlers using the IRUS-UK lists for user agents and IP ranges.
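As a rough illustration of how such a blacklist works, the sketch below filters download events by user-agent substrings and IP ranges. The lists shown are tiny stand-ins for the real IRUS-UK exclusion lists, and the function is an assumption for illustration, not the actual IRUS or EPrints implementation.

```python
import ipaddress

# Stand-in exclusion lists in the spirit of the IRUS-UK ones;
# the real lists are far longer and these entries are assumptions.
ROBOT_AGENT_SUBSTRINGS = ["bot", "crawler", "spider"]
ROBOT_IP_RANGES = [ipaddress.ip_network("66.249.64.0/19")]  # an example crawler range

def is_robot(user_agent, ip):
    """True if a download event matches the user-agent or IP blacklist."""
    ua = user_agent.lower()
    if any(s in ua for s in ROBOT_AGENT_SUBSTRINGS):
        return True
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ROBOT_IP_RANGES)

hits = [
    ("Mozilla/5.0 (Windows NT 10.0)", "130.159.1.10"),  # a human reader
    ("Googlebot/2.1", "66.249.66.1"),                   # a crawler
]
countable = [h for h in hits if not is_robot(*h)]
print(len(countable))  # 1 – only the human download counts
```

Filtering events like this before aggregation is what keeps the reported usage statistics COUNTER-friendly.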
Incremental improvements are always being made to
Strathprints, irrespective of whether its core software is being upgraded or
not; and the future is no exception. With the upgrade completed we will be
turning our attention to finalising integration work with Archivematica, an open-source
digital preservation system. This is important to ensure Strathclyde’s open
content and intellectual memory is digitally preserved, remains digitally accessible
over time and supports the ongoing digital scholarly record. Our initial
spadework and testing has been very positive indeed (see this presentation from the
UK Archivematica User Group in September 2020). To date this work has
remained in our test environment. Coming weeks will see this work brought out
of test and into production. Watch this space! In the meantime, please check
out the updates on Strathprints…
Last week we
highlighted some of 2020’s hard won achievements. But, as Jimmy
Cricket said, “c’mere, there’s more!” And indeed there is. Because
as well as some record-breaking Strathprints usage and
favourable Open Access rankings in the Leiden Rankings 2020, we have been depositing
more and more content into Strathprints and working harder than ever to ensure
users get access to Strathclyde’s Open Access research.
We have previous posts explaining
our repository ecosystem and the relationship of Strathprints to our
Current Research Information System (CRIS), which at Strathclyde is Pure. The
main thing to note is that we operate a so-called ‘connector lite’
environment, enabling a level of discretion between what content is exposed on
Strathprints and which remains in Pure only – because they have different purposes.
Anyway, let us explore some of our deposit numbers:
validations, re-validations, and more…
A total of 6,615
items were validated between 01 January 2020 and 31 December 2020. That is an
average of 551 items per month. Of
these 6,615 items, 3,716 resulted in
Open Access deposits in Strathprints. In other words, 3,716 full-text deposits were made in Strathprints during 2020.
So, what about the remaining 2,899? Good question! 2,735
were metadata-only validations which were retained in our CRIS for the purposes
of research management. And a further 164 were metadata-only items which – for
various reasons too dull to discuss now – were deposited in Strathprints. Strathprints
today is, to all intents and purposes, a near 100% full-text repository, with
almost all deposits since 2014 being accompanied by full-text; but very
occasionally it is necessary to allow the odd metadata-only deposit.
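The breakdown above can be sanity-checked in a few lines, using only the figures quoted in this post:

```python
# Figures from the 2020 validation statistics quoted above.
validated = 6615            # total items validated in 2020
oa_deposits = 3716          # full-text (Open Access) deposits in Strathprints
metadata_only_cris = 2735   # metadata-only, retained in the CRIS (Pure)
metadata_only_repo = 164    # metadata-only items deposited in Strathprints

remaining = validated - oa_deposits
assert remaining == 2899                                  # the "remaining 2,899"
assert metadata_only_cris + metadata_only_repo == remaining

print(round(validated / 12))  # 551 – the monthly average quoted above
```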
Perhaps most astonishingly – and an often under reported
figure by institutions when referring to Open Access or repository content
maintenance – were the number of re-validations during 2020. ‘Re-validations’
are essentially outputs which are revisited by team members after their initial
deposit to, for example, update important metadata elements (DOI, pagination,
publication date, etc.) or, indeed, to update the full-text. The REF Policy on
Open Access has increased the need for this sort of activity because a great
many deposits are now made when an output is accepted for publication, rather
than when the output is published – and for obvious reasons there tends to be
limited information about an output at the point of acceptance. During 2020 we
undertook a whopping 6,650 re-validations.
The chart below puts all these numbers into perspective, particularly the
volume of re-validations performed in 2020, and helps to illustrate the
mountain of work the team powers through in a year (on top of other duties).
During 2020 we received 1,042
requests from the request button feature. Of these 300 were fulfilled. That’s an average of 25 requests per month which were successfully fulfilled and, in the
majority of cases, resulted in a new full-text deposit too. This means circa a
third of requests are being fulfilled. What about the remaining two thirds?
Well, before we get to that, it is important to recognise how important each
request can potentially be, and given the potential benefits of additional
full-text when previously there were none, the prospect of fulfilling a third
of all requests is to be welcomed. As
noted previously on this blog:
Every reader of a
paper has the potential to be highly significant.
The chart below highlights the outcome of the requests which
are received, including the remaining two thirds; because, not unexpectedly,
some academics cannot find an accepted manuscript for an article published in
2008, or even more recently. But also, researchers are highly mobile and very
often the request may be made for a publication belonging to a member of staff
who has long since left University of Strathclyde. And, on the former outcome,
we can observe from the chart that the majority of the requests we received in
2020 fall into this category. In other words, of
the ‘unfulfilled’ requests we mediate, the majority cannot be completed because
the Strathclyde author can no longer locate a copy of the accepted author manuscript.
It is nevertheless worth highlighting that the
overall proportion of successfully fulfilled requests is lower than reported previously. We can speculate that, over time, the number of
requests successfully fulfilled is likely to decline. This is because as more
of the previously deposit-less deposits in Strathprints acquire full-text (as a result of previous requests),
or as deposit embargoes are lifted, there is no longer a need for users to submit a request. Instead, they can access the full-text from Strathprints as they might any other
deposit – which is great. But requests for items which can never be fulfilled,
either because the AAM cannot be located or because a member of staff has moved
to a different institution, will still continue to be received and recorded. This anticipated decline in requests remains a heartening sign though – it suggests that gaps in full-text availability are steadily being plugged, and that ever more full-text has been deposited. See, we promised to deliver more reasons to be cheerful!
During 2020 the blog was quieter than we would have liked.
This was largely attributable to the consequences of the Covid-19 pandemic. The
Scholarly Publications & Research Data (SPRD) has never been busier,
particularly as team members aim to deliver services as normally as possible.
may have had impediments during 2020, but 2021 gives us the opportunity to
reflect on 2020’s hard-won achievements. The truth is – and we aren’t wishing to blow our own
trumpets here – that there have been a great many achievements, some of which we
may document in a future blog post; but for now I think we can kick-off 2021 by
reflecting on two specific achievements: Leiden Rankings, and repository usage.
Some readers will be familiar with the Leiden Rankings. For
those who are not, the Leiden Ranking…
CWTS Leiden Ranking 2020 offers important insights into the
scientific performance of over 1000 major universities worldwide.
The use and abuse of university rankings is a common
complaint within academia. The context for the Leiden Ranking is, however, coloured
by ‘responsible metrics’ and influenced by the Leiden Manifesto, itself a
declaration that research evaluation requires more expert judgement, not more
metrics. More recently, however, the Ranking has included Open Access
indicators, itself a useful validation of the importance of Open Access – and
Open Research more generally – in the evaluation of university activities.
The excellent news for Strathclyde was that, in the 2020
Ranking, our position for proportion of Open Access content (Green & Gold)
made available (2015-2018), increased to *4th in the world*. This was
tremendous news and consolidates SPRD and Strathclyde’s commitment to open
research. It is also pleasing to know that our hard work and commitment to the
cause over many years is now receiving global acknowledgement. As the tweet
indicates, our team has been key to this achievement, but Strathclyde
researchers deserve kudos too for embracing our ambitions!
The trend for reporting pleasing numbers also extends from
the Leiden Ranking to repository usage.
Strathprints usage: record COUNTER usage!
Reporting Strathprints usage has been common on this blog –
and we will link to previous blog posts if you want to understand what COUNTER
usage means. Suffice it to say, Strathprints experienced record-breaking usage
throughout 2020, with 694,941 COUNTER compliant downloads made. This is in
excess of a 33% increase compared to 2019’s figures. November 2020 was
particularly healthy, with almost 85,000 downloads made – a 43% increase on the
same month in 2019. The growth in usage throughout 2020 can be observed from
the chart below, which includes 2019’s data for comparative purposes.
But what were the most downloaded outputs in 2020? Well,
before considering this question we should probably pose another: Which outputs
deposited in 2020 were the most used in 2020? As I have noted in a previous post:
…looking at the most used items in Strathprints at any given
point isn’t particularly insightful because some deposits have been available
for many years and have established an ongoing impact. It is therefore better
to consider deposits which have been made more recently.
So, to this end, let us consider outputs that were deposited
in 2020 and were most used.
Below we present the top 20 outputs falling into this category. It is a pleasant
mix of journal articles and scholarly grey literature, many of which enjoyed
four figure usage in 2020. It may come as no surprise that some of these items
assess the impact of the Covid-19 pandemic. Here we go…
Congratulations to everyone featuring in these top 20 listings – and congratulations to every Strathclyde researcher for contributing to our favourable Open Access ranking in the Leiden Rankings! Respect!
Exploring complexity: the two sides of Open Science (II)
Pablo de Castro, Open Access Advocacy Librarian
This is the second post on the topic of Open Science and innovation. In the first
one we saw how research libraries and their research support services
seemed at risk of being out-of-synch with the mainstream, pragmatic approach to
Open Science for the sake of ensuring the continuum between research and its
practical, innovation-driven application. This second instalment will examine some
of the reasons why, and a couple of possible adjustments to the current
workflows that would bring libraries closer to other research support
services without disrupting their present approach.
Could workflows around Green Open Access
policies be fine-tuned for increased efficiency?
This being out-of-synch is a side-effect of the fact that UK research
libraries are implementing what one feels is the most advanced, most successful
Open Access policy worldwide, namely the (previously called) HEFCE
policy linked to the national research assessment exercise (REF). This
policy is totally aligned with the recommendations of the EU-funded PASTEUR4OA FP7 project, namely to
make the deposit of full-text accepted manuscripts mandatory and to link such
deposit to the eligibility for the research assessment exercise.
Given that these were European-level recommendations, one cannot help but
wonder how come the policy has not been more widely implemented despite the
evident fact that 26 out of the 30 top institutions worldwide by percentage of
openly available institutional research outputs as per the CWTS Leiden ranking 2019 happen to be British universities. Two main reasons come to mind: first, that
no other country has dared to apply the PASTEUR4OA project findings in such a
literal way. Second, that no other country has developed such an effective,
almost ruthless network of Open Access implementation teams within their
research libraries (there’s a third reason: no other country had the likes of
Alma Swan or Stevan Harnad to pester the policymakers).
So why should this highly successful national-level policy that could
effectively achieve the 100% Open Access objective be an obstacle to a
pragmatic approach to Open Science? Because it’s a Green Open Access policy
based on the deposit of accepted manuscripts in institutional repositories with
widespread embargo periods. Because despite current and future progress
in enhancing the visibility and discoverability of repository contents, the
canonical way to reach a publication for an external stakeholder with little
knowledge about the complex scholarly communications landscape (eg Industry)
remains and will remain the DOI issued by the publisher. Because a Green
OA-based policy does not open the publications sitting behind those DOIs. And
because the amount of effort involved in the implementation of the HEFCE policy
as it is designed right now is so huge that research libraries lack the
physical resources to adopt any other complementary Open Access implementation route.
Enter Plan S with its highly pragmatic approach to Open Access implementation.
Originally strongly based on Gold Open Access, APC payments where needed and
deals with the publishers to address the double-dipping issue around hybrid
journals, it’s only after considerable pressure has been exerted by the Green
Open Access lobby that the zero-embargo Green Open Access policy has found a
place in the Plan S implementation guidelines. But with the current scramble
for ‘transformative’ deals that will allow most hybrid journals to become
eligible under Plan S requirements, the size of the institutional Gold Open
Access output pie will only grow in forthcoming years.
Caveat: one is a (European) institutional Open Access advocacy librarian and shares most
of the views one’s colleagues have about certain scholarly publishers. But this
is a series of posts devoted to exploring complexity and the way research
libraries may deliver a better service to their institutions.
The big issue at the moment is that the enormous effort that Open Access teams
at UK research libraries are devoting to implement the HEFCE policy – which
requires them to chase every single full-text accepted manuscript for every
single publication the university produces – prevents them from being able to
adopt the set of workflows required for the implementation of all these
'transformative’ deals. Not to mention adequately exploring enhancements in
system interoperability to make sure other Open Access mandates also get
implemented. Or paying more attention to the actual impact of such research outputs.
It is somewhat ironic that so much duplicate effort is going on around the
HEFCE OA policy implementation – with all Open Access teams at all co-authoring
UK institutions for one single paper chasing repeated copies of the same
full-text manuscript from 'their’ authors – when the tools are already there
that could make this process much more reasonable. The Jisc Publications Router, formerly
known as the Repository
Junction Broker, would easily allow for one AAM ('Accepted Author
Manuscript’ in the OA lingo) to be chased once by a single institution
(ideally the one associated with the corresponding author) and brokered to all
co-authoring UK institutions.
This software was originally conceived and designed with publishers in mind as
content providers. They would provide the AAMs for their papers, and these
would then be distributed to the co-authoring institutions, very much in the
spirit of the PEER project. But
then all the announcements we’re getting from the Jisc regarding publishers joining the Publications
Router as content providers only cover their Gold Open Access papers. Of course
one could not expect otherwise from publishers, but AAMs are actually the
authors’ intellectual property. A bold approach to the HEFCE policy
implementation would involve making institutions the default content providers for
the Publications Router and designing a set of rules on who would need to
provide what AAM and when. The brokering of Gold Open Access papers is
pretty much worthless for institutions. There would clearly be a need for the
appropriate institutional authentication mechanisms on the Publications Router
if it were used to promote the Green Open Access route, and this would hardly be rocket science, but this is not the way the wind seems to be blowing.
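One way such brokering rules might look is sketched below. The function, field names and rule (the corresponding author's institution chases the AAM once, then it is brokered to the other co-authoring UK institutions) are assumptions derived from the description above, not the Publications Router's actual API.

```python
# Hypothetical brokering rule for the "institutions as content
# providers" model discussed above. All names are illustrative.
def broker_aam(paper):
    """Decide who chases the AAM and who receives a brokered copy."""
    depositor = paper["corresponding_institution"]
    recipients = [i for i in paper["uk_institutions"] if i != depositor]
    return {"chased_by": depositor, "delivered_to": recipients}

paper = {
    "doi": "10.1000/example",  # placeholder DOI
    "corresponding_institution": "Strathclyde",
    "uk_institutions": ["Strathclyde", "Glasgow", "Edinburgh"],
}
plan = broker_aam(paper)
print(plan["chased_by"])     # Strathclyde
print(plan["delivered_to"])  # ['Glasgow', 'Edinburgh']
```

The point of the sketch is that one chase replaces three: only the depositing institution contacts its author, and the broker handles the rest.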
External affiliations for industry: metadata
management, hence a library task
As noted, there are tasks associated with the continuum between research and its practical,
innovation-driven application that only research libraries can address. One of
these is the systematic mapping and analysis of the collaboration workflows
with industry within the institution. The most straightforward way to identify
these collaborations is through the analysis of the partner consortia within
externally funded projects. This is (in principle) beyond the library’s remit
and rather falls to the institutional research office or project management
office. However, nothing prevents the library and its Open Access
implementation team – with their constant scoping of manuscript acknowledgements –
from being well aware of projects such as the EU-funded ROMEO or the EPSRC-funded DISTINCTIVE, to
mention but a couple of random examples for projects in collaboration with
industry for Strathclyde Uni.
And there is another, much finer-grained way of mapping these collaborations.
This is through publications and the affiliations of their co-authors. This
approach will catch collaborations with industry in the form of publication
co-authorships even if not supported by a joint project. This is of course IF –
and this is a big if – the affiliations are correctly coded in the metadata.
We have the – only recently launched – Research Organization Registry (ROR) initiative running now, but author affiliations
are a very difficult area to address, and we are not even talking institutions
here, but companies. Some research funders, driven by the need to pragmatically
deal with the issue, have often taken a shortcut and directly used the national
registration codes for their companies as an identifier in the past. But the academia-industry
collaboration realm is hardly ever restricted to a national environment.
Repositories in particular are as yet very poor at mapping affiliations – with remarkable
exceptions such as HAL in France, where it
is possible to search by industrial affiliation. This is rather the domain
of CRIS systems with their detailed CERIF-based data model. The problem is that
it’s mostly researchers who are directly creating the records for their
publications in CRIS systems and researchers are very unlikely to realise the
value of (and subsequently make the effort for) adequately coding the
affiliation of all co-authors for a given publication in the institutional
system. This is boring stuff – something for the research support staff to take care of.
Note how the industrial affiliation entries in the record above for ‘EU industry’ and ‘UK Industry, Commerce, Public Corporation’ – categories defined for external affiliations in the CRIS – have instead been coded as ‘Unknown’, making it impossible to track eg the institutional co-authorships with EU industry via searches against this metadata element. This is again hardly the researchers’ fault – it’s not for them to deal with these metadata intricacies. If the research support team at the library were able however to reallocate some of its time to make sure these affiliation entries were adequately coded in the system, they would be able to handle it just fine.
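A minimal sketch of what 'adequately coding' affiliations could look like: free-text affiliation strings resolved against a lookup of ROR IDs and the CRIS sector categories mentioned above. All organisation names and ROR IDs below are placeholders for illustration, not real registry or CRIS entries.

```python
# Placeholder lookup table; in practice this would be backed by the
# ROR registry and the CRIS's controlled sector vocabulary.
AFFILIATION_LOOKUP = {
    "acme widgets gmbh": {"ror": "https://ror.org/0000000x1",
                          "sector": "EU industry"},
    "examplecorp ltd":   {"ror": "https://ror.org/0000000x2",
                          "sector": "UK Industry, Commerce, Public Corporation"},
}

def code_affiliation(raw):
    """Resolve a free-text affiliation to a ROR ID and sector category."""
    entry = AFFILIATION_LOOKUP.get(raw.strip().lower())
    if entry is None:
        # The failure mode described above: unresolved entries end up 'Unknown'.
        return {"ror": None, "sector": "Unknown"}
    return entry

print(code_affiliation("ACME Widgets GmbH")["sector"])  # EU industry
print(code_affiliation("Mystery Co")["sector"])         # Unknown
```

Once affiliations resolve to identifiers rather than strings, searches such as "all co-authorships with EU industry" become a simple metadata query.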
This would automatically place them in a position to address the pressing question of how the UKRI 5-year Gold Open Access funding policy may have impacted this critical indicator for assessing the effectiveness of such policy. The citation advantage is the default indicator we use to try and assess the academic impact of (Gold) Open Access, but as per the approach described in the previous post, this complementary indicator for the number of institutional co-authorships with industry would be both more precise and more in line with the objectives of the funding exercise.
If we don’t do this kind of analysis ourselves, others will do it for us using our own data and will sell it back to us, amid loud complaining from Open Science advocates about the outsourcing of key business intelligence-related workflows. We seem to be experts in this sort of thing. And this is already happening, though it’s so beautifully done that one cannot find any reasons to complain.
These are our own (extremely expensive, publicly-funded) research facilities, our own staff who are striving to have them used by industry as per the research funders’ recommendations, our own publications based on the data coming out of such facilities and instruments and our own collaborations with external stakeholders though. It should ideally be our own analysis too.
Exploring complexity: the two sides of Open Science
Pablo de Castro, Open Access Advocacy Librarian
One may see Open Science (which some
prefer to call Open Research) as an altruistic movement towards opening up
research methods and especially research outputs for the sake of their visibility
and open availability to the wider society. The legitimate right for any
citizen to read research outputs resulting from public funding is regularly
raised by every Open Access advocate
– including yours truly
when explaining the rationale for Open
Science. Patients, schoolteachers, doctors are highlighted as the sort of
citizens that may need to access scientific literature and may be forced to pay
for such access unless we succeed in our push towards Open Science. And SMEs.
Yes, one always mentions SMEs here as well. In fact anyone who happens to be
outside the institutional subscription bubbles.
There is another take to Open Science though, a far more pragmatic and hence
more likely to succeed approach. This other take, although not unconcerned with
access to research results by the average citizen, is mostly about the
possibility of exploiting the synergies between research and industry by making
not only research results but other areas such as research facilities or
expertise as openly available to industry (and the wider outside world) as possible. This is the approach
driven by innovation that sees research and its commercial application as a
continuum and understands the value of openness for the purpose of realising it.
The first concept of Open Science is traditionally adopted by research
libraries, whereas the second one is characteristic of research offices and any
pragmatic approach to research such as research assessment (the REF) and the
measurement of research impact. This is hardly surprising: libraries have made
a historical emphasis on their users (nowadays often called
“customers”) and on making content available to them as freely as
possible. One has also elaborated elsewhere (“spenders not
fundraisers”) about the traditionally poor approach that libraries display
towards raising funding besides spending large amounts of it.
Hardly surprising, then, that this difference in approaches also creates a large
ideological divide that tends to isolate university libraries and the research
support services they host from practically any other
research-related unit at the institution and beyond. This is of course
unless there is a carefully nurtured bridge between services and a
communication channel that allows the mutual understanding and respect for each
other’s practices and drivers.
All this is not just about job creation and general scientific and
technological progress, but mostly about implementing this continuum between
research and its commercial application in a way that allows public investment
to benefit the wider socio-economic fabric that surrounds higher education
institutions. Just as it is deeply right that elderly ladies outwith the
Strathclyde sphere of influence are able (and so happy!) to use the brand new,
flashy Sports Centre the University has built on Cathedral Street, it is deeply
right that master students at the University are able to conduct their learning
and their training at commercial partners where they may well end up employed
in a couple of years’ time.
There are of course many issues raised by the innovation-driven approach to
Open Science becoming mainstream (though none of them is unsolvable). The attitude
to adopt with regard to basic research and to research in the Humanities is one
of them. The seemingly unstoppable trend towards an ever increasing
commercialisation of research is another one. The clash between openness for
the sake of widening access to research outputs for everyone and openness for
the sake of their commercial exploitation is perhaps the key one to explore as
part of the current effort for the definition of the workflows associated to
Open Science implementation.
But there are also many upsides stemming from this approach too. The main one
of these may well be that this is a deeply shared philosophy across European
countries and regions (and beyond), where the economic return on investment
in research infrastructure and activity sits solidly on the radar of
policymakers everywhere – hence the European and Regional Innovation
Scoreboards where every country and region in Europe may assess its progress
within this general trend to strengthen the continuum between Academia and
Industry. Moreover, this is not a zero-sum game: because of the deeply
transnational character of research and innovation, there is a knock-on effect
whereby improved innovation systems in a given region benefit the global
competitiveness of the wider economic area, be it a country or a union thereof.
In the meantime, and from a library perspective, it would be good for research
support services at research libraries to widen their perspective a bit. It’s
not just that the discussion around (for instance) Plan S and Gold Open Access
implementation via Read & Publish agreements gains a whole new dimension
when examined under an innovation-driven perspective. It’s also that the kind
of tasks that research support services are currently undertaking could be
redesigned for a better alignment with this mainstream approach to the institutional
research activity, so that the potential synergies could be much better
exploited.
These aspects will be addressed in a companion post that will look into what
Open Science services currently do and what else they could do if they managed
to find the resources for it by adequately redistributing their workload.
Strathprints usage for 2019 in review: the ascendancy of open scholarly grey literature
George Macgregor, Scholarly Publications & Research Data, University of Strathclyde
Long time, no see! Firstly, on behalf of the Scholarly
Publications & Research Data (SPRD) team, we must apologise for the paucity
of blog posts during the second half of 2019. The reasons for this might become
clearer over time but the truth is simply that we have been
exceedingly busy. So busy, in fact, that no-one could find the time to document
all the exciting things we have been doing! A blog summarising some of these
exciting things will be forthcoming, but in the meantime let us review some of
the vital statistics for 2019.
Total usage / downloads
During 2019 Strathprints attracted 521,697 COUNTER-compliant
downloads, a 29% increase on 2018 which, in turn, was a 43% increase on 2017.
In fact, since 2016 we have enjoyed double-digit growth in usage, attributable
of course to the growth in Open Access content made available, but also to the
numerous visibility and discoverability improvements which have been rolled out
to Strathprints over the past 2-3 years. We can also add to this total those
COUNTER downloads made via the CORE aggregation service, equivalent to 26,122.
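For the curious, the totals implied by these growth figures can be cross-checked with a few lines of arithmetic. This is just a sketch: the 2018 and 2017 totals below are back-calculated from the stated percentages rather than taken from the actual COUNTER reports.

```python
# Back-calculate the implied download totals from the stated growth rates:
# 2019 was 521,697 downloads, a 29% increase on 2018,
# which was in turn a 43% increase on 2017.
downloads_2019 = 521_697
core_2019 = 26_122  # additional COUNTER downloads via the CORE aggregator

implied_2018 = downloads_2019 / 1.29
implied_2017 = implied_2018 / 1.43

print(f"Implied 2018 total: {implied_2018:,.0f}")
print(f"Implied 2017 total: {implied_2017:,.0f}")
print(f"2019 including CORE: {downloads_2019 + core_2019:,}")
```
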
Most used in 2019 & deposited in 2019
As we have discussed previously on this blog, looking at the
most used items in Strathprints at any given point isn’t particularly
insightful because some deposits have been available for many years and have
established an ongoing impact. It is therefore better to consider deposits
which have been made more recently. To this end, below are the top 20 most used
deposits in 2019 which were also deposited during 2019:
The curious thing to note about the above top 20 is the
significant presence of scholarly grey literature, accounting for 8 of the
entries above. This is a phenomenon we have discussed previously on this blog and it appears to be a growing trend, with Strathclyde’s unique high-value grey content increasing in global reach and impact. Moreover, the Digital Health & Care Institute (DHI) has 4 of
these outputs and occupies the #1 spot. Congratulations to DHI!
A special mention
too for Matt Hannon and colleagues for grabbing the #2 spot, no mean feat
considering their report only became available for download on 27 August 2019. It is also worth noting that the authors of the grey scholarly deposits featuring in our top 20 this year also had similar deposits feature last year, DHI and Matt Hannon included, but also Daniel Broby and his continued work in FinTech.
Wellcome Workshop for funded researchers in the SSH
Pablo de Castro, Open Access Advocacy Librarian
On Fri July 5th the
Wellcome Trust held a full-day workshop in London for their funded authors in
Social Sciences and Humanities. Approximately 30 funded researchers attended
the event, with a fairly homogeneous distribution across UK geographies and academic
seniority. The only Open Access advocate in the room – attending the event on
behalf of our institutional Wellcome-funded authors at Strathclyde – was yours
truly.
The workshop was a good
opportunity to gain first-hand insight into researchers’ views on topics like
Plan S and the Wellcome Open Research platform. Also, being an SSH-specific
event, it provided the chance to explore to what extent the STM-centric
compliance workflows can be tweaked to address the specific circumstances and
needs of this community.
The agenda for the day included a
few presentations (Robert Kiley and Diego Baptista, Wellcome, Steve Sturdy, Uni
Edinburgh and Helen Saunders, Open Library of Humanities) and two lively
breakout sessions to discuss (i) the main issues around Open Access and SSH and
(ii) what the Wellcome could do to specifically support SSH authors and
disciplines within the ongoing shift in the publishing landscape.
The discussions held at the event mostly
focused on the Wellcome strategy to influence/press publishers in order for the
publishing landscape to evolve towards more openness. A particularly hot topic
was the perceived limitation¹ in the choice of publishing venues arising from Plan S
principles. This is part of a wider concern that the SSH disciplines may
risk being dragged along a path that has been designed with mainly the STM
publishing landscape in mind.
Robert Kiley’s presentation
provided the rationale for the updated
Wellcome Trust Open Access policy (to kick off as of Jan 1st,
2021) and dispelled some of the most pressing concerns raised by SSH authors in
the round of consultations held by cOAlition S. The main goals of
this policy are:
to have all articles available Open Access upon
publication (RK showed the current OA figures, which prove there is still a
long way to go to achieve this objective), and
for all articles to be re-usable (meaning
machine-readable for TDM purposes, with the “Mining the History of Medicine”
project by the National Centre for Text Mining NaCTeM mentioned as an example).
Some concerns around Plan S
mentioned in the presentation (together with the ways to mitigate them) were:
choice of compliant publication venues: this needn’t be the case if
publishers are capable of evolving their business models. Green Open Access
(meaning deposit of accepted manuscripts in institutional platforms) is also an
option for compliance, so the discussion doesn’t just need to focus on the
suitability or otherwise of Article Processing Charges
international collaborations (see item 1 on the U
Edinburgh feedback to cOAlition S): the Gates Foundation is also a Plan S
signatory and they haven’t seen any impact in terms of hampering international
collaborations because of potentially limited eligibility of publishing venues
with a strict CC-BY licence (also described in the Edinburgh response to
Plan S): the revision of the Plan S implementation guidance
has softened the requirement for the SSH; “As an interim measure to address the
concerns, particularly expressed by HSS communities, we recommend that funders
should be willing to consider an exemption from the requirement for a CC BY
license to allow the use of CC BY-ND on a case-by-case basis”
timeline for the transition: the first Open Access policy by the Wellcome
Trust was issued in 2006. The transition has been going on for quite some time
already, but the pace is too slow
Learned societies unable to find alternative business models for their
subscription-based journals: the Wellcome Trust is working together with a
number of learned societies in order to promote collaboration and explore
various options in this regard – reporting on this strand is expected later
this year. The presentation also highlighted a recent piece published by
Jasmine Lange from SSH publisher
Brill in the Netherlands (“Plan S and Humanities Publishing”, Jul 2nd, 2019) where
she states that an SSH exception to Plan S and/or the continuation of 24-month
embargo periods would mean a high risk of the SSH being left behind on impact
and visibility.
Wellcome Open Research platform
This was followed by a presentation of the Wellcome Open Research (WOR)
publishing platform by Diego Baptista. The success of this publishing channel (4th
venue altogether by number of Wellcome-funded publications after Scientific
Reports, Nature Comms and PLoS ONE) and its quick review process and
affordability (average APC one third of the ‘externally paid’ one) are
described in this Jan’2019 post. It also covers a wide range of outputs beyond
articles (such as negative results, which happens to be the area for one of the
two pieces published at WOR by Strathclyde authors thus far as shown on the figure below) and offering a safe publishing channel to Wellcome-funded
authors in developing countries, thus tackling predatory publishing.
A number of SSH researchers in the room raised issues with a publishing platform
like the WOR, mainly focused on the lack of editors. The risk was highlighted
for ‘toxic’ submissions akin to the Wakefield anti-vaccination
paper in The Lancet to get openly posted on the platform while awaiting
peer-review – even if the submission were eventually rejected, it could still
collect a few citations while sitting on the platform.
SSH disciplines are different
Steve Sturdy (who has published an Open Letter on
Biomedicine, self and society in the WOR himself) provided a wider context
for the reluctance of SSH scholars towards publishing platforms by describing
the specific role that well-established publishing venues play in the SSH
disciplines in his presentation on ‘Open Access and HSS Disciplines’. He
emphasised the ‘sociality’ of SSH journals, which act as knowledge producers
and community drivers in a distinct way (see the associated figure). A strategy for
replacing publishing venues with a social mission in a specific discipline with
either publishing platforms or new titles would put this mission at risk.
Four main issues were in fact raised
– both in Steve’s presentation and in the subsequent breakout session to
discuss OA in the HSS – with regard to the perceived threat of a restriction
in publishing channels generated by the implementation of Plan S in the SSH:
Destruction of the current publishing landscape
with ‘sociality’ at its core
Particularly bad consequences for ECRs: “I
am not sure I would feel inclined to recommend my ECR to apply for a Wellcome
grant should these restrictions be in force that will prevent her to progress
in her career” – stated one researcher in the room
Implications for unfunded authors in a landscape
where funding is scarce
Too tight an implementation schedule: it will be
very difficult for current publishing channels to come up with alternative
business models in such a short time as the policy proposes
There was a further call for Wellcome to support the area of SSH as a whole and not just Wellcome-funded authors, for instance by ensuring
the voice of SSH researchers is heard in the discussions around alternative
business models. A widespread belief was expressed that a model based only on
Article Processing Charges (APCs) will not succeed in the SSH and that there
need to be multiple co-existing mechanisms for compliance, including Green Open
Access and crowdfunded business models with institutions as supporters (the
afternoon presentation on the Open
Library of Humanities looked deeper into these crowdfunded models for both
journals and books).
The breakout discussions touched
on various specific topics, such as the REF requirements and to what extent
they may help evolve the landscape (including its eventual expansion to books),
the specific relevance of languages other than English in the SSH or the role
of institutions in providing information on and support for the implementation
of Plan S (most researchers in the room admitted not being regularly in contact
with their institutional Open Access support services at their libraries, and
the Wellcome lead may consider a follow-up workshop to promote the engagement
of institutions in the process).
The listening exercise was very
valuable and key for building the ‘bond of trust’ that needs to exist in order
for authors to take into consideration the funder’s recommendations around the
evolution of the Open Access landscape. A potentially very useful outcome could
be the establishment of a regular information-exchange mechanism allowing
researchers to stay updated on the way the landscape is evolving in their
specific field (covering aspects such as evolving positions with regard to Plan
S by different funders, steps taken by specific publishers in transitioning
their business models, reporting on the discussions with learned societies
etc). Communication is key as ever and institutions may have an important role
to play here too. We will definitely try our best at Strathclyde, but it could
make sense to make this a wider effort.
1. See this statement “some publishers are claiming that authors will no longer be able to publish in their journals - but this would not be the case if they were to change some of their policies” on the webpage
that the University of Edinburgh Information Services have put together about Plan S.↩
Archivematica Camp, LSE, London 2019: A Visit Report
Alan Morrison, Research Data Support Officer, University of Strathclyde
Artefactual, the Canadian-based company which produces the free, open-source digital
preservation software Archivematica,
deliver regular training “camps” “intended to provide a space for anyone
interested in or currently using Archivematica to come together, learn about
the platform from other users, and share their experiences”. Having missed the
previous UK event in York
in 2017 I was fortunate to attend this year, held in the library at the London
School of Economics, 10-12 July.
The majority of the forty or so attendees were UK based,
representing a mix of digital projects from university libraries, archives or
heritage institutions, with some representatives from the Netherlands and
Nordic countries. While the level of experience of attendees ranged from the
novice to intermediate, the three-day schedule was well planned and paced, allowing
beginners to get a solid grounding in the basics of the technical architecture
and core functionality of the software. The RDMS service at Strathclyde University has
been using Archivematica to curate research data for about two years now but
mainly “out of the box” with minimal customisation. As an intermediate user I
found this first day a welcome revision, with the explanation of dashboard
functionality, in combination with the hands-on exercises, providing comforting
reassurance that the RDM workflows we are employing at Strathclyde are
appropriate and effective.
The following two days built on these foundations, exploring
more advanced exercises on specialised workflows and non-core functionality. Tempting
alternative streams on “Alternative DIP Use Cases” and “File Format Deep Dives”
were also on offer, but I decided to stay with the main stream as this focused
on consolidating and developing the skills required in “day-to-day operations”.
Of particular interest was a closer look at the components which form one of
the main outputs of Archivematica, the Archival Information Package or AIP. The
examination of the different METS files Archivematica creates, and of their
respective roles, finally made everything click! Fortunately, at Strathclyde
our workflow includes adding the descriptive metadata available from the Pure
dataset record to Archivematica as part of the ingest process, and this is
captured in the METS output. Throughout this forensic
exercise the possibilities of harvesting data in the METS files to reveal
characteristics of the data and file formats being deposited by individual
departments at Strathclyde started forming in my mind. Such clear explanations of
these elements not only helped me understand issues which had previously
eluded me but also suggested ways the RDMS service could use the metadata
outputs from Archivematica to develop and inform our own services.
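To illustrate the idea, harvesting file-format information from METS output could start as simply as the sketch below. The sample document is illustrative only: real Archivematica METS files are far richer, wrap format details in PREMIS, and vary by version.

```python
# Sketch: pulling file-format information out of a METS file, as a basis
# for profiling the data deposited by different departments.
# The sample document below is illustrative only -- real Archivematica
# METS output is much richer and version-dependent.
import xml.etree.ElementTree as ET
from collections import Counter

SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/">
  <fileSec>
    <fileGrp USE="original">
      <file ID="f1" MIMETYPE="text/csv"/>
      <file ID="f2" MIMETYPE="text/csv"/>
      <file ID="f3" MIMETYPE="application/pdf"/>
    </fileGrp>
  </fileSec>
</mets>"""

NS = {"m": "http://www.loc.gov/METS/"}

def format_profile(mets_xml: str) -> Counter:
    """Count the MIME types of the 'original' files in a METS document."""
    root = ET.fromstring(mets_xml)
    counts = Counter()
    for grp in root.findall(".//m:fileGrp[@USE='original']", NS):
        for f in grp.findall("m:file", NS):
            counts[f.get("MIMETYPE", "unknown")] += 1
    return counts

print(format_profile(SAMPLE_METS))  # e.g. Counter({'text/csv': 2, 'application/pdf': 1})
```

Running the same profile across a whole store of AIP METS files would give exactly the per-department format breakdown imagined above.
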
Away from the hands-on exercises there were opportunities to
hear from the community on their experiences and implementation of Archivematica.
Camp Counsellor Rachel MacGregor from Warwick University provided reassuring
advice from her own experiences for any novice users and institutions (“Have
a go, you probably won’t break it.”) while our LSE hosts Fabiana Barticioti
& Nick Bywell gave a detailed overview of the algorithms employed to organise
their ambitious digital preservation projects. A final presentation
on day three from Hannah Mackay from the International Institute of Social
History (IISH) in Amsterdam on finding and deleting digital duplicates, again using
algorithms, went over my head at times, but what are these events for if
not to be challenged? Generous food provisions and a free bar reception in a nearby
pub in the evening certainly contributed to a healthy exchange of ideas and a
sense of community bonding!
It’s not often that you attend an event where the founders and
directors of the company producing the software being taught are present, but this
is perhaps what sets Artefactual and Archivematica apart from other corporate
training events. Justin Simpson, the newly appointed Managing Director of
Artefactual, and Kelly Stewart, Director of Archival and Digital Preservation
Services, wrapped up the Camp with a Q&A session and a look at how the
company has grown since its foundation in 2000 – currently 28 employees in 5
countries/time zones – including its first UK System Archivist Sarah Mason. With
an expanding clientele (most notably the Wellcome Trust) the Roadmap presented promises
a busy but very rewarding future for the company, its software and,
perhaps most importantly, its users.
Personally, this turned out to be a very worthwhile and profitable
trip consolidating my understanding of the software while strengthening
existing connections within the Archivematica community as well as forming new
ones. If I had one recommendation it would be to encourage more people working
with research data to get involved in using Archivematica and join this
welcoming and productive community of developers and users. If you can’t make
the next Camp consider attending the first community driven Archivematica
Con, in Brooklyn NY, April 2020.
Running a no-hybrid Open Access funding policy: some results
Pablo de Castro, Open Access Advocacy Librarian
Five months ago we reported that a no-hybrid Open Access funding policy had been introduced at Strathclyde in mid-November last year, after the block grant that Research Councils UK (currently UK Research and Innovation) had allocated to the university for 2018/19 ran out. In that post we promised “regular updates on the progress around the updated APC funding eligibility policy, ideally including a list of the funded journals since the change in the policy”. This is the first of such follow-up posts.
One caveat to keep in mind is that we’re only applying a no-hybrid policy to the UKRI block grant and not to the one allocated by the Charities Open Access Fund (COAF) since this one specifically supports hybrid Open Access for the time being (though not for much longer according to the updated Wellcome Trust Open Access policy).
No-hybrid policy results
1. Lower number of funded APCs
An automatic result of applying a no-hybrid policy is a decrease in the number of funded APCs per month. This is in fact the main reason for implementing such a policy: a lower number of funded APCs means a lower aggregate expenditure, which will make the 2019/20 UKRI block grant last for the whole period it’s intended to cover, i.e. until Mar 31st, 2020.
The extrapolation of the annual figure for 2019 shows we’re back to an annual average of around 100 APCs paid by the library. This is roughly where we were in 2016, but a significant underspend forced us to step up the dissemination of the available Open Access funding in 2017 and 2018, something we did so effectively that it eventually led to an overspend.
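The extrapolation itself is straightforward proportional scaling. As a sketch (the monthly figures below are hypothetical; the post only reports the resulting annual average):

```python
# Sketch of the annual extrapolation: scale the APCs funded so far in the
# year up to a 12-month figure. The example numbers are hypothetical --
# only the resulting annual average (~100 APCs) is stated in the post.
def extrapolate_annual(apcs_to_date: int, months_elapsed: int) -> float:
    """Project a full-year APC count from a partial-year total."""
    return apcs_to_date * 12 / months_elapsed

# e.g. 33 APCs funded in the first four months of the year:
print(extrapolate_annual(33, 4))  # 99.0 -- roughly the ~100/year level cited
```
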
2. Much more balanced distribution across publishers
The most relevant impact of the introduction of a no-hybrid policy is by far the immediate shift in the distribution of APC funding by publisher, see the figure below.
Not only has the number of APCs paid to large ‘hybrid’ publishers very significantly decreased with regard to the past two years, but the number of funded APCs with fully Open Access publishers like MDPI, Frontiers, PLoS or Copernicus has increased in parallel.
It’s not that fully Open Access titles are terribly popular among Strathclyde researchers beyond the very well-established Scientific Reports or Nature Communications, but we’re happy to see our first Copernicus entry ever (not that frequent outside German-speaking countries) and to see MDPI topping the table for 2019.
A parallel table to this one charts the number of rejected Open Access funding applications for manuscripts accepted in hybrid titles and their distribution by publisher. The results are not surprising, and researchers are generally well-inclined to follow the Green OA route for these papers. We have nevertheless introduced a 1-paper-per-author-per-year exception under which they may still be able to exceptionally apply for hybrid Open Access funding for excellent papers of theirs (something for them to judge, not for us at the library, it goes without saying).
3. A parallel increase in the use of the Springer Compact
This is not something we were expecting, and in fact we still consider it a potential coincidence until we have more data, but it is a fact that since we introduced the no-hybrid policy, the number of publications funded under the Springer Compact has stabilised and is gradually increasing. We stopped trying to predict this monthly trend a while ago, as it showed a seemingly erratic behaviour; since Dec’18, however, it has been unusually steady.
Whether or not this is related to the introduction of the no-hybrid policy is hard to tell; the samples are too small to be statistically significant. If it were so, though, this would hint at a very preliminary Plan S-aligned transition whereby APCs are transferred to Read & Publish deals. These are still hybrid titles, and there’s probably not that much of a correlation after all, but it’s something we’re keeping an eye on anyway. This is definitely the way it should happen, and we now have an ACS R&P deal to test too.
4. List of fully OA titles funded since Nov'18
Not all funded manuscripts have been published in fully OA journals, partially because of the above-mentioned exception and also because authors had occasionally checked their funding eligibility before the no-hybrid policy had been introduced and had received confirmations upon manuscript submission. The list below includes the fully OA titles we have funded in the past few months since the policy was introduced.
ACS Omega (American Chemical Society)
Aerosol and Air Quality Research (AAQR)
APL Photonics (AIP)
Applied Network Science (Springer)
Applied Sciences (MDPI)
Biology Open (Company of Biologists)
Biomedical Optics Express (OSA)
BMJ Open (BMJ)
Bone and Joint Research (British Editorial Society of Bone and Joint Surgery)
Bone Reports (Elsevier)
Cell Death & Disease (NPG)
Communications Physics (NPG)
Computational and Mathematical Methods in Medicine (Hindawi)
Design Science (Cambridge)
Frontiers in Aging Neuroscience (Frontiers)
Frontiers in Bioengineering and Biotechnology (Frontiers)
Frontiers in Cellular and Infection Microbiology (Frontiers)
Frontiers in Immunology (Frontiers)
Frontiers in Neural Circuits (Frontiers)
Frontiers in Neuroscience (Frontiers)
High Power Laser Science and Engineering (Cambridge)
IEEE Access (IEEE)
International Journal of Molecular Sciences (MDPI)
International Journal for Parasitology: Drugs and Drug Resistance (Elsevier)
International Journal of Naval Architecture and Ocean Engineering (Elsevier)
Journal of Biological Chemistry (American Society for Biochemistry and Molecular Biology)
Journal of Marine Science and Engineering (MDPI)
Letters in Biomathematics (Taylor & Francis)
Marine Drugs (MDPI)
Mathematical Biosciences and Engineering (AIMS Press)
mBio (American Society for Microbiology)
Nature Communications (Springer Nature)
New Journal of Physics (IOP/DPG)
Nucleic Acids Research (Oxford)
Ocean Science (Copernicus)
Optics Express (OSA)
Optical Materials Express (OSA)
Photonics Research (OSA)
PLoS One (PLoS)
Royal Society Open Science (The Royal Society)
Science Advances (AAAS)
Scientific Data (NPG)
Scientific Reports (Springer Nature)
Solid Earth (Copernicus)
Systems Science & Control Engineering (Taylor & Francis)
The 14th International Conference on Open
Repositories (OR2019) was held in Hamburg. The largest ever Open Repositories
conference, the event attracted circa 600 attendees and accommodated a record
number of parallel sessions and workshops. Thanks have to go to the organisers
for coordinating such a large and successful event.
This blog post has two parts. The first part, which I am
posting now, briefly highlights and summarises some of the contributions
delivered at the conference by members of our Scholarly Publications & Research Data (SPRD) team; the second
part – which should be posted in coming days – will provide a summary of some
of my conference highlights.
SPRD contributions at OR2019
Both Pablo de Castro and I were fortunate to be able to attend OR2019
this year. Very often the conference is held in a faraway land, making
attendance difficult and the costs prohibitive. The location of Universität Hamburg this
year, however, made OR2019 more accessible to us such that attendance and
presentation at the conference was possible.
A prominent Strathclyde presence was visible at OR2019
through workshop and main session participation.
Pablo co-chaired and co-organised a well-attended workshop
on repository / CRIS technical interoperability and integration, and delivered
the following presentations at the workshop itself:
At the same workshop I made a ‘guest appearance’, presenting on CRIS-repository
interoperability issues at Strathclyde, exploring the issues around the
Strathprints connection with Pure and some syntactic and semantic challenges.
This paper, delivered in collaboration with Rebecca Bryant
and Michele Mennielli and echoing the above-noted workshop, disseminated the
findings from a global survey on research information management (RIM) and CRIS
systems, coordinated by OCLC and euroCRIS. A particular focus of this work was an
exploration of the trends unveiled by survey data and examples of productive
repository-CRIS interoperability or integration.
discovery of open repositories
Another paper was delivered in the P8A: How to be
discovered? technical track of the conference.
This paper was an analysis of a unique longitudinal dataset on
repository web impact and usage data, following the implementation of numerous
technical enhancements. Using Strathprints as a case study, the results provide
persuasive evidence that specific enhancements to the technical configuration
of a repository can generate substantial improvements in its content discovery
potential and ergo its content usage, especially over several years. In this
case study COUNTER usage was found to grow by 62%, with increases in Google
‘impressions’ (266%) and ‘clicks’ (104%) a notable finding. High levels of
statistical significance were found in the correlation between clicks and usage
(t = 14.30; df = 11; p < 0.0005). Web traffic to Strathprints from Google and
Google Scholar was also found to increase significantly, with substantial
growth on several metrics.
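As an aside, the reported test statistic implies a very strong underlying correlation. For a Pearson correlation test, t = r·√df / √(1 − r²), which rearranges to r = t / √(t² + df); a quick back-calculation from the figures quoted above:

```python
import math

# Back out the implied Pearson correlation from the reported test statistic:
# for a correlation test, t = r * sqrt(df) / sqrt(1 - r**2),
# which rearranges to r = t / sqrt(t**2 + df).
t, df = 14.30, 11
r = t / math.sqrt(t**2 + df)
print(f"implied r = {r:.3f}")  # ~0.97, i.e. a very strong clicks-usage correlation
```
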
Come back soon!
So, thanks for wading through this summary of what the SPRD team got up to at OR2019 and check back soon for a summary of my personal highlights from what was a very inspiring conference!
How to best link research facilities and equipment to research publications and datasets
Pablo de Castro, Open Access Advocacy Librarian
“It is clear that, in future, funding bodies are going to be keen to
see greater utilisation of equipment and facilities. Blair feels that some
equipment can have as low as 5% utilisation at present. The ULAB
system tracks bookings and usage
allowing funders and institutions to monitor more effectively equipment use.
This monitoring actually encourages increased utilisation”
(Caroline Ingram, Jisc: “Sharing Research
Study: Dr Blair Johnston, Strathclyde University”)
A key development in recent years in
the area of research information management has been the possibility to link
research publications to funded projects at a metadata description level. This
has mainly happened through the advent of research information management
systems with an extended data model that allows the so-called contextual
information to be taken in when describing a publication. This contextual
information is incidentally not limited to funded projects: it also covers
research facilities and other kinds of outputs such as research datasets or
patents. Nor is the data model able to take in this additional info restricted
to research information management systems or CRISs: publication repositories
can do the same by extending their data model, and platforms like EPrints or
DSpace-CRIS regularly offer this functionality.
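As a sketch of what such an extended data model amounts to, a publication record simply carries identifiers for its contextual entities. The field names and identifiers below are hypothetical and follow no specific CRIS, EPrints or DSpace-CRIS schema:

```python
# Hypothetical sketch of a publication record with contextual links.
# Field names and identifiers are illustrative only; they do not follow
# any specific CRIS, EPrints or DSpace-CRIS schema.
publication = {
    "title": "Example article",
    "doi": "10.xxxx/example",  # placeholder identifier
    "projects": [
        {"funder": "EPSRC", "grant_number": "EP/X000000/1"},  # made-up grant ref
    ],
    "datasets": ["10.xxxx/example-dataset"],
    "equipment": [
        {"name": "NMR spectrometer", "facility_id": "equip-0001"},  # illustrative
    ],
}

def contextual_links(record: dict) -> int:
    """Count how many contextual entities a record is linked to."""
    return sum(len(record.get(k, [])) for k in ("projects", "datasets", "equipment"))

print(contextual_links(publication))  # 3
```
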
An issue typically comes up though when trying
to code in the funding information for a publication in an institutional system
of one kind or another: the funded projects
available for linkage in the institutional system are just those led by, or
participated in by, the institution. This inevitably limits the scope of the
linking to the strict institutional snapshot, and the whole attempt to link
publications to projects tends to fail, for instance, when a researcher has
just moved in from a different institution and the funding sources for their
just-published outputs have been left behind. Sometimes project information
gets transferred across institutions into the new system, but this is not
always the case – see an example below.
The same issue arises when trying
to code in research equipment associated with a given publication: usually this
will only be possible if the specific research equipment is in the
institutional database, which is a problem when trying to describe a landscape
where external research facilities are frequently used.
The reason this happens is
mainly down to the internal information-exchange workflows at institutions.
There is usually a research project management unit that covers the project
lifecycle from cradle to grave, including the submission of a project proposal
and its financial management once approved by the research funder. It’s the
information for these ‘institutional projects’ that typically gets transferred
across modules within the institutional research information management system,
while the ‘external projects’, in whose lifecycle this internal project
management unit is not involved at all, fail to make it into the project
database used by other institutional units, the research library among them.
Critically, this institutional unit for research
project management has no specific expertise in the metadata
required for a proper project description – that is rather the research library’s
territory – so it’s important to have a communication channel that allows
the library to feed back any need for missing project metadata
elements to the project management unit.
The same siloed workflow operates
for research equipment. The library team are no experts in research facilities
and are not involved in any workflows around the acquisition or use of
research equipment by institutional researchers. This again tends to fall to
the institutional Research Office, who are usually in charge of providing the
information to wider-scoped initiatives like the EPSRC-funded equipment.data initiative run by
Jisc. However, it’s still the research libraries that are expected to link
publications to research equipment when delivering the metadata into
institutional research information management systems. It is then a bit of a surprise
that most research libraries within the Open Access Scotland working group have
no knowledge of the equipment.data initiative, even though they provide
support in the area of research data management and case studies from their
institutions may have been published on the initiative’s blog.
Wouldn’t it be much more
reasonable for the library to at least be part of the workflows for externally
sharing this information? And wouldn’t it be just great if the research
equipment could be included as contextual information in the description of
specific research datasets and publications?
In order to do this we would first
need to merge the current institutional silos into a single pool of contextual
information for projects and for research facilities and equipment that
institutional research information management systems are able to link to. This
‘uber funded-project database’ should ideally be offered at a national level,
but could start with a smaller step, for instance by pooling the project
information for a single research funder. Databases like the UKRI Gateway to Research arguably already
offer this information in the correct format (the Common European Research
Information Format, or CERIF), but their interoperability with institutional
CRISs remains underexploited at this time. Most research funders in the UK
other than UKRI have, however, no equivalent project database, and this
prevents the kind of monitoring of their funded projects
that’s already been offered – alas only partially – for the RCUK. No
reporting whatsoever is yet offered for research facilities and equipment
(other than the big infrastructures funded by the EPSRC) in terms of the
publications (or research data) arising from their use, but this would probably
be of interest to both institutions and research funders. A strategy to make
cross-funder, cross-institutional project and equipment data
available to research libraries as single, aggregated, interoperable databases
would be a very useful first step.
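As a minimal sketch of what such pooling might look like in practice, the snippet below normalises project records from several sources into a single store keyed by funder and grant reference. All field names and record shapes here are illustrative assumptions, not any real funder’s or CRIS vendor’s schema:

```python
# Minimal sketch of pooling project records from several sources into
# one store keyed by (funder, grant reference). Field names are
# illustrative rather than any real funder's or CRIS vendor's schema.

def pool_projects(*sources):
    pooled = {}
    for source_name, records in sources:
        for rec in records:
            key = (rec["funder"], rec["grant_ref"])
            # First source to mention a grant contributes its title;
            # every source is recorded for provenance.
            entry = pooled.setdefault(key, {"title": rec["title"],
                                            "sources": set()})
            entry["sources"].add(source_name)
    return pooled

gtr = ("gtr", [{"funder": "EPSRC", "grant_ref": "EP/X000001/1",
                "title": "Example project"}])
cris = ("institutional-cris", [{"funder": "EPSRC",
                                "grant_ref": "EP/X000001/1",
                                "title": "Example project"}])

pool = pool_projects(gtr, cris)
# The same grant now appears once, with both sources recorded against it.
```

Deduplicating on a funder-assigned grant reference, rather than on any internal project ID, is what lets the pool span institutions in the first place.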
In recent months I
have had the pleasure of spending a great deal of time in Singapore.
The reasons for why I have frequently been 14,000 km away from my
Glasgow office are complicated and probably quite boring. Suffice to
say, it has been necessary for me to spend periods of time away from
the office and instead I have been working remotely in Singapore for
a couple of weeks at a time. This arrangement has worked very well
indeed; but it has also afforded me the opportunity to explore the
open science – or ‘open scholarship’ – activity at
some of Singapore’s (indeed the world’s) leading institutions and
to formalise collaborative links with the many teams active in this
Southeast Asian island city-state.
The purpose of this blog post is
therefore to provide a brief flavour of what is happening in
Singapore in relation to open scholarship, and very specifically Open
Access and Open Data (and research data management).
An interesting aspect of open scholarship within the Singapore scenario to
note is that, perhaps owing to its compact size, Singaporean
institutions are not subject to the kind of government or funder
policies surrounding Open Access (OA) or Open Data (OD) which we are
all used to in Europe. In my discussions with teams across NUS, SMU
and NTU, this absence seemed to be considered an impediment to achieving the
correspondingly high staff engagement levels (and ergo manuscript and
dataset deposits) we enjoy at Strathclyde, for example. However,
rather than dictating institutional policies on open
science, the approach has instead been to allow Singaporean
institutions to acknowledge the importance of open science and
develop their own approaches to delivering it which, I must admit, made
me quite envious. But, I suppose, whatever the approach it is always
swings and roundabouts.
I had the pleasure
of visiting teams at National University of Singapore (NUS), the
Singapore Management University (SMU) and Nanyang Technological
University (NTU) – and it was my pleasure because every institution
had an excellent campus. All of my site visits occurred towards the
end of 2018 so I must apologise for being a little overdue in writing
my blog! All the teams I visited could in general terms be described as
scholarly communications and research data management teams, managing
OA, RDM, repositories, publishing platforms and so forth; although,
as we shall see in the case of some, their remits could at times extend towards digital scholarship training and bibliometrics too.
National University of Singapore (NUS)
NUS is a
comprehensive research intensive university. It specialises in a wide
range of disciplines, including the sciences, medicine and dentistry,
design and environment, law, arts and social sciences, engineering,
business and computing. The impressive set-up of institutional support
for open scholarship at NUS corresponds to its research-intensive
nature. One thing that struck me about all the institutions I
visited in Singapore was the commitment to digital scholarship;
ensuring that academic staff were not disenfranchised from the open
scholarship revolution. This was certainly true at NUS Library, which
supports its own ‘Digital Scholarship Lab’, used for training
academics, ECRs and PhD students on topics as diverse as text and data
mining (TDM) and digital historiography (see image below). In a way, such research user
support is cognate to – and an extension of – the information
literacy training undergraduate students traditionally receive at most
universities. However, this form of digital scholarship training for
academic staff, on techniques around TDM for example,
requires a high level of technical efficacy among teams if it is to be delivered effectively.
The team at NUS, led
by Gerrie Kow, makes use of a very cool instance of DSpace,
named ScholarBank@NUS, for both publications and data deposit. OA
is certainly more ‘green’ flavoured in Singapore. Without
umbrella policies from government or funders, there are few grants
available to support Gold publication of any considerable size –
and, if we think about it, the UK has been anomalous in recent years
in its considerable support for Gold so, to that extent, Singapore
may not really be that different. Instead it is the UK which is the outlier.
NUS also use
Symplectic Elements as their Current Research Information System (CRIS). Interestingly, though, there is no
‘connection’ between Elements and ScholarBank@NUS whereby
designated research content is pushed automatically from the CRIS
to the repository. Instead Chee Yong
Ng (repository manager) has a neat
trick for harvesting specified metadata and file dumps from Elements
and batch uploading these via a .csv file into ScholarBank@NUS. This ensures a useful demarcation exists between CRIS and repository,
something I have discussed in the local Strathclyde
context on this blog in the past, allowing optimum function of both
the CRIS and the repository without either impinging on the other’s
raison d’être.
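To make the general idea concrete (this is a sketch, not NUS’s actual workflow), harvested CRIS metadata can be flattened into a CSV suitable for a DSpace-style batch metadata import. The input record shape below is invented; the dc.* column headers and the ‘||’ separator for repeated values follow DSpace’s batch metadata editing conventions, which should be checked against the documentation for the target version:

```python
import csv
import io

# Turn harvested CRIS records into a CSV for a DSpace-style batch
# metadata import. Input record shape is invented for illustration;
# dc.* headers and the '||' repeated-value separator follow DSpace's
# batch metadata editing conventions.

FIELDS = ["dc.title", "dc.contributor.author", "dc.date.issued"]

def records_to_csv(records):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        writer.writerow({
            "dc.title": rec["title"],
            # Repeated values (multiple authors) are joined with '||'.
            "dc.contributor.author": "||".join(rec["authors"]),
            "dc.date.issued": rec["year"],
        })
    return buf.getvalue()

csv_text = records_to_csv([
    {"title": "An example output",
     "authors": ["Ng, C. Y.", "Kow, G."],
     "year": "2018"},
])
```

The appeal of this approach is precisely the demarcation described above: the repository never needs a live connector to the CRIS, only a well-formed file at import time.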
My visit to SMU was
much anticipated because it afforded me the opportunity of meeting
and sharing conversations with Aaron Tay, someone with whom I have
corresponded electronically for many years but who I had never met in
person. Readers of this blog may know Aaron as the author of the
‘Musing about librarianship’ blog but, when he isn’t writing
insightful blog posts he is an Analytics Manager at SMU Library,
working alongside Pin Pin Yeo
(Head of Scholarly Communications), Dong Danping (Scholarly Communications) and Yuyun Wirawati Ishak (Head of Information Services). Aaron typifies the data-centric operations
which are de rigueur at SMU; where data intelligence is frequently
used to determine operational and strategic priorities.
It is a little
off-topic for this blog post but, to illustrate this point, one of
the projects Aaron was working on when I visited was ‘location
analytics’. Collaborating with an SMU spin-off company, the Library
had helped to create and test software, known locally as LiveLabs,
capable of providing real-time data on the location and movement of
people within large spaces (e.g. an academic library), all via the
clever use of data from visitors’ mobile and wearable devices. This
has clear applications within large academic libraries: gathering
intelligence on where, for example, students are congregating, which
IT labs they gravitate to, which social spaces they prefer, how they
move through the building, and so forth – and, on the basis of this
intelligence, making operational decisions about the layout, IT
offerings, the services provided, etc. See the photo of the UI above,
in which reports can be generated and ‘heat maps’ pored over. But I digress.
The SMU campus is
situated in the ‘downtown’ area, a short distance from Raffles
and the Peranakan Museum, National Gallery Singapore and the Asian
Civilisations Museum, all of which are sublime, incidentally, and definitely worth visiting if you are ever in Singapore. As a
relatively young institution, SMU is blessed with a beautiful and
modern city campus. Exciting university buildings are connected by
expertly landscaped gardens, social areas, modern art - and a vast
array of vending machines too!
Aaron introduced me to the team at SMU
who, together, support a wide array of
activities around OA and RDM. DigitalCommons is used for
institutional repository functions but also stores the SMU
‘heritage collection’, such as its oral history collection and
digitised image collection. SMU are at the earlier stages of
exploring RDM and open data, and have recently begun experimenting in this area.
Nanyang Technological University (NTU)
My visit to NTU was
a departure from the city campus of SMU. Instead the campus grounds
for NTU are located in the western part of Singapore, along 50
Nanyang Avenue, where there is plenty of green space and tropical
vegetation. It is the ideal university campus in my view, with plenty
of room for one’s mind to expand. They even have a McDonald’s on campus!
When it comes to
open scholarship, NTU enjoys the largest complement of staff of the
institutions I visited. Again, NTU is very much a research intensive
institution, frequently at the forefront of many a scientific
discovery. All the staff are situated within ‘Knowledge, Learning &
Research’ (we’ll call them ‘KLR’ for short henceforth), based
in the library, wherein there are sub-teams dedicated to Digital
Scholarship (5 staff), Research Data Management (5 staff), Scholarly
Publishing & Impact (6 staff) and Education & Learning (7
staff). This is a team that means business!
Again, it is
interesting to note that the KLR team, like the team at NUS, not only
demonstrate a specialism in digital scholarship, but have team
members entirely dedicated to the topic. From data visualisation to
geospatial data analysis to TDM to optimising open research. It’s
all there. Workshops, seminars, ‘digital scholarship Tuesdays’,
training materials, online learning modules, as well as direct
liaison with researchers are the order of the day for this sub-team.
Scholarly Publishing & Impact not only encompasses OA and
repository management, but also the ‘impact’ in its name: focusing
on bibliometrics and other analytics, and producing report profiles for
academic staff and management.
The institutional repository, DR-NTU, is the hub of NTU’s open research content but
also grey literature, such as digital theses and dissertations, NTU
publications and – something which was a topic for discussion at
the last UKCORR member’s day event at the British Library –
undergraduate papers and posters. NTU also maintain a restricted-access repository, likewise based on DSpace.
But perhaps some of
the most interesting work within KLR surrounds RDM, where
Goh Su Nee
(Research Data Management at NTU)
has worked with the wider institution to implement robust policies surrounding research data. A lot of the momentum about RDM at NTU has
arisen owing to institutional governance. Indeed, a recent high-level
risk analysis of institutional threats to NTU identified research
misconduct as one of the most significant risks, adding weight to the kinds
of policies many UK institutions – operating under data mandates –
would love to see at their own institutions.
Owing to the root of
research misconduct often lying within the underlying research data,
NTU, and the RDM team within KLR, have cultivated an effective policy
and operational framework around academics’ research data, its
management, sharing and persistence. Given NTU’s research
reputation it is easy to understand how a potentially high profile
instance of misconduct could have severe consequences for the
institution’s reputation and existence.
Some of the steps taken by NTU to ensure better RDM include the following:
Data management plans (DMPs) are mandatory for all research projects.
Research funds are withheld by NTU if researchers fail to demonstrate
cognisance of research data procedures or to generate a satisfactory DMP.
DMPs are submitted via NTU’s in-house built CRIS, where RDM team members
regularly perform audits on the DMPs.
To date, thousands of DMPs have been received, enabling some level of DMP
analysis, e.g. what components constitute a ‘good’ DMP, how these can be
generalised across disciplines, which DMPs are exemplars worthy of
sharing with researchers as a knowledge resource, and so forth.
NTU have been using
an instance of Dataverse, called DR-NTU (Data) and launched circa 18
months ago, for their RDM (see screen below). My time with
Goh Su Nee was the
first time I had experienced a proper demonstration of Dataverse, and
it was at this point I cottoned onto the potential of the software.
I was impressed by its ability to accommodate high-quality metadata via customisable
metadata schemas, smooth concatenated downloading of data files,
extensible metadata templates for specific ‘dataverses’,
schema.org support, and so forth. The team have also been working
closely with the Digital Curation Centre (DCC) and, most recently,
have invited members of the DCC to assist in the delivery of RDM
advocacy and training.
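To illustrate the kind of structured, customisable metadata mentioned above, here is a minimal sketch of the JSON payload shape used when creating a dataset via Dataverse’s native API. The citation block and typeName values are assumptions drawn from the standard citation metadata block and should be verified against the API guide of the target installation, since metadata blocks are customisable:

```python
import json

# Sketch of a Dataverse-native-API dataset payload. The citation block
# and typeName values are assumptions based on the standard citation
# metadata block; verify against the target installation's API guide.

def dataset_payload(title, author_name):
    return {
        "datasetVersion": {
            "metadataBlocks": {
                "citation": {
                    "fields": [
                        {"typeName": "title", "multiple": False,
                         "typeClass": "primitive", "value": title},
                        # Authors are 'compound' fields: a list of
                        # sub-field dictionaries per author.
                        {"typeName": "author", "multiple": True,
                         "typeClass": "compound",
                         "value": [{
                             "authorName": {"typeName": "authorName",
                                            "multiple": False,
                                            "typeClass": "primitive",
                                            "value": author_name}
                         }]},
                    ]
                }
            }
        }
    }

payload = json.dumps(dataset_payload("An example dataset", "Goh, S. N."))
```

Because each ‘dataverse’ can carry its own metadata templates, a payload like this can be extended with discipline-specific blocks without changing the deposit workflow.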
I am tremendously
grateful to all those at NUS, SMU and NTU who took time out of their
busy days to share knowledge and expertise, as well as share our
passions for biscuits and other baked goods. Upon hopping onto
Singapore’s futuristic metro railway system (the MRT) after my last
meeting, I reflected on the differences between what is happening in
Singapore and at my home institution. And I concluded that whilst
there was more commonality than difference, one area where the
Singaporean institutions have grown is digital
scholarship … yes, we once again return to the topic of ‘digital
scholarship’.
Improving our digital scholarship offering at
Strathclyde is something our (newly named) Scholarly Communications &
Research Data team have pondered and vowed to enact for years, yet we
remain nowhere near delivering the suite of services, training or
facilities that NTU or NUS does. I imagine this is true for many
other institutions in the UK. Why is this? Is it because we are so
enthralled to compliance with funder policies on OA and RDM, that we
don’t have time for much else? Possibly. Is it because we are
dealing with near insurmountable deposit workloads? Again, possibly.
But, more probably, it is because institutions in Singapore have been
able to fill their ‘funder mandate vacuum’ with their own
policies and strategies – and with this customise what is offered
to academics in order to support a true behavioural change in the research community.
It all depends on which side of the fence you
are sitting, of course. All the teams I encountered in Singapore remarked that
they found the volume of full-text content, and research data, we
were dealing with at Strathclyde extraordinary, and they were envious. This led them to wish they had similar
policies and mandates in Singapore too. But, as I think my colleagues would agree, be
careful what you wish for!
An opportunity to support Open Science implementation at SMEs?
Pablo de Castro, Open Access Advocacy Librarian
Earlier this week the Open Access team at Strathclyde received a message from an
SME leading an EU-funded project in which Strathclyde is a partner: they have
received a letter from the EC project manager asking about Open Access and
warning them that non-compliance with the EC Open Access policy could ultimately result in
the retention of a fraction of the project grant.
This is some extraordinary news – so these communications from EU project officers are NOT an urban legend after all!
A key idea included in the
comprehensive summary for the EC
FP7 Post-Grant OA Pilot was that ambitious
institutional research libraries could consider supporting Open Access
implementation for EU-funded projects led by SMEs where their institution was a
project partner: SMEs tend not to know (or care) much about OA, and it’s
not their fault since no-one has ever invited them to join the OA WGs founded
by and for universities (it’s incidentally not just SMEs but university hospitals,
foundations and many other non-academic actors who are unfortunately missing
there, but at least these working groups do exist).
It’s worth wondering how
effectively this support for external
Open Science implementation is currently being provided. An increasing
number of EU-funded projects are led by these external stakeholders,
often SMEs, with no knowledge of Open Science. Technical universities should
be ideally placed to start assessing this, given the large involvement of industry in the projects they participate in.
The letter is not particularly
threatening in its wording, but the SME’s reps were terrified by the possibility
of losing a fraction of the project grant. There is a clear role here for HEIs,
perhaps even a source of income if this were provided as a service (if only
because another of the findings mentioned in the FP7 Post-Grant OA Pilot report
is that research libraries are very bad at raising funding to at least partially
cover their massive expenditure in providing access to scientific literature).
A huge dissemination effort has taken place at UK institutions to explain
Open Access to their researchers. Many of them still do not understand what all this
is about, but the exercise has achieved some results nonetheless. However,
explaining compliance to SMEs is a totally different story – there is no
research assessment exercise to raise as a potential driver, and occasionally not
even an understanding of what an EU-funded project is supposed to be for (see for
instance an example of the worst EU-funded project website – or shell thereof –
this reporter has ever seen: it may or may not be a coincidence that it was mostly led by SMEs).
These workflows are part and
parcel of a wider approach based on
funded projects that has been raised before in this blog. By pulling the
funded project thread, the perspective on Open Science implementation becomes
much clearer. Unfortunately research libraries have traditionally not dealt
with funded projects as this domain is Research Office territory. But this is
changing very quickly with the increasing access research libraries
are getting to institutional CRISs.
Funded projects are one of the
main entities in research information management systems. It’s however hard to
understand the relevance of this without first grasping the essentials of the
funded project approach. When this is understood, all the workflows around
scholarly comms, research administration and related areas (such as measuring
social impact of research) become part of one and the same process. This opens very rich opportunities for
collaboration between research libraries and research offices. With regard to reaching out to SMEs to offer them support for Open Science policy compliance, the first step would be to identify EU-funded projects where the institution is a partner and which happen to be led by an SME or by a non-academic stakeholder (such as the NHS).
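That identification step can be sketched as a simple filter over project partnership records. The field names below are invented for illustration (they are not a real CORDIS or CRIS export schema):

```python
# Illustrative filter (invented field names, not a real CORDIS or CRIS
# export schema) for spotting EU-funded projects where the institution
# is a partner but the coordinator is an SME or other non-academic body.

NON_ACADEMIC = {"SME", "NHS", "charity", "public-body"}

def externally_led(projects, institution):
    return [
        p for p in projects
        if institution in p["partners"]
        and p["coordinator_type"] in NON_ACADEMIC
    ]

projects = [
    {"acronym": "EXAMPLE1", "coordinator_type": "SME",
     "partners": ["University of Strathclyde", "Acme Ltd"]},
    {"acronym": "EXAMPLE2", "coordinator_type": "university",
     "partners": ["University of Strathclyde"]},
]

hits = externally_led(projects, "University of Strathclyde")
# Only EXAMPLE1, the SME-led project, is returned.
```

The output of a filter like this is exactly the shortlist a library would need before proactively offering compliance support to a project coordinator.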
It was lucky that the specific
Elsevier paper that the EC project officer’s letter was about had been
deposited in Zenodo under embargo and that the embargo period expired the
day after the letter was received. A sizeable fraction of the project’s
publications were not deposited and were consequently non-compliant, but the
letter didn’t mention any of these – it’s difficult for research funders to get
to know about publications arising from their funded projects if they don’t get
reported either by authors or by institutions.
This is arguably the biggest
issue right now in the UK OA landscape. While the levels of OA availability
(or ‘compliance’) are literally soaring as a result of the HEFCE OA policy in
what is currently the largest
unsung success in the OA domain in the whole of Europe, the percentages of
EU-funded publications stemming from projects coordinated at UK institutions
that are being delivered into the OpenAIRE aggregation are ridiculously low.
This is all down to system interoperability flaws: as a consequence of the very
low uptake at UK repositories of the RIOXX application profile,
devised by the RCUK as a tool to ensure UK compliance, most
institutional systems (but not all: the University of Glasgow’s EPrints-based Enlighten is the
shiniest counterexample) are unable to get their publication records harvested
by the OpenAIRE aggregation.
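For readers unfamiliar with what RIOXX actually adds, the sketch below builds the project/funder linking element for a publication record. The namespace URIs, element and attribute names are my reading of RIOXX v2 and should be checked against the published profile before any implementation; the grant reference is invented:

```python
import xml.etree.ElementTree as ET

# Sketch of RIOXX-style project/funder linking for a publication record.
# Namespace URIs and element/attribute names reflect my reading of
# RIOXX v2 and should be verified against the published profile; the
# grant reference below is invented.

RIOXX = "http://www.rioxx.net/schema/v2.0/rioxx/"
RIOXXTERMS = "http://www.rioxx.net/schema/v2.0/rioxxterms/"
DC = "http://purl.org/dc/elements/1.1/"

record = ET.Element(f"{{{RIOXX}}}rioxx")
ET.SubElement(record, f"{{{DC}}}title").text = "An example output"

# The project element carries the funder as attributes and the grant
# reference as its text content.
project = ET.SubElement(
    record, f"{{{RIOXXTERMS}}}project",
    {"funder_name": "EPSRC",
     "funder_id": "http://dx.doi.org/10.13039/501100000266"})
project.text = "EP/X000001/1"

xml_text = ET.tostring(record, encoding="unicode")
```

It is precisely this machine-readable funder/project pairing that aggregators such as OpenAIRE rely on when attributing harvested publications to funded projects.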
Progress is happening that will hopefully fix
this serious issue in the mid-term, ideally again by means of a collaboration
between institutional research libraries and research offices. It is a slow process
though and it needs to compete against many other institutional priorities at
the moment, OpenAIRE-compliance not usually being too high up on the to-do lists.
But there may soon be good news in this regard on this very venue. Watch this space.
Organising a conference in India poses specific challenges of its own, both from a logistical
viewpoint – travel arrangements and sponsorship of travel costs – and in the
difficulty of putting together an event programme that bridges the gap between
research library practice in the most advanced workplaces worldwide and
current practice in a country like India, where libraries are often still to
receive the kind of empowerment via public policies that they enjoy for
research support purposes in regions like the US or Europe.
The way this was addressed at CLSTL2019 was by putting the event organisation in the
hands of an extremely well-connected librarian at IIT Gandhinagar, Dr TS
Kumbar, who gathered a spectacular group of international speakers covering a
wide range of subjects well beyond the local practice at his institution or in
the wider country. In this sense the conference was an opportunity for the
local attendees to get an insight on what is currently going on at the most
advanced research libraries in the world.
The Strathclyde Uni presentation was one of the few guest contributions in the
programme whose speaker was not contacted on the basis of a previous connection
with Dr Kumbar but solely because of scholarly work published by the library
that he had come across. These were the joint
OCLC/euroCRIS worldwide survey report on research information management
practices and the role of research libraries therein, and the recently published article on the
CRIS-mediated implementation of Open Science at the Strathclyde library. This
notwithstanding, there was an ongoing research collaboration between the EEE
Department at Strathclyde and the IIT Gandhinagar at the time the invitations
to guest speakers started to be sent out in December 2018 (see above), so in some
sense there was indeed a connection between the two institutions.
Other connections between the research carried out at Strathclyde and India were
highlighted in the presentation too, such as a recent trip to Mumbai by a group
of Biomedical Engineering students in prosthetics under the lead of Dr Anthony
McGarry to deliver training in the discipline at local clinics.
With India having sent a first-ever
national delegation to the Congress of the International Network of
Research Management Societies (INORMS 2018), held in Edinburgh in June 2018, and
with an emerging research information management platform being implemented
across the country (the Indian Research Information Network System, IRINS), the event provided an excellent
opportunity to highlight (i) the relevance of research information management
workflows for the daily research support practice at libraries and (ii) the
relevance of research administration at institutions as a basis for research-
and innovation-related information workflows at a national level.
Furthermore, the event also
offered the surprisingly unusual opportunity for international speakers to
listen to each other and to engage in discussions on their respective practices
that spanned across continents. In particular, the event provided the forum for
a very fruitful comparison of the degree of progress in specific areas between
US/Canada- and Europe-based institutions.
Open Science was of course one of
the most comprehensively addressed areas, even if it was far from
being the sole one (the sessions covered a wide range of library-related
topics, from refurbishing the physical library buildings to teaching activities
delivered by faculty librarians to library service marketing).
The way the bridging of the above-mentioned
competencies gap was addressed was by staging a ‘long’ talk by an international
guest speaker at the start of a specific session then arranging a number of
short talks providing an insight on the local practice around topics related to
the one addressed in the long talk at the start.
RDM featured more heavily in
the event programme than Open Access, but there were excellent presentations on
the latter topic. Among these stood out the update provided by Coleen Campbell,
a Florence-based consultant for the implementation of Plan S from the Max
Planck Society in Munich. A representative from the University of California
(Irvine) was also among the guest speakers and broke the news to the event
audience of the cancellation of the UC subscription agreement with Elsevier.
The Strathclyde presentation also included a summary of the Open Access
policies applicable in the UK with an emphasis on the best-practice HEFCE
policy and its link to the UK-wide research assessment exercise.
Several attendees chose to
deliver remote presentations that had been previously recorded and were then
projected on the big screen in the auditorium. A particularly good one among
these was the summary on (often very innovative) practices at the KAUST library
in Thuwal (Saudi Arabia) delivered by its director Dr J. K.
Two contributions stood out in
the RDM domain, both of them workshops. The first one, delivered by Harvard
Research Data Program Manager Ceilyn Boyd and
IQSS Dataverse Manager of Data Curation Sonia Barbosa addressed “Research
Data Management and FAIR Data in the Dataverse Infrastructure” (C Boyd’s slides
providing the RDM framework are available at https://bit.ly/rdm_clstl2019;
the rest of the presentations will eventually be uploaded to the conference website).
Private conversations with these
two colleagues shed more light on the business model behind the Harvard
Dataverse, which for instance provides data deposit, curation, storage and
preservation services to Ubiquity Press’s fully Open Access, affordable-APC
journal titles, in a similar (but institutionally driven) way to how Dryad provides services to many other mainstream journals.
The second RDM-related workshop
worth mentioning was “Data
and Visualisation in Libraries: Services and Practice” delivered by Walt
Gurley from the North Carolina State University Libraries. The case that was
presented for libraries offering peer-to-peer (“Coffee & Viz”) courses on
Tableau and other data visualisation software solutions to NCSU students and
faculty, plus external stakeholders, was again both innovative and very engaging.
An interesting aspect frequently
raised in conversations with US-based colleagues was the challenge faced by US
institutions for Open Science implementation purposes arising from the very
weak cross-institutional collaboration patterns in the country: there is no
national-level research assessment exercise in the US and the policies issued
by the Federal Government are not strong enough to ensure across-the-board
awareness and compliance. In these circumstances, it’s paradoxically Industry
that often plays a coordinating role via the user groups for specific
solutions, in stark contrast to the European Union member countries where
Industry is often perceived as the enemy by Open Science advocates.
A social programme was also put together for the international visitors, including
among others cultural and local shopping tours, local dance performances and
opportunities to taste the local food.
The event as a whole was very fruitful for the
opportunities it offered to get to know more on how the role of research
libraries is evolving in India and to get the chance to discuss current
practices in research support with colleagues from other countries. And
following this event, it was not just the Mukti Foundation in Chennai who got
its Scottish touch from the BME faculty and students, but also IIT Gandhinagar
through the Strathclyde library!
Let’s start by saying that, as a repository, Strathprints is
very fast. Pages load/render very quickly and this benefits both desktop and
mobile users – and we will see further evidence of this later in this brief
blog post. But let us rewind to late 2017 when I posted a blog entitled, “Demonstrating
the need for speed: improving page loading and rendering in repositories”. In
this blog post I briefly reviewed why repository speed was increasingly
important and why repository managers, institutions, and so forth, should be
worrying about it. You can obviously read the full post, but a key take-away
point is this:
Average load time for mobile sites is 19 seconds over 3G
[and] 53% of mobile users abandon sites that take longer than 3 seconds to load
[…] Comparing faster sites (5 seconds) to slower ones (19 seconds), the faster
sites had average session lengths that were 70% longer and bounce rates that
were 35% lower
Clearly this sort of user behaviour is relevant to the work
of institutional repositories, and repositories more generally. One might also
argue that scholarly end users are even more likely to lack patience. Such
users are often interacting with dozens, perhaps hundreds, of scholarly
websites and search tools every day. They engage in horizontal information
seeking strategies and simply don’t have the time or patience to wait for, say,
repository pages to render. Because they are skipping across multiple, disparate
sites to gather the information they need to complete their task, speed is of the essence.
The blog post also summarised some of the techniques used on
Strathprints to effect a 79% overall improvement in page loading speed – and
several publications have since been produced which occupy the same
intellectual area. See, for example, this poster, working paper and Code4Lib Journal article.
That's the background covered.
Test My Site!
Earlier this week Google unveiled some new tools to support
developers in creating better mobile experiences for users. Google’s further commitment
to mobile is understandable if, like me, you have been following the various
changes that have been rolled out to PageRank and the Googlebot over recent
years. One of the tools is their new Test My Site. This is a dedicated mobile
testing tool as opposed to PageSpeed Insights, which addresses both
desktop and mobile. Test My Site checks:
The speed of both the entire site and of individual pages;
Whether the site/page speed is faster or slower
compared to the prior month;
Whether the site speed/page speed ranks as FAST,
AVERAGE, or SLOW;
How the site speed compares to others in the same industry;
A detailed list of recommended enhancements to
be implemented in order to increase speed;
A comprehensive report for sharing with others.
So, Test My Site provides a very wide-ranging report on how
your website, or in this case your repository, is performing on mobile: its
speed rank, areas of loading drag, and overall performance.
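Incidentally, the same underlying data that Test My Site and PageSpeed Insights report can be retrieved programmatically via the PageSpeed Insights v5 API, which is handy when benchmarking several repositories at once. Below is a minimal Python sketch of extracting headline figures from an API response; the trimmed sample response is made up for illustration and is not real Strathprints data:

```python
def speed_summary(response: dict) -> dict:
    """Pull headline speed figures out of a PageSpeed Insights v5
    API response (field names per the public API documentation)."""
    lighthouse = response["lighthouseResult"]
    audits = lighthouse["audits"]
    return {
        "performance_score": lighthouse["categories"]["performance"]["score"],
        "first_contentful_paint": audits["first-contentful-paint"]["displayValue"],
        "time_to_interactive": audits["interactive"]["displayValue"],
    }

# Trimmed, made-up sample response for illustration:
sample = {
    "lighthouseResult": {
        "categories": {"performance": {"score": 0.96}},
        "audits": {
            "first-contentful-paint": {"displayValue": "0.9 s"},
            "interactive": {"displayValue": "1.2 s"},
        },
    }
}

print(speed_summary(sample))
```

In practice the response would come from an HTTP GET against the public API endpoint, with the repository URL as a query parameter.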
Strathprints = FAST
I am therefore pleased to report that Strathprints was
ranked as ‘Fast’, with mobile page speeds of ≤ 1 second on 4G. This is
gratifying to observe and indicates that our efforts over recent years to improve the performance of Strathprints, including in respect of page loading and mobile optimisation, have
been well spent. :-)
I tested a series of other repositories in order to
establish some kind of benchmark. I tested 8 repositories, with varying underlying
software. The mean speed was 6.2 seconds, with ‘Slow’ rankings dominating. One
thing to note is that Test My Site always failed whenever I attempted to
benchmark DSpace repositories. There is clearly something peculiar to the
engineering of DSpace which causes the Test My Site tool to fail. The good news
is that most other platforms were successfully tested: EPrints, Samvera, Invenio,
Digital Commons and Pure Portal.
In the last of our ‘review of 2018’ blog posts we explore
the usage of content on Strathprints during the whole of 2018. Followers of our
blog will be aware that we periodically post updates on usage throughout the
year. These posts provide a snapshot of usage during a specific three-to-four-month
period. This post, however, will look at the top 20 most used (or ‘downloaded’, if you prefer) deposits and, as
always, we will segment our deposits into two categories:
The top 20 most downloaded outputs during all of
2018 that were also deposited in 2018 (in effect, the top 20 new entries…)
The top 20 most downloaded outputs during all of
2018, irrespective of deposit date.
Let’s start with the top 20 most downloaded outputs during all
of 2018 that were also deposited in 2018. These are our ‘new entries’, some of
which have generated spectacular usage within only a couple of months of being deposited.
What is perhaps most striking about the top 20 new entries
is the high demand demonstrated for ‘grey’ scholarly research content.
So-called ‘grey’ content includes outputs such as research reports, policy
papers, working papers, white papers, and so forth. Repositories across the
world have been raising the profile of such outputs and, like Strathprints in
2018, many institutions are discovering that such content is among the most
used within their repository. One reason is that grey content tends not
to be published elsewhere and therefore attracts a lot of attention within the
repository; but it is also because many of these outputs contain valuable
insights, data or findings and often do not receive the exposure they deserve.
The remaining 15 spots are occupied by a healthy mix of
articles and conference papers emanating from across the University. The School
of Law (with 3 outputs in the top 20!), Department of Computer & Information
Sciences, Department of Physics, the School of Social Work & Social Policy,
Department of Marketing (also with 3 outputs in the top 20!), Department of
Electronic & Electrical Engineering, School of Government & Public
Policy and the Department of Naval Architecture, Ocean & Marine Engineering
are all represented. Congratulations to all the authors!
Top 20 most
downloaded outputs during 2018
Below are the top 20 most downloaded outputs, irrespective
of deposit date. This chart is arguably less interesting than the ‘new entries’
because this is a chart that changes little, month-to-month, and even
year-to-year! In fact, many of the outputs listed in this 2018 chart appeared
in the 2017 and 2016 charts. Where outputs appeared in the 2017 chart, we have
indicated whether the output has maintained its position ( - ), or has moved up
or down from its position in 2017 ( ˄ , ˅ ). We are nevertheless pleased to
report that there are three new entries, including outputs from Electronic
& Electrical Engineering, the Centre for Excellence for Looked After
Children in Scotland (CELCIS) and the School of Psychological Sciences &
Health. Congratulations to all!
It is perhaps interesting to note that usage of digital content on
Strathprints demonstrates the Pareto principle, or the 80/20 rule. In other
words, around 80% of all usage in 2017 was generated by 20% of the digital
content, with a very long tail of content picking up minor usage. This can be
observed in the chart below, with the most used item attracting 5,032 COUNTER
downloads, but at the other end there are quite a few outputs attracting only 1
download throughout 2017. This pattern has been observed across usage data for
several years and is likely to be a maxim across all open access repositories,
although it would be interesting to hear the findings of other institutions
about their own local observations.
We did not hear from other institutions about whether they
observed something similar on their repository – there is still time though. Please get in touch! Suffice
to say that the same general Pareto principle can be observed in our 2018 data, with
approx. 75% (299,928) of Strathprints usage generated by 20% of the deposits.
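For anyone wanting to check whether their own repository shows the same Pareto pattern, the calculation is straightforward. A minimal Python sketch, using made-up download counts rather than real Strathprints data:

```python
def top_share(downloads, fraction=0.2):
    """Return the share of total usage generated by the top
    `fraction` of items, ranked by downloads."""
    ranked = sorted(downloads, reverse=True)
    k = max(1, int(len(ranked) * fraction))  # size of the 'head'
    return sum(ranked[:k]) / sum(ranked)

# Made-up counts with a long tail, for illustration:
counts = [5032, 1200, 800, 400, 90, 40, 20, 10, 5, 1]
print(f"Top 20% of items generate {top_share(counts):.0%} of usage")
```

Running this over a repository's full COUNTER download report would reveal how closely local usage follows the 80/20 rule.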
From our data we can also get a sense of where in the world
Strathclyde’s Open Access research is being used. The usage map below
demonstrates the global usage that Strathprints attracts; although we also have
to acknowledge that even by slicing usage by country we can observe a long
tail of countries, which again suggests a Pareto relationship between countries
and their usage. In this case, however, roughly 91% of the usage is generated by the
top 20% of countries, as ranked by usage. It is nevertheless interesting to note the
varied nature of those countries, with a healthy spread between usage from the
Global North and Global South.
The ‘bubbles’ chart below probably gives a better indication of
the size of usage by country, with the size of the bubble relating to the size
of usage. The biggest users of Strathprints content - and the 20% which generated 91% of the usage - are:
Iran, Islamic Republic of
Korea, Republic of
Overall, the usage figures for Strathprints during 2018 demonstrate its continuing ability to generate visibility and discoverability for Strathclyde Open Access research. This is partly based on the work the team has been undertaking to optimise Strathprints but also on the fact that we are making Strathclyde’s excellent, high value content open for discovery. Watch this space in February 2020 when, hopefully, we will have more improvements to report! :-)
The first thing that struck me about the headline numbers
was the growth in full-text deposits and metadata-only validations for 2018.
This hit me so hard that I tweeted my astonishment immediately!
These increases are especially astounding given that we have fewer staff than we did during the first quarter of 2017. The figures are partly attributable to productivity and efficiency improvements, but also to the unwavering workhorse ethos of team members. Of course, such a work ethic cannot continue indefinitely, and addressing this will be essential during 2019 as we bed into our new team structure: we have recently been reorganised as a sub-section, ‘Scholarly Publications & Research Data’, within the new ‘Scholarly Research Communications Service’. But I digress! To put these deposit and validation increases in perspective:
3396 full-text deposits to Strathprints were
made during 2018, which is a 20% increase on 2017.
Metadata-only validations for the CRIS also grew
by a similar proportion too (20%), accounting for 3014 validations.
Those familiar with this blog will be aware that better
understanding repository discoverability, impact and usage is a keen interest
of mine. Previous blog posts and publications explore approaches to optimising repository usage; I therefore have no wish to repeat myself here. Suffice to say that
Strathprints enjoyed a 42% increase in usage during 2018 when compared to 2017.
This is incredibly rewarding to observe, especially since Strathprints
demonstrated only a 20% increase in full-text deposits during the same period. In other words, content usage of Strathprints continues to outpace content growth for another year, and by an even greater percentage.
The remaining info-graphic numbers should be reasonably straightforward
to interpret, I think. I’ll therefore let the graphic speak for itself – but, if not, check out last year’s blog post where we defined “impressions”,
“clicks” and so on.
Update on Archivematica and data preservation at Strathclyde
Alan Morrison, Research Data Support Officer, University of Strathclyde
We have written
previously about the RDMS team’s fledgling use of Archivematica, the open-source
preservation system employed at Strathclyde to aid with the curation and
preservation of research data. However, there has been significant progress
since our last report both in terms of how the system is implemented at the
university and in our own proficiency in customising it for our own specific RDM
requirements; so time for an update!
Much of this recent progress was reported on at my first
attendance at an Archivematica UK User Group, which took place at Warwick
University last November (see the group’s administrator Rachel MacGregor’s excellent
write-up of the event). Attendees were drawn almost exclusively from the archive/cultural
heritage sector and the services that support them, and while the meeting
highlighted the differences in how Archivematica is employed in our separate
disciplines, it also demonstrated the flexibility of the system to deal with
disparate data formats. The main difference is that Archivematica is used by
RDM at Strathclyde as stand-alone software to create preservation copies of
deposited research data for long-term curation and re-use rather than as a tool
to process, catalogue and provide access to archival digital surrogates using AtoM, the archival description
software also produced by the makers of Archivematica, Artefactual. There are encouraging
signs of greater interest in the use of Archivematica at other academic RDM
services, not least by its inclusion within the newly launched Jisc Open Research Hub as one of
its main preservation systems. We are aware of at least five Scottish HEIs
either using or investigating Archivematica as a preservation tool, and have
welcomed visits from Glasgow and Glasgow Caledonian universities to Strathclyde to
discuss and demonstrate our experience of using the software. The suggestion to
establish an Archivematica Scottish ‘hub’ might not be that far off either.
Our implementation of Archivematica is not without issues or
the need for continued assessment and review; many of our recent achievements
have been the result of the tailored personal support provided by Artefactual,
whether that be through the judicious use of support tickets, personal visits,
technical email support and, more recently, a monthly Skype meeting with
Artefactual headquarters in Canada. A recent example concerned an ongoing issue
with Archivematica failing to complete the processing of very large complex
datasets. This was resolved by providing Artefactual with some test datasets to
duplicate the problem, resulting in a list of technical recommendations to
improve server capacity and speed up data processing.
Progress in our Archivematica implementation is being
monitored as part of a wider ROADS (Repositories, Open Access & DatasetS) and
Library planning exercise. This involves the setting of measurable targets for
technical improvements and longer term goals including the functionality to
report on file formats by department. The provision of monthly highlights to a
Library planning committee means that the team’s ongoing work in this area is
circulated widely and the Archivematica development is placed in a context of
related improvements to the Archives pipeline, and other Open Science related
technical innovation produced by the ROADS team.
The other main issues highlighted at the user group centred
on the use of Pure as our institutional data repository and the lack of
interoperability it provides with other systems, including Archivematica. As a
proprietary CRIS system Pure provides little in the way of interoperability
with anything other than compatible Elsevier systems, and we have yet to find a
workable API which provides such functionality - whether the Jisc Open Research
Hub has solved this issue remains to be seen and tested! The lack of
interoperability presents us with a weak link in our preservation workflow as
it necessitates both the manual extraction of data from Pure to our networked
drives and the addition of relevant metadata to the preservation copy. Both of
these actions invite the potential for human error and add a significant
extension to the processing time.
Pic: Ingest, Distribution, Curation and Preservation workflow
Naturally such issues are beyond the control of
Archivematica and despite them we will continue to process our way through the
data deposit backlog. Once every deposited dataset has received preservation
processing and is moved to our Archivematica Storage Service we can then start
to look closer at the spectrum of data the university is producing and plan for
future preservation and re-use.
Other developments at Strathclyde include our imminent
upgrade to version 1.9 of Archivematica (watch the Archivematica Roadmap and Product Vision webinar
for more details), including the implementation of performance monitoring tools
which will provide options to maximize functionality and available disc space
and developing an EPrints-Archivematica integration, potentially for use with Strathprints. Strathclyde
University will also be hosting the next Archivematica UK User Group in the Spring
of 2019 so look out for the announcements or get in touch if you are interested
in presenting or attending.
Recent changes in institutional Gold Open Access funding practices at Strathclyde
Pablo de Castro, Open Access Advocacy Librarian
Sometime in mid-November, the situation that most other
research-intensive universities in Scotland had already reported when we met
last Sep for the Autumn 2018 Open
Access Scotland working group meeting in Aberdeen also hit Strathclyde Uni:
the block grant awarded to the library by the Research Councils UK (now UK Research
and Innovation, UKRI) to cover Open Access publishing fees (APCs) in the
period spanning from Apr 1st, 2018 to Mar 31st, 2019 ran out. This typically
marks the introduction of APC funding restrictions and a subsequent change in
Gold Open Access funding eligibility criteria. Open Access librarians do not
enjoy this situation, as it is notoriously difficult to make institutional
researchers aware of such sudden changes in policy, but there is little
alternative given the circumstances.
It’s not the first time we have run out of block grant funding at
Strathclyde – this was also the case last year, but back then we managed to
complete the funding period by overspending in the knowledge that a new tranche
of funding would eventually arrive. This year however, following the release of
Plan S and the updated
Wellcome Trust Open Access policy, we have decided to take a different
course: while we will still overspend and charge the excess expenditure to the
next block grant to be awarded later this year, a tightening of the eligibility
criteria in line with the above-mentioned recent pieces of policy-making will
take place, very much in line with what our colleagues at fellow
research-intensive Scottish institutions have been doing for some time. This
means enforcing a no-hybrid Open
Access funding policy.
It has occasionally been the case in the past few months
while we still had some budget that researchers would come to us with a request
to fund Gold Open Access for a top-of-the-class accepted manuscript whose
leading institution had refused to pay an APC for, instead sticking to the
Green Open Access line of action, namely depositing a copy of the accepted
author manuscript – typically under a 12-month embargo – into the appropriate
institutional systems. Since the manuscript (accepted in a hybrid journal) was
technically eligible for Gold Open Access funding, we granted the funding
request. Comments from the co-authors of the publication celebrated that
“at least some universities still cared about the subsequent increase in
research impact”. This is an interesting remark. It again proves the fact,
well known to any institutional librarian dealing with APC funding, that every
single researcher thinks that their accepted manuscript is the only one
the library will process.
This post is not intended, however, to discuss possible
improvements in cross-institutional coordination around APC funding, but rather
to examine the effects of the sudden introduction of the no-hybrid policy on
the APC funding distribution by publishers and on researchers’ attitudes. On
the former, the impact of the updated eligibility criteria on the distribution
of funded journal titles has been immediate. Although it’s still too early to assess,
early results suggest it could mark a more permanent element in our policy even
beyond the arrival of the new tranche of UKRI funding on Apr 1st, 2019. This is
mainly because a return to the hybrid funding policy would automatically result
in running out of budget in the midst of the funding period again sometime
towards the end of 2019 (if not earlier).
The APC funding by publishers resulting from the first weeks
of no-hybrid policy shows, as expected, a much more balanced distribution
between hybrid and fully Open Access publishers. Publishers like MDPI or PLoS have
suddenly become a regular entry behind the titles that are receiving funding,
while hybrid publishers like the IEEE, the Optical Society of America or
Elsevier still feature on the list thanks to fully Open Access journals of
theirs like IEEE Access, Optics Express or Materials & Design.
SpringerNature is the sole publisher that sees almost no impact from the
updated eligibility criteria, since their hybrid journals are covered by the
Springer Compact agreement and – assuming the deal gets renewed in the UK as
it has in the Netherlands – are hence not affected by the policy update.
Critically, the APC distribution is not only more balanced across publishers,
it is also significantly more affordable – and this is one of the main
objectives at this stage.
On the latter aspect, i.e. the reactions from institutional
researchers whose Gold Open Access funding requests are turned down due to the
lack of funds, the findings thus far are that authors understand the situation
and do not insist on having their paper published Gold Open Access via the
library. Whether this means that they’ll choose the Green Open Access route or
that they will instead try to find some alternative source of funding is hard
to tell at the moment, but the evidence so far tends to suggest the former.
One unexpected – if not unwelcome – outcome of the new
policy is that it takes longer to address the funding requests than in the
past, when they would just be routed into the default payment workflow for the
specific publisher. Now it’s often necessary to explain to the authors that the
current limitations in funding availability prevent the library from accepting
their funding request, and this requires a higher level of customisation in the
communications than used to be the rule.
Another potential outcome of this updated eligibility policy
might be a stronger case for setting up an Institutional Open Access Fund at
Strathclyde. Centralised APC funding via block grant budgets from research
funders inevitably introduces a strong bias towards the more intensively UKRI-
and COAF-funded departments (in our case, Electronic & Electrical
Engineering, Physics and the Strathclyde Institute of Pharmacy & Biomedical
Sciences, mostly to the detriment of Social Science and Humanities Schools and
more generally of unfunded authors anywhere). An updated funding eligibility
policy that resulted in lower expenditures could reinforce the feasibility of
serving a significant fraction of unfunded authors with a limited amount of funding.
From the Scholarly Comms team at the library we
would see such a development – which is already operating at fellow Scottish
institutions like St
Andrews or Stirling – as a desirable outcome in line with the Plan S requirements. We’re at the
same time keeping a close eye on potentially complementary ways forward such as
the much-discussed ‘Read
and Publish’ deals that publishers are currently offering. The issues around
these agreements will be addressed in a future post, and we will also provide
regular updates on the progress around the updated APC funding eligibility
policy, ideally including a list of the journals funded since the change came into effect.
The words “moving on up” bring to mind a hit single by 90s
pop-dance behemoths, M-People. This blog post is not a review of mediocre
pop-pap from the early 1990s – and let us be clear, I loathed M-People;
however, it is worth noting that their single, “Movin’ On Up”, climbed the pop charts
across the world in 1993, including reaching #6 in Finland.
In a somewhat awkward segue between M-People and
repositories, this lighthearted blog post reports on the news that
Strathprints, the University of Strathclyde institutional repository, has
something in common with the Finnish chart performance of “Movin’ On Up”. In
fact, the M-People song title also provides some impressive descriptive comment
about the movement of Strathprints within the Ranking Web of Repositories. Can you
see where this is going?
While there has been only a modest improvement in the global
ranking of Strathprints, this improvement has nevertheless equated to a
24% and 75% improvement in its European and UK rankings respectively. Only UCL,
Oxford, Imperial, LSE and Edinburgh Research Archive (ERA) rank more highly
within the UK. According to OpenDOAR there are 280 UK repositories. The placing
of Strathprints at #6 within the UK situates Strathprints well within the top
decile of UK repositories, as per the Ranking Web of Repositories, and deep within
Europe’s top quartile – and within the top decile of all repositories in the
world. Hooray! Hard work within our team and a focused repository
strategy may be starting to reap rewards.
In other news, it is very pleasing to observe a strong
performance from Brazilian and Indonesian repositories, many of which dominate
the global top 20. In recent years both countries have been cultivating their
burgeoning open science ethos and it is great to see this reflected in the
reach of their open access repositories. Keep up the good work everyone!
Strathprints: Supporting DS Dimensions badges on repositories
George Macgregor, Institutional Repository Manager,
University of Strathclyde
Strathprints has supported alternative metrics for several
years, via the Altmetric donut and the Altmetric Attention Score. Displaying
these sorts of metrics alongside repository deposits has become de rigueur for
repositories, and publishers too, enabling users to understand the wider impact
and attention a particular research output might be receiving in the social
sphere, media and within the corridors of policy making bodies. More than that,
alternative metrics are an important component of the wider open science agenda
and have therefore been widely adopted by repositories.
This brief blog post is a note that we have decided to add
to our alternative metrics by also displaying the DS Dimensions badge alongside
eligible deposits, thereby allowing users to easily see how many citations an
output may have attracted. Like the Altmetric donut, users can explore the data
further by clicking on the badge to visit the Dimensions platform.
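For anyone curious about the mechanics, Dimensions badges are typically embedded by loading Dimensions’ badge script and adding a placeholder element carrying the output’s DOI. The snippet below is an illustrative sketch rather than the exact Strathprints implementation; the DOI shown is a hypothetical example:

```html
<!-- Load the Dimensions badge script once per page -->
<script async src="https://badge.dimensions.ai/badge.js" charset="utf-8"></script>

<!-- Placeholder element; the badge script replaces it with the citation badge.
     The data-doi value below is a hypothetical example. -->
<span class="__dimensions_badge_embed__"
      data-doi="10.1000/example-doi"
      data-style="small_circle"></span>
```

Because the badge resolves client-side from the DOI, the same markup works on any abstract page template without server-side changes.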
To accommodate the addition of this data in the Strathprints
abstract pages, it has been necessary to move things around a little.
Previously, the Altmetric donut and Attention Score would be displayed in the
right-hand navigation column but, in order to simplify the UI for users and
allow for a good mobile experience, it is now displayed underneath the
abstract. See the screen snippets above and below. Here are some good examples on Strathprints itself for you to explore:
It is our intention to display additional metrics in due
course, including citation metrics from Scopus, Web of Science and Google
Scholar, perhaps with Plum X alternative metrics too. Including this additional
content within the UI will require some creativity since UI real estate is
already very tight and squeezing in more could be problematic. But where there
is a will… There are other things on the “to do” list for now though, but watch
this space for further details soon!
The annual UKCORR Members’ Day took place at the British
Library at St Pancras on 10 September 2018. It was a terrific day with
interesting presentations from the likes of Torsten Reimer (Head of Research
Services at the British Library), Petr Knoth (CORE), among many others. But the day also included some breakout sessions designed to stimulate discussion about
some of the hottest topics in repository-land. This blog post provides a
summary of the principal discussion points raised during the breakout session: ‘Open scholarship beyond the REF’.
Pic: Torsten Reimer, presenting ‘The once and future library - reimagining the national library as infrastructure service provider in an open science world’ at the UKCoRR Members’ Day
The scope of the breakout session was the
future role of repositories outside research funder and REF compliance, which, we
felt, was often distracting teams around the UK from engaging in ‘value added’
activities. In fact, one breakout session participant quipped that the session
could be re-titled, “Open Scholarship despite the REF”.
The current authors facilitated the breakout session. Upon
preparing for it, we anticipated that a structured discussion would emerge,
from which consensus on actions for the community might be drawn. However, as
is commonly the case with interesting discussions, consensus was not always
possible and, in fact, little structure seemed to emerge. Instead, a series of
session discussion points prompted further questions, divergent opinions and debate.
Kicking off the session was a discussion about dataset
discovery. With the dataset repository landscape less mature than literature
repositories, and with datasets finally flowing (or trickling?) into dataset
repositories, attention has recently turned to discovery of those datasets. The
discussion enjoyed additional context by the recent emergence of Google Dataset
Search. But discussion initially explored the abstract notion of datasets,
which some discussion participants felt remained too fluid. For example, it
emerged that few institutions were promoting datasets in the same way as
publications because datasets, “as an object, require reconceptualising”. Therefore,
ideas of how best to improve their visibility were still developing. Datasets
remain a ‘niche’ product, unlike publications.
Interestingly, several participants suggested that dataset
formats were a key impediment to discovery, primarily because publishers were
often requiring the deposit of datasets to facilitate the publication of a
related article, but did not know how best to treat the dataset. Datasets
reduced to PDF documents for ‘supplementary information’ purposes was not an uncommon
observation. It was also observed that it can be difficult to locate funder
mandate/data access statements in these cases without a lot of manual effort.
Data journals were proposed by the present authors as a possible
route to better exposing rich datasets for reuse and citation; however few felt
that it was within their gift to suggest to scholars that they should invest
yet more time in the data by going to the effort of documenting via an article.
Securing the dataset, suitable metadata, etc. was a big enough task and it
should be possible for repository and open data specialists to disseminate the
dataset optimally. Indeed, there was a view that research publications were the
immediate path to discovery (i.e. via in-article data statements). The role of
Data Management Plans (DMPs) was also highlighted, with some suggesting that the
question, “Who is my audience?”, is rarely considered when DMPs are being
generated; yet consideration of a dataset’s possible uses and audience should be
factored into its description.
Thus, if a consensus was to be identified on the topic of
dataset discovery, it was that there was a lot that could be done to make
datasets more discoverable and more used. However, little of it involved search
tools like Google Dataset Search; rather, it involved better training of researchers
in dataset management (e.g. DMPs), and more thinking on the part of the community
about the concept of datasets and how they relate to publications.
Compliance and datasets: linkage between datasets and publications
“Compliance taints datasets”.
…said a breakout session participant.
Something that funder policies on research data and open
access have resulted in is the dreaded notion of ‘compliance monitoring’. Institutions
spend a not inconsiderable amount of time monitoring compliance with, say, the
RCUK (UKRI) Policy on Open Access, attached to which are obligations
surrounding research data management. Typical questions include, how many
research publications which were the outcome of UKRI funding are available open
access, and of those how many reference their funding source(s) appropriately
and include a dataset statement?
Very negative views were expressed about the role of funder
compliance in research data management. There was also a rejection of automated
methods designed to chase researchers that might have been evading their data
deposit obligations. The importance of manual work and developing connections
with researchers was instead emphasised as a more productive way of improving
the compliance situation without spending unspecified amounts of time
attempting to monitor compliance or chase over-burdened researchers. To this
end few session participants were proactively making associative links between
research publications and research datasets, either because this was not
technically possible or because checking this sort of compliance was not
considered a priority.
Grey literature: the future of repositories
Improved accommodation of ‘grey’ materials is a significant
opportunity for open scholarship in a REF-free world. Interesting discussions emerged
around the idea that repositories were, in general, too research-focused, to
the exclusion of scholarly teaching materials and other content types.
Moreover, the focus on REF meant that teams were less likely to advocate for
the deposit of non-REF material.
A key example cited during the session related to student dissertations (e.g. MSc, BSc). Several institutions accepted the deposit of such items in their repositories but most did not. Yet the argument was that there were strong reasons for accepting this content. Much of it, if exposed via a repository, could be considered ‘citable work’, such is its quality. Such content also provides exemplars of dissertations or posters to students, thereby functioning as a learning and teaching tool too. There did not appear to be agreement about whether deposit of dissertations was something all repositories should be working towards. Issues around workload management and storage capacity were cited as potential impediments: accepting student dissertations was desirable but opened up the possibility of insurmountable workloads at a time when teams were already stretched. This was also mentioned in relation to datasets; being proactive about encouraging dataset deposit was a double-edged sword which invited the prospect of insurmountable deposit backlogs.
There was certainly agreement surrounding the importance of grey literature for the future of repositories, and that institutions should be working towards its persistent identification via DOIs. The present authors also floated the assignment of ISBNs as an alternative approach to ensuring a robust level of identification over time. Everyone appeared to agree that repositories should be the vehicle for the long-term preservation and discovery of this ephemeral material, especially in the case of in-house produced material.
From here the discussion segued into OA monographs, partly via the recent announcement of Plan S, and the suggestion that the community should be on the front foot by insisting on the deposit of book chapters and entire books using a Scholarly Communications Licence (SCL) inspired approach. The rationale was that by allowing book publishers time to consider how to respond, the community was encouraging publishers to establish restrictive deposit policies, whereas if the community was aggressive in its deposit behaviour publishers would be more likely to cede.
As we noted earlier, little consensus emerged on
many of the issues we discussed, and perhaps we were deluded to think that it
would. But perhaps that is in fact the key lesson of this particular breakout
session? That UKCORR and its members should be thinking seriously about the future
role of repositories, and certainly their role beyond the REF.
My thanks to the library for funding my travel to attend the Open Access Scotland Group meeting on 17 September. This blog post is a brief overview of the issues that were discussed on the day. It was an interesting mix of topics, spanning summer event round-ups to researcher teaching tools (i.e. The Publishing Trap board game). As I am not as familiar with the scholarly communications landscape as I would like to be, I used the day as a learning opportunity: several of the topics discussed were ones I had little or no knowledge about, and I treated them as signposts for further learning.
The meeting was well attended, with around 30 attendees from around Scotland. It started off with a roundup of a number of meetings that had taken place over the summer: the Repository Fringe, the CASRAI reconnect, the REF OA compliance meeting and the Research Data Funders’ requirements meeting, to name but a few. The one I was most interested in hearing about was the OA compliance meeting, where the topic of exceptions had been discussed. However, the content of these events was not covered in great detail as many of those present had already attended them. Thankfully, further details can be found on the ARMA website.
One of the meetings that was discussed more fully was the Dataset Licensing Workshop, where attendees had discussed providing clear guidance on which licences can and can’t be used when working with other people’s data. This is an area I’m not too familiar with, but it will be interesting to see how it progresses; more details can be found at this link. There are plans to follow up the initial meeting with training sessions, including input from the legal profession, and updates will be posted on the web page above.
Theo Andrews of Edinburgh University discussed Edinburgh’s stance on the UK Scholarly Communications Licence (UKSCL) and institutional readiness for it, and very briefly discussed Plan S, posing the question: ‘does this make the UKSCL redundant?’. Edinburgh, and others, think not. They believe it to be a stepping stone towards Plan S.
Valerie McCutcheon took to the floor to discuss OA funder compliance. The example she gave was a list of non-compliant items that had been sent to her by Cancer Research. The list had been taken from Researchfish, and there was a general consensus that this was less than ideal as it is not as accurate as we would like it to be. Valerie has since been in discussions with Cancer Research and has invited them to join our discussion on improving reporting. It was also noted that we need to find a way to improve the interoperability between institutional systems and Researchfish.
Elinor Tolland led a discussion about REF exception workflows. She described the workflow used by Glasgow Caledonian University, then opened the floor for others to discuss the varying ways their institutions handled this. It was a very interesting discussion, which left me wondering: should exceptions be signed off and, if so, by whom? Who should be responsible for them: us, the REF team, or the HOD/HOR?
The Publishing Trap is
a game produced by The Copyright Literacy Organisation. This
was a topic from the Repository Fringe, and George Bray from RGU gave us a
brief summary of the game and brought along a copy for attendees to play over
the lunch break.
After lunch there was a short introduction from Valerie about an upcoming workshop discussing Electronic Lab Books and how we can support researchers who use them.
The ‘Burning Concerns and Questions’ section followed, with discussions covering topics such as:
Plan S: links were given to Danny Kingsley’s ‘Relax…’ blog post, which was described as a great starting point for discussions around Plan S. It was noted that COAR had published a written position on Plan S, and it was suggested that OA Scotland may want to provide one too. This will be discussed further.
Valerie also mentioned that the REF Citation Project was forthcoming.
There was discussion about publisher delays in making paid items gold OA. Some attendees had experienced a problem whereby an APC had been paid for CC-BY publication, the publisher had initially put the item on their web pages as closed access, and then changed it to CC-BY only once the hardcopy had been published – thus taking payment for the months it had been available online in addition to the APC. No resolution emerged from this discussion, but a suggestion of asking authors to pester publishers on our behalf was put forward.
Clarifying document types. For example, some journals have letters which are actually articles, some reviews are regarded as articles, and some commentaries are regarded as articles – the OAS would like a future discussion to clarify which of these are actually in scope for REF.
Holistic budgets. Here followed a discussion about who holds the purse: are APCs paid by the same department that pays the journal subscriptions? It was interesting to see the different models around Scotland.
Our own Pablo de Castro took on the topic of OA policy workflows, discussing the need for better, smoother bibliographic integration from one institutional system to another when an academic moves to a different university. He suggested that if we were given a list of authors who were leaving, we could ensure the bibliographic information was in good order for the next institution to use. He also suggested a directory of contacts to make communications between OA departments in the UK smoother.
Our last talk was from Theo Andrews of Edinburgh. He reviewed OA payments to hybrid journals and questioned why they still had a monopoly over APC payments. He gave examples from Edinburgh, but most attendees agreed their own figures were similarly skewed, and that there were not enough offsetting deals available.
Theo posed the question: are there others out there who would give up paying hybrid journals and use the funds only for pure gold items? There was a mixed bag of answers here, but again it came round to discussing Plan S, which states that there are to be no hybrid payments unless there is evidence of offsetting.
So, in conclusion, it was a thought-provoking day for me, as it posed more questions than answers. It seems there is a lot in the pipeline, and it is encouraging to see that OAS appear to be stakeholders in many of these discussions. It will be interesting to see how things develop.
Teams working to support Open Access publishing (in its various permutations) and those working on Research Data Management tend to work closely together and are often based within the same team structure. Yet integrated workflows between the two areas can be difficult to deliver and can become unintentionally siloed.
The ROADS* team here at Strathclyde is no different – and it is something we have wanted to improve. OA and RDM are, after all, inextricably linked. It is just that both can be complicated, demonstrating numerous pathways with few ideal types; and, to make things even more complicated, a lack of workflow understanding is frequently demonstrated by academics themselves. This lack of understanding is often what derails workflow integration. For example, academics are often unaware of the linkage between OA and RDM and perform isolated actions independently of our team.
Colleagues Pablo de Castro and Alan Morrison [with some minor contributions from myself] have therefore formulated this workflow diagram. The diagram performs several functions:
A reference for members of our ROADS team, highlighting critical linkage points and academics’ actions; but, perhaps more importantly…
A visual tool for researchers and academics to assist them in understanding the OA and RDM timeline, critical actions, and how OA and RDM are actually interconnected.
The old adage, “A picture is worth a thousand words”, applies to #2, because our experience is that academic staff need high levels of summarisation. They are busy people who simply want to do the right thing and do it quickly. Words haven’t been eliminated entirely from the diagram, but they have at least been greatly reduced, and the processes themselves have been visualised rather than explained in a document.
Pablo and Alan have been using the diagram in anger, and early feedback suggests academics are finding it very useful because it maps out their most important milestones and actions. As further feedback is collated the diagram will no doubt be modified and improved. In the meantime, we are sharing it here under a CC-BY licence in case anyone else would like to reuse or repurpose it.
Debbie Prior, Institutional Repository Support Assistant, University of Strathclyde
It’s time for another look at our now quarterly review of
Strathprints’ usage. Consumption continues to rise, with an increase in COUNTER
downloads of 53% on the same reporting period in 2017.
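As an aside, the 53% figure is a standard period-on-period comparison. A minimal sketch of the calculation follows; the download counts used are hypothetical illustrations, since the underlying COUNTER figures are not reproduced here.

```python
# Period-on-period percentage increase, as used in COUNTER-style reporting.
# The download counts below are hypothetical illustrations only.

def percent_increase(previous: int, current: int) -> float:
    """Percentage change from the previous reporting period to the current."""
    return (current - previous) / previous * 100

# e.g. 40,000 downloads in Apr-Jun 2017 against 61,200 in Apr-Jun 2018
# corresponds to a 53% increase.
print(round(percent_increase(40000, 61200), 1))
```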
Taking the top spot this time around is a significant review and analysis of Scotland’s digital health skills from Strathclyde’s Digital Health and Care Institute. As George and Pablo have previously covered, issues of discoverability arise when grey literature is deposited without meeting acceptable standards of open publishing. This is something we are looking to rectify, and the Digital Health and Care Institute’s publications are something of a test bed in this regard. With that in mind, it is very gratifying to find the first report we have minted a DOI for riding high in the download chart. This report also enjoyed a significant presence on social media, with an Altmetric score of 48 derived through Twitter shares. All too often research outputs receive traction in the press and on social media without link-backs to the original research and, even more so, to OA versions. This is something that needs to change if we want to maximise discoverability and impact.
New Entries, Movers and Shakers. Top 10 downloaded outputs
in April-June for outputs deposited in the 9 months up to 30/06/2018.
Ten Most Downloaded Outputs 01 April 2018-30 Jun 2018
Gobinda Chowdhury continues to come out top of the pops as
ever, with a re-entry for Irene Stevens’ and Pat Cox’s 2008 British Journal of Social Work article
on child protection, last seen in our Top 20 rundown of 2017 at number 12. That
Top 10 in full:
This blog has recently focussed on the issue of publishing “best practice” – and there will be some updates on this strand of work soon. Suffice it to say that the mention of “best practice” encompasses the important concept of persistent identification and, within scholarly communications, this identification tends to be by DOI.
But what about the scholarly publications cited within a
publication? What is the point of ensuring persistent identification and access
to your work if the sources cited within do not share the same degree of persistence?!
These are important questions that have caused much gnashing of teeth within scholarly circles, especially within open science, repository, library and digital scholarship communities, where worries about maintaining the digital scholarly record and the effects of “link rot” (or “reference rot”) dominate.
A while back I blogged about Memento and problems
surrounding maintenance of the digital scholarly record. Memento remains an innovative
and important tool in combating link rot and enabling verification of the
scholarly record. But another part of solving the persistence problem is to promote
persistence at the source. That is, to encourage scholars to cite transient web
sources using a persistent identifier that points to an archived version of the
source they are citing. This approach is similar to Memento insofar as an
archive is consulted; however, the difference is that the creation of the archive
is instigated by the scholar rather than relying on a “memento” captured at an
unspecified point in the past, if indeed a memento is available.
Enabling stable citations to web sources is something that Perma.cc provides. An outcome of a project based at the Harvard Library Innovation Lab, Perma.cc enables scholars to generate permanent archives of their cited web sources and identify them using a persistent identifier. The good news is that Perma.cc is simple, free to use, and is built and supported by libraries.
By way of example, here is a Perma.cc record for a guest
blog post I wrote for the CORE blog. Blogs, incidentally, are a prime candidate
for reference rot owing to the fact that they are increasingly cited but are often
mounted on unstable infrastructure or are subject to author abandonment.
https://perma.cc/JK9C-S4VN – Implementing the CORE Recommender in Strathprints: a
“whitehat” improvement to promote user interaction
This citable and persistent URI provided by Perma.cc resolves to the Perma.cc record, which includes a fully archived capture of my blog post, as captured on 11/06/2018 at 11:51am (see screen snippet below). Additional user options are available, such as viewing the Perma.cc record metadata or visiting the live page. The important thing, however, is that a fully citable, persistent archive of my blog post has been created. The digital scholarly record has been maintained.
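For anyone wanting to script this rather than use the web interface, Perma.cc also exposes a REST API. The sketch below assembles the archive-creation request and the resulting citable link; the endpoint, header format and response shape are assumptions based on our reading of the Perma.cc API documentation, so check the current docs before relying on them.

```python
# Sketch of driving Perma.cc programmatically. API_ROOT, the "ApiKey"
# authorisation header and the /archives/ endpoint are assumptions
# drawn from the public Perma.cc API documentation.

API_ROOT = "https://api.perma.cc/v1"

def build_archive_request(api_key, url, folder=None):
    """Assemble the POST request used to archive a web page."""
    payload = {"url": url}
    if folder is not None:
        payload["folder"] = folder  # optional Perma.cc folder id
    return {
        "method": "POST",
        "url": f"{API_ROOT}/archives/",
        "headers": {"Authorization": f"ApiKey {api_key}"},
        "json": payload,
    }

def citable_url(guid):
    """A Perma.cc GUID such as 'JK9C-S4VN' resolves at https://perma.cc/<GUID>."""
    return f"https://perma.cc/{guid}"
```

Sending the assembled request (for example with `requests.request(**build_archive_request(key, page))`) should return JSON containing the new record’s GUID, from which `citable_url()` yields the persistent link to cite.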
Perma.cc is available to anyone with a free registered
account. Up to 10 records per month can be preserved. But the critical thing to
note is that:
Perma.cc also offers unlimited free accounts to academic
journals and faculty members affiliated with any registrar library. Courts and
other government organizations also can qualify for unlimited free accounts.
In other words, Perma.cc is seeking the help of not just scholars,
but other interested stakeholders, to advocate for its use in order to better
maintain the digital scholarly record. And much advocating is what needs to be done here, especially from open science quarters where the creation and citing of open knowledge is key.
Restore equilibrium in digital scholarship. Use Perma.cc for citing web pages!
Strathprints joins USA-based Repository Analytics & Metrics Portal (RAMP) in UK first
George Macgregor, Repository Coordinator, University of Strathclyde
Better understanding repository usage and impact is an important part of operating repositories. Such analytics enable standards-based measurements to be taken and intelligence about repository usage to steer development objectives and better meet user requirements. In some circumstances they can also facilitate benchmarking with other repositories and institutions. We use several flavours of repository analytics as the basis of this intelligence.
RAMP is a USA-based project spearheaded by Montana State University, the Association of Research Libraries, the University of New Mexico, and OCLC Research, which seeks to examine the issues arising from reporting on digital repository usage. RAMP itself is a prototype web service that improves the accuracy of repository analytics and introduces the concept of “citable clicks”. Repository managers also benefit from a number of on-screen reports (see screens). The RAMP team have published a couple of papers about their approach which are worth dipping into to learn more.
Our membership of RAMP is a UK first and it is a pleasure to be working with the RAMP team. Thanks are extended not only to the RAMP team but also to IRUS-UK, with which RAMP is collaborating. We look forward to analysing RAMP intelligence on repository usage soon!
A number of research outputs by Strathclyde researchers, ranging from the very valuable to the extraordinary, are typically being released without meeting the basic standards of open publishing. This is mainly because they are unusual types of publication – i.e. not standard journal articles, conference proceedings, books or book chapters, but rather reports or policy papers, which automatically fall under the category of ‘grey literature’. Scholarly publishers face a great deal of often justified criticism from Open Science advocates, but this specific area is one where they definitely prove to be reliable collaborators for academics. Very few publishers these days will fail to place emphasis on aspects like using the appropriate open licence, minting the appropriate persistent identifiers (DOIs or ISBNs) or ensuring that the impact of their publications on social media is adequately tracked. In contrast, and even in cases where they are made openly available, ‘internally’ published reports or books are to a certain extent condemned to ‘digital obscurity’ if they fail to keep these aspects in mind upon their online release.
For anyone aware of how relevant these publications may be for the institution and wider society, this is very painful to watch. It also points to a certain lack of digital science skills among researchers – an issue that furthermore tends to arise in very specific fields.
As with all other aspects that relate to a given scholarly culture, changing this – even gradually – represents quite a mountain to climb. Training activities on social media competence as a critical scholarly communication skill for researchers may be designed and advertised, but it will usually prove difficult to attract very busy researchers to this kind of ‘soft’ skills training. While the HEFCE policy of requiring an early deposit of every institutional research output in Pure may help identify candidates for providing adequate support, raising awareness of these basic requirements among the key authors and departments is extremely challenging.
To climb a mountain one starts with the first step though, and it is rewarding to see that on certain occasions the internal publishing workflows do happen to meet the appropriate standards. This has recently been the case for a report arising from the field of Speech and Language Therapy. The lead author of this publication got in contact with the Library while the final version was still being written, to ask about Creative Commons licences that might suit the publication. The report was subsequently released under a CC-BY licence. Moreover, a DOI (Digital Object Identifier) was also minted for the publication and shared with the researcher so that it would feature on the document cover for citation purposes. The full-text final version of the report was published in the Strathprints repository, which allows its usage to be tracked centrally, both via the number of downloads and via its Altmetric score for impact on social media. It was no surprise to find that, just four days after its online release, the figures for the social media impact of this humble user manual on the use of ultrasound to treat speech disorders were far higher than for any other research work its authors had ever published. The lead author declared herself delighted about this.
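As a small illustration of the “DOI on the cover” step, the snippet below checks that a string has the common DOI shape and formats the resolver link that would be printed for citation. The pattern is a simplification (real DOI syntax is looser than this) and the example DOI used is entirely made up.

```python
import re

# Illustrative only: DOI suffixes are much less constrained than this
# pattern implies, and "10.1000/xyz123" is a made-up example DOI.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def is_plausible_doi(doi):
    """Loose check for the common '10.<registrant>/<suffix>' DOI form."""
    return bool(DOI_PATTERN.match(doi))

def cover_citation_link(doi):
    """DOIs are conventionally displayed as https://doi.org/<DOI> links."""
    if not is_plausible_doi(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return f"https://doi.org/{doi}"
```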
This is not to say that there is just one way of meeting these basic requirements for an effective open institutional publishing effort: posting the report or the book on an internal departmental webpage with a Google Analytics tracking mechanism would also allow its usage figures to be closely followed. It is hardly good practice, however, as it fails to benefit from the far more advanced – and rather expensive – tools for tracking research impact that Strathclyde has at its disposal.
Improving the publication of significant grey literature…
George Macgregor, Repository Coordinator, University of Strathclyde
Colleague Pablo de Castro recently blogged about best practice in the publication of “in house” grey literature. Publications falling into this category typically include reports, technical papers and policy papers. Pablo was highlighting the issues that can arise when such outputs are simply thrown up on web pages, or published in a manner which excludes them from plugging into important open access or open science infrastructure. It is something we see a lot of and it is disappointing to observe. Said Pablo:
For anyone aware of how relevant these publications may be for the institution and wider society, this is very painful to watch. It also points to a certain lack of digital science skills among researchers.
But there is good news. This brief update is to report that, since
Pablo published this blog post, we have been working with some “incubator”
academic teams to tighten up the publication of significant grey outputs. The
International Public Policy Institute (IPPI), Speech & Language Therapy and
the Digital Health & Care Institute all produce a great many “significant”
grey outputs, many attracting high levels of repository usage. Moreover, these
outputs often attract attention outside traditional scholarly discourse and
generate societal impact. It is therefore essential to ensure their publication
observes best practice.
By adhering to best publication practice we can already observe improvements to the alternative impact these reports are producing (e.g. see screens below) and, over time, one can imagine that the persistent identification of these grey outputs will enable their use, re-use, and citation over many years to come.