Module 7 Handbook
Site: CABI Academy
Course: Data Sharing Toolkit Learning Materials
Introduction
This handbook is designed to help you to answer the Module 7 activity questions.
Sustainable access to data is critical to ensure that data remains FAIR and safeguarded over the long term.
This module will enable you to:
- plan for sustainable access to data
- recognise aspects that help ensure sustainable access to data
- choose the right solution
- understand the role of a data management plan
Why and when?
Why
You need sustainable access to data:
- so existing research can be validated and built upon
- to support digitally-enabled services that rely on sustainable access to relevant data
- so data is provided in a way that enables integration, analysis and use
- to reduce risk and improve decisions
- to support data ecosystems (it gives organisations the confidence to invest in using data to develop new products, services and research)
When
You need to plan for sustainable access to data from the very beginning of the grant proposal.
As part of this, you will need to consult the people and organisations impacted by access to the data.
Ensuring sustainable access to data will incur a cost and you should budget for this at the start of the project.
Aspects that help ensure sustainable access to data
These are three key factors that will help you ensure sustainable access to data:
Policies and processes
Policies
Many research organisations and publishers have policies to ensure research is reproducible. For example, the Bill & Melinda Gates Foundation (BMGF) has its own Open Access Policy, which requires that data underlying published research results be immediately accessible and open.
You will find that many governments also have policies that require the publication and sharing of data that is essential in delivering societal, environmental and/or economic benefits. Such public sector data often includes:
- geographic data
- environmental data
- demographic data
- financial data
Processes
You should ensure processes, including Data Management Plans, cover requirements for longer term access to data.
People
You need to have a clear approach to data stewardship. This means:
- assigning roles to people so it is clear who makes decisions about, and invests in, datasets to keep them sustainable
- ensuring roles are suitably resourced with appropriate time committed to managing and sustaining data
Technology
You should use technology to curate and control access to data. This will support the people, policies and processes.
Some of the methods you may wish to examine are:
- open access platforms
- non-open data platforms allowing access over time to appropriate users
You can find guidance on preparing data for publication and a non-exhaustive list of repositories approved by the Foundation on the Gates Open Research platform.
Storing and managing data
You will find it helpful to ask yourself the following questions when choosing a data storage and management solution:
- What are the legal and contractual obligations?
- Who are the data users and what are their goals?
- What is the scope of the data reuse?
- Is the chosen solution suitable for the capability of both current and potential users?
- How will sustainable access be funded?
Each of these questions is covered in more detail below.
Legal and contractual obligations
Do you need to retain the data legally?
You should check whether the relevant governments require certain commitments or actions. For example, the Ethiopian Government requires that soil and agronomy data is maintained in a database system by the Ministry of Agriculture, to ensure that the data will be reliably accessible to the research community in the future.
Do policies and mandates require you publish the data? Under what conditions?
The BMGF open access policy requires that data underlying the published research results be immediately accessible and open.
How long do you need to make the data available for?
Does it need to be removed after a certain period of time (for example for privacy reasons)?
Data users and their goals
Your users may specify needs to access or use data over time, or may have strategic goals related to access provisions. These users may be:
- partners
- funders
- customers
- users of a service
You may find the data ecosystem mapping tool helpful to identify existing and potential re-users of data and define requirements for sustainability.
Scope of data reuse
You can ask these questions to establish the scope of data reuse:
- Was the data collected purely to back up research, so that it only needs to remain available?
- Is the data suitable for use in digitally enabled services, requiring ongoing updates and maintenance to ensure relevance?
When choosing a repository, the desirable properties are that it enables:
- access to the dataset
- dataset persistence
- dataset stability
- searching and retrieval of datasets
You can find guidance on preparing data for publication and a non-exhaustive list of repositories approved by the Foundation on the Gates Open Research platform.
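As a rough illustration, the four desirable properties listed above can be treated as a checklist when comparing candidate repositories. The Python sketch below is one possible way to record such an assessment. The class name `RepositoryAssessment`, its field names and the example repository are hypothetical, and are not taken from any Foundation guidance.

```python
from dataclasses import dataclass


@dataclass
class RepositoryAssessment:
    """Hypothetical checklist of the four desirable repository properties."""
    name: str
    enables_access: bool       # access to the dataset
    ensures_persistence: bool  # dataset persistence
    ensures_stability: bool    # dataset stability
    supports_search: bool      # searching and retrieval of datasets

    def missing_properties(self) -> list[str]:
        """Return the desirable properties this repository does not meet."""
        checks = {
            "access to the dataset": self.enables_access,
            "dataset persistence": self.ensures_persistence,
            "dataset stability": self.ensures_stability,
            "searching and retrieval of datasets": self.supports_search,
        }
        return [label for label, met in checks.items() if not met]


# Illustrative values only; a real assessment would be based on the
# repository's published policies.
candidate = RepositoryAssessment(
    name="Example Repository",
    enables_access=True,
    ensures_persistence=True,
    ensures_stability=False,
    supports_search=True,
)
print(candidate.missing_properties())
```

A record like this is only a prompt for discussion. It does not replace checking the repository against your legal and contractual obligations.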
Capability of both current and potential users
You will need to consider the capability of others in your ecosystem.
In addition to making data FAIR, consider if you should:
- document your data - describe attributes, features and limitations
- showcase existing ways to use the data
- become an active member of the community, allowing others to reach out to you for help or to make suggestions
As a data publisher, you can use self-assessment tools to show that you are following best practices in enabling data re-users to use data with confidence.
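One lightweight way to document a dataset is to store a machine-readable description alongside it. The Python sketch below builds such a description and serialises it as JSON. The function name `describe_dataset`, the field names and all of the example values are illustrative assumptions. They are not a metadata standard required by this module.

```python
import json


def describe_dataset(title, attributes, limitations, example_uses, contact):
    """Build a plain dictionary describing a dataset for potential re-users."""
    return {
        "title": title,
        "attributes": attributes,      # what each column or field means
        "limitations": limitations,    # known gaps or caveats
        "example_uses": example_uses,  # existing ways the data has been used
        "contact": contact,            # who to reach out to for help
    }


# All values below are hypothetical and for illustration only.
description = describe_dataset(
    title="Example soil survey",
    attributes={
        "ph": "soil pH (unitless)",
        "sample_depth_cm": "sampling depth in centimetres",
    },
    limitations=["Samples cover a single growing season"],
    example_uses=["Regional soil acidity map"],
    contact="data-steward@example.org",
)

# Serialise the description so it can be saved next to the data files.
readme_text = json.dumps(description, indent=2)
print(readme_text)
```

Keeping the description in a structured form means it can be published with the dataset and read by both people and software.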
Funding sustainable access
If there is an expectation that the current data holders will provide sustained access, you will need to ask how this will be funded.
As an alternative you could consider depositing data with a third party. In this case you need to question their approach to sustainable access to data, including:
- funding
- conflicting interests
- commercial interest
For example, an agreement between the research repository Figshare and the LOCKSS (Lots of Copies Keep Stuff Safe) alliance means that if a “trigger event” occurs at Figshare (e.g. it ceases to exist or a commercial interest changes access permissions), the controlled copy in the LOCKSS archive can be released.
Types of data repositories
You will find that each type of data repository fulfils a different need and set of requirements.
Research data repositories
There are two types of research data repository, both of which broadly allow anyone to contribute data:
1: Discipline-specific
Examples include:
2: Interdisciplinary
Examples include:
Many are backed by large communities of funders as well as academic journals.
Government data platforms
These provide a platform through which any government or country-specific data can be accessed.
Governments often have clauses which require anyone working with specific types of data, be it as a government department or third party organisation, to make data available via the official government platform.
Examples include
Curated data repositories
These repositories:
- provide sustainable access to carefully managed and curated datasets
- offer data services (such as direct access to the data via an API rather than just file downloads)
- are sustained through country memberships and donor contributions
- provide the most flexibility for specific data services
If you are considering a curated data repository, you should be aware that establishing and sustaining one is challenging, and many cannot be sustained over the long term.
Examples include:
Code and data platforms
These are hybrid platforms somewhere between the research data repository and the curated data repository. They can offer you management features such as:
- version control
- ingest pipelines
- validation services
One of the most popular platforms for both code and data is GitHub.
Using multiple data repositories and platforms
You could consider choosing a number of solutions to give you multiple benefits. This approach means:
- data can be replicated and transformed into multiple versions simultaneously
- you can provide for more people if the legally-required solution is not appropriate
- you must plan early where to deposit your data
Data management plans
A data management plan will help you:
- outline how data is handled during a project
- outline how data is handled on completion of a project
- consider data management before project commencement
- ensure data is safeguarded and widely shared
You should involve all stakeholders, including financial donors, in creating a data management plan. It is a collaborative, iterative process.
You can recognise a good data management plan because it will feature:
- a data inventory that lists the data and identifies any third party rights in the data
- a list of platforms and agreements that support the sustainability of the data (e.g. in government data platforms)
- a set of roles and responsibilities related to the continued management of data
- a clear and realistic budget
- any training or capacity development needs
You can use the Developing a data management plan checklist to put in place an effective data management plan.
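To make the five features listed above more concrete, the Python sketch below models a data management plan as a simple record and reports which sections are still empty. The class name `DataManagementPlan`, its field names and the example entries are hypothetical. This is an illustrative structure under those assumptions, not the format used by the checklist.

```python
from dataclasses import dataclass, field


@dataclass
class DataManagementPlan:
    """Hypothetical structure mirroring the five features of a good plan."""
    data_inventory: list = field(default_factory=list)            # datasets and third party rights
    platforms_and_agreements: list = field(default_factory=list)  # where data will be sustained
    roles_and_responsibilities: dict = field(default_factory=dict)
    budget: dict = field(default_factory=dict)
    training_needs: list = field(default_factory=list)

    def missing_sections(self):
        """Return the names of sections that have not been filled in yet."""
        return [name for name, value in vars(self).items() if not value]


# Illustrative entries only.
plan = DataManagementPlan(
    data_inventory=[
        {"dataset": "Example survey", "third_party_rights": "none identified"}
    ],
    roles_and_responsibilities={"data steward": "named project staff member"},
)
print(plan.missing_sections())
# prints ['platforms_and_agreements', 'budget', 'training_needs']
```

An empty section is a prompt to revisit the plan with stakeholders. In some projects a section such as training needs may legitimately remain empty.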
Tools and guides
You can use the following tools and guides to support data sharing:
- Cheat sheet: Ensuring sustainable access to data
- Guide: Ensuring sustainable access to data
- Guide: Developing a data management plan
- Checklist: Developing a data management plan
- Checklist: How to create a data inventory
- Gates Open Research Data guidelines
- Data ecosystem mapping tool
Summary
You can find all the key points from this module in the Cheat sheet: Ensuring sustainable access to data.
Don't forget to complete the Module 7 activity questions to review your knowledge of this topic.