
Below you will find information and guidance to support you with:
Preserving research data is essential to ensure long-term access, prevent loss from technology obsolescence, and to fulfil compliance requirements from funders and institutions. Preservation of research data in a trusted digital repository, such as ORDO, is also a first step in being able to share that data.
Data sharing is at the heart of research data management. It is one of the key reasons why researchers are being encouraged to manage research data: so that the data can be shared with others, whether the general public, other researchers in their field, or even themselves at a future date.
The coolest thing to do with your data will be thought of by someone else.
Rufus Pollock
Cambridge University and Open Knowledge Foundation
It is now widely accepted that sharing research data can benefit both the public and the research community. Whilst it's true that most researchers think about sharing their data because their funders require it, there are other good reasons to share, not all of them altruistic:
When you share research data, it is important that those data are documented and made available in a way that enables others to find, understand and re-use them. The FAIR data principles offer a framework for ensuring that data have long-term value. You should keep the FAIR principles in mind throughout your research, but they’re especially important when preparing to share data.
Decisions about preserving data should begin during the data management planning stage, and should consider institutional, funder and repository requirements.
Most funders expect data underlying published papers, and other data of long-term value, to be preserved for 10 to 30 years and made available if possible.
For projects with extremely large datasets, retaining full datasets may not be possible. In these instances, it may be appropriate to dispose of datasets when no longer required, but to retain data samples, along with detailed methodologies or code to allow the data to be recreated.
The Digital Curation Centre provides comprehensive advice on deciding what data to keep.
Files that are not required for long-term preservation should be deleted when they have fulfilled their purpose. Researchers have a legal responsibility for collected data, and personal data should be disposed of in an appropriate manner. This will typically be set out during the Data Protection Impact Assessment that occurs prior to research being undertaken. See the Open University Data Protection intranet site (log-in required) for more information.
Further guidance on the retention and destruction of data can be found in The Open University’s Retention Schedule (log-in required).
Once you’ve selected data for preservation, some preparation is required before it is ready to deposit into a data repository:
See the Ethics and Data Protection section of our website for further information. Consent to share data must be considered during the data management planning process, as it is often difficult or impossible to re-contact participants to gain consent retrospectively. If you’re unsure whether you have adequate consent in place, please contact us.
You must clearly identify rights-holders, because your authorisation to archive the dataset depends on their permission. Remember that by sharing your data via an archive you are also distributing them (unless they are kept under permanent embargo), and doing this without the authorisation of the rights-holder(s) will be a breach of copyright law.
If you are using third-party materials (i.e. materials where the copyright is owned by someone else), you will need to check the licence or terms and conditions associated with those materials to find out how you may use them, and how the rights owner should be attributed. If you need help understanding a licence or terms, let us know. If you are unable to include the third-party materials, you could consider linking out to them instead.
Research participants can also hold copyright in research data, particularly when they produce original, expressive, or creative content during the research process, such as in interviews, written responses, or creative arts projects. For this reason, it is essential that participant consent forms and information sheets clearly outline the intended uses of data, and specify any rights being waived or granted for research, sharing, or publication purposes.
Choose formats for preservation:
Plan how you want to organise your data within your chosen repository. For example, on ORDO it is possible to add multiple files to an individual item record, to upload folders of files while maintaining their hierarchical structure, and to group multiple items into a Collection.
An essential part of data preservation and sharing is including sufficient accompanying metadata and documentation. For guidance on how to do this see the Describing Your Data (Metadata) page of our website.
Research funders often require grant recipients to use their own funder-specific repositories, whilst publishers may ask submitting authors to use a specific data service. Researchers may want to use subject-specific repositories, as this can make their research outputs more visible to a particular research community.
There is no requirement to deposit research data in the OU’s data repository (ORDO). However, if you choose to use an external repository you are required to publish a metadata record in ORDO which details how the data can be accessed. Instructions on how to do this can be found on our ORDO guidance webpage.
You can share your data in the University's own research data repository, Open Research Data Online (ORDO). Guidance and policies for using ORDO are available on the ORDO page of this site.
Sometimes, external repositories are an appropriate choice, particularly where they provide a discipline-specific service. External data repositories will often provide specialist assistance with the preparation of your data for deposit.
re3data is a global registry of research data repositories from various academic disciplines, which allows you to find an appropriate subject- or discipline-specific repository for your research data.
The Digital Curation Centre has published a checklist for evaluating data repositories which will help you to evaluate the quality of services provided by data repositories.
Here are some of the main funder repositories:
There are a number of general-purpose repositories which enable data sharing. This is a list of the most popular ones.
Some of these repositories are widely used in particular academic disciplines, so using them may improve the discoverability of your data. However, they do not offer the same level of user support or data curation as institutional, funder or discipline-specific repositories, so they are seldom the best choice for data sharing.
The Open University Research Data Community on Zenodo is a curated community for datasets and software originating from research undertaken at The Open University (UK) by researchers affiliated to the University. The Open University expects its researchers to comply with the institutional Research Data Management Policy. The Open University Research Data Community is managed and curated by The Open University Library Research Support team. All submissions will be reviewed before acceptance, and revisions to metadata and files may be required prior to acceptance.
Although the OU’s Research Data Management Policy and most funder and publisher policies expect data to be made openly available wherever possible, there may be legal, ethical or contractual reasons to withhold access to research data. However, it may still be possible to publish and share your data, either by obtaining consent and anonymising the data, or by restricting access to the data through a data repository.
If you restrict access to data, your publications should still include a data access statement providing the reasons why the data are not openly available and, if applicable, the conditions that must be met for access to be granted.
There are several access levels for data sharing, which can be achieved securely through a data repository. All of the following options are available on ORDO:
In rare situations it might be necessary to restrict access to both the data and to the metadata describing the data. If you think this might apply to your research, please contact the Library Research Support team for advice.
When sharing your data, clear guidance on what re-users can and can’t do with your data helps disentangle some of the complexities and ambiguities surrounding rights. Licensing your data is a good way of clarifying the terms of use.
It is important to consider how you want your data to be reused. You can then apply a licence that most closely reflects those intended uses. Applying an explicit licence removes any ambiguity over what users can and cannot do with your data.
Lawyers can craft licences to meet specific criteria, but there are a number of open licences developed for widespread use that anyone can apply. There are many advantages to using standard licences rather than bespoke ones: as well as enhancing organisational efficiency and saving costs, standard licensing terms can lead to greater interoperability of data and increased user awareness of the licence terms, thereby enabling better compliance. If you are developing software, see our guidance below on Preserving and Sharing Research Software for advice on choosing an appropriate licence.
Standard open licences include:
The most commonly used licences are Creative Commons licences, which come in several variations that determine how the material can be shared and reused. As a general rule, we recommend the Creative Commons Attribution (CC BY) licence for research data: it allows anyone to adapt, reuse and redistribute your data, provided they credit you as the author. More restrictive licences should only be used if there is a justification for doing so, for example, to protect commercial or other confidential interests. Be particularly wary of the No Derivatives (ND) licences, which prohibit others from adapting, remixing, transforming, translating or updating your work in any way that creates a derivative. Because people are not allowed to create adaptations, an ND licence could prevent someone from recombining and reusing your data for new research, which is bad for innovation. See the Understanding Creative Commons Licences section of our website for further detail on licence variations.
Linking research data to publications and other outputs is a critical practice for enhancing research transparency and complying with funder and publisher mandates. If you have taken the time and effort to deposit and share your data in a repository but don’t link it to your outputs, your efforts are likely to be wasted, as others will struggle to find it.
The primary method of linking data to outputs is to deposit the data in a trusted digital repository (such as ORDO), obtain a persistent identifier (such as a DOI), and cite this in the publication via a Data Access Statement. However, it’s important to also remember to reference or link out to any research publications/outputs from the dataset record in the repository too, so that a reciprocal connection exists between data and outputs. On ORDO, you can make use of the “Related materials” section to link to any papers, publications, or other outputs that your output informs, is informed by, or is related to.
Data Access Statements (DAS), sometimes called “Data Availability Statements”, are used in publications to describe where supporting research data (including code, images, sounds, textual records, objects, etc.) can be found and under what conditions they can be accessed.
UKRI and some other funders and publishers require papers to include a Data Access Statement, even where there are no data associated with the publication, or the data are inaccessible.
Your journal’s or publisher’s guidance for authors should indicate the format and placement of a Data Access Statement. If no ‘Data access’ or ‘Data availability’ section is specified, we suggest placing your statement in the ‘Acknowledgements’ section.
We also recommend that you include a full citation to the data in your main references list.
Data access statements should typically include the following information (and examples are provided in the next section, below):
Please note that a direction to contact the author for access to data would not normally be considered acceptable by most funders and publishers. If email contact is required to request access to data, consider setting up a shared email address for your research group or using an existing departmental address.
The following are examples of data access statements covering a variety of scenarios.
Research software is a fundamental part of research. As with other research outputs, there are good reasons for preserving software and making it available for future use. The benefits of preserving and sharing research software include:
Many funding bodies and journal publishers now expect software that supports research findings to be made publicly available upon publication.
A source code repository is an online service that supports version control and other software development tools such as file hosting, bug tracking and issue tracking. Popular code repositories include GitHub, Bitbucket and GitLab. Hosted services are especially useful for working with collaborators across more than one institution.
Software documentation provides essential contextual information which describes the software and tells others how the software was created, how it operates and how it can be used. Software documentation can be embedded in the source code or included in written texts that accompany the software. As a minimum, you should include a README file with your software.
A README file typically contains information such as:
GitHub automatically displays the content of a file named README on your repository's top-level page. Consider using Markdown syntax for your README file (for example, README.md), since GitHub will render it in a format that is more readable and accessible.
Every GitHub repository includes a wiki which can also be used to add descriptive information about your software. For help creating a README file and a wiki for software projects in GitHub see Documenting your projects on GitHub.
Code hosting services such as GitHub, Bitbucket or GitLab are not suitable for archiving software. If you have software that supports published findings, is needed to reproduce or replicate research results, or has potential for use in future research, we recommend archiving a copy with a data repository such as ORDO (Open Research Data Online, the OU's research data repository). As well as ensuring long-term preservation of and access to the software, ORDO assigns DOIs (Digital Object Identifiers) to software outputs, making it easier to cite and track specific versions of your software.
ORDO has an integration to support the automated deposit of software from GitHub. For further information see How to connect ORDO with your GitHub account.
For additional support from the Software Sustainability Institute see Software deposit: Where to deposit software.
A software licence is an agreement between a developer and a user that sets out conditions for the use and distribution of software. An open source licence allows users to use, copy, modify and redistribute the software, provided that credit is given to the author. Because open source licences place minimal restrictions on how others can use software outputs, they are an important tool for making software accessible and reusable.
When you deposit your software in a repository, you will typically be given the option of choosing a licence. Popular open source licences for software include Apache 2.0, the GNU General Public License (GPL) and MIT. These are all available to choose from when uploading software to ORDO.
You should refer to the Open Source Software licences (pdf) guidance from the Open University RES Enterprise and Commercialisation team. A comprehensive list of open source licences, with detailed information on each, is available from the Open Source Initiative's web page: Licenses and Standards.
If you are planning to modify and/or redistribute someone else's code, check to make sure that this is compatible with the permissions granted under the original licence.
Open source licences are unlikely to be appropriate for software that is subject to contractual restrictions set by third parties, or where the intellectual property (IP) rights for the software are owned by the Open University and have potential for commercialisation. The Open University's Research Intellectual Property Policy and Commercialisation Handbook (which contains further guidance on IP, knowledge exchange, material transfer agreements and non-disclosure agreements) is available from the Research, Enterprise and Scholarship intranet pages. For further help and advice on Intellectual Property, please contact the RES Enterprise team.
Research software should be given the same importance as any other research output and considered citable in the same way as an article or monograph.
If you are making your software publicly available via ORDO or another research data repository, you can encourage others to cite your software by including a recommended citation in your README file or by including a CITATION file. If the software supports published findings, you should also include a recommended citation to the published paper. This can be added to the data access statement.
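One widely used way to provide a CITATION file is the Citation File Format (CFF): a plain-text YAML file named CITATION.cff, which GitHub can display as a "Cite this repository" option. This guidance does not prescribe a format, so treat the following as a minimal sketch, assuming CFF version 1.2.0. The function name `citation_cff` and all example values are hypothetical placeholders.

```python
# Minimal sketch: build the text of a CITATION.cff file.
# Assumption: Citation File Format (CFF) 1.2.0; all example values are hypothetical.
# Note: values are wrapped in double quotes but not escaped, so avoid quotes in them.


def citation_cff(title, authors, version, doi, date_released):
    """Return CITATION.cff text.

    authors: list of (family_names, given_names) tuples.
    date_released: ISO 8601 date string, YYYY-MM-DD.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        "type: software",
        f'title: "{title}"',
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    lines += [
        f'version: "{version}"',
        f'doi: "{doi}"',
        f"date-released: {date_released}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical project details; replace with your own before saving as CITATION.cff.
    print(citation_cff(
        title="Example Analysis Toolkit",
        authors=[("Smith", "Jane")],
        version="1.0.0",
        doi="10.1234/example.doi",  # placeholder, not a registered DOI
        date_released="2024-01-31",
    ), end="")
```

Save the output as CITATION.cff in the top level of your repository. If ORDO or another repository has assigned your software a DOI, use that DOI here.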
If you have used someone else's software in your research project, you should include a citation in your published paper. This could be in the acknowledgements, footnotes, methods section or appendices, depending on publisher requirements. It is not necessary to acknowledge all the software used: the Software Sustainability Institute recommend citing software only if it played a critical role in your research and made a novel contribution.
The Australian National Data Service (ANDS) recommend using these core citation elements:
Author, Publication Year, Title, Version, Publisher, Resource type (Software), Identifier (e.g. DOI)
Example:
Xu, C., & Christoffersen, B. (2017). The Functionally-Assembled Terrestrial Ecosystem Simulator Version 1. Los Alamos National Laboratory (LANL), Los Alamos, NM (United States). [Software]. https://doi.org/10.11578/dc.20171025.1962
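Because the core elements always appear in the same order, they can be assembled mechanically. The following is a minimal sketch in Python that assumes the ordering and punctuation of the example above (an APA-like style). The `SoftwareCitation` class is hypothetical, so follow your publisher's own style wherever it differs.

```python
from dataclasses import dataclass


@dataclass
class SoftwareCitation:
    """Core citation elements for software (hypothetical helper)."""
    authors: list[str]   # each already formatted, e.g. "Xu, C."
    year: int
    title: str
    version: str
    publisher: str
    identifier: str      # a bare DOI or a full https://doi.org/ URL

    def format(self) -> str:
        if not self.authors:
            raise ValueError("at least one author is required")
        # Join authors APA-style: "A", "A, & B", "A, B, & C".
        if len(self.authors) == 1:
            names = self.authors[0]
        else:
            names = ", ".join(self.authors[:-1]) + ", & " + self.authors[-1]
        # Turn a bare DOI into a resolvable link.
        link = (self.identifier if self.identifier.startswith("http")
                else f"https://doi.org/{self.identifier}")
        return (f"{names} ({self.year}). {self.title} Version {self.version}. "
                f"{self.publisher}. [Software]. {link}")
```

With the values from the example above, `format()` reproduces that citation exactly, which you can confirm by comparing the two strings.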
Some publishers and software providers have also published guidelines for software citation. Examples can be found on the Software Sustainability Institute's webpage How to cite software.
In addition to making your software publicly available, you might also consider publishing a paper describing your software. Journals that specialise in publishing peer-reviewed software papers include:
For a more detailed list of journals which accept software papers, see the Software Sustainability Institute's web guide In which journals should I publish software?
Research software is now being understood as a type of digital object to which the FAIR Guiding Principles (Findable, Accessible, Interoperable and Reusable) should be applied. This reflects the research community's growing understanding of the crucial role FAIR research software plays in maximising research value. The FAIR for Research Software (FAIR4RS) Working Group has adapted the FAIR Guiding Principles to create the FAIR Principles for Research Software (FAIR4RS Principles), published in 2022, which set out the principles and provide examples of how they can be applied to different types of research software. By following the guidance above, you will already be going some way towards making your research software FAIR.
The Software Sustainability Institute website provides comprehensive guidance and links to resources. See in particular their Guides for Researchers and Guides for Developers.
Adapted with permission from Imperial College London's guidance on Making research software open and shareable.
FAIR stands for Findable, Accessible, Interoperable and Reusable. The aim is that open data 'can be freely used, modified, and shared by anyone for any purpose' (The Open Definition).
The FAIR principles were originally published in a 2016 article in Scientific Data. Since then, the FAIR Principles have achieved widespread acceptance, and have been adopted as standards for management of data, development of infrastructure and delivery of services.
This means other scholars and computers can find your data due to strong metadata and persistent identifiers (DOIs).
What does this mean?
Your data are described with metadata; your data and metadata have a globally unique and persistent identifier; and your data and metadata are registered or indexed in a searchable resource, such as a research data repository.
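Before you cite or record a persistent identifier, you may want to check that a string at least looks like a DOI. The sketch below performs only a loose syntactic check. It assumes the regular expression that Crossref has suggested for matching modern DOIs ("10." followed by a 4 to 9 digit registrant code, a slash and a suffix). It does not confirm that the DOI is registered or resolves, and `looks_like_doi` is a hypothetical helper name.

```python
import re

# Loose DOI pattern, based on the expression Crossref has suggested for modern DOIs.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Common prefixes that wrap a DOI; stripped before matching.
RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def looks_like_doi(value: str) -> bool:
    """Return True if value is syntactically DOI-like (not proof that it resolves)."""
    value = value.strip()
    for prefix in RESOLVER_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    return bool(DOI_PATTERN.match(value))
```

For example, `looks_like_doi("https://doi.org/10.11578/dc.20171025.1962")` returns `True`, while an ordinary web address such as `https://example.org/data` returns `False`.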
What to do:
Your data and/or metadata also need to be accessible to both humans and computers.
What does this mean?
Your data are available and downloadable from a reputable repository.
What to do:
As the aim is to speed up discovery and uncover new insights, research data should be easy to combine with other datasets, applications and workflows, by both humans and computer systems.
What does this mean?
Your data exist in file formats that are not dependent on proprietary or obsolete software.
What to do:
Research data should be ready for future research and processing. It should be clear that findings can be replicated and the data/results can be used to build/develop new research questions.
What does this mean?
There is proper documentation to support interpretation of the data, and a clear and accessible data usage licence.
What to do: