Preserving and sharing data and software

Below you will find information and guidance to support you with preserving and sharing research data and software.

Why preserve and share data

Preserving research data is essential to ensure long-term access, prevent loss through technological obsolescence, and fulfil compliance requirements from funders and institutions. Preservation of research data in a trusted digital repository, such as ORDO, is also a first step in being able to share that data.

Data sharing is at the heart of research data management. It is one of the key reasons why researchers are encouraged to manage research data: so that the data may be shared with others, whether the general public, other researchers in their field, or even themselves at a future date.

The coolest thing to do with your data will be thought of by someone else.

Rufus Pollock
Cambridge University and Open Knowledge Foundation

It is now widely accepted that sharing research data can benefit both the public and the research community. Whilst it's true that most researchers think about sharing their data because their funders require it, there are other good reasons to share, not all of them altruistic:

  • Increase research impact - those who make use of your data and cite it in their own research will help to increase your impact within your field and beyond it. Users of your data may include those in other disciplines, sectors, and countries
    • There is evidence that studies that make their data openly available receive 9%-30% more citations than those that do not (see Piwowar, H. and Vision, T.J. (2013) "Data reuse and the open data citation advantage", https://peerj.com/articles/175/)
  • Research integrity - publishing your data and citing its location in published research papers allows others to replicate, verify, validate, or correct your results, thereby improving the scientific record and fostering a culture of transparency, accountability and collaboration
  • Your own future use - by preparing your data for sharing with others, you will benefit by being able to identify, retrieve, and understand the data yourself after you have lost familiarity with it, perhaps several years later
  • Teaching purposes - your data may be ideal for students to learn how to collect and analyse similar types of data themselves
  • Funding mandates - UK research councils are increasingly mandating data sharing so as to avoid duplication of effort and save costs. Check with your individual funder what their data sharing requirements are.
  • Publisher mandates - increasingly, publishers insist that underlying data be made openly available. Check with your publisher what their policy is.

When you share research data, it is important that those data are documented and made available in a way that enables others to find, understand and re-use them. The FAIR data principles offer a framework for ensuring that data have long-term value. You should keep the FAIR principles in mind throughout your research, but they’re especially important when preparing to share data. 

Selecting data for preservation and sharing

Decisions about preserving data should begin during the data management planning stage, and should consider institutional, funder and repository requirements.

Most funders expect data underlying published papers, and other data of long-term value, to be preserved for 10 to 30 years and made available if possible.

For projects with extremely large datasets, retaining full datasets may not be possible. In these instances, it may be appropriate to dispose of datasets when no longer required, but to retain data samples, along with detailed methodologies or code to allow the data to be recreated.

The Digital Curation Centre provides comprehensive advice on deciding what data to keep.

Files that are not required for long-term preservation should be deleted when they have fulfilled their purpose. Researchers have a legal responsibility for collected data, and personal data should be disposed of in an appropriate manner. This will typically be set out during the Data Protection Impact Assessment that occurs prior to research being undertaken. See the Open University Data Protection intranet site (log-in required) for more information.

Further guidance on the retention and destruction of data can be found in The Open University’s Retention Schedule (log-in required). 

Preparing data for preservation and sharing

Once you’ve selected data for preservation, some preparation is required before it is ready to deposit into a data repository:

Check you have gained (and recorded) valid ethical consent to preserve and/or share the data

See the Ethics and Data Protection section of our website for further information. This must be considered during the data management planning process, as it is often difficult or impossible to re-contact participants to gain consent retrospectively. If you’re unsure if you have adequate consent in place, please contact us.

Identify rights holders and obtain permissions if necessary

You must clearly identify rights-holders, because your authorisation to archive the dataset depends on their permission. Remember that by sharing your data via an archive you are also distributing them (unless they are kept under permanent embargo), and doing this without the authorisation of the rights-holder(s) will be a breach of copyright law. 

If you are using third-party materials (i.e. materials where the copyright is owned by someone else), you will need to check the licence or terms and conditions associated with that data to find out how you are able to use it, and how the rights owner should be attributed. If you need help understanding a licence or terms, let us know. If you are unable to include the third-party materials, you could consider linking out to them instead.

Research participants can also hold copyright in research data, particularly when they produce original, expressive, or creative content during the research process, such as in interviews, written responses, or creative arts projects.  For this reason, it is essential that participant consent forms and information sheets clearly outline the intended uses of data, and specify any rights being waived or granted for research, sharing, or publication purposes.

Finalise your dataset

Choose formats for preservation:

  • Where possible, choose or convert data to non-proprietary or widely-used formats (no specialist software required to open), e.g. .txt, .csv, .html.
  • If specialist software is required, include details of the software in the metadata, convert your files to an open format where possible, and archive both.
  • How rich do your data need to be? Use lossless or uncompressed formats where file size and storage space are not an issue.
  • Find the best file format for preservation for your data type.
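
As an illustration of converting data to a non-proprietary format, the following is a minimal sketch that writes tabular records to CSV text using only the Python standard library. The field names and values are hypothetical, standing in for data exported from specialist software; a real conversion would depend on the source format.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialise a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical measurements used purely for illustration.
measurements = [
    {"sample_id": "S001", "temperature_c": 21.4},
    {"sample_id": "S002", "temperature_c": 19.8},
]
csv_text = records_to_csv(measurements, ["sample_id", "temperature_c"])
print(csv_text)
```

Writing an explicit header row keeps the column names with the data, so the file can be understood without the software that produced it.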

Plan how you want to organise your data within your chosen repository. For example, on ORDO it is possible to add multiple files to an individual item record, to upload folders of files which maintain hierarchical structures, and to group multiple items into a Collection.

An essential part of data preservation and sharing is including sufficient accompanying metadata and documentation. For guidance on how to do this see the Describing Your Data (Metadata) page of our website. 

Where to preserve and share data

Research funders often require grant recipients to use their own funder-specific repositories, whilst publishers may ask submitting authors to use a specific data service. Researchers may want to use subject-specific repositories, as this can help ensure their research outputs are visible to a particular research community.

There is no requirement to deposit research data in the OU’s data repository (ORDO). However, if you choose to use an external repository you are required to publish a metadata record in ORDO which details how the data can be accessed. Instructions on how to do this can be found on our ORDO guidance webpage.

Open University Institutional Repository (ORDO)

You can share your data in the University's own research data repository, Open Research Data Online (ORDO). Guidance and policies for using ORDO are available on the ORDO page of this site.

External Repositories

Sometimes, external repositories are an appropriate choice, particularly where they provide a discipline-specific service. External data repositories will often provide specialist assistance with the preparation of your data for deposit. 

re3data is a global registry of research data repositories from various academic disciplines, which allows you to find an appropriate subject- or discipline-specific repository for your research data.

The Digital Curation Centre has published a checklist for evaluating data repositories which will help you to evaluate the quality of services provided by data repositories. 

Funder repositories

Here are some of the main funder repositories:

  • UK Data Service ReShare – funded by ESRC but will accept any research data which fulfils their Collection Policy (pdf)
  • NERC Data Centres - NERC has a network of environmental data centres that provide a focal point for NERC's scientific data and information. These centres hold data from environmental scientists working in the UK and around the world. The data centres are responsible for maintaining environmental data and making them available to all users, not just NERC researchers but others from science, commerce, government and education as well as the general public
  • STFC Data Centres - several data centres, services and research portals are in place, such as the UK Solar System Data Centre and the Physical Sciences Data-Science Service. These are generally organised on a subject basis rather than serving the outputs of the whole council, and deposit is not mandated

General-purpose repositories

There are a number of general-purpose repositories which enable data sharing. Some of the most popular are listed below.

Sometimes these repositories are widely used in particular academic disciplines, so their use may improve discoverability of data. However, these repositories do not offer the same level of user support or data curation as institutional, funder or discipline-specific repositories, so are seldom the best choice for data sharing.

  • Figshare – an online repository which allows researchers to upload and share their research data. It is free to upload unlimited public data. With in-browser visualisation and automatically assigned DOIs, this service is gaining in popularity with researchers from a range of academic disciplines. Note that ORDO is an institutional version of Figshare. Datasets uploaded to ORDO are indexed by Figshare search, so we recommend you always choose ORDO over Figshare.
  • Zenodo - this free online repository was developed by CERN as part of the EU OpenAIRE project and is aimed at the long-tail of science. There is a maximum threshold for upload of 50GB per file and you are able to include multiple files in one dataset or collection
    • The Open University Research Data Community on Zenodo
  • Dryad - hosts research data underlying scientific and medical publications free of charge. Most data in the repository are associated with peer-reviewed journal articles, but data associated with non-peer reviewed publications from other reputable sources (such as dissertations) are also accepted
  • Open Science Framework (OSF) - enables users to store data, code, and other materials in OSF Storage, or to connect a Dropbox or other third-party account. Every file gets a unique, persistent URL for citing and sharing.

The Open University Research Data Community on Zenodo is a curated community for datasets and software originating from research undertaken at The Open University (UK) by researchers affiliated to the University. The Open University expects its researchers to comply with the institutional Research Data Management Policy. The Open University Research Data Community is managed and curated by The Open University Library Research Support team. All submissions will be reviewed before acceptance, and revisions to metadata and files may be required prior to acceptance.

Restricted data sharing

Although the OU’s Research Data Management policy and most funder and publisher policies expect data to be made openly available wherever possible, there may be legal, ethical, or contractual reasons to withhold access to research data. However, it may still be possible to publish and share your data, either through the use of consent and anonymisation or by restricting access to data through a data repository.

If you restrict access to data, your publications should still include a data access statement providing the reasons why the data are not openly available and, if applicable, the conditions that must be met for access to be granted. 

Levels of data access

There are several access levels for data sharing, which can be achieved securely through a data repository. All of the following options are available on ORDO:

  • Open access: make data openly available for download directly from a data repository
  • Partial publication: consider whether restrictions only apply to a subset of the data (such as raw data containing personal identifiers) thus allowing you to openly share the remaining data
  • Temporary embargo: delay access to data for a specified period to allow you to complete your analyses or publications, or while patents are filed and the research commercialised
  • Permanent embargo: preserve data on a repository, but under a permanent embargo so that only the depositor is able to access it
  • Restricted access: place your data under a permanent embargo, and enable third party researchers to request access to datasets. You decide whether, and how, the data will be accessed
  • Metadata only record:  publish a record of your dataset that includes a description of the data but not the data itself; this might be suitable if your data are physical (artefacts or tissue samples, for example), or if your data are highly sensitive but you need to publish a record of your data to meet policy requirements

In rare situations it might be necessary to restrict access to both the data and to the metadata describing the data. If you think this might apply to your research, please contact the Library Research Support team for advice.

Licensing research data

Why should I license my data?

When sharing your data, clear guidance on what re-users can and can’t do with your data helps disentangle some of the complexities and ambiguities surrounding rights. Licensing your data is a good way of clarifying the terms of use.

Which licence should I use?

It is important to consider how you want your data to be reused. You can then apply a licence that most closely reflects those intended uses. Applying an explicit licence removes any ambiguity over what users can and cannot do with your data.

Lawyers can craft licences to meet specific criteria, but there are a number of open licences developed for widespread use that anyone can apply. There are many advantages to using standard licences rather than bespoke ones: as well as the benefits of enhanced organisational efficiency and cost saving, the use of standard licensing terms can lead to greater interoperability of data and increased user awareness of the licence terms, thereby enabling better compliance. If you are developing software, see our guidance below on Preserving and Sharing Research Software for advice on choosing an appropriate licence.

Standard open licences include:

The most commonly used licences are Creative Commons licences, and there are several licence variations which determine how the material can be shared and reused. As a general rule, we recommend you use the Creative Commons Attribution (CC BY) licence for licensing research data, which means anyone can adapt, reuse and redistribute your data provided they credit you as the author. More restrictive licences should only be used if there is a justification for doing so, for example, to protect commercial or other confidential interests. You should be particularly wary of the No Derivatives (ND) licences: these prohibit others from adapting, remixing, transforming, translating, or updating your work in any way that makes a derivative. Because people are not allowed to create adaptations, this could prevent someone from recombining and reusing your data for new research, which is bad for innovation. See the Understanding Creative Commons Licences section of our website for further detail on licence variations.

Further guidance on licensing research data

Linking data to outputs

Linking research data to publications and other outputs is a critical practice for enhancing research transparency and complying with funder and publisher mandates. If you have taken the time and effort to deposit and share your data in a repository, but you don’t link it to your outputs, it’s likely your efforts will be wasted as nobody will be able to find it.

The primary method of linking data to outputs is to deposit the data in a trusted digital repository (such as ORDO), obtain a persistent identifier (such as a DOI), and cite this in the publication via a Data Access Statement. However, it’s important to also remember to reference or link out to any research publications/outputs from the dataset record in the repository too, so that a reciprocal connection exists between data and outputs. On ORDO, you can make use of the “Related materials” section to link to any papers, publications, or other outputs that your output informs, is informed by, or is related to.

Data Access Statements

Data Access Statements (DAS), sometimes called “Data Availability Statements”, are used in publications to describe where supporting research data (including code, images, sounds, textual records, objects, etc.) can be found and under what conditions they can be accessed.

UKRI and some other funders and publishers require papers to include a Data Access Statement, even where there are no data associated with the publication, or the data are inaccessible.

Your journal’s or publisher’s guidance for authors should indicate the format and placement of a Data Access Statement. If no ‘Data access’ or ‘Data availability’ section is specified, we suggest placing your statement in the ‘Acknowledgements’ section.

We also recommend that you include a full citation to the data in your main references list.

What to include in a data access statement

Data access statements should typically include the following information (and examples are provided in the next section, below):

  • where the data can be accessed (preferably a data repository);
  • a unique persistent identifier, such as a Digital Object Identifier (DOI) or accession number, or a link to a permanent record for the data;
  • details of any restrictions on accessing the data and a justifiable explanation (e.g. for legal, ethical or commercial reasons).

Please note, a direction to contact the author for access to data would not normally be considered acceptable by most funders and publishers. If email contact is required to request access to data, consider setting up a shared email address for your research group or using an existing departmental address.

Example data access statements

The following are examples of data access statements covering a variety of scenarios.

Openly available data
  • The data supporting the findings reported in this paper are openly available from the [insert name of repository] repository at [insert DOI].
  • This publication is supported by multiple datasets which are openly available at locations cited in the ‘References’ section of this paper.
Secondary analysis of existing data
  • This study is a re-analysis of existing data which are openly available at locations cited in the ‘References’ section of this paper. Further documentation about data processing is openly available from the [insert repository name] repository at [insert DOI].
  • This study brought together existing research data obtained upon request and subject to licence restrictions from a number of different sources. Full details of how these data were obtained are available in the documentation available at the [insert repository name] repository at [insert DOI].
Ethical constraints
  • Due to ethical concerns, supporting data cannot be made openly available. Further information about the data and conditions for access are available from the [insert repository name] repository at [insert DOI].
  • Anonymised interview transcripts from participants who consented to data sharing, plus other supporting information, are available from the UK Data Service, subject to registration at [insert DOI].
  • Due to the [insert appropriate term: ethically, politically, commercially] sensitive nature of the research, no participants consented to their data being retained or shared. Further information about the data is available from the [insert repository name] repository at [insert DOI].
  • Processed, qualitative data from this study is available from the [insert repository name] repository at [insert DOI]. Additional raw data related to this publication cannot be openly released; the raw data contains transcripts of interviews, but none of the interviewees consented to data sharing.
Commercial constraints
  • Data supporting this paper will be available from the [insert repository name] repository at [insert DOI] after a six-month embargo period from the date of publication to allow for commercialisation of research findings.
  • Due to confidentiality agreements with research collaborators, data supporting this paper can only be made available to bona fide researchers subject to a non-disclosure agreement. Details of the data and how to request access are available from the [insert name of repository] repository at [insert DOI].
Cost-effective sharing of data
  • This paper is accompanied by a representative sample of experimental data which is available from the [insert repository name] repository at [insert DOI]. Detailed procedures explaining how this representative sample was selected, and how this experiment can be repeated, are provided in the ‘Materials and Methods’ section of this paper. Additional raw data underlying this paper contain [insert relevant number] additional images. These additional images are not shared online due to size of the images ([insert relevant number]GB/image); public sharing of these images is not cost-effective, and the experiment can be easily reproduced.
Non-digital data
  • Non-digital data supporting this study are stored by the corresponding author at [insert name of institution]. Details of how to request access to these data are available from the [insert repository name] repository at [insert DOI].
No data created or analysed
  • No data were created or analysed in this study.
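
The templates above can also be filled in programmatically, which may help when several papers cite the same dataset. The following is a minimal sketch for the first "openly available data" template only; the function name is hypothetical and the DOI shown is a placeholder, not a real identifier.

```python
def open_data_statement(repository, doi):
    """Fill the 'openly available data' template with a repository name and DOI."""
    if not doi.startswith("10."):
        # DOIs begin with the directory indicator "10."
        raise ValueError("expected a bare DOI such as 10.xxxx/yyyy")
    return (
        "The data supporting the findings reported in this paper are openly "
        f"available from the {repository} repository at https://doi.org/{doi}."
    )


# Placeholder DOI for illustration only.
statement = open_data_statement("ORDO", "10.1234/example.5678")
print(statement)
```

Writing the DOI as a full https://doi.org/ link gives readers a resolvable address rather than a bare identifier.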

Sharing and preserving research software

Why preserve and share research software?

Research software is a fundamental part of research. As with other research outputs, there are good reasons for preserving software and making it available for future use. The benefits of preserving and sharing research software include:

  • allowing researchers and developers to get credit for their work
  • supporting research integrity by making it easier to reproduce and replicate research results
  • promoting open research by creating new opportunities for collaboration and the reuse of software in future research
  • helping researchers and research organisations track the use and reuse of software

Many funding bodies and journal publishers now expect software that supports research findings to be made publicly available upon publication.

How to share research software

Make use of a code repository when developing your software

A source code repository is an online service that supports version control and other software development tools such as file hosting and issue tracking. Popular code repositories include GitHub, Bitbucket and GitLab. Hosted services are especially useful for working with collaborators across more than one institution.

Document and describe your software

Software documentation provides essential contextual information which describes the software and tells others how the software was created, how it operates and how it can be used. Software documentation can be embedded in the source code or included in written texts that accompany the software. As a minimum, you should include a README file with your software.

A README file typically contains information such as:

  • the project name
  • a description of the project
  • instructions on how to install and operate the software
  • copyright and licensing information
  • where users can find help

GitHub will automatically display the content of a file named README on your repository's top-level page. Consider using Markdown syntax for your README file, since GitHub will format this in a manner that makes it more readable and accessible.
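
As a rough sketch of how the README items listed above might be laid out in Markdown, the following function assembles a skeleton from a few strings. The section headings, the function name and the example project are all hypothetical; adapt them to your own software.

```python
def build_readme(name, description, instructions, licence, support):
    """Return Markdown text covering the README items listed above."""
    parts = [
        f"# {name}",
        description,
        "## Installation and usage",
        instructions,
        "## Copyright and licence",
        licence,
        "## Getting help",
        support,
    ]
    return "\n\n".join(parts) + "\n"


# Hypothetical project details used purely for illustration.
readme = build_readme(
    name="example-tool",
    description="A hypothetical tool used to illustrate a README layout.",
    instructions="Run `python example_tool.py --help` to list the options.",
    licence="Copyright (c) the authors. Released under the MIT licence.",
    support="Open an issue in the project's repository.",
)
print(readme)
```

Saving the returned text as README.md in the top-level folder is enough for GitHub to render it on the repository's front page.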

Every GitHub repository includes a wiki which can also be used to add descriptive information about your software. For help creating a README file and a wiki for software projects in GitHub see Documenting your projects on GitHub.

Archive key versions/releases to get a DOI

Code hosting services such as GitHub, BitBucket or GitLab are not suitable for archiving software. If you have software that supports published findings, is needed to reproduce or replicate research results, or has potential for use in future research, we recommend archiving a copy with a data repository such as ORDO (Open Research Data Online - the OU's research data repository). As well as ensuring long-term preservation and access to the software, ORDO assigns DOIs (Digital Object Identifiers) to software outputs, making it easier to cite and track specific versions of your software.

ORDO has an integration to support the automated deposit of software from GitHub. For further information see How to connect ORDO with your GitHub account.

For additional support from the Software Sustainability Institute see Software deposit: Where to deposit software.

Choose a licence for your software

What is an open source licence?

A software licence is an agreement between a developer and user which provides a set of conditions for the use and distribution of software. An open source licence allows users to use, copy, modify and redistribute the software, provided that credit is given to the author. Because open source licences place minimal restrictions on how others can use software outputs, they are an important tool for making software accessible and reusable.

Which licence should I choose?

When you deposit your software in a repository you will typically be given the option of choosing a licence. Some popular examples of open source licences for software are Apache 2.0, GNU General Public License (GPL) and MIT. These are all available to choose from when uploading software to ORDO.

You should refer to the Open Source Software licences (pdf) guidance from the Open University RES Enterprise and Commercialisation team. A comprehensive list of open source licences with detailed information for each licence is available from the Open Source Initiative's web page: Licenses and Standards.

If you are planning to modify and/or redistribute someone else's code, check to make sure that this is compatible with the permissions granted under the original licence.

What if my software contains IP which can be commercialised?

Open source licences are unlikely to be appropriate for software that is subject to contractual restrictions set by third parties or where the intellectual property rights (IP) for the software are owned by the Open University and have potential for commercialisation. The Open University's Research Intellectual Property Policy and Commercialisation Handbook (which contains further guidance on IP, knowledge exchange, material transfer agreements and non-disclosure agreements) is available from the Research, Enterprise and Scholarship intranet pages. For further help and advice on Intellectual Property, please contact the RES Enterprise team.

Cite your software

Research software should be given the same importance as any other research output and considered citable in the same way as an article or monograph.

How to enable others to cite your software

If you are making your software publicly available via ORDO or another research data repository, you can encourage others to cite your software by including a recommended citation in your README file or by including a CITATION file. If the software supports published findings, you should also include a recommended citation to the published paper. This can be added to the data access statement.
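
The guidance above does not name a format for the CITATION file. One widely used option, assumed here, is the Citation File Format (a file named CITATION.cff), which GitHub can read to offer a "Cite this repository" option. The following is a minimal sketch that writes such a file's text; the function name and all project details are hypothetical placeholders.

```python
import json


def build_citation_cff(title, authors, version, doi, date_released):
    """Return minimal CITATION.cff text. `authors` is a list of (family, given) pairs."""
    q = json.dumps  # JSON string quoting is also valid YAML double-quoted syntax
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {q(title)}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {q(family)}")
        lines.append(f"    given-names: {q(given)}")
    lines += [
        f"version: {q(version)}",
        f"doi: {q(doi)}",
        f"date-released: {q(date_released)}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only; the DOI is not a real identifier.
cff = build_citation_cff(
    title="example-tool",
    authors=[("Doe", "Jane")],
    version="1.0.0",
    doi="10.1234/example.5678",
    date_released="2024-01-15",
)
print(cff)
```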

How to cite others' software

If you have used someone else's software in your research project you should include a citation in your published paper. This could be in the acknowledgements, footnotes, methods section or appendices, depending on publisher requirements. It is not necessary to acknowledge all the software used. The Software Sustainability Institute recommend citation only if the software played a critical role in your research and made a novel contribution.

What to include in a software citation?

The Australian National Data Service (ANDS) recommend using these core citation elements:

Author, Publication Year, Title, Version, Publisher, Resource type (Software), Identifier (e.g. DOI)

Example:

Xu, C., & Christoffersen, B. (2017). The Functionally-Assembled Terrestrial Ecosystem Simulator Version 1. Los Alamos National Laboratory (LANL), Los Alamos, NM (United States). [Software]. https://doi.org/10.11578/dc.20171025.1962
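
The example above can be reproduced from its individual elements. The following is a minimal sketch, assuming the element order and punctuation seen in that example; the function name is hypothetical, and publishers may require a different layout.

```python
def format_software_citation(authors, year, title, version, publisher,
                             identifier, resource_type="Software"):
    """Join the core citation elements in the order used by the example above."""
    return (
        f"{authors} ({year}). {title} {version}. {publisher}. "
        f"[{resource_type}]. {identifier}"
    )


citation = format_software_citation(
    authors="Xu, C., & Christoffersen, B.",
    year=2017,
    title="The Functionally-Assembled Terrestrial Ecosystem Simulator",
    version="Version 1",
    publisher="Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)",
    identifier="https://doi.org/10.11578/dc.20171025.1962",
)
print(citation)
```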

Some publishers and software providers have also published guidelines for software citation. Examples can be found on the Software Sustainability Institute's webpage How to cite software.

Publish a software paper

In addition to making your software publicly available, you might also consider publishing a paper describing your software in a journal which specialises in publishing peer-reviewed software papers.

For a more detailed list of journals which accept software papers, see the Software Sustainability Institute's web guide In which journals should I publish software?

Make your software FAIR

Research software is now understood as a type of digital object to which the FAIR Guiding Principles (Findable, Accessible, Interoperable and Reusable) should be applied. This reflects the research community's growing understanding of the crucial role of FAIR research software in maximising research value. The FAIR for Research Software (FAIR4RS) Working Group has adapted the FAIR Guiding Principles to create the FAIR Principles for Research Software (FAIR4RS Principles), published in 2022, which set out the principles and provide examples of how they can be applied to different types of research software. By following the guidance above, you will already be going some way towards making your research software FAIR.

Preserving and sharing research software: Additional resources

The Software Sustainability Institute website provides comprehensive guidance and links to resources. See in particular their Guides for Researchers and Guides for Developers.

Adapted with permission from Imperial College London's guidance on Making research software open and shareable.

FAIR Data Principles

FAIR stands for findable, accessible, interoperable, reusable. The aim is that open data 'can be freely used, modified, and shared by anyone for any purpose' (The Open Definition).

The FAIR principles were originally published in a 2016 article in Scientific Data. Since then, the FAIR Principles have achieved widespread acceptance, and have been adopted as standards for management of data, development of infrastructure and delivery of services.

Findable 

This means other scholars and computers can find your data due to strong metadata and persistent identifiers (DOIs).

What does this mean?

Your data are described with metadata; your data and metadata have a global unique persistent identifier; your data and metadata are registered or indexed in a searchable resource, i.e. a research data repository.

What to do:

  • You need a unique persistent identifier; many data repositories assign a persistent identifier.
  • Use consistent file naming conventions and logical folder structures when organising data.
  • Use robust file management when you archive your data.
  • Use a repository that exposes your metadata and manages your files according to your wishes. Trusted data repositories will also allocate a digital object identifier (DOI) to your data. 
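
To illustrate what checking for consistent file naming could look like, the following is a minimal sketch that flags file names which do not follow a convention. The convention used here (date, project, description and two-digit version, in lower case) is a hypothetical example and not one prescribed by this guidance.

```python
import re

# Hypothetical convention: YYYYMMDD_project_description_vNN.ext
# (lower-case letters and digits, with hyphens allowed inside each part)
PART = r"[a-z0-9]+(?:-[a-z0-9]+)*"
NAME_PATTERN = re.compile(rf"\d{{8}}_{PART}_{PART}_v\d{{2}}\.[a-z0-9]+")


def nonconforming(filenames):
    """Return the file names that do not match the naming convention."""
    return [name for name in filenames if not NAME_PATTERN.fullmatch(name)]


# Hypothetical file names used purely for illustration.
names = [
    "20240115_soil-survey_ph-readings_v01.csv",
    "final data (2).xlsx",
    "20240116_soil-survey_site-photos_v02.tiff",
]
print(nonconforming(names))
```

Running a check like this before deposit makes it easier to spot files that were named ad hoc during the project.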

Accessible 

Your data and/or metadata also need to be accessible to both humans and computers.

What does this mean?

Your data are available and downloadable from a reputable repository.

What to do:

  • If you can share your data openly then deposit your data in ORDO (the OU’s research data repository), a discipline-specific research data repository or a well-known general data repository.
  • If you cannot share openly, create a metadata record only, which includes under what conditions the data can be shared.

Interoperable 

As the aim is to speed up discovery and uncover new insights, research data should be easily combined with other datasets, applications and workflows by humans and computer systems.

What does this mean?

This means your data exist in file formats that are not dependent on proprietary or obsolete software.

What to do:

  • Whenever possible, use well-known and preferably open formats and software.
  • Use standard vocabulary and language that can be understood beyond your discipline. Avoid jargon.

Reusable 

Research data should be ready for future research and processing. It should be clear that findings can be replicated and the data/results can be used to build/develop new research questions.

What does this mean?

There is proper documentation to support interpretation of the data, and there is a clear and accessible data usage licence.

What to do:

  • Include detailed information and accurately cite the provenance of data you're reusing in your metadata and README file.
  • Know from the start which data you cannot share and what restrictions you may need to account for. Consider the implications for personal data, intellectual property or commercial potential.
  • Make sure your intentions for data reuse are unambiguous by applying a Creative Commons licence or similar.

FAIR data: Additional resources

  • For further information, the principles are described in detail by the GO FAIR initiative
  • If you want to evaluate your data sharing approach and identify opportunities to make your data more FAIR, or simply explore the FAIR principles in more depth, try the FAIR-Aware self-assessment tool developed by DANS, the Dutch national centre of expertise and repository for research data.
  • “FAIR’s fair, how to share your data” is a webinar that the OU Library Research Support team runs twice a year. For details of how to book, and to access previous recordings, visit the training pages of this website.

Contact us

Library Research Support team