Storing and organising data

Effective organisation and storage of research data are crucial for research integrity, reproducibility, and efficiency. Good practices allow researchers to save time, prevent data loss or misuse, maintain transparency, meet funding requirements, and increase the impact of their work through easier sharing and reuse.

Below you will find information and guidance to support you with:

  • Organising your files
  • Storing your data
  • Describing your data (metadata)

Organising your files

Once you create, gather, or start analysing data and files, they can very quickly become disorganised. Using file and folder structures and naming, describing and documenting your data throughout your project will save time, reduce errors and enable you and others to find and understand what you have done. There are many ways to organise your files, so think about what makes sense for you and your research. 

Version control

Many software packages apply version control automatically; however, if you manage versions of your code, drafts, or results manually, you should do so carefully. If your collaborators also have access to your files, version control helps prevent colleagues overwriting each other's work, and lets you revert to a previous version if you change your mind about an edit.

Things you can do to manage different versions of your files:

  • File naming: include clear version information in the file name, e.g. v1, v2, or v1-0, v1-1, using the first number for major changes and the second for minor changes; or v1_LP if you are working in a group and need to keep track of who made the changes.
  • Version control tables: document changes in a version control table within the document recording the version number, date of the change, name of the person making the change, and the purpose/nature of the change.
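
The file naming approach above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the "v<major>-<minor>" label style from the example; the function name bump_version and the sample file names are hypothetical.

```python
import re

# Matches a version label in the "v<major>-<minor>" style, e.g. "v1-0".
VERSION_PATTERN = re.compile(r"v(\d+)-(\d+)")


def bump_version(filename: str, major: bool = False) -> str:
    """Return the file name with its version label advanced by one step."""
    match = VERSION_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"no version label found in {filename!r}")
    major_no, minor_no = int(match.group(1)), int(match.group(2))
    if major:
        # A major change increments the first number and resets the second.
        new_label = f"v{major_no + 1}-0"
    else:
        # A minor change increments the second number only.
        new_label = f"v{major_no}-{minor_no + 1}"
    return filename[:match.start()] + new_label + filename[match.end():]


print(bump_version("interview-guide_v1-0.docx"))              # minor change
print(bump_version("interview-guide_v1-1.docx", major=True))  # major change
```

Saving each revision under a new name, rather than overwriting the old file, is what makes it possible to return to an earlier version.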

Data organisation tips

  • Existing conventions and procedures – find out what they are (research group, department) and follow them, or if there are none put some in place
  • Consistency – agree what works and stick to it. Avoid having no convention at all, or several competing ones
  • Good file naming – files should be named consistently, and names should be short and descriptive, and avoid spaces or special characters. YYYYMMDD is a good format for dates, and to sort files chronologically. If you use sequential numbering, add leading zeros (e.g. 001, 002, etc.) for clarity, and to sort numerically.
  • Good folder naming – folders should be named after projects and research issues, with clear meaning.  Do not create names which are meaningless (or only mean something to you), are excessively long, or relate to individuals.
  • Use of folders – apply logical structuring of files within folders relating to projects or issues, keeping things in the same place and making them easy to find. Don’t leave files unsorted, hanging under top level folders
  • Structure folders hierarchically – design a hierarchy with higher level broader topics, with more specific folders within these.  Do not create very tall, and/or labyrinthine structures e.g. with similar issues appearing at multiple levels
  • Create an index file - this could be a Word or text document which shows how your files are organised, and which should be kept alongside your files. This only takes a few minutes but can save hours of searching later. If you re-organise your files, make sure the index is updated too
  • Sensitive data – should be stored in separate folders, with appropriate access controls, restricted to only those who need it
  • Current and completed work – it may help to separate current and completed work, or versions of files/documents, e.g. where a document will have many versions and multiple contributors, consider a “current version” folder
  • Review what you have – don’t keep pointless multiple copies of data, and consider carefully what you need to retain, for how long, and what can (and can’t) be destroyed/deleted. Consider this at intervals and at the end of a project
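
As an illustration of the file naming tip above, the following Python sketch builds names with a YYYYMMDD date, a short description without spaces or special characters, and a zero-padded sequence number. The function name build_file_name, the underscore separators and the sample values are assumptions made for this example, not an OU convention.

```python
import re
from datetime import date


def build_file_name(description: str, created: date, sequence: int, extension: str) -> str:
    """Build a consistent, sortable file name from its parts."""
    # Replace runs of spaces and special characters with a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", description).strip("-").lower()
    # YYYYMMDD sorts chronologically; three digits with leading zeros sort numerically.
    return f"{created:%Y%m%d}_{slug}_{sequence:03d}.{extension}"


print(build_file_name("Soil samples: site A", date(2024, 3, 5), 7, "csv"))
```

Whatever pattern you choose, generating names from one function (or one written rule) helps keep them consistent across a project.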

Further guidance on file organisation

Storing your data

Keeping data safe

The OU Information Security Team ensures that OU information and data are kept safe, and offers guidance and tools to help you do this. See the Information Security SharePoint site (requires login) for advice on information security concerns at the OU, the threat landscape, and specialist guidance on risks, policy, vulnerabilities, incident management and compliance with industry standards.

  • The Information Security Policies are designed to guide organisational and individual behaviour and decision making.
  • University Information Security Specific policies are mandatory.
  • The Information Classification Policy defines the classification of Open University information so that appropriate controls can be applied. Before choosing a storage solution for your data, you should first determine its classification.

Where to store live research data

When you are working with live research data, consider all the actions you need to perform on your data before choosing a storage solution. If you are working with personal or special category (sensitive) information, your project workspace will need to be very secure. Large project teams need more complex project management functions to help them control data and manage file versions and backups. The information below will help you to determine the most suitable storage location for your data.

Data storage: collaborative working

Microsoft Teams: If you are working within a team, we recommend that you set up a Microsoft Team to store documents and to enable collaborative working. Teams is secure and appropriate for storing files up to and including Highly Confidential and provides regular backup.  You can also add external partners as guests. You can create new Teams from the Teams app itself. Please ensure you are familiar with the usage guidelines for Groups and Teams. For external teams working with data classified as Proprietary/Highly Confidential or above, you will need to contact IT for advice via IT Self-Service. For more information on using Teams, see the Getting Started with OU Teams intranet page (login required). Microsoft Teams is not appropriate if you require different permissions for different libraries or folders because it is not possible to remove members’ access to part of the site (within the Team) – a standard SharePoint site should be used instead.

SharePoint: If fine-grained or read-only permissions are required for your project, we recommend storing documents on a standard SharePoint site with appropriate permissions instead.  If you wish to share files from a SharePoint site with external collaborators, they will need to request a visitor OUCU. For more information, see the intranet page for how to request a new visitor account (login required). For more information on using SharePoint, see the SharePoint online intranet page (login required) and watch the recording of the interactive webinar held by IT services on setting up a SharePoint site and applying access controls.  

Data storage: individual researchers

OneDrive: If you do not need to share your data regularly with collaborators during your project, we recommend that you use your Open University OneDrive account. Files saved here can be viewed and edited from anywhere, with no need for VPN. You can share individual files and folders with anyone who has a Microsoft account. OneDrive is secure and appropriate for storing files up to and including Highly Confidential, and provides regular backups. Please note that your OU OneDrive for Business account is different from personal OneDrive accounts. Personal accounts are not covered by the specific terms and conditions that The Open University has negotiated with Microsoft and thus are generally not recommended for storing research data. For more information on OneDrive, see the OneDrive page on the OU intranet (login required).

Data storage: STEM researchers

Researchers within the STEM faculty have access to specialist IT support which they should use in preference to centrally supported storage options.

Backing up data

Whilst Microsoft cloud storage options such as Teams, SharePoint and OneDrive all create automated backups, data loss incidents can still occur, so it is advisable to create backup copies of important data on an encrypted external drive. This should be stored in a secure location, and you must take responsibility for appropriate deletion of this data when no longer required, in line with Data Protection requirements. Please follow IT guidance for encrypting portable media devices.

Transferring live research data

There are several options for securely sharing data with collaborators during your project, as outlined in the storage options above, but for one-off secure transfers there is also the ZendTo service, as outlined in the Secure File Transfer guidance (requires login) from the IT team.

What if I leave the OU?

It is your responsibility to place your files somewhere other than your OneDrive account or Microsoft Team if they will be needed when you leave, as these sites will not be maintained by IT after you have gone. It is advisable to upload any research data to ORDO or another suitable data repository to ensure continued access and preservation. Please note that you can preserve data in ORDO publicly or privately, depending on your requirements. Please contact the Library Research Support Team if you need continued access to any data stored privately on ORDO once you leave. Any research related documentation which must be retained as per the Open University Retention Schedule for Research, such as completed consent forms, should be transferred to an appropriate location, such as departmental storage, or a colleague or supervisor’s account, to ensure the OU retains access once you leave.

Describing your data (metadata)

A crucial part of ensuring that research data can be used, shared and reused by a wide range of researchers, for a variety of purposes, is taking care that those data are accessible, understandable and (re)usable. This requires clear data description, annotation, contextual information and documentation that explains how data were created or digitised, what data mean, what their content and structure are, and any manipulations that may have taken place. Creating comprehensive data documentation is easiest when begun at the onset of a project and continued throughout the research process.

Good documentation ensures your data can be:

  • Searched for and retrieved
  • Understood now and in the future
  • Properly interpreted, as relevant context is available

Data can quickly become unusable when key details of the context are forgotten, so ensure you keep enough information to interpret the data.

Whatever you need to make sense of your data should be kept with the data files themselves. Lab-based research is often recorded in a lab notebook, which should be kept safe. However, the practice of keeping a research journal can be used for any research. It’s a good idea to record the notebook page number with the data files, and if possible, scan the page(s) in and keep them with the data too.

This information also helps when deciding ownership and assigning credit, so make sure you keep a note of who collected the data and when, especially if it's not you.

All of this extra information is collectively known as metadata. There are a number of metadata standards in use in different disciplines, along with more generic standards.

Metadata comprises descriptive material at two levels: project-level metadata and data-level metadata.

Project-level metadata

This is high level documentation or information which describes the data collection as a whole. When you come to preserve or publish your data via a data repository, you will be required to enter much of this high-level metadata during the upload process, but anything you can provide above and beyond the minimum repository requirement is beneficial. 

Project-level metadata should address the following:

  • For what purpose the data was collected (e.g. project history, hypotheses, investigators/funders)
  • What the contents of the collection are (e.g. data type, structure of the data, relationships between data items)
  • How the data was collected (e.g. methodology/protocols, sampling design, workflow/instruments, digitisation/transcription methods, secondary data sources)
  • Where and when the data was collected, and who by
  • How the data was processed, including any tools or software used
  • Quality assurance processes followed
  • How the data can be accessed and (re)used (e.g. persistent identifier for the data, access and use conditions, licence and copyright).

Data-level metadata

Data-level metadata provides information about the individual data files or databases within your data collection, and is therefore highly dependent on your data type. Below are some types of data-level metadata for qualitative and quantitative data.

Quantitative

  • Names, labels, and descriptions for variables and/or records.
  • Value code labels.
  • Explanation of coding and classification schemes.
  • Codes for missing values.
  • Weighting and grossing variables.
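
One way to record this kind of data-level metadata is a small codebook kept alongside the data file. The Python sketch below is illustrative only: the variable names, labels, value codes and the -99 missing-value code are hypothetical, and your discipline may have an established standard that should be used instead.

```python
import csv
import io

# Hypothetical codebook: one row per variable in the data file.
CODEBOOK = [
    {"variable": "age", "label": "Age of participant at interview",
     "values": "whole years", "missing_code": "-99"},
    {"variable": "region", "label": "Region of residence",
     "values": "1=North; 2=South", "missing_code": "-99"},
]


def codebook_to_csv(rows: list) -> str:
    """Return the codebook as CSV text, ready to save next to the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["variable", "label", "values", "missing_code"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(codebook_to_csv(CODEBOOK))
```

A plain CSV codebook can be opened without specialist software, which helps the documentation stay readable for as long as the data itself.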

Qualitative

  • Finding aid
  • Data list
  • Metadata included at the top of each file and/or record, including project title, key biographical details of the participant, and/or summary of the text file
  • Readme file

How to create “Readme style” metadata

A readme file provides information about a data file and is intended to help ensure that the data can be correctly interpreted by yourself at a later date or by others when sharing or publishing data. Standards-based metadata is generally preferable, but where no appropriate standard exists, writing readme style metadata is an appropriate strategy.

A template for a README file is available on ORDO. 

  • Create one readme file for each data file, whenever possible. It is also appropriate to describe a "dataset" that has multiple, related, identically formatted files, or files that are logically grouped together for use (e.g. a collection of Matlab scripts). When appropriate, also describe the file structure that holds the related data files
  • Name the readme so that it is easily associated with the data file(s) it describes
  • Write your readme document as a plain text file, avoiding proprietary formats such as MS Word whenever possible. Format the readme document so it is easy to understand (e.g. separate important pieces of information with blank lines, rather than having all the information in one long paragraph)
  • Format multiple readme files identically. Present the information in the same order, using the same terminology
  • Follow the scientific conventions for your discipline for taxonomic, geospatial and geologic names and keywords. Whenever possible, use terms from standardized taxonomies and vocabularies

Recommended content 

To enable data sharing you should ensure that you at least include all elements which are labelled "recommended minimum content" below:

Introductory information
  • For each filename, a short description of what data it contains (recommended minimum content)
  • Format of the file if not obvious from the file name
  • If the data set includes multiple files that relate to one another, the relationship between the files or a description of the file structure that holds them
  • Name/institution/address/email information for
    • Principal investigator (or person responsible for collecting the data) (recommended minimum content)
    • Associate or co-investigators
    • Contact person for questions
  • Date of data collection (can be a single date, or a range) (recommended minimum content)
  • Information about geographic location of data collection (recommended minimum content)
  • Date that the file was created (recommended minimum content)
  • Date(s) that the file(s) was updated and the nature of the update(s), if applicable
  • Keywords used to describe the data topic
  • Language information 
Methodological information 
  • Method description, links or references to publications or other documentation containing experimental design or protocols used in data collection (recommended minimum content)
  • Any instrument-specific information needed to understand or interpret the data
  • Standards and calibration information, if appropriate
  • Describe any quality-assurance procedures performed on the data
  • Definitions of codes or symbols used to note or characterize low quality/questionable/outliers that people should be aware of
  • People involved with sample collection, processing, analysis and/or submission
Data-specific information 
  • Full names and definitions (spell out abbreviated words) of column headings for tabular data (recommended minimum content)
  • Units of measurement (recommended minimum content)
  • Definitions for codes or symbols used to record missing data (recommended minimum content)
  • Specialized formats or abbreviations used (recommended minimum content)
Sharing/Access information 
  • Licences or restrictions placed on the data
  • Links to publications that cite or use the data
  • Links to publicly accessible locations of the data
  • Recommended citation for the data
  • Information about funding sources that supported the collection of the data
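
The "recommended minimum content" items above can be turned into a plain-text readme skeleton. The Python sketch below is one possible approach and is not the ORDO template: the field names paraphrase the list above, and the function names, placeholder text and example values are hypothetical.

```python
# Field names paraphrase the "recommended minimum content" items listed above.
MINIMUM_FIELDS = [
    "File description",
    "Principal investigator",
    "Date of data collection",
    "Location of data collection",
    "Date file created",
    "Method description",
    "Column headings",
    "Units of measurement",
    "Missing data codes",
    "Formats and abbreviations",
]

PLACEHOLDER = "TO BE COMPLETED"


def render_readme(values: dict[str, str]) -> str:
    """Return readme text with one field per block, separated by blank lines."""
    blocks = [f"{field}: {values.get(field, PLACEHOLDER)}" for field in MINIMUM_FIELDS]
    return "\n\n".join(blocks) + "\n"


def missing_fields(values: dict[str, str]) -> list[str]:
    """List the minimum fields that have not been filled in yet."""
    return [field for field in MINIMUM_FIELDS if not values.get(field)]


# Hypothetical example values for a partly completed readme.
example = {
    "File description": "Hourly air temperature readings",
    "Units of measurement": "degrees Celsius",
}
print(render_readme(example))
print("Still to complete:", missing_fields(example))
```

Rendering every readme from the same field list keeps the order and terminology identical across files, as recommended above.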

Acknowledgements

These guidelines have been adapted from Cornell University’s Guide to writing “readme” style metadata

Further guidance on describing data

Contact us

Library Research Support team