FAQ

Some commonly asked questions relating to data sharing.

What is a data repository?

A data repository is a permanent, publicly accessible location for your data. Typically, a data repository will store research data for a minimum of 10 years and will provide a persistent identifier that points to its location, usually a digital object identifier (DOI).

Most data repositories have a back-up plan in place in case the organisation running them is unable to continue. These plans typically involve moving the data to a different repository while retaining the same identifier. See CLOCKSS for an example.

What is a data catalogue?

Rather than storing research data itself, as a data repository does, a data catalogue acts as a contents list, or index, of locations where research data can be found.

Why should I place my data in a data repository?

Funders increasingly require that data acquired on the projects they support be shared openly. These include national and international funders such as UKRI, ERC, Horizon, NIH, NSF, Wellcome and DFG.

In addition, colleagues working on machine learning and artificial intelligence projects require data with which to train and test their models. Your data can help these teams produce better data analysis tools that you can benefit from in the future. 

Where should I store my data?

Any repository is better than none. However, your data is more likely to be discovered in a location where other similar data is hosted. A registry of data repositories is available at https://www.re3data.org/. The clinical infrared and Raman community does not currently have a dedicated data repository, so we recommend using the Zenodo repository (https://zenodo.org/) and adding the CLIRSPEC tag to the upload. This will add your data to the CLIRSPEC Community on the Zenodo site, https://zenodo.org/communities/clirspec.

Many universities now host research data for their academic teams. These are also suitable repositories, although data stored there may be harder for other researchers to find.

What should I store?

This is an open question, as there are no well-defined rules. Below are our suggestions. We also recommend engaging with the FAIRSpectra Initiative to help build a common set of guidelines for the field.

Data - raw and processed

We recommend storing the raw data in the instrument vendor’s format. We also recommend storing a copy in an open file format, so that researchers without access to the vendor’s software can read the files. An open file format for hyperspectral imaging is being developed by FAIRSpectra; please join the conversation there to ensure your requirements are captured.
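
As a minimal sketch of the open-format copy, the Python below writes one spectrum to a plain two-column CSV file, which almost any software can read. It assumes the spectrum has already been extracted from the vendor file into lists of wavenumbers and intensities (that step depends on the vendor and is not shown). The function name `export_spectrum_csv` and the column layout are hypothetical and for illustration only; this is not the FAIRSpectra format.

```python
import csv
import io


def export_spectrum_csv(wavenumbers, intensities, out, units="cm-1"):
    """Write a single spectrum as two-column CSV text to the stream `out`.

    Hypothetical helper: the header names and layout are illustrative
    assumptions, not a community standard.
    """
    if len(wavenumbers) != len(intensities):
        raise ValueError("wavenumbers and intensities must have the same length")
    writer = csv.writer(out)
    # The header records the x-axis units so the file describes itself
    writer.writerow([f"wavenumber_{units}", "intensity"])
    for x, y in zip(wavenumbers, intensities):
        writer.writerow([x, y])


# Usage with placeholder values, written to an in-memory buffer
buffer = io.StringIO()
export_spectrum_csv([1000.0, 1002.0], [0.12, 0.15], buffer)
print(buffer.getvalue())
```

To write to disk instead, open the file with `open("spectrum.csv", "w", newline="")` and pass the file object as `out`; `newline=""` is what the Python `csv` documentation recommends. A hyperspectral image holds many spectra per file, so a single two-column table like this cannot capture its spatial layout, and a dedicated format for imaging data is still needed.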

Metadata

A data file on its own is usually not sufficient for another researcher to understand what it contains and which sample and treatment it refers to. Therefore, some metadata (data about the data) is also required. At a minimum, there should be a document listing the files, with information on the sample and experiment that each file corresponds to. More information is always better. Remember that you may have implicit knowledge of the experimental conditions that the recipient is not aware of, so documenting this is important.
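
A minimal sketch of such a file list, assuming a simple CSV manifest with one row per data file, is shown below in Python. The column names, the `write_manifest` function and the example values are all hypothetical placeholders; adapt them to the samples, treatments and conditions of your own experiment.

```python
import csv
import io

# Hypothetical manifest columns: adapt these to your own experiment
MANIFEST_FIELDS = [
    "filename", "sample_id", "sample_type", "treatment",
    "instrument", "acquisition_date", "notes",
]


def write_manifest(records, out):
    """Write one CSV row per data file, linking it to its sample and experiment.

    Missing fields are left blank; an unexpected key (e.g. a typo in a
    column name) raises ValueError, as csv.DictWriter does by default.
    """
    writer = csv.DictWriter(out, fieldnames=MANIFEST_FIELDS, restval="")
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# Placeholder records for two hypothetical files
records = [
    {"filename": "S01_raw.dat", "sample_id": "S01", "sample_type": "tissue section",
     "treatment": "control", "notes": "measured at room temperature"},
    {"filename": "S02_raw.dat", "sample_id": "S02", "sample_type": "tissue section",
     "treatment": "treated"},
]
buffer = io.StringIO()
write_manifest(records, buffer)
print(buffer.getvalue())
```

Keeping the manifest as plain CSV means it can be opened in a spreadsheet as well as read by a script, and a free-text column such as `notes` gives you somewhere to record the implicit knowledge about experimental conditions.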

Although the paper associated with the data will contain much important information, this often describes the experiment as a whole rather than each file specifically.

Other references

A link (DOI) to a published paper is always helpful. In addition, appropriate keywords that can help narrow down a search for data will aid in its discoverability.

The data repository will present a number of required and optional fields in which to enter information about the upload. The number and type of these fields will depend on how specialised or general the repository is. Completing them accurately will help the repository return your data in response to users’ searches. Because the repository typically will not harvest information from your file metadata, these fields need to be completed separately; duplicating the same items in your file metadata is also a good idea, as it keeps the information with the data files.

What is FAIR?

FAIR stands for Findable, Accessible, Interoperable and Reusable. The term refers to the FAIR Principles published in “The FAIR Guiding Principles for scientific data management and stewardship” by Wilkinson et al., Sci Data 3, 160018 (2016). More information is available at https://www.go-fair.org/fair-principles/.

Where can I get more information on data sharing?

There are many resources available online and in the literature relating to FAIR data. One site tailored to spectroscopy is the FAIRSpectra Initiative, https://fairspectra.net. This is a community-led initiative to develop metadata recommendations and open file formats for spectroscopy.

I have an issue with what is, or is not, listed

We have tried to make the information on this site as open and accurate as possible. If you have spotted an error, or have a comment on the content, please email data@clirspec.org.

For any other issues relating to the content, please see our disclaimer.