Supporting documentation is essential as it helps others understand a dataset and supports its potential re-use. This guidance document outlines what supporting information is required when submitting a dataset for deposit with the EIDC.
What information is required?
The information required for supporting documentation is likely to exist already, for example in technical reports, in a data management plan, or on a project website or wiki. It may also be integral to the data itself (e.g. netCDF format).
Supporting documentation should provide information on the following areas:
A brief description/overview of the data being described
This is a summary that allows the reader to determine the relevance and usefulness of the data. The text should be concise but should contain sufficient detail to allow the reader to ascertain rapidly the scope and limitations of the resource.
To help organise thinking, you may like to use the following structure:
- What has been recorded and what form do the data take? This should immediately convey to the reader precisely what the data are.
- Where were the data collected? Where appropriate, state whether the coverage is gridded or scattered, and whether it is even or highly variable.
- When were the data collected?
- How were the data collected? A brief description of methods and instrumentation used.
- Why were the data collected? For what purpose?
- Who was responsible for the collection and interpretation of data?
- Completeness. Are any data absent from the dataset? Explain which data are included or excluded and why.
Experimental design/sampling regime
Metadata should be provided which details the experimental design and/or sampling regime, where applicable. This should include information on:
- The feature(s) of interest - including feature type, feature name, relevant geographical information (e.g. location, aspect, elevation, surface area, volume) and the grid or reference system used.
- The treatments applied - including details of how treatments were applied/created/managed or verified, where relevant.
- Replication - details of any sample/observation replication, including explanations for any missing samples/observations.
- Controls - information on any control methods employed.
- The periodicity of sampling - details of the time period covered by the dataset, date and frequency of sample collection/observation recording and any reasons for missed sampling/observations.
- The number of samples/observations - the total overall number of samples/observations collected, if not already included as part of metadata relating to treatments and replication.
Information should be provided covering the methods used for collection of samples or observations. This should include:
- details of sampling/observation locations
- techniques employed for physical collection of samples or measurement of observations
- sample storage/treatment or recording of observations
- Standard Operating Procedures (SOPs) for specific techniques and/or references for methods used, if available
- date of analysis of samples if different from date of sampling
Alternatively, where data values are derived, generated or transformed, details of how this was achieved should be provided. For model output, this should include the key points of the theory forming the basis of the model. The type of model used (e.g. ordinary differential equations, partial differential equations, compartment model) should be documented, along with any relevant technical information (e.g. operating system(s) and programming language) and/or mathematical information (e.g. input and output) used to generate the output.
Fieldwork and/or laboratory instrumentation
Information should be supplied on instruments/machines used for collection/analysis of samples/observations where relevant. This should include the type, make, model and serial number of each particular instrument/machine, where known.
Calibration steps and values
Details of the steps taken to calibrate any instruments/machines used, including use of any blanks, and the values used for calibration should be provided.
Nature and units of recorded values
Information should be provided describing the nature of the recorded values and the units used, in sufficient detail to define unambiguously what has been measured and recorded in the dataset. Details should include descriptions of the parameters, determinands and variables, the valid range of values, the lowest level of detection, units, etc.
Full descriptions of any analytical methods used to generate the data values contained in the dataset should be included. These should detail any reagents and the specific conditions required for each analysis, and provide sufficient detail to enable replication of the methods used for analysis if desired.
Any quality control measures undertaken to ensure the quality of the data values included in the dataset should be detailed, e.g. methods of quality control, explanation of quality codes, factors affecting the data, etc.
Details of data structure
Details of the structure of the dataset should also be provided, covering the order in which variables appear within the dataset. For example:
This dataset comprises six csv files entitled xxx, yyy, zzz etc. The first csv file has five columns labelled aa, bb etc.
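For csv data, a first draft of this kind of structure description can be generated from the files themselves and then edited by hand. The following is a minimal sketch only: the function name describe_csv_files is hypothetical, and it assumes each file is UTF-8 encoded with its column labels in the first row. It is not an EIDC tool or requirement.

```python
import csv
from pathlib import Path


def describe_csv_files(paths):
    """Return a plain-text summary of each csv file and its column labels, in order.

    Assumption: the first row of every file holds the column labels.
    """
    paths = [Path(p) for p in paths]
    lines = [f"This dataset comprises {len(paths)} csv file(s)."]
    for path in paths:
        with path.open(newline="", encoding="utf-8") as handle:
            # Read only the header row; an empty file yields no columns.
            header = next(csv.reader(handle), [])
        lines.append(
            f"{path.name} has {len(header)} column(s), in this order: {', '.join(header)}"
        )
    return "\n".join(lines)
```

The resulting text lists only file names and column order. Descriptions of what each column means, and its units, still need to be written by someone who knows the data.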
Also include any additional information necessary to expand on that given in the discovery metadata record.
How should I supply the information?
In order to meet funders' expectations and the needs of future research, supporting information must be openly accessible in perpetuity.
To guarantee this, the EIDC requires supporting documentation to be submitted at the same time as the data. Unfortunately, it is not acceptable to link to pages/documents on non-EIDC websites, or to include documents with hyperlinks to external websites. This is because we are unable to guarantee that those websites/pages will exist in perpetuity.
Our preferred format is Rich Text Format (RTF). However, we will also accept PDF, HTML or plain text files.
If your information is stored in a proprietary format such as Microsoft Word, it should be converted to a preferred format.
Supporting documentation should be supplied separately from data. This ensures that detailed metadata are available without the need to download the data, so that users can make an informed decision about the utility of the data before accessing them, and in perpetuity. Making the data and metadata available separately also means the EIDC can store the data securely and keep them unchanged, whilst still being able to improve the quality and usefulness of the contextual metadata as required.
If the information is integral to the data itself (e.g. netCDF) it should be copied into a document in a preferred format.
Names of files supplied must not include any spaces or non-standard characters.
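File names can be checked before submission with a short script. The sketch below is illustrative only: the guidance does not define "non-standard characters", so the allowed set used here (ASCII letters, digits, underscore, hyphen and full stop) is an assumption, and the function names are hypothetical.

```python
import re

# Assumption: "standard" characters are ASCII letters, digits, "_", "-" and ".".
ALLOWED_CHAR = re.compile(r"[A-Za-z0-9._-]")
DISALLOWED_RUN = re.compile(r"[^A-Za-z0-9._-]+")


def problem_characters(filename):
    """Return the distinct disallowed characters in filename, in order of first appearance."""
    found = []
    for char in filename:
        if not ALLOWED_CHAR.fullmatch(char) and char not in found:
            found.append(char)
    return found


def suggest_name(filename):
    """Replace each run of disallowed characters with a single underscore."""
    return DISALLOWED_RUN.sub("_", filename)
```

For example, problem_characters("site data (v2).rtf") reports the space and both brackets, and suggest_name returns "site_data_v2_.rtf", which can then be tidied by hand.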
More information is available in the NERC Data Policy guidance notes.
Depositors often ask if they can provide their scientific papers published through journals as supporting documentation for the dataset. Unfortunately these do not meet our requirements for a number of reasons:
- Copyright and distribution restrictions - there may be rights issues preventing us from making an article available publicly
- Lacking dataset description - the dataset is not described adequately in a publication designed to describe a scientific research outcome
- Access cannot be guaranteed in perpetuity - the EIDC cannot guarantee permanent, open access to external websites over which we have no control.
However, research papers may contain a lot of useful, pertinent information. In that case, we recommend copying the relevant content into a stand-alone document which can be supplied to us.