Supporting documentation

Supporting documentation helps potential users fully understand a dataset. It is essential that data are accompanied by sufficient supporting information to help researchers understand and re-use the data appropriately. Perhaps even more importantly, it helps to avoid misuse and misunderstanding of the data.

This guidance outlines what supporting information is required when submitting a dataset for deposit with the EIDC.

We have a template that you could use to help structure your supporting documentation should you wish.

What information is required?

The information required for supporting documentation is likely to exist already, for example in technical reports, in a data management plan, or on a project website or wiki. It may also be integral to the data itself (e.g. in netCDF format).

Supporting documentation should provide information on the following areas:

A brief description/overview of the data being described

This is a summary that allows the reader to determine the relevance and usefulness of the data. The text should be concise but should contain sufficient detail to allow the reader to ascertain rapidly the scope and limitations of the resource.

To help organise thinking, you may like to use the following structure:

  1. What has been recorded and what form does the data take? This should immediately convey to the reader precisely what the data is.
  2. Where were the data collected? Where appropriate, this should state whether the data are gridded or scattered, and whether the coverage is even or highly variable.
  3. When were the data collected?
  4. How were the data collected? A brief description of methods and instrumentation used.
  5. Why were the data collected? For what purpose?
  6. Who was responsible for the collection and interpretation of data?
  7. Completeness. Are any data absent from the dataset? Explain which data are included or excluded and why.

Experimental design/sampling regime

Metadata should be provided which details the experimental design and/or sampling regime, where applicable. This should include information on:

  • The feature(s) of interest - including feature type, feature name, the grid or reference system used, and relevant geographical information (e.g. location, aspect, elevation, surface area, volume).
  • The treatments applied - including details of how treatments were applied/created/managed or verified, where relevant.
  • Replication - details of any sample/observation replication, including explanations for any missing samples/observations.
  • Controls - information on any control methods employed.
  • The periodicity of sampling - details of the time period covered by the dataset, date and frequency of sample collection/observation recording and any reasons for missed sampling/observations.
  • The number of samples/observations - the total overall number of samples/observations collected, if not already included as part of metadata relating to treatments and replication.

Collection/Generation/Transformation Methods

Information should be provided on the methods used to collect samples or observations. This should include:

  • details of sampling/observation locations
  • techniques employed for physical collection of samples or measurement of observations
  • sample storage/treatment or recording of observations
  • Standard Operating Procedures (SOPs) for specific techniques and/or references for methods used, if available
  • date of analysis of samples if different from date of sampling

Alternatively, where data values are derived/generated/transformed, details of how this was achieved should be provided. For model output, this should include the key points of the theory on which the model is based. The type of model used (e.g. ordinary differential equations, partial differential equations, compartment model) should be documented, together with any relevant technical information (e.g. operating system(s) and programming language) and/or mathematical information (e.g. inputs and outputs) used to generate the output.

Fieldwork and/or laboratory instrumentation

Information should be supplied on instruments/machines used for collection/analysis of samples/observations where relevant. This should include the type, make, model and serial number of each particular instrument/machine, where known.

Calibration steps and values

Details should be provided of the steps taken to calibrate any instruments/machines used, including the use of any blanks, and of the values used for calibration.

Nature and units of recorded values

Information should be provided describing the nature of the recorded values and the units used, in sufficient detail to define unambiguously what has been measured and recorded in the dataset. Details should include a description of the parameters, determinands and variables, the valid range of values, the lowest level of detection, units, etc.
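One way to keep this information consistent across many variables is a simple data dictionary with one entry per variable. The Python sketch below is illustrative only: the `VariableDescription` fields, the `describe` and `check_values` helpers and the nitrate example are hypothetical, not an EIDC format or requirement. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableDescription:
    """One entry in a hypothetical data dictionary describing a recorded variable."""
    name: str                                 # variable/column name as it appears in the data
    description: str                          # what was measured or recorded
    units: str                                # units of measurement
    valid_min: float                          # lowest valid value
    valid_max: float                          # highest valid value
    detection_limit: Optional[float] = None   # lowest level of detection, if applicable


def describe(var: VariableDescription) -> str:
    """Render one entry as a sentence for the supporting document."""
    text = (f"{var.name}: {var.description}. Units: {var.units}. "
            f"Valid range: {var.valid_min} to {var.valid_max}.")
    if var.detection_limit is not None:
        text += f" Lowest level of detection: {var.detection_limit}."
    return text


def check_values(var: VariableDescription, values: list[float]) -> dict[str, list[float]]:
    """Flag values outside the valid range, or inside it but below the detection limit."""
    flagged: dict[str, list[float]] = {"out_of_range": [], "below_detection_limit": []}
    for value in values:
        if not var.valid_min <= value <= var.valid_max:
            flagged["out_of_range"].append(value)
        elif var.detection_limit is not None and value < var.detection_limit:
            flagged["below_detection_limit"].append(value)
    return flagged
```

For example, an entry for a hypothetical nitrate variable in mg/l with a valid range of 0.0 to 50.0 and a detection limit of 0.01 renders as a single descriptive sentence, and `check_values` would flag 75.0 as out of range and 0.005 as below the detection limit.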

Analytical Methods

Full descriptions of any analytical methods used to generate the data values in the dataset should be included. These should detail any reagents and the specific conditions required for each analysis, in enough detail to allow the analyses to be replicated if desired.

Quality control

Details should be provided of any quality control measures undertaken to ensure the quality of the data values in the dataset, e.g. methods of quality control, an explanation of any quality codes, factors affecting the data, etc.

Details of data structure

Details of the structure of the dataset should also be provided, covering the order in which variables appear within the dataset. For example:

This dataset comprises six csv files entitled xxx, yyy, zzz etc. The first csv file has five columns labelled aa, bb etc.
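If the data are held in CSV files, a short script can draft this structure description from each file's header row, ready to be checked and expanded by hand. The Python sketch below assumes each file is UTF-8 encoded and that its first row holds the column labels; `describe_csv_structure` is a hypothetical helper, not an EIDC tool.

```python
import csv
from pathlib import Path


def describe_csv_structure(paths: list[Path]) -> str:
    """Draft a plain-text description of the file names and column order of CSV files."""
    count = len(paths)
    lines = [f"This dataset comprises {count} csv file{'' if count == 1 else 's'}."]
    for path in paths:
        with path.open(newline="", encoding="utf-8") as handle:
            # Assumption: the first row of each file contains the column labels.
            header = next(csv.reader(handle), [])
        noun = "column" if len(header) == 1 else "columns"
        lines.append(f"{path.name} has {len(header)} {noun}, in order: {', '.join(header)}.")
    return "\n".join(lines)
```

The output is only a starting point: the meaning and units of each column still need to be described, as set out under "Nature and units of recorded values".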

Miscellaneous

Any additional information needed to expand on that given in the discovery metadata record should also be provided.

How should I supply the information?

We have a template that you could use to help structure your supporting documentation if you wish.

In order to meet funders' expectations and the needs of future research, supporting information must be openly accessible in perpetuity.

To guarantee this, the EIDC requires supporting documentation to be submitted at the same time as the data. Unfortunately, it is not acceptable to link to pages/documents on non-EIDC websites, or to include documents with hyperlinks to external websites, because we cannot guarantee that those websites/pages will exist in perpetuity.

Format

Our preferred formats are Microsoft Word Open XML (.docx) and OpenDocument Format (.odt). We will also accept RTF, HTML and plain text files.

PDFs may also be acceptable but are not our preferred option because they are not easily editable; this makes improving the documentation or correcting problems more difficult.

If your information is stored in a proprietary format, it should be converted to a preferred format.

Stand-alone documents

Supporting documentation should be supplied separately from the data. This ensures that detailed metadata remain available in perpetuity without users needing to download the data, so that they can make an informed decision about the usefulness of the data before accessing them. Keeping data and metadata separate also allows the EIDC to store the data securely and keep them unchanged, while still improving the quality and usefulness of the contextual metadata as required.

If the information is integral to the data itself (e.g. netCDF), it should be copied into a separate document in a preferred format.

Filenames

Names of files supplied must not include any spaces or non-standard characters.
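File names can be checked before submission with a short script. The Python sketch below assumes that "standard" characters means ASCII letters, digits, underscores, hyphens and full stops; this is an illustrative assumption rather than the EIDC's definition, and both functions are hypothetical helpers.

```python
import re

# Assumption: allowed characters are ASCII letters, digits, "_", "." and "-".
ALLOWED_FILENAME = re.compile(r"[A-Za-z0-9_.-]+")
DISALLOWED_CHAR = re.compile(r"[^A-Za-z0-9_.-]")


def invalid_filenames(names: list[str]) -> list[str]:
    """Return the names that contain spaces or any character outside the allowed set."""
    return [name for name in names if not ALLOWED_FILENAME.fullmatch(name)]


def suggest_filename(name: str) -> str:
    """Replace spaces with underscores, then drop any other disallowed characters."""
    return DISALLOWED_CHAR.sub("", name.replace(" ", "_"))
```

For instance, "site data (v2).csv" would be flagged, and `suggest_filename` would propose "site_data_v2.csv"; any renamed file should still be checked by eye, since dropping characters can make two names identical.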

Additional notes

More information is available in the NERC Data Policy guidance notes.

Depositors often ask whether they can provide their published journal papers as supporting documentation for the dataset. Unfortunately, these do not meet our requirements, for a number of reasons:

  • Copyright and distribution restrictions - there may be rights issues preventing us from making an article available publicly
  • Inadequate dataset description - a publication designed to report a scientific research outcome does not describe the dataset adequately
  • Access cannot be guaranteed in perpetuity - the EIDC cannot guarantee permanent, open access to external websites over which we have no control.

However, research papers may contain a lot of useful, pertinent information. In that case, we recommend copying the relevant content into a stand-alone document and supplying that to us.