What is a dataset?

A dataset is a structured collection of data organised and stored together. The data within a dataset are typically related in some way. A dataset can include data of different types or formats and may consist of one or many files.

To decide whether you have one dataset or several, think about how you will describe the data and what the supporting documentation will contain. If the metadata would be similar or identical for each candidate dataset, you probably have a single dataset. For example, if you have repeated the same experiment or monitoring event at several locations, this is likely to be one dataset. Similarly, if you have carried out the same experiment or monitoring event over several years, this too could be one dataset rather than several.

As part of the deposit process we will agree with you the format and structure of your data and a handover date. Ensuring that your data are correct and well-formatted will help to speed up the process.

Format

Data provided to the EIDC should normally be in a non-proprietary format (e.g. one or more .csv files rather than an Excel workbook).

We maintain a list of acceptable formats; however, the list is not exhaustive, and we will consider other formats on a case-by-case basis.
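Because a CSV file holds a single table, one natural way to move a multi-sheet workbook into a non-proprietary format is to write each worksheet to its own CSV file. The sketch below does this with the Python standard library only. It assumes the worksheet contents have already been read into Python as lists of rows; the function name write_sheets_as_csv is hypothetical, not an EIDC tool, and each sheet name is used unchanged as the filename, so sheets should be renamed to follow the filename advice first.

```python
import csv
from pathlib import Path
from typing import Dict, List, Sequence


def write_sheets_as_csv(
    sheets: Dict[str, Sequence[Sequence[object]]], out_dir: str
) -> List[Path]:
    """Write each worksheet's rows to its own UTF-8 CSV file; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for sheet_name, rows in sheets.items():
        # Hypothetical convention: the sheet name becomes the filename.
        path = out / f"{sheet_name}.csv"
        # newline="" lets the csv module write its own line endings.
        with path.open("w", newline="", encoding="utf-8") as handle:
            # csv.writer writes None (an empty cell) as an empty field.
            csv.writer(handle).writerows(rows)
        written.append(path)
    return written
```

If the openpyxl package is installed, the input could be built with something like {ws.title: list(ws.iter_rows(values_only=True)) for ws in openpyxl.load_workbook(path).worksheets}; reading the workbook is deliberately left out of the sketch.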

Filenames

Examples

1486Xiuytr.csv
  This doesn't tell us anything about the data

Site location data from the UK Butterfly Monitoring Scheme collected during 2011.csv
  This is very long and contains spaces

ukbmsLocationData2011.csv
  This is descriptive, short and contains no spaces or special characters
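Part of this filename advice can be checked automatically before deposit. The sketch below is one possible approach, not an EIDC tool. The function name filename_problems is hypothetical, the 40-character limit is an assumption (the guidance asks only for "short" names), and so is the definition of a special character as anything other than letters, digits, underscores, hyphens and dots.

```python
import re
from typing import List

# Hypothetical limit: the guidance asks for "short" names but gives no number.
MAX_LENGTH = 40

# Assumed definition of a "special character": anything except ASCII letters,
# digits, underscores, hyphens and dots. Spaces are excluded here because
# they are reported separately.
SPECIAL = re.compile(r"[^A-Za-z0-9_.\- ]")


def filename_problems(name: str, max_length: int = MAX_LENGTH) -> List[str]:
    """Return the reasons a filename breaks the naming advice (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if SPECIAL.search(name):
        problems.append("contains special characters")
    if len(name) > max_length:
        problems.append(f"longer than {max_length} characters")
    return problems


for example in [
    "1486Xiuytr.csv",
    "Site location data from the UK Butterfly Monitoring Scheme collected during 2011.csv",
    "ukbmsLocationData2011.csv",
]:
    print(example, filename_problems(example))
```

Note that 1486Xiuytr.csv passes all three checks. Whether a name actually describes the data still needs a person to judge.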

Variables

Examples

Sample ID
  Contains a space; use Sample_ID instead

Count of individual perch
  Contains spaces and is unnecessarily long; use Perch_count instead

Binomial/Latin_name
  Contains a non-standard character (/); use Binomial_name instead

Soil temperature ° C
  Contains spaces and a non-standard character (°); use Soil_temp instead
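Removing spaces and non-standard characters from column headings is mechanical enough to script. The sketch below is one possible approach, not an EIDC tool. The function name suggest_variable_name is hypothetical, and it assumes that anything other than ASCII letters, digits and underscores counts as non-standard.

```python
import re


def suggest_variable_name(heading: str) -> str:
    """Turn a free-text column heading into a candidate variable name."""
    # Replace each run of spaces or non-standard characters (anything other
    # than ASCII letters, digits and underscores) with a single underscore.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", heading)
    # Collapse any doubled underscores and trim them from either end.
    return re.sub(r"_{2,}", "_", name).strip("_")


for heading in [
    "Sample ID",
    "Count of individual perch",
    "Binomial/Latin_name",
    "Soil temperature ° C",
]:
    print(f"{heading!r} -> {suggest_variable_name(heading)!r}")
```

For the examples above this gives Sample_ID, Count_of_individual_perch, Binomial_Latin_name and Soil_temperature_C. Shortening names (Perch_count, Soil_temp) and choosing between alternatives (Binomial_name) still need a human decision.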

Codelists and abbreviations

Using codes and abbreviations in the data is often very useful. However, if you do use them, you must ensure:

Missing data/nulls

Tabular data

Structure

Headings

illustration of a bad csv

illustration of a good csv

Multiple tables

illustration of a bad csv

Anonymity and data security

Quality

If you have any queries or are unsure about the suitability of your dataset(s) for deposit, we'll be happy to discuss it with you. Please contact us.