As part of the deposit process we will agree with you the format and structure of your data and a handover date. Ensuring that your data are correct and well-formatted will help to speed up the process.
Data provided to the EIDC should normally be in a non-proprietary format (e.g. CSV files rather than an Excel workbook).
We maintain a list of acceptable formats; however, the list is not exhaustive and we will consider other formats on a case-by-case basis.
- Try to keep filenames short
- Do not use spaces. Replace them with underscores or hyphens (e.g. windermere_chemistry_data, windermere-chem-data) or use camel case (e.g. windermereChemistryData)
- Other than hyphens and underscores, avoid non-alphanumeric characters ($*@%”\/?*<>#^¾ etc.)
- If possible, filenames should be meaningful and reflect the content
- If you have multiple related files, be consistent and use a naming convention
This doesn't tell us anything about the data
Site location data from the UK Butterfly Monitoring Scheme collected during 2011.csv
This is very long and contains spaces
This is descriptive, short and contains no spaces or special characters
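The filename guidance above can be checked automatically before deposit. The following is a minimal sketch in Python, not an EIDC tool: the function name `filename_problems` and the 40-character limit are assumptions made for illustration, since the guidance only asks for "short" names.

```python
import re

# Assumed rule set: letters, digits, underscores and hyphens,
# followed by a single file extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")

# Arbitrary illustration of "short"; not an EIDC requirement.
MAX_LENGTH = 40


def filename_problems(filename):
    """Return a list of problems found in a filename (empty if none)."""
    problems = []
    if " " in filename:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them in this check.
    if not SAFE_NAME.fullmatch(filename.replace(" ", "_")):
        problems.append("contains special characters")
    if len(filename) > MAX_LENGTH:
        problems.append("is long")
    return problems


print(filename_problems("windermere_chemistry_data.csv"))
# → []
print(filename_problems(
    "Site location data from the UK Butterfly Monitoring Scheme collected during 2011.csv"
))
# → ['contains spaces', 'is long']
```

A check like this cannot tell whether a name is meaningful, so it complements rather than replaces a manual review.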
- Variable names should be unique, short and (preferably) meaningful.
- Avoid spaces and special characters (e.g. $ * @ /) in variable names. Best practice is to use only alphanumeric characters, underscores (_) and hyphens (-).
- Remove any variables which are not important for re-using the data (e.g. those created for admin or internal purposes).
Count of individual perch
contains spaces and is unnecessarily long
contains non-standard character (/)
Soil temperature ° C
contains spaces and non-standard character (°)
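The same kind of check can be applied to variable names. This is a sketch under assumptions: `variable_name_problems` is a hypothetical helper, and it treats only unaccented letters, digits, underscores and hyphens as standard characters.

```python
import re
from collections import Counter


def variable_name_problems(names):
    """Map each problematic variable name to a list of its problems."""
    counts = Counter(names)
    problems = {}
    for name in names:
        issues = []
        if counts[name] > 1:
            issues.append("not unique")
        if " " in name:
            issues.append("contains spaces")
        # Whatever is left after removing allowed characters (and spaces,
        # which are reported separately) is treated as non-standard.
        leftover = sorted(set(re.sub(r"[A-Za-z0-9_\- ]", "", name)))
        if leftover:
            issues.append("non-standard characters: " + "".join(leftover))
        if issues:
            problems[name] = issues
    return problems


print(variable_name_problems(["Count of individual perch", "Soil temperature ° C", "soil_temp_c"]))
```

For the three names shown, the first is reported for its spaces, the second for its spaces and the degree sign, and `soil_temp_c` passes.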
Codelists and abbreviations
Using codes and abbreviations in the data is often very useful. However, if you do use them you must ensure:
- they are unique (within the dataset) and used consistently
- they are all described in the accompanying metadata
- Any explanations you provide in the metadata are applicable. For example, the metadata states "T = trace", but the code T doesn't actually occur in the data.
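One way to check codes against their documentation is to compare the set of codes described in the metadata with the set that actually occurs in the data. The sketch below assumes you have already extracted both; `compare_codes` and the example codes other than "T = trace" are hypothetical.

```python
def compare_codes(documented, observed):
    """Compare codes described in the metadata with codes found in the data."""
    documented = set(documented)
    observed = set(observed)
    return {
        # Codes used in the data but never explained
        "undocumented": sorted(observed - documented),
        # Codes explained in the metadata but absent from the data
        "unused": sorted(documented - observed),
    }


# Hypothetical codelist and data column
metadata_codes = {"T": "trace", "ND": "not detected"}
data_column = ["ND", "ND", "X"]

print(compare_codes(metadata_codes, data_column))
# → {'undocumented': ['X'], 'unused': ['T']}
```

Both lists should be empty before deposit: an undocumented code needs describing, and an unused one should be removed from the metadata.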
- It is preferable to identify nulls or missing data as blanks. However, depending on the format of the data this isn't always possible. Alternative ways of identifying nulls are to use codes such as NaN or N/A.
- Numerical values such as -999999 may also be acceptable but should be avoided if possible.
- Zeros (0) should NEVER be used to identify nulls as zero is a meaningful data value.
- Whatever method you use to identify nulls, it should be applied consistently throughout the dataset and must be documented in the accompanying metadata.
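To confirm that nulls are identified consistently, you can count which null markers appear in a file. The following sketch uses Python's standard `csv` module; the set `NULL_MARKERS` is an assumed list of candidates and the sample data are invented for illustration.

```python
import csv
import io
from collections import Counter

# Assumed candidate markers for missing data; adjust to your dataset.
# Zero is deliberately absent because it is a meaningful data value.
NULL_MARKERS = {"", "NaN", "N/A", "NA", "-999999"}


def null_markers_used(csv_text):
    """Count each null marker that appears in the data rows of a CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    counts = Counter()
    for row in reader:
        for value in row:
            if value.strip() in NULL_MARKERS:
                counts[value.strip()] += 1
    return dict(counts)


# Invented sample in which three different markers are mixed
sample = "site,temp_c,ph\nA,12.1,\nB,NaN,7.2\nC,0,N/A\n"
markers = null_markers_used(sample)
print(markers)
if len(markers) > 1:
    print("More than one null marker is in use; pick one and document it.")
```

If more than one marker is reported, the dataset is mixing conventions and should be tidied before deposit.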
- We normally expect tabular data to be formatted with variables (mass, temperature, concentration etc.) arranged in columns and observations in rows.
- Variable names should be in the first row (and only the first row). Data should start in row 2.
- Remove superfluous information in heading rows.
- Never include more than one table within a single spreadsheet. This makes it far more difficult for a machine to read the data.
- Each table should be separated into its own file.
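A simple structural check can catch the most common layout problems listed above, such as a second table appended below the first. This is an illustrative sketch, not a complete validator; `table_structure_problems` is a hypothetical helper that assumes the whole file fits in memory.

```python
import csv
import io


def table_structure_problems(csv_text):
    """Report rows that do not fit a single header-plus-data table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    if any(not name.strip() for name in header):
        problems.append("blank variable name in header row")
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(data, start=2):
        if not row:
            # csv.reader yields an empty list for a blank line
            problems.append(f"row {row_number} is blank (possible second table)")
        elif len(row) != len(header):
            problems.append(
                f"row {row_number} has {len(row)} fields, expected {len(header)}"
            )
    return problems


print(table_structure_problems("site,temp_c\nA,12.1\nB,11.8\n"))
# → []
print(table_structure_problems("site,temp_c\nA,12.1\n\nspecies,count,year\nperch,4,2011\n"))
```

The second call reports the blank row and the two rows whose field count differs from the header, which is the typical signature of two tables in one file.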
Anonymity and data security
- Ensure that data are anonymised where needed and cannot be linked to any identifiable person
- Consider anonymising site location data where this is necessary for the safety of the site, equipment or future research
- Where data are derived from existing data, check if permission needs to be obtained from the data owner
- When converting data for deposit, ensure that all data and metadata are correct after conversion
- Confirm that data detail is consistent with the access and licensing agreements as stated
- Complete all internal consistency checks BEFORE offering your data for deposit
- Resolve any data issues and ensure data are complete BEFORE deposit, to minimise the risk of further deposit(s) being necessary
If you have any queries or are unsure about the suitability of your dataset(s) for deposit, we'll be happy to discuss it with you. Please contact us.