Open data formats
When depositing your data, the format used should ideally be an open, non-proprietary format in common usage by the research community. This is to ensure that the data is available to the widest possible audience without the need for access to restrictive software.
Data that is stored in a proprietary format should be converted to an open format. The format chosen for the conversion will depend on the original file format (see table below). For example, the preferred format for depositing tabular data is comma-separated values (.csv). This format has proven to be robust and future-proof, and it can be used in a wide variety of common software tools and applications.
Proprietary formats may be considered if there is no alternative or if conversion would result in data loss. These will be dealt with on a case-by-case basis, so if in doubt, just ask us.
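Before depositing a CSV file, it can help to confirm that it parses cleanly. The following is a minimal sketch in Python, using only the standard library; the function name `check_csv` and the particular checks (empty or duplicate column names, rows with the wrong number of fields) are illustrative assumptions, not an EIDC acceptance test.

```python
import csv


def check_csv(path, encoding="utf-8"):
    """Return a list of human-readable problems found in a CSV file."""
    problems = []
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            return ["file is empty"]
        if any(name.strip() == "" for name in header):
            problems.append("header contains an empty column name")
        if len(set(header)) != len(header):
            problems.append("header contains duplicate column names")
        for row in reader:
            # Every data row should have as many fields as the header.
            if len(row) != len(header):
                problems.append(
                    f"line {reader.line_num}: expected {len(header)} "
                    f"fields, found {len(row)}"
                )
    return problems
```

An empty list means none of these particular problems were found. It does not replace opening the file in the software your users are likely to have.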
Preferred formats
The following are our preferred formats for deposit:
- Tabular data: Comma-separated (CSV), or tab-delimited (TAB)
- Database tables: CSV, TAB
- Spatial raster data: GeoTIFF
- Spatial vector data: SpatiaLite
- Images: PNG, JPEG
- Movies: MPEG, MP4, MOV, AVI
- Sound: MP3, WAV
Recommended open format conversions
The table below outlines common data formats and the conversion we recommend when depositing data into the EIDC. Please note:
- This list is periodically updated and is not exhaustive
- If a format is not listed, it does not mean we will not accept it
- If conversion of data would result in data loss, we may accept proprietary formats
If in doubt, please contact us for advice.
Original format | Format for deposit | Notes |
---|---|---|
Comma-separated values (csv) | No change required | To be acceptable, files must open in commonly available software that reads CSV (e.g. OpenOffice Calc, MS Excel, Numbers) |
Tab-delimited values (e.g. .txt, .tab, .dat) | No change required | Must open in commonly available software e.g. Notepad++ |
text (.txt) | No change required | |
Spreadsheets (e.g. Excel, Numbers) | Convert to csv file(s) | |
Database tables (e.g. Access, Oracle, SQL) | Convert to csv file(s) | |
SpatiaLite | No change required | Must open in common GIS e.g. QGIS, ArcGIS |
Shapefile | No change is required if shapefile is the original format. However, if the original data is not a shapefile we do not recommend converting data into this format for deposit (see notes below). | Must open in common GIS e.g. QGIS, ArcGIS |
File geodatabase (.gdb folder) | Convert to SpatiaLite | |
Personal geodatabase (.mdb) | Convert to SpatiaLite | |
GeoTIFF data | No change required | Must open in common GIS e.g. QGIS, ArcGIS |
Esri Grid (ARC/INFO grid) | Convert to GeoTIFF | |
ASCII grid | Conversion to GeoTIFF is preferred. However, we will accept original data files | Must open in common GIS e.g. QGIS, ArcGIS |
.R | No change required | |
NetCDF | No change required | Must open in two netCDF-capable applications without additional transformation |
SAS | Convert to csv file(s) | |
Minitab | Convert to csv file(s) | |
NASA Ames | Convert to netCDF or csv file(s) | |
MATLAB binary file | Convert to csv file(s) | |
STL | No change required | Must open in commonly available mesh rendering software e.g. Meshlab |
FASTA | No change required | Must open in commonly available software e.g. Notepad++ |
WEAP | No change required for model files | Must open in freely available WEAP software. Documentation must be provided that specifies how to access WEAP software and makes users aware of the licensing terms under which it is available. If at any time the EIDC becomes aware that WEAP software is not freely available to run existing model files, WEAP resources will be deprecated. |
PLINK | No change required | Ensure any supporting documentation links to the PLINK source and cites it appropriately (see https://www.cog-genomics.org/plink/1.9/general_usage#cite) |
Digital terrain elevation data/DTED1 (.dt1) | No change required BUT see notes | Must be deposited with a script to convert it to a non-proprietary format e.g. ASCII or NetCDF. Can be opened using NetCDF or ArcGIS |
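
As an illustration of the "Convert to csv file(s)" recommendation for database tables, the sketch below exports one table to CSV using Python's standard library. It assumes the data can be read through SQLite; other database systems would need their own connector. The function name `export_table_to_csv` and the file and table names in the usage comment are hypothetical.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, table, csv_path):
    """Write every row of one SQLite table to a CSV file with a header row."""
    # Identifiers cannot be bound as SQL parameters, so quote the table name.
    quoted = '"' + table.replace('"', '""') + '"'
    connection = sqlite3.connect(db_path)
    try:
        cursor = connection.execute(f"SELECT * FROM {quoted}")
        column_names = [column[0] for column in cursor.description]
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(column_names)
            # NULL values arrive as None, which csv.writer writes as empty fields.
            writer.writerows(cursor)
    finally:
        connection.close()
    return column_names


# Hypothetical usage: one CSV file per table.
# export_table_to_csv("survey.sqlite", "sites", "sites.csv")
```

Because NULLs are written as empty fields rather than zeros, missing values remain distinguishable from measured zeros in the deposited file. It is worth stating this convention in the supporting documentation.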
Notes on shapefiles
If your original data is generated and stored as shapefile(s) we will accept them. However, because shapefiles have a number of limitations, we do not recommend converting data in other formats into shapefiles. If you need to convert spatial vector data, consider the SpatiaLite format. The limitations of shapefiles are:
- NULL values are not supported. Nulls may be represented as zeros, which is very problematic for quantitative data
- The maximum length of attribute names is 10 characters so longer names will be truncated
- The maximum number of attributes is 255
- Floating-point numbers are stored as text and may contain rounding errors
- The file size cannot exceed 2 GB
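The attribute-name limit listed above can be checked before any conversion is attempted. The sketch below is a hedged illustration in plain Python: the function name `find_truncation_problems` and the example attribute names are hypothetical, and it assumes names are simply cut to 10 characters. Real GIS software may rename clashing fields differently.

```python
from collections import defaultdict

SHAPEFILE_NAME_LIMIT = 10  # maximum attribute-name length listed above


def find_truncation_problems(names, limit=SHAPEFILE_NAME_LIMIT):
    """Report names that exceed the limit and names that clash once cut."""
    too_long = [name for name in names if len(name) > limit]
    by_prefix = defaultdict(list)
    for name in names:
        # Group names by what would remain after simple truncation.
        by_prefix[name[:limit]].append(name)
    clashes = {
        prefix: group for prefix, group in by_prefix.items() if len(group) > 1
    }
    return too_long, clashes


too_long, clashes = find_truncation_problems(
    ["site_id", "temperature_min", "temperature_max"]
)
print(too_long)  # ['temperature_min', 'temperature_max']
print(clashes)   # {'temperatur': ['temperature_min', 'temperature_max']}
```

In this example both temperature columns would be reduced to the same 10-character name, which is one reason a format without this limit, such as SpatiaLite, is the safer conversion target.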
Supporting documentation
It is EIDC policy that supporting documentation will be made available with the data as separate, linked document(s).
One of the main reasons for separating supporting documentation from data is that the EIDC is committed to a programme of review and improvement of metadata in order to make resources easier to find and easier to re-use. The data, conversely, must remain unchanged. Providing supporting documentation separately from data also permits users to make an informed decision about whether the data resource meets their requirements prior to actually downloading a copy of the data itself.
Original format | Preferred format for deposit | Notes |
---|---|---|
Rich-text documentation | rtf | Whilst a proprietary format, rtf is preferred for its simplicity, ease of maintenance, widespread acceptability, and choice of available editing tools. |
Portable Document Format (pdf) | rtf | Metadata in pdf format cannot easily be maintained, therefore it is not our preferred choice. However, if there are no options to convert the pdf, we will accept documents in that format. |
xls, xlsx, csv, etc | csv | csvs provide high maintainability, longevity and ease of access. |
Plain text | txt | Text files' limitations (i.e. lack of formatting) mean that they are rarely the best option for providing good quality, readable documentation. However, their advantages (small file size, longevity and ease of access) mean that in some instances they are a highly appropriate format. |
If your supporting documentation is in a format other than those listed above, please contact us for advice.