Suitable formats for data and supporting documentation

Open data formats

Data should ideally be deposited in an open, non-proprietary format that is in common use by the research community. This ensures that the data is available to the widest possible audience without the need for access to restrictive software.

Data that is stored in a proprietary format should be converted to an open format. The format chosen for the conversion will depend on the original file format (see table below). For example, the preferred format for depositing tabular data is comma-separated values (.csv). This format has proven to be robust and future-proof, and it can be used in a wide variety of common software tools and applications.
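As an illustration of this kind of conversion, the sketch below exports one database table to a CSV file using only the Python standard library. It is a starting point rather than an EIDC-endorsed procedure: the function name `export_table_to_csv`, the table name and the file paths are hypothetical, and SQLite stands in for whichever database you actually use.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, table, csv_path):
    """Write every row of `table` to `csv_path`, preceded by a header row.

    Returns the number of data rows written.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cursor.description]
        # newline="" lets the csv module control line endings itself.
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            count = 0
            for row in cursor:
                writer.writerow(row)  # NULL (None) is written as an empty field
                count += 1
        return count
    finally:
        conn.close()
```

Note that NULL values become empty fields in the output, so it is worth stating in the supporting documentation how missing values are represented.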

Proprietary formats may be considered if there is no alternative or if conversion would result in data loss. These will be dealt with on a case-by-case basis, so if in doubt, please ask us.

Preferred formats

The following are our preferred formats for deposit:

  • Tabular data: Comma-separated (CSV), or tab-delimited (TAB)
  • Database tables: CSV, TAB
  • Spatial raster data: GeoTIFF
  • Spatial vector data: SpatiaLite
  • Images: PNG, JPEG
  • Movies: MPEG, MP4, MOV, AVI
  • Sound: MP3, WAV

Recommended open format conversions

The table below outlines common data formats and the conversion we recommend when depositing data into the EIDC. Please note:

  • This list is periodically updated and is not exhaustive
  • If a format is not listed, it does not mean we will not accept it
  • If conversion of data would result in data loss, we may accept proprietary formats

If in doubt, please contact us for advice.

Original format | Format for deposit | Notes
Comma-separated values (csv) | No change required | To be acceptable, files must open in commonly available software that reads CSV (e.g. OpenOffice Calc, MS Excel, Numbers)
Tab-delimited values (e.g. .txt, .tab, .dat) | No change required | Must open in commonly available software e.g. Notepad++
Text (.txt) | No change required |
Spreadsheets (e.g. Excel, Pages) | Convert to csv file(s) |
Database tables (e.g. Access, Oracle, SQL) | Convert to csv file(s) |
SpatiaLite | No change required | Must open in common GIS e.g. QGIS, ArcGIS
Shapefile | No change is required if shapefile is the original format. However, if the original data is not a shapefile we do not recommend converting data into this format for deposit (see notes below). | Must open in common GIS e.g. QGIS, ArcGIS
File geodatabase (.gdb folder) | Convert to SpatiaLite |
Personal geodatabase (.mdb) | Convert to SpatiaLite |
GeoTIFF data | No change required | Must open in common GIS e.g. QGIS, ArcGIS
Esri Grid (ARC/INFO grid) | Convert to GeoTIFF |
ASCII grid | Conversion to GeoTIFF is preferred. However, we will accept original data files | Must open in common GIS e.g. QGIS, ArcGIS
NetCDF | No change required | Must open in two netCDF-capable applications without additional transformation
SAS | Convert to csv file(s) |
Minitab | Convert to csv file(s) |
NASA Ames | Convert to netCDF or csv file(s) |
MATLAB binary file | Convert to csv file(s) |
STL | No change required | Must open in commonly available mesh rendering software e.g. Meshlab
FASTA | No change required | Must open in commonly available software e.g. Notepad++
WEAP | No change required for model files | Must open in freely available WEAP software. Documentation must be provided, specifying how to access WEAP software and making users aware of the licensing terms under which it is available. If at any time the EIDC becomes aware that WEAP software is not freely available to run existing model files, WEAP resources will be deprecated.
PLINK | No change required | Ensure any supporting documentation links to the PLINK source and cites it appropriately (see https://www.cog-genomics.org/plink/1.9/general_usage#cite)
Digital terrain elevation data (DTED1, .dt1) | No change required, but see notes | Must be deposited with a script to convert it to a non-proprietary format e.g. ASCII or NetCDF. Can be opened using NetCDF or ArcGIS
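The table asks that CSV and tab-delimited files open cleanly in commonly available software. A quick structural check before deposit can catch the most common cause of failure, which is rows whose field count differs from the header. The sketch below is one way to do that with the Python standard library. The function name `check_csv` is hypothetical, the file is assumed to be UTF-8 encoded, and passing this check is not an EIDC acceptance test.

```python
import csv


def check_csv(path, delimiter=","):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            return ["file is empty"]
        if len(set(header)) != len(header):
            problems.append("duplicate column names in header")
        for row in reader:
            # Blank lines are parsed as rows with zero fields and are reported too.
            if len(row) != len(header):
                problems.append(
                    f"line {reader.line_num}: expected {len(header)} fields, "
                    f"found {len(row)}"
                )
    return problems
```

For tab-delimited files, call it as `check_csv("data.tab", delimiter="\t")`, where `data.tab` is a placeholder file name.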

Notes on shapefiles

If your original data is generated and stored as shapefile(s), we will accept them. However, because shapefiles have a number of limitations, we do not recommend converting data in other formats into shapefiles. If you need to convert spatial vector data, consider the SpatiaLite format. The limitations of shapefiles are:

  • They do not support NULL values. Nulls may be represented as zeros, which is very problematic for quantitative data
  • The maximum length of attribute names is 10 characters so longer names will be truncated
  • The maximum number of attributes is 255
  • Floating-point numbers are stored as text and may contain rounding errors
  • The file size cannot exceed 2GB
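To see how the attribute limits listed above might affect a particular dataset, the sketch below checks a list of attribute names against the 10-character and 255-attribute limits. The function name `shapefile_attribute_issues` and the example names are hypothetical. The sketch assumes simple truncation to the first 10 characters, whereas real GIS tools differ in how they shorten or de-duplicate names.

```python
def shapefile_attribute_issues(names, max_len=10, max_fields=255):
    """List the ways `names` would fall foul of shapefile attribute limits."""
    issues = []
    if len(names) > max_fields:
        issues.append(
            f"{len(names)} attributes exceeds the limit of {max_fields}"
        )
    seen = {}  # truncated name -> first original name that produced it
    for name in names:
        short = name[:max_len]
        if short != name:
            issues.append(f"'{name}' would be truncated to '{short}'")
        if short in seen and seen[short] != name:
            issues.append(
                f"'{name}' and '{seen[short]}' both truncate to '{short}'"
            )
        seen.setdefault(short, name)
    return issues


# Example: two long names that become indistinguishable after truncation.
for issue in shapefile_attribute_issues(
    ["site_id", "mean_temperature", "mean_temperature_max"]
):
    print(issue)
```

An empty result suggests the attribute names would survive conversion unchanged, though it says nothing about the NULL, floating-point or file-size limitations.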

Supporting documentation

It is EIDC policy that supporting documentation will be made available with the data as separate, linked document(s).

One of the main reasons for separating supporting documentation from data is that the EIDC is committed to a programme of review and improvement of metadata in order to make resources easier to find and easier to re-use. The data, conversely, must remain unchanged. Providing supporting documentation separately from data also permits users to make an informed decision about whether the data resource meets their requirements prior to actually downloading a copy of the data itself.

Original format | Preferred format for deposit | Notes
Rich-text documentation (Microsoft Word (doc, docx), Apple Pages (.pages), OpenOffice (odt)) | rtf | Whilst a proprietary format, rtf is preferred for its simplicity, ease of maintenance, widespread acceptability, and choice of available editing tools.
Portable Document Format (pdf) | rtf | Metadata in pdf format cannot easily be maintained, therefore it is not our preferred choice. However, if there are no options to convert the pdf, we will accept documents in that format.
xls, xlsx, csv, etc | csv | CSV files provide high maintainability, longevity and ease of access.
Plain text | txt | Text files' limitations (i.e. lack of formatting) mean that they are rarely the best option for providing good quality, readable documentation. However, their advantages (small file size, longevity and ease of access) mean that in some instances they are a highly appropriate format.

If your supporting documentation is in a format other than those listed above, please contact us for advice.