The preferred format for deposit, long-term storage and accessibility via view and download services will depend on the original format of the data.
For longevity, the preferred format for depositing tabular data is comma separated value (.csv) format. Many commonly used file types can easily be saved in this format, although there are some considerations when doing so. More detailed information on converting data files to .csv is available below.
Guidance look-up table for file formats:
|Original format||Preferred EIDC format||Notes|
|comma separated values||.csv||File must open in commonly available software that reads CSV e.g. MS Excel|
|text||.txt||File must open in commonly available software e.g. Notepad|
|MS Excel||.csv||File must open in commonly available software that reads CSV e.g. MS Excel|
|MS Access||.csv||File must open in commonly available software that reads CSV e.g. MS Excel|
|MS Word||.rtf||File must open in commonly available software e.g. Wordpad|
|Oracle||.csv||Key datasets output as CSV and must open in commonly available software that reads CSV e.g. MS Excel|
|SQL Server||.csv||Key datasets output as CSV and must open in commonly available software that reads CSV e.g. MS Excel|
|NetCDF||no change||File must open in two netCDF-capable applications (e.g. FME, QGIS) without additional transformation|
|SpatiaLite||no change||(Preferred for spatial vector data.) Must open in GIS e.g. QGIS|
|Shapefile||no change||File must open in GIS e.g. QGIS|
|Personal geodatabase (.mdb)||no change||Must open in GIS e.g. QGIS|
|File geodatabase (.gdb folder)||no change||A zipped .gdb folder is required so that none of the contents are lost. File geodatabases are more efficient for storage. Must open in common GIS e.g. QGIS|
|ArcInfo coverage||.e00||ArcInfo coverages are an old folder-based format. ArcInfo export (.e00) is preferred. Must open in GIS e.g. QGIS|
|ArcInfo Export||no change||Must open in GIS e.g. QGIS|
|ArcGIS SDE database||.gdb||A zipped .gdb folder is required so that none of the contents are lost. File geodatabases are more efficient for storage. Must open in GIS e.g. QGIS|
|Raster Data||no change||tiff, jpg, png, gif, etc. Must be accompanied by appropriate geo-referencing information such as .jpw or .tfw.|
|ArcInfo Grid||.ascii or .asc||ArcInfo Grid export file. Must open in GIS e.g. QGIS|
|SAS||.csv||File must open in commonly available software that reads CSV e.g. MS Excel|
|Minitab||.csv||File must open in commonly available software that reads CSV e.g. MS Excel|
|NASA Ames||.nc or .csv||File must open in commonly available software (e.g. MS Excel for .csv or two applications such as FME, QGIS for .nc) without additional transformation|
|MATLAB binary file||.csv||File must open in commonly available software that reads CSV e.g. MS Excel|
|STL||no change||File must open in commonly available mesh rendering software e.g. Meshlab|
File must open in commonly available software e.g. Notepad
|Portable Document Format||.rtf or .txt||Wherever practically possible, files should be supplied as .rtf or .txt and open in commonly available software e.g. Wordpad (.rtf) or Notepad (.txt).|
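The table asks for file geodatabases to be supplied as a zipped .gdb folder so that none of the contents are lost. A minimal sketch of one way to do this with the Python standard library is shown here. The function name `zip_geodatabase` and the example path are hypothetical, and this is an illustration rather than an EIDC-provided tool.

```python
import shutil
from pathlib import Path


def zip_geodatabase(gdb_dir: str) -> Path:
    """Zip a .gdb folder, keeping the folder itself as the top-level entry."""
    gdb = Path(gdb_dir).resolve()
    if not gdb.is_dir() or gdb.suffix.lower() != ".gdb":
        raise ValueError(f"{gdb} is not a .gdb folder")
    # make_archive appends ".zip" to base_name, so survey.gdb becomes
    # survey.gdb.zip, written next to the original folder.
    archive = shutil.make_archive(
        base_name=str(gdb),
        format="zip",
        root_dir=str(gdb.parent),  # archive paths are relative to the parent
        base_dir=gdb.name,         # include the .gdb folder and all its files
    )
    return Path(archive)
```

For example, `zip_geodatabase("survey.gdb")` would write `survey.gdb.zip` alongside the folder. Keeping the .gdb folder as the top-level entry means the geodatabase unpacks as a single folder with all of its component files together.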
Comma separated value (.csv) files are the preferred format for depositing your tabular datasets as they are proven to be robust and future-proof: the data can be read and viewed with a wide variety of common software tools and converted to many common formats. This file type has been used since the 1970s and is possibly the most widely used standard for datasets in such circumstances.
Saving your datasets in csv format may take a few extra minutes but it facilitates re-use and will help prevent obsolescence of data due to the format used for its storage.
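Before depositing a .csv file it can be worth confirming that it parses cleanly outside the software that produced it. The sketch below uses Python's built-in `csv` module. The function name `check_csv` and the specific checks (UTF-8 encoding, unique column names, the same number of fields in every record) are assumptions chosen for illustration, not EIDC acceptance criteria.

```python
import csv


def check_csv(path: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    try:
        with open(path, newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if len(set(header)) != len(header):
        problems.append("duplicate column names in header")
    # Records are counted from 1, with the header as record 1.
    for record_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"record {record_no}: expected {len(header)} fields, "
                f"found {len(row)}"
            )
    return problems
```

A file that passes these checks is not guaranteed to be correct, but ragged rows and encoding errors are common side effects of exporting from spreadsheets and databases, and they are easier to fix before deposit than after.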
It is EIDC policy that supporting documentation will be made available as a separate, linked on-line resource accessible via the catalogue record for a resource.
One of the main reasons for separating supporting documentation from data is that the EIDC is committed to a rolling programme of review and improvement of metadata, in order to make resources easier to find and easier to re-use. The data, conversely, must remain unchanged whilst under the custodianship of the Data Centre. Enabling access to supporting documentation separately from data also lets users make an informed decision about whether a data resource meets their requirements before ordering or downloading it, which matters because some resources contain large volumes of data.
With some data formats, the metadata is inextricably embedded with the data. In such situations the Data Centre may extract as much metadata as possible to a separate supporting document, which can then be added to, or otherwise enhanced.
Guidance look-up table for supporting documentation file formats
|Original format||Examples||Preferred EIDC format||Notes|
| ||Microsoft (xls, xlsx)||csv||csv provides high maintainability, longevity and ease of access.|
|Plain Text||txt||txt||txt provides high maintainability, longevity and ease of access.|
|Related tables||Microsoft (mdb)||multiple csv files||If appropriate, relational metadata can be denormalised into a single table - however, a normalised set of tables is more maintainable in the situation where error corrections or potential improvements are identified at a later date.|
| ||Microsoft (doc, docx); Portable Document Format (pdf); OpenDocument Text Document (odt)||rtf||Whilst a proprietary format, rtf is preferred for its simplicity, ease of maintenance, widespread acceptability, and choice of available editing tools. Metadata in pdf format cannot easily be maintained. It can be accepted reluctantly if there are no other options.|
| ||Microsoft (ppt, pps, ppsx); OpenDocument Presentation Format (odp)||png / pdf||Slide presentations are treated as essentially "non-maintainable" metadata. As such it is important to capture meta-metadata such as the author and date published, so that future users can read the information with knowledge of its original context. Sometimes slideshow authors embed such information in the presentation itself, but not always.|
|Hierarchic||Extensible Markup Language (xml)||xml||xml provides high maintainability, longevity and ease of access.|
|Special||Network Common Data Form (NetCDF)(nc)||ncml||ncml provides an xml encoded version of the embedded metadata.|
If your supporting documentation is in a format other than those listed above, please contact us for advice.
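For related tables, the guidance above prefers multiple csv files, one per table, so that the normalised structure is kept. The sketch below illustrates the idea using a SQLite database as a stand-in, because Python can read SQLite without extra software; reading a Microsoft .mdb file would need a different driver that is not shown here. The function name `export_tables_to_csv` is hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path: str, out_dir: str) -> list[Path]:
    """Write every user table in a SQLite database to its own CSV file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )]
        for table in tables:
            # Quote the table name as an SQL identifier.
            quoted = '"' + table.replace('"', '""') + '"'
            cursor = conn.execute(f"SELECT * FROM {quoted}")
            target = out / f"{table}.csv"
            with open(target, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                # First record: column names taken from the query result.
                writer.writerow([col[0] for col in cursor.description])
                writer.writerows(cursor)
            written.append(target)
    finally:
        conn.close()
    return written
```

Keeping one file per table, with key columns intact, preserves the relationships between tables, so a later correction can be made in one place rather than in every row of a denormalised table.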