A checksum, or hash, is a small piece of data used by the EIDC to verify data integrity and to ensure that no errors have been introduced during a dataset's transmission or storage.
Checksums (alternatively known as hashes) are "fingerprints" created by applying a procedure (called a "checksum algorithm") to a file. When the algorithm is applied to the file, it generates a simple hexadecimal string - the checksum/hash.
If the algorithm is applied repeatedly to the same file (or to an identical copy of the file), it will always generate the same checksum. However, if the file is changed, even slightly, the algorithm will generate a completely different checksum.
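The behaviour described above can be illustrated with a minimal Python sketch using the standard-library hashlib module. The sample data and the helper name sha256_hex are made up for illustration and are not part of any EIDC tooling.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical file contents, used only for this illustration.
original = b"site,temperature\nA1,12.5\n"
identical_copy = bytes(original)
altered = b"site,temperature\nA1,12.6\n"  # a single digit has changed

# Identical content always produces the same checksum.
same = sha256_hex(original) == sha256_hex(identical_copy)

# Even a one-character change produces a different checksum.
different = sha256_hex(original) != sha256_hex(altered)

print(same, different)
```

Running this prints True True: the identical copy matches the original, while the altered version does not.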
Checksums are used to verify that a file or group of files has not changed. This is crucial in the EIDC because integrity of the resources we safeguard is essential.
We create a checksum report when we receive data. This helps us to ensure that a resource has not been changed or corrupted if it moves from one location to another during the process of secure long-term storage.
Checksums also help to provide verification against accidental or deliberate tampering, virus infection or corruption of resources.
The EIDC uses two methods of generating checksums: MD5 and SHA256.
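As a rough sketch of how either kind of checksum might be computed for a file, the Python example below reads the file in chunks so that large files do not need to fit in memory. The function name file_checksum and the chunk size are assumptions chosen for illustration; this is not the EIDC's own tooling.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hexadecimal checksum of the file at path.

    algorithm is a name accepted by hashlib.new, e.g. "md5" or "sha256".
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, file_checksum("data.csv", "md5") would return a 32-character hexadecimal string and file_checksum("data.csv", "sha256") a 64-character one, where data.csv is a placeholder file name.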
When we accept a data deposit, we provide the depositor with a checksum report. This is a simple text file that lists each file in the deposit along with its checksum.
You can use this to verify that the data we've received has not become corrupted during delivery and is the same as the data you submitted for deposit.
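One way a depositor might carry out this check is sketched below in Python. The layout of the report is an assumption: the sketch expects one file per line, with the hexadecimal checksum first, then whitespace, then the file name (the layout produced by tools such as md5sum and sha256sum). The actual EIDC report may be laid out differently, in which case the parsing would need adjusting. The names verify_report and file_checksum are illustrative only.

```python
import hashlib
from pathlib import Path

# Assumption: the algorithm can be inferred from the checksum length
# (MD5 gives 32 hex characters, SHA-256 gives 64).
ALGORITHM_BY_LENGTH = {32: "md5", 64: "sha256"}


def file_checksum(path, algorithm, chunk_size=1024 * 1024):
    """Return the hexadecimal checksum of the file at path."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_report(report_path, data_dir):
    """Compare local files against an assumed '<checksum> <name>' report.

    Returns a dict mapping each listed file name to True (checksum
    matches) or False (file missing or checksum differs).
    """
    results = {}
    for line in Path(report_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum-style binary-mode marker
        algorithm = ALGORITHM_BY_LENGTH.get(len(expected))
        if algorithm is None:
            raise ValueError(f"Unrecognised checksum length: {line!r}")
        target = Path(data_dir) / name
        results[name] = (
            target.is_file()
            and file_checksum(target, algorithm) == expected.lower()
        )
    return results
```

Calling verify_report("checksums.txt", "my_deposit") with placeholder paths would return a dictionary such as a mapping from each file name to True or False; any False entry indicates a file that is missing locally or whose contents differ from what was received.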