How to check data integrity using md5sum under GNU/Linux


In this article, we will describe how you can check the integrity of your data using the md5sum utility under the GNU/Linux operating system.

What is md5sum?

md5sum is a tool commonly used to check data integrity. It calculates and verifies 128-bit MD5 hashes, so you can tell whether a particular file is intact or has been corrupted.
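To make the "128-bit" part concrete, here is a minimal sketch that hashes a short string read from standard input. The string 'hello' is only an illustrative input; the point is that the hash is printed as 32 hexadecimal digits, each of which encodes 4 bits.

```shell
#!/bin/sh
# Hash a short string from standard input and keep only the hash column.
# 'hello' is an arbitrary sample input chosen for this illustration.
hash=$(printf 'hello' | md5sum | awk '{print $1}')

echo "$hash"
# Each hexadecimal digit encodes 4 bits, so 32 digits make up 128 bits.
echo "${#hash} hex digits = $(( ${#hash} * 4 )) bits"
```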

For example, let’s create a backup of the whole ‘/etc’ directory as a ‘tar.gz’ archive containing all the configuration data of the VPS we’re using in this example:

# mkdir /tmp/example && cd /tmp/example
# tar -cpzf etc-backup.tar.gz /etc/

Then use the md5sum tool to calculate the hash value of the ‘etc-backup.tar.gz’ archive:

# md5sum etc-backup.tar.gz
6e0bde8e7a325322417e9782ed8e73f4  etc-backup.tar.gz

The hexadecimal value above is the MD5 hash of our data. How can we use this hash to check that the ‘etc-backup.tar.gz’ archive is intact and has not been modified? It’s quite easy. Once you’ve downloaded the backup archive:

# mkdir /tmp/downloads && cd /tmp/downloads
# wget http://example.com/path/to/etc-backup.tar.gz

you can use the md5sum tool to get the MD5 hash of the archive you’ve just downloaded.

# md5sum etc-backup.tar.gz
6e0bde8e7a325322417e9782ed8e73f4  etc-backup.tar.gz

As you can see, the MD5 hash values are identical, which means the file we downloaded is the one we need and has not been corrupted.
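Instead of comparing the two hashes by eye, you could store the hash in a checksum file and let md5sum do the comparison with its ‘-c’ switch. The following is only a sketch: the file name ‘sample.tar.gz’ and the temporary directory are placeholders standing in for the real archive, and both steps run on one machine here for simplicity.

```shell
#!/bin/sh
# Sketch: record a checksum next to a file, then verify the file against it.
# "sample.tar.gz" and the temporary directory are placeholders for illustration.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Stand-in for the real backup archive.
printf 'example backup data\n' > sample.tar.gz

# Where the archive is created: store its hash in a checksum file.
md5sum sample.tar.gz > sample.tar.gz.md5

# Where the archive is downloaded: verify the copy against the stored hash.
# md5sum -c exits with status 0 only if every listed file matches.
if md5sum -c sample.tar.gz.md5; then
    result="intact"
else
    result="corrupted"
fi
echo "sample.tar.gz is $result"
```

In practice you would transfer the checksum file together with the archive and run only the verification step on the receiving side.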

But what if, in the meantime, someone or something modified the archive? For example, let’s empty the backed-up ‘/etc/passwd’ file and re-create ‘etc-backup.tar.gz’:

– extract the archive by executing:

# tar zxvf etc-backup.tar.gz

– empty the ‘etc/passwd’ file extracted from the archive:

# > etc/passwd

– re-create the ‘etc-backup.tar.gz’ archive:

# tar -cpzf etc-backup.tar.gz etc/

– check the file’s integrity:

# md5sum etc-backup.tar.gz
25e34baa193512242bdee7158cfa2205  etc-backup.tar.gz

As you can see, the MD5 hashes are different (6e0bde8e7a325322417e9782ed8e73f4 != 25e34baa193512242bdee7158cfa2205) even though the file name is the same. This is how you can tell whether your backup archive is intact or has been modified.
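The same comparison can be scripted. Below is a minimal sketch that records a hash, simulates tampering by emptying the file (as was done with ‘etc/passwd’ above) and compares the new hash with the recorded one. The file ‘data.txt’ and its content are made up for this illustration.

```shell
#!/bin/sh
# Sketch: detect a modification by comparing a recorded hash with a fresh one.
# "data.txt" and its content are placeholders for illustration.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'root:x:0:0:root:/root:/bin/bash\n' > data.txt

# Record the hash of the original file (first column of the md5sum output).
expected=$(md5sum data.txt | awk '{print $1}')

# Simulate tampering by emptying the file.
: > data.txt

# Hash the file again and compare with the recorded value.
actual=$(md5sum data.txt | awk '{print $1}')

if [ "$expected" = "$actual" ]; then
    status="unchanged"
else
    status="modified"
fi

echo "expected: $expected"
echo "actual:   $actual"
echo "data.txt is $status"
```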

And what if you’ve downloaded, for example, a Debian netinst ISO image and want to check it against the provided MD5 hashes? You can use the ‘-c’ switch, which reads the hashes from the specified file(s) and checks them against the ISO images.

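To check it, run the command below. The MD5SUMS file usually lists more images than you have downloaded, so the error messages for the missing files are discarded and grep keeps only the netinst line: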

# md5sum -c MD5SUMS 2>/dev/null | grep net

and you should get:

debian-6.0.5-amd64-netinst.iso: OK
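An alternative to discarding the error messages is to pick only the relevant line out of the checksum file and feed it to ‘md5sum -c’ on standard input. The following sketch demonstrates this on two small made-up files, ‘one.iso’ and ‘two.iso’, which stand in for real images; the MD5SUMS file is generated locally for the demonstration.

```shell
#!/bin/sh
# Sketch: verify a single entry from a checksum file that lists several files.
# "one.iso" and "two.iso" are small placeholder files, not real images.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'image one\n' > one.iso
printf 'image two\n' > two.iso
md5sum one.iso two.iso > MD5SUMS

# Pretend that only one of the listed images was downloaded.
rm two.iso

# Select the line for the downloaded image and verify just that entry.
# The "-" operand tells md5sum -c to read the checksum list from standard input.
report=$(grep -F 'one.iso' MD5SUMS | md5sum -c -)
echo "$report"
```

Because only the matching line is checked, md5sum never looks for the missing files and there is nothing to redirect to /dev/null.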
