Let’s take a look at the problem of calculating the “md5” sum of every file in a directory, including the files within its subdirectories.

Well, we know the “md5sum” command works fine on a single file, or on several files named on the command line, but it will not descend into a directory by itself. How about we write a short Perl script that recurses through every directory and subdirectory from a given starting point, issuing the “md5sum” command on any files it encounters…

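#!/usr/bin/perl
use strict;
use warnings;

# Start from the directory given on the command line, or "." if none was given.
my $startDir = $ARGV[0];

if (!defined $startDir || $startDir eq "") {
        $startDir = ".";
}

# Append our output, and that of every md5sum we run, to /tmp/md5.
open(STDOUT, '>>', '/tmp/md5') or die "Cannot open /tmp/md5: $!";

printDir($startDir);

sub printDir {
        my $baseDir = $_[0];

        # Make sure the directory name ends with a slash.
        if ($baseDir !~ /\/$/) {
                $baseDir = $baseDir . "/";
        }

        # readdir, unlike glob("*"), copes with spaces in names and sees dotfiles.
        opendir(my $dh, $baseDir) or return;
        my @files = map { $baseDir . $_ } grep { $_ ne "." && $_ ne ".." } readdir($dh);
        closedir($dh);

        foreach my $file (@files) {
                if (-f $file) {
                        # List form bypasses the shell, so odd file names are safe.
                        system("md5sum", "--", $file);
                } elsif (-d $file && !-l $file) {
                        # Recurse into real directories only, not symlinks to them.
                        printDir($file);
                }
        }
}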

Well, that does work and did not take too long to write, but we could have done it much more simply…

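How about we let a system tool build a list of all the files (recursing into subdirectories) and then use that list as the input for md5sum. One catch: “md5sum -c” is not the way to read such a list, since it expects a file of checksums to verify rather than a list of names, so we hand the names over with xargs instead. The list lives in /tmp so that it does not end up in its own listing, and the -d option assumes GNU xargs…

find . -type f > /tmp/files.txt
xargs -d '\n' md5sum < /tmp/files.txt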

Okay, so that works as well and is much shorter than the script. But I think we can do it in one line…

find . ! -type d -print0 | xargs -0 md5sum

There we go, that works nicely.
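
Once the checksums are being produced, the obvious next step is to keep them and check the files again later. Here is a minimal sketch of that round trip, assuming GNU md5sum. The scratch directory, the file names, and the "demo" and "sums" variables are made up for illustration, and "find -exec … {} +" is swapped in for the xargs pipeline as one possible variant, not necessarily a better one:

```shell
# Hypothetical scratch directory so the sketch does not touch real data.
demo=$(mktemp -d)
sums=$(mktemp)   # checksum list, kept outside the tree being hashed
mkdir -p "$demo/sub dir"
printf 'hello\n' > "$demo/a.txt"
printf 'world\n' > "$demo/sub dir/b c.txt"

# Hash every regular file; "-exec ... {} +" batches file names much as xargs does.
(cd "$demo" && find . -type f -exec md5sum {} + > "$sums")

# Later: re-check every recorded file; the exit status is non-zero on any mismatch.
(cd "$demo" && md5sum -c --quiet "$sums") && echo "all files unchanged"

# Change one file and check again; md5sum now reports a failure.
printf 'changed\n' > "$demo/a.txt"
(cd "$demo" && md5sum -c --quiet "$sums") || echo "mismatch detected"
```

Using "-type f" rather than "! -type d" is a deliberate choice in this sketch: it skips FIFOs and device files, which md5sum would otherwise try to read.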

Can you do better? Let me know.