 Have you ever needed to compare the contents of two files (or other streams), and it mattered how quickly you got it done? To be frank, it doesn’t normally come up in the list of things you may need on a daily code-crunching basis, but that rather depends on what kind of programs you tend to write. In our world, let’s just say it’s not an uncommon task.
At first blush, it would seem to be no harder than comparing two arrays: a pointer reading from each file, compare bytes as you come across them, and bail when things differ. And it would be that easy if you were to use memory-mapped files and let the OS map a file on disk to a range in memory, but that has some drawbacks that may not always be OK depending on what you’re trying to do with the files (or streams) in question. It also requires having a physical path on the filesystem that you can pass in to the kernel, and it unduly burdens the kernel with some not insignificant workloads that aren’t (in practice) subject to the same scheduling and fairness guarantees that user code would be, which can slow down older machines significantly.[1]
So you need to read from a file and into a buffer (well, two files and two buffers), then compare the contents of the buffers. Only you can’t guarantee that each time you ask the OS to read n bytes, you’ll actually get n bytes back; that depends on where the file is, what value of n you asked for, and a million other conditions. And if the two streams hand back different byte counts, say m from one and n from the other, you need to make sure you juggle the pointers correctly and don’t overrun any buffers (or wind up allocating buffers larger than necessary).
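The bookkeeping described above can be sketched in a few lines. What follows is a minimal, language-neutral illustration in Python, not StreamCompare’s actual .NET implementation: the name `streams_equal` and the `bufsize` parameter are hypothetical, and the sketch assumes blocking streams whose `readinto` returns 0 at end of stream.

```python
import io


def streams_equal(a, b, bufsize=64 * 1024):
    """Return True if two binary streams yield exactly the same bytes.

    Illustrative sketch only. Assumes blocking streams whose readinto()
    returns the number of bytes read, or 0 at end of stream.
    """
    view_a = memoryview(bytearray(bufsize))
    view_b = memoryview(bytearray(bufsize))
    start_a = end_a = 0  # unread bytes of stream a live in view_a[start_a:end_a]
    start_b = end_b = 0  # unread bytes of stream b live in view_b[start_b:end_b]

    while True:
        # Refill a buffer only once it is fully consumed, so the buffers
        # never grow and no unread bytes are overwritten.
        if start_a == end_a:
            start_a, end_a = 0, a.readinto(view_a)
        if start_b == end_b:
            start_b, end_b = 0, b.readinto(view_b)

        avail_a = end_a - start_a
        avail_b = end_b - start_b
        if avail_a == 0 or avail_b == 0:
            # At least one stream is exhausted; equal only if both are.
            return avail_a == avail_b

        # The two reads may have returned different counts (m and n),
        # so compare only the overlap and keep the remainder for later.
        n = min(avail_a, avail_b)
        if view_a[start_a:start_a + n] != view_b[start_b:start_b + n]:
            return False
        start_a += n
        start_b += n


if __name__ == "__main__":
    left = io.BytesIO(b"same bytes")
    right = io.BytesIO(b"same bytes")
    print(streams_equal(left, right, bufsize=4))
```

The two fixed-size buffers are allocated once and reused, and a short read on one side simply leaves the other side’s leftover bytes in place for the next pass. A production version would also want asynchronous reads and a faster block comparison, which this sketch leaves out.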
We found ourselves dealing with this one time too many and factored out the code into a library, now available as open source on GitHub and on NuGet. StreamCompare is simple and to the point, and offers two interfaces for dealing with streams and files (asynchronously, of course). We’ve taken care of the optimizations and the not-so-obvious gotchas to give you a solid and smart option for comparing the contents of Stream instances (via StreamCompare) or files by their paths (via FileCompare). Under the hood, StreamCompare avails itself of the fastest approaches it finds available for your runtime, and tries to maximize parallelization while minimizing waits, allocations, and computations.
StreamCompare is released under the MIT License, and binary packages are available on NuGet for .NET Standard 1.3, .NET Standard 2.0, .NET Core 2.2, and .NET Core 3.0. Symbols for all versions and platforms are available via the NuGet symbol server, and the package is strong-named. Contributions on GitHub are welcome!
[1] Especially under Windows.
