In this section, we will implement a little tool that recursively iterates over a given directory. While doing that, it counts the number and size of all files, grouped by their extensions. Finally, it prints which filename extensions exist within that directory, how many there are per extension, and their average file size.
- We need to include the necessary headers, and we declare that we use the std and std::filesystem namespaces.
#include <iostream>
#include <sstream>
#include <iomanip>
#include <map>
#include <string>
#include <utility>
#include <filesystem>
using namespace std;
using namespace filesystem;
- The size_string function was already helpful in other recipes. It transforms file sizes to human-readable strings.
static string size_string(size_t size)
{
    stringstream ss;
    if (size >= 1000000000) {
        ss << (size / 1000000000) << 'G';
    } else if (size >= 1000000) {
        ss << (size / 1000000) << 'M';
    } else if (size >= 1000) {
        ss << (size / 1000) << 'K';
    } else { ss << size << 'B'; }
    return ss.str();
}
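Because size_string uses integer division, it truncates: 1999 bytes is printed as 1K. If that is too coarse, one possible alternative is sketched below. The name size_string_precise is hypothetical and not part of the recipe; it keeps the same 1000-based units but prints one digit after the decimal point.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical variant of size_string: same decimal (1000-based) units,
// but with one digit after the decimal point instead of truncation.
static std::string size_string_precise(std::size_t size)
{
    std::ostringstream ss;
    ss << std::fixed << std::setprecision(1);
    if (size >= 1000000000) {
        ss << (size / 1e9) << 'G';
    } else if (size >= 1000000) {
        ss << (size / 1e6) << 'M';
    } else if (size >= 1000) {
        ss << (size / 1e3) << 'K';
    } else {
        // size is an integer here, so fixed/setprecision do not affect it.
        ss << size << 'B';
    }
    return ss.str();
}
```

With this sketch, size_string_precise(1500) gives "1.5K", while values below 1000 are still printed as plain byte counts such as "999B".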
- Then, we implement a helper function that accepts a path object as its argument and iterates over all files within that path. Along the way, it collects the information in a map from filename extensions to pairs that hold the accumulated size and the total number of all files with that extension.
static map<string, pair<size_t, size_t>> ext_stats(const path &dir)
{
    map<string, pair<size_t, size_t>> m;
    for (const auto &entry :
             recursive_directory_iterator{dir}) {
- If a directory entry is a directory itself, we skip it. Skipping it here does not stop the recursion: recursive_directory_iterator still descends into it. We just do not want to count the directory entries themselves.
        const path p {entry.path()};
        const file_status fs {status(p)};
        if (is_directory(fs)) { continue; }
- Next, we extract the extension part of the directory entry's path. If it has no extension, we simply skip it.
        const string ext {p.extension().string()};
        if (ext.length() == 0) { continue; }
- Next, we calculate the size of the file we are looking at. Then, we look up the aggregate object in the map for this extension. If there is none yet, it is created implicitly with both values set to zero. We simply add the file size to the size accumulator and increment the file count.
        const size_t size {file_size(p)};
        auto &[size_accum, count] = m[ext];
        size_accum += size;
        count += 1;
    }
- Afterward, we return the map.
    return m;
}
- In the main function, we take either a user-provided path from the command line or the current directory. Of course, we need to check whether it exists because it would not make sense to continue otherwise.
int main(int argc, char *argv[])
{
    path dir {argc > 1 ? argv[1] : "."};
    if (!exists(dir)) {
        cout << "Path " << dir << " does not exist.\n";
        return 1;
    }
- We can immediately iterate over the map that ext_stats gives us. Because the accum_size values in the map contain the summed size of all files with the same extension, we divide this sum by the number of such files before printing it.
    for (const auto &[ext, stats] : ext_stats(dir)) {
        const auto &[accum_size, count] = stats;
        cout << setw(15) << left << ext << ": "
             << setw(4) << right << count
             << " items, avg size "
             << setw(4) << size_string(accum_size / count)
             << '\n';
    }
}
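The column layout of the loop above comes from the stream manipulators: setw applies only to the next item that is written, while left and right stay in effect until they are changed. The following sketch isolates that formatting in a small function so that it can be inspected as a string. The name format_row is hypothetical and not part of the recipe, and the average size is passed in as an already formatted string.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats one row the same way the loop in main does.
// avg_size is an already formatted size string such as "41K".
static std::string format_row(const std::string &ext, std::size_t count,
                              const std::string &avg_size)
{
    std::ostringstream os;
    os << std::setw(15) << std::left  << ext << ": "    // padded on the right
       << std::setw(4)  << std::right << count           // padded on the left
       << " items, avg size "
       << std::setw(4)  << avg_size  // still right-aligned: std::right persists
       << '\n';
    return os.str();
}
```

For example, format_row(".css", 2, "41K") pads the extension to 15 columns and right-aligns both the count and the size in 4 columns each.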
- Compiling and running the program yields the following output. I gave it a folder from the offline C++ reference as a command-line argument.
$ ./file_type ~/Documents/cpp_reference/
.css           :    2 items, avg size  41K
.gif           :    7 items, avg size 902B
.html          : 4355 items, avg size  38K
.js            :    3 items, avg size   4K
.php           :    1 items, avg size 739B
.png           :   34 items, avg size   2K
.svg           :   53 items, avg size   6K
.ttf           :    2 items, avg size 421K
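The keys in this output are whatever path::extension() returns, which has a few consequences worth knowing. The following sketch wraps the call in a function so that the grouping key for a given file name can be checked in isolation. The name ext_key is hypothetical and not part of the recipe.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: returns the key under which ext_stats would group a
// path, or an empty string if the file would be skipped.
static std::string ext_key(const std::filesystem::path &p)
{
    return p.extension().string();
}
```

Only the last extension counts, so "archive.tar.gz" is grouped under ".gz". A leading dot alone does not make an extension, so ".bashrc" and "README" both yield an empty string and are skipped. The comparison is case-sensitive, which means that "Photo.JPG" ends up under ".JPG", separate from ".jpg".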