In this section, we will implement simple compress and decompress functions for strings:
- We include some STL libraries first, then we declare that we use the std namespace:
#include <iostream>
#include <string>
#include <algorithm>
#include <sstream>
#include <tuple>
using namespace std;
- For our cheap compression algorithm, we try to find chunks of text containing ranges of the same characters, and we compress those individually. Whenever we start at one string position, we want to find the first position where it contains a different character. We use std::find to find the first character in the range, which is different than the character at the current position. Afterward, we return a tuple containing an iterator to that first different item, the character variable c, which fills the range at hand, and the number of occurrences that this subrange contains:
template <typename It>
tuple<It, char, size_t> occurrences(It it, It end_it)
{
if (it == end_it) { return {it, '?', 0}; }
const char c {*it};
const auto diff (find_if(it, end_it,
[c](char x) { return c != x; }));
return {diff, c, distance(it, diff)};
}
- The compress algorithm continuously calls the occurrences function. This way, we jump from one same character group to another. The r << c << n line pushes the character into the output stream and then the number of occurrences it has in this part of the input string. The output is a string stream that automatically grows with our output. In the end, we return a string object from it, which contains the compressed string:
string compress(const string &s)
{
const auto end_it (end(s));
stringstream r;
for (auto it (begin(s)); it != end_it;) {
const auto [next_diff, c, n] (occurrences(it, end_it));
r << c << n;
it = next_diff;
}
return r.str();
}
- The decompress method works similarly, but it is much simpler. It continuously tries to get a character value out of the input stream and, then, the following number. From those two values, it can construct a string containing the character as often as the number says. In the end, we again return a string from the output stream. By the way, this decompress function is not safe. It can be exploited easily. Can you guess, how? We will have a look at this problem later:
string decompress(const string &s)
{
stringstream ss{s};
stringstream r;
char c;
size_t n;
while (ss >> c >> n) { r << string(n, c); }
return r.str();
}
- In our main function, we construct a simple string with a lot of repetition, on which the algorithm works very well. Let's print the compressed version, and then the compressed and again decompressed version. In the end, we should get the same string as we initially constructed:
int main()
{
string s {"aaaaaaaaabbbbbbbbbccccccccccc"};
cout << compress(s) << 'n';
cout << decompress(compress(s)) << 'n';
}
- Compiling and running the program yields the following output:
$ ./compress
a9b9c11
aaaaaaaaabbbbbbbbbccccccccccc