In this section, we take the ASCII Mandelbrot fractal renderer that we implemented in Chapter 23, Advanced Use of STL Algorithms. First, we make the calculation take much longer by increasing the iteration limit. Then we gain some speedup by making only four small changes to the program in order to parallelize it:
- In order to follow the steps, it is best to copy the whole program from the other recipe. Then follow the instructions in the following steps to make all needed adjustments. All differences from the original program are highlighted in bold.
The first change is an additional header, <future>:
#include <iostream>
#include <algorithm>
#include <iterator>
#include <complex>
#include <numeric>
#include <vector>
#include <future>
using namespace std;
- The scaler and scaled_cmplx functions don't need any change:
using cmplx = complex<double>;

static auto scaler(int min_from, int max_from,
                   double min_to, double max_to)
{
    const int    w_from   {max_from - min_from};
    const double w_to     {max_to - min_to};
    const int    mid_from {(max_from - min_from) / 2 + min_from};
    const double mid_to   {(max_to - min_to) / 2.0 + min_to};

    return [=] (int from) {
        return double(from - mid_from) / w_from * w_to + mid_to;
    };
}
template <typename A, typename B>
static auto scaled_cmplx(A scaler_x, B scaler_y)
{
    return [=](int x, int y) {
        return cmplx{scaler_x(x), scaler_y(y)};
    };
}
- In the function mandelbrot_iterations, we just increase the maximum number of iterations in order to make the program more computation-heavy:
static auto mandelbrot_iterations(cmplx c)
{
    cmplx z {};
    size_t iterations {0};
    const size_t max_iterations {100000};

    while (abs(z) < 2 && iterations < max_iterations) {
        ++iterations;
        z = pow(z, 2) + c;
    }
    return iterations;
}
- The next part of the main function again needs no change:
int main()
{
    const size_t w {100};
    const size_t h {40};

    auto scale (scaled_cmplx(
            scaler(0, w, -2.0, 1.0),
            scaler(0, h, -1.0, 1.0)
        ));

    auto i_to_xy ([=](int x) {
        return scale(x % w, x / w);
    });
- In the to_iteration_count function, we no longer call mandelbrot_iterations(i_to_xy(x)) directly, but make the call asynchronous using std::async:
    auto to_iteration_count ([=](int x) {
        return async(launch::async,
                     mandelbrot_iterations, i_to_xy(x));
    });
- Before this change, the function to_iteration_count returned the number of iterations a specific coordinate needs for the Mandelbrot algorithm to converge. Now it returns a future variable that will contain the same value later, because it is computed asynchronously. Because of this, we need a vector that holds all the future values, so let's add one. The output iterator we provide to transform as its third argument must be the begin iterator of the new output vector, r:
    vector<int> v (w * h);
    vector<future<size_t>> r (w * h);

    iota(begin(v), end(v), 0);
    transform(begin(v), end(v), begin(r),
              to_iteration_count);
- The accumulate call, which does all the printing for us, no longer gets size_t values as its second argument, but future<size_t> values. We need to adapt it to this type (if we had used auto& as its type from the beginning, this would not even have been necessary), and we need to call x.get() where we previously just accessed x, in order to wait for the value to arrive:
    auto binfunc ([w, n{0}] (auto output_it, future<size_t> &x)
            mutable {
        *++output_it = (x.get() > 50 ? '*' : ' ');
        if (++n % w == 0) { ++output_it = '\n'; }
        return output_it;
    });
    accumulate(begin(r), end(r),
               ostream_iterator<char>{cout}, binfunc);
}
- Compiling and running gives us the same output as before. The only interesting difference is the execution speed. If we also increase the number of iterations in the original, sequential version of the program, the parallelized version should compute much faster. On my computer with four CPU cores and hyperthreading (which results in eight virtual cores), I get different results with GCC and Clang: the best speedup is 5.3 and the worst is 3.8. The results will, of course, also vary across machines.