In this section, we are going to use linear and binary search algorithms on a small example data set:
- We first include all the necessary headers and declare that we use the std namespace:
#include <iostream>
#include <vector>
#include <list>
#include <algorithm>
#include <string>
using namespace std;
- Our data set will consist of city structs, which just save a city's name, and its population count:
struct city {
string name;
unsigned population;
};
- Search algorithms need to be able to compare one item to the other, so we overload the == operator for the city struct instances:
bool operator==(const city &a, const city &b) {
return a.name == b.name && a.population == b.population;
}- We also want to print the city instances, so we overload the stream operator, <<:
ostream& operator<<(ostream &os, const city &city) {
return os << "{" << city.name << ", "
<< city.population << "}";
}- Search functions typically return iterators. These iterators point to the item if they found it or, otherwise, to the end iterator of the underlying container. In the last case, we are not allowed to access such an iterator. Because we are going to print our search results, we implement a function that returns us another function object, which encapsulates the end iterator of a data structure. When used for printing, it will compare its iterator argument against the end iterator and then print the item or, otherwise, just <end>:
template <typename C>
static auto opt_print (const C &container)
{
return [end_it (end(container))] (const auto &item) {
if (item != end_it) {
cout << *item << 'n';
} else {
cout << "<end>n";
}
};
}
- We start with an example vector of some German cities:
int main()
{
const vector<city> c {
{"Aachen", 246000},
{"Berlin", 3502000},
{"Braunschweig", 251000},
{"Cologne", 1060000}
};
- Using this helper, we build a city printer function, which captures the end iterator of our city vector c:
auto print_city (opt_print(c));
- We use std::find to find the item in the vector, which saves the city item of Cologne. At first, this search looks pointless because we get exactly the item we searched for. But we did not know its position in the vector before, and the find function returns us just that. However, we could, for example, make the operator == of the city struct that we overloaded only compare the city name, then we could search just using the city name, without even knowing its population. But that would not be a good design. In the next step, we will do it differently:
{
auto found_cologne (find(begin(c), end(c),
city{"Cologne", 1060000}));
print_city(found_cologne);
}
- Without knowing the population count of a city, and also without tampering with its == operator, we can search only by comparing its name with the vector's content. The std::find_if function accepts a predicate function object instead of a specific value. This way, we can search for the Cologne city item when we only know its name:
{
auto found_cologne (find_if(begin(c), end(c),
[] (const auto &item) {
return item.name == "Cologne";
}));
print_city(found_cologne);
}- In order to make searching a bit prettier and expressive, we can implement predicate builders. The population_higher_than function object accepts a population size and returns us a function that tells if a city instance has a larger population than the captured value. Let's use it to search for a German city with more than two million inhabitants in our small example set. Within the given vector, that city is only Berlin:
{
auto population_more_than ([](unsigned i) {
return [=] (const city &item) {
return item.population > i;
};
});
auto found_large (find_if(begin(c), end(c),
population_more_than(2000000)));
print_city(found_large);
}- The search functions we just used, traverse our containers linearly. Thus they have a runtime complexity of O(n). The STL also has binary search functions, which work within O(log(n)). Let's generate a new example data set, which just consists of some integer values, and build another print function for that:
const vector<int> v {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
auto print_int (opt_print(v));- The std::binary_search function returns boolean values and just tells us if it found an item, but it does not return the item itself. It is important that the container we are searching in is sorted because otherwise, binary search doesn't work correctly:
bool contains_7 {binary_search(begin(v), end(v), 7)};
cout << contains_7 << 'n';- In order to get the items we are searching for, we need other STL functions. One of them is std::equal_range. It does not return an iterator for the item we found, but a pair of iterators. The first iterator points to the first item that is not smaller than the value we've been looking for. The second iterator points to the first item that is larger than it. In our range, which goes from 1 to 10, the first iterator points to the actual 7, because it is the first item, that is not smaller than 7. The second iterator points to the 8 because it's the first item that is larger than 7. If we had multiple values of 7, both the iterators would, in fact, represent a subrange of items:
auto [lower_it, upper_it] (
equal_range(begin(v), end(v), 7));
print_int(lower_it);
print_int(upper_it);
- If we just need one iterator; we can use std::lower_bound or std::upper_bound. The lower_bound function only returns an iterator to the first item that is not smaller than what we searched. The upper_bound function returns an iterator to the first item that is larger than what we searched for:
print_int(lower_bound(begin(v), end(v), 7));
print_int(upper_bound(begin(v), end(v), 7));
}
- Let's compile and run the program to see if the output matches our assumptions:
$ ./finding_items
{Cologne, 1060000}
{Cologne, 1060000}
{Berlin, 3502000}
1
7
8
7
8