In this section, we will implement our own prefix tree only made from STL data structures and algorithms.
- We will include all the headers from the STL parts we use and declare that we use the std namespace by default:
#include <iostream>
#include <optional>
#include <algorithm>
#include <functional>
#include <iterator>
#include <map>
#include <vector>
#include <string>
using namespace std;
- The entire program revolves around a trie for which we have to implement a class first. In our implementation, a trie is basically a recursive map of maps. Every trie node contains a map, which maps from an instance of the payload type T to the next trie node:
template <typename T>
class trie
{
map<T, trie> tries;
- The code for inserting new item sequences is simple. The user provides a begin/end iterator pair and we loop through it recursively. If the user input sequence is {1, 2, 3}, then we look up 1 in the subtrie and then look up 2 in the next subtrie, in order to get the subtrie for 3. If any of those subtries did not exist before, they are implicitly added by the [] operator of std::map:
public:
template <typename It>
void insert(It it, It end_it) {
if (it == end_it) { return; }
tries[*it].insert(next(it), end_it);
}
- We also define convenience functions, which enable the user to just provide a container of items, which are then automatically queried for iterators:
template <typename C>
void insert(const C &container) {
insert(begin(container), end(container));
}
- In order to allow the user to write my_trie.insert({"a", "b", "c"});, we must help the compiler a bit to correctly deduce all the types from that line, so we add a function, which overloads the insert interface with an initializer_list parameter:
void insert(const initializer_list<T> &il) {
insert(begin(il), end(il));
}
- We will also want to see what's in a trie, so we need a print function. In order to print, we can do a depth-first-search through the trie. On the way from the root node down to the first leaf, we record all payload items we have seen already. This way, we have a complete sequence together once we reach the leaf, which is trivially printable. We see that we reached a leaf when tries.empty() is true. After the recursive print call, we pop off the last added payload item again:
void print(vector<T> &v) const {
if (tries.empty()) {
copy(begin(v), end(v),
ostream_iterator<T>{cout, " "});
cout << 'n';
}
for (const auto &p : tries) {
v.push_back(p.first);
p.second.print(v);
v.pop_back();
}
}
- The recursive print function passes around a reference to a printable list of payload items, but the user should call it without any parameters. Therefore, we define a parameterless print function, which constructs the helper list object:
void print() const {
vector<T> v;
print(v);
}
- Now that we can construct and print tries, we may want to search for subtries. The idea is that if the trie contains sequences such as {a, b, c} and {a, b, d, e}, and we give it a sequence, {a, b}, for search, it would return us the subtrie that contains the {c} and {d, e} parts. If we find the subtrie, we return a const reference to it. The possibility exists that there is no such subtrie in case the trie does not contain the sequence we are searching for. In such cases, we still need to return something. The std::optional is a nice helper because we can return an empty optional object if there is no match:
template <typename It>
optional<reference_wrapper<const trie>>
subtrie(It it, It end_it) const {
if (it == end_it) { return ref(*this); }
auto found (tries.find(*it));
if (found == end(tries)) { return {}; }
return found->second.subtrie(next(it), end_it);
}
- Similar to the insert method, we provide a one-parameter version of the subtrie method, which automatically takes iterators from the input container:
template <typename C>
auto subtrie(const C &c) {
return subtrie(begin(c), end(c));
}
};
- That's already it. Let's put the new trie class to use in our main function by instantiating a trie specialized on std::string objects and fill it with some example content:
int main()
{
trie<string> t;
t.insert({"hi", "how", "are", "you"});
t.insert({"hi", "i", "am", "great", "thanks"});
t.insert({"what", "are", "you", "doing"});
t.insert({"i", "am", "watching", "a", "movie"});
- Let's first print the whole trie:
cout << "recorded sentences:n";
t.print();
- Then we obtain the subtrie for all the input sentences that start with "hi", and print it:
cout << "npossible suggestions after "hi":n";
if (auto st (t.subtrie(initializer_list<string>{"hi"}));
st) {
st->get().print();
}
}
- Compiling and running the program shows that it does indeed return us only the two sentences that start with "hi", when we query the trie for exactly that subtrie:
$ ./trie
recorded sentences:
hi how are you
hi i am great thanks
i am watching a movie
what are you doing
possible suggestions after "hi":
how are you
i am great thanks