Python Algorithms: Mastering Basic Algorithms in the Python Language, Second Edition

APPENDIX B

List of Problems and Algorithms

If you’re having hull problems, I feel bad for you, son; I’ve got 99 problems, but a breach ain’t one.

— Anonymous¹

This appendix does not list every problem and algorithm mentioned in the book because some algorithms are discussed only to illustrate a principle and some problems serve only as examples for certain algorithms. The most important problems and algorithms, however, are sketched out here, with some references to the main text. If you’re unable to find what you’re looking for by consulting this appendix, take a look in the index.

In most descriptions in this appendix, n refers to the problem size (such as the number of elements in a sequence). For the special case of graphs, though, n refers to the number of nodes, and m refers to the number of edges.

Problems

Cliques and independent sets. A clique is a graph where there is an edge between every pair of nodes. The main problem of interest here is finding a clique in a larger graph (that is, identifying a clique as a subgraph). An independent set in a graph is a set of nodes where no pair is connected by an edge. In other words, finding an independent set is equivalent to taking the complement of the edge set and finding a clique. Finding a k-clique (a clique of k nodes) or finding the largest clique in a graph (the max-clique problem) is NP-hard. (For more information, see Chapter 11.)

Closest pair. Given a set of points in the Euclidean plane, find the two points that are closest to each other. This can be solved in loglinear time using the divide-and-conquer strategy (see Chapter 6).

Compression and optimal decision trees. A Huffman tree is a tree whose leaves have weights (frequencies), and the sum of their weights multiplied by their depth is as small as possible. This makes such trees useful for constructing compression codes and as decision trees when a probability distribution is known for the outcomes. Huffman trees can be built using Huffman’s algorithm, described in Chapter 7 (Listing 7-1).

Connected and strongly connected components. An undirected graph is connected if there is a path from every node to every other. A directed graph is connected if its underlying undirected graph is connected. A connected component is a maximal subgraph that is connected. Connected components can be found using traversal algorithms such as DFS (Listing 5-5) or BFS (Listing 5-9), for example. If there is a (directed) path from every node to every other in a directed graph, it is called strongly connected. A strongly connected component (SCC) is a maximal subgraph that is strongly connected. SCCs can be found using Kosaraju’s algorithm (Listing 5-10).

Convex hulls. A convex hull is the minimum convex region containing a set of points in the Euclidean plane. Convex hulls can be found in loglinear time using the divide-and-conquer strategy (see Chapter 6).

Finding the minimum/maximum/median. The minimum and maximum of a sequence can be found in linear time by a simple scan. Repeatedly finding and extracting the maximum or minimum in logarithmic time, given linear-time preparation, can be done using a binary heap. It is also possible to find the kth smallest element of a sequence (the median for k = n/2) in linear (or expected linear) time, using select or randomized select. (For more information, see Chapter 6.)
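As a small illustration of the heap-based approach, the following sketch uses the standard heapq module. The function name k_smallest is made up for this example and is not taken from the main text.

```python
import heapq

def k_smallest(seq, k):
    """Return the k smallest items of seq in ascending order (hypothetical helper)."""
    heap = list(seq)        # copy, so the caller's sequence is left untouched
    heapq.heapify(heap)     # linear-time preparation
    # Each heappop takes time logarithmic in the current heap size
    return [heapq.heappop(heap) for _ in range(min(k, len(heap)))]
```

For a one-off query, min(seq) and max(seq) perform the simple linear scan.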

Flow and cut problems. How many units of flow can be pushed through a network with flow capacities on the edges? That is the max-flow problem. An equivalent problem is finding the set of edge capacities that most constrain the flow; this is the min-cut problem. There are several versions of these problems. For example, you could add costs to the edges and find the cheapest of the maximum flows. You could add a lower bound on each edge and look for a feasible flow. You could even add separate supplies and demands in each node. These problems are dealt with in detail in Chapter 10.

Graph coloring. Try to color the nodes of a graph so that no neighbors share a color. Now try to do this with a given number of colors, or even to find the lowest such number (the chromatic number of the graph). This is an NP-hard problem in general. If, however, you’re asked to see whether a graph is two-colorable (or bipartite), the problem can be solved in linear time using simple traversal. The problem of finding a clique cover is equivalent to finding an independent set cover, which is an identical problem to graph coloring. (See Chapter 11 for more on graph coloring.)
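The two-colorability test can be sketched with a plain breadth-first traversal. This is only an illustration: the function name two_color and the adjacency-dict representation (every node is a key, and each undirected edge is listed in both directions) are assumptions made for the example.

```python
from collections import deque

def two_color(graph):
    """Return a dict mapping nodes to colors 0/1, or None if the graph is not bipartite."""
    color = {}
    for start in graph:                 # handle every connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in color:
                    color[v] = 1 - color[u]   # neighbors get the opposite color
                    queue.append(v)
                elif color[v] == color[u]:
                    return None               # two neighbors share a color
    return color
```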

The halting problem. Determine whether a given algorithm will terminate with a given input. The problem is undecidable (that is, unsolvable) in the general case (see Chapter 11).

Hamilton cycles/paths and TSP … and Euler tours. Several path and subgraph problems can be solved efficiently. If, however, you want to visit every node exactly once, you’re in trouble. Any problem involving this constraint is NP-hard, including finding a Hamilton cycle (visit every node once and return), a Hamilton path (visit every node once, without returning), or a shortest tour of a complete graph (the Traveling Salesman/Salesrep problem). The problems are NP-hard both for the directed and undirected case (see Chapter 11). The related problem of visiting every edge exactly once, though—finding a so-called Euler tour—is solvable in polynomial time (see Chapter 5). The TSP problem is NP-hard even for special cases such as using Euclidean distances in the plane, but it can be efficiently approximated to within a factor of 1.5 for this case, and for any other metric distance. Approximating the TSP problem in general, though, is NP-hard. (See Chapter 11 for more information.)

The knapsack problem and integer programming. The knapsack problem involves choosing a valuable subset of a set of items, under certain constraints. In the (bounded) fractional case, you have a certain amount of some substances, each of which has a unit value (value per unit of weight). You also have a knapsack that can carry a certain maximum weight. The (greedy) solution is to take as much as you can of each substance, starting with the one with the highest unit value. For the integral knapsack problem, you can take only entire items—fractions aren’t allowed. Each item has a weight and a value. For the bounded case (also known as 0-1 knapsack), you have a limited number of objects of each type. (Another perspective would be that you have a fixed set of objects that you either take or not.) In the unbounded case, you can take as many as you want from each of a set of object types (still respecting your carrying capacity, of course). A special case known as the subset sum problem involves selecting a subset of a set of numbers so that the subset has a given sum. These problems are all NP-hard (see Chapter 11), but admit pseudopolynomial solutions based on dynamic programming (see Chapter 8). The fractional knapsack case, as explained, can even be solved in polynomial time using a greedy strategy (see Chapter 7). Integer programming is, in some ways, a generalization of the knapsack problem (and is therefore obviously NP-hard). It is simply linear programming where the variables are constrained to be integers.
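A pseudopolynomial dynamic programming solution to the 0-1 case might look like the following sketch. It assumes nonnegative integer weights and an integer capacity; the name knapsack_01 is hypothetical, and the code is not the listing from Chapter 8.

```python
def knapsack_01(weights, values, capacity):
    """Best total value for 0-1 knapsack (assumes nonnegative integer weights)."""
    best = [0] * (capacity + 1)     # best[c] = best value achievable with capacity c
    for w, v in zip(weights, values):
        # Iterate capacities downward so that each item is used at most once
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]
```

The running time is proportional to the number of items times the capacity, which is why it is pseudopolynomial rather than polynomial in the input size.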

Longest increasing subsequence. Find the longest subsequence of a given sequence whose elements are in increasing order. This can be solved in loglinear time using dynamic programming (see Chapter 8).
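One way to reach loglinear time is to combine the dynamic programming idea with bisection. The sketch below computes only the length and treats "increasing" as strictly increasing; the function name lis_length is made up for this example.

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    # tails[i] = smallest possible last element of an increasing
    # subsequence of length i + 1 seen so far (always kept sorted)
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)     # x extends the longest subsequence found so far
        else:
            tails[i] = x        # x gives a smaller tail for length i + 1
    return len(tails)
```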

Matching. There are many matching problems, all of which involve linking some object to others. The problems discussed in this book are bipartite matching and min-cost bipartite matching (Chapter 10) and the stable marriage problem (Chapter 7). Bipartite matching (or maximum bipartite matching) involves finding the greatest subset of edges in a bipartite graph so that no two edges in the subset share a node. The min-cost version does the same but minimizes the sum of edge costs over this subset. The stable marriage problem is a bit different; there, all men and women have preference rankings of the members of the opposite sex. A stable set of marriages is characterized by the fact that you can’t find a pair that would rather have each other than their current mates.

Minimum spanning trees. A spanning tree is a subgraph whose edges form a tree over all the nodes of the original graph. A minimum spanning tree is one that minimizes the sum of edge costs. Minimum spanning trees can be found using Kruskal’s algorithm (Listing 7-4) or Prim’s algorithm (Listing 7-5), for example. Because the number of edges is fixed, a maximum spanning tree can be found by simply negating the edge weights.

Partitioning and bin packing. Partitioning involves dividing a set of numbers into two sets with equal sums, while the bin packing problem involves packing a set of numbers into a set of “bins” so that the sum in each bin is below a certain limit and so that the number of bins is as small as possible. Both problems are NP-hard. (See Chapter 11.)

SAT, Circuit-SAT, k-CNF-SAT. These are all varieties of the satisfiability problem (SAT), which asks you to determine whether a given logical (Boolean) formula can ever be true, if you’re allowed to set the variables to whatever truth values you want. The circuit-SAT problem simply uses logical circuits rather than formulas, and k-CNF-SAT involves formulas in conjunctive normal form, where each clause consists of k literals. The latter can be solved in polynomial time for k = 2. The other problems, as well as k-CNF-SAT for k > 2, are NP-complete. (See Chapter 11.)

Searching. This is a very common and extremely important problem. You have a key and want to find an associated value. This is, for example, how variables work in dynamic languages such as Python. It’s also how you find almost anything on the Internet these days. Two important solutions are hash tables (see Chapter 2) and binary search or search trees (see Chapter 6). Given a probability distribution for the objects in the data set, optimal search trees can be constructed using dynamic programming (see Chapter 8).

Sequence comparison. You may want to compare two sequences to know how similar (or dissimilar) they are. One way of doing this is to find the longest subsequence the two have in common (longest common subsequence) or to find the minimum number of basic edit operations to go from one sequence to the other (so-called edit distance, or Levenshtein distance). These two problems are more or less equivalent; see Chapter 8 for more information.
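As an illustration of the edit distance idea, here is a minimal dynamic programming sketch that keeps only two rows of the table. It assumes the three basic operations are insertion, deletion, and substitution, each with cost 1; the name edit_distance is chosen for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b."""
    prev = list(range(len(b) + 1))      # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]                      # delete the first i elements of a
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete x
                            curr[j - 1] + 1,            # insert y
                            prev[j - 1] + (x != y)))    # substitute, or match for free
        prev = curr
    return prev[-1]
```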

Sequence modification. Inserting an element into the middle of a linked list is cheap (constant time), but finding a given location is costly (linear time); for an array, the opposite is true (constant lookup, linear insert, because all later elements must be shifted). Appending can be done cheaply for both structures, though (see the “Black Box” sidebar on list in Chapter 2).

Set and vertex covers. A vertex cover is a set of vertices that cover (that is, are adjacent to) all the edges of the graph. A set cover is a generalization of this idea, where the nodes are replaced with subsets, and you want to cover the entire set. The problem lies in constraining or minimizing the number of nodes/subsets. Both problems are NP-hard (see Chapter 11).

Shortest paths. This problem involves finding the shortest path from one node to another, from one node to all the others (or vice versa), or from all nodes to all others. The one-to-one, one-to-all, and all-to-one cases are solved the same way, normally using BFS for unweighted graphs, DAG shortest path for DAGs, Dijkstra’s algorithm for nonnegative edge weights, and Bellman–Ford in the general case. To speed up things in practice (although without affecting the worst-case running time), you can also use bidirectional Dijkstra, or the A* algorithm. For the all pairs shortest paths problem, the algorithms of choice are probably Floyd–Warshall or (for sparse graphs) Johnson’s algorithm. If the edges are nonnegative, Johnson’s algorithm is (asymptotically) equivalent to running Dijkstra’s algorithm from every node (which may be more effective). (For more information on shortest path algorithms, see Chapters 5 and 9.) Note that the longest path problem (for general graphs) can be used to find Hamilton paths, which means that it is NP-hard. This, in fact, means that the shortest path problem is also NP-hard in the general case. If we disallow negative cycles in the graph, however, our polynomial algorithms will work.

Sorting and element uniqueness. Sorting is an important operation and an essential subroutine for several other algorithms. In Python, you would normally sort by using the list.sort method or the sorted function, both of which use a highly efficient implementation of the timsort algorithm. Other algorithms include insertion sort, selection sort, and gnome sort (all of which have a quadratic running time), as well as heapsort, mergesort, and quicksort (which are loglinear, although this holds only in the average case for quicksort). For information on the quadratic sorting algorithms, see Chapters 3 and 4; for the loglinear (divide-and-conquer) algorithms, see Chapter 6. Deciding whether a set of real numbers contains duplicates cannot (in the worst case) be solved with a running time better than loglinear. By reduction, neither can sorting.

Topological sorting. Order the nodes of a DAG so that all the edges point in the same direction. If the edges represent dependencies, a topological sorting represents an ordering that respects the dependencies. This problem can be solved by a form of reference counting (see Chapter 4) or by using DFS (see Chapter 5).

Traversal. The problem here is to visit all the objects in some connected structure, usually represented as nodes in a graph or tree. The idea can be either to visit every node or to visit only those needed to solve some problem. The latter strategy of ignoring parts of the graph or tree is called pruning and is used (for example) in search trees and in the branch and bound strategy. For a lot on traversal, see Chapter 5.

Algorithms and Data Structures

2-3-trees. Balanced tree structure, allowing insertions, deletions, and search in worst-case Θ(lg n) time. Internal nodes can have two or three children, and the tree is balanced during insertion by splitting nodes, as needed. (See Chapter 6.)

A*. Heuristically guided single source shortest path algorithm. Suitable for large search spaces. Instead of choosing the node with the lowest distance estimate (as in Dijkstra’s), the node with the lowest heuristic value (sum of distance estimate and guess for remaining distance) is used. Worst-case running time identical to Dijkstra’s algorithm. (See Listing 9-10.)

AA-tree. 2-3-trees simulated using node rotations in a binary tree with level-numbered nodes. Worst-case running times of Θ(lg n) for insertions, deletions, and search. (See Listing 6-6.)

Bellman–Ford. Shortest path from one node to all others in weighted graphs. Looks for a shortcut along every edge n times. Without negative cycles, correct answer guaranteed after n–1 iterations. If there’s improvement in the last round, a negative cycle is detected, and the algorithm gives up. Running time Θ(nm). (See Listing 9-2.)
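The description above can be sketched as follows. This is an illustrative version, not Listing 9-2: the function name, the dict-of-dicts graph representation (every node is a key), and the choice to raise ValueError on a negative cycle are all assumptions.

```python
def bellman_ford(graph, start):
    """Shortest distances from start; graph is {u: {v: weight}}."""
    dist = {u: float('inf') for u in graph}
    dist[start] = 0
    for _ in range(len(graph)):             # n rounds in total
        changed = False
        for u in graph:
            for v, w in graph[u].items():
                if dist[u] + w < dist[v]:   # shortcut found along edge (u, v)
                    dist[v] = dist[u] + w
                    changed = True
        if not changed:                     # no improvement: the answer is final
            return dist
    # Still improving in round n, so a negative cycle is reachable
    raise ValueError('negative cycle detected')
```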

Bidirectional Dijkstra. Dijkstra’s algorithm run from start and end node simultaneously, with alternating iterations going to each of the two algorithms. The shortest path is found when the two meet up in the middle (although some care must be taken at this point). The worst-case running time is just like for Dijkstra’s algorithm. (See Listings 9-8 and 9-9.)

Binary search trees. A binary tree structure where each node has a key (and usually an associated value). Descendant keys are partitioned by the node key: Smaller keys go in the left subtree, and greater keys go in the right. On the average, the depth of any node is logarithmic, giving an expected insertion and search time of Θ(lg n). Without extra balancing, though (such as in the AA-tree), the tree can become unbalanced, giving linear running times. (See Listing 6-2.)

Bisection, binary search. A search procedure that works in a manner similar to search trees, by repeatedly halving the interval of interest in a sorted sequence. The halving is performed by inspecting the middle element and deciding whether the sought value must lie to the left or right. Running time Θ(lg n). A very efficient implementation can be found in the bisect module. (See Chapter 6.)
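A membership lookup built on the bisect module might look like this. The helper name index_of and the convention of returning -1 for a missing value are choices made for the example only.

```python
from bisect import bisect_left

def index_of(sorted_seq, x):
    """Index of the leftmost x in sorted_seq, or -1 if x is absent."""
    i = bisect_left(sorted_seq, x)      # leftmost position where x could be inserted
    if i < len(sorted_seq) and sorted_seq[i] == x:
        return i
    return -1
```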

Branch and bound. A general algorithmic design approach. Searches a space of solutions in a depth-first or best-first order by building and evaluating partial solutions. A conservative estimate is kept for the optimal value, while an optimistic estimate is computed for a partial solution. If the optimistic estimate is worse than the conservative one, the partial solution is not extended, and the algorithm backtracks. Often used to solve NP-hard problems. (See Listing 11-2 for a branch-and-bound solution to the 0-1 knapsack problem.)

Breadth-first search (BFS). Traversing a graph (possibly a tree) level by level, thereby also identifying (unweighted) shortest paths. Implemented by using a FIFO queue to keep track of discovered nodes. Running time Θ(n+m). (See Listing 5-9.)

Bucket sort. Sort numerical values that are evenly (uniformly) distributed in a given interval by dividing the interval into n equal-sized buckets and placing the values in them. Expected bucket size is constant, so they can be sorted with (for example) insertion sort. Total running time Θ(n). (See Chapter 4.)

Busacker–Gowen. Finds the cheapest max-flow (or the cheapest flow with a given flow value) in a network by using the cheapest augmenting paths in the Ford–Fulkerson approach. These paths are found using Bellman–Ford or (with some weight adjustments) Dijkstra’s algorithm. The running time in general depends on the maximum flow value and so is pseudopolynomial. For a maximum flow of k, the running time is (assuming Dijkstra’s algorithm is used) O(km lg n). (See Listing 10-5.)

Christofides’ algorithm. An approximation algorithm (with an approximation ratio bound of 1.5) for the metric TSP problem. Finds a minimum spanning tree and then a minimum matching² among the odd-degree nodes of the tree, short-circuiting as needed to make a valid tour of the graph. (See Chapter 11.)

Counting sort. Sort integers with a small value range (with at most Θ(n) contiguous values) in Θ(n) time. Works by counting occurrences and using the cumulative counts to directly place the numbers in the result, updating the counts as it goes. (See Chapter 4.)
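The counting and placement steps can be sketched as below. This simplified version assumes the values are integers in range(k), with k supplied by the caller; sorting arbitrary objects by an integer key would follow the same pattern.

```python
def counting_sort(values, k):
    """Stable sort of integers drawn from range(k)."""
    counts = [0] * (k + 1)
    for v in values:
        counts[v + 1] += 1              # count occurrences, shifted one slot right
    for i in range(k):
        counts[i + 1] += counts[i]      # counts[v] is now the first output index for v
    result = [None] * len(values)
    for v in values:
        result[counts[v]] = v           # place v directly in its final position
        counts[v] += 1                  # the next equal value goes one slot later
    return result
```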

DAG shortest path. Finds the shortest path from one node to all others in a DAG. Works by finding a topological sorting of the nodes and then relaxing all out-edges (or, alternatively, all in-edges) at every node from left to right. Can (because of the lack of cycles) also be used to find longest paths. Running time Θ(n+m). (See Listing 8-4.)

Depth-first search (DFS). Traversing a graph (possibly a tree) by going in depth and then backtracking. Implemented by using a LIFO queue to keep track of discovered nodes. By keeping track of discover- and finish-times, DFS can also be used as a subroutine in other algorithms (such as topological sorting or Kosaraju’s algorithm). Running time Θ(n+m). (See Listings 5-4, 5-5, and 5-6.)

Dijkstra’s algorithm. Find the shortest paths from one node to all others in a weighted graph, as long as there are no negative edge weights. Traverses the graph, repeatedly selecting the next node using a priority queue (a heap). The priority is the current distance estimate of the node. These estimates are updated whenever a shortcut is found from a visited node. The running time is Θ((m+n) lg n), which is simply Θ(m lg n) if the graph is connected.
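A compact sketch using heapq is shown here. It is an illustration under assumptions rather than the listing from Chapter 9: the graph is a dict of dicts with nonnegative weights, every node appears as a key, and nodes are mutually comparable so that heap entries with tied distances can be ordered.

```python
import heapq

def dijkstra(graph, start):
    """Shortest distances from start; graph is {u: {v: weight}}, weights >= 0."""
    dist = {start: 0}
    done = set()                        # nodes whose distance is final
    heap = [(0, start)]                 # (distance estimate, node)
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                    # stale entry for an already visited node
        done.add(u)
        for v, w in graph[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:   # shortcut found via u
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Pushing a fresh entry instead of decreasing a key in place keeps the code short; outdated entries are simply skipped when popped.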

Double-ended queues. FIFO queues implemented using linked lists (or linked lists of arrays), so that inserting and extracting objects at either end can be done in constant time. An efficient implementation can be found in the collections.deque class. (See the “Black Box” sidebar on the topic in Chapter 5.)

Dynamic arrays, vectors. The idea of having extra capacity in an array, so appending is efficient. By relocating the contents to a bigger array, growing it by a constant factor, when it fills up, appends can be constant in average (amortized) time. (See Chapter 2.)

Edmonds–Karp. The concrete instantiation of the Ford–Fulkerson method where traversal is performed using BFS. Finds max-flow in Θ(nm²) time. (See Listing 10-4.)

Floyd–Warshall. Finds shortest paths from each node to all others. In iteration k, only the first k nodes (in some ordering) are allowed as intermediate nodes along the paths. Extending from k–1 involves checking whether the shortest path to and from k via the first k–1 nodes is shorter than simply going directly via these nodes. (That is, node k is either used or not, for every shortest path.) Running time is Θ(n³). (See Listing 9-6.)
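The triple loop can be sketched as follows, assuming a dict-of-dicts graph in which every node is a key and there are no negative cycles. The function name and representation are chosen for this example and need not match Listing 9-6.

```python
def floyd_warshall(graph):
    """All-pairs shortest distances; graph is {u: {v: weight}}."""
    nodes = list(graph)
    inf = float('inf')
    # Start with direct edges only (zero distance from a node to itself)
    dist = {u: {v: 0 if u == v else graph[u].get(v, inf) for v in nodes}
            for u in nodes}
    for k in nodes:                     # allow k as an intermediate node
        for u in nodes:
            for v in nodes:
                via_k = dist[u][k] + dist[k][v]
                if via_k < dist[u][v]:  # going via k beats the best path so far
                    dist[u][v] = via_k
    return dist
```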

Ford–Fulkerson. A general approach to solving max-flow problems. The method involves repeatedly traversing the graph to find a so-called augmenting path, a path along which the flow can be increased (augmented). The flow can be increased along an edge if it has extra capacity, or it can be increased backward across an edge (that is, canceled) if there is flow along the edge. Thus, the traversal can move both forward and backward along the directed edges, depending on the flow across them. The running time depends on the traversal strategy used. (See Listing 10-4.)

Gale–Shapley. Finds a stable set of marriages given preference rankings for a set of men and women. Any unengaged men propose to the most preferred woman they haven’t proposed to. Each woman will choose her favorite among her current suitors (possibly staying with her fiancé). Can be implemented with quadratic running time. (See the sidebar “Eager Suitors and Stable Marriages” in Chapter 7.)
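A possible implementation of the proposal rounds is sketched below. It assumes equally many men and women, with every preference list complete and given in order from most to least preferred; the names gale_shapley, men_prefs, and women_prefs are hypothetical.

```python
def gale_shapley(men_prefs, women_prefs):
    """Return a stable matching as a dict mapping each woman to her partner."""
    # rank[w][m] = position of m in w's list (lower means more preferred)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}     # index of the next woman to propose to
    fiance = {}
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][next_choice[m]]        # best woman m has not yet proposed to
        next_choice[m] += 1
        current = fiance.get(w)
        if current is None:
            fiance[w] = m                       # w was unengaged and accepts
        elif rank[w][m] < rank[w][current]:
            fiance[w] = m                       # w prefers m and drops her fiance
            free.append(current)
        else:
            free.append(m)                      # w rejects m
    return fiance
```

Each man proposes to each woman at most once, which gives the quadratic bound.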

Gnome sort. A simple sorting algorithm with quadratic running time. Probably not an algorithm you’ll use in practice. (See Listing 3-1.)

Hashing, hash tables. Look up a key to get the corresponding value, just like in a search tree. Entries are stored in an array, and their positions are found by computing a (pseudorandom, sort of) hash value of the key. Given a good hash function and enough room in the array, the expected running time of insertion, deletion and lookup is Θ(1). (See Chapter 2.)

Heaps, heapsort. Heaps are efficient priority queues. With linear-time preprocessing, a min- (max-) heap will let you find the smallest (largest) element in constant time and extract or replace it in logarithmic time. Adding an element can also be done in logarithmic time. Conceptually, a heap is a full binary tree where each node is smaller (larger) than its children. When modifications are made, this property can be repaired with Θ(lg n) operations. In practice, heaps are usually implemented using arrays (with nodes encoded as array entries). A very efficient implementation can be found in the heapq module. Heapsort is like selection sort, except that the unsorted region is a heap, so finding the largest element n times gives a total running time of Θ(n lg n). (See the “Black Box” sidebar on heaps, heapq, and heapsort in Chapter 6.)

Huffman’s algorithm. Builds Huffman trees, which can be used for building optimal prefix codes, for example. Initially, each element (for example, character in an alphabet) is made into a single-node tree, with a weight equal to its frequency. In each iteration, the two lightest trees are picked, combining them with a new root and giving the new tree a weight equal to the sum of the original two tree weights. This can be done in loglinear time (or, in fact, in linear time if the frequencies are presorted). (See Listing 7-1.)
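The merging loop can be sketched with a heap of (weight, tiebreaker, tree) entries. This is an illustration in the same spirit as, but not copied from, Listing 7-1; it assumes at least one symbol, and represents leaves as the symbols themselves and internal nodes as two-element lists.

```python
import heapq
from itertools import count

def huffman(symbols, weights):
    """Build a Huffman tree as nested two-element lists (assumes nonempty input)."""
    tiebreak = count()                  # unique numbers, so trees are never compared
    trees = [(w, next(tiebreak), s) for s, w in zip(symbols, weights)]
    heapq.heapify(trees)
    while len(trees) > 1:
        wa, _, a = heapq.heappop(trees)         # the two lightest trees
        wb, _, b = heapq.heappop(trees)
        heapq.heappush(trees, (wa + wb, next(tiebreak), [a, b]))
    return trees[0][2]
```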

Insertion sort. A simple sorting algorithm with quadratic running time. It works by repeatedly inserting the next unsorted element in an initial sorted segment of the array. For small data sets, it can actually be preferable to more advanced (and optimal) algorithms such as merge sort or quicksort. (In Python, though, you should use list.sort or sorted if at all possible.) (See Listing 4-3.)

Interpolation search. Similar to ordinary binary search, but linear interpolation between the interval endpoints is used to guess the correct position, rather than simply looking at the middle element. The worst-case running time is still Θ(lg n), but the average-case running time is O(lg lg n) for uniformly distributed data. (Mentioned in the “If You’re Curious …” section of Chapter 6.)

Iterative deepening DFS. Repeated runs of DFS, where each run has a limit to how far it can traverse. For structures with some fanout, the running time will be the same as for DFS or BFS (that is, Θ(n+m)). The point is that it has the advantages of BFS (it finds shortest paths and explores large state spaces conservatively), with the smaller memory footprint of DFS. (See Listing 5-8.)

Johnson’s algorithm. Finds shortest paths from every node to all others. Basically runs Dijkstra’s from every node. However, it uses a trick so that it also works with negative edge weights: It first runs Bellman–Ford from a new start node (with added edges to all existing nodes) and then uses the resulting distances to modify the edge weights of the graph. The modified weights are all nonnegative but are set so that the shortest paths in the original graph will also be the shortest paths in the modified graph. Running time Θ(mn lg n). (See Listing 9-4.)

Kosaraju’s algorithm. Finds strongly connected components, using DFS. First, nodes are ordered by their finish times. Then the edges are reversed, and another DFS is run, selecting start nodes using the first ordering. Running time Θ(n+m). (See Listing 5-11.)

Kruskal’s algorithm. Finds a minimum spanning tree by repeatedly adding the smallest remaining edge that doesn’t create a cycle. This cycle checking can (with some cleverness) be performed very efficiently, so the running time is dominated by sorting the edges. All in all, the running time is Θ(m lg n). (See Listing 7-4.)
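One common way to do the cycle check is a union-find (disjoint-set) structure, sketched below with path halving. The structure and names here are assumptions for illustration and may differ from Listing 7-4; edges are given as (weight, u, v) tuples.

```python
def kruskal(nodes, edges):
    """Return the edges of a minimum spanning forest as (weight, u, v) tuples."""
    parent = {u: u for u in nodes}      # each node starts as its own component

    def find(u):
        # Follow parent links to the component representative, halving the path
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for w, u, v in sorted(edges, key=lambda e: e[0]):   # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                    # different components, so no cycle is created
            parent[ru] = rv
            tree.append((w, u, v))
    return tree
```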

Linked lists. An alternative to arrays for representing sequences. Although linked lists are cheap (constant time) to modify once you’ve found the right entries, finding those normally takes linear time. Linked lists are implemented sort of like a path, with each node pointing to the next. Note that Python’s list type is implemented as an array, not a linked list. (See Chapter 2.)

Merge sort. The archetypal divide-and-conquer algorithm. It divides the sequence to be sorted in the middle, sorts the two halves recursively, and then merges the two sorted halves in linear time. The total running time is Θ(n lg n). (See Listing 6-5.)
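The divide, recurse, and merge steps can be sketched as follows. This version returns a new list and is written for clarity rather than speed; it is not necessarily the same as Listing 6-5.

```python
def merge_sort(seq):
    """Return a new sorted list with the elements of seq."""
    if len(seq) <= 1:
        return list(seq)
    mid = len(seq) // 2
    left = merge_sort(seq[:mid])        # sort the two halves recursively
    right = merge_sort(seq[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:         # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])             # at most one of these is nonempty
    merged.extend(right[j:])
    return merged
```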

Ore’s algorithm. An algorithm for traversing actual mazes in person, by marking passage entries and exits. In many ways similar to iterative deepening DFS or BFS. (See Chapter 5.)

Prim’s algorithm. Grows a minimum spanning tree by repeatedly adding the node closest to the tree. It is, at core, a traversal algorithm and uses a priority queue, just like Dijkstra’s algorithm. (See Listing 7-5.)

Radix sort. Sorts numbers (or other sequences) by digit (element), starting with the least significant one. As long as the number of digits is constant and the digits can be sorted in linear time (using, for example, counting sort), the total running time is linear. It’s important that the sorting algorithm used on the digits is stable. (See Chapter 4.)

Randomized select. Finds the median, or, in general, the kth order statistic (the kth smallest element). Works sort of like “half a quicksort.” It chooses a pivot element at random (or arbitrarily) and partitions the other elements to the left (smaller elements) or right (greater elements) of the pivot. The search then continues in whichever portion must contain the sought element, more or less like binary search. Perfect bisection is not guaranteed, but the expected running time is still linear. (See Listing 6-3.)
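A sketch of the idea, using list comprehensions for the partitioning, is given below. It assumes a zero-based k with 0 <= k < len(seq); the name randomized_select and the three-way split are choices made for this example, not the code of Listing 6-3.

```python
import random

def randomized_select(seq, k):
    """Return the k-th smallest element of seq (k is zero-based)."""
    items = list(seq)
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k < len(smaller):
            items = smaller                     # answer lies left of the pivot
        elif k < len(smaller) + len(equal):
            return pivot                        # the pivot itself is the answer
        else:
            k -= len(smaller) + len(equal)      # skip everything up to the pivot
            items = [x for x in items if x > pivot]
```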

Select. The rather unrealistic, but guaranteed linear, sibling of randomized select. It works as follows: Divide the sequence into groups of five. Find the median in each using insertion sort. Find the median of these medians recursively, using select. Use this median of medians as a pivot and partition the elements. Now run select on the proper half. In other words, it’s similar to randomized select—the difference is that it can guarantee that a certain percentage will end up on either side of the pivot, avoiding the totally unbalanced case. Not really an algorithm you’re likely to use in practice, but it’s important to know about. (See Chapter 6.)

Selection sort. A simple sorting algorithm with quadratic running time. Very similar to insertion sort, but instead of repeatedly inserting the next element into the sorted section, you repeatedly find (that is, select) the largest element in the unsorted region (and swap it with the last unsorted element). (See Listing 4-4.)

Timsort. A super-duper in-place sorting algorithm based on mergesort. Without any explicit conditions for handling special cases, it is able to take into account partially sorted sequences, including segments that are sorted in reverse, and can therefore sort many real-world sequences faster than what would seem possible. The implementation in list.sort and sorted is also really fast, so if you need to sort something, that’s what you should use. (See the “Black Box” sidebar on timsort in Chapter 6.)

Topological sorting by reference counting. Orders the nodes of a DAG so that all edges go from left to right. This is done by counting the number of in-edges at each node. The nodes with an in-degree of zero are kept in a queue (could just be a set; the order doesn’t matter). Nodes are taken from the queue and placed in the topological sorted order. As you do so, you decrement the counts for the nodes that this node has edges to. If any of them reaches zero, they are placed in the queue. (See Chapter 4.)
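The counting procedure can be sketched as below. The graph is assumed to be a DAG given as a dict mapping every node to an iterable of its successors; the function name topsort is hypothetical, and a plain list serves as the "queue", because the order in which ready nodes are taken does not matter.

```python
def topsort(graph):
    """Return the nodes of a DAG in topologically sorted order."""
    count = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            count[v] += 1               # count the in-edges of each node
    ready = [u for u in graph if count[u] == 0]
    order = []
    while ready:
        u = ready.pop()                 # any node with no remaining in-edges will do
        order.append(u)
        for v in graph[u]:
            count[v] -= 1               # "remove" the edge (u, v)
            if count[v] == 0:
                ready.append(v)
    return order
```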

Topological sorting with DFS. Another algorithm for sorting DAG nodes topologically. The idea is simple: perform a DFS and sort the nodes by inverse finish time. To easily get a linear running time, you can instead simply append nodes to your ordering as they receive their finish times in DFS, and then reverse the result. (See Listing 5-7.)

Tremaux’s algorithm. Like Ore’s algorithm, this is designed to be executed in person, while walking through a maze. The pattern traced by a person executing Tremaux’s algorithm is essentially the same as that of DFS. (See Chapter 5.)

Twice around the tree. An approximation algorithm for the metric TSP problem, guaranteed to yield a solution with a cost of at most twice the optimum. First it builds a minimum spanning tree (which costs less than the optimal tour), and then it “walks around” the tree, taking shortcuts to avoid visiting the same edge twice. Because of the metricity, this is guaranteed to be cheaper than walking each edge twice. This last traversal can be implemented by a preorder DFS. (See Listing 11-1.)

__________________

¹Facetiously attributed to Lt. Cdr. Geordi La Forge of Star Trek: The Next Generation.

²Note that finding matchings in general (possibly nonbipartite) graphs is not covered in this book.
