
Hints for Exercises
To solve any problem, here are three questions to ask yourself: First, what could I do? Second, what could I read? And third, who could I ask?
— Jim Rohn
1-1. As machines get faster and get more memory, they can handle larger inputs. For poor algorithms, this will eventually lead to disaster.
1-2. A simple and quite scalable solution would be to sort the characters in each string and compare the results. (In theory, counting the character frequencies, possibly using collections.Counter, would scale even better.) A really poor solution would be to compare all possible orderings of one string with the other. I can’t overstate how poor this solution is; in fact, algorithms don’t get much worse than this. Feel free to code it up and see how large an anagram you can check. I bet you won’t get far.
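Both of the reasonable solutions are nearly one-liners in Python; a quick sketch (the function names are mine):

```python
from collections import Counter

def is_anagram_sorted(s, t):
    # Sort both strings and compare: loglinear, and plenty fast in practice.
    return sorted(s) == sorted(t)

def is_anagram_counted(s, t):
    # Compare character frequencies instead: linear, asymptotically better.
    return Counter(s) == Counter(t)
```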
2-1. You would be using the same list ten times. Definitely a bad idea. (For example, try running a[0][0] = 23; print(a).)
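To see the problem for yourself, here is a minimal demonstration of the aliasing, along with the usual fix:

```python
n = 3
a = [[0] * n] * n                # n references to the *same* row
a[0][0] = 23
print(a)                         # all three rows appear to change

b = [[0] * n for _ in range(n)]  # n distinct rows
b[0][0] = 23
print(b)                         # only the first row changes
```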
2-2. One possibility is to use three arrays of size n; let’s call them A, B, and C, along with a count of how many entries have been assigned yet, m. A is the actual array, and B, C, and m form the extra structure for checking. Only the first m entries in C are used, and they are all indices into B. When you perform A[i] = x, you also set B[i] = m and C[m] = i and then increment m (that is, m += 1). Can you see how this gives you the information you need? Extending this to a two-dimensional adjacency array should be quite straightforward.
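A sketch of the trick in Python (the class and attribute names are mine). In Python the point is purely illustrative, since allocating the three lists already initializes them in linear time; the payoff comes in a language where you can allocate uninitialized memory in constant time:

```python
class VirtualArray:
    """An 'uninitialized' array with O(1) checks for assigned entries."""
    def __init__(self, n, default=None):
        self.A = [None] * n    # the actual array (pretend it's garbage)
        self.B = [0] * n       # B[i]: position of i's witness in C
        self.C = [0] * n       # C[j]: index assigned at step j
        self.m = 0             # number of entries assigned so far
        self.default = default

    def _assigned(self, i):
        # i has been assigned iff its witness lies among the first m
        # entries of C and points back at i.
        return self.B[i] < self.m and self.C[self.B[i]] == i

    def __setitem__(self, i, x):
        self.A[i] = x
        if not self._assigned(i):      # first write to i: record a witness
            self.B[i], self.C[self.m] = self.m, i
            self.m += 1

    def __getitem__(self, i):
        return self.A[i] if self._assigned(i) else self.default
```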
2-3. If f is O(g), then there is a constant c so that for n > n0, f(n) ≤ cg(n). This means that the conditions for g being Ω(f) are satisfied by using the constant 1/c. The same logic holds in reverse.
2-4. Let’s see how it works. By definition, b^(log_b n) = n. It’s an equation, so we can take the logarithm (base a) on both sides and get log_a(b^(log_b n)) = log_a n. Because log x^y = y log x (standard logarithm rule), we can write this as (log_a b)(log_b n) = log_a n, which gives us log_b n = (log_a n)/(log_a b). The takeaway from this result is that the difference between log_a n and log_b n is just a constant factor (log_a b), which disappears when we use asymptotic notation.
2-5. We want to find out whether, as n increases, the inequality k^n ≥ c·n^j is eventually true, for some constant c. For simplicity, we can set c = 1. We can take the logarithm (base k) on both sides (it won’t flip the inequality because it is an increasing function), and we’re left with finding out whether n ≥ j log_k n at some point, which is given by the fact that (increasing) linear functions dominate logarithms. (You should verify that yourself.)
2-6. This can be solved easily using a neat little trick called variable substitution. As in Exercise 2-5, we set up a tentative inequality, n^k ≥ lg n, and want to show that it holds for large n. Again, we take the logarithm on both sides and get k lg n ≥ lg(lg n). The double logarithm may seem scary, but we can sidestep it quite elegantly. We don’t care about how fast the polynomial overtakes the logarithm, only that it happens at some point. This means we can substitute our variable—we set m = lg n. If we can get the result we want by increasing m, we can get it by increasing n. This gives us km ≥ lg m, which is just the same as in Exercise 2-5!
2-7. Anything that involves finding or modifying a certain position will normally take constant time in Python lists because their underlying implementation is arrays. You have to traverse a linked list to do this (on average, half the list), giving a linear running time. Swapping things around, once you know the positions, is constant in both. (See whether you can implement a linear-time linked list reversal.) Modifying the list structure (by inserting or deleting elements, except at the end) is generally linear for arrays (and Python lists) but can in many cases be done in constant time for linked lists.
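The linked list reversal mentioned in the parenthesis might look like this (a sketch; the Node class and helper are mine):

```python
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

def reverse(head):
    # Reverse a singly linked list in linear time, constant extra space.
    prev = None
    while head is not None:
        # The right-hand side is evaluated before any assignment happens.
        head.next, prev, head = prev, head, head.next
    return prev

def to_list(head):
    # Helper for inspecting a list.
    result = []
    while head is not None:
        result.append(head.value)
        head = head.next
    return result
```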
2-8. For the first result, I’ll stick to the upper half here and use O notation. The lower half (Ω) is entirely equivalent. The sum O( f ) + O(g) is the sum of two functions, say, F and G, such that (for large enough n, and some c) F(n) ≤ cf (n) and G(n) ≤ cg(n). (Do you see why it’s OK to use the same c for both?) That means that, for large enough n, we will have F(n) + G(n) ≤ c·( f(n) + g(n)), which simply means that F(n) + G(n) is O( f (n) + g(n)), which is what we wanted to prove. The f · g case is mostly equivalent (with a little wrinkle related to the c). Showing that max(Θ( f), Θ(g)) = Θ(max( f, g)) follows a similar logic. The most surprising fact may be that f + g is O(max( f, g)), or max( f, g) is Ω( f + g)—that is, that maximum grows at least as fast as a sum. This is easily explained by the fact that f + g ≤ 2 · max( f, g).
2-9. When showing equivalence of statements like this, it’s common to show implication from one to the next through the list and then from the last to the first. (You might want to try to show some of the other implications directly as well; there are 30 to choose from.) Here are a couple of hints to get you started. 1 ⇒ 2: Imagine that the tree is oriented. Then every edge represents a parent–child relationship, and each node except the root has a parent edge, giving n – 1 edges. 2 ⇒ 3: Build T gradually by adding the n – 1 edges one by one. You aren’t allowed to connect nodes already in T (it’s acyclic), so each edge must be used to connect a new node to T, meaning that it will be connected.
2-10. This is sort of an appetizer for the counting in Chapter 3, and you can prove the result by induction (a technique discussed in depth in Chapter 4). There is an easy solution, though (which is quite similar to the presentation in Chapter 2). Give each parent (internal node) two imaginary ice cream cones. Now, every parent gives its children one ice cream cone each. The only one stuck without ice cream is the root. So, if we have n leaves and m internal nodes, we can now see that 2m (the number of ice cream cones, as specified initially) is equal to m + n – 1 (all nodes except the root, with one cone each), which means that m = n – 1. And this is the answer we’re looking for. Neat, huh? (This is an example of a nice counting technique, where we count the same thing in two different ways and use the fact that the two counts—in this case, the number of ice cream cones—must be equal.)
2-11. Number the nodes (arbitrarily). Orient all edges from lower to higher numbers.
2-12. The advantages and disadvantages depend on what you’re using it for. It works well for looking up edge weights efficiently but less well for iterating over the graph’s nodes or a node’s neighbors, for example. You could improve that part by using some extra structures (for example, a global list of nodes, if that’s what you need, or a simple adjacency list structure, if that’s required).
3-1. You could try doing this with induction or even recursion!
3-2. Start by rewriting to (n^2 – n)/2. Then first drop the constant factor, leaving n^2 – n. After that, you can drop the n, because it is dominated by n^2.
3-3. Binary encoding shows us which powers of two are included in a sum, and each is included only once. Let’s say that the first k powers (or binary digits) let us represent any number up to 2^k – 1 (our inductive hypothesis; clearly true for k = 1). Now try to use the property mentioned in the exercise to show that another digit (that is, being allowed to add in the next power of two) will let you represent any number up to 2^(k+1) – 1.
3-4. One of these is basically a for loop over the possible values. The other one is bisection, which is discussed in more detail in Chapter 6.
3-5. This is quite obvious from the symmetry in the numerator of the formula. Another way of seeing it is that there are as many ways of removing k elements as there are of leaving k elements.
3-6. The act of extracting seq[1:] requires copying n–1 elements, meaning that the running time of the algorithm turns into the handshake sum.
3-7. This quickly yields the handshake sum.
3-8. When unraveling the recursion, you get 2[2T(n–2) + 1] + 1 = 2^2·T(n–2) + 2 + 1, which eventually turns into a doubling sum, 1 + 2 + … + 2^i. To get to the base case, you need to set i = n, which gives you a sum of powers up to 2^n, which is Θ(2^n).
3-9. Similar to Exercise 3-8, but here the unraveling gives you 2[2T(n–2) + (n–1)] + n = 2^2·T(n–2) + 2(n–1) + n. After a while, you end up with a rather tricky sum, which has 2^i·T(n–i) as its dominating summand. Setting i = n gives you 2^n. (I hope this sketchy reasoning didn’t completely convince you; you should check it with induction.)
3-10. This is a neat one: take the logarithm on both sides of x^(log y) = y^(log x), yielding log(x^(log y)) = log(y^(log x)). Now, simply note that both of these are equal to log x · log y. (See why?)
3-11. What’s happening outside the recursive calls is basically that the two halves of the sequence are merged together to a single sequence. First, let’s just assume that the sequences returned by the recursive calls have the same length as the arguments (that is, lft and rgt don’t change length). The while loop iterates over these, popping off elements until one of them is empty; this is at most linear. The reverse is also at most linear. The res list now consists of elements popped off from lft and rgt, and finally, the rest of the elements (in lft or rgt) are combined (linear time). The only thing remaining is to show that the length of the sequence returned from mergesort has the same length as its argument. You could do this using induction in the length of seq. (If that still seems a bit challenging, perhaps you could pick up some tricks in Chapter 4?)
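A version of mergesort along the lines described (pop until one half is empty, reverse, then prepend the leftovers) might look like this; a sketch, not necessarily identical to the listing discussed:

```python
def mergesort(seq):
    mid = len(seq) // 2
    lft, rgt = seq[:mid], seq[mid:]
    if len(lft) > 1: lft = mergesort(lft)
    if len(rgt) > 1: rgt = mergesort(rgt)
    res = []
    while lft and rgt:              # pop the larger of the two last items
        res.append(lft.pop() if lft[-1] >= rgt[-1] else rgt.pop())
    res.reverse()                   # the pops produced descending order
    return (lft or rgt) + res       # prepend whatever is left over
```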
3-12. This would give us the handshake sum inside f(n), meaning that the recurrence is now T(n) = 2T(n/2) + Θ(n^2). Even a basic familiarity with the master theorem should tell you that the quadratic part dominates, meaning that T(n) is now Θ(n^2)—drastically worse than the original!
4-1. Try induction on E, and do the induction step “backward,” as in the internal node count example. The base case (E = 0 or E = 1) is trivial. Assume the formula is true for E – 1, and consider an arbitrary connected planar graph with E edges. Try to remove an edge, and assume (for now) that the smaller graph is still connected. Then the edge removal has reduced the edge count by one, and it must have merged two regions, reducing the region count by one. The formula holds for this, meaning that V – (E – 1) + (F – 1) = 2, which is equivalent to the formula we want to prove. Now see if you can tackle the case when removing an edge disconnects the graph. (Hint: You can apply the induction hypothesis to each of the connected components, but this counts the infinite region twice, so you must compensate for that.) Also try to use induction on V and F. Which version suits your tastes?
4-2. This is actually sort of a trick question, because any sequence of breaks will give you the same “running time” of n–1. You can show this by induction, as follows (with n=1 giving a trivial base case): the first break will give you one rectangle with k squares and one with n–k (where k depends on where you break). Both of these are smaller than n, so by strong induction, we assume that the number of breaks for each of them is k–1 and n–k–1, respectively. Adding these, along with the initial break, we get n–1 for the entire chocolate.
4-3. You can represent this as a graph problem, where an edge uv means that u and v know each other. You’re trying to find the largest subgraph (that is, with the greatest number of nodes) where each node v has a degree d(v) ≥ k. Once again, induction comes to the rescue. The base case is n = k + 1, where you can solve the problem only if the graph is complete. The reduction (inductive hypothesis) is, as you might have guessed, that you can solve the problem for n–1, and the way to solve it for n is to either (1) see that all nodes have degrees greater than or equal to k (we’re done!) or (2) find a single node to remove and solve the rest (by the induction hypothesis). It turns out that you can remove any node you like that has a degree smaller than k, because it could never be part of a solution. (This is a bit like the permutation problem—if it’s necessary to remove a node, just go ahead and remove it.)
Hint for bonus question: Note that d/2 is the ratio of edges to nodes (in the full graph), and as long as you delete nodes with a degree of less than or equal to d/2, that ratio (for remaining subgraph) will not decrease. Just keep deleting until you hit this limit. The remaining graph has a nonzero edge-to-node ratio (because it’s at least as great as for the original), so it must be non-empty. Also, because we couldn’t remove any more nodes, each node has a degree greater than d/2 (that is, we’ve removed all nodes with smaller degrees).
4-4. Although there are many ways of showing that there can be only two central nodes, the easiest way is, perhaps, to construct the algorithm first (using induction) and then use that to complete the proof. The base cases for V = 0, 1, or 2 are quite easy—the available nodes are all central. Beyond that, we want to reduce the problem from V to V – 1. It turns out that we can do that by removing a leaf node. For V > 2, no leaf node can be central (its neighbor will always be “more central” because its longest distance will be lower), so we can just remove it and forget about it. The algorithm follows directly: keep removing leaves (possibly once again implemented by maintaining degrees/counts) until all remaining nodes are equally central. It should now be quite obvious that this occurs when V is at most 2.
4-5. This is a bit like topological sorting, except that we may have cycles, so there’s no guarantee we’ll have nodes with an in-degree of zero. This is, in fact, equivalent to finding a directed Hamiltonian path, which may not even exist in a general graph (and finding out is really hard; see Chapter 11), but for a complete graph with oriented edges (what is actually called a tournament in graph theory), such a path (that is, one that visits each node once, following the directions of the edges) will always exist. We can do a single-element reduction directly—we remove a node and order the rest (which is OK by inductive hypothesis; the base case is trivial). The question now becomes whether (and how) we can insert this last node, or knight. The easiest way to see that this is possible is to simply insert the knight before the first opponent he (or she) defeated (if there is such an opponent; otherwise, place him last). Because we’re choosing the first one, the knight before must have defeated him, so we’re preserving the desired type of ordering.
4-6. This shows how important it is to pay attention to detail when doing induction. The argument breaks down for n=2. Even though the inductive hypothesis is true for n–1 (the base case, n=1), in this case there is no overlap between the two sets, so the inductive step breaks down! Note that if you could somehow show that any two horses had the same color (that is, set the base case to n=2), then the induction would (obviously) be valid.
4-7. The point isn’t that it should work for any tree with n–1 leaves, because we had already assumed that to be the case. The important thing is that the argument hold for any tree with n leaves, and it does. No matter which tree with n leaves you choose, you can delete a leaf and its parent and construct a valid binary tree with n–1 leaves and n–2 internal nodes.
4-8. This is just a matter of applying the rules directly.
4-9. Once we get down to a single person (if we ever do), we know that this person couldn’t have been pointing to anyone else, or that other person would not have been removed. Therefore, he (or she) must be pointing to himself (or, rather, his own chair).
4-10. A quick glance at the code should tell you that this is the handshake recurrence (with the construction of B taking up linear time in each call).
4-11. Try sorting sequences (of “digits”). Use counting sort as a subroutine, with a parameter telling you which digit to sort by. Then just loop over the digits from the last to the first, and sort your numbers (sequences) once for each digit. (Note: You could use induction over the digits to prove radix sort correct.)
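A sketch of the idea, assuming nonnegative integers with a known number of digits (the function names are mine):

```python
def counting_sort_by_digit(nums, d, base=10):
    # Stable counting sort on digit d (d = 0 is the least significant).
    buckets = [[] for _ in range(base)]
    for x in nums:
        buckets[(x // base**d) % base].append(x)
    return [x for bucket in buckets for x in bucket]

def radix_sort(nums, digits, base=10):
    # Sort once per digit, least significant first; the stability of the
    # subroutine is what makes the induction over digits go through.
    for d in range(digits):
        nums = counting_sort_by_digit(nums, d, base)
    return nums
```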
4-12. Figure out how big each interval (value range) must be. You can then divide each value by this number, rounding down, to find out which bucket to place it in.
4-13. We assume (as discussed in Chapter 2) that we can use constant-time operations on numbers that are large enough to address the entire data set, and that includes d_i. So, first, find these counts for all strings, adding them as a separate “digit.” You can then use counting sort to sort the numbers by this new digit, for a total running time so far of Θ(∑d_i + n) = Θ(∑d_i). Each “block” of numbers with equal digit length can now be sorted individually (with radix sort). (Do you see how this still gives a total running time of Θ(∑d_i) and how we actually get all the numbers sorted correctly in the end?)
4-14. Represent them as two-digit numbers, where each digit has a value range of 1…n. (Do you see how to do this?) You can then use radix sort, giving you a linear running time in total.
4-15. The list comprehension has a quadratic running time complexity.
4-16. See Chapter 2 for some hints on running experiments.
4-17. It cannot be placed before this point, and as long as we don’t place it any later, it cannot end up after anything that depends on it (because there are no cycles).
4-18. You could generate DAGs by, for example, randomly ordering the nodes, and add a random number of forward-pointing edges to each of them.
4-19. This is quite similar to the original. You now have to maintain the out-degrees of the remaining nodes, and insert each node before the ones you have already found. (Remember not to insert anything in the beginning of a list, though; rather, append, and then reverse it at the end, to avoid a quadratic running time.)
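A sketch of this reversed variant, with the graph represented as a dict mapping each node to a set of successors (the names are mine):

```python
from collections import defaultdict

def topsort_by_outdegree(G):
    out = {u: len(G[u]) for u in G}     # remaining out-degrees
    preds = defaultdict(set)            # reverse adjacency, built once
    for u in G:
        for v in G[u]:
            preds[v].add(u)
    sinks = [u for u in G if out[u] == 0]
    order = []
    while sinks:
        v = sinks.pop()
        order.append(v)                 # append now, reverse at the end
        for u in preds[v]:              # v no longer blocks its predecessors
            out[u] -= 1
            if out[u] == 0:
                sinks.append(u)
    order.reverse()
    return order
```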
4-20. This is a straightforward recursive implementation of the algorithm idea.
4-21. A simple inductive solution would be to remove one interval, solving the problem for the rest, and then checking whether the initial interval should be added back. The problem is that you’d have to check this interval against all the others, giving a quadratic running time overall. You can improve this running time, however. First, sort the intervals by their left endpoints, and use the induction hypothesis that you can solve the problem for the n–1 first intervals. Now, extend the hypothesis: Assume that you can also find the largest right endpoint among the n–1 intervals. Do you see how the inductive step can now be performed in constant time?
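Reading the exercise as “discard every interval that is contained in another,” the linear scan after sorting might look like this (a sketch under that assumption):

```python
def remove_covered(intervals):
    # Sort by left endpoint, widest first on ties, so any covering
    # interval appears before the intervals it covers.
    intervals = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
    result = []
    max_right = float("-inf")           # largest right endpoint so far
    for left, right in intervals:
        if right > max_right:           # not covered by an earlier interval
            result.append((left, right))
            max_right = right
    return result
```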
4-22. Instead of randomly selecting pairs u, v, simply go through every possible pair, giving a quadratic running time. (Do you see how this necessarily gives you the right answer for each town?)
4-23. To show that foo was hard, you would have to reduce bar to foo. To show that foo was easy, you would have to reduce foo to baz.
5-1. The asymptotic running time would be the same, but you’d probably get more overhead (that is, a higher constant factor) because instead of adding lots of objects with a built-in operation, you’d run slower, custom Python code for each object.
5-2. Try turning the induction proof into a recursive algorithm. (You might also want to look up Fleury’s algorithm.)
5-3. Try to reconstruct the inductive argument (and recursive algorithm) that’s used for undirected graphs—it’s virtually identical. The link to Trémaux’s algorithm is the following: Because you’re allowed to traverse each maze passage once in each direction, you can treat the passage as two directed edges, with opposite directions. This means that all intersections (nodes) will have equal in- and out-degrees, and you’re guaranteed that you can find a tour that walks every edge twice, one in each direction. (Note that you couldn’t use Trémaux’s algorithm in the more general case presented in this exercise.)
5-4. This is just a simple matter of traversing the grid that consists of pixels, with adjacent pixels acting as neighbors. It is common to use DFS for this, but any traversal would do.
5-5. I’m sure there would be many ways of using this thread, but one possibility would be to use it like the stack of DFS (or IDDFS), if you’re unable to make any other kinds of marks. You would probably end up visiting the same rooms multiple times, but at least you’d never walk in a cycle.
5-6. It’s not really represented at all in the iterative version. It just implicitly occurs once you’ve popped off all your “traversal descendants” from the stack.
5-7. As explained in Exercise 5-6, there is no single point in the code where backtracking occurs in the iterative DFS, so we can’t just set the finish time at some specific place (like in the recursive one). Instead, we’d need to add a marker to the stack. For example, instead of adding the neighbors of u to the stack, we could add edges of the form (u, v), and before all of them, we’d push (u, None), indicating the backtracking point for u.
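The marker trick might be implemented like this (a sketch; G maps each node to a list of neighbors, and the clock simply counts events):

```python
def dfs_times(G, s):
    discover, finish = {}, {}
    clock = 0
    stack = [(None, s)]
    while stack:
        u, v = stack.pop()
        if v is None:               # marker: backtracking past u
            finish[u] = clock
            clock += 1
            continue
        if v in discover:           # already reached along another path
            continue
        discover[v] = clock
        clock += 1
        stack.append((v, None))     # pushed first, so popped after all
        for w in G[v]:              # of v's traversal descendants
            stack.append((v, w))
    return discover, finish
```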
5-8. Let’s say node u must come before v in a topological sorting. If we run DFS from (or through) v first, we could never reach u, so v would finish before we (at some later point) start a new DFS that is run either from or through u. So far, we’re safe. If, on the other hand, we pass u first, then because there is a (direct or indirect) dependency (that is, a path) between u and v, we will reach v, which will (once again) finish before u.
5-9. You could just supply some functions as optional arguments here, for example.
5-10. If there is a cycle, DFS will always traverse the cycle as far as it can (possibly after backtracking from some detours). This means it’ll eventually get to where it entered the cycle, creating a back edge. (Of course, it could already have traversed this edge by following some other cycle, but that would still make it a back edge.) So if there are no back edges, there can’t be any cycles.
5-11. Other traversal algorithms would also be able to detect cycles by finding an edge from a visited node to one of its ancestors in the traversal tree (a back edge). However, determining when this happens (that is, distinguishing back edges from cross edges) wouldn’t necessarily be quite as easy. In undirected graphs, however, all you need in order to find a cycle is to reach a node twice, and detecting that is easy, no matter what traversal algorithm you’re using.
5-12. Let’s say you did find a forward and cross edge to some node u. Because there are no direction restrictions, DFS would never have backtracked beyond u without exploring all its out-edges, which means it would already have traversed the hypothetical forward/cross edge in the other direction!
5-13. This is just a matter of keeping track of the distance for each node instead of its predecessor, beginning with zero for the start node. Instead of remembering a predecessor, you simply add 1 to the predecessor’s distance, and remember that. (You could certainly do both, of course.)
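In code, the change from predecessors to distances is tiny; a sketch using BFS:

```python
from collections import deque

def bfs_distances(G, s):
    dist = {s: 0}                   # instead of a predecessor map
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in G[u]:
            if v not in dist:
                dist[v] = dist[u] + 1   # predecessor's distance plus one
                queue.append(v)
    return dist
```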
5-14. The nice thing about this problem is that for an edge uv, if you color u white, v must be black (or vice versa). This is an idea we’ve seen before: If the constraints of the problem force you to do something, it must be a safe step to take when building a solution. Therefore, you can simply traverse the graph, making sure you’re coloring neighbors in different colors; if, at some point, you can’t, there is no solution. Otherwise, you’ve successfully created a bipartitioning.
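A sketch of the two-coloring traversal (iterative DFS over one connected component; G is an undirected graph given as an adjacency dict):

```python
def bipartition(G, s):
    color = {s: True}
    stack = [s]
    while stack:
        u = stack.pop()
        for v in G[u]:
            if v not in color:
                color[v] = not color[u]   # forced: neighbors must differ
                stack.append(v)
            elif color[v] == color[u]:
                return None               # conflict: no bipartitioning
    return color
```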
5-15. In a strong component, every node can reach every other, so there must be at least one path in each direction. If the edges are reversed, there still will be. On the other hand, any pair that is not connected by two paths like this won’t be after the reversal either, so no new nodes will be added to the strong components either.
5-16. Let’s say the DFS starts somewhere in X. Then, at some point, it will migrate over to Y. We already know it can’t get back to X without backtracking (the SCC graph is acyclic), so every node in Y must receive a finishing time before we get back to X. In other words, at least one node in X will be finished after all the nodes in Y are finished.
5-17. Try finding a simple example where this would give the wrong answer. (You can do it with a really small graph.)
6-2. The asymptotic running time would be the same. The number of comparisons goes up, however. To see this, consider the recurrences B(n) = B(n/2) + 1 and T(n) = T(n/3) + 2 for binary and ternary search, respectively (with base cases B(1) = T(1) = 0 and B(2) = T(2) = 1). You can show (by induction) that B(n) < lg n + 1 < T(n).
6-3. As shown in Exercise 6-2, the number of comparisons won’t go down; however, there can be other advantages. For example, in the 2-3-tree, the 3-nodes help us with balancing. In the more general B-tree, the large nodes help reduce the number of disk accesses. Note that it is common to use binary search inside each node in a B-tree.
6-4. You could just traverse the tree and print or yield each node key between the recursive calls to the left and right subtrees (inorder traversal).
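For example (a sketch, with a minimal node class whose attribute names are my own assumptions):

```python
class Node:
    def __init__(self, key, lft=None, rgt=None):
        self.key, self.lft, self.rgt = key, lft, rgt

def inorder(node):
    # Yield the keys of a binary search tree in sorted order.
    if node is None:
        return
    yield from inorder(node.lft)
    yield node.key
    yield from inorder(node.rgt)
```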
6-5. First you find the node; let’s call it v. If it’s a leaf, just remove it. If it’s an internal node with a single child, just replace it with its child. If the node has two children, find the largest (rightmost) node in the left subtree or the smallest (leftmost) in the right subtree—your choice. Now replace the key and value in v with those of this descendant and then delete the descendant. (To avoid making the tree unnecessarily unbalanced, you should switch between the left and right versions.)
6-6. We’re inserting n random values, so each time we insert a value, the probability of it being the smallest among the k inserted so far (including this value) is 1/k. If it is, the depth of the leftmost node increases by 1. (For simplicity, let’s say the depth of the root is 1, rather than 0, as is customary.) This means that the expected depth of the leftmost node is 1 + 1/2 + 1/3 + … + 1/n, a sum known as the n-th harmonic number, or Hn. Interestingly, this sum is Θ(lg n).
6-7. Let’s say you switch place with your left child, and it turns out to be greater than your right child. You’ve just broken the heap property.
6-8. Each parent has two children, so you need to move two steps to get to the children of the next one; hence, the children of node i are at 2i + 1 and 2i + 2. If you’re having trouble seeing how this works, try drawing the nodes in a sequence, as they’re placed in the array, with tree edges arcing between parents and children.
6-9. It can be a bit tricky to see how building a heap is linear when considering the standard implementation, which goes through the nodes level by level, from just above the leaves, performing a logarithmic operation on each node as it goes. This almost looks loglinear. However, we can reformulate this into an equivalent divide-and-conquer algorithm, which is a bit more familiar: Make a heap out of the left subtree, then of the right subtree, and then repair the root. The recurrence becomes T(n) = 2T(n/2) + Θ(lg n), which we know (for example, by the master theorem) is linear.
6-10. First, heaps give you direct access to the minimum (or maximum) node. This could, of course, be implemented by maintaining a direct pointer to the leftmost (or rightmost) node in a search tree as well. Second, heaps allow you to maintain balance easily, and because a heap is perfectly balanced, it can be represented compactly, leading to very low overhead (you save one reference per node, and you can keep the values located in the same memory area, for example). Finally, building a (balanced) search tree takes loglinear time, while building a heap takes linear time.
6-13. For random input, it wouldn’t really make any difference (except the cost of the extra function call). In general, though, it would mean that no single input would be guaranteed to always elicit worst-case behavior.
6-15. Here you can use the pigeonhole principle (if you try to fit more than n pigeons into n pigeonholes, at least one hole will hold at least two pigeons). Divide the square into four smaller squares of side d/2. If you have more than four points, one of these must hold at least two points. By simple geometry, the diagonal of these squares is less than d, so this is impossible.
6-16. Just do a pass over the data before you start, removing co-located points. They’re already sorted, so finding duplicates would only be a linear-time operation. Now, when running the algorithm, the slices along the midline can, at most, hold six points (do you see why?), so you now need to compare to at most five following points in the y-sequence.
6-17. This is similar to how the lower bound for sorting is used to prove the lower bound for the convex hull problem: You can reduce the element uniqueness for real numbers to the closest pair problem. Just plot your numbers as points on the x-axis (linear time, which is asymptotically less than the bound at hand) and find the closest pair. If the two points are identical, the elements aren’t unique; otherwise, they are. Because uniqueness cannot be determined in less than loglinear time, it would be impossible for the closest pair problem to be any more efficient.
6-18. The crucial observation is that there’s never any point in including an initial portion of the slice whose sum is zero or negative (you could always discard it and get the same or a higher sum). Also, there’s never any point in discarding an initial portion whose sum is positive (including it will give a higher sum). Thus, we can start summing from the left side, always keeping the best sum (and corresponding interval) so far. Once the sum goes negative, we move i (the starting index) to the next position and start our sum afresh from there. (You should convince yourself that this really works; perhaps prove it by induction?)
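The scan described above might be sketched as follows, returning both the best sum and the half-open interval [i, j) that achieves it (allowing the empty slice, with sum zero, for all-negative input):

```python
def max_slice(seq):
    best = current = 0              # the empty slice has sum zero
    best_bounds = (0, 0)
    i = 0                           # start of the current slice
    for j, x in enumerate(seq):
        if current <= 0:            # never keep a nonpositive prefix
            i, current = j, 0
        current += x
        if current > best:          # new best sum found
            best, best_bounds = current, (i, j + 1)
    return best, best_bounds
```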
7-1. There are many possibilities here (such as dropping a few coins from the U.S. system). One significant example is the old British system (1, 2, 6, 12, 24, 48, 60).
7-2. This is just a way of viewing how a base-k number system works. This is especially easy to see with k = 10.
7-3. When you consider whether to include the greatest remaining element or not, it will always pay to include it because if you don’t, the sum of the remaining elements can’t make up for the lost value.
7-4. Let’s say Jack is the first one to get rejected by his best feasible wife, Jill, and that she rejects him for Adam. By assumption, Adam has not yet been rejected by his best feasible wife, Alice, which means that he likes Jill at least as well as Alice. Consider a stable pairing where Jack and Jill are together. (This must exist because Jill is a feasible wife for Jack.) In this pairing, Jill would still prefer Adam, of course. However, we know that Adam prefers Jill over Alice—or any other feasible wife—so this matching wouldn’t be stable after all! In other words, we have a contradiction, invalidating our assumption that some man was not paired with his best feasible wife.
7-5. Let’s say Jack was married to Alice and Jill to Adam in a stable pairing. Because Jill is Jack’s best feasible wife, he will prefer her to Alice. Because the pairing is stable, Jill must prefer Adam. This holds for any stable pairing where Jill has another husband—meaning that she’d prefer any other feasible husband to Jack.
7-6. A greedy algorithm would certainly work if the capacity of your knapsack was divisible by all the various increments. For example, if one item was breakable in increments of 2.3 and another in increments of 3.6 and your knapsack capacity was divisible by 8.28, you’d be OK, because you have a “resolution” that is good enough. (Do you see any further variations we could allow? Other implications of this idea?)
7-7. This follows rather directly from the tree structure. Because the codes all give us unique, deterministic instructions on how to navigate from the root to a leaf, there is never any doubt when we have arrived, or where we have arrived.
7-8. We know that a and b are the two items with the lowest frequencies; that means the frequency of a is lower than (or equal to) the one of c, and the same holds for b and d. If a and d have equal frequencies, we’d sandwich all the inequalities (including a ≤ b and c ≤ d), and all four frequencies are equal.
7-9. Take the case where all files are of equal, constant size. Then a balanced merge tree would give us a loglinear merging time (typical divide and conquer). However, if we make the merge tree completely unbalanced, we’d get a quadratic running time (just like insertion sort, for example). Now consider a set of files whose sizes are the powers of two up to n/2. The last file would have linear size, and in a balanced merge tree, it would be involved in a logarithmic number of merges, meaning that we’d get (at least) loglinear time. Now consider what Huffman’s algorithm would do: It would always merge the two smallest files, and they’d always sum to about (that is, one less than) the size of the next. We get a sum of powers and end up with a linear merge time.
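If you want to experiment, the following sketch compares Huffman-style merging (always merge the two smallest files, using heapq) with a completely unbalanced merge schedule:

```python
import heapq

def huffman_merge_cost(sizes):
    """Total merge cost when we always merge the two smallest files."""
    heap = list(sizes)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b               # merging two files costs their combined size
        heapq.heappush(heap, a + b)
    return total

def sequential_merge_cost(sizes):
    """Completely unbalanced merging: fold the files in one at a time."""
    acc, total = sizes[0], 0
    for s in sizes[1:]:
        acc += s
        total += acc
    return total

sizes = [2**i for i in range(1, 11)]  # 2, 4, 8, ..., 1024
print(huffman_merge_cost(sizes))
print(sequential_merge_cost(sorted(sizes, reverse=True)))
```

For the powers of two, Huffman's rule stays close to the total file size, while repeatedly folding the big files together first costs far more.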
7-10. You would need at least two edges with the same weight, both of which are viable as part of a solution. For example, if the lowest weight was used twice, on two different edges, you’d have (at least) two solutions.
7-11. Because the number of edges in all spanning trees is the same, we could do this by simply negating the weights (that is, if an edge had weight w, we’d change it to –w) and finding the minimum spanning tree.
7-12. We need to show this for the general case where we have a set of edges that we know are going into the solution. The subproblem is then the remaining graph, and we want to show that finding a minimum spanning tree in the rest that’s compatible with what we have (no cycles) will give us an optimal solution globally. As usual, we show this by contradiction, assuming that we could find a nonoptimal solution to this subproblem that would give us a better global solution. Both subsolutions would be compatible with what we had, so they would be interchangeable. Clearly, swapping out the nonoptimal solution with the optimal one would improve the global sum, which gives us our contradiction.
7-13. Kruskal’s algorithm invariably finds a minimum spanning forest, which in the case of connected graphs turns into a minimum spanning tree. Prim’s algorithm could be extended with a loop, like depth-first search, so that it would restart in all components.
7-14. It will still run, but it won’t necessarily find the cheapest traversal (or min-cost arborescence).
7-15. Because you can use this to sort real numbers, which has a loglinear lower bound. (This is similar to the case with convex hulls.) You just use the numbers as x-coordinates and use identical y-coordinates. The minimum spanning tree would then be a path from the first number to the last, giving you the sorted order.
7-16. All we need to show is that the component trees have (at most) logarithmic height. The height of a component tree is equal to the highest rank in the component. This rank is increased only if two component trees of equal height are merged, and then it is increased by one. The only way to increase some rank in every union would be to merge the components in a balanced way, giving a logarithmic final rank (and height). Going some rounds without incrementing any ranks won’t help because we’re just “hiding” nodes in trees without changing their ranks, giving us less to work with. In other words, there is no way to get more than a logarithmic height for the component trees.
7-17. It’s all hidden by the logarithmic operations of the heap. In the worst case, if we added each node only once, these operations would be logarithmic in the number of nodes. Now, they could be logarithmic in the number of edges, but since the number of edges is polynomial (quadratic) in the number of nodes, that amounts only to a constant difference: Θ(lg m) = Θ(lg n²) = Θ(lg n).
7-18. The interval with the earliest starting time could, potentially, cover the entire remainder of the set, which could all be nonoverlapping. If we instead went with the latest starting time, we’d be equally doomed to failure, by always getting only a single element.
7-19. We’d have to sort them all, but after that, the scanning and elimination can be performed in linear time in total (do you see how?). In other words, the total running time is dominated by the sorting, which is loglinear in general.
8-1. Instead of checking whether the parameter tuple is already in the cache, just retrieve it and catch the KeyError that might occur if it’s not there. Using some nonexistent value (such as None) along with get might give even better performance.
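A minimal sketch of the try/except variant (the memo name follows the chapter's convention, but the details here are just one way of doing it):

```python
from functools import wraps

def memo(func):
    """Memoizing decorator using try/except rather than a membership test."""
    cache = {}
    @wraps(func)
    def wrapper(*args):
        try:
            return cache[args]          # the common case: already computed
        except KeyError:
            cache[args] = func(*args)   # first time: compute and remember
            return cache[args]
    return wrapper

@memo
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # 354224848179261915075
```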
8-2. One way of viewing this might be counting subsets. Each element is either in the subset or not.
8-3. For fib, you just need the two previous values at each step, while for two_pow, you only need to keep doubling the value you have.
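For example (using the convention fib(0) = 0, which may differ slightly from your own version):

```python
def fib(n):
    """Fibonacci with constant memory: keep only the two previous values."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def two_pow(n):
    """2**n with constant memory: keep doubling a single value."""
    x = 1
    for _ in range(n):
        x *= 2
    return x

print(fib(10))     # 55
print(two_pow(16)) # 65536
```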
8-5. Just use the “predecessor pointer” idea from Chapter 5. If you’re doing the forward version, store which choice you made (that is, which out-edge you followed) in each node. If you’re doing the reverse version, store where you came from to each node.
8-6. Because the topological sorting still has to visit every edge.
8-7. You could let each node observe its predecessors and then explicitly trigger an update in the estimate in the start node (giving it a value of zero). The observers would be notified of changes and could update their own estimates accordingly, triggering new updates in their observers. This is in many ways quite similar to the relaxation-based solution in this chapter. The solution would be a bit “over-eager,” though. Because cascades of updates are triggered instantly (instead of letting each node finish all its out- or in-updates before triggering new ones), the solution could, in fact, have exponential running time. (Do you see how?)
8-8. This can be shown in many ways—but one is simply to look at how the list is constructed. Each object is added (either appended or overwriting an old element) using bisect, which finds the right place to put it, in sorted order. By induction, end will be sorted. (Can you think of other ways of seeing that this list must be sorted?)
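Here is one compact version of the construction, in case you want to trace how end stays sorted:

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of a longest strictly increasing subsequence.

    end[i] is the smallest known tail of an increasing subsequence
    of length i + 1; both kinds of update keep end sorted.
    """
    end = []
    for x in seq:
        i = bisect_left(end, x)
        if i == len(end):
            end.append(x)  # x extends the longest subsequence found so far
        else:
            end[i] = x     # x is a smaller tail for subsequences of length i + 1
    return len(end)

print(lis_length([3, 1, 0, 2, 4]))  # 3 (for example 0, 2, 4)
```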
8-9. You don’t need bisect when the new element is larger than the last element or if end is empty. You could add an if statement to check for that. It might make the code faster, but it would probably make it a bit less readable.
8-10. Just like in the DAG shortest path problem, this would involve remembering “where you came from,” that is, keeping track of predecessors. For the quadratic version, you could—instead of using predecessor pointers—simply copy the list of predecessors at each step. It wouldn’t affect the asymptotic running time (copying all those lists would be quadratic, but that’s what you already have), and the impact on actual running time and memory footprint should be negligible.
8-11. This is quite similar to the LCS code in many ways. If you need more help, you could do a web search for levenshtein distance python, for example.
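If you'd like a starting point, here is one common way of writing it, keeping only the previous row of the table (a sketch, not necessarily identical to what your search will turn up):

```python
def levenshtein(a, b):
    """Edit distance between a and b, keeping only the previous DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```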
8-12. Just like the other algorithms, you’d keep track of which choices were made, corresponding to which edges you followed in the “subproblem DAG.”
8-13. You could swap the sequences and their lengths.
8-14. You could divide c and all the elements in w by their greatest common divisor.
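A tiny sketch of that scaling, using math.gcd:

```python
from functools import reduce
from math import gcd

def scale_down(w, c):
    """Divide the capacity c and all the weights w by their common gcd,
    shrinking the dynamic programming table accordingly."""
    g = reduce(gcd, w, c)
    return [x // g for x in w], c // g

print(scale_down([6, 9, 15], 30))  # ([2, 3, 5], 10)
```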
8-16. The running time is pseudopolynomial—meaning that it is still exponential. You could easily crank up the knapsack capacity so that the running time became unacceptable, while keeping the actual problem instance size low.
8-19. You could add a set of dummy leaf nodes representing failed searches. Each leaf node would then represent all the nonexistent elements between two that were actually in the tree. You’d have to treat these separately in the sums.
9-1. You have to somehow modify the algorithm or the graph so the detection mechanism for negative additive cycles can be used to find multiplicative cycles where the product of the exchange rates ends up above 1. The easiest solution is simply to transform all the weights by taking their logarithms and negating them. You could then use the standard version of Bellman–Ford, and a negative cycle would give you what you needed. (Do you see how?) Of course, to actually use this for anything, you should work out how to output which nodes are involved in this cycle.
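A sketch of just the transformation step, with made-up exchange rates (the currency names and rates are purely illustrative):

```python
from math import log

def log_transform(rates):
    """Turn exchange rates into additive edge weights: w(u, v) = -log(rate).

    A cycle whose rate product exceeds 1 then has negative total weight,
    which Bellman-Ford's negative-cycle detection will find.
    """
    return {(u, v): -log(r) for (u, v), r in rates.items()}

# Purely illustrative rates, forming one cycle with product 0.9 * 0.9 * 1.3 > 1:
rates = {('USD', 'EUR'): 0.9, ('EUR', 'GBP'): 0.9, ('GBP', 'USD'): 1.3}
weights = log_transform(rates)
print(sum(weights.values()) < 0)  # True: the cycle has negative total weight
```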
9-2. This isn’t a problem, any more than it is in the DAG shortest path problem. It doesn’t matter which one of them ends up first in the ordering because the other one (which then comes later) can’t be used to create a shortcut anyway.
9-3. It gives you a pseudopolynomial running time, all of a sudden (with respect to the original problem instance). Do you see why?
9-4. This can depend on how you do it. Adding nodes multiple times is no longer a good idea, and you should probably set things up so you can access and modify entries directly in your queue when you run relax. You could then do this part in constant time, while the extraction from the queue would now be linear, and you’d end up with a quadratic running time. For a dense graph, that’s actually quite OK.
9-5. Things can go wrong if there are negative cycles—but the Bellman–Ford algorithm would have raised an exception in that case. Barring this, we can turn to the triangular inequality. We know that h(v) ≤ h(u) + w(u, v) for all nodes u and v. This means that w’(u, v) = w(u, v) + h(u) – h(v) ≥ 0, as required.
9-6. We might preserve the shortest paths, but we couldn’t necessarily guarantee that the weights would be nonnegative.
9-9. This requires few changes. You’d use a (binary, Boolean) adjacency matrix instead of a weight matrix. When seeing if you could improve a path, you would not use addition and minimum; instead, you would see whether there was a new path there. In other words, you would use A[u, v] = A[u, v] or A[u, k] and A[k, v].
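Here is a sketch of that variant (often called Warshall's algorithm) on a Boolean adjacency matrix:

```python
def transitive_closure(A):
    """Warshall's algorithm on a Boolean adjacency matrix (list of lists)."""
    n = len(A)
    A = [row[:] for row in A]  # work on a copy, leaving the input intact
    for k in range(n):
        for u in range(n):
            for v in range(n):
                A[u][v] = A[u][v] or (A[u][k] and A[k][v])
    return A

# A path 0 -> 1 -> 2 with no direct edge from 0 to 2:
A = [[False, True,  False],
     [False, False, True],
     [False, False, False]]
C = transitive_closure(A)
print(C[0][2])  # True: node 2 is reachable from node 0
```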
9-10. The tighter stopping criterion tells us to stop as soon as l + r is greater than the shortest path we’ve found so far, and we’ve already established that that is correct. At the point when both directions have yielded (and therefore visited) the same node, we know the shortest path going through that node has been explored; because it is itself one of those we have explored, it must be greater than or equal to the smallest one of those we have explored.
9-11. No matter which edge you pick we know that d(s,u) + w(u,v) + d(v,t) is smaller than the length of the shortest path found so far, which is, again, shorter than or equal to l + r. This means that both l and r would go past the midpoint of the path, wherever that is. If the midpoint is inside an edge, just choose that; if it’s exactly on a node, choosing either adjacent edge on the path would do the trick.
9-14. Consider the shortest path from v to t. The modified cost can be expressed in two ways. The first is as d(v,t) – h(v) + h(t), which is the same as d(v,t) – h(v) because h(t) = 0. The other way of expressing this modified cost is as the sum of the individual modified edge weights; by assumption, these are all nonnegative (that is, h is feasible). Therefore, we get d(v,t) – h(v) ≥ 0, or d(v,t) ≥ h(v).
10-1. Simply split each node v into two nodes, v and v’, and add the edge vv’ with the desired capacity. All in-edges would then stay with v, while all out-edges from v would be moved to v’.
10-2. You could modify the algorithm, or you could modify your data. For example, you could split each node into two, with a unit-capacity edge between them, and give all remaining edges infinite capacity. Then the maximum flow would let you identify the vertex-disjoint paths.
10-3. We know the running time is O(m2), so all we need to do is construct a situation where the quadratic running time occurs. One possibility is to have m/2 nodes in addition to s and t, each with an edge from s and to t. In the worst case, the traversal would visit all the unsaturated out-edges from s, which (by the handshake sum) gives us a quadratic running time.
10-4. Simply replace each edge uv by uv and vu, both with the same capacity. In this case, you could, of course, end up with flow going both ways at the same time. This isn’t really a problem—to find out the actual flow through the undirected edge, just subtract one from the other. The sign of the result will indicate the direction of flow. (Some books avoid having edges in both directions between nodes in order to simplify the use of residual networks. This can be done by splitting one of the two directed edges in two, with a dummy node.)
10-6. For example, you could give the source a capacity (as described in Exercise 10-1) equal to the desired flow value. If feasible, the maximum flow would then have that value.
10-8. You can solve this by finding a minimum cut, as follows. If guest A will attend only if B also attends, add the edge (A,B) to your network, with an infinite capacity. This edge will then never cross a cut (in the forward direction), if it can be avoided. The friends you invite will be on the source side of the cut, while the others will be on the sink side. Your compatibilities can be modeled as follows: any positive compatibility is used as a capacity for an edge from the source, while any negative compatibility is used, negated, as a capacity for an edge to the source. The algorithm will then minimize the sum of such edges crossing the cut, keeping the ones you like on the source side and the ones you don’t on the sink side (to the extent possible).
10-9. Because each person has a single favorite seat, each node on the left side has a single edge to the right. That means that the augmenting paths all consist of single unsaturated edges—so the behavior described is equivalent to the augmenting path algorithm, which we know will give an optimal answer (that is, it’ll pair as many people as possible to their favorite seats).
10-10. Represent the groups for both rounds as nodes. Give the first groups in-edges from the source, with a capacity of k. Similarly, give the second groups out-edges to the sink, also with capacity k. Then add edges from all the first groups to all the second groups, all with capacity 1. Each flow unit is then a person, and if you’re able to saturate the edges from the source (or to the sink), you’ve succeeded. Each group will then have k persons, and each of the second groups will have at most one person from each of the first.
10-11. This solution combines the supply/demand idea with min-cost flow. Represent each planet by a node. Also add a node for every passenger type (that is, for each valid combination of passenger origin and destination). Link every planet i < n to planet i+1 with a capacity equal to the actual carrying capacity of the spaceship. The passenger type nodes are given supplies equal to the number of passengers of that type (that is, the number of passengers wanting to go from i to j). Consider the node v, representing passengers wanting to go from planet i to planet j. These can either make the trip or not. We represent that fact by adding an edge (with infinite capacity) from v to i and another to j. We then add a demand to node j equal to the supply at v. (In other words, we make sure that each planet has a demand that will account for all passengers who want to go there.) Finally, we add a cost on (v,i) equal to the fare for the trip from i to j, except it’s negative. This represents the amount we’ll make for each of the passengers at v that we take on. We now find a min-cost flow that is feasible with respect to these supplies and demands. This flow will make sure that either each passenger is routed to their desired origin (meaning that they’ll take the trip) and then via the planet-to-planet edges to their destination, adding to our revenue, or they are routed directly to their destination (meaning they won’t take the trip) along a zero-cost edge.
11-1. Because the running time of bisection is logarithmic. Even if the size of the value range in question is exponential as a function of the problem size, the actual running time will only be linear. (Do you see why?)
11-2. Because they are all in NP, and all problems in NP can be reduced to any NP-complete problem (by the definition of NP-completeness).
11-3. Because the running time is O(nW), where W is the capacity. If W is polynomial in n, then so is the running time.
11-4. The reduction from the version with an arbitrary k is simple enough: Simply add –k as an element.
11-5. It should be clear that we could reduce an unbounded version of the subset sum problem to unbounded knapsack (just set weights and values equal to the numbers in question). The challenge is getting from the unbounded to the bounded case. This is basically a matter of juggling numbers, and there are several elements to this juggling. We still want to keep the weights so that the optimization maximizes them. In addition, however, we need to add some kind of constraints to make sure at most one of each number is used. Let’s look at this constraint thing separately. For n numbers, we can try to create n “slots” using powers of two, representing number i by 2^i. We could then have a capacity of 2^1 + … + 2^n and run our maximization. This isn’t quite enough, though; the maximization wouldn’t care whether we have one instance of 2^n or two instances of 2^(n–1). We can add another constraint: We represent number i by 2^i + 2^(n+1) and set the capacity to 2^1 + … + 2^n + n·2^(n+1). For the maximization, it will still pay to fill every slot from 1 to n, but now it can include only n occurrences of 2^(n+1), so a single instance of 2^n will be preferable to two instances of 2^(n–1). But we’re not done yet … this only lets us force the maximization to take exactly one of each number, and that’s not really what we want. Instead, we want two versions of each item, one representing that the number is included and one representing that it is excluded. If number i is included, we will add w_i, and if it is excluded, we will add 0. We will also have the original capacity, k. These constraints are subordinate to the “one item per slot” stuff, so we’d really like to have two “digits” in our representation. We can do that by multiplying the slot constraint number by a huge constant. If the largest of our numbers is B, we can multiply the constraints by nB, and we should be on the safe side.
The resulting scheme, then, is to represent number w_i from the original problem by the following two new numbers, representing inclusion and exclusion, respectively: (2^(n+1) + 2^i)·nB + w_i and (2^(n+1) + 2^i)·nB. The capacity becomes (n·2^(n+1) + 2^n + … + 2^1)·nB + k.
11-6. It’s easy to reduce three-coloring to any k-coloring for k > 3; you simply conflate two or more of the colors.
11-7. Here you can reduce from any number of things. A simple example would be to use subgraph isomorphisms to detect cliques.
11-8. You can simply simulate undirected edges by adding directed ones in both directions (antiparallel edges).
11-9. You can still use the red-green-blue scheme to simulate direction here and then use the previous reduction from directed Hamilton cycle to directed Hamilton path (you should verify how and why this would still work). Here’s another option, though. Consider how to reduce the undirected Hamilton cycle problem to the undirected Hamilton path problem. Choose some node u, and add three new nodes, u’, v and v’, as well as the (undirected) edges (v,v’) and (u,u’). Now add an edge between v and every neighbor of u. If the original graph has a Hamilton cycle, this new one will obviously have a Hamilton path (just disconnect u from one of its neighbors in the cycle, and add u’ and v’ at either end). More importantly, the implication works both ways: A Hamilton path in the new graph must have u’ and v’ as its end points. If we remove u’, v, and v’, we’re left with a Hamilton path from u to a neighbor of u, and we can link them up to get a Hamilton cycle.
11-10. This is (unsurprisingly) sort of the opposite of the reduction in the other direction. Instead of splitting an existing node, you can add a new one. Connect this node to every other. You will then have a Hamilton cycle in the new graph if and only if there is a Hamilton path in the original one.
11-11. We can trace things up the chain. Longest paths in DAGs can be used to find Hamilton paths, but only in DAGs. This will, again, let us find directed Hamilton cycles in digraphs that become DAGs when we split them at a single node (or, by fiddling with the reduction, something very close to that). However, the digraph we used for reducing 3-SAT to the directed Hamilton cycle was nothing like this. True, we could see a hint of this structure in the s and t nodes, and the general downward direction of edges from s to t, but every row was full of antiparallel edges, and the ability to go in either direction was crucial to the proof. Therefore, things break down here if we assume acyclicity further down.
11-12. The reasoning here is quite similar to that in Exercise 11-11.
11-13. As discussed in the main text, if the object is bigger than half the knapsack, we’re done. If it’s slightly less (but not as small as a quarter of the knapsack), we can include two and again have filled more than half. The only remaining case is if it’s even smaller. In that case, we can just keep piling on until we get past the midline—and because the objects are so small, they won’t extend far enough across the midline to get us into trouble.
11-14. This is actually easy. First, randomly order the nodes. This will give you two DAGs, consisting of the edges going left-to-right and those going right-to-left. The larger of these must consist of at least half the edges, giving you a 2-approximation.
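A sketch of the idea (edges are pairs of nodes; self-loops are assumed absent):

```python
import random

def acyclic_half(n, edges):
    """Keep the larger of the forward/backward edge sets under a random
    node ordering. Each set is a DAG, and the larger one contains at
    least half the edges, giving a 2-approximation."""
    order = list(range(n))
    random.shuffle(order)
    pos = {u: i for i, u in enumerate(order)}
    forward = [(u, v) for u, v in edges if pos[u] < pos[v]]
    backward = [(u, v) for u, v in edges if pos[u] > pos[v]]
    return max(forward, backward, key=len)

edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
print(len(acyclic_half(3, edges)) >= len(edges) / 2)  # True for any ordering
```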
11-15. Let’s say all the nodes are of odd degree (which will give the matching as large a weight as possible). That means the cycle will consist only of these nodes, and every second edge of the cycle will be part of the matching. Because we’re choosing the minimum matching, we of course choose the smaller of the two possible alternating sequences, ensuring that the weight is at most half the total of the cycle. Because we know the triangle inequality holds, relaxing our assumption and dropping some nodes won’t make the cycle—or the matching—more expensive.
11-16. Feel free to be creative here. You could, perhaps, just try to add each of the objects individually, or you could add some random objects? Or you could run the greedy bound initially—although that will happen already in one of the first expansions …
11-17. Intuitively, you’re getting the most possible value out of the items. See whether you can come up with a more convincing proof, though.
11-18. This requires some knowledge of probability theory, but it’s not that hard. Let’s look at a single clause, where each literal (either a variable or its negation) is either true or false, and the probability of either outcome is 1/2. This means that the probability of the entire clause being true is 1 – (1/2)³ = 7/8. This is also the expected number of clauses that will be true, if we have only a single clause. If we have m clauses, we can expect to have 7m/8 true clauses. We know that m is an upper bound on the optimum, so our approximation ratio becomes m/(7m/8) = 8/7. Pretty neat, don’t you think?
11-19. The problem is now expressive enough to solve (for example) the maximum independent set problem, which is NP-hard. Therefore, your problem is also NP-hard. One reduction goes as follows: Set the compatibility for each guest to 1 and add conflicts for each edge in the original graph. If you can now maximize the compatibility sum without inviting guests who dislike each other, you have found the largest independent set.
11-20. The NP-hardness can easily be established, even for m = 2, by reducing from the partition problem. If we can distribute the jobs so that the machines finish at the same time, that will clearly minimize the completion time—and if we can minimize the completion time, we will also know whether they can finish simultaneously (that is, whether the values can be partitioned). The approximation algorithm is easy, too. We consider each job in turn (in some arbitrary order) and assign it to the machine that currently has the earliest finish time (that is, the lowest workload). In other words, it’s a straightforward greedy approach. Showing that it’s a 2-approximation is a little bit harder. Let t be the optimum completion time. First, we know that no job duration is greater than t. Second, we know that the average finish time cannot exceed t, as a completely even distribution is the best we can get. Let M be the machine to finish last in our greedy scheme, and let j be the last job on that machine. Because of our greedy strategy, we know that at the starting time of j, all the other machines were busy, so this starting time was before the average finish time and therefore before t. The duration of j must also be lower than t, so adding this duration to its starting time, we get a value lower than 2t … and this value is, indeed, our completion time.
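The greedy strategy itself is only a few lines with a heap (a sketch; the job durations below are arbitrary):

```python
import heapq

def greedy_schedule(durations, m):
    """Give each job to the machine with the lightest load so far;
    return the completion time (makespan). A 2-approximation."""
    loads = [0] * m
    heapq.heapify(loads)
    for d in durations:
        heapq.heappush(loads, heapq.heappop(loads) + d)
    return max(loads)

print(greedy_schedule([2, 3, 4, 6, 2, 2], 3))  # 8 (the optimum here is 7)
```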
11-21. You could reuse the basic structure of Listing 11-2, if you’d like. A straightforward approach would be to consider each job in turn and try to assign it to each machine. That is, the branching factor of your search tree will be m. (Note that the ordering of the jobs within a machine doesn’t matter.) At the next level of the search, you then try to place the second job. The state can be represented by a list of the finish times of the m machines. When you tentatively add a job to a machine, you simply add its duration to the finish time; when you backtrack, you can just subtract the duration again. Now you need a bound. Given a partial solution (some scheduled jobs), you need to give an optimistic value for the final solution. For example, we can never finish earlier than the latest finish time in the partial solution, so that’s one possible bound. (Perhaps you can think of better bounds?) Before you start, you must initialize your solution value to an upper bound on the optimum (because we’re minimizing). The tighter you can get this, the better (because it increases your pruning power). Here you could use the approximation algorithm from Exercise 11-20.