Your Code as a Crime Scene by Adam Tornhill. Published by Pragmatic Bookshelf, 2015.

Analyze Temporal Coupling

In Chapter 7, Treat Your Code As a Cooperative Witness, we said that temporal coupling can be an interview tool for your codebase. The first step in an interview is to know who you should talk to.

Let’s use a sum of coupling analysis to find our first code witness.

Use Sum of Coupling to Identify the Modules to Inspect

You’ve already seen that there are different reasons for modules to be coupled. Some couples, such as a unit and its unit test, are valid. So modules with the highest degree of coupling may not be the most interesting to us. Instead, we want modules that are architecturally significant. A sum of coupling analysis finds those modules.

Sum of coupling looks at how many times each module has been coupled to another one in a commit and sums it up. For example, in the following figure, you see that module app.clj changed with both core.clj and project.clj in Commit #1, but just with core.clj in Commit #2. Its sum of coupling is three.

images/Chp9_SOC.png
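The counting described above can be sketched in a few lines of Python. This is a minimal illustration rather than Code Maat's actual implementation; the function name `sum_of_coupling` and the representation of each commit as a set of file names are assumptions made for the example.

```python
from collections import Counter


def sum_of_coupling(commits):
    """Return a Counter mapping each module to its sum of coupling.

    `commits` is an iterable of collections of module names, one per commit
    (an assumed input format, not Code Maat's log format).
    """
    soc = Counter()
    for files in commits:
        modules = set(files)
        for module in modules:
            # A module is coupled to every other module in the same commit.
            soc[module] += len(modules) - 1
    return soc


# The two commits from the example in the text.
commits = [
    {"app.clj", "core.clj", "project.clj"},  # Commit #1
    {"app.clj", "core.clj"},                 # Commit #2
]
soc = sum_of_coupling(commits)
print(soc["app.clj"])  # 3
```

In Commit #1, app.clj is coupled to two modules, and in Commit #2 to one, which gives the sum of three.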

The module that changes most frequently together with others must be important and is a good starting point for an investigation. Let’s try it out on Code Maat by reusing the logfile we mined in Chapter 3, Creating an Offender Profile.

Move into the top-level directory in your Code Maat repository and type the following command:

    prompt> maat -l maat_evo.log -c git -a soc
    entity,soc
    src/code_maat/app/app.clj,105
    test/code_maat/end_to_end/scenario_tests.clj,97
    src/code_maat/core.clj,93
    project.clj,74
    ...

You can see that this command uses the same format we saw in the earlier hotspot analysis. The only difference is that we’re requesting -a soc (sum of coupling) instead.

We see that app.clj changes the most with other modules. Let’s keep an eye on app.clj as we dive deeper.

Measure Temporal Coupling

At this point you know that app.clj is the module with the most temporal coupling. The next step is to find out which modules it’s coupled to. We use Code Maat for this analysis:

    prompt> maat -l maat_evo.log -c git -a coupling
    entity,coupled,degree,average-revs
    src/code_maat/parsers/git.clj,test/code_maat/parsers/git_test.clj,83,12
    src/code_maat/analysis/entities.clj,test/code_maat/analysis/entities_test.clj,76,7
    src/code_maat/analysis/authors.clj,test/code_maat/analysis/authors_test.clj,72,11
    ...

The command line is identical to the one you just used, with the exception that we’re requesting -a coupling instead. The resulting CSV output contains plenty of information:

  1. entity: This is the name of one of the involved modules. Code Maat always calculates pairs.

  2. coupled: This is the coupled counterpart to the entity.

  3. degree: The degree specifies the percent of shared commits. The higher the number, the stronger the coupling. For example, git.clj and git_test.clj change together in 83 percent of the commits that touch them.

  4. average-revs: Finally, we get a weighted number of total revisions for the involved modules. The idea here is that we can filter out modules with too few revisions to avoid bias.
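To make the four columns concrete, here is a hedged Python sketch of how such rows could be computed. The text only says that the degree is the percent of shared commits; dividing the shared commits by the average number of revisions of the two modules is an assumption of this sketch, as are the function name `coupling` and the toy commit data.

```python
import math
from collections import Counter
from itertools import combinations


def coupling(commits):
    """Return (entity, coupled, degree, average_revs) rows, strongest first."""
    revisions = Counter()  # number of commits touching each module
    shared = Counter()     # number of commits touching both modules of a pair
    for files in commits:
        modules = sorted(set(files))
        revisions.update(modules)
        shared.update(combinations(modules, 2))

    rows = []
    for (entity, coupled), n_shared in shared.items():
        # Assumption: shared commits are measured against the average
        # number of revisions of the two modules.
        average_revs = (revisions[entity] + revisions[coupled]) / 2
        degree = int(100 * n_shared / average_revs)
        rows.append((entity, coupled, degree, math.ceil(average_revs)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical history: git.clj changes four times, twice with its test.
commits = [
    {"git.clj", "git_test.clj"},
    {"git.clj", "git_test.clj"},
    {"git.clj"},
    {"git.clj", "app.clj"},
]
rows = coupling(commits)
for row in rows:
    print(row)
```

A pair with very few revisions can reach a high degree by accident, which is why the average-revs column is useful as a filter.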

You see a typical pattern in the output: each unit changes together with its unit test (e.g. git.clj and git_test.clj, entities.clj and entities_test.clj).

This kind of temporal coupling is expected and not a problem. Code Maat was developed with test-driven development, so I’d say that getting any other result would’ve been a problem. Just plain old physical coupling—nothing too exciting here.

Things get interesting a bit farther down:

    prompt> maat -l maat_evo.log -c git -a coupling
    ...
    src/code_maat/app/app.clj,src/code_maat/core.clj,60,23
    src/code_maat/app/app.clj,test/code_maat/end_to_end/scenario_tests.clj,57,23
    ...

We see that app.clj changed with core.clj 60 percent of the time and with scenario_tests.clj 57 percent of the time. There’s no way to tell why from the names alone, but 60 percent is a high degree of coupling. We are talking about every second change (or so) in app.clj triggering a change in two other modules. That can’t be good. Let’s investigate why.
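Scrolling through the output works for a small project, but the same lookup can be scripted. The following sketch filters the CSV rows shown in this section for a single module. The helper name `couples_of` and its `min_degree` parameter are hypothetical, and the CSV text is only the excerpt printed above, not the complete analysis result.

```python
import csv
import io

# Excerpt of the coupling output shown in this section.
CSV_OUTPUT = """\
entity,coupled,degree,average-revs
src/code_maat/parsers/git.clj,test/code_maat/parsers/git_test.clj,83,12
src/code_maat/app/app.clj,src/code_maat/core.clj,60,23
src/code_maat/app/app.clj,test/code_maat/end_to_end/scenario_tests.clj,57,23
"""


def couples_of(csv_text, module, min_degree=0):
    """Return (entity, coupled, degree) for every pair that involves `module`."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["entity"], row["coupled"], int(row["degree"]))
        for row in rows
        if module in (row["entity"], row["coupled"])
        and int(row["degree"]) >= min_degree
    ]


app_couples = couples_of(CSV_OUTPUT, "src/code_maat/app/app.clj")
for entity, coupled, degree in app_couples:
    print(f"{entity} <-> {coupled}: {degree}%")
```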

Check Out the Evolution Radar


In a large codebase, a temporal coupling analysis sparks an explosion of data. Code Maat resolves that by allowing us to specify optional thresholds. The research tool Evolution Radar[24] takes a different approach and lets us zoom in and out to the level of detail we’re interested in. So check out the tool and take inspiration.

Investigate Temporal Couples

Once we make such a finding, we need to drill down into the code. Because all changes are recorded in our version-control system, we can perform a diff on the modules. I’d recommend focusing on the shared commits and looking for recurring modification patterns within those commits.
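Finding the shared commits is a simple filter once the history is available as data. The sketch below assumes the history has already been parsed into a mapping from commit id to the set of changed files; the function name `shared_commits` and the commit ids are made up for illustration.

```python
def shared_commits(history, first, second):
    """Return the ids of all commits that changed both modules."""
    return [
        commit_id
        for commit_id, files in history.items()
        if first in files and second in files
    ]


# Hypothetical history keyed by commit id.
history = {
    "c1": {"app.clj", "core.clj", "project.clj"},
    "c2": {"app.clj", "core.clj"},
    "c3": {"core.clj"},
}
to_inspect = shared_commits(history, "app.clj", "core.clj")
print(to_inspect)  # ['c1', 'c2']
```

The resulting ids are the commits worth diffing in your version-control tool.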

Code Maat is written in Clojure. Although an exciting language, it’s far outside the scope of this book. So let’s stay with temporal coupling, and allow me to walk you through the design to spot the problems.

images/Chp9_MaatCoupling.png

I’m a bit ashamed to admit that core.clj is the command-line interface of Code Maat. (I changed it later to a better name.) It parses the arguments you give it, converts them to a Clojure representation, and forwards them to app.clj.

app.clj glues the program together by mapping the given arguments to the correct invocations of parsers, analyses, and output formats. As you can see, the program arguments cause the coupling; every time a new argument is added, two distinct modules have to evolve to know about it.

So, your first takeaway is actually a reminder about the power of names that you learned about in Chapter 5, Judge Hotspots with the Power of Names. With proper naming, we’d have a better entry point for our manual code inspection. Second, we failed to encapsulate a concept that varies. If we extract the knowledge of all command-line arguments from app.clj, we break the coupling and make the code easier to evolve and maintain.

Use Temporal Coupling for Design Insights

The analysis on Code Maat illustrates how we can use temporal coupling analysis on small projects. Code Maat (which I wrote to learn Clojure during my daily commute) is a single-developer project with fewer than 2,000 lines of code.

Such small projects don’t need a hotspot analysis. We already know which modules are hard to change. Temporal coupling is different because it provides insights into our design. We get active feedback on our work so that we can spot improvements we hadn’t even thought of.

Keep Your Temporal Coupling Algorithms Simple

The algorithm we’ve used so far isn’t the only kid in town. Temporal coupling means that some entities change together over time. But there isn’t any formal definition of what “change together” means. In research papers, you’ll find several alternative measures.

One typical alternative adds the notion of time to the algorithm; the degree of coupling is weighted by the age of the commits. The idea is to prioritize recent changes over changes in the more distant past. A relationship thus gets weaker with the passage of time. However, as you’ll see soon when we discuss software defects, a time parameter doesn’t necessarily improve the metric.
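As an illustration of what such a time-weighted measure might look like, the sketch below lets each shared commit contribute less the older it is. The exponential decay, the `half_life_days` parameter, and the function name are assumptions chosen for the example; they are not taken from a specific research paper or from Code Maat.

```python
from collections import defaultdict
from itertools import combinations


def time_weighted_coupling(commits, half_life_days=90.0):
    """Sum age-discounted shared commits for every pair of modules.

    `commits` is an iterable of (age_in_days, files) tuples. A commit that
    is `half_life_days` old counts half as much as one made today.
    """
    weights = defaultdict(float)
    for age_days, files in commits:
        weight = 0.5 ** (age_days / half_life_days)
        for pair in combinations(sorted(set(files)), 2):
            weights[pair] += weight
    return dict(weights)


# Hypothetical commits: (age in days, changed files).
commits = [
    (0, {"a.clj", "b.clj"}),
    (90, {"a.clj", "b.clj"}),
    (180, {"a.clj", "c.clj"}),
]
weights = time_weighted_coupling(commits)
print(weights[("a.clj", "b.clj")])  # 1.5
print(weights[("a.clj", "c.clj")])  # 0.25
```

Note how the extra parameter makes the numbers harder to interpret than a plain percentage: the result now depends on the chosen half-life.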

The algorithm that Code Maat implements, the percent of shared commits, is chosen because when faced with several alternatives that seem equally good, simplicity tends to win. The Code Maat measure is straightforward to implement and, more importantly, intuitive to reason about and verify.

images/ch8_circle_hypothesis.png

Interestingly enough, simplicity may win in criminal investigations, too. In a fascinating study, researchers trained people on two simple heuristics for predicting the home location of criminals:

  • Distance decay: Criminals do not travel far from their homes to offend. Thus, crimes are more likely closer to an offender’s home and less likely farther away.

  • Circle hypothesis: Many serial offenders live within a circle defined by the criminals’ two farthest crime locations.

Using these simple principles allowed the participants to predict the likely home location of serial killers with the same accuracy as a sophisticated geographical profiling system. (See Applications of Geographical Offender Profiling [CY08].) We build the techniques in this book on the same kind of simplicity.
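To show how little machinery the circle hypothesis needs, here is a small Python sketch. It assumes that the circle's diameter is the line between the two crime sites that lie farthest apart and that locations are plain (x, y) coordinates. The function name and the sample points are invented for the example.

```python
import math
from itertools import combinations


def circle_hypothesis(crime_sites):
    """Return (center, radius) of the circle spanned by the two farthest sites."""
    first, second = max(
        combinations(crime_sites, 2),
        key=lambda pair: math.dist(pair[0], pair[1]),
    )
    center = ((first[0] + second[0]) / 2, (first[1] + second[1]) / 2)
    radius = math.dist(first, second) / 2
    return center, radius


# Hypothetical crime locations on a flat map.
sites = [(0, 0), (4, 0), (1, 1), (2, -1)]
center, radius = circle_hypothesis(sites)
print(center, radius)  # (2.0, 0.0) 2.0
```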

Know the Limitations of Temporal Coupling

Our simple definition of temporal coupling as modules that change in the same commit works well. Often, that definition takes us far enough to identify unexpected relationships in our system. But in larger organizations, our measure is too narrow. When multiple teams are responsible for different parts of the system, the temporal period of interest is probably counted in days or even weeks. We’ll address this problem in Chapter 12, Discover Organizational Metrics in Your Codebase, where you’ll learn to group multiple commits into a logical change set based on a custom timespan.
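One way to picture such a logical change set is to merge all commits that fall within the same time window before running the coupling analysis. The sketch below is only an assumption about how the grouping could work, not the approach Chapter 12 or Code Maat actually takes; the function name `group_by_timespan` and the file names are hypothetical.

```python
from datetime import date


def group_by_timespan(commits, days=1):
    """Merge commits whose dates fall into the same window of `days` days.

    `commits` is an iterable of (date, files) tuples. Returns one set of
    files per window, ordered from the oldest window to the newest.
    """
    change_sets = {}
    for day, files in commits:
        window = day.toordinal() // days
        change_sets.setdefault(window, set()).update(files)
    return [change_sets[window] for window in sorted(change_sets)]


# Hypothetical commits from two teams.
commits = [
    (date(2015, 3, 2), {"ui.clj"}),
    (date(2015, 3, 2), {"db.clj"}),
    (date(2015, 3, 5), {"ui.clj", "api.clj"}),
]
change_sets = group_by_timespan(commits, days=1)
print(change_sets)
```

With a one-day window, the two commits from March 2 become a single change set, so ui.clj and db.clj now count as changing together.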

Another problem with the measure is that we’re limited to the information contained in commits. We may miss important coupling relationships that occur between commits. The solution to this problem requires hooks into our text editors and our IDE to record precise information on our code interactions. Tools like that are under active research.

Yet another bias is moving and renaming modules. While version-control systems track renames, Code Maat does not. (If I ever turn Code Maat into a commercial product, that’s a feature I’d add.) It sounds more limiting than it actually is: problematic modules tend to remain where they are. The good thing is that because we lose some of the supporting information, the results we get are more likely to point to true problems. Consider renaming the module as a reset switch triggered by refactoring.