Table of Contents for
Your Code as a Crime Scene


Your Code as a Crime Scene by Adam Tornhill, published by Pragmatic Bookshelf, 2015
  1. Title Page
  2. Your Code as a Crime Scene
  3. Your Code as a Crime Scene
  4. For the Best Reading Experience...
  5. Table of Contents
  6. Early praise for Your Code as a Crime Scene
  7. Foreword by Michael Feathers
  8. Acknowledgments
  9. Chapter 1: Welcome!
  10. About This Book
  11. Optimize for Understanding
  12. How to Read This Book
  13. Toward a New Approach
  14. Get Your Investigative Tools
  15. Part 1: Evolving Software
  16. Chapter 2: Code as a Crime Scene
  17. Meet the Problems of Scale
  18. Get a Crash Course in Offender Profiling
  19. Profiling the Ripper
  20. Apply Geographical Offender Profiling to Code
  21. Learn from the Spatial Movement of Programmers
  22. Find Your Own Hotspots
  23. Chapter 3: Creating an Offender Profile
  24. Mining Evolutionary Data
  25. Automated Mining with Code Maat
  26. Add the Complexity Dimension
  27. Merge Complexity and Effort
  28. Limitations of the Hotspot Criteria
  29. Use Hotspots as a Guide
  30. Dig Deeper
  31. Chapter 4: Analyze Hotspots in Large-Scale Systems
  32. Analyze a Large Codebase
  33. Visualize Hotspots
  34. Explore the Visualization
  35. Study the Distribution of Hotspots
  36. Differentiate Between True Problems and False Positives
  37. Chapter 5: Judge Hotspots with the Power of Names
  38. Know the Cognitive Advantages of Good Names
  39. Investigate a Hotspot by Its Name
  40. Understand the Limitations of Heuristics
  41. Chapter 6: Calculate Complexity Trends from Your Code’s Shape
  42. Complexity by the Visual Shape of Programs
  43. Learn About the Negative Space in Code
  44. Analyze Complexity Trends in Hotspots
  45. Evaluate the Growth Patterns
  46. From Individual Hotspots to Architectures
  47. Part 2: Dissect Your Architecture
  48. Chapter 7: Treat Your Code As a Cooperative Witness
  49. Know How Your Brain Deceives You
  50. Learn the Modus Operandi of a Code Change
  51. Use Temporal Coupling to Reduce Bias
  52. Prepare to Analyze Temporal Coupling
  53. Chapter 8: Detect Architectural Decay
  54. Support Your Redesigns with Data
  55. Analyze Temporal Coupling
  56. Catch Architectural Decay
  57. React to Structural Trends
  58. Scale to System Architectures
  59. Chapter 9: Build a Safety Net for Your Architecture
  60. Know What’s in an Architecture
  61. Analyze the Evolution on a System Level
  62. Differentiate Between the Level of Tests
  63. Create a Safety Net for Your Automated Tests
  64. Know the Costs of Automation Gone Wrong
  65. Chapter 10: Use Beauty as a Guiding Principle
  66. Learn Why Attractiveness Matters
  67. Write Beautiful Code
  68. Avoid Surprises in Your Architecture
  69. Analyze Layered Architectures
  70. Find Surprising Change Patterns
  71. Expand Your Analyses
  72. Part 3: Master the Social Aspects of Code
  73. Chapter 11: Norms, Groups, and False Serial Killers
  74. Learn Why the Right People Don’t Speak Up
  75. Understand Pluralistic Ignorance
  76. Witness Groupthink in Action
  77. Discover Your Team’s Modus Operandi
  78. Mine Organizational Metrics from Code
  79. Chapter 12: Discover Organizational Metrics in Your Codebase
  80. Let’s Work in the Communication Business
  81. Find the Social Problems of Scale
  82. Measure Temporal Coupling over Organizational Boundaries
  83. Evaluate Communication Costs
  84. Take It Step by Step
  85. Chapter 13: Build a Knowledge Map of Your System
  86. Know Your Knowledge Distribution
  87. Grow Your Mental Maps
  88. Investigate Knowledge in the Scala Repository
  89. Visualize Knowledge Loss
  90. Get More Details with Code Churn
  91. Chapter 14: Dive Deeper with Code Churn
  92. Cure the Disease, Not the Symptoms
  93. Discover Your Process Loss from Code
  94. Investigate the Disposal Sites of Killers and Code
  95. Predict Defects
  96. Time to Move On
  97. Chapter 15: Toward the Future
  98. Let Your Questions Guide Your Analysis
  99. Take Other Approaches
  100. Let’s Look into the Future
  101. Write to Evolve
  102. Appendix 1: Refactoring Hotspots
  103. Refactor Guided by Names
  104. Bibliography
  105. You May Be Interested In…

Predict Defects

A high degree of code churn isn’t a problem in and of itself. It’s more of a symptom, because code changes for a reason. Perhaps we have a feature area that’s poorly understood. Or maybe we just have a module with a low-quality implementation.

Given these reasons, it’s hardly surprising that code churn is a good predictor of defects. Let’s see how we can use that in our hotspot analyses.
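Absolute churn comes straight from the version-control log. Code Maat automates this step, but the idea is easy to sketch by hand: `git log --numstat` emits one tab-separated line per touched file with the number of added and deleted lines, and summing those per file gives the absolute churn. The helper below is my own illustration of that parsing, not the book's tooling:

```python
from collections import defaultdict

def parse_numstat(log_text):
    """Sum added and deleted lines per file from `git log --numstat` output.

    Each numstat line has the form: <added>\t<deleted>\t<path>.
    Binary files report '-' for both counts and are skipped here.
    """
    churn = defaultdict(lambda: {"added": 0, "deleted": 0})
    for line in log_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # commit headers, blank lines, and so on
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary file: no line counts available
        churn[path]["added"] += int(added)
        churn[path]["deleted"] += int(deleted)
    return dict(churn)

# Two commits touched src/core.py, one touched README.md:
sample = "3\t1\tsrc/core.py\n10\t2\tsrc/core.py\n1\t1\tREADME.md\n"
print(parse_numstat(sample)["src/core.py"])  # → {'added': 13, 'deleted': 3}
```

The cheapness of this step is worth noting: we only parse a log, never the code itself, which is why churn scales to large codebases.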

Analyze Churn on an Architectural Level


In Chapter 10, Use Beauty as a Guiding Principle, we used temporal coupling to identify expensive change patterns in different architectures. We used the analysis results to detect modification patterns that violated architectural principles. Code churn measures supplement such analyses as well. Let’s see how.

In the architectural analyses, we specify a transformation file. This file defines our architecturally significant components. To run a churn analysis on that level, we just specify the same transformation when we request an entity-churn analysis. When combined with temporal coupling, code churn provides additional insights on how serious the identified dependencies are.
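Conceptually, the transformation is just a mapping from file paths to named components, and the architectural churn is the file-level churn rolled up through that mapping. Here's a minimal sketch of that aggregation; the regex rules and component names are hypothetical stand-ins for whatever your transformation file defines:

```python
import re

# Hypothetical transformation rules: a path pattern maps to an
# architecturally significant component, mirroring the transformation
# files used in the architectural analyses.
TRANSFORMATIONS = [
    (re.compile(r"^src/ui/"), "UI"),
    (re.compile(r"^src/core/"), "Core"),
    (re.compile(r"^test/"), "Tests"),
]

def component_churn(file_churn):
    """Roll file-level churn (churned lines per path) up to components."""
    totals = {}
    for path, churn in file_churn.items():
        for pattern, component in TRANSFORMATIONS:
            if pattern.search(path):
                totals[component] = totals.get(component, 0) + churn
                break  # first matching rule wins
    return totals

file_churn = {"src/ui/view.py": 120, "src/core/engine.py": 300,
              "test/test_engine.py": 80}
print(component_churn(file_churn))  # → {'UI': 120, 'Core': 300, 'Tests': 80}
```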

Detect Hotspots by Churn

In this book, we used the number of revisions of each module to detect hotspots. It’s a simple metric that works surprisingly well. But it sure has its limitations. (We discussed them back in Limitations of the Hotspot Criteria.)

Code churn gives you an alternative metric that avoids some of these biases. Here are the typical cases where you should consider code churn:

  • Differences in individual commit styles: Some developers keep their commits small and cohesive; others stick to big-bang commits.

  • Long-lived feature branches: If we develop code on branches that live for weeks without being merged, as you see in the following figure, we may lose important history with regard to the original change frequencies on the branch.

[Figure: commits accumulating on a long-lived feature branch]

Both scenarios are symptoms of deeper problems, but sometimes you'll find yourself stuck with one of them anyway. In that case, code churn provides a more accurate metric than raw change frequencies.

To use code churn in a hotspot analysis, you combine the results from an entity-churn analysis with a complexity metric—for example, lines of code. The overlap between these two dimensions lets you identify the hotspots.
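One way to sketch that overlap is to normalize both dimensions and rank modules by their combined score, so that only modules scoring high on both churn and size float to the top. This is my own illustration of the idea, with made-up numbers; the book's tools do the equivalent combination for you:

```python
def churn_hotspots(churn, loc, top=3):
    """Rank modules by the overlap of churn and size.

    Both dimensions are normalized to [0, 1] so neither dominates;
    the product rewards modules that score high on both.
    """
    max_churn = max(churn.values())
    max_loc = max(loc.values())
    scores = {
        name: (churn[name] / max_churn) * (loc[name] / max_loc)
        for name in churn
        if name in loc
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Hypothetical churned lines and lines of code per module:
churn = {"parser.py": 500, "util.py": 40, "engine.py": 450}
loc = {"parser.py": 1200, "util.py": 150, "engine.py": 2000}
print(churn_hotspots(churn, loc))
```

Here engine.py ranks first: it churns almost as much as parser.py but is considerably larger, so the two dimensions reinforce each other.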

Consider Relative Code Churn

The code churn measures we’ve used so far are based on absolute churn values. That means code churn erases the differences between commit styles; it no longer matters if someone puts a day’s work into a single commit or if you commit often. All that matters is the amount of code that was affected.

However, it’s worthwhile to investigate an alternative measure. In Use of relative code churn measures to predict system defect density [NB05], a research team found that code churn was highly predictive of bugs. The twist is that the researchers used a different measure than we do. They measured relative code churn.

Relative code churn means that the absolute churn values are adjusted by the size of each file. And according to that research paper, the relative churn values outperform measures of absolute churn. So, have I wasted your time with almost a whole chapter devoted to absolute churn? I certainly hope not. Let’s see why.
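The adjustment itself is a simple ratio: churned lines divided by file size. A tiny sketch makes the difference from absolute churn concrete (the file names and numbers are invented for illustration):

```python
def relative_churn(abs_churn, file_size):
    """Relative churn: churned lines divided by the size of each file.

    A small, heavily edited file stands out even when its absolute
    churn is modest next to a large file's.
    """
    return {
        name: abs_churn[name] / file_size[name]
        for name in abs_churn
        if file_size.get(name)  # skip missing or empty files
    }

abs_churn = {"small.py": 90, "big.py": 400}
file_size = {"small.py": 100, "big.py": 4000}
print(relative_churn(abs_churn, file_size))  # → {'small.py': 0.9, 'big.py': 0.1}
```

Note that big.py churns more in absolute terms, yet small.py has had 90 percent of its size churned, which is what the relative measure surfaces.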

First of all, a subsequent research paper found no difference between the effectiveness of absolute and relative churn measures. In fact, absolute values proved to be slightly better at predicting defects. (See Does Measuring Code Change Improve Fault Prediction? [BOW11].) Further, relative churn values are more expensive to calculate. You need to iterate over past revisions of each file and calculate the total amount of code. Compare that to just parsing a version-control log, as we do to get absolute churn values.

The conclusion is that we just cannot tell for sure whether one measure is better than the other. It may well turn out that different development styles and organizations lend themselves better to different measures. In the meantime, I recommend that you start with absolute churn values. Simplicity tends to win in the long run.

Know the Limitations of Code Churn

Like all metrics, code churn has its limitations, too. You saw one such case in Measure the Churn Trend, where a commit of static test data biased the results. Thus, you should be aware of the following pitfalls:

  • Generated code: This problem is quite easy to solve by filtering out generated code from the analysis results.

  • Refactoring: Refactorings are done in small, predictable increments. As a side effect, code that undergoes refactorings may be flagged as high churn even though we’re making it better.

  • Superficial changes: Code churn is sensitive to superficial changes, such as renaming the instance variables in a class or rearranging the functions in a module.
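The first pitfall, generated code, really is straightforward to address: filter the matching paths out of the results before you interpret them. A minimal sketch, with hypothetical glob patterns you'd adapt to your own codebase:

```python
import fnmatch

# Hypothetical patterns for generated artifacts; adjust for your codebase.
GENERATED = ["*_pb2.py", "build/*", "*.generated.*"]

def filter_generated(churn_results):
    """Drop generated files from a churn analysis result."""
    return {
        path: churn
        for path, churn in churn_results.items()
        if not any(fnmatch.fnmatch(path, pattern) for pattern in GENERATED)
    }

results = {"src/app.py": 120, "api_pb2.py": 5000, "build/out.js": 900}
print(filter_generated(results))  # → {'src/app.py': 120}
```

The other two pitfalls, refactorings and superficial changes, have no mechanical filter; they're the reason to read churn results as leads to investigate rather than verdicts.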

In this chapter, we’ve used code churn to complement other analyses. We used the combined results to support and guide us. In my experience, that’s where churn measures are the most valuable. This strategy also lets you minimize the impact of code churn’s limitations.