Your Code as a Crime Scene, by Adam Tornhill. Published by Pragmatic Bookshelf, 2015.

Evaluate Communication Costs

To reason about communication costs, we need to know who’s communicating with whom. The analysis model we’ll use is based on the idea that we can identify a main developer of each module.

We’ll define the main developer as the programmer who’s likely to know the most about the specific code. Because code knowledge isn’t easy to measure, we’ll use the number of contributed lines of code instead.

Like all heuristics, our metric has its flaws, particularly because we're measuring something as multifaceted as programmer contributions. That doesn’t mean the results are useless; the metrics are there to support your decisions, not to make them for you. Your knowledge and expertise cannot be replaced by data.

So sure, using the number of added lines of code is a rough approximation, but the overall results tend to be good enough. Let’s see the metric in action.

Identify Main Developers by Removed Code

Since we used the number of added lines to identify main developers, this means that a copy-paste cowboy could easily conquer parts of the codebase. So, let’s turn it around and find an alternative.

Good programmers take pride in doing more with less. That means you could use the number of removed lines of code instead. That tweak to the algorithm would identify developers who actively refactor the code. Code Maat implements this analysis as refactoring-main-dev, so go ahead and try it yourself.

In practice, you’ll often find that in projects that care about code quality, like Hibernate, the two algorithms identify the same people. This is why we used the conceptually simpler metric of added lines in our case study.

Identify Main Developers

As the following figure shows, the contribution information is recorded in every commit. We just need to instruct Code Maat to sum it up and calculate a degree of ownership for each entity. The author with most added lines is considered the main developer, the knowledge owner, of that module.

images/Chp12_ChurnLog.png
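To make the summation concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Code Maat's actual implementation; the function name main_developers and the sample rows are invented for this example.

```python
from collections import defaultdict

def main_developers(rows):
    """Return {entity: (author, added, total_added, ownership)}.

    rows: iterable of (entity, author, added_lines) tuples, one per
    file change recorded in the version-control log.
    """
    added = defaultdict(lambda: defaultdict(int))
    for entity, author, lines in rows:
        added[entity][author] += lines

    result = {}
    for entity, per_author in added.items():
        total = sum(per_author.values())
        # The author with the most added lines is the main developer.
        author, top = max(per_author.items(), key=lambda kv: kv[1])
        ownership = top / total if total else 0.0
        result[entity] = (author, top, total, ownership)
    return result

# Hypothetical log rows, invented for this example.
sample = [
    ("Parser.java", "alice", 40),
    ("Parser.java", "bob", 10),
    ("Parser.java", "alice", 30),
    ("Lexer.java", "bob", 5),
]
owners = main_developers(sample)
print(owners["Parser.java"])
```

The ownership value is simply the main developer's added lines divided by all lines added to that entity in the analysis period.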

You perform a main developer analysis with the main-dev option. On Hibernate, this analysis will deliver a long list of results. (Remember, Hibernate is a large codebase.) So let’s save the results to a file for further inspection:

 
prompt> maat -c git -l hib_evo.log -a main-dev > main_devs.csv

Let’s look inside main_devs.csv to find the main developer of AbstractEntityPersister:

images/Chp12_HotspotMainDev.png

The results identify Mr. Ebersole, the productive project lead on Hibernate, as the main developer. In our analysis period, he contributed 695 of the 1,219 lines that have been added to AbstractEntityPersister, an ownership of 57%.

Remember, we’re after expensive communication paths. So who does Mr. Ebersole have to communicate with? To find out, we need to identify the main developers of the modules that are temporally coupled to AbstractEntityPersister. Let’s look at that.

This Only Works on Git

The main-dev analysis we ran only works on Git. The reason is that neither Subversion nor Mercurial includes the number of modified lines of code in its log files. Fortunately, there’s a workaround that’s almost as good.

If you’re on another version-control system—for example, Subversion—then run the main-dev-by-revs analysis instead. That analysis classifies the programmer who has contributed the most commits to a specific module as its main developer.
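The commit-count variant can be sketched the same way. This is a rough illustration under our own assumptions, not Code Maat's code: the function name, the tuple layout, and the sample commits are all hypothetical.

```python
from collections import Counter, defaultdict

def main_developers_by_revisions(commits):
    """Return {entity: (author, author_commits, total_commits)}.

    commits: iterable of (revision, author, [entities touched]).
    """
    revs = defaultdict(Counter)
    for _revision, author, entities in commits:
        for entity in set(entities):  # count each commit once per entity
            revs[entity][author] += 1

    result = {}
    for entity, counter in revs.items():
        # The author with the most commits to the entity wins.
        author, count = counter.most_common(1)[0]
        result[entity] = (author, count, sum(counter.values()))
    return result

# Hypothetical commits, invented for this example.
commits = [
    ("r1", "alice", ["Parser.java", "Lexer.java"]),
    ("r2", "bob", ["Parser.java"]),
    ("r3", "alice", ["Parser.java"]),
]
by_revs = main_developers_by_revisions(commits)
print(by_revs["Parser.java"])
```

Because a commit counts the same whether it touches one line or a thousand, this metric is coarser than added lines, which is why the text calls it almost as good.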

Analyze Contributions to Coupled Modules

When we analyzed the temporal coupling to AbstractEntityPersister earlier, we identified three dependent modules. Let’s extract the main developers of those from our main_devs.csv analysis results:

Coupled Module                      Main Developer    Ownership (%)
CustomPersister.java                Steve Ebersole    58
EntityPersister.java                Steve Ebersole    39
GoofyPersisterClassProvider.java    Steve Ebersole    54

As you can see in the preceding table, all entities that have a temporal coupling to AbstractEntityPersister are within the mind of the same developer. Looks good, or does it? The low degree of ownership for EntityPersister.java, 39 percent, indicates that the code is shared between several authors. Let’s see how much each programmer contributed before we can feel safe.
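If you'd rather not search main_devs.csv by hand, a small script can do the lookup. The sketch below assumes a column layout of entity, main-dev, added, total-added; that header is our assumption, so check the first line of your own file. The sample rows reuse figures quoted in this chapter (695 of 1,219 lines, and 20 of the 51 added lines listed for EntityPersister in the next section), with the directory part elided as in the book's listings.

```python
import csv
import io

# Assumed column layout; check the header of your own main_devs.csv.
SAMPLE = """entity,main-dev,added,total-added
../AbstractEntityPersister.java,Steve Ebersole,695,1219
../EntityPersister.java,Steve Ebersole,20,51
"""

def lookup_main_devs(csv_text, file_names):
    """Map each requested file name to (main developer, ownership in percent)."""
    wanted = set(file_names)
    found = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["entity"].rsplit("/", 1)[-1]  # drop the directory part
        if name in wanted:
            pct = round(100 * int(row["added"]) / int(row["total-added"]))
            found[name] = (row["main-dev"], pct)
    return found

coupled = lookup_main_devs(SAMPLE, ["EntityPersister.java"])
print(coupled)
```

Matching on the exact file name, rather than on a suffix, keeps AbstractEntityPersister.java from being mistaken for EntityPersister.java.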

Calculate Individual Contributions

The contributions of each developer are available from the same version-control information. We just need to use an entity-ownership analysis instead. Here’s how it looks, filtered for EntityPersister:

 
prompt> maat -c git -l hib_evo.log -a entity-ownership
entity,author,added,deleted
...
../EntityPersister.java,Gail Badner,1,0
../EntityPersister.java,Steve Ebersole,20,9
../EntityPersister.java,Rob Worsnop,3,0
../EntityPersister.java,Eric Dalquist,19,8
../EntityPersister.java,edalquist,8,2
...

Oops—one of the programmers, Eric Dalquist, uses two committer names. We see it immediately in the output above, but Code Maat had no way to know. That means we’ve run into our first analysis bias!

The problem is easy to fix once we’ve identified the authors with multiple aliases. On your own projects, you want to investigate and clean the log before any analyses. Once we’ve done a quick search-and-replace on our data, we rerun the analysis on the cleaned log:

images/Chp12_MainDevs.png
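To see why the aliases matter, we can redo the arithmetic on the rows printed above. The following Python sketch merges the aliases on the analysis output rather than in the log itself, which is a shortcut for illustration and not the search-and-replace workflow described in the text. The helper names and the alias table are our own, and we assume the five rows shown are the complete listing for the file.

```python
from collections import defaultdict

# (author, added, deleted) rows for EntityPersister.java, copied from
# the entity-ownership output above.
ROWS = [
    ("Gail Badner", 1, 0),
    ("Steve Ebersole", 20, 9),
    ("Rob Worsnop", 3, 0),
    ("Eric Dalquist", 19, 8),
    ("edalquist", 8, 2),
]

# Alias table built by hand after inspecting the output.
ALIASES = {"edalquist": "Eric Dalquist"}

def merge_aliases(rows, aliases):
    """Sum added and deleted lines per canonical author name."""
    merged = defaultdict(lambda: [0, 0])
    for author, added, deleted in rows:
        canonical = aliases.get(author, author)
        merged[canonical][0] += added
        merged[canonical][1] += deleted
    return {author: tuple(counts) for author, counts in merged.items()}

def main_developer(contributions):
    """Return (author, added, total_added) for the top contributor."""
    total = sum(added for added, _deleted in contributions.values())
    author = max(contributions, key=lambda a: contributions[a][0])
    return author, contributions[author][0], total

before = main_developer(merge_aliases(ROWS, {}))
after = main_developer(merge_aliases(ROWS, ALIASES))
print(before)
print(after)
```

Without the alias table, Steve Ebersole leads with 20 of 51 added lines, the 39 percent we saw earlier. With the aliases merged, Eric Dalquist has 27 of 51, or about 53 percent.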

The algorithm now identifies the correct main developer. If we put our results together, we can start to reason about communication:

  1. We have a temporal coupling between EntityPersister and AbstractEntityPersister.

  2. Since AbstractEntityPersister is a hotspot, we know we need to modify the code frequently.

  3. That means its coupled part, EntityPersister, will need to change often as well, but the two modules have different main developers!

Let’s look at the consequences.

Check Communication Dependencies Against Your Organization

Hibernate is open source with a development process that’s different from what most companies in the industry use. Without more context and insight, it’s hard to reason about the consequences of our findings.

What we do know is that communication costs are likely to increase with organizational distance. So when you identify a case like this in your own projects, you want to check the information against your organization. Are the two programmers on the same team? Are they located at the same site? If not, it may be a concern.
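On your own projects, that check is easy to automate once you have the coupled pairs and main developers at hand. Here's a minimal Python sketch; the module names, developer names, teams, and sites are all hypothetical placeholders, and the org mapping is something you would have to maintain yourself.

```python
def communication_risks(coupled_pairs, main_devs, org):
    """Flag coupled modules whose main developers sit in different teams or sites.

    coupled_pairs: list of (module_a, module_b) with temporal coupling.
    main_devs:     {module: main developer}.
    org:           {developer: (team, site)}.
    """
    risks = []
    for mod_a, mod_b in coupled_pairs:
        dev_a, dev_b = main_devs[mod_a], main_devs[mod_b]
        if dev_a == dev_b:
            continue  # same person, nothing to coordinate
        team_a, site_a = org[dev_a]
        team_b, site_b = org[dev_b]
        reasons = []
        if team_a != team_b:
            reasons.append("different teams")
        if site_a != site_b:
            reasons.append("different sites")
        if reasons:
            risks.append((mod_a, mod_b, dev_a, dev_b, reasons))
    return risks

# Hypothetical data, invented for this example.
coupled_pairs = [("Billing.java", "Invoice.java"), ("Billing.java", "Tax.java")]
main_devs = {"Billing.java": "ann", "Invoice.java": "raj", "Tax.java": "ann"}
org = {"ann": ("payments", "site-1"), "raj": ("payments", "site-2")}

risks = communication_risks(coupled_pairs, main_devs, org)
print(risks)
```

A flagged pair isn't a verdict. It's a prompt to ask whether the two developers actually have a working communication channel.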

When we work together, we develop informal communication channels. We meet in the hallway, grab a coffee together in the morning, or chat about our work during lunch breaks. If we lose those opportunities for informal talks, our products suffer.

In this chapter, you got the basic tools to start analyzing how well your own development work aligns with those communication channels. Let’s sum up.