Learn About the Negative Space in Code

Virtually all programming languages use whitespace as indentation to improve readability. (Even Brainf***[19] programs seem to use it, despite the goal implied by the language’s name.) Indentation correlates with the code’s shape. So instead of focusing on the code itself, we’ll look at what’s not there, the negative space. We’ll use indentation as a proxy for complexity.

[Figure: images/Chp4_Whitespace.png]

The idea of indentation as a proxy for complexity is backed by research. (See Reading Beside the Lines: Indentation as a Proxy for Complexity Metric [HGH08], presented at the 16th IEEE International Conference on Program Comprehension, ICPC 2008.) It's a simple metric, yet it correlates with more elaborate metrics, such as McCabe cyclomatic complexity and Halstead complexity measures.

The main advantage of a whitespace analysis is that it's easy to automate. It's also fast and language-independent. Even though different languages result in different shapes, the concept works just as well on Java as it does on Clojure or C.

However, there is a cost: some constructs are nontrivial despite looking flat. (List comprehensions[20] come to mind.) But again, measuring software complexity from a static snapshot of the code is not supposed to produce absolute truths. We are looking for hints.
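To make that limitation concrete, consider the following Python fragment. (It's my own illustration, not one of the book's examples.) The comprehension hides a nested loop and a filter, yet an indentation count scores it as flat:

# A list comprehension with zero indentation. It hides a nested
# loop and a condition, so an indentation-based metric sees a
# trivial line:
pairs = [(x, y) for x in range(10) for y in range(10) if (x + y) % 3 == 0]

# The equivalent loop exposes the same logic to the metric through
# three levels of indentation:
pairs = []
for x in range(10):
    for y in range(10):
        if (x + y) % 3 == 0:
            pairs.append((x, y))

Both versions do the same work; only the second leaves a trace in the negative space. Let's move ahead and see how useful these hints can be.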

Whitespace Analysis of Complexity

Back in Check Your Assumptions with Complexity, we identified the Configuration.java class in Hibernate as a potential hotspot. Its name suggests a plain configuration file, but its large size warns us that it's something more. A complexity measure gives us more clues.

Calculating indentation is trivial: just read a file line by line and count the number of leading spaces and tabs. Let’s use the Python script complexity_analysis.py in the scripts folder of the code you downloaded from the Code Maat distribution page.[21]

The complexity_analysis.py script calculates logical indentation. Four spaces or one tab counts as one logical indentation. Empty and blank lines are ignored.
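To give you a feel for what such a calculation involves, here's a minimal Python sketch of the same idea. (This is my own illustration; complexity_analysis.py in the Code Maat distribution is the real thing, with proper option handling.)

import statistics
import sys

def logical_indentation(line, spaces_per_indent=4):
    # A tab counts as one logical indentation, so expand each tab
    # to four spaces before counting leading whitespace.
    expanded = line.replace('\t', ' ' * spaces_per_indent)
    leading = len(expanded) - len(expanded.lstrip(' '))
    return leading // spaces_per_indent

def analyze(file_name):
    with open(file_name) as src:
        # Empty and blank lines carry no complexity; skip them.
        indents = [logical_indentation(line)
                   for line in src if line.strip()]
    print('n,total,mean,sd,max')
    print(f'{len(indents)},{sum(indents)},'
          f'{statistics.mean(indents):.2f},'
          f'{statistics.stdev(indents):.2f},{max(indents)}')

if __name__ == '__main__':
    analyze(sys.argv[1])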

Open a command prompt in the Hibernate root directory and fire off the following command. Just remember to provide the real path to your own scripts directory:

 
prompt> python scripts/complexity_analysis.py \
    hibernate-core/src/main/java/org/hibernate/cfg/Configuration.java
n,total,mean,sd,max
3335,8072,2.42,1.63,14

Like an X-ray, these statistics give us a peek into a module to reveal its inner workings. The total column is the accumulated complexity. It’s useful to compare different revisions or modules against each other. (We’ll build on that soon.) The rest of the statistics tell us how that complexity is distributed:

  • The mean column tells us that there’s plenty of complexity, on average 2.42 logical indentations. It’s high but not too bad.

  • The standard deviation sd tells us how much the complexity varies from line to line within the module. A low value like the one we got indicates that most lines have a complexity close to the mean. Again, not too bad.

  • But the max column shows signs of trouble. A maximum logical indentation level of 14 is high.

A large maximum indentation value means deeply nested logic, which usually translates to nested conditions. We can expect islands of complexity. It looks as if we've found application logic hidden inside a configuration file.

Analyze Code Fragments


Another promising application is to analyze differences between code revisions. An indentation measure doesn't require a valid program; it works just fine on partial programs, too. That means we can analyze the complexity delta in each changed line of code. If we do that for each revision in our analysis period, we can detect trends in the modifications we make. This gives us a way to measure modification effort, and low effort is the essence of good design.
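Here's a rough sketch of how such a delta analysis could look on top of Git. (The code is my own illustration, the function names are made up, and a production version would need error handling and rename tracking.)

import subprocess

def logical_indentation(line, spaces_per_indent=4):
    expanded = line.replace('\t', ' ' * spaces_per_indent)
    return (len(expanded) - len(expanded.lstrip(' '))) // spaces_per_indent

def added_complexity(revision, file_name):
    # Lines starting with '+' in the diff were added in this
    # revision; the '+++' header isn't part of the code.
    diff = subprocess.run(
        ['git', 'show', revision, '--', file_name],
        capture_output=True, text=True, check=True).stdout
    added = [line[1:] for line in diff.splitlines()
             if line.startswith('+') and not line.startswith('+++')]
    return sum(logical_indentation(line)
               for line in added if line.strip())

def complexity_trend(file_name):
    # Walk the revisions that touched the file, oldest first, and
    # report the complexity each one added.
    revisions = subprocess.run(
        ['git', 'log', '--reverse', '--format=%h', '--', file_name],
        capture_output=True, text=True, check=True).stdout.split()
    for revision in revisions:
        print(revision, added_complexity(revision, file_name))

A rising series of large deltas would tell you that each change to the module demands more and more nested logic.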

When you find excess complexity, you have a clear candidate for refactoring. Before you begin refactoring, you may want to check out the module's complexity trend. Let's apply our whitespace analysis to historical data and track trends in the hotspot.

Joe asks:
With the Power of Shapes, Wouldn’t Visual Programming Languages Give Us an Edge?

Since the dawn of computing, our industry has tried to simplify programming. Visual programming is one such approach. Instead of typing cryptic commands in text, what if we could just draw some shapes, press a button, and have the computer generate the program? Wouldn’t that simplify programming? Indeed it would. But not in the way the idea is sold, nor in a way that matters.

Visual programming might make small tasks easier, but it breaks down quickly for larger problems. (The Influence of the Psychology of Programming on a Language Design [PM00] has a good overview of the research.) The thing is, it’s the larger problems that would benefit from simplifying the process—small tasks generally aren’t that complex. This is a strong argument against visual programming languages. It also explains why demonstrations of visual programming tools look so convincing—demo programs are small by nature.

Visual expressions also don't scale very well. A visual symbol represents one thing. We can assign more meanings to it by making the symbol depend on context. (Natural languages have tried this; hieroglyphs show the limitations of the system.) Contrast this with text, where you're free to express virtually any concept.

[Figure: images/Chp4_VisualSDL.png]

I became painfully aware of the limitations of visual programming when I ported a system built in the graphical Specification and Description Language (SDL) to C++. What took four screens of SDL was transformed into just a few lines of high-level C++.