Your Code as a Crime Scene by Adam Tornhill. Published by Pragmatic Bookshelf, 2015.

Automated Mining with Code Maat

In a large system under heavy development, hundreds of commits are made each day. Manually inspecting that data is error-prone and, more importantly, takes time away from all the fun programming. Let’s automate this.

Calculating change frequencies is straightforward: parse the log file and count the number of times each module occurs. You could also add more complex processing to keep track of renamed or moved files.
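As a rough illustration of that idea (a sketch, not Code Maat’s actual implementation), the following Python snippet counts how often each file appears in a log that contains --numstat lines. It assumes each numstat line has three tab-separated fields (added lines, deleted lines, path). The function name count_changes and the sample log are invented for this example, and renamed or moved files are not tracked.

```python
from collections import Counter


def count_changes(log_text):
    """Count how many times each path appears in `git log --numstat` output."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split("\t")
        # A numstat line looks like "<added>\t<deleted>\t<path>".
        # Binary files report "-" instead of numbers.
        if len(parts) == 3 and all(p.isdigit() or p == "-" for p in parts[:2]):
            counts[parts[2]] += 1
    return counts


# Hypothetical sample log, invented for this example.
SAMPLE_LOG = (
    "[a1b2c3d] Alice 2013-10-30 Fix parser\n"
    "3\t1\tsrc/parser.clj\n"
    "10\t0\ttest/parser_test.clj\n"
    "\n"
    "[e4f5a6b] Bob 2013-10-29 Tweak parser\n"
    "2\t2\tsrc/parser.clj\n"
)

# Print the counts as CSV, most frequently changed file first.
for path, n in count_changes(SAMPLE_LOG).most_common():
    print(f"{path},{n}")
```

Header lines and blank lines are skipped because they don’t split into three tab-separated fields with numeric counts.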

You already know about Code Maat. Now we’re going to use it to analyze change frequencies. The git output is fine for humans but too verbose for a tool. The following command generates a more compact version:

 
prompt> git log --pretty=format:'[%h] %an %ad %s' --date=short \
          --numstat --before=2013-11-01

Code Maat is strict about its input. (It doesn’t have to be—it’s just easier to write a parser if we can ignore special cases.) Here are the rules:

  • Everything except --before is mandatory.

  • Use the --before flag to get reproducible, historical output in this example. It includes all commits made before the given date, which is our temporal period of interest for this analysis.

  • If you want to analyze the complete evolution, just leave out the flag.

  • Specify an optional start date through the --after flag.

As long as you keep the supported log format, you’re free to vary and combine different filtering options.
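If you script the log generation, the rules above could be captured in a small helper. This is only a sketch: the function name git_log_command and its parameters are hypothetical, and it simply builds the argument list for the git command shown earlier, keeping the mandatory parts fixed and the date filters optional.

```python
import shlex


def git_log_command(before=None, after=None):
    """Build the git log argument list in the log format shown above."""
    args = [
        "git", "log",
        "--pretty=format:[%h] %an %ad %s",  # mandatory: the header format
        "--date=short",                     # mandatory: YYYY-MM-DD dates
        "--numstat",                        # mandatory: per-file change lines
    ]
    # The date filters are optional; leave both out for the full history.
    if after:
        args.append(f"--after={after}")
    if before:
        args.append(f"--before={before}")
    return args


# Show the command as it could be typed in a shell.
print(shlex.join(git_log_command(before="2013-11-01")))
```

Inside a git repository, the returned list can be passed to subprocess.run(args, capture_output=True, text=True) to collect the log text without going through a shell.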

To persist the log information, just redirect the git output to a file. For example:

 
prompt> git log --pretty=format:'[%h] %an %ad %s' --date=short \
          --numstat --before=2013-11-01 > maat_evo.log

This creates the file maat_evo.log in your current directory. Before we feed this file to Code Maat, let’s open it and take a look. You will see a logfile with the same type of information as shown in the earlier example.

[Figure: images/Chp3_GitLogSketchComp.png]

Inspect the Data

Inspecting the input data is a good starting point. Code Maat provides a summary option that presents an overview of the information in the log. Once you’ve installed Code Maat as described on the distribution page,[10] fire up the tool by entering the following command—we’ll discuss the options in just a minute:

 
prompt> maat -l maat_evo.log -c git -a summary
statistic,value
number-of-commits,88
number-of-entities,45
number-of-entities-changed,283
number-of-authors,2

The -a flag specifies the analysis we want. In this case, we’re interested in a summary. In addition, we need to tell Code Maat where to find the logfile (-l maat_evo.log) and which version-control system we’re using (-c git). That’s it. These three options should cover most use cases.

The summary statistics displayed above are generated as comma-separated values (CSV). The first line, statistic,value, specifies the heading of each column.

For our purposes, the row number-of-entities-changed holds the interesting data. During our specified development period, the different modules in the system have been changed 283 times. Let’s see whether we can find any patterns in those changes.
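To make the four statistics concrete, here is a minimal Python sketch that derives similar numbers from a log in the format used in this chapter. It is an approximation under stated assumptions, not Code Maat’s implementation: it reads number-of-entities as the count of distinct modules and number-of-entities-changed as the total number of module changes summed over all commits. The function name summarize and the sample log are invented for illustration.

```python
import re

# Matches header lines such as "[a1b2c3d] Alice 2013-10-30 Fix parser".
# Assumes the author name is followed by a YYYY-MM-DD date.
HEADER = re.compile(r"^\[([0-9a-f]+)\] (.+?) (\d{4}-\d{2}-\d{2})\b")


def summarize(log_text):
    """Derive summary statistics from `git log --numstat` style text."""
    commits = 0
    changes = 0
    authors = set()
    entities = set()
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            commits += 1
            authors.add(header.group(2))
            continue
        parts = line.split("\t")
        if len(parts) == 3:  # "<added>\t<deleted>\t<path>"
            entities.add(parts[2])
            changes += 1
    return {
        "number-of-commits": commits,
        "number-of-entities": len(entities),
        "number-of-entities-changed": changes,
        "number-of-authors": len(authors),
    }


# Hypothetical sample log, invented for this example.
SAMPLE_LOG = (
    "[a1b2c3d] Alice 2013-10-30 Fix parser\n"
    "3\t1\tsrc/parser.clj\n"
    "10\t0\ttest/parser_test.clj\n"
    "\n"
    "[e4f5a6b] Bob 2013-10-29 Tweak parser\n"
    "2\t2\tsrc/parser.clj\n"
)

print("statistic,value")
for statistic, value in summarize(SAMPLE_LOG).items():
    print(f"{statistic},{value}")
```

In the sample, two distinct files are touched three times in total, which is why the number of changes is larger than the number of entities.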

Use CSV Output


Code Maat is designed to be minimalistic. It just collects the results. By generating output as CSV, a well-supported text format, the output can be read by other programs. You can import the CSV into a spreadsheet or, with a little scripting, populate a database with the data.

This model allows you to build more elaborate visualizations and analyses on top of Code Maat. Pure text is the universal interface.
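As one possible illustration of that scripting, the Python sketch below loads the summary output shown earlier into an in-memory SQLite database using only the standard library. The table name summary, its column types, and the function name load_summary are assumptions made for this example.

```python
import csv
import io
import sqlite3

# The summary output shown earlier in this section.
SUMMARY_CSV = """statistic,value
number-of-commits,88
number-of-entities,45
number-of-entities-changed,283
number-of-authors,2
"""


def load_summary(csv_text, connection):
    """Insert each CSV row into a (hypothetical) summary table."""
    rows = csv.DictReader(io.StringIO(csv_text))
    connection.execute(
        "CREATE TABLE summary (statistic TEXT PRIMARY KEY, value INTEGER)"
    )
    connection.executemany(
        "INSERT INTO summary (statistic, value) VALUES (?, ?)",
        ((row["statistic"], int(row["value"])) for row in rows),
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
load_summary(SUMMARY_CSV, connection)

# Query a single statistic back out of the database.
(changed,) = connection.execute(
    "SELECT value FROM summary WHERE statistic = ?",
    ("number-of-entities-changed",),
).fetchone()
print(changed)
```

The same pattern works for any of the CSV outputs: read the header row for column names, then map each row to a table of your own design.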

Analyze Change Frequencies

Now that you have the modification data, the next step is to analyze the distribution of those changes across modules. To analyze change frequencies, specify the revisions analysis:

 
prompt> maat -l maat_evo.log -c git -a revisions
entity,n-revs
src/code_maat/analysis/logical_coupling.clj,26
src/code_maat/app/app.clj,25
src/code_maat/core.clj,21
test/code_maat/end_to_end/scenario_tests.clj,20
project.clj,19
...

The revisions analysis results in two columns: an entity column specifying the name of a source code module, and n-revs, stating the number of revisions of that module.

The output is sorted by the number of revisions, in descending order. That means our most frequently modified candidate is logical_coupling.clj with 26 changes, followed by 25 changes to the fuzzily named app.clj. I named it, so I really should know better.
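If you post-process the revisions output yourself, picking the top candidates takes only a few lines. The following Python sketch uses just the five rows visible in the listing (the remaining rows are elided there and are not reproduced here). The function name top_entities is hypothetical, and the explicit sort is a precaution in case the input has been reordered.

```python
import csv
import io

# Only the five rows shown in the listing; the rest of the output is elided.
REVISIONS_CSV = """entity,n-revs
src/code_maat/analysis/logical_coupling.clj,26
src/code_maat/app/app.clj,25
src/code_maat/core.clj,21
test/code_maat/end_to_end/scenario_tests.clj,20
project.clj,19
"""


def top_entities(csv_text, limit=3):
    """Return the `limit` most frequently revised entities as (name, revisions)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Sort explicitly by revision count, highest first.
    rows.sort(key=lambda row: int(row["n-revs"]), reverse=True)
    return [(row["entity"], int(row["n-revs"])) for row in rows[:limit]]


for entity, revisions in top_entities(REVISIONS_CSV, limit=2):
    print(f"{entity}: {revisions} revisions")
```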

Thanks to the revisions analysis, you’ve identified the parts of the code with the most developer activity. Sure, the number of commits is a rough metric, but we’ll meet more elaborate measures later. As you saw earlier in See That Hotspots Really Work, the relative number of commits is a surprisingly good predictor of defects and design issues. Its simplicity makes it an attractive starting point.