Your Code as a Crime Scene by Adam Tornhill. Published by Pragmatic Bookshelf, 2015.

Investigate Knowledge in the Scala Repository

The Scala codebase is fairly large, with approximately 300,000 lines of code. The project flourishes with developer activity; over the past two years, more than 150 developers have contributed to the project. That scale of development makes it virtually impossible for any single developer to keep it all in his or her head. So, let’s put our map together to guide us.

Start by cloning Scala’s git repository:

 
prompt> git clone https://github.com/scala/scala.git
Cloning into 'scala'...
...

To get reproducible results, we need to go back in time to when this book was written. We do that with the piece of git magic we learned in Turn Back Time. But we need to be careful: because Scala uses different branches, we need to know which one we’re on before we travel in time. We check that with git status:

prompt> git status
On branch 2.11.x
Your branch is up-to-date with 'origin/2.11.x'.

You use the name of the branch—in this case, origin/2.11.x—as the final argument to the command that rolls back the codebase:

 
prompt> git checkout `git rev-list -n 1 --before="2013-12-31" origin/2.11.x`
...
HEAD is now at 969a269..

Now your local Scala repository should look just like it did at the end of 2013, and you’re ready to analyze.

Analyze the Knowledge Distribution

Because we want to use the information to find the right people to talk to, we have to identify the developers who know the different parts of the system. This is a similar problem to the one we solved back in Evaluate Communication Costs, where we used a main developer analysis to identify the top contributor to each module.

We start by generating a version-control log:

 
prompt> git log --pretty=format:'[%h] %an %ad %s' --date=short \
  --numstat --before=2013-12-31 --after=2011-12-31 > scala_evo.log

The command limits our analysis period to the last two years. That’s because knowledge is fragile and dissolves over time; if we haven’t touched a piece of code for a couple of years, it’s unlikely that we remember much about it. In addition, code that’s been stable for that long is rarely the focus of our daily activities. But remember that these time periods are all heuristics that you may have to fine-tune in your own projects.
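To make the shape of scala_evo.log concrete, here’s a minimal Python sketch that reads a log in the format requested above: a header line of the form [hash] author date subject, followed by tab-separated numstat lines. It’s our own illustration, not part of Code Maat. The names Change and parse_log are hypothetical, and the sample commits are invented.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Change(NamedTuple):
    rev: str
    author: str
    date: str
    entity: str
    added: int
    deleted: int


# Matches "[%h] %an %ad %s" with --date=short, e.g. "[990b3c7] Some Name 2013-12-05 Subject".
HEADER = re.compile(r"^\[([0-9a-f]+)\] (.+?) (\d{4}-\d{2}-\d{2})(?: (.*))?$")


def parse_log(lines: Iterable[str]) -> Iterator[Change]:
    """Yield one Change per numstat line, tagged with its commit header."""
    rev = author = date = None
    for raw in lines:
        line = raw.rstrip("\n")
        header = HEADER.match(line)
        if header:
            rev, author, date = header.group(1), header.group(2), header.group(3)
            continue
        parts = line.split("\t")
        if rev is None or len(parts) != 3:
            continue  # blank separator between commits, or a line we don't understand
        added, deleted, entity = parts
        # git prints "-" instead of a number for binary files; count those as 0.
        yield Change(rev, author, date, entity,
                     int(added) if added.isdigit() else 0,
                     int(deleted) if deleted.isdigit() else 0)


# Invented sample data, only to show the expected input format.
sample = """[990b3c7] Paul Phillips 2013-12-05 Tidy up the backend
10\t2\tsrc/compiler/GenICode.scala
-\t-\tdocs/logo.png

[a1b2c3d] Jason Zaugg 2013-11-20 Fix checker
3\t1\tsrc/compiler/ICodeCheckers.scala
""".splitlines()

changes = list(parse_log(sample))
for change in changes:
    print(change)
```

On a real repository, you’d pass an open file instead of the sample, for example parse_log(open("scala_evo.log", encoding="utf-8")).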

Now that we have a version-control log, let’s see what happens when we aggregate all that information to identify the main developers:

 
prompt> maat -c git -l scala_evo.log -a main-dev > scala_main_dev.csv

The command saves the analysis results to the file scala_main_dev.csv for further processing. If you look inside it, you should see the main developer of each module:

 
entity,main-dev,added,total-added,ownership
..
GenICode.scala,Paul Phillips,584,1579,0.37
ICodeCheckers.scala,Jason Zaugg,19,44,0.43
ICodes.scala,Grzegorz Kossakowski,16,32,0.5
...

The main developer information serves as the basis of our knowledge map. We just need to project the information on the geography of our system. Let’s do it.
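Code Maat does this aggregation for us, but the idea behind the output columns is simple enough to sketch. The Python below is our own illustration, not Code Maat’s implementation. It assumes that the main developer is the author with the most added lines and that ownership is that author’s share of all added lines, which agrees with the sample rows above (584/1579 ≈ 0.37). The function name main_developers is hypothetical, and the contributors called Developer B and Developer C, along with their numbers, are invented to make the totals add up.

```python
from collections import defaultdict


def main_developers(changes):
    """changes: iterable of (entity, author, added_lines) tuples.

    Returns rows of (entity, main_dev, added, total_added, ownership),
    sorted by entity name.
    """
    added_by = defaultdict(lambda: defaultdict(int))
    for entity, author, added in changes:
        added_by[entity][author] += added

    rows = []
    for entity in sorted(added_by):
        per_author = added_by[entity]
        total = sum(per_author.values())
        # On a tie, max() keeps the author that was seen first.
        author, added = max(per_author.items(), key=lambda item: item[1])
        ownership = round(added / total, 2) if total else 0.0
        rows.append((entity, author, added, total, ownership))
    return rows


# Totals for the named developers come from the listing above;
# "Developer B" and "Developer C" are invented filler.
changes = [
    ("GenICode.scala", "Paul Phillips", 300),
    ("GenICode.scala", "Developer B", 500),
    ("GenICode.scala", "Paul Phillips", 284),
    ("GenICode.scala", "Developer C", 495),
    ("ICodes.scala", "Grzegorz Kossakowski", 16),
    ("ICodes.scala", "Developer B", 10),
    ("ICodes.scala", "Developer C", 6),
]

rows = main_developers(changes)
for row in rows:
    print(",".join(str(field) for field in row))
```

An ownership value close to 1.0 means one person wrote almost all of the module. A low value, like the 0.37 for GenICode.scala, tells us that the knowledge is spread over several developers.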

Project the Main Developers onto a Map

Back in Chapter 3, Creating an Offender Profile, we used lines of code as a proxy for complexity as we hunted hotspots. It makes sense to use the same metric and visualization here, since it allows you to compare hotspots against the knowledge map.

We collect the information with cloc:

 
prompt> cloc ./ --unix --by-file --csv --quiet --report-file=scala_lines.csv

Now we have the elements we need. We have the structure in scala_lines.csv and the presumed knowledge owners in scala_main_dev.csv. Let’s combine those with a unique color for each individual developer.
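If you want to inspect the structure file yourself, a few lines of Python are enough. This sketch assumes that the per-file CSV report from cloc has a header row with the columns language, filename, and code, and that it ends with a summary row labeled SUM. Check the first and last lines of your own scala_lines.csv, since the layout may differ between cloc versions. The function name read_structure is hypothetical, and the sample rows are invented.

```python
import csv
import io


def read_structure(csv_file):
    """Map each file name to its lines of code, skipping the summary row."""
    sizes = {}
    for row in csv.DictReader(csv_file):
        name = row.get("filename")
        if not name or row.get("language") == "SUM":
            continue  # summary row or malformed line
        sizes[name] = int(row["code"])
    return sizes


# Invented sample in the assumed layout.
sample = io.StringIO(
    "language,filename,blank,comment,code\n"
    "Scala,./src/backend/GenICode.scala,210,340,1500\n"
    "Scala,./src/backend/ICodes.scala,20,35,120\n"
    "SUM,,230,375,1620\n"
)

sizes = read_structure(sample)
for name, lines in sorted(sizes.items(), key=lambda item: -item[1]):
    print(lines, name)
```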

Specify the Color of Each Developer

We humans can distinguish between hundreds of thousands of different color variations. In a visualization, however, you want the colors to be clearly distinct from one another. Several tools can help you select good color schemes. (See, for example, ColorBrewer.[35])

The colors we use in Figure 1, Knowledge map showing the main developer (indicated by color) of each module, are specified as HTML5 color names.[36] Take a look in the visualization samples we downloaded from the Code Maat distribution site.[37] Inside that bundle, there’s a scala folder with a scala_author_colors.csv. That file specifies the mapping from author to color for the top contributors. Here’s a sample:

 
author,color
Martin Odersky,darkred
Adriaan Moors,orange
Paul Phillips,green
...
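Because the file covers only the top contributors, anything that reads it needs a fallback for everyone else. Here’s a small Python sketch of such a lookup. The function names and the choice of black as the default are our assumptions, not something the Code Maat scripts are known to do.

```python
import csv
import io


def load_author_colors(csv_file):
    """Return a dict that maps author name to color name."""
    return {row["author"]: row["color"] for row in csv.DictReader(csv_file)}


def color_of(author, colors, default="black"):
    """Look up an author's color; unlisted authors get the default (an assumption)."""
    return colors.get(author, default)


# The three rows shown in the sample above.
sample = io.StringIO(
    "author,color\n"
    "Martin Odersky,darkred\n"
    "Adriaan Moors,orange\n"
    "Paul Phillips,green\n"
)

colors = load_author_colors(sample)
print(color_of("Paul Phillips", colors))
print(color_of("Jason Zaugg", colors))
```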

Now that we’ve assigned a color to each developer, we can put it all together.

Generate Your Own Map

As we discussed in Visualize Hotspots, the enclosure diagram is built on D3.js.[38] Since D3.js is data-driven, we need to serve it a JSON document that specifies the content to visualize.

That JSON document is generated by a Python script included on the Code Maat distribution page. Before you run it, just remember to specify your local path to the Python scripts:

 
prompt> python scripts/csv_main_dev_as_knowledge_json.py \
  --structure scala_lines.csv --owners scala_main_dev.csv \
  --authors scala_author_colors.csv > scala_knowledge_131231.json

Now you should have a scala_knowledge_131231.json file in your local directory. The JSON inside that file should be identical to the one that was used to create Figure 1, Knowledge map showing the main developer (indicated by color) of each module.
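If you’re curious about what such a script has to do, here’s a rough Python sketch of the idea: turn the flat file list into a nested tree that follows the directory structure, and tag each leaf with its size, main developer, and color. This is our own simplified illustration, not the script from the Code Maat distribution. The function name build_tree and the JSON field names (name, children, size, author, color) are assumptions, so the output won’t necessarily match what scala_knowledge.html expects. The paths and numbers are invented too.

```python
import json


def build_tree(sizes, owners, colors, default_color="black"):
    """Nest files by directory and attach size, main developer, and color.

    sizes:  file path -> lines of code (paths may start with "./")
    owners: file path without the leading "./" -> main developer
    colors: author name -> color name
    """
    root = {"name": "root", "children": []}
    for path, lines in sorted(sizes.items()):
        parts = [part for part in path.split("/") if part not in ("", ".")]
        if not parts:
            continue
        node = root
        for part in parts[:-1]:
            # Reuse the directory node if we've already created it.
            child = next((c for c in node["children"]
                          if c["name"] == part and "children" in c), None)
            if child is None:
                child = {"name": part, "children": []}
                node["children"].append(child)
            node = child
        author = owners.get("/".join(parts))
        node["children"].append({
            "name": parts[-1],
            "size": lines,
            "author": author,
            "color": colors.get(author, default_color),
        })
    return root


# Invented sample input.
sizes = {
    "./src/backend/GenICode.scala": 1500,
    "./src/backend/ICodes.scala": 120,
}
owners = {
    "src/backend/GenICode.scala": "Paul Phillips",
    "src/backend/ICodes.scala": "Grzegorz Kossakowski",
}
colors = {"Paul Phillips": "green"}

tree = build_tree(sizes, owners, colors)
print(json.dumps(tree, indent=2))
```

The nested name/children shape is the kind of hierarchy that D3.js layouts typically consume, which is why the flat CSV files need this restructuring step at all.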

Once you have the JSON document, you can reuse the D3.js code that’s included in the Code Maat sample visualization of Scala. Just open the scala_knowledge.html file and replace the reference to the included scala_knowledge_131231.json with a reference to your own file. Explore, experiment, and automate from there.