Your Code as a Crime Scene
by Adam Tornhill
Published by Pragmatic Bookshelf, 2015

Analyze a Large Codebase

When you start with a new project, how do you know which parts need extra attention? That kind of expertise takes time to build. You need to read a lot of code, talk to more experienced developers, and start small with your own changes. There’s no way around this.

At the same time, it’s important that you get a quick overview of where potential problems may be hiding. Those problems will influence how you approach design. If you have to add a feature in the middle of the worst spot, you want to know about it so that you can plan countermeasures, such as writing extra tests and setting aside time to refactor the code. You may decide to come up with a different design altogether.

A hotspot analysis gives you an overview of the good as well as the fragile areas of the codebase. The best part is that you get all that information faster than a CSI agent can hack together a Visual Basic GUI to track an IP address in real time.

As an example of a hotspot analysis in a large-scale system, let’s investigate Hibernate[13], a popular open-source Java library for object-relational mapping. We’re using Hibernate because it’s well known, has a rich history, and is under active development. If you’ve worked with a database in the Java ecosystem, chances are you’ve come across Hibernate.

Clone the Hibernate Repository

To get started, let’s clone Hibernate’s Git repository to your computer:

 
prompt> git clone https://github.com/hibernate/hibernate-orm.git
Cloning into 'hibernate-orm'...
...
Receiving objects: 100% (210129/210129), 127.83 MiB | 1.99 MiB/s, done.
Resolving deltas: 100% (118283/118283), done.
Checking connectivity... done.

Because Hibernate is under active development, the code has probably changed since I wrote this book. Let’s roll back the code, as we learned in Turn Back Time, so that we all start from the same state:

 
prompt> git checkout `git rev-list -n 1 --before="2013-09-05" master`
Note: checking out '46c962e9b04a883e03137962b0bdb71fdcfa0c4e'.
...
HEAD is now at 46c962e... HHH-8468 cleanup and simplification

Now the Hibernate code on your computer looks as it did back in September of 2013. Let’s generate a log, as we did in Automated Mining with Code Maat.

Generate a Version-Control Log

We are going to limit our analysis to code changes made in roughly the last year and a half before our checkout date, from January 2012 to early September 2013. Here’s how you specify that:

 
prompt> git log --pretty=format:'[%h] %an %ad %s' --date=short \
          --numstat --before=2013-09-05 --after=2012-01-01 > hib_evo.log

This generates a detailed hib_evo.log we can use with Code Maat. Let’s explore the generated data:

 
prompt> maat -l hib_evo.log -c git -a summary
statistic,value
number-of-commits,1346
number-of-entities,10193
number-of-entities-changed,18258
number-of-authors,89

As you can see, there’s been plenty of development activity over the last year and a half. Remember how we said at the start of this section that finding hotspots makes it easier to get started with a new project? This is a good example: you’re starting out with Hibernate and are faced with 400,000 lines of unfamiliar code. Talking to the 89 different developers who’ve contributed to the project over the past year and a half is impractical (particularly since some of them may have left the project).
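
To make the summary figures above less of a black box, here is a minimal Python sketch that derives the same four statistics from a log in the format we just generated. It is not Code Maat’s implementation: the field meanings (for instance, that number-of-entities-changed counts every individual file modification) are my reading of the output, and the names parse_log, summarize, and the sample log entries are invented for illustration.

```python
import re
from typing import List, NamedTuple

# Header line produced by --pretty=format:'[%h] %an %ad %s' --date=short
HEADER = re.compile(r"^\[([0-9a-f]+)\] (.+?) (\d{4}-\d{2}-\d{2})(?: (.*))?$")
# --numstat line: added<TAB>deleted<TAB>path ("-" is used for binary files)
NUMSTAT = re.compile(r"^(\d+|-)\t(\d+|-)\t(.+)$")


class Change(NamedTuple):
    rev: str
    author: str
    date: str
    path: str


def parse_log(text: str) -> List[Change]:
    """Return one Change per modified file per commit."""
    changes = []
    rev = author = date = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            rev, author, date = header.group(1, 2, 3)
            continue
        numstat = NUMSTAT.match(line)
        if numstat and rev is not None:
            changes.append(Change(rev, author, date, numstat.group(3)))
    return changes


def summarize(changes: List[Change]) -> dict:
    """Assumed meaning of the four summary statistics."""
    return {
        "number-of-commits": len({c.rev for c in changes}),
        "number-of-entities": len({c.path for c in changes}),
        "number-of-entities-changed": len(changes),
        "number-of-authors": len({c.author for c in changes}),
    }


# Invented sample data in the same layout as hib_evo.log.
SAMPLE_LOG = (
    "[aaa1111] Ada Example 2013-08-29 First change\n"
    "1\t0\tsrc/A.java\n"
    "2\t2\tsrc/B.java\n"
    "\n"
    "[bbb2222] Bob Example 2013-08-30 Second change\n"
    "5\t1\tsrc/A.java\n"
)

if __name__ == "__main__":
    print(summarize(parse_log(SAMPLE_LOG)))
```

To try it on the real data, pass the contents of hib_evo.log to parse_log instead of SAMPLE_LOG. Commits that touch no files, such as some merges, produce no numstat lines and are not counted in this sketch.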

Follow along, and you’ll see how a hotspot analysis can guide you through unfamiliar code territory.

Choose a Timespan for Your Analyses

First of all, it’s important to limit the data you are analyzing to a shorter time period than the project’s total lifetime. If you include too much historical data in the analysis, you skew the results and obscure important recent trends. You also risk flagging hotspots that no longer exist.

One approach is to include time in your analysis by weighting individual measures by their relative age. The challenge with that route is deciding how to set up the weighting algorithm. We go with an alternative approach in this book, which is to limit the period of time we look at. It’s a more general approach, but it requires you to be familiar with the development history.
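
For comparison, the age-weighting route could look something like the following sketch. It assumes an exponential decay in which a change loses half its weight every half_life_days. That decay function, the default of 180 days, and the file names are assumptions made for illustration, not something this book or Code Maat prescribes.

```python
from collections import defaultdict
from datetime import date


def age_weighted_revisions(changes, today, half_life_days=180):
    """Sum a decaying weight per file instead of counting raw revisions.

    changes: iterable of (change_date, path) pairs.
    A change made today counts as 1.0; one made half_life_days ago as 0.5.
    """
    scores = defaultdict(float)
    for changed_on, path in changes:
        age_days = max((today - changed_on).days, 0)  # ignore future dates
        scores[path] += 0.5 ** (age_days / half_life_days)
    return dict(scores)


# Invented example: three old changes versus one recent change.
CHANGES = [
    (date(2012, 9, 10), "Old.java"),
    (date(2012, 9, 10), "Old.java"),
    (date(2012, 9, 10), "Old.java"),
    (date(2013, 9, 5), "New.java"),
]

if __name__ == "__main__":
    print(age_weighted_revisions(CHANGES, today=date(2013, 9, 5)))
```

With raw revision counts, Old.java would rank above New.java. With the decay applied, the single recent change outweighs the three old ones. The hard part, as noted above, is picking a decay that matches how your project actually evolves.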

To select an appropriate analysis period, you have to know how you work on the project. You have to know the methodology you’re using and the length of your release cycles. The period also depends on the questions you want answered. On my projects I choose the following timeframes:

  • Between releases: Compare hotspots between releases to evaluate your long-term improvements.

  • Over iterations: If you work iteratively, measure between each iteration. This lets you spot code that starts to grow into hotspots early.

  • Around significant events: Define the temporal period around significant events, such as reorganizations of code or personnel. When you make large redesigns or change the way you work, it will reflect in the code. With this analysis method, you have a way to investigate both impact and outcome.
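
The between-releases and over-iterations comparisons above can be sketched as follows: count revisions per file in two periods and look at the difference. The period boundaries, file names, and function names are hypothetical. In practice the (date, path) pairs would come from your version-control log.

```python
from collections import Counter
from datetime import date


def revisions_in_period(changes, start, end):
    """Count revisions per file for changes with start <= date < end."""
    return Counter(
        path for changed_on, path in changes if start <= changed_on < end
    )


def compare_periods(changes, first, second):
    """Return the change in revision count per file from first to second.

    first and second are (start, end) date pairs. A positive value means
    the file was modified more often in the second period.
    """
    before = revisions_in_period(changes, *first)
    after = revisions_in_period(changes, *second)
    return {
        path: after[path] - before[path]
        for path in set(before) | set(after)
    }


# Invented change history spanning two hypothetical release periods.
CHANGES = [
    (date(2013, 1, 15), "src/A.java"),
    (date(2013, 2, 3), "src/A.java"),
    (date(2013, 2, 20), "src/B.java"),
    (date(2013, 4, 10), "src/B.java"),
    (date(2013, 5, 2), "src/B.java"),
    (date(2013, 6, 18), "src/B.java"),
]
RELEASE_1 = (date(2013, 1, 1), date(2013, 4, 1))
RELEASE_2 = (date(2013, 4, 1), date(2013, 7, 1))

if __name__ == "__main__":
    print(compare_periods(CHANGES, RELEASE_1, RELEASE_2))
```

In this made-up history, src/B.java is heating up while src/A.java has cooled down, which is the kind of trend a per-period comparison is meant to surface.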

Start with a Long Period

As you start with your first analysis, go with a longer period, such as one or two years of historic data. That lets you explore the system and spot long-term trends. On projects with high development activity, select a shorter initial period, perhaps as little as one month.
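
If you expect to vary the analysis period, it can help to generate the log command from an end date and a period length. The sketch below only builds the argument list, using the same git log options as earlier in this chapter. The function name and the days_back parameter are assumptions for illustration.

```python
from datetime import date, timedelta


def git_log_command(end, days_back):
    """Build git log arguments covering days_back days up to end."""
    start = end - timedelta(days=days_back)
    return [
        "git", "log",
        "--pretty=format:[%h] %an %ad %s",
        "--date=short",
        "--numstat",
        f"--before={end.isoformat()}",
        f"--after={start.isoformat()}",
    ]


if __name__ == "__main__":
    # One year of history for a first look at the system.
    print(" ".join(git_log_command(date(2013, 9, 5), days_back=365)))
    # One month for a project with high development activity.
    print(" ".join(git_log_command(date(2013, 9, 5), days_back=30)))
```

To produce the log file, you could pass the list to subprocess.run with stdout pointed at an open file such as hib_evo.log, run from inside the cloned repository.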