Version-control data is our software-development equivalent to spatial movement in geographical profiling. A version-control system—such as Subversion, Git, or Mercurial—records the steps each developer took. I focus on Git in my examples, but you can find an overview of how to mine data from other systems in the Code Maat documentation.[8]
The first codebase we’ll study is Code Maat. That’s right—we’ll use our analysis tool to analyze the tool itself. I’m a Lisp programmer—we love circular stuff like that.
To follow along with the examples, you need to clone the Code Maat repository[9] so that you have the complete source tree on your computer:
| | prompt> git clone https://github.com/adamtornhill/code-maat.git |
| | Cloning into 'code-maat'... |
| | remote: Reusing existing pack: 1092, done. |
| | remote: Total 1092 (delta 0), reused 0 (delta 0) |
| | Receiving objects: 100% (1092/1092), 365.75 KiB | 271.00 KiB/s, done. |
| | Resolving deltas: 100% (537/537), done. |
| | Checking connectivity... done. |
| | prompt> |
Once the clone command completes, you’ll find a local code-maat directory with the source code. Move into that directory:
| | prompt> cd code-maat |
Code Maat is still under development, and some of the worst issues we spot here will probably be fixed by the time you read this. So we are going to pretend it’s still 2013. That’s fine—the digital world lets us easily travel back to less gracious times:
| | prompt> git checkout `git rev-list -n 1 --before="2013-11-01" master` |
| | ... |
| | HEAD is now at d804759... Documented tree map visualizations |
The git command is a bit tricky because it does two things: the inner git rev-list call looks up the last commit made on master before the specified date, and git checkout then checks out that revision. Other version-control systems provide similar rollback mechanisms.
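If the backtick form is hard to read, the two steps can be spelled out separately. The following is a minimal sketch, not part of Code Maat: the function name checkout_before and its two parameters are made up for illustration, and it assumes you run it from inside the cloned repository.

```shell
# Hypothetical helper: check out the last commit on a branch made before a cutoff date.
checkout_before() {
  branch=$1
  cutoff=$2
  # Step 1: resolve the cutoff date to a commit hash
  # (the newest commit on $branch that is older than $cutoff).
  rev=$(git rev-list -n 1 --before="$cutoff" "$branch") || return 1
  if [ -z "$rev" ]; then
    echo "no commit on $branch before $cutoff" >&2
    return 1
  fi
  # Step 2: check out that commit; this leaves HEAD detached.
  git checkout "$rev"
}

# Usage, from inside the cloned repository:
# checkout_before master 2013-11-01
```

Storing the hash in a variable first also makes it easy to notice when no commit matches the date, a case the one-line version does not report clearly.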
Your local copy of Code Maat should now look the way the code did back in 2013.
Git lets us inspect historical commits with its log command. To get the level of detail we need, we use the --numstat flag:
| | prompt> git log --numstat |
This command will output a detailed log, as shown in the figure.

The sample output contains a lot of valuable information. In the following chapters, we’ll get several opportunities to inspect it in depth. For now, we’ll limit the analysis to the changed modules. We see that the oldest commit involved changes to six files, while the next one only modified churn.clj.
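Counting the changed files by eye works for a handful of commits, but a small script scales better. The sketch below is one possible way to do it and is not how Code Maat itself processes the log. The helper name files_per_commit is hypothetical, and it assumes the log was produced with the custom format 'commit %h' so that each commit starts with a single, easily recognized line.

```shell
# Hypothetical helper: read `git log --numstat --format='commit %h'` output
# on stdin and print "<hash> <number of files touched>" for each commit.
files_per_commit() {
  awk -F'\t' '
    /^commit / {
      # A new commit starts: report the previous one, then reset the counter.
      if (hash != "") print hash, n
      hash = substr($0, 8); n = 0
      next
    }
    # numstat lines have three tab-separated fields: added, deleted, path.
    NF == 3 { n++ }
    END { if (hash != "") print hash, n }
  '
}

# Usage, from inside the repository:
# git log --numstat --format='commit %h' | files_per_commit
```

Splitting on tabs rather than whitespace keeps file paths that contain spaces intact. Binary files, which numstat reports with dashes instead of line counts, are still counted as changed files.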
Verify Your Intuitions
Human intuition is wonderful for making quick decisions. The quality of those decisions, however, is not always wonderful. Expert intuition can lead to high-quality decisions. The problem is that we don’t know up front whether a given intuition is one of the expert ones. Intuition is an automatic, unconscious process, and like all automatic mental processes, it’s sensitive to cognitive and social biases. Factors in your surroundings or in the specific situation can influence your judgment, and most of the time you aren’t aware of that influence. (You’ll see some examples in Chapter 11, Norms, Groups, and False Serial Killers.) For example, we can be notoriously bad at evaluating past decisions because of hindsight bias. That’s why it’s important to verify your intuitive ideas with supporting data. From a practical perspective, we need guiding techniques like the ones presented here because intuition doesn’t scale, especially not in complex areas such as a large-scale software project that is constantly changing.