The Scala codebase is fairly large, with approximately 300,000 lines of code. The project flourishes with developer activity; over the past two years, more than 150 developers have contributed to the project. That scale of development makes it virtually impossible for any single developer to keep it all in his or her head. So, let’s put our map together to guide us.
Start by cloning Scala’s git repository:
| | prompt> git clone https://github.com/scala/scala.git |
| | Cloning into 'scala'... |
| | ... |
To get reproducible results, we need to go back in time to when this book was written. We do that with the piece of git magic we learned in Turn Back Time. But we need to be careful. Because Scala uses different branches, we need to know which branch we're on before we travel in time. We check that with git status:
| | prompt> git status |
| | On branch 2.11.x |
| | Your branch is up-to-date with 'origin/2.11.x'. |
You use the name of the branch—in this case, origin/2.11.x—as the final argument to the command that rolls back the codebase:
| | prompt> git checkout `git rev-list -n 1 --before="2013-12-31" origin/2.11.x` |
| | ... |
| | HEAD is now at 969a269.. |
Now your local Scala repository should look just like it did at the end of 2013, and you’re ready to analyze.
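If you'd rather script the rollback than type it, here's a minimal Python sketch of the same two steps: ask git rev-list for the last commit before a date, then check it out. The function names rev_list_args and checkout_before are made up for this illustration; only the git commands themselves come from the text above.

```python
import subprocess


def rev_list_args(branch, before):
    """Build the git call that finds the last commit on `branch` before a date."""
    return ["git", "rev-list", "-n", "1", f"--before={before}", branch]


def checkout_before(repo_dir, branch, before):
    """Check out the last commit on `branch` made before `before` (YYYY-MM-DD).

    Hypothetical helper: it mirrors the shell one-liner, nothing more.
    """
    commit = subprocess.run(
        rev_list_args(branch, before),
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout.strip()
    if not commit:
        # git rev-list prints nothing when no commit is old enough.
        raise ValueError(f"no commit on {branch} before {before}")
    subprocess.run(["git", "checkout", commit], cwd=repo_dir, check=True)
    return commit
```

Calling checkout_before("scala", "origin/2.11.x", "2013-12-31") from the directory that holds your clone should leave you on the same commit as the shell command.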
Because we want to use the information to find the right people to talk to, we have to identify the developers who know the different parts of the system. This is a similar problem to the one we solved back in Evaluate Communication Costs, where we used a main developer analysis to identify the top contributor to each module.
We start by generating a version-control log:
| | prompt> git log --pretty=format:'[%h] %an %ad %s' --date=short \ |
| | --numstat --before=2013-12-31 --after=2011-12-31 > scala_evo.log |
The --after and --before options limit our analysis period to the two years leading up to the end of 2013. That’s because knowledge is fragile and dissolves over time; if we haven’t touched a piece of code for a couple of years, it’s unlikely that we remember much about it. In addition, code that’s been stable for that long is rarely the focus of our daily activities. But remember that these time periods are all heuristics that you may have to fine-tune in your own projects.
Now that we have a version-control log, let’s see what happens when we aggregate all that information to identify the main developers:
| | prompt> maat -c git -l scala_evo.log -a main-dev > scala_main_dev.csv |
The command saves the analysis results to the file scala_main_dev.csv for further processing. If you look inside it, you should see the main developer of each module:
| | entity,main-dev,added,total-added,ownership |
| | .. |
| | GenICode.scala,Paul Phillips,584,1579,0.37 |
| | ICodeCheckers.scala,Jason Zaugg,19,44,0.43 |
| | ICodes.scala,Grzegorz Kossakowski,16,32,0.5 |
| | ... |
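To make the columns concrete, here's a rough Python sketch of how a main developer could be derived from the log we just generated. It is not Code Maat's implementation. It assumes the main developer is the author with the most added lines in a module and that ownership is added divided by total-added, which agrees with the sample rows (584/1579 rounds to 0.37). The function name main_developers and the two regular expressions are our own.

```python
import re
from collections import defaultdict

# Matches the header produced by --pretty=format:'[%h] %an %ad %s' --date=short
COMMIT = re.compile(r"^\[([0-9a-f]+)\] (.+?) (\d{4}-\d{2}-\d{2}) (.*)$")
# Matches a --numstat row: added<TAB>deleted<TAB>path ("-" marks binary files)
NUMSTAT = re.compile(r"^(\d+|-)\t(\d+|-)\t(.+)$")


def main_developers(log_lines):
    """Return {entity: (main_dev, added, total_added, ownership)}."""
    added = defaultdict(lambda: defaultdict(int))  # entity -> author -> lines added
    author = None
    for line in log_lines:
        line = line.rstrip("\n")
        commit = COMMIT.match(line)
        if commit:
            author = commit.group(2)
            continue
        stat = NUMSTAT.match(line)
        if stat and author is not None and stat.group(1) != "-":
            added[stat.group(3)][author] += int(stat.group(1))

    result = {}
    for entity, per_author in added.items():
        total = sum(per_author.values())
        if total == 0:
            continue  # only deletions were recorded; no ownership to assign
        # On a tie, the author seen first in the log wins (an arbitrary choice).
        dev, lines = max(per_author.items(), key=lambda kv: kv[1])
        result[entity] = (dev, lines, total, round(lines / total, 2))
    return result
```

You could feed it the file directly, for example main_developers(open("scala_evo.log", encoding="utf-8")), and compare a few entries against scala_main_dev.csv.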
The main developer information serves as the basis of our knowledge map. We just need to project the information on the geography of our system. Let’s do it.
Back in Chapter 3, Creating an Offender Profile, we used lines of code as a proxy for complexity as we hunted hotspots. It makes sense to use the same metric and visualization here, since it allows you to compare hotspots against the knowledge map.
We collect the information with cloc:
| | prompt> cloc ./ --unix --by-file --csv --quiet --report-file=scala_lines.csv |
Now we have the elements we need. We have the structure in scala_lines.csv and the presumed knowledge owners in scala_main_dev.csv. Let’s combine those with a unique color for each individual developer.
We humans can distinguish between hundreds of thousands of different color variations. However, in a visualization, you want colors that are easy to tell apart at a glance, so pick a small set with clear contrast between them. Several tools can help you select good color schemes. (See, for example, ColorBrewer.[35])
The colors we use in Figure 1, Knowledge map showing the main developer (indicated by color) of each module, are specified as HTML5 color names.[36] Take a look in the visualization samples we downloaded from the Code Maat distribution site.[37] Inside that bundle, there’s a scala folder with a scala_author_colors.csv. That file specifies the mapping from author to color for the top contributors. Here’s a sample:
| | author,color |
| | Martin Odersky,darkred |
| | Adriaan Moors,orange |
| | Paul Phillips,green |
| | ... |
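For your own codebase, you'll need to create a similar mapping yourself. Here's one possible sketch in Python. It assumes that "top contributor" means the author who is main developer of the most modules, which is our guess rather than something the sample file documents. The function name assign_colors is hypothetical too.

```python
from collections import Counter


def assign_colors(main_devs, palette):
    """Give each palette color to one author, most-owning author first.

    main_devs: one main-developer name per module (e.g. the main-dev column).
    palette:   color names in the order you want them handed out.
    Authors beyond the end of the palette get no entry.
    """
    counts = Counter(main_devs)
    # Rank by number of owned modules; break ties alphabetically for stable output.
    ranked = sorted(counts, key=lambda author: (-counts[author], author))
    return dict(zip(ranked, palette))
```

Keeping the palette short is deliberate: everyone outside it can share a neutral color, which keeps the map readable.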
Now that we’ve assigned a color to each developer, we can put it all together.
As we discussed in Visualize Hotspots, the enclosure diagram is built on D3.js.[38] Since D3.js is data-driven, we need to serve it a JSON document that specifies the content to visualize.
That JSON document is generated by a Python script included on the Code Maat distribution page. Before you run it, just remember to specify your local path to the Python scripts:
| | prompt> python scripts/csv_main_dev_as_knowledge_json.py \ |
| | --structure scala_lines.csv --owners scala_main_dev.csv \ |
| | --authors scala_author_colors.csv > scala_knowledge_131231.json |
Now you should have a scala_knowledge_131231.json file in your local directory. The JSON inside that file should be identical to the one that was used to create Figure 1, Knowledge map showing the main developer (indicated by color) of each module.
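If you're curious about what such a transformation involves, here's a minimal Python sketch that merges the three CSV files into a nested structure. It is not the script from the Code Maat distribution, and the JSON field names (name, children, size, author, color) are assumptions picked for illustration; check the sample JSON for the real ones. It also assumes that cloc's by-file CSV has filename and code columns and prefixes paths with ./.

```python
import csv
import io
import json


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def knowledge_tree(structure_rows, owner_rows, color_rows, default_color="lightgray"):
    """Nest files by directory; each leaf carries its size, owner, and color."""
    colors = {row["author"]: row["color"] for row in color_rows}
    owners = {row["entity"]: row["main-dev"] for row in owner_rows}
    root = {"name": "root", "children": []}
    for row in structure_rows:
        path = row.get("filename") or ""
        if not path:
            continue  # skip summary rows that carry no file name
        if path.startswith("./"):
            path = path[2:]  # align cloc's paths with the paths in the git log
        parts = path.split("/")
        node = root
        for part in parts[:-1]:
            child = next(
                (c for c in node["children"]
                 if c["name"] == part and "children" in c),
                None,
            )
            if child is None:
                child = {"name": part, "children": []}
                node["children"].append(child)
            node = child
        author = owners.get(path)
        node["children"].append({
            "name": parts[-1],
            "size": int(row["code"]),
            "author": author,
            "color": colors.get(author, default_color),
        })
    return root
```

Passing the result to json.dumps gives you a document you can experiment with. Files without a known main developer fall back to default_color.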
Once you have the JSON document, you can reuse the D3.js code that’s included in the Code Maat sample visualization of Scala. Just open the scala_knowledge.html file and replace the reference to the bundled scala_knowledge_131231.json with a reference to your own file. Explore, experiment, and automate from there.