This chapter focuses on techniques for creating data visualizations with canvas drawing, sometimes paired with SVG, an approach typically used for large amounts of data. Because it would be impractical to include several large datasets, we’ll also touch on how to generate large amounts of sample data to test your code with. You’ll use several layouts that you saw earlier, such as the force-directed network layout from chapter 6 and the geospatial map from chapter 7, as well as the brush component from chapter 9, except this time you’ll use the brush component to select regions across both the x- and y-axes.
This chapter touches on an exotic piece of functionality in D3: the quadtree (shown in figure 11.1). The quadtree is an advanced technique we’ll use to improve interactivity and performance. We’ll also look into the specifics of how to use canvas in tandem with SVG to get high performance and maintain the interactivity that SVG is so useful for.

We’ve worked with data throughout this book, but this time we’ll appreciably up the ante by trying to represent a thousand or more datapoints using maps, networks, and charts, which are significantly more resource-intensive than a circle pack chart, bar chart, or spreadsheet.
Fortunately, D3v4 introduced built-in functionality for drawing complex shapes with canvas. For this chapter, we’ll need to include a <canvas> element in our DOM, as shown in the following listing.
<!doctype html>
<html>
<head>
<title>Big Data Visualization</title>
<meta charset="utf-8" />
<link type="text/css" rel="stylesheet" href="bigdata.css" />
</head>
<body>
<div>
<canvas height="500" width="500"></canvas>
<div id="viz">
<svg></svg>
</div>
</div>
<footer>
<script src="d3.v4.min.js"></script>
</footer>
</body>
</html>
In the following listing we see how to make our <canvas> element line up with our <svg> element so that we can use canvas drawing as a background layer to any SVG elements we create.
body, html {
margin: 0;
}
canvas {
position: absolute;
width: 500px;
height: 500px;
}
svg {
position: absolute;
width:500px;
height:500px;
}
path.country {
fill: #C4B9AC;
stroke-width: 1;
stroke: #4F442B;
opacity: .5;
}
path.sample {
stroke: #41A368;
stroke-width: 1px;
fill: #93C464;
fill-opacity: .5;
}
line.link {
stroke-width: 1px;
stroke: #4F442B;
stroke-opacity: .5;
}
circle.node {
fill: #93C464;
stroke: #EBD8C1;
stroke-width: 1px;
}
circle.xy {
fill: #FCBC34;
stroke: #FE9922;
stroke-width: 1px;
}
Every path generator in d3-shape (lines, areas, arcs, symbols, and so on) can draw to canvas through its built-in .context() method. To draw on a canvas element, you first get a rendering context from it with getContext(), passing “2d”, “webgl”, “webgl2”, or “bitmaprenderer”. We’re only going to use “2d” in this chapter. Once you have that context, you can draw with commands similar to the drawing instructions in an SVG d attribute. If you set a d3-shape generator’s .context(), the generator no longer returns an SVG d attribute string; instead, it issues those commands directly to draw the shape on the canvas element. The following listing uses this functionality to draw the violin plots from chapter 5, this time with canvas drawing.
var fillScale = d3.scaleOrdinal().range(["#fcd88a", "#cf7c1c", "#93c464"])
var normal = d3.randomNormal()
var sampleData1 = d3.range(100).map(d => normal())
var sampleData2 = d3.range(100).map(d => normal())
var sampleData3 = d3.range(100).map(d => normal())
var data = [sampleData1, sampleData2, sampleData3]
var histoChart = d3.histogram();
histoChart
.domain([ -3, 3 ])
.thresholds([ -3, -2.5, -2, -1.5, -1,
-0.5, 0, 0.5, 1, 1.5, 2, 2.5, 3 ])
.value(d => d)
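var yScale = d3.scaleLinear().domain([ -3, 3 ]).range([ 400, 0 ])
var context = d3.select("canvas").node().getContext("2d") // 2D drawing context
var area = d3.area()
.x0(d => -d.length)
.x1(d => d.length)
.y(d => yScale(d.x0))
.curve(d3.curveCatmullRom)
.context(context) // draw to the canvas instead of returning a d string
context.setTransform(1, 0, 0, 1, 0, 0) // reset any earlier translations
context.clearRect(0,0,500,500) // canvas must be cleared manually
context.translate(0, 50)
data.forEach((d, i) => {
context.translate(100, 0) // each violin sits 100px right of the previous one
context.strokeStyle = fillScale(i)
context.fillStyle = d3.hsl(fillScale(i)).darker().toString()
context.lineWidth = 1 // canvas lineWidth is a number, not "1px"
context.beginPath() // start a new path for this violin
area(histoChart(d)) // the generator issues the drawing commands
context.stroke() // paint the outline
context.fill() // paint the interior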
})
The results, seen in figure 11.2, are similar to what we saw in chapter 5.

When we look at canvas rendering, there are a few clear differences from SVG. First, you need to manually perform some of the behavior you’ve grown accustomed to having D3 handle for you; for instance, you need to clear the canvas between renders if you’re doing any kind of transition or animation. Second, when you draw to canvas, there’s no object to attach mouse events to. There are still ways to detect mouse interaction on a bitmap, such as reading the color of the clicked pixel or translating the xy coordinate back to whatever shape occupies that space. The final difference, highlighted in figure 11.3, is that canvas rendering is pixelated compared to SVG rendering.
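One way to implement the pixel-color approach is often called color picking: draw every shape a second time on a hidden canvas, each in a unique color that encodes its index, then read the pixel under the mouse and decode it. The sketch below covers only the encoding and decoding; `indexToColor` and `pixelToIndex` are hypothetical helper names, not D3 API, and the browser-side calls appear only as comments because they need a real canvas.

```javascript
// Color-picking helpers (hypothetical names, not part of D3).
// Each shape's index is packed into a 24-bit RGB color for a hidden canvas.

function indexToColor(index) {
  // Offset by 1 so index 0 is not black, which we treat as "nothing here".
  const id = index + 1;
  const r = (id >> 16) & 255;
  const g = (id >> 8) & 255;
  const b = id & 255;
  return `rgb(${r},${g},${b})`;
}

function pixelToIndex(pixel) {
  // pixel is [r, g, b, a], e.g. hiddenCtx.getImageData(x, y, 1, 1).data
  const [r, g, b, a] = pixel;
  if (a === 0) return -1; // transparent: no shape was drawn at this pixel
  return ((r << 16) | (g << 8) | b) - 1;
}

// Browser usage (not executed here; assumes a hidden <canvas> the same size
// as the visible one and a hypothetical drawShape(ctx, d) helper):
//   hiddenCtx.fillStyle = indexToColor(i); drawShape(hiddenCtx, data[i]);
//   const pixel = hiddenCtx.getImageData(mouseX, mouseY, 1, 1).data;
//   const hit = pixelToIndex(pixel); // -1 means no shape under the mouse
```

Antialiasing blends colors along shape edges, so pixels right on an edge can decode to the wrong index; this sketch does not try to handle that.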

You can use this method to render any of your existing code that uses D3 generators from d3-shape, such as d3.arc for canvas pie charts or d3.area for canvas streamgraphs. From this point on, we’re going to focus on particular applications of canvas rendering, combining it with SVG rendering (known as mixed mode rendering) for interactivity, and using quadtrees to improve performance for large datasets.
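The idea behind `.context()` can be sketched without D3: a generator emits the same moveTo/lineTo/closePath commands either to a string recorder, producing an SVG `d` attribute, or directly to a canvas context. The `polygon` generator and both classes below are illustrative stand-ins written for this sketch, not D3 code, although d3-shape generators follow a broadly similar pattern (serializing through d3-path when no context is set).

```javascript
// Records drawing commands as an SVG path string (modeled loosely on d3-path).
class PathString {
  constructor() {
    this.parts = [];
  }
  moveTo(x, y) {
    this.parts.push(`M${x},${y}`);
  }
  lineTo(x, y) {
    this.parts.push(`L${x},${y}`);
  }
  closePath() {
    this.parts.push("Z");
  }
  toString() {
    return this.parts.join("");
  }
}

// Stand-in for a CanvasRenderingContext2D that only logs the calls it receives.
class LoggingContext {
  constructor() {
    this.calls = [];
  }
  moveTo(x, y) {
    this.calls.push(["moveTo", x, y]);
  }
  lineTo(x, y) {
    this.calls.push(["lineTo", x, y]);
  }
  closePath() {
    this.calls.push(["closePath"]);
  }
}

// Hypothetical polygon generator: without a context it returns a d string;
// with a context it sends the same commands there and returns undefined.
function polygon(points, context = null) {
  const target = context || new PathString();
  points.forEach(([x, y], i) => {
    if (i === 0) target.moveTo(x, y);
    else target.lineTo(x, y);
  });
  target.closePath();
  if (context) return undefined; // the shape went to the context
  return target.toString();
}

const triangle = [[0, 0], [10, 0], [10, 10]];
const d = polygon(triangle); // "M0,0L10,0L10,10Z", usable as an SVG d attribute
const ctx = new LoggingContext();
polygon(triangle, ctx); // ctx.calls: moveTo, lineTo, lineTo, closePath
```

With a real canvas, you would wrap the generator call in `beginPath()` and `stroke()`/`fill()`, exactly as the violin listing does.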
In chapter 7, you had only 10 cities representing the entire globe. That’s not typical: when you’re working with geodata, you’ll often work with large datasets describing many complex shapes. In this section we’ll see how to create a map with many features. To get there, we’ll first learn how to generate some random geographic features (in this case, simple triangles) and then learn how to render those features using canvas. Then we’ll wire that all up with a smart implementation of d3-zoom to ensure that our users get the best mix of performance and functionality.
The first thing we need is a dataset with a thousand datapoints. Rather than using data from a pregenerated file, we’ll invent it. One useful function available in D3 is d3.range(), which allows you to create an array of values. We’ll use d3.range() to create an array of a thousand values. We’ll then use that array to populate an array of objects with enough data to put on a network and on a map. Because we’re going to put this data on a map, we need to make sure it’s properly formatted GeoJSON, as in the following listing, which uses the randomCoords() function to create triangles.
var sampleData = d3.range(1000).map(d => { // one feature per value 0 to 999
var datapoint = {}; // build a GeoJSON Feature by hand
datapoint.id = "Sample Feature " + d;
datapoint.type = "Feature";
datapoint.properties = {};
datapoint.geometry = {};
datapoint.geometry.type = "Polygon";
datapoint.geometry.coordinates = randomCoords();
return datapoint;
});
function randomCoords() { // a small closed triangle at a random position
var randX = (Math.random() * 350) - 175;
var randY = (Math.random() * 170) - 85;
return [[[randX - 5,randY],[randX,randY - 5],
[randX - 10,randY - 5],[randX - 5,randY]]];
};
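Because `Math.random()` produces different triangles on every load, it can help to make the sample data reproducible while you debug. The sketch below assumes a small seeded generator (mulberry32, a widely used public-domain PRNG that is not part of D3) and builds the same triangle features with `Array.from` in place of `d3.range()`; `randomTriangleFeature` is a hypothetical name. The first and last positions of each ring are identical, which GeoJSON requires for a closed polygon.

```javascript
// mulberry32: a tiny seeded PRNG (not part of D3) returning floats in [0, 1).
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = seed;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper combining randomCoords() with the feature wrapper.
function randomTriangleFeature(index, random) {
  const x = random() * 350 - 175; // same ranges as randomCoords()
  const y = random() * 170 - 85;
  return {
    type: "Feature",
    id: `Sample Feature ${index}`,
    properties: {},
    geometry: {
      type: "Polygon",
      // One closed linear ring: the first and last positions are the same.
      coordinates: [[[x - 5, y], [x, y - 5], [x - 10, y - 5], [x - 5, y]]]
    }
  };
}

const random = mulberry32(42);
const seededSampleData = Array.from({ length: 1000 }, (_, i) => randomTriangleFeature(i, random));
```

Calling `mulberry32(42)` again yields the same sequence, so the triangles land in the same places on every reload.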
After we have this data, we can throw it on a map like the one we first created in chapter 7. In the following listing we use the world.geojson file from chapter 7 so that we have context for where the triangles are drawn.
d3.json("world.geojson", data => {createMap(data)});
function createMap(countries) {
var projection = d3.geoMercator()
.scale(100).translate([250,250])
var geoPath = d3.geoPath().projection(projection);
var g = d3.select("svg").append("g");
g.selectAll("path.country")
.data(countries.features)
.enter()
.append("path")
.attr("d", geoPath)
.attr("class", "country");
g.selectAll("path.sample")
.data(sampleData)
.enter()
.append("path")
.attr("d", geoPath)
.attr("class", "sample");
};
Although our random triangles will obviously be in different places, our code should still produce something that looks like figure 11.4.

By the time you read this book, big data will probably sound as dated as Pentium II, Rich Internet Application, or Buffy Cosplay. Big data and all the excitement surrounding big data resulted from the broad availability of large datasets that were previously too large to handle. Often, big data is associated with exotic data stores like Hadoop or specialized techniques like GPU supercomputing (along with overpriced consultants).
But what constitutes big is in the eye of the beholder. In the domain of data visualization, the representation of big data doesn’t typically mean placing thousands (or millions or trillions) of individual datapoints onscreen at once. Rather, it tends to mean demographic, topological, and other traditional statistical analysis of these massive datasets. Counterintuitively, big data visualization often takes the form of pie charts and bar charts. But when you look at traditional practice with presenting data interactively—natively—in the browser, the size of the datasets you’re dealing with in this chapter can be considered big.
A thousand datapoints isn’t many, even on a small map like this. And in any browser that supports SVG, the data should be able to render quickly and provide you with the kind of functionality, such as mouseover and click events, that you may want from your data display. But if you add zoom controls, like you see in listing 11.6 (the same zooming we had in chapter 7), you might notice that the performance of the zooming and panning of the map isn’t so great. If you expect your users to be on mobile, optimization is still a good idea.
var mapZoom = d3.zoom()
.on("zoom", zoomed);
var zoomSettings = d3.zoomIdentity
.translate(250, 250)
.scale(100);
d3.select("svg").call(mapZoom).call(mapZoom.transform, zoomSettings);
function zoomed() {
var e = d3.event
projection.translate([e.transform.x, e.transform.y])
.scale(e.transform.k); // zoom scale k becomes the projection scale
d3.selectAll("path.country, path.sample").attr("d", geoPath)
}
Now we can zoom into our map and pan around, as shown in figure 11.5. If you expect your users to be on browsers that handle SVG well, like Chrome or Safari, and you don’t expect to put more features on a map, you may not even need to worry about optimization.
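It helps to see what the zoom transform actually holds. The sketch below reimplements in plain JavaScript the arithmetic behind d3-zoom’s `translate` and `scale` methods (the `Transform` class is a stand-in, not D3’s object), then projects a point with spherical Mercator using `k` as the scale and `x`, `y` as the translate, which is what `zoomed()` passes to the projection. It also shows why the order of `d3.zoomIdentity.translate(250, 250).scale(100)` matters.

```javascript
// Minimal stand-in for d3-zoom's transform arithmetic (not D3's own class).
class Transform {
  constructor(k, x, y) {
    this.k = k;
    this.x = x;
    this.y = y;
  }
  // Translating moves the origin by (x, y) measured at the current scale.
  translate(x, y) {
    return new Transform(this.k, this.x + this.k * x, this.y + this.k * y);
  }
  // Scaling multiplies k and leaves the translation unchanged.
  scale(k) {
    return new Transform(this.k * k, this.x, this.y);
  }
}
const zoomIdentity = new Transform(1, 0, 0);

// Spherical Mercator driven by a transform, mirroring
// projection.translate([t.x, t.y]).scale(t.k). Input is [lon, lat] in degrees.
function mercator([lon, lat], t) {
  const lambda = (lon * Math.PI) / 180;
  const phi = (lat * Math.PI) / 180;
  const rawY = Math.log(Math.tan(Math.PI / 4 + phi / 2));
  // Screen y grows downward, so northern latitudes get smaller y values.
  return [t.x + t.k * lambda, t.y - t.k * rawY];
}

const settings = zoomIdentity.translate(250, 250).scale(100); // k=100, x=250, y=250
const reversed = zoomIdentity.scale(100).translate(250, 250); // k=100, x=25000, y=25000
```

With `settings`, the point at 0° longitude and 0° latitude lands exactly on the translate point (250, 250), the center of the 500-pixel SVG; reversing the calls multiplies the translation by the scale and pushes the map far off-screen.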

Depending on when you execute this code, it might be that 1,000 features like this render fine. Change your d3.range() setting from 1,000 to 5,000 (or 10,000 or a billion if you’ve found this in the Classics section of your Earth Empire lending library) to see that with enough SVG elements, your browser starts to choke. It’s less about rendering the complex shapes than it is about managing all those DOM elements.
One way to optimize the rendering of so many elements is to use canvas instead of SVG. Instead of creating SVG elements using D3’s enter syntax, we use the built-in functionality in d3.geoPath to provide a context for canvas drawing. In the following listing, you can see how to use that built-in functionality with your existing dataset.
function createMap(countries) {
var projection = d3.geoMercator().scale(50).translate([150,100]);
var geoPath = d3.geoPath().projection(projection);
var mapZoom = d3.zoom()
.on("zoom", zoomed)
var zoomSettings = d3.zoomIdentity
.translate(250, 250)
.scale(100)
d3.select("svg").call(mapZoom).call(mapZoom.transform, zoomSettings)
function zoomed() {
var e = d3.event
projection.translate([e.transform.x, e.transform.y])
.scale(e.transform.k)
var context = d3.select("canvas").node().getContext("2d")
context.clearRect(0,0,500,500) // canvas must be cleared before each redraw
geoPath.context(context) // geoPath now draws to canvas instead of returning d
context.strokeStyle = "rgba(79,68,43,.5)" // alpha in the color stands in for opacity
context.fillStyle = "rgba(196,185,172,.5)"
context.lineWidth = 1 // canvas lineWidth is a number, not "1px"
for (var x in countries.features) {
context.beginPath()
geoPath(countries.features[x]) // draw one country
context.stroke()
context.fill()
}
context.strokeStyle = "#41A368"
context.fillStyle = "rgba(147,196,100,.5)";
context.lineWidth = 1
for (var x in sampleData) {
context.beginPath()
geoPath(sampleData[x]) // draw one triangle
context.stroke()
context.fill()
}
}
}
You can see some key differences between listings 11.6 and 11.7. In contrast with SVG, where you can move individual elements around as well as redraw them, you always have to clear and redraw the canvas to update it. Although this seems like it would be slower, performance improves in all browsers, particularly those with weaker SVG performance, because the browser doesn’t need to manage hundreds or thousands of DOM elements. The graphical results, seen in figure 11.6, show that it’s hard to see any difference between SVG and canvas rendering.

The drawback with using canvas is that you can’t easily provide the level of interactivity you may want for your data visualization. Typically, you draw your interactive elements with SVG and your large datasets with canvas. If we assume that the countries we’re drawing aren’t going to provide any interactivity, but the triangles will, we can render the triangles as SVG and render the countries as canvas using the code in listing 11.8. Combining these two methods of drawing means we need to create a layer cake of elements in our DOM, like you see in figure 11.7.

This requires that we initialize two versions of d3.geoPath—one for drawing SVG and one for drawing canvas—and then we use both in our zoomed function. This is shown in listing 11.8.
function createMap(countries) {
var projection = d3.geoMercator().scale(50).translate([150,100]);
var geoPath = d3.geoPath().projection(projection);
var svgPath = d3.geoPath().projection(projection); // no context: returns d strings for SVG
d3.select("svg")
.selectAll("path.sample")
.data(sampleData)
.enter()
.append("path")
.attr("d", svgPath)
.attr("class", "sample")
.on("mouseover", function() {d3.select(this).style("fill", "#75739F")});
var mapZoom = d3.zoom()
.on("zoom", zoomed)
var zoomSettings = d3.zoomIdentity
.translate(250, 250)
.scale(100)
d3.select("svg").call(mapZoom).call(mapZoom.transform, zoomSettings)
function zoomed() {
var zoomEvent = d3.event
projection.translate([zoomEvent.transform.x, zoomEvent.transform.y])
.scale(zoomEvent.transform.k)
const featureOpacity = 0.5
var context = d3.select("canvas").node().getContext("2d");
context.clearRect(0,0,500,500);
geoPath.context(context);
context.strokeStyle = `rgba(79,68,43,${featureOpacity})`;
context.fillStyle = `rgba(196,185,172,${featureOpacity})`;
context.lineWidth = 1;
countries.features.forEach(feature => {
context.beginPath();
geoPath(feature); // draws to canvas because of geoPath.context(context)
context.stroke()
context.fill();
})
d3.selectAll("path.sample").attr("d", svgPath); // SVG triangles are re-pathed as usual
}
}
This allows us to maintain interactivity, such as the mouseover function that changes a triangle’s color to purple (#75739F) when it’s moused over. This approach maximizes performance by rendering any graphics that have no interactivity with canvas drawing instead of SVG. As shown in figure 11.8, the appearance produced by this method is virtually identical to that of canvas only or SVG only.

But what if you have massive numbers of elements, you want interactivity on all of them, and you also want to give the user the ability to pan and drag? In that case, you need an extension of this mixed mode rendering: render in canvas whenever users are interacting in a way that prevents them from interacting with individual elements. Here, that means rendering the triangles in canvas while the map is being zoomed and panned, and rendering them in SVG when the map is at rest and the user may mouse over individual elements.
We can manage this by taking advantage of the start and end events from d3.zoom. As you may guess, these fire when a zoom gesture begins and ends, respectively. The following listing shows how you’d initialize a zoom behavior with different functions for these different events.
...
mapZoom = d3.zoom()
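.on("zoom", zoomed) // fires continuously while zooming or panning
.on("start", zoomInitialized) // fires once when a gesture begins
.on("end", zoomFinished); // fires once when a gesture ends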
...
This allows us to restore our canvas drawing code for triangles to the zoomed function and move the SVG rendering code out of the zoomed function and into a new zoomFinished function. We also need to hide the SVG triangles when zooming or panning starts, by creating a zoomInitialized function that also calls the zoomed function (to draw the triangles we hid, but in canvas). Finally, our zoomFinished function also contains the canvas drawing code needed to draw only the countries. The different drawing strategies based on zoom events are shown in table 11.1.
| Zoom event | Countries rendered as | Triangles rendered as |
|---|---|---|
| zoomed | Canvas | Canvas |
| zoomInitialized | Canvas | Canvas (SVG hidden) |
| zoomFinished | Canvas | SVG |
As you can see in the following listing, this code is repetitive: the zoom handlers share drawing code that could be factored into separate functions. But I wanted to be explicit about what each handler does, because the interplay is a bit convoluted.
var canvasPath = d3.geoPath().projection(projection);
--- Other code ----
function zoomed() {
var e = d3.event
projection.translate([e.transform.x, e.transform.y])
.scale(e.transform.k)
var context = d3.select("canvas").node().getContext("2d");
context.clearRect(0,0,500,500);
canvasPath.context(context);
context.strokeStyle = "black";
context.fillStyle = "gray";
context.lineWidth = 1;
for (var x in countries.features) {
context.beginPath();
canvasPath(countries.features[x]);
context.stroke()
context.fill();
}
context.strokeStyle = "black";
context.fillStyle = "rgba(255,0,0,.2)";
context.lineWidth = 1;
for (var x in sampleData) {
context.beginPath(); // triangles are drawn on canvas while zooming
canvasPath(sampleData[x]);
context.stroke()
context.fill();
}
};
function zoomInitialized() {
d3.selectAll("path.sample")
.style("display", "none"); // hide the SVG triangles
zoomed(); // draw them on canvas instead
};
function zoomFinished() {
var context = d3.select("canvas").node().getContext("2d");
context.clearRect(0,0,500,500);
canvasPath.context(context)
context.strokeStyle = "black";
context.fillStyle = "gray";
context.lineWidth = 1;
for (var x in countries.features) {
context.beginPath();
canvasPath(countries.features[x]); // only countries go on canvas now
context.stroke()
context.fill();
}
d3.selectAll("path.sample")
.style("display", "block") // show the SVG triangles again
.attr("d", svgPath); // and move them to the final zoom position
};
As a result of this new code, we have a map that uses canvas rendering when users zoom and pan, but SVG rendering when the map is fixed in place and users can click, mouse over, or otherwise interact with the graphical elements. It’s the best of both worlds. The only drawback of this approach is that we have to invest more time making sure our <canvas> element and our <svg> element line up perfectly, and that our opacity, fill colors, and so on match closely enough that switching between modes isn’t jarring to the user. I haven’t done this in the previous code, so that you can see which mode is in operation at any given moment, and that’s reflected in the difference between the two graphical outputs in figure 11.9.
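The duplication in listing 11.10 can be factored out. The sketch below is one possible arrangement, not the book’s code: `drawFeatures` and `makeZoomHandlers` are hypothetical helpers, and the canvas context and the SVG triangle layer are replaced with small fakes so the logic can run outside a browser. Updating the projection from `d3.event.transform` is left out. Each handler simply combines the shared pieces according to table 11.1.

```javascript
const countryStyle = { stroke: "black", fill: "gray" };
const triangleStyle = { stroke: "black", fill: "rgba(255,0,0,.2)" };

// Shared drawing loop. pathFn draws one feature on ctx (canvasPath in the listing).
function drawFeatures(ctx, pathFn, features, style) {
  ctx.strokeStyle = style.stroke;
  ctx.fillStyle = style.fill;
  ctx.lineWidth = 1;
  features.forEach(feature => {
    ctx.beginPath();
    pathFn(feature);
    ctx.stroke();
    ctx.fill();
  });
}

// Builds the three zoom handlers around the shared code. svgLayer is a
// hypothetical { show(), hide() } wrapper around the path.sample selection.
function makeZoomHandlers({ ctx, pathFn, countries, triangles, svgLayer, width, height }) {
  function redraw(includeTriangles) {
    ctx.clearRect(0, 0, width, height);
    drawFeatures(ctx, pathFn, countries, countryStyle);
    if (includeTriangles) drawFeatures(ctx, pathFn, triangles, triangleStyle);
  }
  return {
    zoomed() { redraw(true); },                           // both layers on canvas
    zoomInitialized() { svgLayer.hide(); redraw(true); }, // hide SVG, triangles on canvas
    zoomFinished() { redraw(false); svgLayer.show(); },   // countries on canvas, triangles in SVG
  };
}

// Fakes so the logic runs outside a browser. In the browser, show() would
// also recompute each triangle's d attribute with svgPath.
const drawn = [];
const fakeCtx = { clearRect() { drawn.length = 0; }, beginPath() {}, stroke() {}, fill() {} };
const svgLayer = {
  visible: true,
  show() { this.visible = true; },
  hide() { this.visible = false; },
};
const handlers = makeZoomHandlers({
  ctx: fakeCtx,
  pathFn: feature => drawn.push(feature.id),
  countries: [{ id: "country A" }, { id: "country B" }],
  triangles: [{ id: "triangle 1" }],
  svgLayer,
  width: 500,
  height: 500,
});
```

In a page, you would register `handlers.zoomed`, `handlers.zoomInitialized`, and `handlers.zoomFinished` with `mapZoom.on(...)`, with the projection update added to `zoomed`.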

You’ll need to take the time to make sure the two layers have pixel-perfect alignment; otherwise your users will notice and complain. And make sure you test it in every browser you expect to support, because browsers differ in their default styling and behavior for <canvas> and <svg> elements.
Finally, using canvas and SVG drawing simultaneously may present a difficulty. Say we want to draw a canvas layer over an SVG layer because we want the canvas layer to appear above some of our SVG elements visually but below other SVG elements, and we want interactivity on all of them. In that case we’d need to sandwich our canvas layer between our SVG layers and set the pointer-events style of our canvas layer to none, so that mouse events pass through it to the SVG layer below, as shown back in figure 11.7. If you add further alternating layers of interactivity with graphical placement above and below, you can end up with a <canvas> and <svg> layer cake in your DOM that’s as hard to manage as it is to conceptualize.
It’s great that d3.geoPath has built-in functionality for drawing geodata to canvas, and it’s great that d3-shape generators do, too, but what about types of data visualization that use geometric primitives like lines, circles, and rectangles? One of the most performance-intensive layouts is the force-directed layout we dealt with in chapter 6. The layout calculates new positions for each node in your network at every tick. When I first started working with force-directed layouts in D3, I found that any network with more than 100 nodes was too slow to prove useful. Since then, browser performance has improved, and even thousand-node networks with SVG are performant. But it’s still a problem when we have larger networks with structure that would benefit from interactivity and animation.
In my own work, I’ve looked at how different small D3 applications hosted on gist.github.com share common D3 functions. The resulting network shows D3 coders how examples of one kind of information visualization use D3 functions commonly associated with other kinds. You can explore this network along with how D3 Meetup users describe themselves at http://emeeks.github.io/introspect/block_block.html.
To explore these connections, I needed a method for dealing with over a thousand different examples and thousands of connections between them. You can see part of this network in figure 11.10. I wanted to show how this network changed based on a threshold of shared functions, and I also wanted to give users the capacity to click each example to get more details, so I couldn’t draw the network entirely with canvas. Instead, I needed to draw the network using the same mixed-rendering method we used to draw all those triangles on a map. In this case I used canvas for the network edges and SVG for the network nodes because, as I note later, rendering the network links as SVG elements is the most expensive part of a force-directed network visualization.

Although D3 is suitable for building large, complex interactive applications, you often make a small, single-use interactive data visualization that can live on a single page with limited resources. For these small applications, it’s common in the D3 community to host the code on gist.github.com, which is the part of GitHub designed for small applications. If you host your D3 code as a gist, and it’s formatted to have an index.html, then you can use bl.ocks.org to share your work with others.
To make your gist work on bl.ocks.org, you need to have the data files and libraries hosted in the gist or accessible through it. Then you can take the alphanumeric identifier of your gist and append it to bl.ocks.org/username/ to serve a working copy for sharing. For instance, I have a gist at https://gist.github.com/emeeks/0a4d7cd56e027023bf78 that demonstrates how to do the mixed rendering of a force-directed layout like I described in this chapter. As a result, I can point people to http://bl.ocks.org/emeeks/0a4d7cd56e027023bf78, and they can see the code itself as well as the animated network in action.
Doing this kind of mixed rendering with networks isn’t as easy as it is with maps. That’s because there’s no built-in method to render regular data to canvas as with d3.geoPath. If you want to create a similar large network that combines canvas and SVG rendering, you have to build the function manually. First, though, you need data. This time, instead of sample geodata, we need to create sample network data.
Building sample network data is easy: you can create an array of nodes and an array of random links between those nodes. But building a sample network that’s not an undifferentiated mass is a little harder. In listing 11.11 you can see my slightly sophisticated network generator. It operates on the principle that a few nodes are popular and most nodes aren’t (we’ve known about this principle of networks since grade school). This does a decent job of creating a network with 3,000 nodes and up to 1,000 edges that doesn’t look quite like a giant hairball.
var linkScale = d3.scaleLinear()
.domain([0,.9,.95,1]).range([0,10,100,1000]); // 90% of targets land on nodes 0 to 9
var sampleNodes = d3.range(3000).map(d => {
var datapoint = {};
datapoint.id = `Sample Node ${d}`;
return datapoint;
})
var sampleLinks = [];
var y = 0;
while (y < 1000) {
var randomSource = Math.floor(Math.random() * 1000); // any of the first 1,000 nodes
var randomTarget = Math.floor(linkScale(Math.random())); // skewed toward popular nodes
var linkObject = {source: sampleNodes[randomSource], target:
sampleNodes[randomTarget]}
if (randomSource != randomTarget) { // skip self-links
sampleLinks.push(linkObject);
}
y++;
}
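To see why this generator produces a few hubs, you can reproduce `linkScale` without D3. A linear scale with a three-segment domain interpolates piecewise, so random values below 0.9 map to nodes 0 to 9, values from 0.9 to 0.95 to nodes 10 to 100, and the rest to nodes 100 to 1,000. The sketch below assumes a hand-written `piecewiseLinear` helper (not D3’s implementation) and hypothetical `generateNetwork` and `targetCounts` functions that follow the listing’s logic, then measures how many links point at the ten most popular nodes.

```javascript
// Stand-in for d3.scaleLinear() with a multi-segment domain (not D3's code).
function piecewiseLinear(domain, range) {
  return value => {
    // Find the segment [domain[i], domain[i + 1]] that contains value.
    let i = 0;
    while (i < domain.length - 2 && value > domain[i + 1]) i++;
    const t = (value - domain[i]) / (domain[i + 1] - domain[i]);
    return range[i] + t * (range[i + 1] - range[i]);
  };
}

// Same logic as listing 11.11, with the random source passed in.
function generateNetwork(nodeCount, attempts, random) {
  const linkScale = piecewiseLinear([0, 0.9, 0.95, 1], [0, 10, 100, 1000]);
  const nodes = Array.from({ length: nodeCount }, (_, d) => ({ id: `Sample Node ${d}` }));
  const links = [];
  for (let y = 0; y < attempts; y++) {
    const source = Math.floor(random() * 1000);
    const target = Math.floor(linkScale(random()));
    if (source !== target) links.push({ source: nodes[source], target: nodes[target] });
  }
  return { nodes, links };
}

// In-degree: how many links point at each node.
function targetCounts(links) {
  const counts = new Map();
  links.forEach(link => counts.set(link.target.id, (counts.get(link.target.id) || 0) + 1));
  return counts;
}

const { nodes, links } = generateNetwork(3000, 1000, Math.random);
const counts = targetCounts(links);
const indexOf = node => Number(node.id.split(" ")[2]);
const hubShare = links.filter(link => indexOf(link.target) < 10).length / links.length;
```

Since 90 percent of uniform random values fall below 0.9, `hubShare` should come out close to 0.9. The sketch also makes visible a property of the generator: sources and targets are always below 1,000, so in this version nodes 1,000 to 2,999 never receive any links.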
With this generator in place, we can instantiate our typical force-directed layout using the code in the following listing and create a few lines and circles with it.
var force = d3.forceSimulation()
.nodes(sampleNodes)
.force("x", d3.forceX(250).strength(1.1))
.force("y", d3.forceY(250).strength(1.1))
.force("charge", d3.forceManyBody())
.force("link", d3.forceLink())
.on("tick", forceTick) // redraw on every simulation tick
force.force("link").links(sampleLinks)
d3.select("svg")
.selectAll("line.link")
.data(sampleLinks)
.enter()
.append("line")
.attr("class", "link");
d3.select("svg").selectAll("circle.node")
.data(sampleNodes)
.enter()
.append("circle")
.attr("r", 3)
.attr("class", "node");
function forceTick() {
d3.selectAll("line.link")
.attr("x1", d => d.source.x) // each line follows its endpoint nodes
.attr("y1", d => d.source.y)
.attr("x2", d => d.target.x)
.attr("y2", d => d.target.y);
d3.selectAll("circle.node")
.attr("cx", d => d.x)
.attr("cy", d => d.y);
};
This code should be familiar to you if you’ve read chapter 6. Generation of random networks is a complex and well-described practice. This random generator isn’t going to win any awards, but it does produce a recognizable structure. Typical results are shown in figure 11.11. What’s lost in the static image is the slow and jerky rendering, even on a fast computer using a browser that handles SVG well.

When I first started working with these networks, I thought the main cause of slowdown was calculating the myriad positions for each node on every tick. After all, node position is based on a simulation of competing forces caused by nodes pushing and edges pulling, and something like this, with thousands of components, seems heavy duty. That’s not what’s taxing the browser in this case, though. Instead, it’s the management of so many DOM elements. You can get rid of many of those DOM elements by replacing the SVG lines with canvas lines. Let’s change our code, as shown in the following listing, so that it no longer creates SVG <line> elements for the links; instead, our forceTick function draws those links with canvas.
function forceTick() {
var context = d3.select("canvas").node()
.getContext("2d");
context.clearRect(0,0,500,500); // erase the previous frame
context.lineWidth = 1;
context.strokeStyle = "rgba(0, 0, 0, 0.5)"; // semi-transparent links
sampleLinks.forEach(function (link) {
context.beginPath();
context.moveTo(link.source.x,link.source.y) // start at the source node
context.lineTo(link.target.x,link.target.y) // line to the target node
context.stroke();
});
d3.selectAll("circle.node") // nodes remain SVG for interactivity
.attr("cx", d => d.x)
.attr("cy", d => d.y)
};
The rendering of the network is similar in appearance, as you can see in figure 11.12, but the performance improves dramatically. Using canvas, I can draw 10,000-link networks with performance high enough to have animation and interactivity. The canvas drawing code can be a bit cumbersome (it’s like the old LOGO drawing code), but the performance makes it more than worth it.
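Listing 11.13 begins and strokes a separate path for every link. Because all links share one stroke style, a common further optimization (not part of the listing) is to add every segment to a single path and call `stroke()` once per frame. The sketch below uses a counting stand-in for the canvas context so the difference in calls can be checked outside a browser; `drawLinksBatched` and `makeCountingContext` are hypothetical names.

```javascript
// Draw every link as a subpath of one path, then stroke once.
function drawLinksBatched(context, links, width, height) {
  context.clearRect(0, 0, width, height);
  context.lineWidth = 1;
  context.strokeStyle = "rgba(0, 0, 0, 0.5)";
  context.beginPath(); // one path for all links
  links.forEach(link => {
    context.moveTo(link.source.x, link.source.y); // start a new subpath
    context.lineTo(link.target.x, link.target.y);
  });
  context.stroke(); // a single stroke call per frame
}

// Counting stand-in for CanvasRenderingContext2D (for running outside a browser).
function makeCountingContext() {
  const counts = { clearRect: 0, beginPath: 0, moveTo: 0, lineTo: 0, stroke: 0 };
  const context = { counts };
  Object.keys(counts).forEach(name => {
    context[name] = () => { counts[name] += 1; };
  });
  return context;
}

const links = Array.from({ length: 10000 }, (_, i) => ({
  source: { x: i % 500, y: 0 },
  target: { x: 0, y: i % 500 },
}));
const ctx = makeCountingContext();
drawLinksBatched(ctx, links, 500, 500);
```

One visible difference: with a semi-transparent `strokeStyle`, overlapping segments inside a single path are painted once, so they don’t darken each other the way separately stroked lines do.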

We could use the same method as with the earlier maps to use canvas during animated periods and SVG when the network is fixed. But we’ll move on and look at another method for dealing with large amounts of data: quadtrees.
When you’re working with a large dataset, one issue is optimizing search and selection of elements in a region. Let’s say you’re working with a set of data with xy coordinates (anything that’s laid out on a plane or screen). You’ve seen enough examples in this book to know that this may be a scatterplot, points on a map, or any of a number of different graphical representations of data. When you have data like this, you often want to know what datapoints fall in a particular selected region. This is referred to as spatial search (and notice that spatial in this case doesn’t refer to geographic space but rather space in a more generic sense). The quadtree functionality is a spatial version of d3.nest, which we used in chapters 5 and 8 to create hierarchical data. Following the theme of this chapter, we’ll get started by creating a big dataset of random points and rendering them in SVG.
Our third random data generator doesn’t require nearly as much work as the first two did. In the following listing, all we do is create 3,000 points with random x and y coordinates.
var sampleData = d3.range(3000).map(function(d) {
var datapoint = {};
datapoint.id = `Sample Node ${d}`;
datapoint.x = Math.random() * 500;
datapoint.y = Math.random() * 500; // uniform over the 500 x 500 area
return datapoint;
})
d3.select("svg").selectAll("circle")
.data(sampleData)
.enter()
.append("circle")
.attr("class", "xy")
.attr("r", 3)
.attr("cx", d => d.x)
.attr("cy", d => d.y)
As you may expect, the result of this code, shown in figure 11.13, is a bunch of orange circles scattered randomly all over our SVG.

Now we’ll create a brush to select some of these points. Recall that when we used a brush in chapter 9, we only allowed brushing along the x-axis. This time, we allow brushing along both the x- and y-axes, so we can drag a rectangle over any part of the chart. In the following listing, you can see how quick and easy it is to add a brush to our SVG. We’ll also add a function to highlight any circles in the brushed region.
var brush = d3.brush() // 2D brush: selects along both x and y
.extent([[0,0],[500,500]])
.on("brush", brushed)
d3.select("svg").call(brush)
function brushed() {
var e = d3.event.selection // [[x0, y0], [x1, y1]], or null when cleared
if (!e) return
d3.selectAll("circle")
.style("fill", d => {
if (d.x >= e[0][0] && d.x <= e[1][0]
&& d.y >= e[0][1] && d.y <= e[1][1]) // inside the brushed rectangle?
{
return "#FE9922" // highlight
}
else {
return "#EBD8C1" // dim
}
})
}
With this brushing code, we can now see the circles in the brushed region, as shown in figure 11.14.

This works, but it’s terribly inefficient. It tests every point in the dataset, with no mechanism for skipping points that are far outside the selection area. Finding points within a prescribed area is an old problem that has been well explored. One of the tools available to solve that problem quickly and easily is a quadtree. You may ask, what is a quadtree and what should I use it for?
A quadtree is a method for optimizing spatial search by dividing a plane into a series of quadrants. You then divide each of those quadrants into quadrants, until every point on that plane falls in its own quadrant. By dividing the xy plane like this, you nest the points you’ll be searching in such a way that you can easily ignore entire quadrants of data without testing the entire dataset.
Another way to explain a quadtree is to show it. That’s what this information visualization stuff is for, right? Figure 11.15 shows the quadrants that a quadtree produces based on a set of point data.
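To make the idea concrete, here is a deliberately small point quadtree in plain JavaScript. It is a teaching sketch, not D3’s implementation (d3-quadtree represents internal nodes as four-element arrays and chains points with identical coordinates through a `next` property); `QuadNode` and `search` are hypothetical names. Each leaf holds the points at a single position, and `visit` skips a node’s children when the callback returns true, the same convention d3’s `quadtree.visit` uses.

```javascript
// Teaching sketch of a point quadtree (not d3-quadtree's implementation).
class QuadNode {
  constructor(x0, y0, x1, y1) {
    this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
    this.points = [];     // points in this leaf (all at the same coordinates)
    this.children = null; // four child quadrants once the node is split
  }

  childFor(p) {
    const xm = (this.x0 + this.x1) / 2;
    const ym = (this.y0 + this.y1) / 2;
    // Index order: top-left, top-right, bottom-left, bottom-right.
    return this.children[(p.y >= ym ? 2 : 0) + (p.x >= xm ? 1 : 0)];
  }

  insert(p) {
    if (this.children) {
      this.childFor(p).insert(p);
      return;
    }
    const first = this.points[0];
    if (!first || (first.x === p.x && first.y === p.y)) {
      this.points.push(p); // empty leaf, or a duplicate position
      return;
    }
    // Occupied by a different point: split into four quadrants.
    const xm = (this.x0 + this.x1) / 2;
    const ym = (this.y0 + this.y1) / 2;
    this.children = [
      new QuadNode(this.x0, this.y0, xm, ym),
      new QuadNode(xm, this.y0, this.x1, ym),
      new QuadNode(this.x0, ym, xm, this.y1),
      new QuadNode(xm, ym, this.x1, this.y1),
    ];
    const moved = this.points;
    this.points = [];
    moved.forEach(q => this.childFor(q).insert(q));
    this.childFor(p).insert(p);
  }

  // Pre-order traversal; if callback returns true, this node's children are skipped.
  visit(callback) {
    if (callback(this, this.x0, this.y0, this.x1, this.y1)) return;
    if (this.children) this.children.forEach(child => child.visit(callback));
  }
}

// Rectangle query in the same [[x0, y0], [x1, y1]] form as a d3.brush selection.
function search(root, [[sx0, sy0], [sx1, sy1]]) {
  const found = [];
  let visited = 0;
  root.visit((node, x0, y0, x1, y1) => {
    visited += 1;
    node.points.forEach(p => {
      if (p.x >= sx0 && p.x <= sx1 && p.y >= sy0 && p.y <= sy1) found.push(p);
    });
    // Prune: true when this quadrant lies entirely outside the selection.
    return x0 > sx1 || y0 > sy1 || x1 < sx0 || y1 < sy0;
  });
  return { found, visited };
}

const points = Array.from({ length: 3000 }, (_, id) => ({
  id, x: Math.random() * 500, y: Math.random() * 500,
}));
const root = new QuadNode(0, 0, 500, 500);
points.forEach(p => root.insert(p));
const { found, visited } = search(root, [[100, 100], [200, 150]]);
```

`found` matches a brute-force scan of all 3,000 points, while `visited` counts only the quadrants the search actually entered, which for a small selection is a fraction of the whole tree.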

Creating a quadtree with xy data of the kind we have in our dataset is easy, as you can see in the following listing. We set the x and y accessors like we do with layouts and other D3 functions.
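var quadtree = d3.quadtree()
.extent([[0,0], [500,500]]) // the area the points can occupy
.x(d => d.x) // accessors, as with layouts
.y(d => d.y)
var quadIndex = quadtree.addAll(sampleData); // adds the points; returns the same quadtree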
After you create a quadtree and add your points to it, as we did with quadIndex, you can use its .visit() method for quadtree-optimized searching. The .visit() method replaces the test in a new brushed function, shown in listing 11.17. I’ll first show you the code, then show in figure 11.16 that it works, and then explain in detail how it works. This isn’t the usual order of things, I realize, but with a quadtree it makes more sense to see the code before analyzing exactly what it does.

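function brushed() {
var e = d3.event.selection
if (!e) return
d3.selectAll("circle")
.style("fill", "#EBD8C1")
.each(d => {d.selected = false}) // reset every point first
quadIndex.visit(function(node,x0,y0,x1,y1) {
if (!node.length) { // leaves hold points; internal nodes are arrays
do {
var d = node.data
if (d.x >= e[0][0] && d.x <= e[1][0] // is this point inside the brush?
&& d.y >= e[0][1] && d.y <= e[1][1]) {
d.selected = true;
}
} while (node = node.next) // points with identical coordinates share a leaf
}
return x0 > e[1][0] || y0 > e[1][1] || x1 < e[0][0] || y1 < e[0][1] // skip quadrants outside the brush
})
d3.selectAll("circle")
.filter(d => d.selected)
.style("fill", "#FE9922") // highlight the selected points
}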
The results are impressive and much faster. In figure 11.16, I increased the number of points to 10,000 and still got good performance. (But if you’re dealing with datasets that large, I recommend switching to canvas because forcing the browser to manage all those SVG elements is going to slow things down.) And even a cursory examination of the code reveals several spots where you could improve performance.
How does it work? When you run the visit function, your callback gets access to each node in the quadtree, from the most general to the most specific. Along with each node that we access in listing 11.17 as node, you also get the bounds of that node (x0, y0, x1, y1). Because a node in a quadtree can be either a bounding area or one of the points that generated the quadtree, you have to test whether the node is a point; if it is, you can then test whether it’s in your brush bounds like we did in our earlier example. The final piece of the callback, its return value, is where visit gets its power, but it’s also the most difficult to follow, as you can see in figure 11.17.

The visit function looks at every node in a quadtree, unless your callback returns true for a node, in which case it stops searching that quadrant and skips all its child nodes. The return statement tests whether the node you’re looking at (represented as the bounds x0, y0, x1, y1) is entirely outside the bounds of your selection area (represented as the bounds e[0][0], e[0][1], e[1][0], e[1][1]). It checks whether the top of the selection is below the bottom of the node’s bounds, whether the bottom of the selection is above the top of the node’s bounds, whether the left side of the selection is to the right of the right side of the node’s bounds, or whether the right side of the selection is to the left of the left side of the node’s bounds. If any of those is true, no point in that quadrant can fall inside the selection, so there’s no reason to look any deeper. That may seem a bit hard to follow (and it sure takes up more room as a sentence than it does as a piece of code), but that’s how it works.
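If the return statement is easier to read with names attached, here’s the same test pulled out into a standalone function. The name nodeOutsideSelection is hypothetical; the bounds follow the (x0, y0, x1, y1) arguments of the visit callback, and e is the brush selection [[left, top], [right, bottom]] in screen coordinates, where y grows downward.

```javascript
// The pruning test from the visit callback, with names attached.
// Returns true when a node's bounds lie entirely outside the brush selection,
// which tells visit to skip that node and all of its children.
function nodeOutsideSelection(x0, y0, x1, y1, e) {
  const [[left, top], [right, bottom]] = e;
  return x0 > right   // node starts to the right of the selection
    || y0 > bottom    // node starts below the selection
    || x1 < left      // node ends to the left of the selection
    || y1 < top;      // node ends above the selection
}

const brush = [[100, 100], [200, 200]];
console.log(nodeOutsideSelection(0, 0, 50, 50, brush));      // true: prune
console.log(nodeOutsideSelection(150, 150, 300, 300, brush)); // false: keep searching
```

Inside the visit callback you could then write return nodeOutsideSelection(x0, y0, x1, y1, e); which behaves the same as the one-line version in listing 11.17.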
You can use that visit function to do more than optimized search. I’ve used it to cluster nearby points on a map (http://bl.ocks.org/emeeks/066e20c1ce5008f884eb) and also to draw the bounds of the quadtree in figure 11.15.
You can improve the performance of large-dataset visualizations in many other ways. Here are three that should give you immediate returns: avoid general opacity, avoid general selections, and precalculate positions.
Whenever possible, use fill-opacity and stroke-opacity or RGBA color references rather than the element opacity style. General element opacity, the kind of setting you get when you use .style("opacity", ...), can slow down rendering, because the browser may need to render the element to a separate layer before blending it with what’s behind it. When you use specific fill or stroke opacity, it forces you to pay more attention to where and how you’re using opacity.
So instead of
d3.selectAll(elements).style("fill", "red").style("opacity", .5)
do this:
d3.selectAll(elements).style("fill", "red").style("fill-opacity", .5)
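If your colors are stored as hex strings, a small helper can build the RGBA version for you. This is a hypothetical helper, not part of D3, and it assumes six-digit #RRGGBB input; the resulting string works with .style("fill", ...) and also as a canvas fillStyle, where there’s no separate fill-opacity setting.

```javascript
// Hypothetical helper (not part of D3): "#RRGGBB" + alpha -> "rgba(r, g, b, a)".
function hexToRgba(hex, alpha) {
  const match = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) throw new Error(`Expected a #RRGGBB color, got ${hex}`);
  const [r, g, b] = match.slice(1).map(h => parseInt(h, 16));
  return `rgba(${r}, ${g}, ${b}, ${alpha})`;
}

console.log(hexToRgba("#FE9922", 0.5)); // rgba(254, 153, 34, 0.5)
// In D3:     d3.selectAll("circle").style("fill", hexToRgba("#FE9922", 0.5))
// On canvas: context.fillStyle = hexToRgba("#FE9922", 0.5);
```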
Although it’s convenient to select all elements and apply conditional behavior across those elements, you should try to use selection.filter with your selections to reduce the number of calls to the DOM. If you look back at the code in listing 11.17, you’ll see this general selection that clears the selected attribute for all the circles and resets the fill of all the circles to the unselected color:
d3.selectAll("circle")
  .style("fill", "#EBD8C1")
  .each(d => {d.selected = false})
Instead, clear the attribute and reset the fill color of only those circles that are currently selected. This limits the number of costly DOM calls:
d3.selectAll("circle")
  .filter(d => d.selected)
  .style("fill", "#EBD8C1")
  .each(d => {d.selected = false})
If you make that change to the brushed function, performance improves further. Remember that manipulating DOM elements, even just to change a setting like fill, is often the most expensive thing your code does.
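To see why the filter matters, here’s a small sketch that counts writes instead of touching a real DOM. The setFill callback is a hypothetical stand-in for a style call on one element, and the two functions mirror the general reset and the filtered reset, using the unselected color #EBD8C1 from the brushed listing.

```javascript
// Count "DOM writes" for a general reset versus a filtered reset.
// setFill is a hypothetical stand-in for a .style("fill", ...) call on one element.
function resetAll(circles, setFill) {
  for (const c of circles) {
    setFill(c, "#EBD8C1"); // every circle gets written, selected or not
    c.selected = false;
  }
}

function resetSelectedOnly(circles, setFill) {
  for (const c of circles) {
    if (!c.selected) continue; // same role as .filter(d => d.selected)
    setFill(c, "#EBD8C1");
    c.selected = false;
  }
}

const demoCircles = Array.from({ length: 1000 }, (_, i) => ({ selected: i < 10 }));
let demoWrites = 0;
resetSelectedOnly(demoCircles, () => { demoWrites++; });
console.log(demoWrites); // 10 (resetAll would make 1000)
```

The filtered version still reads every datum, but reading bound data is cheap compared to writing styles, and only the few circles that actually change get a write.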
You can also precalculate positions and then apply transitions. When you calculate complex new positions while applying them in a transition to a large selection of elements, you can overwhelm the browser and see jerky animations. So if you have a complex algorithm that determines an element’s new position, first go through the data array and calculate each new position, storing it as a property of the element’s bound datum. After you’ve done all your calculations, select the elements and apply a transition based on the precalculated position.
So, instead of
d3.selectAll(elements)
.transition()
.duration(1000)
.attr("x", newComplexPosition);
do this:
d3.selectAll(elements)
.each(function(d) {d.newX = newComplexPosition(d)});
d3.selectAll(elements)
.transition()
.duration(1000)
.attr("x", d => d.newX);
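Here’s a plain JavaScript sketch of that split between calculating and animating. newComplexPosition is a hypothetical placeholder for your costly algorithm, precompute runs it once per datum, and positionsAt shows the cheap per-frame work of moving each element from its start position to its cached end position. A D3 transition does something similar on each tick, though it first passes t through an easing function.

```javascript
// Hypothetical placeholder for an expensive layout calculation.
function newComplexPosition(d) {
  return d.x * 2 + 10;
}

// Step 1: run the costly calculation once per datum and cache the result.
function precompute(data) {
  for (const d of data) {
    d.startX = d.x;
    d.newX = newComplexPosition(d);
  }
}

// Step 2: per-frame work is only a linear interpolation between cached values.
// t runs from 0 (start) to 1 (end); D3 would ease t before interpolating.
function positionsAt(data, t) {
  return data.map(d => d.startX + (d.newX - d.startX) * t);
}

const demoData = [{ x: 0 }, { x: 5 }];
precompute(demoData);
console.log(positionsAt(demoData, 0.5)); // [ 5, 12.5 ]
```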
If you want to grow your D3 skill set, I’d suggest starting with the D3 Slack channel (d3js.slack.com), which has over a thousand members talking about every aspect of the library. I’d also look at bl.ocksplorer (http://bl.ocksplorer.org), which allows you to find examples of D3 code based on specific D3 functions. You should also check out the work of Mike Bostock (http://bl.ocks.org/mbostock) to see examples of the latest D3 functionality. D3 has an active Google Group (https://groups.google.com/forum/#!forum/d3-js), if you’re interested in discussing the internals of the library, and there are many popular Meetup groups, like the Bay Area D3 User Group (www.meetup.com/Bay-Area-d3-User-Group/). I find the best place to keep up with D3 is on Twitter, where you can see examples posted with the hashtag #d3js and examples of when things don’t quite go right (but are still beautiful) with the hashtag #d3brokeandmadeart.
As you look around the examples of D3, you’ll see one thing in particular: that despite spending so much time and ink on this library, I still haven’t touched on everything in its core functionality, much less the numerous plugins people have used to build on it. Data visualization is one of the most exciting fields right now, and you can be part of pushing that field forward, even if you’re only now getting into it. Though there are other ways to approach data visualization, D3 is still the most robust and well established. I hope this book has provided you with the tools necessary to go out and make impactful data visualization.
Progressive Rendering
Rendering data on a large canvas can take some time and noticeably freeze the UI until it’s done. One trick to keep the UI responsive is called progressive rendering: chunking the rendering into small batches and giving the thread back to the UI between batches. For the streaming charts I developed for Boundary’s Firespray and for the raster maps at Planet OS, I use a tool called Render-slicer, which uses requestAnimationFrame to render slices in a loop as fast as the browser can handle. On a slow network or with a large dataset, the browser stays responsive, but we can see the drawing happening. I don’t mind; it even looks like an animation feature.
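Here’s a minimal sketch of that slicing loop. It’s written in the spirit of Render-slicer but doesn’t use its API; the function name renderProgressively and its options (batchSize, schedule, onDone) are my own. It draws batchSize items per frame with requestAnimationFrame, falls back to setTimeout outside a browser, and returns a cancel handle so you can abandon a render when new data arrives.

```javascript
// Progressive rendering sketch: draw a large array in small batches,
// yielding to the browser between batches so the UI stays responsive.
function renderProgressively(data, drawItem, options = {}) {
  const batchSize = options.batchSize ?? 1000;
  // Prefer requestAnimationFrame; fall back to a ~16 ms timer outside a browser.
  const schedule = options.schedule ??
    (typeof requestAnimationFrame === "function"
      ? fn => requestAnimationFrame(fn)
      : fn => setTimeout(fn, 16));
  let i = 0;
  let cancelled = false;

  function nextBatch() {
    if (cancelled) return;
    const end = Math.min(i + batchSize, data.length);
    for (; i < end; i++) drawItem(data[i], i); // draw one slice
    if (i < data.length) schedule(nextBatch);  // give the thread back, then continue
    else if (options.onDone) options.onDone();
  }

  schedule(nextBatch);
  return { cancel() { cancelled = true; } };
}

// Browser usage (points is an array of {x, y} objects):
// const ctx = document.querySelector("canvas").getContext("2d");
// renderProgressively(points, d => {
//   ctx.beginPath();
//   ctx.arc(d.x, d.y, 2, 0, 2 * Math.PI);
//   ctx.fill();
// }, { batchSize: 500 });
```

A smaller batchSize keeps each frame shorter at the cost of a longer total drawing time, so it’s worth tuning against your own data.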


The same effect can be achieved by chunking the data transfer: streaming the data instead of locking the UI with one large request, and drawing each chunk on the canvas, line by line, as it’s received. That way, the chart can start rendering the most important data points almost instantly, and each data chunk can be discarded to free up memory as soon as it’s graphed. I like Papa Parse for streaming and parsing data; it can also run its parsing in a Web Worker for even greater performance.
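The core of streaming parsing is keeping only the unfinished tail of the text between chunks. This sketch is a simplified stand-in for a real parser such as Papa Parse, not its API: createRowStream is a hypothetical name, and it assumes a header row, comma-separated numeric fields, and no quoted values. Each complete row goes to onRow, where you’d draw it, as soon as its line ends.

```javascript
// Incremental row parser: text arrives in arbitrary chunks, complete rows are
// handed to onRow right away, and only the unfinished last line is buffered.
// Assumes a header row, comma-separated numeric fields, and no quoted values.
function createRowStream(onRow) {
  let header = null;
  let buffer = "";

  function handleLine(rawLine) {
    const line = rawLine.replace(/\r$/, ""); // tolerate \r\n line endings
    if (line === "") return;
    const fields = line.split(",");
    if (header === null) {
      header = fields; // first line names the columns
      return;
    }
    const row = {};
    header.forEach((name, i) => { row[name] = Number(fields[i]); });
    onRow(row); // e.g., draw this point on the canvas, then let it go
  }

  return {
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // the last piece may be an incomplete line
      lines.forEach(handleLine);
    },
    end() {
      handleLine(buffer); // flush whatever is left
      buffer = "";
    },
  };
}

// A row split across two chunks is still parsed once it is complete.
const demoStream = createRowStream(row => console.log(row));
demoStream.push("x,y\n1,2\n3,");
demoStream.push("4\n");
demoStream.end();
// logs { x: 1, y: 2 } then { x: 3, y: 4 }
```

In a browser, the chunks could come from a fetch response: pipe response.body through a TextDecoderStream, call push for each chunk the reader returns, and call end when the reader reports done.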