List of Figures
Chapter 1. An introduction to D3.js
Figure 1.1. A map of how to approach data visualization with D3.js that highlights the approach in this book. Start at the
top with data and then follow the path depending on the type of data and the needs you’re addressing.
Figure 1.2. D3 can be used for simple charts, such as this donut chart (explained in chapter 5).
Figure 1.3. D3 can also be used to create web maps (see chapter 8), such as this map showing the ethnic makeup of major metropolitan
areas in the United States.
Figure 1.4. Maps in D3 aren’t limited to traditional Mercator web maps. They can be interactive globes, like this map of undersea
communication cables, or other, more
Figure 1.5. D3 also provides robust capacities to create interactive network visualizations (see chapter 7). Here you see
the social and coauthorship network of archaeologists working at the same dig for nearly 25 years.
Figure 1.6. D3 includes a library of common data visualization layouts, such as the dendrogram (explained in chapter 6), that
let you represent data, such as this word tree.
Figure 1.7. D3 has SVG and canvas drawing functions (see chapter 4) so you can create your own custom visualizations, such
as this representation of musical scores.
Figure 1.8. You can combine these layouts and functions to create a data dashboard like we’ll do in chapter 9. You can also
use the drawing functions to make your bar charts look distinctive, such as this “sketchy” style.
Figure 1.9. An application created with D3 can use selections and data binding over and over again, together and separately,
to update the content of the data visualization based on interaction.
Figure 1.10. Before GIFs were weaponized to share cute animal behavior, they were your only hope for animated data visualization
on the web. Few examples from the 1990s like dpgraph.com still exist, but this page has more than enough GIFs to remind us
of their dangers.
Figure 1.11. The developer tools in Chrome place the JavaScript console on the rightmost tab, labeled Console, with the element
inspector available using the arrow in a rectangle (circled above) on the top left or by browsing the DOM in the Elements
tab.
Figure 1.12. You can run JavaScript code in the console and call global variables or declare new ones as necessary. Any code
you write in the console and changes made to the web page are lost as soon as you reload the page.
Figure 1.13. Rather than adding or modifying individual styles and attributes, you can rewrite the HTML code as you would
in a text editor. As with any changes, these only last until you reload the page.
Figure 1.14. Changing the content of a DOM element is as simple as adding text between the opening and ending brackets of
the element.
Figure 1.15. The page is updated as soon as you finish making your changes. Writing HTML manually in this way is only useful
for planning how you might want to dynamically update the content.
Figure 1.16. The D3 select syntax modifies style using the .style() function, and traditional HTML content using the .html()
function.
Figure 1.17. The commands to draw an SVG path (right) and the resulting graphic (left)
Figure 1.18. Inspecting the DOM of a web page with an SVG canvas reveals the nested graphical elements as well as the style
and attributes that determine their position. Notice that the circle and rectangle exist as child elements of a group.
Figure 1.19. Modifying the height and width attributes of a <rect> element changes the appearance of that element. Inspecting
the element also shows how the stroke adds to the computed size of the element.
Figure 1.20. The same 25 x 25 <rect> with no, 1-px, 2-px, 3-px, 4-px, and 5-px strokes. Though these are drawn on a retina
screen using half-pixels, the second and third report the same width and height (27 px x 27 px) as the fourth and fifth (29
px x 29 px).
Figure 1.21. All SVG elements can be affected by the transform attribute, but this is particularly salient when working with
<g> elements, which require this approach to adjust their position. The child elements are drawn by using the position of
their parent <g> as their relative 0,0 position. The scale() setting in the transform attribute then affects the scale of
any of the size and position attributes of the child elements.
Figure 1.22. Each path shown here uses the same coordinates in its d attribute, with the only differences between them being
the presence or absence of the letter Z at the end of the text string defining the d attribute, the settings for fill and
stroke, and the position via the transform attribute.
Figure 1.23. Examining an SVG rectangle in the console shows that it inherits its fill style from the CSS style applied to
<rect> types and its stroke style from the .active class.
Figure 1.24. The SVG circle has its fill value set by its type in the style sheet, with its opacity set by its membership
in the .tentative class and its stroke set by its membership in the .active class. Notice that the stroke settings from the
.tentative class are overwritten by the stroke settings in the later declared .active class.
Figure 1.25. An SVG circle with fill style determined by its type and its opacity and stroke settings determined by its membership
in the tentative class
Figure 1.26. By binding an array of four values to a selection of <div> elements on the page, the .enter() function created
three new <div> elements to reflect the size mismatch between the data array and the selection.
Figure 1.27. Inspecting the DOM shows that the new <div> elements have been created with unformatted content followed by the
child <span> element with style and content set by your code.
Figure 1.28. Running JavaScript in the console allows you to test your code. Here you’ve created a new array called smallerNumbers
that consists of only three values, which you can then use as your data in a selection to update and create new elements.
Figure 1.29. Nested data represents parent/child relationships of objects, typically with each object having an array of child
objects, and is represented in a number of forms, such as this dendrogram. Notice that each object can have only one parent.
Figure 1.30. Network data consists of objects and the connections between them. The objects are typically referred to as nodes
or vertices, and the connections are referred to as edges or links. Networks are often represented using force-directed algorithms,
such as the example here, that arrange the network in such a way as to pull connected nodes toward each other.
Figure 1.31. Geographic data stores the spatial geometry of objects, such as states. Each of the states in this image is represented
as a separate feature with an array of values indicating its shape. Geographic data can also consist of points, such as for
cities, or lines, such as for roads.
Figure 1.32. Using console.log(), you can test to see if an event is properly firing. Here you create a <div> and assign an
onclick event handler using the .on() syntax. When you click that element and fire the event, the action is noted in the console.
Figure 1.33. The result of running listing 1.10 in the console is the creation of two circles, a line, and two text elements.
The order in which these elements are drawn results in the first label covered by the circle drawn later.
Figure 1.34. Transition behavior when associated with a delay results in a pause before the application of the attribute or
style.
Figure 1.35. Transition behavior when associated with position makes the shape graphically move to its new position over the
course of the assigned duration. Because you used the same y position for both circles, the first circle moves down and the
second circle moves up to the y position you set, which is between the two circles.
Chapter 2. Information visualization data flow
Figure 2.1. Examples from this chapter, including a diagram of how data-binding works (left) from section 2.3.3, a scatterplot
with labels (center) from section 2.3, and the bar chart (right) we’ll build in section 2.2.
Figure 2.2. The data visualization process that we’ll explore in this chapter assumes we begin with a set of data and want
to create (and update) an interactive or dynamic data visualization.
Figure 2.3. The first step in creating a data visualization is getting the data. You can do that by loading the file asynchronously
using one of several D3 XHR functions, or you can import or include the data. If the data is fixed, then either way is suitable,
but if you plan to replace your data source with a dynamic API call, then the XHR requests are the best approach.
Figure 2.4. After loading data, you need to make sure it’s formatted in such a way that it can be used to create graphics.
This includes mapping the data to positions on the screen, colors that indicate quantity, or bins to nest the data visually.
Figure 2.5. Scales in D3 map one set of values (the domain) to another set of values (the range) in a relationship determined
by the type of scale you create.
Figure 2.6. Scales can also be used to map numerical values to color bands, to make it easier to denote values using a color
scale.
Figure 2.7. Quantile scales take a range of values and reassign them into a set of equally sized bins.
Figure 2.8. Objects nested into a new array are now child elements of a values array of newly created objects that have a
key attribute set to the value used in the d3.nest.key function.
Figure 2.9. After formatting your data, you’ll need to measure it to ensure that the graphics you create are appropriately
sized and positioned based on the parameters of the dataset. You’ll use d3.extent, d3.min, d3.mean, and d3.max all the time.
Figure 2.10. To create graphics in D3, you use selections that bind data to DOM elements.
Figure 2.11. When our selection binds the cities.csv data to our web page, it creates eight new divs, each of which is classed
with "cities" and with content drawn from our data.
Figure 2.12. The default setting for any shape in SVG is black fill with no stroke, which makes it hard to tell when the shapes
overlap each other.
Figure 2.13. By changing the opacity settings, you can see the overlapping rectangles.
Figure 2.14. SVG rectangles are drawn from top to bottom.
Figure 2.15. When we set the y position of the rectangle to the desired y position minus the height of the rectangle, the
rectangle is drawn from bottom to top from that y position.
Figure 2.16. SVG shapes will continue to be drawn offscreen.
Figure 2.17. A bar chart drawn using a linear scale
Figure 2.18. The same bar chart from figure 2.17 drawn with a polylinear scale
Figure 2.19. A bar chart drawn with a linear scale where the maximum value in the domain is lower than the maximum value in
the dataset
Figure 2.20. A bar chart drawn with values in the dataset greater than the maximum value of the domain of the scale, but with
the clamp() function set to true
Figure 2.21. The cities.csv data drawn as a bar chart using the maximum value of the population attribute in the domain setting
of the scale
Figure 2.22. By nesting data and counting the objects that are nested, we can create a bar chart out of hierarchical data.
Figure 2.23. Tweets are represented as circles sized by the total number of favorites and retweets and are placed on the canvas
along the x-axis based on the time of the tweet and along the y-axis according to the same impact factor used to size the
circles. Two tweets with the same impact factor that were made at nearly the same time are shown overlapping at the bottom
left.
Figure 2.24. Selections where the number of DOM elements and number of values in an array don’t match will fire either an
.enter() event or an .exit() event, depending on whether there are more or fewer data values than DOM elements, respectively.
Update, in contrast, is not a function, and simply refers to when you update the data bound to the elements.
Figure 2.25. Each tweet is a <g> element with a circle and a label appended to it. The various tweets by Roy at 7 A.M. happen
so close to each other that they’re difficult to label.
Figure 2.26. All elements corresponding to tweets that were not favorited and not retweeted were removed.
Chapter 3. Data-driven design and interaction
Figure 3.1. This chapter covers loading HTML from an external file and updating it (section 3.3.2), as well as loading external
images for icons (section 3.3.1), animating transitions (section 3.2.2), and working with color (section 3.2.4).
Figure 3.2. Circles and labels created from a CSV representing 2014 World Cup statistics.
Figure 3.3. Buttons for each numerical attribute are appended to the controls div behind the viz div. When a button is clicked,
the code runs buttonClick.
Figure 3.4. Our initial buttonClick function resizes the circles based on the numerical value of the associated attribute.
The radius of each circle reflects the number of goals scored against each team, kept in the ga attribute of each datapoint.
Figure 3.5. The effect of our initial highlightRegion selects elements with the same region attribute and colors them orange,
while coloring gray those that aren’t in the same region.
Figure 3.6. A screenshot of your data visualization in the middle of its initial drawing, showing the individual circles growing
to an exaggerated size and then shrinking to their final size in the order in which they appear in the bound dataset.
Figure 3.7. The console results of inspecting a selected element, which show first the datapoint in the selection, then its
position in the array, and then the SVG element itself.
Figure 3.8. The results of running the node function of a selection in the console, which is the DOM element itself—in this
case, an SVG <circle> element.
Figure 3.9. The <text> element “Netherlands” is drawn at the same DOM level as the parent <g>, which, in this case, is behind
the element to its right.
Figure 3.10. Re-appending the <g> element for Germany to the <svg> element moves it to the end of that DOM region and therefore
it’s drawn above the other <g> elements.
Figure 3.11. Using the darker and brighter functions of a d3.rgb object in the highlighting function produces a darker version
of the set color for teams from the same region and lighter colors for teams from different regions.
Figure 3.12. Color mixing between yellow and blue in the RGB (red-green-blue) scale results in muddy, grayish colors displayed
for the values between yellow and blue.
Figure 3.13. Interpolation of yellow to blue based on hue, saturation, and lightness (HSL) results in a different set of intermediary
colors from the same two starting values.
Figure 3.14. Interpolation of color based on hue, chroma, and luminosity (HCL) provides a different set of intermediary colors
between yellow and blue.
Figure 3.15. Interpolation of color based on lightness and color-opponent space (known as LAB; L stands for lightness and
A-B stands for the color-opponent space) provides yet another set of intermediary colors between yellow and blue.
Figure 3.16. Application of the schemeCategory10 to an ordinal scale in D3 assigns distinct colors to each class applied,
in this case, the four regions in your dataset.
Figure 3.17. Utilizing the .unknown() method of an ordinal scale to serve back values for data that doesn’t have a corresponding
entry in the scale’s domain
Figure 3.18. Automatic quantizing linked with the ColorBrewer 3-red scale produces distinct visual categories in the red family.
Figure 3.19. Our graphical representations of each team now include a small PNG national flag, downloaded from Wikipedia and
loaded using an SVG <image> element.
Figure 3.20. The infobox is styled based on the defined style in CSS. It’s created by loading the HTML data from infobox.html
and adding it to the content of a newly created div.
Figure 3.21. An icon for a soccer ball created by James Zamyslianskyj and available at http://thenounproject.com/term/football/1907/
from The Noun Project
Figure 3.22. An SVG loaded using d3.html() that was created in Inkscape. It consists not only of the graphical <path> elements
that make up the SVG but also much data that’s often extraneous.
Figure 3.23. A hand-drawn soccer ball icon is loaded onto the <svg> canvas, along with the other SVG and HTML elements we
created in our code.
Figure 3.24. Each <g> element has its own set of paths cloned as child nodes, resulting in soccer ball icons overlaid on each
element.
Figure 3.25. Football icons with a fill and stroke set by D3
Figure 3.26. The paths now have the data from their parent element bound to them and respond accordingly when a discrete color
scale based on region is applied.
Chapter 4. Chart components
Figure 4.1. The charts we’ll create in this chapter using D3 generators and components. From left to right: a line chart,
a boxplot, and a streamgraph.
Figure 4.2. The three main types of functions found in D3 can be classified as generators, components, and layouts. You’ll
see components and generators in this chapter and layouts in the next chapter.
Figure 4.3. Circle positions indicate the number of friends and the array position of each datapoint.
Figure 4.4. Any point closer to the bottom has more friends, and any point closer to the right has a higher salary. But that’s
not clear at all without labels, which we’re going to make.
Figure 4.5. Elements of an axis created from d3.axis are (1) a <path.domain> with a size equal to the extent of the axis, (2)
a <g.tick> that contains a <line> and (3) a <text> for each tick. Not shown, because it’s invisible, is the <g> element that’s
called, and in which these elements are created. By default, a path like the domain is filled with black (this figure shows that
fill area in purple), but the axis components have some default styles built in to prevent this. SVG line elements don’t have
stroke by default, but the elements created by D3 axes also have default styles in place to make them visible.
Figure 4.6. Default styles for an axis display the ticks and don’t fill the domain area.
Figure 4.7. With CSS settings corresponding to the tick <line> elements, we can draw a rather attractive grid based on our
two axes.
Figure 4.8. The DOM shows how tick <line> elements are appended along with a <text> element for the label to one of a set
of <g.tick.major> elements corresponding to the number of ticks.
Figure 4.9. A box from a boxplot consists of five pieces of information encoded in a single shape: (1) the maximum value,
(2) the high value of some distribution, such as the third quartile, (3) the median or mean value, (4) the corresponding low
value of the distribution, such as the first quartile, and (5) the minimum value.
Figure 4.10. The median age of visitors (y-axis) by day of the week (x-axis) as represented by a scatterplot. It shows a slight
dip in age on the second and third days.
Figure 4.11. The <rect> elements represent the scaled range of the first and third quartiles of visitor age. They’re placed
on top of a gray <circle> in each <g> element, which is placed on the chart at the median age. The rectangles are drawn, as
per SVG convention, from the <g> down and to the right.
Figure 4.12. The <rect> elements are now properly placed so that their top and bottom correspond with the visitor age between
the first and third quartiles of visitors for each day. The circles are completely covered, except for the second rectangle
where the first quartile value is the same as the median age, so we can see half the gray circle peeking out from underneath
it.
Figure 4.13. How a boxplot can be drawn in D3. Pay particular attention to the relative positioning necessary to draw child
elements of a <g>. The 0 positions for all elements are where the parent <g> has been placed, so that <line.max>, <rect.distribution>,
and <line.range> all need to be drawn with an offset placing their top-left corner above this center, whereas <line.min> is
drawn below the center and <line.median> has a 0 y-value, because our center is the median value.
Figure 4.14. Our final boxplot chart. Each day now shows not only the median age of visitors but also the range of visiting
ages, allowing for a more extensive examination of the demographics of site visitorship.
Figure 4.15. A scatterplot showing the datapoints for 10 days of activity on Twitter, with the number of tweets in blue, the
number of retweets in green, and the number of favorites in orange.
Figure 4.16. The
line generator takes the entire dataset and draws a line where the x,y position of every point on the canvas is based on its
accessor. In this case, each point on the line corresponds to the day, and tweets are scaled to fit the x and y scales we
created to display the data on the canvas.
Figure 4.17. The dataset is first used to draw a set of circles, which creates the scatterplot from the beginning of this
section. The dataset is then used three more times to draw each line.
Figure 4.18. Three common curve methods you’ll see in charts. Orange is a “basis” interpolation that provides an organic curve
averaged by the points (and therefore rarely touching them); a blue “step” interpolation changes the position of the line
at right angles; and a green “cardinal” interpolation provides a curve that touches each sample point.
Figure 4.19. Behold the glory of the streamgraph. Look on my works, ye mighty, and despair! (Figure by Pitch Interactive from
Wired, November 1, 2010, www.wired.com/2010/11/ff_311_new_york/all/1.) (Wesley Grubbs/WIRED © Condé Nast)
Figure 4.20. Each movie column is drawn as a separate line. Notice how the “cardinal” interpolation creates a graphical artifact,
where it seems several movies made negative money.
Figure 4.21. By using an area generator and defining the bottom of the area as the inverse of the top, we can mirror our lines
to create an area chart. Here they’re drawn with semitransparent fills, so that we can see how they overlap.
Figure 4.22. Our stacked area code represents a movie by drawing an area, where the bottom of that area equals the total amount
of money made by any movies drawn earlier for that day.
Figure 4.23. Our stacked chart with a legend telling the reader which color corresponds to which movie
Figure 4.24. A horizontally oriented colorLegend from d3-svg-legend rendered with custom settings for shapePadding, shapeWidth,
and shapeHeight.
Chapter 5. Layouts
Figure 5.1. Multiple layouts are demonstrated in this chapter, including the circle pack (section 5.3), tree (section 5.4),
stack (section 5.5), and Sankey (section 5.6.1), as well as tweening to properly animate shapes like the arcs in pie charts
(section 5.2.3).
Figure 5.2. The histogram in its initial state before we change the measure from favorites to retweets by clicking on one
of the bars
Figure 5.3. The processed data from d3.histogram returns an array where each array item also has an x0 and x1 field.
Figure 5.4. The histogram chart we’ve built will make an animated transition to display tweets binned by the number of retweets
instead of the number of favorites.
Figure 5.5. Three violin plots based on the data produced by d3.histogram
Figure 5.6. The traditional pie chart (bottom right) represents proportion as an angled slice of a circle. With slight modification,
it can be turned into a donut or ring chart (top) or an exploded pie chart (bottom left).
Figure 5.7. A pie layout applied to an array of [1,1,2] shows objects created with a start angle, end angle, and value attribute
corresponding to the dataset, as well as the original data, which in this case is a number.
Figure 5.8. A pie chart showing three pie pieces that subdivide the circle between the values in the array [1,1,2]
Figure 5.9. A donut chart showing the number of tweets from our four users represented in the nestedTweets dataset.
Figure 5.10. The pie charts representing, on the left, the total number of favorites and, on the right, the total number of
retweets
Figure 5.11. Snapshots of the transition of the pie chart representing the number of tweets to the number of favorites. This
transition highlights the need to assign key values for data binding and to use tweens for some types of graphical transition,
such as that used for arcs.
Figure 5.12. The streamgraph by Pitch Interactive used in a Wired piece describing the subject of calls to 311 (a city service
for reporting problems) in New York (November 1, 2010; https://www.wired.com/2010/11/ff_311_new_york/all/1)
Figure 5.13. The stack layout default settings, when tied to an area generator, produce a stacked area chart like this one.
Figure 5.14. The streamgraph effect from a stack layout with basis interpolation for the areas and using the silhouette and
inside-out settings for the stack layout. This is similar to our hand-built example from chapter 4 and shows the same graphical
artifacts from the basis interpolation.
Figure 5.15. A stacked bar chart using the stack layout to determine the position of the rectangles that make up each day’s
stacked bar
Figure 5.16. Google Analytics uses Sankey diagrams to chart event and user flow for website visitors.
Figure 5.17. A Sankey diagram where the number of visitors is represented in the color of the path. The flow between index
and contact has an increased opacity as the result of a mouseover event.
Figure 5.18. A squid-like Sankey diagram
Figure 5.19. The Sankey layout algorithm attempts to optimize the positioning of nodes to reduce overlap. The chart reflects
the position of nodes after (from left to right) 1 pass, 20 passes, 100 passes, and 200 passes.
Figure 5.20. A word or tag cloud uses the size of a word to indicate its importance or frequency in a text, creating a visual
summary of text. These word clouds were created by the popular online word cloud generator Wordle (www.wordle.net).
Figure 5.21. A word cloud with words that are arranged horizontally
Figure 5.22. A word cloud using the same worddata.csv but with words slightly perturbed by randomizing the rotation property
of each word.
Figure 5.23. This word cloud highlights keywords and places longer words horizontally and shorter words vertically.
Chapter 6. Hierarchical visualization
Figure 6.1. Some of the hierarchical diagrams we’ll look at in this chapter. The dendrogram (left), the icicle chart (middle),
and a treemap (right) showing a radial projection that’s popular with hierarchical diagrams.
Figure 6.2. Typical A/B testing results in a tabular view, showing several metrics and the change versus the control cell.
Positive changes are denoted with a plus symbol, and statistically significant changes are shown with green for a statistically
significant positive change and orange for a statistically significant negative change.
Figure 6.3. A circle pack of A/B testing results, showing the nested results by cell, metric, subscription level, and country.
Green shows statistically significant wins, and orange shows statistically significant losses.
Figure 6.4. Hierarchical viz of A/B testing results ordered by cell, country, subscription, and then metric
Figure 6.5. Pack layouts are useful for representing nested data. They can be flattened (top) or they can visually represent
hierarchy (bottom).
Figure 6.6. Each tweet is represented by a green circle nested inside an orange circle that represents the user who made the
tweet. One of those green circles is exactly the same size as its parent orange circle, which we address below. The users
are all nested inside a blue circle that represents our “root” node.
Figure 6.7. An example of a fixed margin based on hierarchical depth. We can create this by reducing the circle size of each
node based on its computed depth value.
Figure 6.8. A circle-packing layout with the size of the leaf nodes set to the impact factor of those nodes
Figure 6.9. Tree layouts are another useful method for expressing hierarchical relationships and are often laid out vertically
(top), horizontally (middle), or radially (bottom). (Examples from Mike Bostock at d3js.org.)
Figure 6.10. A dendrogram laid out vertically using data from tweets.json. The level 0 “root” node (which we created to contain
the users) is in blue, the level 1 nodes (which represent users) are in orange, and the level 2 “leaf” nodes (which represent
tweets) are in green.
Figure 6.11. A dendrogram with labels for each of the nodes
Figure 6.12. The same dendrogram as figure 6.11 but laid out horizontally
Figure 6.13. The same dendrogram laid out in a radial manner.
Figure 6.14. Example of using a dendrogram in a word tree by Jason Davies (www.jasondavies.com/wordtree/).
Figure 6.15. The same dataset rendered using d3.tree (left) and d3.cluster (right)
Figure 6.16. A partition layout of our data, showing tweets at the bottom in green, sized by “impact” with users in orange
sized by the total impact of their tweets and the root node (in this case “All Tweets”) in blue.
Figure 6.17. Icicle charts look like melting icicles hanging from the gutter when you have hierarchical data of uneven depth,
as you have in this example.
Figure 6.18. The most popular blocks on October 20, 2016, which include not one or two but four different sunburst diagrams
Figure 6.19. A sunburst version of our nested tweets.
Figure 6.20. An example of d3-flame-graph, which implements the flame graph first developed by Brendan Gregg. Note that the
value of the children (in this case the higher bars) often adds up to less than the value of the parents.
Figure 6.21. A treemap without padding will only show the leaf nodes.
Figure 6.22. A treemap with the padding method set. Notice that padding determines the space between children and not siblings.
Figure 6.23. The “zoomed in” view of our treemap, showing only the leaf nodes in one of the intermediary node views. Note
that the recalculated treemap has adjusted the padding because the orange node is now the root node.
Figure 6.24. A radial treemap accomplished by taking the drawing instructions from d3.treemap and using them to draw paths
using d3.arc instead of svg:rect elements.
Chapter 7. Network visualization
Figure 7.1. Along with explaining the basics of network analysis (section 7.2.3), this chapter includes laying out networks
using xy positioning (section 7.2.5), force-directed algorithms (section 7.2), adjacency matrices (section 7.1.2), and arc
diagrams (section 7.1.3).
Figure 7.2. Some basic kinds of network connections (directed, reciprocated, and undirected) that show up in basic networks
like simple directed and undirected networks
Figure 7.3. How edges are described graphically in an adjacency matrix. In this kind of diagram, the nodes are listed on the
axes as columns, and a connection is indicated by a shaded cell where those columns intersect.
Figure 7.4. The array of connections we’re building. Notice that every possible connection is stored in the array. Only those
connections that exist in our dataset have a weight value other than 0. Also note that our CSV import creates the weight value
as a string.
Figure 7.5. A weighted, directed adjacency matrix where lighter orange indicates weaker connections and darker orange indicates
stronger connections. The source is on the y-axis, and the target is on the x-axis. The matrix shows that Sarah, Nadieh, and
Hajra didn’t give anyone feedback, whereas Kai gave Susie feedback, and Susie gave Kai feedback (what we call a reciprocated
tie in network analysis).
Figure 7.6. Adjacency highlighting of the column and row of the grid square. In this instance, the mouse is over the Erik-to-Kai
edge, and as a result highlights the Erik row and the Kai column. You can see that Erik gave feedback to three people, whereas
Kai received feedback from four people.
Figure 7.7. The components of an arc diagram are circles for nodes and arcs for connections, with nodes laid out along a baseline
and the location of the arc relative to that baseline indicative of the direction of the connection.
Figure 7.8. An arc diagram, with connections between nodes represented as arcs above and below the nodes. We can see the first
(left) two nodes have no outgoing links, and the rightmost three nodes also have no outgoing links. The length of the arcs
is meaningless and based on how we’ve laid the nodes out (nodes that are far away will have longer links), but the width of
the arcs is based on the weight of the connection.
Figure 7.9. Mouseover behavior on edges (right), with the edge being moused over in orange, the source node in light green,
and the target node in dark green. Mouseover behavior on nodes (left), with the node being moused over in orange and the connected
edges in light orange.
Figure 7.10. The forces in a force-directed algorithm: attraction/repulsion, gravity, and link attraction. Other factors,
such as hierarchical packing and community detection, can also be factored into force-directed algorithms, but the aforementioned
features are the most common. Forces are approximated for larger networks to improve performance.
Figure 7.11. The results of a force simulation where the only force acting on the nodes is attraction
Figure 7.12. Our sample node data laid out with collision detection. This is one way to create a simple bubble chart.
Figure 7.13. A beeswarm plot created with our code (rotated to better fit on the page)
Figure 7.14. A force-directed layout based on our dataset and organized graphically using default settings in the force layout.
Managers are in orange, employees green, and contractors purple.
Figure 7.15. Edges now display markers (arrowheads) indicating the direction of connection. Notice that all the arrowheads
are the same size. You can control the color of the arrowheads by using CSS rules such as marker > path {fill: #93C464;}.
Figure 7.16. The same network measured using degree centrality (top left), closeness centrality (top right), eigenvector centrality
(bottom left), and betweenness centrality (bottom right). We’ll only see degree centrality, but you can explore the others
with libraries like jsnetworkx.js. More-central nodes are larger and bright red, whereas less-central nodes are smaller and
gray. Notice that although some nodes are central according to all measures, their relative centrality varies, as does the
overall centrality of other nodes.
Figure 7.17. Sizing nodes by weight indicates the number of total connections for each node by setting the radius of the circle
equal to the weight times 2.
Figure 7.18. By basing the strength of the attraction between nodes on the strength of the connections between nodes, you
see a dramatic change in the structure of the network. The weaker connections between x and y allow that part of the network
to drift away.
Figure 7.19. The two nodes representing managers have been dragged to the top corners, allowing the rest of the nodes to take
their positions based on the forces of the simulation (being dragged toward the center along with being dragged toward the
fixed nodes).
Figure 7.20. The network has been filtered to only show nodes that are not managers or contractors. This figure catches two
processes in midstream, the transition of nodes from full to 0 opacity, and the removal of edges.
Figure 7.21. Network with a new edge added
Figure 7.22. Network with two new nodes added (Mike and Noah), both with links to Sam
Figure 7.23. When the network is represented as a scatterplot, the links increase the visual clutter. It provides a useful
contrast to the force-directed layout, but can be hard to read on its own.
Chapter 8. Geospatial information visualization
Figure 8.1. Mapping with D3 takes many forms and offers many options, including topology operations like merging and finding
neighbors (section 8.4), globes (section 8.3.1), spatial calculations (section 8.1.4), and data-driven maps (section 8.1)
using novel projections (section 8.1.3).
Figure 8.2. A polygon drawn at the coordinates [–74.0479, 40.8820], [–73.9067, 40.8820], [–73.9067, 40.6829], and [–74.0479,
40.6829].
Figure 8.3. A map of the world using the default settings for D3’s Mercator projection. You can see most of the Western Hemisphere
and some of Europe and Africa, but the rest of the world is rendered out of sight.
Figure 8.4. The Mercator-projected world from our data now fitting our SVG area. Notice the enormous distortion in size of
regions near the poles, such as Greenland and Antarctica.
Figure 8.5. Our map with our eight world cities added to it. At this distance, you can’t tell how
inaccurate these points are, but if you zoom in, you see that both of our Italian cities are in the Mediterranean.
Figure 8.6. Mercator (left) dramatically distorts the size of Antarctica so much that no other shape looks as large. In comparison,
the Mollweide projection maintains the physical area of the countries and continents in your geodata, at the cost of distorting
their shape and angle. Notice that geo.path.area measures the graphical area and not the physical area of the features.
Figure 8.7. Your interactivity provides a bounding box around each country and a red circle representing its graphical center.
Here you see the bounding box and centroid of China. The D3 implementation of a centroid is weighted, so that it’s the center
of most area, and not only the center of the bounding box.
Figure 8.8. Our map with a graticule (in light gray) and a graticule outline (the black border around the edge of the map)
Figure 8.9. Our map with zooming enabled. Panning occurs with the drag behavior and zooming with mousewheel and/or double-clicking.
Notice that the bounding box and centroid functions still work because they’re based on our constantly updating projection.
Figure 8.10. Zoom buttons and the effect of clicking Zoom In twice. Because the zoom buttons modify the zoom behavior’s translate
and scale, any mouse interaction afterward reflects the updated settings.
Figure 8.11. An orthographic projection makes our map look like a globe. Notice that even though the paths for countries are
drawn over each other, they’re still drawn above the graticules. Also notice that although zooming in and out works, panning
doesn’t spin the globe but instead moves it around the canvas. The coloration of our countries is once again based on the
graphical size of the country.
Figure 8.12. A draggable globe that clips the cities based on whether they should be in view and recolors the countries based
on their displayed size
Figure 8.13. Our globe with countries colored by their geographic area, rather than their graphical area
Figure 8.14. A satellite projection of data from the Middle East facing Europe
Figure 8.15. Arcs making up the counties of California and Nevada and neighboring states. You can see that the arcs are split
whenever there is a possibility that they could be used in a different polygon. As a result, the 17 arcs making up the border
of California and Nevada are used not only in the polygons making California and Nevada but also the polygons making their
counties. Because the dataset knows the arcs are shared, it can easily derive neighbors.
Figure 8.16. TopoJSON data formatted using topojson.feature(). The data is an array of objects, and it represents geometry
as an array of coordinates like the features that come out of a GeoJSON file.
Figure 8.17. The results of merging based on the centroid of a feature. The feature in gray is a single merged feature made
up of many separate polygons.
Figure 8.18. By adjusting the merge settings, we can create something like northern and eastern hemispheres as merged features.
Notice that because this is based on a centroid, we can see at the bottom a piece of Eastern Russia as part of our merged
feature, along with Antarctica.
Figure 8.19. Hover behavior displaying the neighbors of France using TopoJSON’s neighbor function. Because French Guiana is an overseas
department of France, France is considered to be neighbors with Brazil and Suriname. This is because France is represented
as a multipolygon in the data, and any neighbors with any of its shapes are returned as neighbors.
Figure 8.20. An example of hexbinning by Mike Bostock showing the locations of Walmart stores in the United States (available
at http://bl.ocks.org/mbostock/4330486).
Figure 8.21. An example of a Voronoi diagram used to split the United States into polygons based on the closest state capital
(available at www.jasondavies.com/maps/voronoi/us-capitals/).
Chapter 9. Interactive applications with React and D3
Figure 9.1. Throughout this chapter, we’ll build toward this fully operational data dashboard, first creating the individual
chart elements (section 9.1), then adding interactivity (section 9.2), and finally adding a brush to filter the data (section
9.3).
Figure 9.2. A sketch of a dashboard, showing a map, bar chart, and stacked area chart that display our data
Figure 9.3. The default page that create-react-app deploys with
Figure 9.4. Your first React + D3 app, with a simple bar chart rendered in your app
Figure 9.5. The basic map we saw in chapter 8 but now rendered via React and JSX with D3 providing the drawing instructions
Figure 9.6. Our rendered WorldMap component, with countries colored by launch day
Figure 9.7. A rudimentary dashboard with two views into the data. The bars are ordered by our fake “Launch Day,” and sometimes
the randomized data shows interesting patterns like the dark green bar showing a higher total sales than countries that launched
10 days earlier.
Figure 9.8. A typical dashboard, with three views into the same dataset. In this case, it shows MatFlicks launch waves geographically
while summing up sales in the bar chart and showing sales over time in the streamgraph.
Figure 9.9. The same dashboard on a large screen and a small screen
Figure 9.10. Our MatFlicks table mat rollout dashboard, now with a legend to show your users which countries
are in which wave of launches
Figure 9.11. Canada as our cross-highlighting example. It was earlier in the alphabet, and as a result millions of Canadians
were enjoying high-quality European mats long before citizens of the United States could.
Figure 9.12. Components of a brush
Figure 9.13. Here’s our brush, though without any function associated with the brush events, so it’s little more than a toy.
A boring, boring toy.
Figure 9.14. The results of our brushed() function showing only wave 1 and 2, then wave 3, and finally wave 4 countries.
Figure 9.15. Our final dashboard, showing a statline at the top indicating the number of countries we’ve selected out of the
total number of countries in the data as well as the average sales of the selected countries compared to the average sales
overall. Here it’s resized to be smaller and because we don’t resize the map, we only see North America.
Chapter 10. Writing layouts and components
Figure 10.1. The results of our makeAGrid function that uses our new d3.gridLayout to arrange the data in a grid. In this
case, our data consists of employees that are each represented as a green circle laid out on a grid and sized by salary.
Figure 10.2. The grid layout has automatically adjusted to the size of our new dataset. Notice that our new elements are above
the old elements, but our layout has changed in size from a 4 x 4 grid to a 5 x 5 grid, causing the old elements to move to
their newly calculated position.
Figure 10.3. The grid layout run with a 400 x 400 size setting
Figure 10.4. The grid layout run in a 200 x 200 size (left), a 400 x 200 size (center), and a 200 x 400 size (right)
Figure 10.5. The three states of the grid layout using rectangles for the grid cells
Figure 10.6. The countries of the world as a grid
Figure 10.7. Circles representing countries colored by area
Figure 10.8. The new legend component, when called by a <g> element placed below our grid, creates five red rectangles.
Figure 10.9. The updated legend component is automatically created, with a <rect> element for each band in the quantize scale
that’s colored according to that band’s color.
Figure 10.10. The legendOver behavior highlights circles falling in a particular band and deemphasizes the circles not in
that band by making them transparent.
Figure 10.11. Our legend with rudimentary labels
Figure 10.12. Our legend with title, unit labels, appropriate number formatting, and additional graphical elements to highlight
the breakpoints
Chapter 11. Mixed mode rendering
Figure 11.1. This chapter focuses on optimization techniques such as using canvas drawing to render large datasets in tandem
with SVG for the interactive elements. This is demonstrated with maps (section 11.1), networks (section 11.2), and traditional
xy data (section 11.3), which uses the D3 quadtree function (section 11.3.2).
Figure 11.2. Violin plots drawn using canvas. You can see that they’re more pixelated.
Figure 11.3. Two zoomed-in shapes, one rendered in SVG (left) and one rendered with canvas (right)
Figure 11.4. Drawing random triangles on a map entirely with SVG
Figure 11.5. Zooming in on the sample geodata around East Asia and Oceania
Figure 11.6. Drawing our map with canvas produces higher performance, but slightly less crisp graphics. On the left, it may
seem like the triangles are as smoothly rendered as the earlier SVG triangles, but if you zoom in as we’ve done on the right,
you can start to see clearly the slightly pixelated canvas rendering.
Figure 11.7. Placing interactive SVG elements below a <canvas> element requires that you set its pointer-events style to none,
even if it has a transparent background, in order to register click events on the <svg> element underneath it.
Figure 11.8. Background countries are drawn with canvas, while foreground triangles are drawn with SVG to use interactivity.
SVG graphics are individual elements in the DOM and are therefore amenable to having click, mouseover, and other event listeners
attached to them.
Figure 11.9. The same randomly generated triangles rendered in SVG while the map isn’t being zoomed or panned (left) and in
canvas while the map is being zoomed or panned (right). Notice that only the SVG triangles have different fill values based
on user interaction, because that isn’t factored into the canvas drawing code for the triangles on the right.
Figure 11.10. A network of D3 examples hosted on gist.github.com that connects different examples to each other by shared
functions. Here you can see that the example “Bivariate Hexbin Map” by Mike Bostock (http://bl.ocks.org/mbostock/4330486)
shares functions in common with three different examples: Metropolitan Unemployment, Marey’s Trains II, and GitHub Users Worldwide.
The brush and axis components allow you to filter the network by the number of connections from one block to another.
Figure 11.11. A randomly generated network with 3,000 nodes and 1,000 edges
Figure 11.12. A large network drawn with SVG nodes and canvas links
Figure 11.13. 3,000 randomly placed points represented by orange SVG <circle> elements
Figure 11.14. Highlighting points in a selected region
Figure 11.15. A quadtree for points shown in red with quadrant regions stroked in black. Notice how clusters of points correspond
to subdivision of regions of the quadtree. Every point falls in only one region, but each region is nested in several levels
of parent regions.
Figure 11.16. Quadtree-optimized selection used with a dataset of 10,000 points
Figure 11.17. The test to see whether a quadtree node is outside a brush selection involves four tests to see if it’s above,
left, right, or below the selection area. If any of these tests is true, the quadtree will stop searching that node’s child
nodes.