D3.js in Action, Second Edition: Data visualization with JavaScript

List of Figures

Chapter 1. An introduction to D3.js

Figure 1.1. A map of how to approach data visualization with D3.js that highlights the approach in this book. Start at the top with data and then follow the path depending on the type of data and the needs you’re addressing.

Figure 1.2. D3 can be used for simple charts, such as this donut chart (explained in chapter 5).

Figure 1.3. D3 can also be used to create web maps (see chapter 8), such as this map showing the ethnic makeup of major metropolitan areas in the United States.

Figure 1.4. Maps in D3 aren’t limited to traditional Mercator web maps. They can be interactive globes, like this map of undersea communication cables, or other, more

Figure 1.5. D3 also provides robust capacities to create interactive network visualizations (see chapter 7). Here you see the social and coauthorship network of archaeologists working at the same dig for nearly 25 years.

Figure 1.6. D3 includes a library of common data visualization layouts, such as the dendrogram (explained in chapter 6), that let you represent data, such as this word tree.

Figure 1.7. D3 has SVG and canvas drawing functions (see chapter 4) so you can create your own custom visualizations, such as this representation of musical scores.

Figure 1.8. You can combine these layouts and functions to create a data dashboard like we’ll do in chapter 9. You can also use the drawing functions to make your bar charts look distinctive, such as this “sketchy” style.

Figure 1.9. An application created with D3 can use selections and data binding over and over again, together and separately, to update the content of the data visualization based on interaction.

Figure 1.10. Before GIFs were weaponized to share cute animal behavior, they were your only hope for animated data visualization on the web. Few examples from the 1990s like dpgraph.com still exist, but this page has more than enough GIFs to remind us of their dangers.

Figure 1.11. The developer tools in Chrome place the JavaScript console on the rightmost tab, labeled Console, with the element inspector available using the arrow in a rectangle (circled above) on the top left or by browsing the DOM in the Elements tab.

Figure 1.12. You can run JavaScript code in the console and call global variables or declare new ones as necessary. Any code you write in the console and changes made to the web page are lost as soon as you reload the page.

Figure 1.13. Rather than adding or modifying individual styles and attributes, you can rewrite the HTML code as you would in a text editor. As with any changes, these only last until you reload the page.

Figure 1.14. Changing the content of a DOM element is as simple as adding text between the opening and ending brackets of the element.

Figure 1.15. The page is updated as soon as you finish making your changes. Writing HTML manually in this way is only useful for planning how you might want to dynamically update the content.

Figure 1.16. The D3 select syntax modifies style using the .style() function, and traditional HTML content using the .html() function.

Figure 1.17. The commands to draw an SVG path (right) and the resulting graphic (left)

Figure 1.18. Inspecting the DOM of a web page with an SVG canvas reveals the nested graphical elements as well as the style and attributes that determine their position. Notice that the circle and rectangle exist as child elements of a group.

Figure 1.19. Modifying the height and width attributes of a <rect> element changes the appearance of that element. Inspecting the element also shows how the stroke adds to the computed size of the element.

Figure 1.20. The same 25 x 25 <rect> with no, 1-px, 2-px, 3-px, 4-px, and 5-px strokes. Though these are drawn on a retina screen using half-pixels, the second and third report the same width and height as each other (27 px x 27 px), as do the fourth and fifth (29 px x 29 px).

Figure 1.21. All SVG elements can be affected by the transform attribute, but this is particularly salient when working with <g> elements, which require this approach to adjust their position. The child elements are drawn by using the position of their parent <g> as their relative 0,0 position. The scale() setting in the transform attribute then affects the scale of any of the size and position attributes of the child elements.

Figure 1.22. Each path shown here uses the same coordinates in its d attribute, with the only differences between them being the presence or absence of the letter Z at the end of the text string defining the d attribute, the settings for fill and stroke, and the position via the transform attribute.

Figure 1.23. Examining an SVG rectangle in the console shows that it inherits its fill style from the CSS style applied to <rect> types and its stroke style from the .active class.

Figure 1.24. The SVG circle has its fill value set by its type in the style sheet, with its opacity set by its membership in the .tentative class and its stroke set by its membership in the .active class. Notice that the stroke settings from the .tentative class are overwritten by the stroke settings in the later declared .active class.

Figure 1.25. An SVG circle with fill style determined by its type and its opacity and stroke settings determined by its membership in the tentative class

Figure 1.26. By binding an array of four values to a selection of <div> elements on the page, the .enter() function created three new <div> elements to reflect the size mismatch between the data array and the selection.

Figure 1.27. Inspecting the DOM shows that the new <div> elements have been created with unformatted content followed by the child <span> element with style and content set by your code.

Figure 1.28. Running JavaScript in the console allows you to test your code. Here you’ve created a new array called smallerNumbers that consists of only three values, which you can then use as your data in a selection to update and create new elements.

Figure 1.29. Nested data represents parent/child relationships of objects, typically with each object having an array of child objects, and is represented in a number of forms, such as this dendrogram. Notice that each object can have only one parent.

Figure 1.30. Network data consists of objects and the connections between them. The objects are typically referred to as nodes or vertices, and the connections are referred to as edges or links. Networks are often represented using force-directed algorithms, such as the example here, that arrange the network in such a way as to pull connected nodes toward each other.

Figure 1.31. Geographic data stores the spatial geometry of objects, such as states. Each of the states in this image is represented as a separate feature with an array of values indicating its shape. Geographic data can also consist of points, such as for cities, or lines, such as for roads.

Figure 1.32. Using console.log(), you can test to see if an event is properly firing. Here you create a <div> and assign an onclick event handler using the .on() syntax. When you click that element and fire the event, the action is noted in the console.

Figure 1.33. The result of running listing 1.10 in the console is the creation of two circles, a line, and two text elements. The order in which these elements are drawn results in the first label covered by the circle drawn later.

Figure 1.34. Transition behavior when associated with a delay results in a pause before the application of the attribute or style.

Figure 1.35. Transition behavior when associated with position makes the shape graphically move to its new position over the course of the assigned duration. Because you used the same y position for both circles, the first circle moves down and the second circle moves up to the y position you set, which is between the two circles.

Chapter 2. Information visualization data flow

Figure 2.1. Examples from this chapter, including a diagram of how data-binding works (left) from section 2.3.3, a scatterplot with labels (center) from section 2.3, and the bar chart (right) we’ll build in section 2.2.

Figure 2.2. The data visualization process that we’ll explore in this chapter assumes we begin with a set of data and want to create (and update) an interactive or dynamic data visualization.

Figure 2.3. The first step in creating a data visualization is getting the data. You can do that by loading the file asynchronously using one of several D3 XHR functions, or you can import or include the data. If the data is fixed, then either way is suitable, but if you plan to replace your data source with a dynamic API call, then the XHR requests are the best approach.

Figure 2.4. After loading data, you need to make sure it’s formatted in such a way that it can be used to create graphics. This includes mapping the data to positions on the screen, colors that indicate quantity, or bins to nest the data visually.

Figure 2.5. Scales in D3 map one set of values (the domain) to another set of values (the range) in a relationship determined by the type of scale you create.

Figure 2.6. Scales can also be used to map numerical values to color bands, to make it easier to denote values using a color scale.

Figure 2.7. Quantile scales take a range of values and reassign them into a set of equally sized bins.

Figure 2.8. Objects nested into a new array are now child elements of a values array of newly created objects that have a key attribute set to the value used in the d3.nest.key function.

Figure 2.9. After formatting your data, you’ll need to measure it to ensure that the graphics you create are appropriately sized and positioned based on the parameters of the dataset. You’ll use d3.extent, d3.min, d3.mean, and d3.max all the time.

Figure 2.10. To create graphics in D3, you use selections that bind data to DOM elements.

Figure 2.11. When our selection binds the cities.csv data to our web page, it creates eight new divs, each of which is classed with "cities" and with content drawn from our data.

Figure 2.12. The default setting for any shape in SVG is black fill with no stroke, which makes it hard to tell when the shapes overlap each other.

Figure 2.13. By changing the opacity settings, you can see the overlapping rectangles.

Figure 2.14. SVG rectangles are drawn from top to bottom.

Figure 2.15. When we set the y position of the rectangle to the desired y position minus the height of the rectangle, the rectangle is drawn from bottom to top from that y position.

Figure 2.16. SVG shapes will continue to be drawn offscreen.

Figure 2.17. A bar chart drawn using a linear scale

Figure 2.18. The same bar chart from figure 2.17 drawn with a polylinear scale

Figure 2.19. A bar chart drawn with a linear scale where the maximum value in the domain is lower than the maximum value in the dataset

Figure 2.20. A bar chart drawn with values in the dataset greater than the maximum value of the domain of the scale, but with the clamp() function set to true

Figure 2.21. The cities.csv data drawn as a bar chart using the maximum value of the population attribute in the domain setting of the scale

Figure 2.22. By nesting data and counting the objects that are nested, we can create a bar chart out of hierarchical data.

Figure 2.23. Tweets are represented as circles sized by the total number of favorites and retweets and are placed on the canvas along the x-axis based on the time of the tweet and along the y-axis according to the same impact factor used to size the circles. Two tweets with the same impact factor that were made at nearly the same time are shown overlapping at the bottom left.

Figure 2.24. Selections where the number of DOM elements and number of values in an array don’t match will fire either an .enter() event or an .exit() event, depending on whether there are more or fewer data values than DOM elements, respectively. Update, in contrast, is not a function, and simply refers to when you update the data bound to the elements.

Figure 2.25. Each tweet is a <g> element with a circle and a label appended to it. The various tweets by Roy at 7 A.M. happen so close to each other that they’re difficult to label.

Figure 2.26. All elements corresponding to tweets that were not favorited and not retweeted were removed.

Chapter 3. Data-driven design and interaction

Figure 3.1. This chapter covers loading HTML from an external file and updating it (section 3.3.2), as well as loading external images for icons (section 3.3.1), animating transitions (section 3.2.2), and working with color (section 3.2.4).

Figure 3.2. Circles and labels created from a CSV representing 2014 World Cup statistics.

Figure 3.3. Buttons for each numerical attribute are appended to the controls div behind the viz div. When a button is clicked, the code runs buttonClick.

Figure 3.4. Our initial buttonClick function resizes the circles based on the numerical value of the associated attribute. The radius of each circle reflects the number of goals scored against each team, kept in the ga attribute of each datapoint.

Figure 3.5. Our initial highlightRegion function selects elements with the same region attribute and colors them orange, while coloring gray those that aren’t in the same region.

Figure 3.6. A screenshot of your data visualization in the middle of its initial drawing, showing the individual circles growing to an exaggerated size and then shrinking to their final size in the order in which they appear in the bound dataset.

Figure 3.7. The console results of inspecting a selected element, which show first the datapoint in the selection, then its position in the array, and then the SVG element itself.

Figure 3.8. The results of running the node function of a selection in the console, which is the DOM element itself—in this case, an SVG <circle> element.

Figure 3.9. The <text> element “Netherlands” is drawn at the same DOM level as the parent <g>, which, in this case, is behind the element to its right.

Figure 3.10. Re-appending the <g> element for Germany to the <svg> element moves it to the end of that DOM region and therefore it’s drawn above the other <g> elements.

Figure 3.11. Using the darker and brighter functions of a d3.rgb object in the highlighting function produces a darker version of the set color for teams from the same region and lighter colors for teams from different regions.

Figure 3.12. Color mixing between yellow and blue in the RGB (red-green-blue) scale results in muddy, grayish colors displayed for the values between yellow and blue.

Figure 3.13. Interpolation of yellow to blue based on hue, saturation, and lightness (HSL) results in a different set of intermediary colors from the same two starting values.

Figure 3.14. Interpolation of color based on hue, chroma, and luminosity (HCL) provides a different set of intermediary colors between yellow and blue.

Figure 3.15. Interpolation of color based on lightness and color-opponent space (known as LAB; L stands for lightness and A-B stands for the color-opponent space) provides yet another set of intermediary colors between yellow and blue.

Figure 3.16. Application of the schemeCategory10 to an ordinal scale in D3 assigns distinct colors to each class applied, in this case, the four regions in your dataset.

Figure 3.17. Utilizing the .unknown() method of an ordinal scale to serve back values for data that doesn’t have a corresponding entry in the scale’s domain

Figure 3.18. Automatic quantizing linked with the ColorBrewer 3-red scale produces distinct visual categories in the red family.

Figure 3.19. Our graphical representations of each team now include a small PNG national flag, downloaded from Wikipedia and loaded using an SVG <image> element.

Figure 3.20. The infobox is styled based on the defined style in CSS. It’s created by loading the HTML data from infobox.html and adding it to the content of a newly created div.

Figure 3.21. An icon for a soccer ball created by James Zamyslianskyj and available at http://thenounproject.com/term/football/1907/ from The Noun Project

Figure 3.22. An SVG loaded using d3.html() that was created in Inkscape. It consists not only of the graphical <path> elements that make up the SVG but also much data that’s often extraneous.

Figure 3.23. A hand-drawn soccer ball icon is loaded onto the <svg> canvas, along with the other SVG and HTML elements we created in our code.

Figure 3.24. Each <g> element has its own set of paths cloned as child nodes, resulting in soccer ball icons overlaid on each element.

Figure 3.25. Football icons with a fill and stroke set by D3

Figure 3.26. The paths now have the data from their parent element bound to them and respond accordingly when a discrete color scale based on region is applied.

Chapter 4. Chart components

Figure 4.1. The charts we’ll create in this chapter using D3 generators and components. From left to right: a line chart, a boxplot, and a streamgraph.

Figure 4.2. The three main types of functions found in D3 can be classified as generators, components, and layouts. You’ll see components and generators in this chapter and layouts in the next chapter.

Figure 4.3. Circle positions indicate the number of friends and the array position of each datapoint.

Figure 4.4. Any point closer to the bottom has more friends, and any point closer to the right has a higher salary. But that’s not clear at all without labels, which we’re going to make.

Figure 4.5. Elements of an axis created from d3.axis are (1) a <path.domain> with a size equal to the extent of the axis, (2) a <g.tick> that contains a <line>, and (3) a <text> for each tick. Not shown, because it’s invisible, is the <g> element that’s called, and in which these elements are created. By default, a path like the domain is filled with black (this figure shows that fill area in purple), but the axis components have some default styles built in to prevent this. SVG line elements don’t have a stroke by default, but the elements created by D3 axes also have default styles in place to make them visible.

Figure 4.6. Default styles for an axis display the ticks and don’t fill the domain area.

Figure 4.7. With CSS settings corresponding to the tick <line> elements, we can draw a rather attractive grid based on our two axes.

Figure 4.8. The DOM shows how tick <line> elements are appended along with a <text> element for the label to one of a set of <g.tick.major> elements corresponding to the number of ticks.

Figure 4.9. A box from a boxplot consists of five pieces of information encoded in a single shape: (1) the maximum value, (2) the high value of some distribution, such as the third quartile, (3) the median or mean value, (4) the corresponding low value of the distribution, such as the first quartile, and (5) the minimum value.

Figure 4.10. The median age of visitors (y-axis) by day of the week (x-axis) as represented by a scatterplot. It shows a slight dip in age on the second and third days.

Figure 4.11. The <rect> elements represent the scaled range of the first and third quartiles of visitor age. They’re placed on top of a gray <circle> in each <g> element, which is placed on the chart at the median age. The rectangles are drawn, as per SVG convention, from the <g> down and to the right.

Figure 4.12. The <rect> elements are now properly placed so that their top and bottom correspond with the visitor age between the first and third quartiles of visitors for each day. The circles are completely covered, except for the second rectangle where the first quartile value is the same as the median age, so we can see half the gray circle peeking out from underneath it.

Figure 4.13. How a boxplot can be drawn in D3. Pay particular attention to the relative positioning necessary to draw child elements of a <g>. The 0 positions for all elements are where the parent <g> has been placed, so that <line.max>, <rect.distribution>, and <line.range> all need to be drawn with an offset placing their top-left corner above this center, whereas <line.min> is drawn below the center and <line.median> has a 0 y-value, because our center is the median value.

Figure 4.14. Our final boxplot chart. Each day now shows not only the median age of visitors but also the range of visiting ages, allowing for a more extensive examination of the demographics of site visitorship.

Figure 4.15. A scatterplot showing the datapoints for 10 days of activity on Twitter, with the number of tweets in blue, the number of retweets in green, and the number of favorites in orange.

Figure 4.16. The line generator takes the entire dataset and draws a line where the x,y position of every point on the canvas is based on its accessor. In this case, each point on the line corresponds to the day, and tweets are scaled to fit the x and y scales we created to display the data on the canvas.

Figure 4.17. The dataset is first used to draw a set of circles, which creates the scatterplot from the beginning of this section. The dataset is then used three more times to draw each line.

Figure 4.18. Three common curve methods you’ll see in charts. Orange is a “basis” interpolation that provides an organic curve averaged by the points (and therefore rarely touching them); a blue “step” interpolation changes the position of the line at right angles; and a green “cardinal” interpolation provides a curve that touches each sample point.

Figure 4.19. Behold the glory of the streamgraph. Look on my works, ye mighty, and despair! (Figure by Pitch Interactive from Wired, November 1, 2010, www.wired.com/2010/11/ff_311_new_york/all/1.) (Wesley Grubbs/WIRED © Condé Nast)

Figure 4.20. Each movie column is drawn as a separate line. Notice how the “cardinal” interpolation creates a graphical artifact, where it seems several movies made negative money.

Figure 4.21. By using an area generator and defining the bottom of the area as the inverse of the top, we can mirror our lines to create an area chart. Here they’re drawn with semitransparent fills, so that we can see how they overlap.

Figure 4.22. Our stacked area code represents a movie by drawing an area, where the bottom of that area equals the total amount of money made by any movies drawn earlier for that day.

Figure 4.23. Our stacked chart with a legend telling the reader which color corresponds to which movie

Figure 4.24. A horizontally oriented colorLegend from d3-svg-legend rendered with custom settings for shapePadding, shapeWidth, and shapeHeight.

Chapter 5. Layouts

Figure 5.1. Multiple layouts are demonstrated in this chapter, including the circle pack (section 5.3), tree (section 5.4), stack (section 5.5), and Sankey (section 5.6.1), as well as tweening to properly animate shapes like the arcs in pie charts (section 5.2.3).

Figure 5.2. The histogram in its initial state before we change the measure from favorites to retweets by clicking on one of the bars

Figure 5.3. The processed data from d3.histogram returns an array where each array item also has an x0 and x1 field.

Figure 5.4. The histogram chart we’ve built will make an animated transition to display tweets binned by the number of retweets instead of the number of favorites.

Figure 5.5. Three violin plots based on the data produced by d3.histogram

Figure 5.6. The traditional pie chart (bottom right) represents proportion as an angled slice of a circle. With slight modification, it can be turned into a donut or ring chart (top) or an exploded pie chart (bottom left).

Figure 5.7. A pie layout applied to an array of [1,1,2] shows objects created with a start angle, end angle, and value attribute corresponding to the dataset, as well as the original data, which in this case is a number.

Figure 5.8. A pie chart showing three pie pieces that subdivide the circle between the values in the array [1,1,2]

Figure 5.9. A donut chart showing the number of tweets from our four users represented in the nestedTweets dataset.

Figure 5.10. The pie charts representing, on the left, the total number of favorites and, on the right, the total number of retweets

Figure 5.11. Snapshots of the transition of the pie chart representing the number of tweets to the number of favorites. This transition highlights the need to assign key values for data binding and to use tweens for some types of graphical transition, such as that used for arcs.

Figure 5.12. The streamgraph by Pitch Interactive used in a Wired piece describing the subject of calls to 311 (a city service for reporting problems) in New York (November 1, 2010; https://www.wired.com/2010/11/ff_311_new_york/all/1)

Figure 5.13. The stack layout default settings, when tied to an area generator, produce a stacked area chart like this one.

Figure 5.14. The streamgraph effect from a stack layout with basis interpolation for the areas and using the silhouette and inside-out settings for the stack layout. This is similar to our hand-built example from chapter 4 and shows the same graphical artifacts from the basis interpolation.

Figure 5.15. A stacked bar chart using the stack layout to determine the position of the rectangles that make up each day’s stacked bar

Figure 5.16. Google Analytics uses Sankey diagrams to chart event and user flow for website visitors.

Figure 5.17. A Sankey diagram where the number of visitors is represented in the color of the path. The flow between index and contact has an increased opacity as the result of a mouseover event.

Figure 5.18. A squid-like Sankey diagram

Figure 5.19. The Sankey layout algorithm attempts to optimize the positioning of nodes to reduce overlap. The chart reflects the position of nodes after (from left to right) 1 pass, 20 passes, 100 passes, and 200 passes.

Figure 5.20. A word or tag cloud uses the size of a word to indicate its importance or frequency in a text, creating a visual summary of text. These word clouds were created by the popular online word cloud generator Wordle (www.wordle.net).

Figure 5.21. A word cloud with words that are arranged horizontally

Figure 5.22. A word cloud using the same worddata.csv but with words slightly perturbed by randomizing the rotation property of each word.

Figure 5.23. This word cloud highlights keywords and places longer words horizontally and shorter words vertically.

Chapter 6. Hierarchical visualization

Figure 6.1. Some of the hierarchical diagrams we’ll look at in this chapter. The dendrogram (left), the icicle chart (middle), and a treemap (right) showing a radial projection that’s popular with hierarchical diagrams.

Figure 6.2. Typical A/B testing results in a tabular view, showing several metrics and the change versus the control cell. Positive changes are denoted with a plus symbol, and statistically significant changes are shown with green for a statistically significant positive change and orange for a statistically significant negative change.

Figure 6.3. A circle pack of A/B testing results, showing the nested results by cell, metric, subscription level, and country. Green shows statistically significant wins, and orange shows statistically significant losses.

Figure 6.4. Hierarchical viz of A/B testing results ordered by cell, country, subscription, and then metric

Figure 6.5. Pack layouts are useful for representing nested data. They can be flattened (top) or they can visually represent hierarchy (bottom).

Figure 6.6. Each tweet is represented by a green circle nested inside an orange circle that represents the user who made the tweet. One of those green circles is exactly the same size as its parent orange circle, which we address below. The users are all nested inside a blue circle that represents our “root” node.

Figure 6.7. An example of a fixed margin based on hierarchical depth. We can create this by reducing the circle size of each node based on its computed depth value.

Figure 6.8. A circle-packing layout with the size of the leaf nodes set to the impact factor of those nodes

Figure 6.9. Tree layouts are another useful method for expressing hierarchical relationships and are often laid out vertically (top), horizontally (middle), or radially (bottom). (Examples from Mike Bostock at d3js.org.)

Figure 6.10. A dendrogram laid out vertically using data from tweets.json. The level 0 “root” node (which we created to contain the users) is in blue, the level 1 nodes (which represent users) are in orange, and the level 2 “leaf” nodes (which represent tweets) are in green.

Figure 6.11. A dendrogram with labels for each of the nodes

Figure 6.12. The same dendrogram as figure 6.11 but laid out horizontally

Figure 6.13. The same dendrogram laid out in a radial manner.

Figure 6.14. Example of using a dendrogram in a word tree by Jason Davies (www.jasondavies.com/wordtree/).

Figure 6.15. The same dataset rendered using d3.tree (left) and d3.cluster (right)

Figure 6.16. A partition layout of our data, showing tweets at the bottom in green, sized by “impact” with users in orange sized by the total impact of their tweets and the root node (in this case “All Tweets”) in blue.

Figure 6.17. Icicle charts look like melting icicles hanging from the gutter when you have hierarchical data of uneven depth, as you have in this example.

Figure 6.18. The most popular blocks on October 20, 2016, which include not one or two but four different sunburst diagrams

Figure 6.19. A sunburst version of our nested tweets.

Figure 6.20. An example of d3-flame-graph, which implements the flame graph first developed by Brendan Gregg. Note that the value of the children (in this case the higher bars) often adds up to less than the value of the parents.

Figure 6.21. A treemap without padding will only show the leaf nodes.

Figure 6.22. A treemap with the padding method set. Notice that padding determines the space between children and not siblings.

Figure 6.23. The “zoomed in” view of our treemap, showing only the leaf nodes in one of the intermediary node views. Note that the recalculated treemap has adjusted the padding because the orange node is now the root node.

Figure 6.24. A radial treemap accomplished by taking the drawing instructions from d3.treemap and using them to draw paths using d3.arc instead of svg:rect elements.

Chapter 7. Network visualization

Figure 7.1. Along with explaining the basics of network analysis (section 7.2.3), this chapter includes laying out networks using xy positioning (section 7.2.5), force-directed algorithms (section 7.2), adjacency matrices (section 7.1.2), and arc diagrams (section 7.1.3).

Figure 7.2. Some basic kinds of network connections (directed, reciprocated, and undirected) that show up in basic networks like simple directed and undirected networks

Figure 7.3. How edges are described graphically in an adjacency matrix. In this kind of diagram, the nodes are listed on the axes as columns, and a connection is indicated by a shaded cell where those columns intersect.

Figure 7.4. The array of connections we’re building. Notice that every possible connection is stored in the array. Only those connections that exist in our dataset have a weight value other than 0. Also note that our CSV import creates the weight value as a string.

Figure 7.5. A weighted, directed adjacency matrix where lighter orange indicates weaker connections and darker orange indicates stronger connections. The source is on the y-axis, and the target is on the x-axis. The matrix shows that Sarah, Nadieh, and Hajra didn’t give anyone feedback, whereas Kai gave Susie feedback, and Susie gave Kai feedback (what we call a reciprocated tie in network analysis).

Figure 7.6. Adjacency highlighting of the column and row of the grid square. In this instance, the mouse is over the Erik-to-Kai edge, and as a result highlights the Erik row and the Kai column. You can see that Erik gave feedback to three people, whereas Kai received feedback from four people.

Figure 7.7. The components of an arc diagram are circles for nodes and arcs for connections, with nodes laid out along a baseline and the location of the arc relative to that baseline indicative of the direction of the connection.

Figure 7.8. An arc diagram, with connections between nodes represented as arcs above and below the nodes. We can see the first (left) two nodes have no outgoing links, and the rightmost three nodes also have no outgoing links. The length of the arcs is meaningless and based on how we’ve laid the nodes out (nodes that are far away will have longer links), but the width of the arcs is based on the weight of the connection.

Figure 7.9. Mouseover behavior on edges (right), with the edge being moused over in orange, the source node in light green, and the target node in dark green. Mouseover behavior on nodes (left), with the node being moused over in orange and the connected edges in light orange.

Figure 7.10. The forces in a force-directed algorithm: attraction/repulsion, gravity, and link attraction. Other factors, such as hierarchical packing and community detection, can also be factored into force-directed algorithms, but the aforementioned features are the most common. Forces are approximated for larger networks to improve performance.

Figure 7.11. The results of a force simulation where the only force acting on the nodes is attraction

Figure 7.12. Our sample node data laid out with collision detection. This is one way to create a simple bubble chart.

Figure 7.13. A beeswarm plot created with our code (rotated to better fit on the page)

Figure 7.14. A force-directed layout based on our dataset and organized graphically using default settings in the force layout. Managers are in orange, employees green, and contractors purple.

Figure 7.15. Edges now display markers (arrowheads) indicating the direction of connection. Notice that all the arrowheads are the same size. You can control the color of the arrowheads by using CSS rules such as marker > path {fill: #93C464;}.

Figure 7.16. The same network measured using degree centrality (top left), closeness centrality (top right), eigenvector centrality (bottom left), and betweenness centrality (bottom right). We’ll only see degree centrality, but you can explore the others with libraries like jsnetworkx.js. More-central nodes are larger and bright red, whereas less-central nodes are smaller and gray. Notice that although some nodes are central according to all measures, their relative centrality varies, as does the overall centrality of other nodes.

Figure 7.17. Sizing nodes by weight indicates the number of total connections for each node by setting the radius of the circle equal to the weight times 2.

Figure 7.18. By basing the strength of the attraction between nodes on the strength of the connections between nodes, you see a dramatic change in the structure of the network. The weaker connections between x and y allow that part of the network to drift away.

Figure 7.19. The two nodes representing managers have been dragged to the top corners, allowing the rest of the nodes to take their positions based on the forces of the simulation (being dragged toward the center along with being dragged toward the fixed nodes).

Figure 7.20. The network has been filtered to only show nodes that are not managers or contractors. This figure catches two processes in midstream, the transition of nodes from full to 0 opacity, and the removal of edges.

Figure 7.21. Network with a new edge added

Figure 7.22. Network with two new nodes added (Mike and Noah), both with links to Sam

Figure 7.23. When the network is represented as a scatterplot, the links increase the visual clutter. It provides a useful contrast to the force-directed layout, but can be hard to read on its own.

Chapter 8. Geospatial information visualization

Figure 8.1. Mapping with D3 takes many forms and offers many options, including topology operations like merging and finding neighbors (section 8.4), globes (section 8.3.1), spatial calculations (section 8.1.4), and data-driven maps (section 8.1) using novel projections (section 8.1.3).

Figure 8.2. A polygon drawn at the coordinates [–74.0479, 40.8820], [–73.9067, 40.8820], [–73.9067, 40.6829], and [–74.0479, 40.6829].

Figure 8.3. A map of the world using the default settings for D3’s Mercator projection. You can see most of the Western Hemisphere and some of Europe and Africa, but the rest of the world is rendered out of sight.

Figure 8.4. The Mercator-projected world from our data now fitting our SVG area. Notice the enormous distortion in size of regions near the poles, such as Greenland and Antarctica.

Figure 8.5. Our map with our eight world cities added to it. At this distance, you can’t tell how inaccurate these points are, but if you zoom in, you see that both of our Italian cities are in the Mediterranean.

Figure 8.6. Mercator (left) dramatically distorts the size of Antarctica so much that no other shape looks as large. In comparison, the Mollweide projection maintains the physical area of the countries and continents in your geodata, at the cost of distorting their shape and angle. Notice that geo.path.area measures the graphical area and not the physical area of the features.

Figure 8.7. Your interactivity provides a bounding box around each country and a red circle representing its graphical center. Here you see the bounding box and centroid of China. The D3 implementation of a centroid is weighted, so that it’s the center of most area, and not only the center of the bounding box.

Figure 8.8. Our map with a graticule (in light gray) and a graticule outline (the black border around the edge of the map)

Figure 8.9. Our map with zooming enabled. Panning occurs with the drag behavior and zooming with mousewheel and/or double-clicking. Notice that the bounding box and centroid functions still work because they’re based on our constantly updating projection.

Figure 8.10. Zoom buttons and the effect of clicking Zoom In twice. Because the zoom buttons modify the zoom behavior’s translate and scale, any mouse interaction afterward reflects the updated settings.

Figure 8.11. An orthographic projection makes our map look like a globe. Notice that even though the paths for countries are drawn over each other, they’re still drawn above the graticules. Also notice that although zooming in and out works, panning doesn’t spin the globe but instead moves it around the canvas. The coloration of our countries is once again based on the graphical size of the country.

Figure 8.12. A draggable globe that clips the cities based on whether they should be in view and recolors the countries based on their displayed size

Figure 8.13. Our globe with countries colored by their geographic area, rather than their graphical area

Figure 8.14. A satellite projection of data from the Middle East facing Europe

Figure 8.15. Arcs making up the counties of California and Nevada and neighboring states. You can see that the arcs are split whenever there is a possibility that they could be used in a different polygon. As a result, the 17 arcs making up the border of California and Nevada are used not only in the polygons making California and Nevada but also the polygons making their counties. Because the dataset knows the arcs are shared, it can easily derive neighbors.

Figure 8.16. TopoJSON data formatted using topojson.feature(). The data is an array of objects, and it represents geometry as an array of coordinates like the features that come out of a GeoJSON file.

Figure 8.17. The results of merging based on the centroid of a feature. The feature in gray is a single merged feature made up of many separate polygons.

Figure 8.18. By adjusting the merge settings, we can create something like northern and eastern hemispheres as merged features. Notice that because this is based on a centroid, we can see at the bottom a piece of Eastern Russia as part of our merged feature, along with Antarctica.

Figure 8.19. Hover behavior displaying the neighbors of France using TopoJSON’s neighbor function. Because French Guiana is an overseas department of France, France is considered to be neighbors with Brazil and Suriname. This is because France is represented as a multipolygon in the data, and any neighbors with any of its shapes are returned as neighbors.

Figure 8.20. An example of hexbinning by Mike Bostock showing the locations of Walmart stores in the United States (available at http://bl.ocks.org/mbostock/4330486).

Figure 8.21. An example of a Voronoi diagram used to split the United States into polygons based on the closest state capital (available at www.jasondavies.com/maps/voronoi/us-capitals/).

Chapter 9. Interactive applications with React and D3

Figure 9.1. Throughout this chapter, we’ll build toward this fully operational data dashboard, first creating the individual chart elements (section 9.1), then adding interactivity (section 9.2), and finally adding a brush to filter the data (section 9.3).

Figure 9.2. A sketch of a dashboard, showing a map, bar chart, and stacked area chart that display our data

Figure 9.3. The default page that create-react-app deploys with

Figure 9.4. Your first React + D3 app, with a simple bar chart rendered in your app

Figure 9.5. The basic map we saw in chapter 8 but now rendered via React and JSX with D3 providing the drawing instructions

Figure 9.6. Our rendered WorldMap component, with countries colored by launch day

Figure 9.7. A rudimentary dashboard with two views into the data. The bars are ordered by our fake “Launch Day,” and sometimes the randomized data shows interesting patterns, like the dark green bar showing higher total sales than countries that launched 10 days earlier.

Figure 9.8. A typical dashboard, with three views into the same dataset. In this case, it shows MatFlicks launch waves geographically while summing up sales in the bar chart and showing sales over time in the streamgraph.

Figure 9.9. The same dashboard on a large screen and a small screen

Figure 9.10. Our MatFlicks table mat rollout dashboard, now with a legend to show your users which countries are in which wave of launches

Figure 9.11. Canada as our cross-highlighting example. It was earlier in the alphabet, and as a result millions of Canadians were enjoying high-quality European mats long before citizens of the United States could.

Figure 9.12. Components of a brush

Figure 9.13. Here’s our brush, though without any function associated with the brush events, so it’s little more than a toy. A boring, boring toy.

Figure 9.14. The results of our brushed() function showing only waves 1 and 2, then wave 3, and finally wave 4 countries.

Figure 9.15. Our final dashboard, showing a statline at the top indicating the number of countries we’ve selected out of the total number of countries in the data, as well as the average sales of the selected countries compared to the average sales overall. Here it’s resized to be smaller, and because we don’t resize the map, we only see North America.

Chapter 10. Writing layouts and components

Figure 10.1. The results of our makeAGrid function that uses our new d3.gridLayout to arrange the data in a grid. In this case, our data consists of employees that are each represented as a green circle laid out on a grid and sized by salary.

Figure 10.2. The grid layout has automatically adjusted to the size of our new dataset. Notice that our new elements are above the old elements, but our layout has changed in size from a 4 x 4 grid to a 5 x 5 grid, causing the old elements to move to their newly calculated position.

Figure 10.3. The grid layout run with a 400 x 400 size setting

Figure 10.4. The grid layout run in a 200 x 200 size (left), a 400 x 200 size (center), and a 200 x 400 size (right)

Figure 10.5. The three states of the grid layout using rectangles for the grid cells

Figure 10.6. The countries of the world as a grid

Figure 10.7. Circles representing countries colored by area

Figure 10.8. The new legend component, when called by a <g> element placed below our grid, creates five red rectangles.

Figure 10.9. The updated legend component is automatically created, with a <rect> element for each band in the quantize scale that’s colored according to that band’s color.

Figure 10.10. The legendOver behavior highlights circles falling in a particular band and deemphasizes the circles not in that band by making them transparent.

Figure 10.11. Our legend with rudimentary labels

Figure 10.12. Our legend with title, unit labels, appropriate number formatting, and additional graphical elements to highlight the breakpoints

Chapter 11. Mixed mode rendering

Figure 11.1. This chapter focuses on optimization techniques such as using canvas drawing to render large datasets in tandem with SVG for the interactive elements. This is demonstrated with maps (section 11.1), networks (section 11.2), and traditional xy data (section 11.3), which uses the D3 quadtree function (section 11.3.2).

Figure 11.2. Violin plots drawn using canvas. You can see that they’re more pixelated.

Figure 11.3. Two zoomed-in shapes, one rendered in SVG (left) and one rendered with canvas (right)

Figure 11.4. Drawing random triangles on a map entirely with SVG

Figure 11.5. Zooming in on the sample geodata around East Asia and Oceania

Figure 11.6. Drawing our map with canvas produces higher performance, but slightly less crisp graphics. On the left, it may seem like the triangles are as smoothly rendered as the earlier SVG triangles, but if you zoom in as we’ve done on the right, you can start to see clearly the slightly pixelated canvas rendering.

Figure 11.7. Placing interactive SVG elements below a <canvas> element requires that you set its pointer-events style to none, even if it has a transparent background, in order to register click events on the <svg> element underneath it.

Figure 11.8. Background countries are drawn with canvas, while foreground triangles are drawn with SVG to use interactivity. SVG graphics are individual elements in the DOM and are therefore amenable to having click, mouseover, and other event listeners attached to them.

Figure 11.9. The same randomly generated triangles rendered in SVG while the map isn’t being zoomed or panned (left) and in canvas while the map is being zoomed or panned (right). Notice that only the SVG triangles have different fill values based on user interaction, because that isn’t factored into the canvas drawing code for the triangles on the right.

Figure 11.10. A network of D3 examples hosted on gist.github.com that connects different examples to each other by shared functions. Here you can see that the example “Bivariate Hexbin Map” by Mike Bostock (http://bl.ocks.org/mbostock/4330486) shares functions in common with three different examples: Metropolitan Unemployment, Marey’s Trains II, and GitHub Users Worldwide. The brush and axis components allow you to filter the network by the number of connections from one block to another.

Figure 11.11. A randomly generated network with 3,000 nodes and 1,000 edges

Figure 11.12. A large network drawn with SVG nodes and canvas links

Figure 11.13. 3,000 randomly placed points represented by orange SVG <circle> elements

Figure 11.14. Highlighting points in a selected region

Figure 11.15. A quadtree for points shown in red with quadrant regions stroked in black. Notice how clusters of points correspond to subdivision of regions of the quadtree. Every point falls in only one region, but each region is nested in several levels of parent regions.

Figure 11.16. Quadtree-optimized selection used with a dataset of 10,000 points

Figure 11.17. The test to see whether a quadtree node is outside a brush selection involves four tests to see if it’s above, left, right, or below the selection area. If any of these tests returns true, the quadtree will stop searching that node’s child nodes.
