Chapter 10. Vision: Are You Looking at Me?

Figure 10.1
Now that we’ve discussed how to conduct contextual interviews and observe people as they’re interacting with a product or service, I want to think about how those interviews can provide important clues for each of the Six Minds.
I’d like to start by looking at this from a vision/attention perspective. In considering vision, we’re seeking to answer these questions:
In this chapter, we’ll discuss not only where customers look and what they expect to see when they look there, but also what this data suggests about what is visually salient to them. We’ll consider whether users are finding what they are hoping to, what their frame of reference is, and what their goals might be.
Where are their eyes: Eye-tracking can tell you some things, but not everything
When it comes to improving interfaces or services, we start with where participants are actually looking. If we’re talking about an interface, where are users looking on the screen? Or where are they looking within an app?
Eye-tracking devices and digital heat maps come in handy for this type of analysis, helping us see where our users are looking. This sort of analysis can help us adjust placement of our content on a page.

Figure 10.2: Moderating contextual interview
But you don’t always need eye-tracking if you use good old-fashioned observation methods like those we discussed in the previous chapter. When I’m conducting a contextual interview, I try to set myself up at 90 degrees to the participant (so that I’m a little bit behind them without creeping them out) for several reasons:
Speaking of where people’s eyes are, I’d like to show you a representation of what your eyes use to draw attention to the next location in space.

Figure 10.3: What your visual attention system sees from an image.
The image above shows two screens side-by-side from an electronics company — blurred out a bit, with the color toned down. This is the type of representation your visual system uses to determine where to look next.
In the image on the left, there are four watches, with two buttons below each watch. Though you can tell these are buttons, it’s not clear from the visual features at this level of representation which is the “buy” button and which is the “save for later” button. The latter should appear as a secondary button, yet it currently draws just as much attention as the “buy” button. That’s something we would work with a graphic designer to adjust.
Similarly, on the right panel (the checkout screen), the site showed several buttons for things like commenting, checking on shipping status, and actually making the purchase. With the picture grayed out, you can see how incredibly subtle these buttons were and how little variation there was between them. By blurring images of your designs and toning down the color, you can get a good sense of how easily your users will find things.
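If you'd like to try the blur-and-tone-down trick yourself, here is a minimal sketch in Python. It is only an illustration, not the tool used to produce the figure above: the function names (`desaturate`, `box_blur`, `squint_view`), the simple box blur, the luma weights, and the default settings are all my own assumptions. It treats an image as a plain grid of (red, green, blue) values so that it runs without any extra libraries; for real screenshots you would more likely reach for an image editor or an imaging library.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
Image = List[List[Pixel]]      # rows of pixels, top to bottom


def desaturate(pixel: Pixel, amount: float) -> Pixel:
    """Blend a pixel toward gray. amount=0 keeps the color, amount=1 is fully gray."""
    r, g, b = pixel
    gray = 0.299 * r + 0.587 * g + 0.114 * b  # commonly used luma weights (assumption)
    return tuple(round(c + (gray - c) * amount) for c in (r, g, b))


def box_blur(image: Image, radius: int) -> Image:
    """Replace each pixel with the average of its neighbors within `radius`."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Neighborhood is clipped at the image edges.
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            count = len(ys) * len(xs)
            sums = [0, 0, 0]
            for yy in ys:
                for xx in xs:
                    for i in range(3):
                        sums[i] += image[yy][xx][i]
            row.append(tuple(round(s / count) for s in sums))
        blurred.append(row)
    return blurred


def squint_view(image: Image, radius: int = 2, amount: float = 0.7) -> Image:
    """Blur first, then tone down the color (illustrative defaults)."""
    return [[desaturate(p, amount) for p in row] for row in box_blur(image, radius)]
```

After running a screenshot through something like `squint_view`, compare the regions where the primary and secondary buttons sit. If they look about the same, your users' visual systems probably can't tell them apart at a glance either.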
What lens must they be looking through to see that?
We’ve talked about the bottom-up drivers of attention, like visual features of a scene that are unique: unusual sizes, areas of higher visual contrast, distinct colors, large images, and other features that draw people’s attention. The second step of the visual analysis employs a top-down approach. Here, you should consider not only what users are seeing, but what they’re actually seeking, attending to, processing, and perceiving.
Case Study: Security Department
Challenge: Even though many of my examples are of digital interfaces, we as designers also need to be thinking about attention more broadly. In this case, I worked with a group of people with an enormous responsibility: monitoring security for a football stadium-sized organization (and/or an actual stadium).
Their attention was divided in so many ways. Here are all the systems and tools (along with their respective numerous alerts, bells, and beeping sounds) they monitored at any given time:
If you’re impressed that anyone could get work done in such a busy environment, you’re not alone; I was shocked (and a bit skeptical of whether all these noisy systems were helping or hurting their productivity). Here was an amazing challenge of divided attention, far more distracting than an open office layout (which many people find distracting).
Recommendation: With huge visual and auditory distractions in play, we had to distil the most important thing that they should be attending to at each moment. My team developed a system very similar to a scroll-based Facebook news feed, except with extreme filtering to ensure relevancy of the feed (no cat memes here!). Each potential concern (terror, fire, door jams, etc.) had its own chain of action items associated with it, and staff could filter each issue by location. The system also included a prominent list of the top priorities at that moment, to help tame the beastly number of items competing for staff’s attention. It had a single scroll and could be set to focus on one topic or all topics, but items appeared only when they rose to a specific level of importance. As a result, staff knew where to look and what the (distinct) sound of an alert sounded like.
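To make the filtering idea concrete, here is a rough sketch in Python of a single feed with an importance threshold plus optional topic and location filters. This is not the system my team built. The `Alert` fields, the numeric severity scale, and the `build_feed` function are hypothetical names chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    topic: str       # e.g. "fire" or "door" (hypothetical categories)
    location: str    # e.g. "Gate A"
    severity: int    # assumed scale: higher means more urgent
    message: str


def build_feed(
    alerts: List[Alert],
    min_severity: int,
    topic: Optional[str] = None,
    location: Optional[str] = None,
    limit: Optional[int] = None,
) -> List[Alert]:
    """Return one ordered feed containing only alerts worth attending to."""
    shown = [
        a for a in alerts
        if a.severity >= min_severity                    # importance threshold
        and (topic is None or a.topic == topic)          # one topic, or all topics
        and (location is None or a.location == location)
    ]
    # Most urgent first; Python's sort is stable, so ties keep arrival order.
    shown.sort(key=lambda a: a.severity, reverse=True)
    return shown[:limit]  # limit=None returns the whole filtered feed


alerts = [
    Alert("door", "Gate A", 2, "Door held open"),
    Alert("fire", "Concourse 3", 5, "Smoke detector triggered"),
    Alert("door", "Gate C", 4, "Forced entry sensor"),
]
top_priorities = build_feed(alerts, min_severity=3, limit=2)
```

The design choice that matters here is the threshold: anything below `min_severity` never reaches the feed at all, so it never competes for attention in the first place.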
Quick, get a heat map … well …
Eye gaze heat maps can show us where our users’ eyes are looking on an interface. They represent the total time people spend looking at each part of the screen, so some locations appear “hotter” than others.
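The arithmetic behind a heat map is simple: add up looking time per region of the screen. Here is a minimal sketch in Python, assuming your eye tracker exports fixations as screen coordinates plus a duration. The `Fixation` record, the grid approach, and the function names are my own assumptions for illustration, not the format of any particular eye-tracking product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fixation:
    x: float            # pixels from the left edge of the screen
    y: float            # pixels from the top edge of the screen
    duration_ms: float  # how long the eyes rested at this point


def dwell_grid(fixations: List[Fixation], width: int, height: int,
               cols: int, rows: int) -> List[List[float]]:
    """Total looking time (ms) in each cell of a rows x cols grid over the screen."""
    grid = [[0.0] * cols for _ in range(rows)]
    for f in fixations:
        if not (0 <= f.x < width and 0 <= f.y < height):
            continue  # ignore fixations that fall off-screen
        col = int(f.x * cols / width)
        row = int(f.y * rows / height)
        grid[row][col] += f.duration_ms
    return grid


def dwell_share(grid: List[List[float]]) -> List[List[float]]:
    """Convert totals to each cell's share of all looking time (0.0 to 1.0)."""
    total = sum(sum(row) for row in grid)
    if total == 0:
        return [[0.0 for _ in row] for row in grid]
    return [[cell / total for cell in row] for row in grid]


# Illustrative, made-up fixations on a 1000 x 800 screen.
fixations = [
    Fixation(100, 100, 300),
    Fixation(150, 120, 500),
    Fixation(900, 700, 200),
    Fixation(-5, 10, 400),   # off-screen, so it is skipped
]
share = dwell_share(dwell_grid(fixations, width=1000, height=800, cols=2, rows=2))
```

In this made-up data, the upper-left quadrant collects most of the looking time. Coloring each cell by its share gives you the “hotter” and “cooler” areas of the map.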
Case Study: Website Hierarchy

Figure 10.5: Heatmaps
Challenge: In the case of this site (comcast.net, the precursor to Xfinity), consumers were overwhelmingly looking at one area in the upper left-hand corner, but not further down the page, nor at the right-hand side of the page. We knew this both from eye-tracking and from the fact that the partner links further down the page weren’t getting clicks (and the partners were not happy about that). The problem was the visual contrast. The upper left of the old page was visually much darker than the rest of the page and more interesting (videos, images), so much so that it was overwhelming people’s visual attention system.
Recommendation: We redesigned the page to make sure that the natural visual flow included not only the headlines, but also the other information further down. We gave more visual prominence to the neglected sections of the page by balancing features like visual contrast, size of pictures, color, fonts, and white space. We were able to draw people down visually to engage “below the fold.” This made a huge difference in where people looked on the page, making end users, Comcast, and its paid advertising partners much happier.
The case study above shows you how helpful tools like eye tracking and heat maps can be. But I want to counter the misperception that these tools on their own are enough for you to make meaningful adjustments to your product. Like the survey results and usability testing that I mentioned in the last chapter, heat maps can give you a lot of the what, but not the why, behind a person’s vision and attention. The results from heat maps do not tell you what problem users are trying to solve.
To get at that, we need to …
Go with the flow.
We’re trying to satisfy the customers’ needs as they arise, and so we want to know at each stage in problem solving what our users are looking for, what they’re expecting to find, and what they’re hoping to get as a result. Then we can match the flow with what they’re expecting to find at each stage of the process.
While observing someone interact with a site, I’ll often ask them questions like “What problem are you trying to solve?” and “What are you seeing right now?” This helps me see what’s most interesting to them at this moment and understand their goals.
Users employ many unspoken strategies and expectations, which we can learn about only by observing them in their natural flow. These insights, in turn, help us with our visual design, layout, and information architecture (i.e., what are the steps, how should they be represented, where should they be in space, etc.).
Case Study: Auction Website
Challenge: Here’s an example of some of those unspoken expectations that we might observe during contextual interviews. In testing the target audience for a government auction site (GovAuction.gov), I heard the feedback “Why doesn’t this work like eBay?” Although this site was even larger than eBay, our audience was much more used to eBay, and brought their experience of how eBay worked, and the expectations that came with it, to their interactions with this new interface.
Eye-tracking confirmed users’ expectations and confusion: they were staring at a blank space beneath an item’s picture and expecting a “bid” button to appear, since that’s where the “bid” button appears on eBay items. Even though the “bid” button was in fact present in another place, users didn’t see it because they expected it to be in the same location as the eBay “bid” button.
Recommendation: This was one case when I had to encourage my client not to “think different,” but rather admit that other systems like eBay have cemented users’ expectations about where things should be in space. We moved the button (and changed a few other aspects of the site architecture to follow eBay’s) to match people’s expectations, immediately improving performance. This story also exemplifies the lenses I was talking about earlier. We knew where they were looking for this particular feature, and we knew they didn’t find it in that location. This wasn’t because of language or the visual design, but because of their experience with other similar sites and associated expectations.
Research Examples
I don’t know if you’ve had a chance to put my sticky note categorization method into practice yet, but I’d like to share some examples of the findings I’ve noted in the previous chapter from clients interacting with both a video-streaming website and an e-commerce website. These will give you a sense of what we’re looking for when subdividing data according to the Six Minds; in this case, focusing on vision and attention. Remember, there’s often overlap, but I’m most concerned with the biggest problem underlying each comment.

Figure 10.6

Figure 10.7

Figure 10.8

Figure 10.9

Figure 10.10
Warning against literalism #1: In reviewing your findings, you’re going to see a lot of comments about “seeing,” “finding,” “noticing,” etc. Such words might suggest vision, but beware of placing such findings in the “vision” category automatically! In reviewing each finding, ask yourself whether it implies an expectation of how things should be (memory), how to navigate through space (wayfinding), or how familiar the user is with the product (language) before putting that observation in the vision category.

Figure 10.11

Figure 10.12

Figure 10.13
Once I’ve reviewed all of my customers’ feedback and distilled the major problem to address, I can give the visual design team quite specific input and recommendations for improvement.
Concrete recommendations: