Thanks to a retweet by Beth Fox, I came across Daniel Göransson’s 2017 post, Alt-texts: The Ultimate Guide. It’s well worth reading in full, but I want to summarize why I think that.
Göransson is visually impaired, and his screen reader comes across lots of what he calls alt-text-fails, like image file names or photographer credits. As he says:
An alt-text is a description of an image that’s shown to people who for some reason can’t see the image…
Alt-texts are super important! So important that the Web Content Accessibility Guidelines (WCAG) have alt-texts as their very first guideline…
For a long time on my blogs, including this one, I treated alt-text as a way to show a kind of cleverness: many of the images here have alt-text that was intended as a pun or a side comment; it doesn’t describe the image at all. So it was no help to anyone.
Describing: it depends
Göransson points out what should be obvious: what you put in the alt-text description depends on context. Here’s an image from his post:
If the context was about a TV series, he says, the alt-text might be:
Star of the show, Adam Lee, looking strained outside in the rain.
In an article on photography, though:
Close up, grayscale photograph of man outside, face in focus, unfocused background.
The top items for me from Göransson:
Describe the image in context.
Keep it concise.
Don’t say it’s an image (as in, don’t put “image of” or similar in the alt-text).
End with a period (so the screen reader will pause for a moment).
Always include text labels with icons.
That last point is about alt-text, in a sense: he’s saying put text on screen next to an icon, and not in alt-text. Icons shouldn’t stand alone, as users who didn’t create the icons could tell you. (That’s a whole other story, as this link from his article illustrated.)
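To make the alt-text items in that list concrete, here is a minimal sketch of a checker in Python. It is not from Göransson's post: the function name, the 125-character limit, and the patterns are my own assumptions, and it only covers the rules that a machine can test (context is still a human call).

```python
import re

# Hypothetical rules for illustration; the limit and patterns are assumptions.
MAX_LENGTH = 125
FILENAME_RE = re.compile(r"^\S+\.(jpe?g|png|gif|webp|svg)$", re.IGNORECASE)
REDUNDANT_PREFIX_RE = re.compile(
    r"^(an? )?(image|picture|photo|graphic) of\b", re.IGNORECASE
)


def check_alt_text(alt: str) -> list[str]:
    """Return a list of problems found in an alt-text string (empty if none)."""
    text = alt.strip()
    if not text:
        # Purely decorative images, where empty alt-text can be legitimate,
        # are outside the scope of this sketch.
        return ["alt-text is empty"]
    problems = []
    if FILENAME_RE.match(text):
        problems.append("looks like a file name, not a description")
    if REDUNDANT_PREFIX_RE.match(text):
        problems.append("starts by saying it's an image")
    if len(text) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if not text.endswith("."):
        problems.append("does not end with a period")
    return problems


print(check_alt_text("Image of a man in the rain"))
print(check_alt_text("Star of the show, Adam Lee, looking strained outside in the rain."))
```

Both of the example descriptions quoted earlier pass with no problems reported, while a bare file name or an "Image of" opener gets flagged.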
(based on my article for ATD’s Science of Learning blog; Part 1 of 2)
Trying to learn important information through multimedia can feel like driving through a strange city for a big job interview.
As the learning designer, you’re not the one heading to the interview, but you do select the car and choose the route. You not only give directions, but also mark the lanes, erect street signs, and string up the traffic signals. Whatever roads and vehicles the driver encounters, you put there. And if you make mistakes, the driver won’t arrive on time or do well in the interview.
With that happy thought, let’s discuss how to manage cognitive load. In other words, how can you reduce demands on working memory and maximize a learner’s chances of success?
Research suggests—with a very loud “ahem!”—that multimedia razzle-dazzle can actually work against effective learning. Even background music can interfere with success, the way sound from the car radio makes it harder for you to navigate through a work zone. In “Nine Ways to Reduce Cognitive Load in Multimedia Learning,” Richard E. Mayer and Roxana Moreno explain what they mean by cognitive load and offer a three-part theory for how to make information meaningful.
Meaningful learning: how it happens, how it doesn’t
Multimedia learning, according to Mayer and Moreno, involves delivering information through words (printed or spoken) and images (drawings, photos, animations, videos). By “meaningful learning,” they mean you’re able to apply that information to a new situation.
How that happens, they say, is affected by three factors:
The dual-channel assumption says that we handle incoming information through two channels: one for words and one for images.
The active processing assumption says that we need to do significant mental work in order to learn. We decide what to pay more attention to. And, for things that make the cut, we go on to figure out what they mean and how they interact. This processing is how we create a mental construct for what we’re learning, and connect that construct to existing knowledge.
The limited capacity assumption says that we can only work with so much at a time in a cognitive channel. We can only handle so many words or so many images at a time.
Assuming you’ve done some active processing with those three points, you can already see the implications. Learning is challenging enough; the way we present information through words and images can help or hinder.
Mayer and Moreno also identify five ways overload can happen, and they present strategies to reduce it. I’ll discuss three here, and two more in the next post.
Overload in a single channel
Imagine that a section of your multimedia lesson has most of its information in a single channel–say, a large block of text. Let’s say it’s all necessary information. In fact, because it’s necessary, you decide to include a voiceover to reinforce the print message.
You’re asking the learner to read and listen at the same time. The two streams of verbal information — printed text and spoken words — compete for working-memory resources and can overwhelm the verbal channel.
Assuming all the information truly is relevant, Mayer and Moreno suggest off-loading some content: move some from verbal to visual. Use images to anchor key concepts, reduce the printed text, and let the audio channel carry the message. “Students understand a multimedia explanation better when the words are presented as narration rather than as on screen text,” write Mayer and Moreno.
Remember our interview candidate? She navigates traffic more smoothly with a GPS that combines spoken directions with a graphic map—far more so than if she had highly detailed, text-only directions.
What happened in the researchers’ experiments? One way to express the strength of an outcome is through “effect size.” Using one common measure, Cohen’s d, an effect size of about 0.2 is conventionally considered small, about 0.5 medium, and 0.8 or more large.* In six experiments involving offloading, Mayer and Moreno report an effect size of 1.17.
* In Cohen’s terminology, a small effect size is one in which there is a real effect — i.e., something is really happening in the world — but which you can only see through careful study. A ‘large’ effect size is an effect which is big enough, and/or consistent enough, that you may be able to see it ‘with the naked eye’.
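For anyone curious about the arithmetic behind a number like that, here is a minimal sketch of Cohen's d in Python: the difference between two group means, divided by their pooled standard deviation. I'm assuming the standard pooled-variance form; the scores are invented for illustration and are not data from Mayer and Moreno.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Cohen's d: difference in group means over the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Made-up test scores, purely for illustration.
narration_group = [7, 8, 6, 9, 8]
text_group = [5, 6, 4, 7, 6]
print(round(cohens_d(narration_group, text_group), 2))
```

In this toy example the two groups have the same spread and differ by two points on average, so d works out to roughly 1.75: the gap between the groups is about one and three-quarter standard deviations.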
What if both channels, verbal and visual, have too much essential information? No matter how much you need to cover or how elegant the presentation, too much is too much. When the learner can’t process everything, she can’t organize the input into a useful mental model, let alone integrate it with what she already knows.
Again, our driver trying to make the interview can’t easily cope simultaneously with a nagging GPS, unfamiliar street signs, shifting traffic, and a message board displaying cryptic data about a detour—even though it’s all important.
Mayer and Moreno offer two solutions. One is to segment content; break material into smaller pieces, and allow the learner to decide when to move on. An experiment broke a three-minute segment into 16 segments, linked by CONTINUE buttons. Compared with a control group, students who could choose when to continue, thus taking the time they wanted with the current segment, performed substantially better.
When segmenting won’t work, a second solution is to offer pre-training, which means providing some information ahead of time, such as the names or functions of major parts. In order to build a mental model of what you’re learning, you need a component model (how each major part works), and a causal model (how the parts affect each other). Pre-training gets you to the component model faster so it’s easier to construct your causal model.
Suppose our interview candidate has traveled to Washington, D.C. Before she gets her car, she might learn the different names for the most important freeway (I-495, I-95, the Beltway) and the meaning of “Inner Loop” and “Outer Loop.” That could help her negotiate the trip from Dulles airport to Bethesda.
Part 1 (here) deals with how we process information through two channels (one for words, one for images), and how overload can occur in one channel or in both.
Overload from extraneous information
(Spoiler alert: “Nice to know” doesn’t mean “good to include.”)
Mayer and Moreno point out that “interesting but extraneous material” takes up cognitive capacity. The learner has to pay some attention—for instance, it’s hard to not listen to background music. Effort goes into deciding whether anything deserves further attention. The more this happens, the less capacity remains for learning what actually does matter.
You probably can guess what the researchers recommend: weeding. Remove the extraneous. What’s the bare minimum that people need to know in order to accomplish the skill or apply the knowledge? Force everything else to justify its inclusion.
In an animated sales-call lesson, for example, I don’t need to see the customer driving in. I don’t need an animated phone, virtual pens, and virtual paper clips. I do need a customer statement to respond to. I need time to analyze it. I need clear examples of responses and how effective they are in a situation like the one I’m seeing.
To me, the weeding of nonessential material is the difference between the rich but irrelevant detail of a war story and the crisp relevance of a pertinent example. Our interview candidate probably doesn’t need to know that there’s a library two blocks before she gets to Midcounty Highway; she does need to know that when she gets there, the two right lanes are right-turn-only.
Granted, sometimes you can’t edit details out. Suppose you’re explaining how to operate packaging machinery in a pharmaceutical plant. Your learner will confront lots of equipment and lots of steps, along with potentially overwhelming detail in the video close-ups.
When weeding is not an option, Mayer and Moreno recommend signaling: providing cues to the learner about how to organize the material. So, the lesson might start by breaking packaging into four stages: product into plastic blisters, blisters into cardboard wallets, wallets into carton packs, cartons into cases. In subsequent lessons, arrows or similar highlighting emphasize key components of each stage.
Overload from poor presentation
Sometimes overload results from the confusing presentation of essential information. Imagine an animation in one part of a screen and related text in another. The learner has to shift focus between the two areas, as well as figure out which parts are related to which.
Mayer and Moreno recommend closer alignment of words and pictures. Placing text inside a graphic, rather than alongside as a caption, aligns the explanation more closely with the visual for what’s being explained.
In a related situation, information arrives as animation, onscreen text, and audio narration. The simultaneous presentation of text and narration, which the researchers call redundant presentation, requires the learner to work at reconciling the two verbal forms while also dealing with the visual form. It’s as if our interview candidate were watching an animation of the route to follow and reading directional text while the person next to her recited those directions.
Mayer and Moreno cite studies with a significant shift as a result of reducing redundancy, such as dropping onscreen text and using only narration. An interesting twist they add is that if there’s no animation, students learn better from concurrent narration and on-screen text than from narration alone. The interpretation is that the on-screen text by itself doesn’t overload the visual channel the way it would with the animation there as well.
Overload from “Hold that thought!”
The final type of cognitive overload involves both essential processing and “representational holding.” Mayer and Moreno explain that as having to retain information in working memory. For example, if you read about the thermoforming process for drug packaging, and then watch a video showing the process, you have to keep elements of that text in memory during the video, which reduces your ability to select, organize, and integrate.
One way to avoid this overload is to synchronize—interweave text or audio with the video. Words about the sealing step should arrive as the visual does; a description of the check-weigher should come while the learner sees that device in action.
Researchers cite robust evidence that “students understand a multimedia presentation better when animation and narration are presented simultaneously rather than successively.” Meanwhile, Mayer and Moreno point out that if the non-synched elements are brief (a few seconds of narration followed by a few seconds of animation), there’s less overload, most likely because the learner has less representational holding to do: fewer things to keep in mind from the verbal information.
But what if you can’t synchronize? Then, the recommendation is individualization, or ensuring that you have learners skilled at holding things in memory. If for your work you’re able to match “high-quality multimedia design with high-spatial learners,” you’re all set. Personally, I’m rarely able to manage that.
Final thoughts
I started by comparing the multimedia learner to someone who has to drive through a strange city to make an interview. Mayer and Moreno highlight ways that your design decisions can make that trip far more difficult than necessary. Pick up some learning principles and lessons from this research—and take off a little cognitive load.
What is it? A line-by-line analysis of the second verse of Jay-Z’s song “from the perspective of a criminal procedure professor. It’s intended as a resource for law students and teachers, and for anyone who’s interested in what pop culture gets right about criminal justice, and what it gets wrong.”
It’s a terrific example of a focused, detailed explanation by a technical expert who’s also a teacher. Mason uses the specifics of the song (and of Jay-Z’s experience related to it) to highlight principles, legal issues, and practical problems related to the Fourth Amendment.
Amendment IV
The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.
As Mason says, “When I teach the Fourth Amendment, I ask my students what the doctrines tell us about, on the one hand, how to catch bad guys and not risk suppression [of evidence obtained improperly], and on the other, how to avoid capture or at least beat the rap if not the ride.”
A few parts of the interview that stood out for me:
If a learner doesn’t enjoy the learning experience, even if it’s effective and/or efficient, they won’t do it. The same is true for teaching: that is it must also be effective, efficient, and enjoyable for the teacher because if a teacher doesn’t enjoy the teaching process, even if it’s effective and/or efficient, they won’t do it.
Kirschner is talking about formal education, though I think this absolutely applies in the world of organizational learning and development as well. I strongly believe in the value of learning by doing, and of using realistic, rich practice problems — but in my experience if an organization hasn’t done those things often, people can resist such approaches because they don’t “look like” good training, or because they seem unnecessarily difficult, or because the learner is eager to get to the point (as he sees it) and wants to be told what to do and when to do it.
…Sweller’s cognitive load theory suggests that you should not present the exact same information in two modalities – for example, reading directly from a slide… And yet, many researchers who should know better will still do this. The best way to translate research is in your own teaching – why did you study it if you’re not going to use it?
Kirschner’s presenting Sweller’s redundancy principle here as an example. I’d extend the target group from researchers to practitioners: learning professionals should look for ways to put into practice the theories they espouse — or at the least to ask themselves why they practice what they practice.
Many multimedia instructional presentations are still based on common sense rather than theory or extensive empirical research. Visual formats tend to be determined purely by aesthetic considerations while the use of sound and its interaction with vision seems not to be based on any discernible principles.
(Managing Split-attention and Redundancy in Multimedia Instruction — Kalyuga, Chandler, Sweller)
The interviewers asked Kirschner how to challenge misconceptions in education. On the one hand, he encourages those who train teachers to connect their research to something that teachers have experienced — in other words, to find a starting point based on where the teacher has been.
“Don’t ever say ‘because research shows X’ — this is a conversation killer.”
This is a marvelous point for a researcher to make, and one I need to put into practice more often. I’ve gradually learned not to argue with advocates of learning styles, in part because they’re no more interested in freelance criticism of what they believe is effective than I typically am.
A sidebar in the post says that Kirschner describes himself as “an educational realist and grumpy old man.” That may be the case, but in the interview and in his writings, I note as well his search for evidence and his optimism that practitioners will adopt strategies and techniques based on that evidence–and will experience success when they do.