Toolbox

Usability Testing

The most direct way to find out how your product actually works for the people using it: watch real participants attempt real tasks.

Overview

The premise is simple: watch someone try to use something, and pay attention to what happens. No matter how much thought your team has put into a design, real users will do things you didn't anticipate. They'll miss the button you thought was obvious, take a path through the product that makes complete sense to them and none to you, and get stuck exactly where you assumed no one would.

That gap between what the team imagined and what users actually do is where usability testing lives. It's not a focus group or a survey. It's not someone telling you what they think of the design. It's behavioral observation, and what people do is almost always more informative than what they say they would do.

You can run usability tests at nearly any stage of a project. A rough prototype reveals just as much as a polished one, and sometimes more. Finding a fundamental navigation problem in a clickable wireframe costs a few hours of iteration. Finding the same problem after launch costs far more.

The most common misconception is that you need a large sample to get useful results. Research by Jakob Nielsen and Rolf Molich has long held that five participants typically surface the majority of a product's usability issues. That number isn't gospel, but the principle holds: testing with a few well-chosen participants beats waiting to test with many.
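The intuition behind the five-participant figure comes from a simple diminishing-returns model attributed to Nielsen and Landauer: the expected share of problems found after n participants is 1 − (1 − L)^n, where L is the average probability that a single participant encounters a given problem (roughly 0.31 in their data). A minimal sketch of that curve, assuming those published values:

```python
# Expected share of usability problems found after n participants,
# using the Nielsen-Landauer model: found(n) = 1 - (1 - L)^n.
# L is the average probability that one participant hits a given
# problem; their published estimate across projects was about 0.31.

def problems_found(n: int, L: float = 0.31) -> float:
    """Expected proportion of usability problems surfaced by n participants."""
    return 1 - (1 - L) ** n

for n in (1, 3, 5, 10):
    print(f"{n:2d} participants -> {problems_found(n):.0%} of problems")
```

With the default L, five participants land near 85% of problems, and each additional participant adds less than the one before, which is why small iterative rounds beat one large study.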

When to Use It

  • Before development starts, when you want to pressure-test a prototype before anyone writes a line of code.
  • When a redesign is underway and you need to confirm the new direction actually solves the problems the old one had.
  • When analytics tell you something is wrong (high drop-off, repeated support tickets about the same feature) but not why.
  • When you're comparing two design directions and need behavioral evidence to settle it.
  • When stakeholders need convincing: observed failure is harder to argue with than a designer's recommendation.

Skip it when you don't yet have a clear research question. Usability testing without defined tasks is just a demo. Know what you want to learn before you recruit.

How It Works

A usability test is built around tasks: specific things you ask a participant to accomplish using the product. Tasks should reflect what real users actually do, not steps you invented to showcase features. "Find a pair of running shoes under $100 and add them to your cart" is a task. "Explore the product catalog" is not.

Sessions run in one of two modes: moderated (a facilitator is present, guiding the session in real time) or unmoderated (participants work independently with their screens recorded). Moderated sessions let you probe for reasoning in the moment and handle ambiguity. Unmoderated sessions are faster to run and better for validating straightforward interactions across a larger group.

Participants are asked to think aloud as they work: narrating what they're doing, what they expect to happen, what's confusing them. This gives you access to their mental model, not just their clicks. The facilitator's job is to observe and prompt, never to help. The moment you explain or redirect, the test is over for that task. Failure to complete a task is data.

Tips

Pilot your script with a teammate before running real sessions. Scripts that feel clear in writing often have gaps once someone is actually doing the tasks.

Keep tasks to three or four per session. Past that, participants get fatigued and the quality of observation drops.

Separate what participants do from what they say. Behavior is primary. Post-task commentary is context. They don't always agree, and when they conflict, trust the behavior.

Recruit participants who reflect your actual users. Testing with colleagues or design-literate people will suppress the failures you most need to see.

Debrief your observation team immediately after each session while recall is fresh. Don't batch all your synthesis to the end of the study.

The Output

Usability testing produces behavioral evidence: task completion rates, observed friction points, direct quotes, and (if recorded) video clips that make failures concrete and hard to dismiss. This typically gets organized into a findings report with severity ratings so your team can triage what to address first.

The output feeds directly into design iteration. It pairs well with SUS scores or CSAT data, which add a quantitative layer on top of the qualitative behavior you've observed.
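Task completion rate is the simplest of these behavioral measures to compute: for each task, the share of participants who finished it unaided. A small sketch, using an entirely hypothetical session log (participant IDs and task names are illustrative, not from the text):

```python
from collections import defaultdict

# Hypothetical observer log: (participant, task, completed) records.
observations = [
    ("P1", "find shoes", True),  ("P1", "checkout", False),
    ("P2", "find shoes", True),  ("P2", "checkout", False),
    ("P3", "find shoes", False), ("P3", "checkout", True),
    ("P4", "find shoes", True),  ("P4", "checkout", False),
    ("P5", "find shoes", True),  ("P5", "checkout", False),
]

def completion_rates(obs):
    """Per-task completion rate: successful attempts / total attempts."""
    totals, passes = defaultdict(int), defaultdict(int)
    for _, task, completed in obs:
        totals[task] += 1
        passes[task] += completed
    return {task: passes[task] / totals[task] for task in totals}

# Worst-performing tasks first, a natural triage order for the report.
for task, rate in sorted(completion_rates(observations).items(),
                         key=lambda kv: kv[1]):
    print(f"{task}: {rate:.0%} completed")
```

Sorting tasks by completion rate gives the findings report a ready-made triage order before severity ratings are even applied.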

Related Methods

  • Heuristic Review: Comes before. Use a heuristic review to identify likely problem areas before recruiting participants, so you can build sharper, more targeted tasks.
  • Interviewing: Runs alongside. Brief post-task questions help you understand not just what happened but why.
  • SUS: Comes after. Add a System Usability Scale questionnaire at the end of each session to capture a usability score alongside your qualitative findings.
  • CSAT: Comes after. Useful for measuring whether design changes you made based on testing findings actually moved satisfaction.
  • Affinity Mapping: Comes after. When you have notes from multiple sessions, affinity mapping is the right tool for surfacing patterns across them.
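If you append a SUS questionnaire to each session, scoring it follows the standard Brooke procedure: ten items on a 1-5 scale, odd-numbered (positively worded) items contribute the answer minus 1, even-numbered (negatively worded) items contribute 5 minus the answer, and the sum is scaled by 2.5 onto a 0-100 range. A minimal scorer:

```python
def sus_score(responses):
    """Score one participant's System Usability Scale questionnaire.

    responses: ten answers on a 1-5 scale, in questionnaire order.
    Odd-numbered items (positively worded) contribute (answer - 1);
    even-numbered items (negatively worded) contribute (5 - answer).
    The summed contributions are scaled by 2.5 to give 0-100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5

# Maximally positive answers (5 on odd items, 1 on even items) score 100.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))
```

Note that a SUS score is an attitude measure per participant, not a problem count; it sits alongside the behavioral findings rather than replacing them.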