NASA TLX
A multi-dimensional assessment tool that measures perceived workload across six scales, revealing whether a design actually reduces the cognitive burden of a task.
Overview
Usability testing tells you whether someone can complete a task. SUS tells you how they feel about the experience overall. Neither tells you how hard they had to work to get through it. That's where the NASA Task Load Index earns its place.
Developed by Sandra Hart at NASA's Ames Research Center in the 1980s, TLX was originally designed to measure pilot workload during flight simulations. It has since been validated across hundreds of domains, from surgical procedures to software interfaces to military operations. The instrument works because it doesn't treat workload as a single thing. It breaks the experience into six dimensions: mental demand, physical demand, temporal demand, performance, effort, and frustration. A task might score low on physical demand but high on mental demand and frustration, and that profile tells you something a single number never could.
What makes TLX particularly useful for product design is its sensitivity to iteration. Run it after each round of usability testing and you get a workload trajectory. You can see whether your design changes are actually reducing the cognitive cost of a task or just rearranging where the burden falls. A redesigned notification system might reduce frustration but inadvertently increase mental demand if the new information architecture requires more interpretation. TLX catches that trade-off.
When to Use It
- After usability testing sessions where you want to measure not just success rates but how much effort success required.
- When comparing two design approaches that both "work" but may impose different cognitive costs on users.
- When you're iterating on a design across multiple rounds and need a quantitative signal that workload is actually decreasing.
- When the task environment is high-stakes (military, medical, industrial) and cognitive overload has real consequences beyond user satisfaction.
Skip it when you have fewer than 8 to 10 participants per round, when the tasks being tested are too short or simple to generate meaningful workload variation, or when you're in early generative research where the goal is to explore problems rather than evaluate solutions.
How It Works
Immediately after a participant completes a task, hand them the TLX questionnaire. It presents six scales, each a 21-point range from "Very Low" to "Very High" (except Performance, which runs from "Perfect" to "Failure"). The six dimensions are:
- Mental Demand: How much thinking, deciding, calculating, remembering, looking, and searching was required?
- Physical Demand: How much physical activity was required? (Often low for software, but not always.)
- Temporal Demand: How much time pressure did the participant feel?
- Performance: How successful does the participant believe they were at accomplishing the task?
- Effort: How hard did the participant have to work (mentally and physically) to achieve their level of performance?
- Frustration: How insecure, discouraged, irritated, stressed, or annoyed did the participant feel?
Each scale produces a score from 0 to 100, and because Performance is anchored from "Perfect" to "Failure," a higher score consistently means a heavier load on every dimension. You can report individual dimension scores (which is more diagnostic) or compute an overall workload score by averaging across all six.
The original TLX protocol includes a weighting step where participants do pairwise comparisons of the six dimensions to indicate which contributed most to their workload. Many practitioners skip this step (using what's called "Raw TLX") because research has shown it adds time without significantly changing the results. For design research purposes, Raw TLX is usually sufficient.
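To make the scoring arithmetic concrete, here is a minimal Python sketch of both variants. It assumes responses have already been converted to the 0-100 range; the dimension keys, helper names, and example numbers are illustrative, not part of the instrument.

```python
# Raw TLX and weighted TLX scoring sketch. Assumes each of the six responses
# has already been mapped onto the 0-100 range (21 gradations, steps of 5).
DIMENSIONS = [
    "mental_demand", "physical_demand", "temporal_demand",
    "performance", "effort", "frustration",
]

def raw_tlx(responses: dict[str, float]) -> float:
    """Raw TLX overall workload: the unweighted mean of the six dimension scores."""
    return sum(responses[d] for d in DIMENSIONS) / len(DIMENSIONS)

def weighted_tlx(responses: dict[str, float], weights: dict[str, int]) -> float:
    """Original protocol: weights come from 15 pairwise comparisons, so they sum to 15."""
    return sum(responses[d] * weights[d] for d in DIMENSIONS) / 15

# One participant, one task (invented numbers for illustration).
p1 = {
    "mental_demand": 70, "physical_demand": 10, "temporal_demand": 45,
    "performance": 25, "effort": 60, "frustration": 55,
}
print(round(raw_tlx(p1), 1))  # 44.2 -- moderate overall load, driven by mental demand
```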
Tips
Administer TLX immediately after task completion, before any debrief or discussion. You want the felt experience, not a rationalized reconstruction of it.
Keep the scales consistent across rounds. If you change the task, the timing, or the context between measurements, you've compromised the comparison.
Report dimension scores individually, not just the overall average. The profile across dimensions is where the actionable insight lives. An overall score of 55 could mean moderate load across the board, or it could mean very low physical demand masking very high frustration. The profile tells you what to fix.
Pair TLX with behavioral observation. A participant might score low on frustration even though you watched them hesitate for 30 seconds at a decision point. The combination of what they report and what you observe gives you the full picture.
When presenting TLX results to stakeholders, focus on the trajectory across rounds. A chart showing mental demand dropping from 72 to 38 over three iterations is more compelling than any single score.
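A trajectory chart of that kind takes only a few lines to produce; the sketch below assumes matplotlib is available, and the round labels and scores are invented for illustration.

```python
import matplotlib.pyplot as plt

# Mean mental-demand score per testing round (invented illustrative numbers).
rounds = ["Round 1", "Round 2", "Round 3"]
mental_demand = [72, 55, 38]

plt.plot(rounds, mental_demand, marker="o")
plt.ylim(0, 100)
plt.ylabel("Mean mental demand (0-100)")
plt.title("Mental demand across design iterations")
plt.savefig("tlx_trajectory.png")
```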
The Output
A set of workload scores across six dimensions for each participant on each task. These scores are typically averaged across participants and compared across testing rounds to show whether design changes are reducing cognitive load.
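As a minimal sketch of that aggregation, assuming responses are collected as one record per participant per round (field names and numbers are illustrative):

```python
from collections import defaultdict
from statistics import mean

# One record per participant per testing round; only two dimensions shown here.
records = [
    {"round": 1, "participant": "P1", "mental_demand": 75, "frustration": 60},
    {"round": 1, "participant": "P2", "mental_demand": 70, "frustration": 50},
    {"round": 2, "participant": "P3", "mental_demand": 50, "frustration": 30},
    {"round": 2, "participant": "P4", "mental_demand": 45, "frustration": 25},
]

def mean_by_round(records, dimension):
    """Average one TLX dimension across participants within each round."""
    by_round = defaultdict(list)
    for r in records:
        by_round[r["round"]].append(r[dimension])
    return {rnd: mean(scores) for rnd, scores in sorted(by_round.items())}

print(mean_by_round(records, "mental_demand"))  # {1: 72.5, 2: 47.5}
```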
TLX results feed directly into design iteration priorities. If frustration is dropping but mental demand is still high, you know the emotional experience is improving but the information architecture still needs work. That specificity is what makes TLX more actionable than a general satisfaction score.
Related Methods
- Usability Testing: Comes before. TLX is administered at the end of a usability test task, making the two inseparable in practice.
- SUS: Runs alongside. SUS captures overall perceived usability after all tasks are complete. TLX captures workload per task. Together they give you both the macro and micro view.
- Interviewing: Comes after. Post-task interviews help explain why certain TLX dimensions scored high, adding qualitative context to the quantitative signal.
- CSAT: Runs alongside. CSAT measures satisfaction broadly. TLX measures the cost of getting to that satisfaction. A high CSAT with a high TLX effort score means users like the outcome but the process is too hard.