Source: Articles on Smashing Magazine — For Web Designers And Developers
The Wizard of Oz method is a proven UX research tool that simulates real interactions to uncover authentic user behavior. Victor Yocco explores its fundamentals, advanced techniques, and critical considerations, including its relevance in the emerging field of agentic AI.
New technologies and innovative concepts frequently enter the product development lifecycle, promising to revolutionize user experiences. However, even the most ingenious ideas risk failure without a fundamental grasp of user interaction with these new experiences.
Consider the plight of the Nintendo Power Glove. Despite being a commercial success (selling over 1 million units), it was released in late 1989 and discontinued less than a year later, in 1990. The two games created solely for the Power Glove sold poorly, and the Glove was of little use with Nintendo’s already popular traditional console games.
A large part of the failure was due to audience reaction once the product (which allegedly was developed in 8 weeks) shipped: it was cumbersome and unintuitive. Users found syncing the glove to the moves in specific games extremely frustrating, as it required coding the moves into the glove’s preset move buttons and then remembering which buttons would generate which move. Given the later success of Nintendo’s Wii and other movement-based consoles and games, we can see the Power Glove was a concept ahead of its time.
If Power Glove’s developers wanted to conduct effective research prior to building it out, they would have needed to look beyond traditional methods, such as surveys and interviews, to understand how a user might truly interact with the Glove. How could this have been done without a functional prototype and slowing down the overall development process?
Enter the Wizard of Oz method, a potent tool for bridging the chasm between abstract concepts and tangible user understanding, as one potential option. This technique simulates a fully functional system, yet a human operator (“the Wizard”) discreetly orchestrates the experience. This allows researchers to gather authentic user reactions and insights without the prerequisite of a fully built product.
The Wizard of Oz (WOZ) method is named in tribute to the similarly named book by L. Frank Baum. In the book, the Wizard is simply a man hidden behind a curtain, manipulating the reality of those who travel the land of Oz. Dorothy, the protagonist, exposes the Wizard for what he is: essentially an illusion, a con who deceives those who believe him to be omnipotent. Similarly, WOZ takes technologies that may or may not currently exist and emulates them in a way that should convince a research participant they are using an existing system or tool.
WOZ enables the exploration of user needs, validation of nascent concepts, and mitigation of development risks, particularly with complex or emerging technologies.
The product team in our example above might have used this method to have users simulate wearing the glove, programming moves into it, and playing games without needing a fully functional system. This could have uncovered the illogical situation of asking laypeople to code their hardware to respond to a game, shown the frustration of needing to recode the device when switching games, and revealed the cumbersome layout of the controls on the physical device (even if the team had used a cardboard glove with simulated controls drawn in crayon in the appropriate locations).
Jeff Kelley credits himself (PDF) with coining the term WOZ method in 1980 to describe the research method he employed in his dissertation. However, Paula Roe credits Don Norman and Allan Munro for using the method as early as 1973 to conduct testing on an airport automated travel assistant. Regardless of who originated the method, both parties agree that it gained prominence when IBM later used it to conduct studies on a speech-to-text tool known as The Listening Typewriter (see Image below).
In this article, I’ll cover the core principles of the WOZ method, explore advanced applications taken from practical experience, and demonstrate its unique value through real-world examples, including its application to the field of agentic AI. UX practitioners can use the WOZ method as another tool to unlock user insights and craft human-centered products and experiences.
The WOZ method operates on the premise that users believe they are interacting with an autonomous system while a human wizard manages the system’s responses behind the scenes. This individual, often positioned remotely (or off-screen), interprets user inputs and generates outputs that mimic the anticipated functionality of the experience.
A successful WOZ study involves several key roles:
Creating a convincing illusion is key to the success of a WOZ study. This necessitates careful planning of the research environment and the tasks users will undertake. Consider a study evaluating a new voice command system for smart home devices. The research setup might involve a physical mock-up of a smart speaker and predefined scenarios like “Play my favorite music” or “Dim the living room lights.” The wizard, listening remotely, would then trigger the appropriate responses (e.g., playing a song, verbally confirming the lights are dimmed).
Or perhaps it is a screen-based experience testing a new AI-powered chatbot: users enter commands into a text box while another member of the product team provides responses in real time using a tool like Figma/FigJam, Miro, Mural, or other cloud-based software that allows multiple users to collaborate simultaneously (the author has no affiliation with any of the mentioned products).
Maintaining the illusion of a genuine system requires the following:
Transparency is crucial, even in a method that involves a degree of deception. Participants should always be debriefed after the session, with a clear explanation of the Wizard of Oz technique and the reasons for its use. Data privacy must be maintained as with any study, and participants should feel comfortable and respected throughout the process.
The WOZ method occupies a unique space within the UX research toolkit:
This method proves particularly valuable when exploring truly novel interactions or complex systems where building a fully functional prototype is premature or resource-intensive. It allows researchers to answer fundamental questions about user needs and expectations before committing significant development efforts.
Let’s move beyond the foundational aspects of the WOZ method and explore some more advanced techniques and critical considerations that can elevate its effectiveness.
It’s a fair question to ask whether WOZ is truly a time-saver compared to even cruder prototyping methods like paper prototypes or static digital mockups.
While paper prototypes are incredibly fast to create and test for basic flow and layout, they fundamentally lack dynamic responsiveness. Static mockups offer visual fidelity but cannot simulate complex interactions or personalized outputs.
The true time-saving advantage of the WOZ method emerges when testing novel, complex, or AI-driven concepts. It allows researchers to evaluate genuine user interactions and mental models in a seemingly live environment, collecting rich behavioral data that simpler prototypes cannot. This fidelity in simulating a dynamic experience, even with a human behind the curtain, often reveals critical usability or conceptual flaws far earlier and more comprehensively than purely static representations, ultimately preventing costly rework down the development pipeline.
While the core principle of the WOZ method is straightforward, its true power lies in nuanced application and thoughtful execution. Seasoned practitioners may leverage several advanced techniques to extract richer insights and address more complex research questions.
The WOZ method isn’t necessarily a one-off endeavor. Employing it in iterative cycles can yield significant benefits. Initial rounds might focus on broad concept validation and identifying fundamental user reactions. Subsequent iterations can then refine the simulated functionality based on previous findings.
For instance, after an initial study reveals user confusion with a particular interaction flow, the simulation can be adjusted, and a follow-up study can assess the impact of those changes. This iterative approach allows for a more agile and user-centered exploration of complex experiences.
Simulating complex systems can be difficult for one wizard. Breaking complex interactions into smaller, manageable steps is crucial. Consider researching a multi-step onboarding process for a new software application. Instead of one person trying to simulate the entire flow, different aspects could be handled sequentially or even by multiple team members coordinating their responses.
Clear communication protocols and well-defined responsibilities are essential in such scenarios to maintain a seamless user experience.
While qualitative observation is a cornerstone of the WOZ method, defining clear metrics can add a layer of rigor to the findings. These metrics should match research goals. For example, if the goal is to assess the intuitiveness of a new navigation pattern, you might track the number of times users express confusion or the time it takes them to complete specific tasks.
Combining these quantitative measures with qualitative insights provides a more comprehensive understanding of the user experience.
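As an illustration, suppose (hypothetically) that an observer logs each session as a list of timestamped events; the metrics mentioned above can then be computed with a few lines of Python. The event labels and log format here are invented for this sketch, not part of any standard tool:

```python
# Hypothetical sketch: computing simple WOZ session metrics from
# timestamped observation logs. Event names and log format are
# illustrative assumptions.

def session_metrics(events):
    """events: list of (seconds_elapsed, label) tuples, e.g.
    (12.0, "task_start"), (95.5, "task_complete"), (40.2, "confusion")."""
    starts = [t for t, label in events if label == "task_start"]
    completes = [t for t, label in events if label == "task_complete"]
    confusion_count = sum(1 for _, label in events if label == "confusion")
    # Pair each task start with the next completion to get durations.
    durations = [end - start for start, end in zip(starts, completes)]
    return {
        "tasks_completed": len(durations),
        "avg_task_seconds": sum(durations) / len(durations) if durations else 0.0,
        "confusion_events": confusion_count,
    }

log = [(0.0, "task_start"), (42.0, "confusion"), (90.0, "task_complete"),
       (100.0, "task_start"), (130.0, "task_complete")]
print(session_metrics(log))
# → {'tasks_completed': 2, 'avg_task_seconds': 60.0, 'confusion_events': 1}
```

Even a rough tally like this makes it easier to compare iterations of the simulated experience against the same behavioral yardsticks.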
The WOZ method isn’t an island. Its effectiveness can be amplified by integrating it with other research techniques. Preceding a WOZ study with user interviews can help establish a deeper understanding of user needs and mental models, informing the design of the simulated experience. Following a WOZ study, surveys can gather broader quantitative feedback on the concepts explored. For example, after observing users interact with a simulated AI-powered scheduling tool, a survey could gauge their overall trust and perceived usefulness of such a system.
WOZ, as with all methods, has limitations. A few examples of scenarios where other methods would likely yield more reliable findings would be:
The wizard’s skill is critical to the method’s success. Training the individual(s) who will be simulating the system is essential. This training should cover:
All of this suggests the need for practice before running the actual session. Schedule a number of dry runs in which colleagues, or others willing to assist, not only participate but also deliberately try responses that could stump the wizard or throw the session off, just as a real user might in a live session.
I suggest having a believable prepared error statement ready to go for when a user throws a curveball. A simple response from the wizard of “I’m sorry, I am unable to perform that task at this time” might be enough to move the session forward while also capturing a potentially unexpected situation your team can address in the final product design.
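One lightweight way to prepare for those curveballs is a small cheat sheet of scripted responses with a single believable fallback, so the wizard pastes rather than improvises. A minimal sketch, with request keys and wording that are purely illustrative:

```python
# Hypothetical wizard "cheat sheet": scripted responses keyed by request
# type, plus one believable fallback for anything unexpected. Unmatched
# requests are logged so the team can review them after the session.

SCRIPTED_RESPONSES = {
    "play music": "Now playing your favorite playlist.",
    "dim lights": "Okay, the living room lights are dimmed to 40%.",
}
FALLBACK = "I'm sorry, I am unable to perform that task at this time."

unexpected_requests = []  # reviewed during debriefing and design follow-up

def wizard_reply(user_request):
    key = user_request.strip().lower()
    if key in SCRIPTED_RESPONSES:
        return SCRIPTED_RESPONSES[key]
    unexpected_requests.append(user_request)  # capture the curveball
    return FALLBACK

print(wizard_reply("Dim lights"))        # scripted response
print(wizard_reply("Order me a pizza"))  # falls back, and is logged
```

The logged unmatched requests double as research data: each one is a user expectation the team didn’t anticipate.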
The debriefing session following the WOZ interaction is an additional opportunity to gather rich qualitative data. Beyond asking “What did you think?”, effective debriefing involves sharing the purpose of the study and the fact that the experience was simulated.
Researchers should then conduct psychological probing to understand the reasons behind user behavior and reactions. Asking open-ended questions like “Why did you try that?” or “What were you expecting to happen when you clicked that button?” can reveal valuable insights into user mental models and expectations.
Exploring moments of confusion, frustration, or delight in detail can uncover key areas for design improvement. Think about what the Power Glove’s development team could have uncovered if they’d asked participants to describe the experience of programming the glove and trying to remember which moves they’d assigned to which buttons.
The value of the WOZ method becomes apparent when examining its application in real-world research scenarios. Here is an in-depth review of one scenario and a quick summary of another study involving WOZ, where this technique proved invaluable in shaping user experiences.
A significant challenge in the realm of emerging technologies lies in user comprehension. This was particularly evident when our team began exploring the potential of Agentic AI for enterprise HR software.
Agentic AI refers to artificial intelligence systems that can autonomously pursue goals by making decisions, taking actions, and adapting to changing environments with minimal human intervention. Unlike generative AI, which primarily responds to direct commands or generates content, Agentic AI is designed to understand user intent, independently plan and execute multi-step tasks, and learn from its interactions to improve performance over time. These systems often combine multiple AI models and can reason through complex problems. For designers, this signifies a shift towards creating experiences where AI acts more like a proactive collaborator or assistant, capable of anticipating needs and taking the initiative to help users achieve their objectives rather than solely relying on explicit user instructions for every step.
Preliminary research, including surveys and initial interviews, suggested that many HR professionals, while intrigued by the concept of AI assistance, struggled to grasp the potential functionality and practical implications of truly agentic systems — those capable of autonomous action and proactive decision-making. We saw they had no reference point for what agentic AI was, even after we attempted relevant analogies to current examples.
Building a fully functional agentic AI prototype at this exploratory stage was impractical. The underlying algorithms and integrations were complex and time-consuming to develop. Moreover, we risked building a solution based on potentially flawed assumptions about user needs and understanding. The WOZ method offered a solution.
We designed a scenario where HR employees interacted with what they believed was an intelligent AI assistant capable of autonomously handling certain tasks. The facilitator presented users with a web interface where they could request assistance with tasks like “draft a personalized onboarding plan for a new marketing hire” or “identify employees who might benefit from proactive well-being resources based on recent activity.”
Behind the scenes, a designer acted as the wizard. Based on the user’s request and the (simulated) available data, the designer would craft a response that mimicked the output of an agentic AI. For the onboarding plan, this involved assembling pre-written templates and personalizing them with details provided by the user. For the well-being resource identification, the wizard would select a plausible list of employees based on the general indicators discussed in the scenario.
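Behind the curtain, that kind of response can be as simple as filling a pre-written template with the details the participant supplies. A minimal sketch of the idea follows; the template wording, fields, and names are invented for illustration and are not from the actual study:

```python
# Hypothetical sketch of how a wizard might personalize a pre-written
# onboarding-plan template with details taken from the user's request.
# Template text and field names are illustrative assumptions.

ONBOARDING_TEMPLATE = (
    "Onboarding plan for {name} ({role}):\n"
    "Week 1: Team introductions and {role} tooling setup.\n"
    "Week 2: Shadow a senior {role} colleague.\n"
    "Week 3: First independent {role} project, reviewed by {manager}."
)

def draft_onboarding_plan(name, role, manager):
    # The wizard fills in the blanks and pastes the result as the "AI" reply.
    return ONBOARDING_TEMPLATE.format(name=name, role=role, manager=manager)

print(draft_onboarding_plan("Asha", "marketing", "Jordan"))
```

The point is not the code but the division of labor: prepared fragments keep the wizard’s responses fast and consistent, which protects the illusion of an autonomous system.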
Crucially, the facilitator encouraged users to interact naturally, asking follow-up questions and exploring the system’s perceived capabilities. For instance, a user might ask, “Can the system also schedule the initial team introductions?” The wizard, guided by pre-defined rules and the overall research goals, would respond accordingly, perhaps with a “Yes, I can automatically propose meeting times based on everyone’s calendars” (again, simulated).
As recommended, we debriefed participants following each session. We began with transparency, explaining the simulation and that another live human had been posting the responses to the queries based on what the participant was saying. Open-ended questions explored initial reactions and envisioned use. Task-specific probing, like “Why did you expect that?”, revealed underlying assumptions. We specifically addressed trust and control (“How much trust…? What level of control…?”). To understand mental models, we asked how users thought the “AI” worked. We also solicited improvement suggestions (“What features…?”).
By focusing on the “why” behind user actions and expectations, these debriefings provided rich qualitative data that directly informed subsequent design decisions, particularly around transparency, human oversight, and prioritizing specific, high-value use cases. We also had a research participant who understood agentic AI and could provide additional insight based on that understanding.
This WOZ study yielded several crucial insights into user mental models of agentic AI in an HR context:
Based on these findings, we made several key design decisions:
In another project, we used the WOZ method to evaluate user interaction with a voice interface for controlling in-car functions. Our research question focused on the naturalness and efficiency of voice commands for tasks like adjusting climate control, navigating to points of interest, and managing media playback.
We set up a car cabin simulator with a microphone and speakers. The wizard, located in an adjacent room, listened to the user’s voice commands and triggered the corresponding actions (simulated through visual changes on a display and audio feedback). This allowed us to identify ambiguous commands, areas of user frustration with voice recognition (even though it was human-powered), and preferences for different phrasing and interaction styles before investing in complex speech recognition technology.
These examples illustrate the versatility and power of the method in addressing a wide range of UX research questions across diverse product types and technological complexities. By simulating functionality, we can gain invaluable insights into user behavior and expectations early in the design process, leading to more user-centered and ultimately more successful products.
The WOZ method, far from being a relic of simpler technological times, retains relevance as we navigate increasingly sophisticated and often opaque emerging technologies.
WOZ In The Age Of AI
Consider the burgeoning field of AI-powered experiences. Researching user interaction with generative AI, for instance, can be effectively done through WOZ. A wizard could curate and present AI-generated content (text, images, code) in response to user prompts, allowing researchers to assess user perceptions of quality, relevance, and trust without needing a fully trained and integrated AI model.
Similarly, for personalized recommendation systems, a human could simulate the recommendations based on a user’s stated preferences and observed behavior, gathering valuable feedback on the perceived accuracy and helpfulness of such suggestions before algorithmic development.
Even autonomous systems, seemingly the antithesis of human control, can benefit from WOZ studies. By simulating the autonomous behavior in specific scenarios, researchers can explore user comfort levels, identify needs for explainability, and understand how users might want to interact with or override such systems.
Virtual And Augmented Reality
Immersive environments like virtual and augmented reality present new frontiers for user experience research. WOZ can be particularly powerful here.
Imagine testing a novel gesture-based interaction in VR. A researcher tracking the user’s hand movements could trigger corresponding virtual events, allowing for rapid iteration on the intuitiveness and comfort of these interactions without the complexities of fully programmed VR controls. Similarly, in AR, a wizard could remotely trigger the appearance and behavior of virtual objects overlaid onto the real world, gathering user feedback on their placement, relevance, and integration with the physical environment.
The Human Factor Remains Central
Despite the rapid advancements in artificial intelligence and immersive technologies, the fundamental principles of human-centered design remain as relevant as ever. Technology should serve human needs and enhance human capabilities.
The WOZ method allows us to inject the “human factor” into the design process of even the most advanced technologies. Doing this may help ensure these innovations are not only technically feasible but also truly usable, desirable, and beneficial.
The WOZ method stands as a powerful and versatile tool in the UX researcher’s toolkit. Its ability to bypass the limitations of early-stage development and directly elicit user feedback on conceptual experiences offers invaluable advantages. We’ve explored its core mechanics and covered ways of maximizing its impact. We’ve also examined its practical application through real-world case studies, including its crucial role in understanding user interaction with nascent technologies like agentic AI.
The strategic implementation of the WOZ method provides a potent means of de-risking product development. By validating assumptions, uncovering unexpected user behaviors, and identifying potential usability challenges early on, teams can avoid costly rework and build products that truly resonate with their intended audience.
I encourage all UX practitioners, digital product managers, and those who collaborate with research teams to consider incorporating the WOZ method into their research toolkit. Experiment with its application in diverse scenarios, adapt its techniques to your specific needs, and don’t be afraid to have fun with it. Scarecrow costume optional.