'Westworld: The Maze': An all-audio feat of conversational AI


An Alexa device in front of 360i's story-planning board. Credit: 360i

Working on a condensed, 15-week timeline, 360i and HBO created "Westworld: The Maze," an Amazon Alexa voice skill that lets players control the actions of a "host," an artificial humanoid in the show's Western theme park, on a quest for personhood. Here's how they built an all-audio feat of conversational AI.

Phase 1: Discovery (3 weeks)

In March, the team decided the Alexa project would be an interactive, all-audio game. They nailed down the creative concept, a series of challenges based on the pyramid of consciousness espoused by Anthony Hopkins' "Westworld" character, Robert Ford: memory, improvisation and self-interest.

Phase 2: Game design and writing (4 weeks)

"You have to start with the user experience: How does the game work? Where are all the places I can go? How do I win? How do I die?" says Andrew Hunter, creative director at 360i. There are more than 60 paths through the game and 32 ways to die, including getting shot, rebooted, eaten by cannibals or trampled in a stampede.

Players speak commands in response to prompts from characters in the story, who might offer to pour them a drink or ask where they would like to go. The writers had to account for anything a player might say, whether right, wrong or nonsensical.
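
The game itself was authored in PullString (see Phase 5), but as a minimal sketch of the general problem, here is how an Alexa skill built directly on the Alexa Skills Kit SDK for Python might route an expected command to a scripted response and catch everything else with a fallback. The intent names and dialogue lines are hypothetical.

```python
# Minimal sketch (not the team's actual code): expected player commands map
# to intents; anything the writers didn't anticipate lands in the fallback
# and gets redirected in character.
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name, is_request_type

sb = SkillBuilder()


class LaunchHandler(AbstractRequestHandler):
    """Opening prompt when the player starts the skill."""
    def can_handle(self, handler_input):
        return is_request_type("LaunchRequest")(handler_input)

    def handle(self, handler_input):
        speech = "Welcome to Sweetwater. Can I pour you a drink?"
        return handler_input.response_builder.speak(speech).ask(speech).response


class PourDrinkHandler(AbstractRequestHandler):
    """A 'right' answer: the player accepts the bartender's offer."""
    def can_handle(self, handler_input):
        return is_intent_name("PourDrinkIntent")(handler_input)

    def handle(self, handler_input):
        speech = "Rose slides a whiskey down the bar. Where to next?"
        return handler_input.response_builder.speak(speech).ask("Where to?").response


class FallbackHandler(AbstractRequestHandler):
    """Unanticipated requests are re-prompted without breaking character."""
    def can_handle(self, handler_input):
        return is_intent_name("AMAZON.FallbackIntent")(handler_input)

    def handle(self, handler_input):
        speech = ("The bartender squints at you. "
                  "You could take the drink, or tell me where you'd like to go.")
        return handler_input.response_builder.speak(speech).ask(speech).response


sb.add_request_handler(LaunchHandler())
sb.add_request_handler(PourDrinkHandler())
sb.add_request_handler(FallbackHandler())
handler = sb.lambda_handler()  # AWS Lambda entry point
```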

Writers also liaised with Kilter Films, the show's production company. "Anything 'in-world' doesn't go anywhere unless it goes through them," says Tanner Stransky, director of digital content at HBO.

Jeffrey Wright as Bernard Lowe in an ad for 'Westworld: The Maze.' Credit: 360i, HBO

Phase 3: Cast and record (2 weeks)

Voice actors had to be cast for more than 30 roles. Jeffrey Wright reprised his role as Bernard Lowe, and players who beat the game unlock a hidden scene with Angela Sarafyan, who plays Clementine Pennyfeather, where they can order special drinks.

From there, Phases 4, 5 and 6 happened simultaneously.

Phase 4: Sound design (4 weeks)

Hoofbeats, gunshots, clinking glasses: Sound is the only way players can experience the game. "This is where the fans call bullshit," says Menno Kluin, chief creative officer at 360i. "They know what the saloon sounds like"—so it had to sound authentic.

Phase 5: Authoring (5 weeks)

The script ran to nearly 260 pages of dialogue, sound cues and conversation trees, all of which had to be coded in PullString, a platform designed for voice applications. The team also added a few pop-culture Easter eggs. (Ask Rose the bartender about Serena Williams or RuPaul.)
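
PullString's authoring format isn't detailed in the article, so purely as an illustration of what a single entry in a conversation tree like this has to carry (a character line, a sound cue, branches keyed to recognized replies, and a fallback), here is a hypothetical Python sketch. The node names and audio files are invented; the Serena Williams branch just echoes the Easter egg mentioned above.

```python
# Illustrative only -- not PullString's format. Each node bundles the
# character line, an optional sound cue, and branches keyed by recognized replies.
from dataclasses import dataclass, field


@dataclass
class Node:
    speaker: str
    line: str
    sound_cue: str | None = None                             # e.g. "glasses_clink.mp3"
    branches: dict[str, str] = field(default_factory=dict)   # reply -> next node id
    fallback: str | None = None                               # where unrecognized replies go


SCRIPT = {
    "saloon_intro": Node(
        speaker="Rose",
        line="Can I pour you a drink?",
        sound_cue="glasses_clink.mp3",
        branches={"yes": "pour_drink", "no": "leave_saloon",
                  "serena williams": "easter_egg_serena"},    # hidden pop-culture reply
        fallback="saloon_intro",                              # re-prompt in character
    ),
    "pour_drink": Node(
        speaker="Rose",
        line="One whiskey, coming up. Where are you headed?",
        sound_cue="pour_whiskey.mp3",
        branches={"the maze": "maze_hint", "outside": "street_stampede"},
        fallback="pour_drink",
    ),
}
```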

Phase 6: QA (3-4 weeks)

Bugs were flagged and sent to development partner Xandra for correction. In PullString, "you can type your commands instead of having to speak them all the time," which sped up testing, says Layne Harris, head of innovation at 360i.
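
Harris' point generalizes: if the conversation logic can be driven by typed text, whole playthroughs can be replayed automatically. The following is a toy regression check along those lines, not PullString's or Xandra's tooling, with an invented mini-tree.

```python
# Toy regression check (hypothetical): replay a typed sequence of player
# replies through a conversation tree and assert it reaches the expected
# ending -- no microphone required.
def run_path(tree, replies, start="saloon_intro"):
    node_id = start
    for reply in replies:
        node = tree[node_id]
        # Unrecognized replies stay on the same node, mirroring the game's fallback.
        node_id = node["branches"].get(reply.lower(), node_id)
    return node_id


TREE = {
    "saloon_intro":   {"branches": {"yes": "pour_drink", "no": "street"}},
    "pour_drink":     {"branches": {"the maze": "maze_hint"}},
    "street":         {"branches": {"run": "stampede_death"}},
    "maze_hint":      {"branches": {}},
    "stampede_death": {"branches": {}},   # one of the endings
}

assert run_path(TREE, ["yes", "the maze"]) == "maze_hint"
assert run_path(TREE, ["no", "run"]) == "stampede_death"
assert run_path(TREE, ["banana", "yes", "the maze"]) == "maze_hint"  # nonsense re-prompts
print("all scripted paths pass")
```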

Amazon also began testing on its end while the skill was still in development. After the final version of a skill is submitted, it usually takes two to three weeks for Amazon to complete its testing and certify it.

Launch: June 20

Amazon doesn't report user data until about four weeks after launch, but "The Maze" has a 4.5-star rating. Player responses within the game can also be analyzed, potentially paving the way for improvements or additional options. "We can start collecting what people are asking for and add to it," Harris says. "It becomes smarter over time."
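
The article doesn't say how those responses are captured. One common way a skill can "collect what people are asking for" is a catch-all intent whose AMAZON.SearchQuery slot passes the player's words to the backend, where they can be logged for later review. The handler below is a hypothetical sketch of that pattern (the intent and slot names are invented); it would be registered on the skill builder like the handlers in the Phase 2 sketch.

```python
# Hypothetical sketch: assumes a "CatchAllIntent" with an AMAZON.SearchQuery
# slot named "utterance", so the skill receives the player's words; printed
# JSON lands in the Lambda function's CloudWatch logs for offline analysis.
import json
import time

from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name


class CatchAllHandler(AbstractRequestHandler):
    """Record unanticipated requests so writers can add responses for them later."""
    def can_handle(self, handler_input):
        return is_intent_name("CatchAllIntent")(handler_input)

    def handle(self, handler_input):
        slots = handler_input.request_envelope.request.intent.slots or {}
        heard = slots.get("utterance")
        print(json.dumps({
            "ts": time.time(),
            "heard": heard.value if heard else None,
        }))
        speech = "I don't have that on the menu yet. What else can I get you?"
        return handler_input.response_builder.speak(speech).ask(speech).response
```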
