
DOUG SCHUMACHER

experience designer + writer


Articles

Copyframes: Just like wireframes, only faster, more precise, and less costly

February 2, 2025 By Doug Schumacher

It must happen on thousands of video conferences every hour around the world. A UX design project is in the late stages when the client begins to question the content. The way things are said, or even worse, the order of key page sections.

And every time that happens, progress grinds to a halt. The project leaders swoop in to avert a crisis. Long meetings take place, racking up hours and costs. And those meetings are about making decisions that should have been locked down much earlier in the process.

Enter copyframes. Wireframes’ more literal cousin. 

Instead of greeked placeholder text, with copyframes you’re working out first drafts of the most critical navigation elements on the page: headlines, body copy, and CTAs.

If you’ve read the most excellent UX writing book ‘Writing is Designing’, the authors point to work done by Mig Reyes demonstrating how text is the navigational bedrock of a user experience. Take a typical web page and first remove all the graphic elements; then, instead, remove all the text elements. Most interfaces remain usable with just text, while visual elements alone range from slower to use (i.e., greater cognitive load) to completely unusable. (See Image 1 for a visual that illustrates the point.)

Image 1

As a result, the traditional wireframing approach, while possibly delivering a smoother UI transition, often devolves into a bottleneck of content revisions. Tweaking late-stage visual layouts based on copy modifications eats up valuable time.

It’s about more than time and money

Strategy usually lives in text – briefs, user stories, etc. Copyframes bridge this natural gap between strategy and execution: they’re a more direct translation of the strategy into the interface.

And this clarity feeds back into the time and money issue. When clients comment on the actual words taking shape, the crucial feedback arrives earlier in the process.

So copy matters. Or at least, it should. Copyframes offer the fastest, most direct route to translating crucial strategic ideas into tangible page structures. Think of it like prioritizing the big rocks in that task management jar we all know and (sometimes) love. Copy is undoubtedly one of those big rocks in the UX process.

But what do copyframes look like? 

After what I’ve described so far, it’s probably not a stretch to conjure up an image of a copyframe. That said, I’ve used them at various levels of fidelity. (And using fidelity to describe a copyframe could be considered egregious.)

So regarding tools, it’s a lot like the line about the best camera being the one you have with you. Copyframes are about capturing the gist of something vs. artful finesse (I admit to having done them in PowerPoint). It’s about whatever helps you most easily communicate the flow of messaging, from the first view of the experience through to the final CTA.
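To make that concrete, here’s a minimal sketch of a copyframe as plain structured data. The section names and the delivery-service copy are invented for illustration, not taken from any real project:

```python
from dataclasses import dataclass

@dataclass
class CopyframeSection:
    """One page section: just the words, in reading order."""
    headline: str
    body: str = ""
    cta: str = ""

# Hypothetical first-draft copyframe for a delivery-service landing page.
copyframe = [
    CopyframeSection(
        headline="Dinner, delivered in under 30 minutes",
        body="Order from thousands of local restaurants near you.",
        cta="Find food near you",
    ),
    CopyframeSection(
        headline="Become a Dasher",
        body="Set your own hours and get paid to drive.",
        cta="Start earning",
    ),
]

# Print the messaging flow top to bottom -- the whole point of a copyframe.
for i, section in enumerate(copyframe, 1):
    print(f"{i}. {section.headline}\n   {section.body}\n   [{section.cta}]")
```

Whether it lives in code, a FigJam sticky, or a PowerPoint slide matters far less than being able to read the messaging in order, top to bottom.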

Below (Image 2) are a few copyframe examples: the first done in FigJam, the next two in Figma, one rough and one using a page template. (I used the DoorDash page again to illustrate what different copyframe approaches might look like.)

Image 2

It wouldn’t be a blog post without mentioning AI

Beyond the present advantages of time, cost, and accuracy, copyframes offer a giant leap forward: they’re perfectly positioned to leverage the power of artificial intelligence.

The text-centric nature of copyframes means that today’s sophisticated Large Language Models (LLMs) can be reliably employed to generate compelling and on-brand copy with the right prompting and agent design.

Furthermore, AI agents can be used to create the initial creative brief and page content outlines, leveraging copyframes to further expedite the process. The link between copyframes and AI is a match made in digital heaven, promising more relevant, faster, more efficient, and ultimately more impactful UX design workflows.
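As a sketch of what that could look like in practice, here’s one way to draft copyframe sections with the OpenAI Python client. The brief is invented, and the model name is an illustrative choice, not a recommendation:

```python
from openai import OpenAI  # assumes the openai>=1.x client and an API key in the env

client = OpenAI()

# An invented one-line brief; in practice this comes from the strategy doc.
brief = (
    "Brand: a food-delivery service. Audience: busy urban professionals. "
    "Tone: friendly, direct. Page: homepage."
)

# Ask the model for first-draft copyframe sections: headline, body, CTA.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": "You are a UX writer drafting copyframes."},
        {
            "role": "user",
            "content": f"{brief}\n\nDraft three page sections, each formatted as: "
                       "HEADLINE / BODY (one sentence) / CTA.",
        },
    ],
)

print(response.choices[0].message.content)
```

The output is a first draft to react to, not finished copy – which is exactly the job a copyframe does.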

So, ditch the late-stage content chaos and embrace the clarity and efficiency of copyframes. Your team, your clients, and your bottom line will thank you. It’s time to put words first and build truly meaningful digital experiences, one well-designed sentence at a time.

Put another way: Use your words.

Filed Under: AI, Articles, UX Tagged With: ai, copyframes, cx, user experience, ux, ux design, wireframes

Launch: UXing. Content analysis of tweets with ‘#UX’

November 9, 2022 By Doug Schumacher

UXers, meet UXing. 

UXing is an app that analyzes tweets with the hashtag ‘#ux’.

Now of course, you can’t have Twitter content analysis without a dashboard, so there’s one of those. A topline breakout of the key metrics behind the broad trends.

The data is charted in a Google Looker (née Data) Studio report.

After the initial dashboard view, it breaks down the analysis to answer 3 primary questions. 
1. What are the influencers tweeting about? 
2. Who’s doing the tweeting? 
3. When is it happening? 

That’s a simplification, as there are quite a few charts and tables designed around each of those questions.

For example, a word cloud featuring the topics most frequently posted helps answer the question ‘What are the influencers tweeting about’.

Likewise, a table ranking the content authors by total retweets generated during the given time period takes the question of ‘Who’s doing the tweeting’ and adds a value metric to rank them by.

The question ‘When is it happening’ is interesting from multiple views, including the dayparts with the most active authors.
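As a rough sketch of the aggregation behind those last two views – the schema and column names here are invented, and the real report runs in Looker Studio rather than in code:

```python
import pandas as pd

# Invented sample data; the real report runs on collected #ux tweets.
tweets = pd.DataFrame({
    "author":   ["@uxa", "@uxb", "@uxa", "@uxc", "@uxb"],
    "retweets": [12, 3, 40, 7, 1],
    "hour":     [9, 13, 9, 21, 13],  # hour of day the tweet was posted
})

# 'Who's doing the tweeting?' -- rank authors by total retweets generated.
top_authors = (
    tweets.groupby("author")["retweets"].sum().sort_values(ascending=False)
)

# 'When is it happening?' -- bucket posting hours into dayparts.
dayparts = pd.cut(
    tweets["hour"],
    bins=[0, 6, 12, 18, 24],
    labels=["overnight", "morning", "afternoon", "evening"],
)
activity_by_daypart = tweets.groupby(dayparts, observed=True).size()

print(top_authors)
print(activity_by_daypart)
```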

In future posts, I’ll share charts that I think add interesting POV to the industry conversation and discuss how they work to answer the 3 main questions.

Meanwhile, the report is live, so please check it out. 
https://datastudio.google.com/s/q3EifGqrEvM 

If you have questions, thoughts on the analysis, or ideas about what else could be gathered from the data, I’m curious to hear more.

Filed Under: Articles, UX Tagged With: ux

Your Brand Has a Personality Whether You Like it or Not. Here’s How to Define It.

April 13, 2020 By Doug Schumacher

Note: This article originally appeared in Voicebot.ai on April 4, 2020

Pay attention, mortal meatbot, and you might learn something about voice personality in this article. Or, maybe a better opening line would be, Are you interested in learning how to use voice personality to increase the effectiveness of your voice apps?

Or maybe not.

The point is, there are different ways to deliver a message. The style you choose will likely impact your message’s effectiveness. And that’s the point of a voice personality methodology. To hopefully nail the optimal voice for your brand.

YOU CANNOT NOT COMMUNICATE

The line above is from influential graphic designer David Carson. It’s a simple concept. Whatever font, imagery, or layout you use to represent your brand, it will make a statement about the identity and values of the company behind the message. Those who think they can hug the middle of the road attempting to avoid standing out can come across as bland and generic. And of course, bland and generic can be ways to describe a personality. Just not terms most brands are likely to aspire to.

The spoken word adheres to the same communication principle. Clifford Nass, a former Stanford professor of Communications who did extensive research on the impact of various voice personality traits, said that within seconds of a human hearing a voice, we begin to assign all sorts of personality traits to it. Whatever voice it is. However nondescript we may think it is. To demonstrate this, below are 2 human voice-over reads for the same script. Even with the bland, generic script used in those reads, listen to the two voices and note how your mind conjures different personas for each.

AUDIO SFX: S1VO1
Credit: Patrick Lagreid at Voices.com (LINK: https://www.voices.com/actors/plagreid)
AUDIO SFX: S1VO2
Credit: Debbie Feldman at Voices.com (LINK: https://www.voices.com/actors/debbiefeldman)

Of course, like any self-respecting professor at a research university like Stanford, Nass didn’t stop after a couple of experiments and call it a day. Through extensive lab environment testing, he developed the Media Equation theory[1]. It introduced the idea that humans assign personalities to machines we interact with similar to the way we do for humans we interact with. That’s highly significant for anyone building any type of technology interface. To demonstrate that concept, listen to the audio clips below.

AUDIO SFX: S2VO1
Credit: Sarah Raines at Voices.com (LINK: https://www.voices.com/actors/Sarah_Raines615)
AUDIO SFX: S2VO2
Credit: Scott William at Voices.com (LINK: https://www.voices.com/actors/SWilliam)
AUDIO SFX: S2VO3
Credit: Amazon Polly Synthetic Voice Joanna

In accord with the Media Equation theory, the personas behind the first and third clips should feel more alike than the second. I believe they do.

A MODEL FOR PERSONALITY TYPES

In his book “Wired for Speech”, professor Nass provides detailed accounts of 20 studies on how voice personality impacts the user’s perception and behavior. He also points out the usefulness of a personality model to help in selecting the right voice characteristics and presents the Wiggins voice personality model as an effective tool. Wally Brill, head of conversation design, advocacy and education at Google, has given several good presentations on using the Wiggins Personality Model, as seen below.

The Wiggins model is simple. There are four quadrants, with the x- and y-axes presenting polar traits. The x-axis runs from unfriendly to friendly, while the y-axis goes from submissive to dominant. While voices don’t convey every personality characteristic, such as whether a person is intuitive or spontaneous, there are a number of voice traits commonly associated with personality types. Nass focuses on four of them: speech rate, pitch, variance in frequency, and volume. The Wiggins chart below plots how those voice traits influence personality.

Nass even plots where several common character types reside on this chart. A supervillain like Darth Vader? His deep, slow, stolid voice provides a mix of dominance and unfriendliness. A classic bad guy voice. Superheroes also need to project dominance while also having a friendlier tone. So while they may have the deepness of a villain’s voice, they combine that with a wider frequency range and greater speed, adding friendliness to the mix.

Secondary characters will likely have less dominance, i.e., be more submissive, and will vary along the friendly–unfriendly axis based on whether they’re meant to be perceived as on the good or the bad side of the story. This model may provide a blunt view of voice personalities, but it’s a good starting point for discussions around what voice personality would most likely fit a brand.
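To make the model concrete, here’s a rough sketch of how the two axes could be derived from those four voice traits. The weightings are my own illustrative guesses based on the character descriptions above, not figures from Nass’s research:

```python
def wiggins_quadrant(speech_rate, pitch, pitch_variance, volume):
    """Place a voice in a Wiggins-style quadrant from four traits,
    each normalized to a 0-1 range. Weightings are illustrative
    guesses, not values from Nass's studies."""
    # Deeper (lower pitch) and louder reads as more dominant.
    dominance = (1 - pitch) * 0.5 + volume * 0.5
    # Faster speech and wider frequency variance read as friendlier.
    friendliness = speech_rate * 0.5 + pitch_variance * 0.5
    y = "dominant" if dominance >= 0.5 else "submissive"
    x = "friendly" if friendliness >= 0.5 else "unfriendly"
    return f"{y} / {x}"

# Darth Vader: deep, slow, flat, loud -> dominant / unfriendly.
print(wiggins_quadrant(speech_rate=0.2, pitch=0.1, pitch_variance=0.2, volume=0.9))

# A superhero: still deep, but faster and more varied -> dominant / friendly.
print(wiggins_quadrant(speech_rate=0.7, pitch=0.2, pitch_variance=0.8, volume=0.8))
```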

THE RIGHT PERSONALITY FOR YOUR BRAND

A logical next question is, how do you know which voice personality is right for your company? The answer will vary significantly by company, but a good place to start looking is the company’s brand strategy document. The brand strategy document defines the brand’s identity – what the brand stands for. And it culminates in a brand promise: a statement that sums up the brand’s identity.

The brand promise takes into account three things in particular.

  1. The target audience(s) for the brand’s products or services. Not only who they are, but what their needs are, and what they find most relevant about the brand.
  2. The competitive landscape. How the company creates brand distinction from its competitors.
  3. The brand’s capacity for authentically supporting the brand promise. In other words, do the brand’s values and capabilities realistically back up a given strategic position?

That brand promise and the backing research in those three areas should go a long way toward outlining the type of voice that will likely resonate with your target audience. Here’s a brand positioning statement template from HubSpot that contains those elements.

For [your target market] who [target market need], [your brand name] provides [main benefit that differentiates your offering from competitors] because [reason why target market should believe your differentiation statement.]

And keep in mind that brand promises are aspirational – something the brand feels it can be, or at least evolve into. If you don’t have an official brand strategy document, then any information on the above three areas of consideration should be helpful. The better you know your target audience, your competitors, and your own brand, the better you’ll be able to determine where your ideal voice should sit in terms of friendly vs. unfriendly and dominant vs. submissive.

HOW YOU COMPARE WITH COMPETITORS

Making our brand personality distinct from competitors requires that we understand our competitors’ brand personalities. And while we probably won’t have their brand strategy documents, we can easily observe competitors’ communications. From advertising campaigns to logo designs to color palettes, the assets used in brand communications are a reflection of the values and positioning in their brand promise.

To illustrate this, let’s use an example of two brands from the financial industry: Vanguard and eTrade. When casually observed, financial industry brands can all sound the same. However, with some fairly quick observations of the brand’s communications, we see differing approaches to presenting themselves in the marketplace. 

Let’s start with Vanguard. The company is an icon within the investing community. Indeed, its founder, the late John Bogle, was a respected industry giant known for his thriftiness and efficient trading system. Company founders often drive the personality of the brand. Think Ford. Ralph Lauren. Um, Trump. Accordingly, the Vanguard logo font is a conservative serif font. The vintage-looking ship illustration certainly isn’t trying to rock the boat, if you will. Even the color palette is simple and about as unexciting as red can be.

Now consider eTrade, in contrast. The brand font? Bold sans serif uppercase. The logo? Two colliding arrows. A strong sense of motion and energy. And the color palette. Purple and chartreuse. If you’ve ever seen eTrade TV commercials, you may know them for their sometimes brash and outrageous sense of humor. Vanguard TV commercials? You probably can’t think of any.

While two brands within an industry may pursue different target audiences and stake out different personalities, each brand personality needs to be authentic to the company behind it. Considering our example brands, it would be hard to imagine Vanguard using a color palette like eTrade’s. Likewise, eTrade using anything like the Vanguard ship illustration for its icon would seem odd. So at least on the surface, the two companies’ brand personalities feel appropriate. Going back to the Wiggins model, you can see how these two brands plot differently in terms of friendliness and dominance. Other brands can be added until you have a complete sense of where your brand stands within its competitive landscape.

WHAT TO EXPECT

Hopefully, this article has provided the reason, value and methodology for giving deep consideration to the type of voice, be it human or synthetic, that will represent your brand. Keep in mind, a process like this won’t define the entirety of your brand personality. But it gives you a structured view for comparing your brand to competitors and provides anchor points for discussion and further analysis.

Filed Under: Articles, UX Tagged With: ux, voice, voicebotai, vux

Chainsaw Product Highlights Risks for Brands Rushing into Voice Industry

April 1, 2020 By Doug Schumacher

Note: This article first appeared in Voicebot.ai on April 1, 2020


A dubitable voice product that garnered early VC investment is being taken off the market. The investigative journalism arm of Voicebot.ai has taken a closer look at this remarkable story.

There were doubtful murmurs when the concept of a voice-enabled chainsaw was first revealed back in 2017. But sometimes, the rising tide of a fast-moving industry lifts all barriers to good judgment.

CHAT N’ CUT

The “Chat n’ Cut”, as it was named, had humble beginnings. According to the ‘origins’ story on the brand’s now defunct website, the two founders, both advertising copywriters, had apparently polished off a case of Lumberjack Stout beer after work one Friday night and started brainstorming voice product ideas. In their inebriated state, the Escheresque connection between chainsaws and the voice industry became a direct path to startup stardom.

Sources close to the founders suggest that upon waking the day after their brainstorming binge, no hangover of any magnitude could keep the two from pursuing their newfound dream. Perhaps most surprising, they secured $1 million in funding with little more than a conversation flow diagram. Wil Gamble, CEO of seed funding firm Gamble & Shock, explained their logic for investing in the company:

“So many legacy industries were digitizing. We figured if the taxi and office rental businesses could be appified, well, it seemed like the chainsaw industry was ripe for the picking. Add to that the rapidly emerging voice space and, well, what’s not to love?”

And, while hindsight is 20/20, it seems there were a number of questionable decisions made along the way. The initial target market was the lumber industry. Early field tests were conducted with a logging firm in Veneta, Oregon, and a problem quickly surfaced. While it was easy to turn the chainsaw on by voice command, after it was started, the chainsaw motor noise overwhelmed the mic, and no additional user commands could be picked up.

“It was just weird,” said a field crew boss at the firm who asked to remain anonymous. “Usually, when the ‘jacks are out at work, all you hear is the buzz of the saws and falling trees. But this time, all you could hear was the saw, and then a bellowing voice of one of the ‘jacks screaming “Hey Chat ‘n Cut”, trying to shout commands above the chainsaw noise.”

CHAIN OF FOOLS

Field reviews were understandably bad. But what might have seemed a clear and obvious alarm was misinterpreted by the brand’s market research consultancy. Instead of viewing the research as an indication that the Chat ’n Cut was not a professional-grade product, they took it as an indicator the product should be repackaged and sold to the consumer market.

We got in touch with the former director of marketing for Chat ‘n Cut to get a sense of how that strategy played out. Apparently, things didn’t go well. Under a blanket of anonymity, she noted that while the OKRs (Objectives and Key Results) were initially focused on metrics like CPS (Cost Per Sale) and NPS (Net Promoter Score), the company soon transitioned to the CPMWM metric (Customers Per Thousand who were Wounded or Maimed).

Voicebot’s investigative reporting team was able to source a key document indicating that management should never have sent this product to market. The sample dialog flow had so much branching that we’re left to wonder whether there was an actual “happy path” that Chat ’n Cut’s entry-level conversation designers could agree on. It was also heavily edited to add new error states associated with unintelligible speech.

The voice industry remains promising, and we present this story in complete objectivity. While the personal injury attorneys and media feast on the carrion of this derailed initiative, keep in mind the lessons learned from our ancestral early-stage technology ventures. Just because you can doesn’t mean you should.

Filed Under: Articles, UX Tagged With: ux, voicebotai

Judging Voicebot.ai’s “Leaders In Voice” Report

August 19, 2019 By Doug Schumacher

If you’re in the voice space you’re almost certainly aware of Voicebot.ai and the industry-leading reporting they produce.

I was honored to participate as a judge in selecting the 44 most influential people in voice. You can get the report here.

Filed Under: AI, Articles, UX Tagged With: ux, voicebotai

Me and my synthetic voice

July 17, 2019 By Doug Schumacher

This post is about the recent experience I had creating a synthetic version of my own voice.

I’ve been interested in synthetic speech since starting my Homie & Lexy podcast, where I generate the character voices in Polly, Amazon’s text-to-speech engine.

And the episode of VoiceMarketing titled “How listenable is synthetic speech” looked at how Polly performs for various types of content, including the Gettysburg Address, simple jokes, a newscaster script, and a Shakespearian sonnet.

Throughout it all, I’ve gotten quite familiar with SSML (Speech Synthesis Markup Language).
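For anyone who hasn’t worked with it, here’s a minimal sketch of SSML in action, using boto3 to call Polly. The voice choice, markup, and file handling are illustrative, and it assumes AWS credentials are already configured:

```python
import boto3  # assumes AWS credentials are already configured

polly = boto3.client("polly")

# SSML tags control pacing and emphasis instead of accepting the
# engine's default read.
ssml = """
<speak>
  Four score and seven years ago,
  <break time="400ms"/>
  our fathers brought forth on this continent
  <prosody rate="slow">a new nation</prosody>.
</speak>
"""

response = polly.synthesize_speech(
    Text=ssml,
    TextType="ssml",      # tell Polly to parse the markup, not read it aloud
    VoiceId="Joanna",     # illustrative voice choice
    OutputFormat="mp3",
)

# The audio arrives as a stream; write it out to a playable file.
with open("gettysburg.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```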

I’d had several conversations with Rupal Patel, CEO of VocaliD, a company that’s doing interesting work in the area of creating synthetic voices. (She was recently interviewed on a Voicebot.ai episode.)

So when Rupal asked if I’d like to go through the synthetic speech process and create a synthetic version of my voice, I was intrigued and jumped at the opportunity. (Full transparency, this article isn’t a paid endorsement for VocaliD. I just found the process interesting and the results impressive, and wanted to write about it.)

Here’s an outline of the process I went through for creating my synthetic voice.

Equipment

Two big points here.

  • Have a quiet recording environment
  • Use a good mic/headset

Both of those may seem obvious, but small problems can lead to big problems in the results. Even a little background noise can cause problems in compiling the speech, as the technology focuses on minute differences in the speech recordings, and that includes non-speech noise that slips in.

For microphones, I used a Sennheiser SC 60, which I use for my podcasts, and it worked well. VocaliD recommends the Turtle Beach Recon 50 gaming headset. Both are reasonably priced at about $40.

Recording

The recording process was fascinating. I read a total of 2,000 statements (they need about 90 minutes of clean audio recordings to work with). The statements ranged from a couple of sentences to short phrases of only a few words. This was all provided by VocaliD, and I recorded it online through their website interface. It took me somewhere around 3–4 hours over the course of a couple of days to complete the recordings.

The phrases were an interesting mix of text. Mark Twain. Shakespeare. Famous novels. And even phrases from my own podcasts. More than once I had to do second takes because I’d laughed out loud at what I was reading.

VocaliD has an online recording studio with a simple user interface, or you can upload pre-recorded files along with corresponding transcripts.

Compositing

This is where the magic happens. And I’m not going to dive into a technical explanation of the machine learning advances and end-to-end neural synthesis stuff.

For technical details, I’ll refer you to the VocaliD website.

On a more pedestrian level, the clean audio samples are used to train an end-to-end synthesizer which learns to emulate the sound patterns and intonation of the speaker. And that voice training process goes on for between 24 and 48 hours. So a fair bit of processing.

Demonstration

Let’s run through a few demonstrations comparing my voice to my synthetic voice across different types of phrases: a question, a statement, projecting excitement, and using irony.

Question

Me:

Synthetic Me:

Statement

Me:

Synthetic Me:

Excitement

Me:

Synthetic Me:

Irony

Me:

Synthetic Me:

Results

When I first heard the synthetic version of myself, I was quite impressed. It’s a bit trippy hearing yourself saying something you’ve never actually said. Welcome to the future!

While synthetic voices still struggle with many types of content, I found the diction of VocaliD’s tool impressive for what you might call mid-range content: texts that can be delivered in a fairly straightforward manner. Not crazy humor. Not high drama. But needed information.

What I think this level of synthesis does more than anything is extend the length of speech a synthetic voice can read in that mid-range of content.

In my episode “How listenable is synthetic speech”, the main factor in whether synthetic speech was usable was the type of content. But coming in shortly after that is length. A short news or weather clip? That’s doable. A 6-minute reading of a magazine article? Not so much.

Use Cases

What are some use cases for where we could apply synthetic speech?

My favorite, and this is actually what charted the course for VocaliD, is giving voice to people who are voice-disabled. Rupal makes a great point about how you wouldn’t attach just any prosthetic arm to a small child in need of one. It’s similar for someone who needs a synthetic voice. This creates the possibility for them to have a voice that’s distinct and tied to their personality.

Another example is for brand consistency.

Consider a company like Allstate Insurance, and their use of Dennis Haysbert for their commercials. By synthesizing his voice, they could apply those same brand qualities to their company’s interactive voice system. Any voice assistant apps they develop. And for some companies, that would even apply to physical products they develop.

For the money they’ve spent positioning their brand, defining the right voice to represent that brand, and contracting Mr. Haysbert’s talents, the additional cost of synthesizing his voice is nominal, and it extends the value of the work they’ve already done.

Another interesting possibility is the creation of previously unattainable brand voices. VocaliD is not only able to synthesize one person’s voice; they can also combine multiple voices to create a brand voice that’s not only distinct but perhaps has qualities never before captured in a single voice. A voice avatar, perhaps.

Sensitivities and Considerations

I’d be remiss if I didn’t mention that the idea of synthesizing someone’s voice introduces a lot of legal, not to mention ethical, considerations. Given I’m not an attorney, I’m not going to do a deep dive here. But clearly the idea of creating synthetic speech around a person’s voice identity will bring up a lot of issues and discussions in the coming years. I do know the folks at VocaliD have already been thinking about this and are working on ways to watermark a talent’s voice, similar to technologies already used for videos and images.

As the world of voice technology, voice and audio devices, and voice-delivered content evolves over the next few years, the synthetic voice space is going to offer brands, content developers, and voice talents some unprecedented opportunities. My sense is these changes are going to happen swiftly, and navigating this part of the voice landscape is going to be key to maximizing the impact and value of audio content.

Filed Under: AI, Articles, UX Tagged With: ssml, synthetic_voices, ux, vocalid

Voicebot development insights from the winners of the Actions on Google Developer Challenge

January 17, 2018 By Doug Schumacher

This winter it was as if CES was trying to prove to the holiday season that it could create the most buzz around voice assistants. So at least heading into 2018, it looks to be an exciting year for the #VoiceFirst movement.

And Google recently announced winners of the Actions on Google Developer Challenge, with awards and cash going to what they judged were the best voicebots on the Google Assistant platform in 2017. It’s about time for someone to launch an awards show called the Bottys. 

At any rate, it’s worth looking into the winners and seeing what they’re doing. While we don’t have hard data to indicate if what they’re doing is working, at least they’re getting Google’s nod of approval.

So I’m going to analyze the first, second and third place winners to see what we can learn from them.

And without further ado, here they are.

  1. 100 Years Ago: An app that travels back in time 100 years and lets you listen to an interactive radio show, including breaking news and the latest hit songs circa 1917
  2. Credit Card Helper: Helps users find the best credit cards quickly and avoid traps.
  3. My Adventure Book: Storytelling game that lets users navigate their own adventure.

There are 4 areas we’ll assess to better understand how these apps work.

  1. Invocation
  2. User interface
  3. Landing page content
  4. Developer comments

Invocation

The invocation is what you say to Google to get to the app. It’s the name. Now, voice apps aren’t that easy to discover. It’s not like people surf Alexa or Home like they might the web. Currently, users take a linear path through the medium, and typically have a clear idea of what they want to do or find out, if not specifically where they want to go. Categorically, audio information is consumed more linearly than graphic information.

The invocation is no small part of an app’s success. The easier it is to say and remember, the more likely people will be to actually use it.

Here are the invocations for the three winners.

100 Years Ago

“Talk to 100 Years Ago”

Credit Card Helper

“Ask credit card helper what the best credit card is”

“Ask credit card helper what the risks of credit cards are”

“Ask credit card helper what to do if my card is stolen”

“Ask credit card helper what type of credit card i should get”

“Talk to credit card helper”

My Adventure Book

“Talk to My Adventure Book”

Credit Card Helper is using a range of invocations to touch on different user interests. Consider the difference between “what the best credit card is” and “what to do if my card is stolen”. The first could drive users directly into a new-customer funnel, while the second helps an existing customer make account changes. Both are great features to offer, addressing people at different stages of the customer experience path.

So developers and marketers will want to use terms that are easily remembered, easy to say, distinct from other voice apps, and offer directions to specific sections of the app, where possible.

User Interface

Once the user passes through to the app, it’s imperative to create a positive experience as quickly as possible. Like websites and mobile apps before them, voicebots will likely succeed or fail based on the first 30 seconds of the experience.

And here’s how each app greets you after a successful invocation. These screens were grabbed while interfacing with Google Assistant on the phone, as it displays the script the voicebot speaks, for easy reference here.

100 Years Ago
Image: 100 Years Ago intro screen

Credit Card Helper

My Adventure Book

You can see the relative simplicity of these apps. And of course that’s to be expected from first forays into new territory. However, even early web pages often provided link after link. This is an early indicator of what will likely be a challenge for the voice web for a while to come: the small amount of content that users can comfortably navigate and consume.

One UX detail I noticed: when going back through an app repeatedly, it’s great to be able to skip through large sections of the script that you’ve already been through, especially at the intro. Credit Card Helper did this very well, and it removed a sense of tedium from the process. You just have to say “Hey Google, skip” and you’re on to the next section.

Here’s the main navigation structure for each app.

100 Years Ago

  • Hear the news
  • Listen to music
  • Speak to special guest

Credit Card Helper

  • Help finding a card
  • Browse by category
  • About our research

My Adventure Book

  • Single launching point with multiple paths to follow

It makes sense that the simplicity of the navigation reflects the simplicity of the overall app experience, although Credit Card Helper has extensive content on each of the credit cards in its database.

A frequent discussion point in voicebot design is the optimal depth for a nav. I’ve heard more than one person say 3 is ideal. I think that might be where we are now, although I’m sure that will expand as people become more familiar with the technology.

Additionally, the AI technology behind the speech interface is also going to improve dramatically, increasing the accuracy and quality of experience.  

App directory

Due to the invocation process for voicebots, one of the major marketing challenges is getting users to remember the exact invocation name. You might hear it on the radio, or see it in an ad, but when you go to visit the voicebot, via voice, you have to remember what’s likely to be a three-word name. Not a simple task, especially when you consider marketers spend bazillions just to get consumers to remember their simple brand name.

To help with finding apps, Google has created an app directory where each voicebot has its own page. This is an opportunity to prep the visitor for their upcoming user experience.

Here are the Google Directory pages for each app.

100 Years Ago
Image: 100 Years Ago directory page

Credit Card Helper
Image: Credit Card Helper directory page

My Adventure Book
Image: My Adventure Book directory page

First, you can see how Credit Card Helper’s many invocation phrases stand out, offering the user additional ways to invoke and discover the app.

The description area offers brands a place to briefly (hopefully) summarize why users should visit the app. You can bet there will be a lot of discussions within marketing departments over what goes on this page. Suffice it to say that well-written copy with a clear sense of the app’s mission is going to be critical.

I’d also guess that Google will expand on the media available for presentation on this page.

Developer Comments

Given the recency of this industry, there’s not much in the way of best practices or industry standards. Brands are going to need to mine this first wave of apps for as much design and user experience insight as they can get.

In addition to reviewing each app, I contacted the app creators to get additional insights about the app development process. So here are some of the highlights and challenges they reported.

100 Years Ago developer Jesse Vig used feedback from a previous app he developed to guide the direction for his winning voicebot. According to Jesse:

Prior to building 100 Years Ago, I built another action called Time Machine that reads headlines from the past and plays a brief time travel sound effect. Surprisingly, most of the positive feedback I got was about the sound effect. Based on this experience, I wanted to create a richer audio experience with 100 Years Ago.

Arun Rao, CEO of Starbutter, the company that developed Credit Card Helper, also has strategic advice for app developers:

On the design side, state your Action’s key objectives early and try to design a “Wow” experience around them.  If you do too much or don’t have a clear objective, your Action won’t be interesting.  The first part of “Wow” is to not do anything really dumb – which takes a lot of user testing to figure out.

I previously mentioned the challenges of designing information for a voice interface. Jesse Vig also addresses this, saying:

I was able to create a much richer experience when the action had access to a visual display, as in the case of Google Assistant on mobile devices. Reading is faster than listening, and obviously images cannot be conveyed through an audio interface.

Arun Rao has some good advice for approaching new apps:

For use cases, think about where voice or chat interactivity adds much more value than a current experience.  Test this out with prototypes or videos before you build anything (BotSociety is a good prototyping tool to start with).

He also offers up a valuable technical recommendation for maximizing app performance:

On the technical side, go serverless and use Google Cloud Functions or Amazon Lambda.  These are efficient and more scalable and error-proof for webhooks than having a real or virtual server.

In a similar vein, Nick Laing, developer of My Adventure Book, suggested new developers use the existing tools.

My advice to any beginners is to start with DialogFlow, there is a lot you can do with that console alone. Once you get familiar with the platform you can write your own code to expand the functionality.
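Putting those last two pieces of advice together, here’s a minimal sketch of what a serverless Dialogflow fulfillment webhook might look like. It’s in Python for consistency with the other sketches in these posts (these webhooks were typically Node.js at the time), and the intent name is hypothetical:

```python
import functions_framework  # Google's Python Functions Framework
from flask import jsonify


@functions_framework.http
def webhook(request):
    """A minimal Dialogflow fulfillment webhook (v2 request format).
    The intent name below is hypothetical."""
    body = request.get_json(silent=True) or {}
    intent = (
        body.get("queryResult", {}).get("intent", {}).get("displayName", "")
    )

    if intent == "hear_the_news":
        reply = "Here is the news from one hundred years ago today."
    else:
        reply = "Sorry, I didn't catch that. You can say: hear the news."

    # Dialogflow reads this field back to the user.
    return jsonify({"fulfillmentText": reply})
```

Because the function is stateless, a serverless platform can scale it per request, which is exactly the webhook property Arun Rao points to above.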

As the voice assistant industry enters its fourth year as a consumer gadget, there’s enormous potential on the horizon. These early examples are the tip of a large and growing iceberg. (And Earth needs more ice, right?) If one thing is certain, it’s that the devices, the apps, and the user base are all going to evolve considerably over the next few years. It should be an exciting race.

Thanks to Arun Rao, Jesse Vig and Nick Laing, not only for their work but their generosity in providing guidance for other app developers.

I’ll be reviewing more voicebots in the coming weeks, so if you’d like to stay up on the latest creations, consider clicking the SUBSCRIBE VIA EMAIL button below. 

Filed Under: Articles, UX Tagged With: review, ux, voicebots

