Google, for its part, is relying on manual targeting to place ads within that multimedia content, but it joins an ever-expanding marketplace of technology companies hoping to bring contextual targeting to audio and video content, using nifty techniques such as speech-to-text and image recognition to detect what's happening inside a piece of multimedia content. But whether such means of placing ads will be effective or whether they present a better targeting tactic than demographic or vertical targeting remains a big question mark.
Many of the companies in this space have used their technology for other applications and now see an opportunity to repurpose it for the advertising market. Digitalsmiths, for example, was using its audio and image recognition to index TV shows so networks and stations could locate clips for promos. Wizzard Media is a voice-recognition company that has worked with corporations such as IBM and AT&T but now hopes to apply its speech-to-text technology to contextual podcast advertising. The market also includes YuMe Networks, which discerns a video's content to sort it into vertical categories; ScanScout and Adap.tv, which hope to place contextual ads along the bottom of videos to complement pre- and post-roll advertising; and PodZinger, which uses audio-recognition technology to allow consumers to search for podcasts and advertisers to place relevant ads within them.
"If pre-rolls and 15s and 30s will be representative of a vast amount of inventory and the industry expects TV dollars to shift, the more targeting the better," said Ian Schafer, CEO of Deep Focus, an independent digital agency.
Recently, executives at Digitalsmiths demonstrated how their technology could work in an episode of NBC's "The Office." (Digitalsmiths does not have a deal with NBC to use its technology for advertising.)
An ad for Deloitte loaded when the technology recognized that the conversation in the clip revolved around accounting. A Men's Wearhouse ad loaded after the technology visually identified suits and ties in the clip. The characters shared a short exchange about traveling -- and an ad for United Airlines popped up. They mentioned "J. Crew" and "watch," and ads for J. Crew and Fossil emerged. (The ads were also for demonstration only.)
But does a sitcom about a dysfunctional office, in which one episode's storyline revolved around a couple thousand dollars missing from the books, really suggest the audience might be interested in accounting services? Do more accountants watch "The Office" than, say, "Lost" or "Two and a Half Men"?
Even the executives behind the technology aren't exactly sure what's going to deliver the best results. They said they're continually running different tests on programs, looking to see what might deliver the best click-through rates.
The uptake of this technology also depends on many of the companies in the space striking deals with ad-serving companies and with publishers, who will use and sell the technology.
And despite the uncertainty over whether contextual targeting is the best way to sell ads in multimedia content, it's impossible to ignore how much money Google, Yahoo and independent contextual networks such as Quigo have made through the text version of such targeting.
Perhaps the best application for such technologies will be to help marketers sort through the millions of hours of consumer-created content on the web to determine what might be safe for brand advertising and what is too risky. Several companies, including YuMe and ScanScout, are already pitching for business on that feature as well, using image recognition along with teams of people in places such as India who manually sort through content.