Did you know that there is substantial overlap between gamers and yoga aficionados online? That's just one of the unique tidbits Yuanyuan Pao and her team have teased out from consumer data gathered by BlueKai, the recently acquired digital ad firm where she works as a software engineer on the data science team.
Just 24 years old, Ms. Pao already has a solid resume under her belt. Following her physics undergrad education at MIT, where she worked as a video-game programmer, she focused on statistical signal processing and machine learning while obtaining an M.S. in Electrical Engineering at Stanford University.
While at Stanford, she worked for Bazaarvoice, a company flush with product-review data that branched into advertising around a year ago. There, Ms. Pao focused on natural language processing, or building systems to derive and comprehend meaning from the language of written product reviews.
Young women aren't exactly well-represented in the data science world, but Ms. Pao feels right at home at BlueKai's Cupertino, Calif., headquarters. "While there are more men than women in our line of work today, I haven't seen that as a challenge," she said. "I am surrounded by like-minded data geeks, which is all that matters." She looks forward to an increased interest in computer science in general attracting more women to the field.
Ms. Pao got the data bug while studying at MIT, and realized how much she appreciated the connections between patterns in data and the world around us. "Seeing how much real world application can come from formalizing our intuition…it was obvious that data is a fun place to be," she said.
Most days at BlueKai, acquired last month by Oracle for $400 million, Ms. Pao can be found watching a steady stream of data about consumer interactions occurring online. "It's literally on a terminal screen coming out of Hadoop," she explained. "It's lines of logs that we have of people coming in on particular verticals."
The company builds and refines models to detect relations or differentiation among types of consumer interests. From there it develops audience segments and enhances reach by finding more people with similar sets of interests.
But there's more work to be done, said Ms. Pao, noting that she and her team are devising ways to analyze consumer interest in products as it rises and falls. "One challenge I feel we face…is getting that real-time kind of intent data," she said.
Ad Age: What got you into this data stuff?
Ms. Pao: My passion for data started budding with my Probabilistic Systems Analysis class at MIT, where Markov models provided such an intuitive way of visualizing and understanding a system's behavior and patterns. I sought research positions that would give me an opportunity to work with data on a larger scale and to design and implement my own models, landing a research opportunity to work in computational genomics analyzing non-encoding regions to identify potential sequences that influence the expression of certain genes during the stages of a disease.
Though these large datasets looked like arbitrary sequences of letters to my human eye, seeing my model interpret that data and infer information for real-world application, such as understanding the maturation of a disease, made me truly appreciate the sheer power of data and data science. That, along with the challenge of conjuring up interesting solutions to the problems associated with large datasets, has been my main motivation and inspiration for getting into data.
Ad Age: Has your understanding of natural-language processing at Bazaarvoice helped your current work?
Ms. Pao: My experience at Bazaarvoice shaped my views on how consumers interact with their products and the large quantity of information that flows in through the b-to-b space. Whether working on natural-language processing at Bazaarvoice or identifying users who are more likely to be in-market for a car, the approaches for filtering out redundant information in the data or for extracting valuable information, such as a review's key topics or a person's interest level in a particular vertical, are related in some ways.
Ms. Pao: I believe that young and old cookies bring their unique value to marketers. Younger cookies can be more relevant for capturing consumer purchase intent. However, extracting signals and making connections between both older and younger cookies can provide extremely valuable insight into the path of awareness and consideration that eventually results in intent to purchase.