"I started doing a lot of algorithm design," in the late '90s, said Mr. Curcio, and "kind of kept in that line of work." Back then, the concept of a data scientist didn't exist, he said. "The world sort of caught up in the last few years," recognizing that data science should be a facet of business decision-making.
The data sets available through digital and social media are "massively larger" than any other set of science data, suggested Mr. Curcio. "None of the problems we work with are academic because of that," he said. He's been with San Mateo-based Aggregate Knowledge since it was founded six years ago as an Amazon-esque recommendation engine.
Yet he hasn't pulled himself away from academia entirely, and still tries to work with academics who are interested in the online, offline and third-party information he sifts through every day at Aggregate Knowledge to determine ad attribution and ad effectiveness. They're "dying to get data" to publish papers, he said.
Mr. Curcio sometimes visits nearby Stanford University and works with economics academics studying ad effectiveness, for example.
Ad Age: What don't people get about big data?
Mr. Curcio: A common saying around Aggregate Knowledge (AK) is "big data doesn't need to be fat." Data sets are inherently large, and one of the common approaches to data is trying to query the entire data set every time you need an answer, which takes a long time to produce actionable insights. This approach does not help analysts solve problems in "human" time, and since our media-intelligence platform effectively supports hundreds of analysts, we wanted to come up with a solution that would make their lives a lot easier so they could very quickly and easily get to the nuggets of information needed to drive business results.
At AK, we realized quickly that the existing tools, like Hadoop and others, were only being used because they were available, not because they provided the best solution to the problem. So, we decided to develop our own data store, which we call the Summarizer. The proprietary technology of the Summarizer relies on real-time, stream-based probabilistic algorithms. It is actually very compact, lives in memory, and represents a huge volume of data. To this end, I think a common misconception is that big data has to be physically large, yet it's really your mindset around analytics and how to solve problems that needs to be huge.
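The Summarizer is proprietary and the interview does not describe its internals, so the following is only a minimal sketch of the general idea behind a compact, in-memory, stream-based probabilistic structure. It uses a K-Minimum-Values (KMV) sketch, a well-known technique for estimating how many distinct items (say, user IDs) appear in a stream while keeping only a fixed number of hash values. The class name `KMVSketch`, the parameter `k`, and the choice of KMV itself are assumptions for illustration, not AK's method.

```python
import hashlib
import heapq


class KMVSketch:
    """Hypothetical K-Minimum-Values sketch: estimates the number of
    distinct items in a stream while storing at most k hash values."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []         # max-heap (values negated) of the k smallest hashes
        self._members = set()   # hashes currently kept, used to skip duplicates

    @staticmethod
    def _hash(item):
        # Map the item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item):
        h = self._hash(item)
        if h in self._members:
            return  # already tracked; duplicates never change the sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept hash: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct items: exact
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


# Usage: feed events one at a time; memory stays bounded by k.
sketch = KMVSketch(k=1024)
for user_id in range(50_000):
    sketch.add(user_id)
    sketch.add(user_id)  # repeated events do not inflate the count
approx_distinct = sketch.estimate()
```

With `k = 1024` the estimate typically lands within a few percent of the true distinct count, and the sketch answers immediately no matter how many events have streamed past. That trade of a little accuracy for speed and size is the kind of bargain a "human-time" analytics store would plausibly make.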
Ad Age: You mention Hadoop, which enables querying of large data sets and comes up a lot in conversations about modern approaches to analyzing data. What are some of its drawbacks that inspired Aggregate Knowledge to build the Summarizer?
Mr. Curcio: While Hadoop is quite literally the backbone of big-data analytics, it has many drawbacks. It is not a great option for production systems as there are many parts to building and operating Hadoop clusters that are fragile at best. The major reason we developed our own technology was to help with what we call "human-time" analytics. When an analyst is working on a problem and has to wait more than 30 seconds for an answer from a database, there is a huge interruption in the problem-solving process. Most people don't have the stamina to sit and continue to think through a problem during this "analytics blackout." Surfing the web turns out to seem like a great idea while you wait! Minimizing these types of interruptions was a major goal for us at AK.
Ad Age: I guess the irony is so much of the digital data people like you are dealing with is created while we're distracted! Despite the advances by people on the science side in dealing with the huge amounts of data that marketers can learn from, what's the biggest problem with data-science people as they navigate the world of marketing?
Mr. Curcio: The biggest challenge for data-science people is turning our curiosity into something that is easily usable and understandable by marketers. Data scientists need to continually push themselves to think outside the box and think strategically around what the data informs vs. just the coolness of the data itself. We need to continue to think of data in the ad-tech space as media intelligence, the actionable insights that are being delivered, and the ability to provide knowledge -- which is very powerful. Data-science teams and marketing need to work together to turn a scientist's world of exact data, numbers, and algorithms into something compelling for the industry that creates value. It is a trend across every business to retool their operations to ensure data scientists and marketers leverage as much analytics and insights as possible.
Ad Age: What do you wish marketers would understand about what data scientists do?
Mr. Curcio: Marketers often look at data scientists as a source from which to pull information. But, in fact, the opposite is true. While we are extremely analytical and can often be seen as "geeks," we do have some creativity. Our team talks a lot about how we are in the age of "Math Men meet Mad Men." What do I mean by that? Well, data scientists are very good at thinking through a problem and coming up with multiple ways to solve it. Marketers should see data-science groups as partners to help them outwardly communicate the value of data collected, put it in context, and drive differentiation. In this way, data scientists can push interesting metrics to marketers, helping them achieve better media-spend outcomes. At the end of the day, it will be the collaboration between marketers and data scientists that comes up with meaningful insights that can drive business results ... and that is really cool!
Ad Age: What's the coolest or strangest type of data set you've ever worked with and why?
Mr. Curcio: It is strange to say, but ad-tech data is great. There are not many industries where you get the quantity and velocity of data that you get in ad tech. Couple that with the fact that there are a ton of really cool algorithms in the field (auction theory, graph theory, and the like) and you start to see why so many of the best and brightest gravitate to work in this space.
Another interesting point is that academia is now looking to us (schools near us in California such as Stanford and others) because we have access to huge volumes of data that they do not normally see or have access to in the academic world.
Since AK is a media-intelligence company, I am working on new ways to inform marketers about classic sales theory. For instance, how should marketers think of the sales funnel in digital advertising so they can create more effective messages to reach more people and drive more sales? This, of course, touches on the problem of valuation. What is an ad worth? How do you attribute value to ads that users have seen on their journey to purchase (multi-touch attribution)? Proper valuation is important not only for marketers but also for real-time bidders, keyword optimization, pacing, and so on. It is no secret that the existing methods (last-touch attribution, for example) are a joke, but to really think about this problem and approach it in a truly academic fashion is a blast.
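To make the contrast between last-touch and multi-touch attribution concrete, here is a minimal sketch. It compares last-touch credit with a linear (even-split) multi-touch rule, which is only one simple scheme among many; the interview does not say which model AK uses. The function names, channel names, and purchase values are all hypothetical.

```python
from collections import defaultdict


def last_touch(paths):
    """Give the full purchase value to the final ad touch on each path."""
    credit = defaultdict(float)
    for touches, value in paths:
        if touches:
            credit[touches[-1]] += value
    return dict(credit)


def linear_multi_touch(paths):
    """Split each purchase value evenly across every touch on the path.
    (Illustrative assumption: even split; real models weight touches.)"""
    credit = defaultdict(float)
    for touches, value in paths:
        if not touches:
            continue
        share = value / len(touches)
        for channel in touches:
            credit[channel] += share
    return dict(credit)


# Hypothetical journeys: (ordered ad touches, purchase value in dollars)
example_paths = [
    (["display", "search", "email"], 90.0),
    (["display", "search"], 60.0),
]

last = last_touch(example_paths)            # {'email': 90.0, 'search': 60.0}
linear = linear_multi_touch(example_paths)  # {'display': 60.0, 'search': 60.0, 'email': 30.0}
```

Under last-touch, the display ads that started both journeys receive no credit at all; under the even split they account for $60 of the $150 in sales. Neither number is "right," which is why valuation remains an open problem.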
Ad Age: Speaking of education, what fields of study and professional backgrounds do you think help develop the best data people?
Mr. Curcio: This is a hot topic among data-science managers. I look for people who have backgrounds in physics, statistics or math. The computer-science component of the data-science toolkit is not that hard to pick up for most people. However, the problem-solving skills beaten into kids in fields such as physics are very hard to learn. I look mostly for people who are just curious about the world and have a passion to find out how things work and want to solve big problems.
Ad Age: What's a trend on the horizon you expect to be thinking more about?
Mr. Curcio: Aside from computing the overall value of online advertising, there is also a clear push to think of a company's entire advertising budget at once and how it can be spent more efficiently. Questions around 1) how offline advertising plays with online, 2) how attribution and ad effectiveness should be measured, and 3) how a Super Bowl ad should be valued in the context of a Yahoo page takeover are really starting to heat up. We also work with academics in this capacity to help us get to unbiased solutions, which I believe will increase in the near future. Most companies are good at dealing with big data but not all have the horsepower (or neutrality in terms of media and data) to deal with the math and models the way academia does. And academics are lacking access to real data. Like I said before, Math Men meet Mad Men -- it's a match made in heaven!