When Kris Kubicki, now 30, and his buddies got caught hacking into his high school's locker system, little did the 4-H Rocketry state champion know he'd one day start a company with its roots in such creative data analysis exploits.
Today, Mr. Kubicki's firm, Dynamite Data, crawls the web for product pricing, store inventory, and other data that clients use to dynamically alter prices, inform search marketing efforts, and deal with shoppers who browse in-store but buy elsewhere online.
Despite losing a scholarship to Cornell University, Mr. Kubicki landed at the University of Illinois, another "great school," where he got his bachelor's degree in computer science. Working as a tech blogger while he got his degree, Mr. Kubicki began publishing regular guides on prices for computer memory, new video cards, and other computer products with regularly fluctuating prices.
"This is essential information if you're going to buy, like, 10,000 of these things," he said. It wasn't just buyers who were interested, though. Sellers took notice and began demanding the information for competitive intelligence purposes. In 2007, he founded Dynamite Data, and spent the next year writing the first iteration of the software that operates its data gathering and analysis.
Rounding up clients for the retail pricing data service wasn't easy at first, said Mr. Kubicki, recalling a tanking economy in 2008 and potential clients whose businesses were bearing the brunt. "It was tough to hear a 26-year-old tell you, 'You're gonna need this.' There were a lot of growing pains."
The Chicago-area firm is still scrappy, with only around 20 people on staff, according to Mr. Kubicki, who noted area retailer Abt Electronics is a client. The company is reluctant to name other clients.
As chief architect, Mr. Kubicki handles a variety of tasks from coding to business development and project management. But there's nothing more satisfying than coming up with a solution to a complex data problem, he said. "I like to see if I can solve the problem with my own hands. ... That's the kind of thing that's deeply rewarding to me."
Ad Age: What do you wish marketers would understand about what data scientists do?
Mr. Kubicki: Almost 30 years ago an internet pioneer by the name of Stewart Brand put out an aphorism -- perhaps the most famous aphorism regarding data science: "Information wants to be free. Information also wants to be expensive. Information wants to be free because it has become so cheap to distribute, copy, and recombine -- too cheap to meter. It wants to be expensive because it can be immeasurably valuable to the recipient. That tension will not go away."
People like me have been studying the meaning of just that phrase for decades, but the basic gist is that there are two simultaneous forces operating in the universe of big data: Haystacks are getting bigger, but the needles keep increasing in value. The right information at the right time just changes your life.
Marketers today have more access to information than first-world presidents had just 10 years ago. In the case of Dynamite Data, your average garage startup can afford access to every price point and consumer rating of every widget ever sold in every country over the last five years. Just 10 years ago, that kind of information would cost industry analysts hundreds of millions of dollars just to get an estimate. Data scientists are constantly trying to make information more valuable by adding other data points and using the insights to inform more decisions.
Ad Age: What's the biggest problem with data science people as they navigate the world of marketing?
Mr. Kubicki: Data scientists need to fully understand how marketers are using their data to make decisions. With our own clients, we find that the better we understand how they are trying to use our data, the better we can tailor our data to meet their needs. For example, if a sales manager is calling on accounts that are out-of-stock, they can't be relying on stock status information from the day prior. Or if a product manager is making pricing decisions on seasonal products, they need insights at a local level, as the situation in Los Angeles is likely different than in New York. Marketers often don't know what data is available, so it is up to the data scientist to help bridge that gap.
Ad Age: Home Depot recently purchased pricing data firm Black Locus. Do you anticipate similar acquisitions happening in the pricing-data-services sector? For instance, do you think more large retailers will buy firms like yours rather than partnering with them, and what are the pros and cons of a data-pricing firm working for just one retail client?
Mr. Kubicki: Walmart made a similar investment in 2011 when they bought Kosmix, which is now the heart of Walmart Labs. Given how strategic pricing is for retailers, I wouldn't be surprised if these types of acquisitions continue.
For firms like ours, there are advantages under either model. As a smaller company, we can remain nimble and bring additional data sources and uses to our clients. It may also be easier to attract data scientists and engineers, who often like to work in a fast-paced, state-of-the-art environment. However, as part of a client organization, our data expertise could be extended across both public and proprietary data sets, which could help address a broader set of decisions.
Ad Age: What's the coolest or strangest type of data set you've ever worked with, and why?
Mr. Kubicki: As an undergrad, I had to work with this very complex set of data for a huge multinational airline. The goal was to create a model that would optimize boarding and deplaning. If I could shave just 10 seconds off the average boarding time, it would save so many tens of millions of dollars a year for the company. I had vast amounts of data to solve this problem: customer ages, purchase times of day, luggage weight, weather, etc. All that mattered is that I used all that data to get people on and off the plane faster.
I holed myself up in a lab for a semester and worked on this for months. I ended up building a model that looked at the weight of the luggage being checked at the gate, and assigned boarding priority based on the average checked-luggage weight as the customers got to the airport -- the thought being if you checked heavier luggage early you'd have less to stow. It worked amazingly, and I had millions of simulated scenarios to prove it worked. I saved a full 15 seconds off each boarding, conservatively.
Unfortunately for me, at the same time this was going on the TSA decided to ban liquids. Near the end of the semester I found out boarding times had decreased by almost two full minutes as a result. Travelers were just checking their bags with liquids rather than deal with the hassle, thus cutting down on boarding times. Sometimes you eat the bear, and sometimes the bear eats you.
Ad Age: What trends in data analysis and services are intriguing you lately?
Mr. Kubicki: Personally, I find Facebook's Social Graph very interesting. A few years ago we'd call these mashups, where a developer would take a few different application layers from different services and marry them up into something where the sum is greater than the parts. Facebook is attempting to do this in a very people-oriented way, but we're also seeing this with Google Places and Dynamite Data. These are massive data sets that legions of developers can work on for years without exhausting their value -- and yet these data sets continue to grow exponentially.
Past that, another really great trend is the "internet of things." People used to talk about convergence as if technology was going to stop once we got everything integrated into our smartphone-tablet-thing. But instead it looks like our refrigerators are going to know how many ounces of orange juice you have left, who drank it and possibly whether they should order you more. Just a couple years ago your average consumer would lump that scenario in with werewolves and vampires, but today I can buy it from LG. I absolutely cannot wait until someone asks us to start looking at that kind of data.
Ad Age: What educational fields of study and professional backgrounds help develop the best data people?
Mr. Kubicki: Great data scientists are cross-disciplined and creative. They'll know a bunch of different programming languages you've never heard of, and some you probably have. Generally they'll have a math or computer science skill set, but it's not necessarily their primary discipline. Look at Nate Silver, for example: His background is economics. His counterpart on the Obama campaign, Harper Reed, has a degree in philosophy.
For me, the "X factor," really, is that creativity. Dynamite Data has engineers who led previous lives in disciplines like high-frequency trading, mineral exploration and network advertising. When we go to work on a problem, we can literally come up with a dozen ways to solve it before getting set on just one optimal idea. That same group of people then can take the idea and not just build the prototype but also the final product in a relatively short amount of time.
Ad Age: What's a common misconception about big data you'd like to dispel?
Mr. Kubicki: The thing we see over and over again is that someone at some highly regarded journal will say something along the lines of: Big data is great, but there's too much information and you can't possibly understand it. This is unfortunate -- the industry, media and management all conspire to keep big data inaccessible to the people who can benefit from it the most, often midmanagement and project leaders. Our job is to get this data to as many people as possible so that they can make their own decisions without getting lost.
Another particular pitfall we see is when a CEO or a CIO says, "Big data, let's get us some of that," just as if it's dim sum at a restaurant. The key is to make sure you know what questions you are trying to answer and how the data will inform a course of action. Big data is a great tool to answer the hardest of questions. Big data confirms or denies your hypothesis. Sometimes you have to really contemplate: Are you asking the right questions to get the most out of it?