Back in 2008, when I was 31 years young and a freshly minted Ph.D., I took my first job in digital media. I was hired as a consultant for "advanced analytics" on a new mobile ad server. So I dusted off my statistics books, cracked my knuckles and logged into the machines where six months of server files were housed. And that's when the torture began.
The data files were scattered across directories. The names were inconsistent. The formats varied wildly. Hours passed as I typed out Unix commands on the terminal, tediously stitching together billions of events into a time series of the last few months. As the graph rendered, an amazing trend unfolded: a sudden dip, then a 300% spike in ad traffic! Yet a few more keystrokes revealed the unglamorous reason: The platform had crashed, rebooted and begun logging every event in triplicate.
So much for the "advanced analytics" I'd been hired to deliver. Though I'd uncovered a few troubling bugs in the new ad server, those insights weren't gleaned through complex mathematics. It was simple counts, averages and a plain chart that got me there.
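A minimal sketch of that kind of check, assuming a hypothetical log already reduced to (day, event ID) pairs. The field names and sample rows are invented for illustration and are not the original server data:

```python
from collections import Counter

# Hypothetical sample: (day, event_id) pairs. On "day3" every event
# appears three times, mimicking a logger that writes in triplicate.
events = [
    ("day1", "a1"), ("day1", "a2"),
    ("day2", "b1"), ("day2", "b2"),
    ("day3", "c1"), ("day3", "c1"), ("day3", "c1"),
    ("day3", "c2"), ("day3", "c2"), ("day3", "c2"),
]

def daily_counts(rows):
    """Count how many rows were logged on each day."""
    return Counter(day for day, _ in rows)

def duplication_ratio(rows):
    """Raw rows per distinct event for each day; 1.0 means no duplicates."""
    raw = daily_counts(rows)
    unique = daily_counts(set(rows))
    return {day: raw[day] / unique[day] for day in raw}

raw = daily_counts(events)
ratios = duplication_ratio(events)

# Days whose traffic "spike" is explained by repeated rows.
suspect_days = sorted(day for day, ratio in ratios.items() if ratio > 1)
```

Comparing raw counts with distinct-event counts is what separates a genuine jump in traffic from a logging fault. It assumes each event carries a stable identifier, which real server logs may not.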
Herein lies the dirty secret about most data scientists' work -- it's more data munging than deep learning. The best minds of my generation are deleting commas from log files, and that makes me sad. A Ph.D. is a terrible thing to waste.
So how did we end up here, hiring data scientists with promises of building predictive models, and instead conscripting them to do arithmetic at scale? The core problems are twofold: marketing firms underestimate the challenges of data cleansing, and business users lack self-service reporting tools.
Data scientists are enslaved by both. One agency CIO recently told me, "We spend more than 50% of our time acquiring and cleaning data." Unfortunately, much of the janitorial work falls to the whiz kids hired to do downstream analytics. Likewise, on the other side of the data continuum, data-science teams are stuck servicing a barrage of reporting requests from business users.
Luckily, there are several paths for firms to liberate data scientists and clear a way toward the higher-value insights they're paid to pursue. Firms should hire a class of brawny developers who are great partners for data scientists: data engineers. At my company, we've hired an entire department of them, dedicated to building the data plumbing and sanitation processes to extract, transform and load data between external and internal systems.
The primary qualification for a great data engineer isn't education but experience. The best have learned the black arts of data manipulation out of necessity, whether as an investment analyst, sales engineer or laboratory post-doc. When choosing whom to hire, SQL savvy isn't enough. Look for candidates with Python, Perl and R, all great languages for data munging, listed on their LinkedIn profiles.
Data engineers can help companies build the infrastructure needed to run an efficient business. These individuals decide what to build in-house and when it's more effective to bring in outside technology, depending on the scale of data, level of customization, requirements for external sharing and desire for an on-premises or cloud-hosted solution. Their efforts are best spent building out the pipes that data scientists work within. This approach frees up data scientists and saves their highly specialized skills for higher-level projects.
Seven years ago, as I huddled over my laptop searching for a signal in the noise of digital-media data, there was only a trickle of interest in data science on Madison Avenue. That trickle is now a flood, and the rising expectations of what data science can achieve -- the end of the make-good, the undoing of Wanamaker's lament -- are far from being fulfilled.
The promised land of insights does exist, but data scientists -- if they are the chosen ones -- deserve to be unshackled from many of the tasks they're forced into doing today. With supporting investments in people and software, data scientists can lead their organizations to a prosperous future.