Learn to Think Like a Data Scientist


At this point, it's cliché to emphasize the potential benefits of leveraging data to innovate and advance your marketing efforts. Every marketer knows the value of a data-driven campaign. But being "data-driven" is still elusive. Sure, data is used here and there, but the breadth of available data is largely untapped.

Beyond transactional records and CRM databases, there is the incredible depth of interaction data that can be used to strengthen modern marketing and advertising efforts. There are also public data sources that add context, such as census demographics, macroeconomic indicators, weather records, and social media posts and responses. Last, but certainly not least, are datasets generated from the targeted use of platforms like SurveyMonkey.

How can marketers use data more effectively? For starters, they can learn to think more like data scientists.

Data scientists consistently deliver robust insights by working through a process that extends beyond the analysis phase. This process starts by building an understanding of how the data was produced: what was measured, and who or what did the measuring. Next comes data wrangling. There's a good chance your data won't be in the ideal format for the analysis you want to run, so you might need to pivot it, filter it or blend it with other datasets. Finally, in the analysis phase, data scientists know it is crucial to explore multiple analyses by generating variations of the data, or of how key aspects are measured, and then to assess how these changes affect the results.
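To make that last step concrete, here is a minimal Python sketch of running one simple analysis (an average deal size) under several variations of the data and comparing the results. The deal sizes, the 10,000 cap and the function names are hypothetical, chosen only for illustration; in practice you would vary whichever choices are uncertain in your own data.

```python
from statistics import mean


def summarize_variations(values, variations):
    """Apply each named variation to the data and return the mean of the result.

    `variations` maps a label to a function that takes the full list of
    values and returns the subset to analyze.
    """
    results = {}
    for label, transform in variations.items():
        subset = transform(values)
        results[label] = mean(subset) if subset else None
    return results


def spread(results):
    """Range of the estimates across variations; a wide range signals a fragile finding."""
    estimates = [v for v in results.values() if v is not None]
    return max(estimates) - min(estimates)


# Hypothetical deal sizes, including one suspiciously large entry.
deals = [1200, 950, 1100, 1300, 15000, 1000]

variations = {
    "all records": lambda xs: list(xs),
    "drop values above 10,000": lambda xs: [x for x in xs if x <= 10_000],
    "trimmed (drop min and max)": lambda xs: sorted(xs)[1:-1],
}

results = summarize_variations(deals, variations)
for label, value in results.items():
    print(f"{label}: {value}")
print("spread:", spread(results))
```

Here the average deal size swings from roughly 1,100 to more than 3,400 depending on how a single entry is handled, which is a cue to go back and ask how that entry was produced.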

How can you bring data science processes to bear on your own marketing analyses? Here are three key questions to orient your efforts:

What Process(es) Produced the Data?

Most people aren't interested in the data itself, but in what the data represents. Like paintings, stories or maps, data is a representation, which means it will contain distortions or biases that could affect your analysis. Many of these distortions come from how the data was collected. Some measurements may be missing, and others may be systematically off (too big, too small, etc.).

Consider your customer data, covering both existing and potential customers. Most likely, the details in that dataset were entered manually, either by a prospect submitting an inbound lead or by an employee logging a marketing- or partner-sourced lead. The manual entry is what matters here. Manually entered data often contains duplicates (multiple entries for the same company or customer), typos and conflicting information (e.g., notes from different conversations that assign the same person to different roles, or imported reporting details, such as deal amounts or deal dates, that disagree across entries). Knowing that you are working with manually entered data, you can check for these kinds of errors and build business logic to address them.
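As a starting point for that business logic, here is a minimal Python sketch that groups manually entered leads by a normalized company name and flags groups whose deal amounts disagree. The field names (`company`, `deal_amount`), the sample records and the list of legal suffixes are hypothetical, and stripping suffixes this way is a crude heuristic that you would tune to your own data.

```python
import re
from collections import defaultdict

# Hypothetical legal suffixes to ignore when matching company names.
SUFFIXES = r"\b(inc|llc|ltd|corp|co)\b"


def normalize_company(name):
    """Lowercase, strip punctuation and legal suffixes, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    cleaned = re.sub(SUFFIXES, "", cleaned)
    return " ".join(cleaned.split())


def find_duplicates_and_conflicts(records, field="deal_amount"):
    """Group records by normalized company name.

    Returns (duplicates, conflicts): duplicates maps a normalized name to all
    records sharing it; conflicts lists the distinct values of `field` for
    duplicate groups that disagree on it.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_company(rec["company"])].append(rec)
    duplicates = {name: recs for name, recs in groups.items() if len(recs) > 1}
    conflicts = {}
    for name, recs in duplicates.items():
        values = {rec[field] for rec in recs}
        if len(values) > 1:
            conflicts[name] = sorted(values)
    return duplicates, conflicts


leads = [
    {"company": "Acme, Inc.", "deal_amount": 5000},
    {"company": "ACME inc", "deal_amount": 5000},
    {"company": "Globex Corp", "deal_amount": 1200},
    {"company": "globex", "deal_amount": 1500},
    {"company": "Initech", "deal_amount": 800},
]

duplicates, conflicts = find_duplicates_and_conflicts(leads)
print(sorted(duplicates))  # groups with more than one entry
print(conflicts)           # groups whose deal amounts disagree
```

The two Acme entries agree and can simply be merged; the two Globex entries disagree, so someone has to decide which amount is correct before either feeds into a report.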

Digging a bit deeper, here's a short list of more specific questions to help you understand how your data was created, and what biases or distortions it might contain:

  • Were the measurements in the dataset taken simultaneously or over time?
  • Did the units of measurement change over time (e.g., from EUR to USD)?
  • Did the people or systems making the measurements change (e.g., swapped out, re-calibrated, upgraded, etc.)?
  • Are the same entities measured over time (e.g., the same person or household)? Or just similar ones?
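Questions like the second and third can often be answered directly from the data. The minimal Python sketch below lists, month by month, the distinct values of a field and reports where they change. It assumes each record carries an ISO 8601 `date` string; the `currency` field and the sample orders are hypothetical.

```python
from collections import defaultdict


def values_by_month(records, field):
    """Map each "YYYY-MM" month to the sorted distinct values of `field` seen in it."""
    by_month = defaultdict(set)
    for rec in records:
        month = rec["date"][:7]  # assumes ISO 8601 dates such as "2023-03-22"
        by_month[month].add(rec[field])
    return {month: sorted(values) for month, values in sorted(by_month.items())}


def change_points(records, field):
    """List (month, previous values, current values) wherever the values of `field` change."""
    changes = []
    previous = None
    for month, values in values_by_month(records, field).items():
        if previous is not None and values != previous:
            changes.append((month, previous, values))
        previous = values
    return changes


orders = [
    {"date": "2023-01-15", "currency": "EUR"},
    {"date": "2023-02-03", "currency": "EUR"},
    {"date": "2023-03-10", "currency": "EUR"},
    {"date": "2023-03-22", "currency": "USD"},
    {"date": "2023-04-05", "currency": "USD"},
]

for month, before, after in change_points(orders, "currency"):
    print(month, before, "->", after)
```

The same check works on any field that records who or what made a measurement, such as a system version or data source. Note that it only compares consecutive months that appear in the data; months with no records are skipped.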

Does the Scope of Data and Analysis Match?

Armed with an understanding of how your data was created, the next step is wrangling it into a suitable form for analysis. This could include restructuring the data by filtering down to a subset (e.g., to particular regions or customer segments), expanding encoded or abbreviated values, un-nesting values stored in a hierarchical structure, or removing duplicate records. You may also need to address invalid entries, like misspellings and inconsistent or missing values.
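Several of these wrangling steps can be chained in a single pass. The following is a minimal Python sketch, assuming a hypothetical account structure in which each account holds a nested list of contacts and a segment stored as an abbreviated code. It filters to one region, expands the code, un-nests the contacts into one row each and drops duplicate contacts.

```python
# Hypothetical mapping from abbreviated segment codes to readable labels.
SEGMENT_CODES = {"ENT": "Enterprise", "SMB": "Small business"}


def wrangle(accounts, region):
    """Filter to one region, expand segment codes, un-nest contacts, and drop duplicates.

    Each input account holds a nested list of contacts; each output row is one
    contact with its account's fields copied alongside it.
    """
    seen = set()
    rows = []
    for account in accounts:
        if account["region"] != region:
            continue  # filter: keep only the region under study
        segment = SEGMENT_CODES.get(account["segment"], account["segment"])  # expand codes
        for contact in account["contacts"]:  # un-nest: one row per contact
            email = contact["email"].strip().lower()  # normalize before comparing
            key = (account["account_id"], email)
            if key in seen:
                continue  # drop duplicate contacts within the same account
            seen.add(key)
            rows.append({"account_id": account["account_id"], "segment": segment, "email": email})
    return rows


accounts = [
    {"account_id": 1, "region": "EMEA", "segment": "ENT",
     "contacts": [{"email": "ana@example.com"}, {"email": "Ana@Example.com "}]},
    {"account_id": 2, "region": "APAC", "segment": "SMB",
     "contacts": [{"email": "li@example.com"}]},
    {"account_id": 3, "region": "EMEA", "segment": "SMB",
     "contacts": [{"email": "jo@example.com"}]},
]

for row in wrangle(accounts, "EMEA"):
    print(row)
```

Normalizing the email before comparing is what lets the second "Ana" entry be recognized as a duplicate; without it, a trailing space and different capitalization would produce two rows for one person.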

The primary goal in wrangling is to align the scope of the data with the scope of the analysis. Scope has two primary aspects. The first corresponds to the kind of records in the dataset. If you want to analyze customers, each record in the dataset should represent a customer. If you want to analyze customer interactions by month, you'll likely want to start with a dataset in which each interaction is a record, which you'll need to aggregate into monthly statistics. Also, the structure of the record—what fields are represented, and the syntax of the values in those fields—is a critical concern relative to the desired analysis. Do you have the information you need, in a format you can use?
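For the monthly example, the aggregation step might look like the minimal Python sketch below, which turns one-record-per-interaction data into counts per customer and month. The `customer_id` and `timestamp` field names and the sample events are hypothetical, and the sketch assumes timestamps have already been converted to a single time zone.

```python
from collections import Counter


def monthly_interactions(events):
    """Aggregate one-record-per-interaction data into counts per (customer, month).

    Assumes ISO 8601 timestamps already converted to a single time zone,
    so the first seven characters give the "YYYY-MM" month.
    """
    counts = Counter((event["customer_id"], event["timestamp"][:7]) for event in events)
    return dict(sorted(counts.items()))


events = [
    {"customer_id": "c1", "timestamp": "2023-01-05T10:00:00"},
    {"customer_id": "c1", "timestamp": "2023-01-20T09:30:00"},
    {"customer_id": "c1", "timestamp": "2023-02-02T14:00:00"},
    {"customer_id": "c2", "timestamp": "2023-01-11T08:15:00"},
]

for (customer, month), count in monthly_interactions(events).items():
    print(customer, month, count)
```

A customer with no interactions in a given month simply has no entry here, so filling in zeros is a separate, deliberate step.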

The second aspect of scope is coverage. If you want to analyze customers, for example, are all the customers in your dataset, or have some been (un)intentionally left out? Similarly, if you are analyzing monthly aggregates of interaction events, are you missing months? Are you missing some interaction events inside of each month? Tracking the transformations applied to the data, from initial loading to the point of analysis, should reveal the majority of coverage issues.
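Checking for missing months is one of the easier coverage tests to automate. Here is a minimal Python sketch that compares the months observed in a dataset against the full range you expect; the month labels and date range are hypothetical.

```python
def month_range(start, end):
    """Every "YYYY-MM" label from start through end, inclusive."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    labels = []
    while (year, month) <= (end_year, end_month):
        labels.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels


def missing_months(observed_months, start, end):
    """Months in the expected range that never appear in the data."""
    observed = set(observed_months)
    return [m for m in month_range(start, end) if m not in observed]


observed = ["2022-11", "2022-12", "2023-02"]
print(missing_months(observed, "2022-11", "2023-03"))
```

This catches months that are absent entirely. Gaps inside a month, such as a week of lost events, need a finer-grained version of the same comparison.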

Pushing a little further on scope, here's a short list of more specific questions to consider:

  • Do the records in your data correspond to the level of your analysis? If not, can you aggregate them or extract the appropriate information?
  • Do the records in your dataset contain the fields you need for your analysis?
  • Do the fields in your dataset need some cleaning (to remove invalid values), normalization (to align categories or resolve minor variations like misspellings or abbreviations), or conversion (e.g., to align currencies or time zones)?
  • Does your dataset contain all the records you expect, without duplicates?
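Some of these questions can be turned into a quick audit that runs every time the data is refreshed. The minimal Python sketch below counts missing or empty required fields and lists keys that appear more than once. The required field names and sample records are hypothetical; comparing the reported row count against the total you expect from the source system addresses the last question.

```python
from collections import Counter

# Hypothetical fields an analysis of customers might require.
REQUIRED_FIELDS = ("customer_id", "region", "signup_date")


def audit(records, key="customer_id", required=REQUIRED_FIELDS):
    """Count missing or empty required fields and list keys that appear more than once."""
    missing = Counter()
    for rec in records:
        for field in required:
            if rec.get(field) in (None, ""):
                missing[field] += 1
    key_counts = Counter(rec.get(key) for rec in records)
    duplicate_keys = sorted(k for k, n in key_counts.items() if k is not None and n > 1)
    return {"rows": len(records), "missing": dict(missing), "duplicate_keys": duplicate_keys}


customers = [
    {"customer_id": "a", "region": "EMEA", "signup_date": "2023-01-02"},
    {"customer_id": "b", "region": "", "signup_date": "2023-01-05"},
    {"customer_id": "a", "region": "EMEA"},
]

print(audit(customers))
```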

What Externalities Could Impact the Results?

When wrangling data in preparation for an analysis, it is critical to avoid introducing bias or distortion. For example, selectively dropping some records could skew the results significantly. Just as important as reviewing the wrangling process is thinking expansively about external factors that might skew the results. Some of these factors are known and can be accounted for, such as seasonal shifts in shopping behavior. Others are unknown, or at least unknown at the time of your analysis. For example, suppose you ran a big holiday ad campaign and sales were higher than expected, leading you to believe that the campaign was a success. But suppose your closest competitor crashed and burned during the holidays. Your sales lift might be less the result of your campaign than the external effect of your competitor's performance.

When it comes to externalities, consider the following actions:

  • Talk with people who can inform you about relevant factors.
  • Explore adjacent analyses that can reveal the impact of externalities on your primary analysis. For example, benchmark performance changes against multiple competitors, not just your primary one. If your industry experienced a macro shift, all of your competitors should show a similar shift.
  • Augment the data in your analysis with additional data that includes measurements of external factors (like weather data or late-breaking news).
  • Across the analysis variations, document inconsistencies. In many cases, inconsistencies are the first indicator of problems with the processes that produced the data. Moreover, this documentation will help facilitate discussion around the best data-wrangling actions for operationalizing your marketing insights.
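The benchmarking idea in the second point can be sketched in a few lines of Python. This minimal example assumes you have comparable before-and-after sales figures for several competitors over the same periods; all names and numbers are hypothetical. It subtracts the median competitor lift from your own, and the median is used so that one competitor's collapse does not dominate the benchmark.

```python
from statistics import median


def lift(before, after):
    """Relative change from the baseline period to the campaign period."""
    return (after - before) / before


def excess_lift(own, competitors):
    """Your lift minus the median competitor lift.

    `own` is a (before, after) pair; `competitors` maps a name to a
    (before, after) pair measured over the same periods. The median keeps a
    single competitor's collapse from dominating the benchmark.
    """
    own_lift = lift(*own)
    peer_lifts = [lift(before, after) for before, after in competitors.values()]
    return own_lift - median(peer_lifts)


# Hypothetical holiday-season sales figures.
own_sales = (100_000, 130_000)
competitor_sales = {
    "A": (200_000, 250_000),
    "B": (50_000, 65_000),
    "C": (80_000, 96_000),
}

print(f"excess lift: {excess_lift(own_sales, competitor_sales):.1%}")
```

A 30% lift looks impressive on its own, but with the median competitor up 25% over the same period, most of that gain may reflect an industry-wide shift rather than the campaign.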

The Bottom Line

When faced with increasing demands to leverage data, many organizations post a want ad for a data scientist. The explosive growth of data science as a business function reflects this trend. A recent study based on LinkedIn data estimates that the number of data scientists has doubled in the last four years. Some of this growth is simply title inflation. It isn't all hype, though. If your marketing goals require exotic insights, or if you need to navigate a particularly difficult domain, you probably need a data scientist (or a team of them).

However, a plethora of insights can be found without a data scientist, if you take the time to think a little bit more like one yourself. The key is to pursue robust findings by exploring many possible analyses. Do this with a healthy dose of skepticism and you'll develop a consistent ability to separate the signal from the noise.
