For instance, when he was asked to help determine whether the
London 2012 Olympics would have an impact on domestic bookings
there, Mr. Melkote turned to data gathered by the company during
other international sporting events such as the Beijing 2008
Olympics or the 2010 FIFA World Cup in South Africa. He even uses
UFO sightings data to evaluate statistical programming tools. "You need a mix
of technical skills and a bit of creativity to think outside the
box," said Mr. Melkote.
With Wyndham since 2005, Mr. Melkote has a bachelor's degree in
engineering and a master's degree in computational finance.
Advertising Age: What's a common misconception
about big data you'd like to dispel?
Srinidhi Melkote: Unfortunately, the term "big
data" is not well defined. This leads to all kinds of
misconceptions. One common definition places emphasis on the
adjective "big," referring to the size of the data to be collected
and analyzed. This leads to a common misconception, which is that
one needs "big" data to get big value. This is not always the case.
The practice of statistics is all about deriving good insights for
decision making from relatively small samples of representative
data. There is a lot of value to be had from "small" data, provided
it's the right kind of data, the right questions are asked, and
people with the right skills are hired to interpret this data. This
does not mean one shouldn't collect more data; it's just that most
of the value, in the form of information and insights, comes from a
small portion of "big" data.
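As a rough illustration of the point about small representative samples, the Python sketch below compares the mean of a full synthetic dataset with the mean of a small random sample drawn from it. The data, the sizes and the function name are assumptions made up for this example; nothing here comes from Wyndham's data.

```python
import random
import statistics

def sample_vs_population(pop_size=200_000, sample_size=1_000, seed=0):
    """Return (population mean, sample mean) for synthetic data."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Hypothetical "big" dataset: values scattered around 100.
    population = [rng.gauss(100.0, 20.0) for _ in range(pop_size)]
    # A simple random sample without replacement: the "small" data.
    sample = rng.sample(population, sample_size)
    return statistics.fmean(population), statistics.fmean(sample)

pop_mean, sample_mean = sample_vs_population()
print(f"population mean: {pop_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

With a sample that is only 0.5% of the data, the two means should land close together, which is the sense in which a small portion of the data can carry most of the information. That only holds if the sample is representative, as the answer above stresses.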
Ad Age: What do you wish marketers would
understand about what data scientists do?
Mr. Melkote: There is a general assumption
among marketers and business people that all they need
to do is provide data scientists access to their data and deep
insights will follow with minimal effort from their end. This is
rarely the case. Data scientists have the most success if they are
in constant touch with business units, ask relevant questions and
let answers to those questions guide their analyses. It's hard to
get relevant business context just by looking at data.
Ad Age: What's the biggest problem with
data-science people as they navigate the world of marketing?
Mr. Melkote: Marketing is a complex field and
has multiple short and long-term objectives with overlapping
effects and trade-offs. It can be hard to quantify these objectives
and define the right metrics to track. For instance, one common
objective of pricing and promotions is to set optimal product
prices or run promotions to increase overall revenue. It's fairly
straightforward to build an algorithm that analyzes historical data
to gauge how transactions respond to various price changes and,
based on that, sets a price. But this does not even begin to capture the
complexity that is present in the real world. In the short term, it
may turn out that giving healthy product discounts is the best
approach, but constantly running promotions may lead to customers
expecting deeper discounts every time, resulting in revenue losses
in the longer term.
While this may seem obvious to a marketer or business person, it
is usually not apparent that such effects are at play by just
looking at the data in a cursory fashion. If a data scientist
doesn't know to explicitly model for these, the price
recommendations their algorithms provide will not be accurate. This
is just one example of the sort of business knowledge and intuition
that takes time to develop. Data scientists should expect to face a
significant learning curve to understand these nuances in order to
build such business beliefs into their models.
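The "fairly straightforward" short-term algorithm described above might look something like the Python sketch below. It assumes a linear demand curve, fits it to historical price and sales pairs by least squares, and picks the revenue-maximizing price. The numbers and function names are hypothetical, and the sketch deliberately leaves out the long-term effects (such as customers learning to expect discounts) that the answer warns about.

```python
def fit_linear_demand(prices, units):
    """Least-squares fit of units = intercept + slope * price."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_u = sum(units) / n
    sxy = sum((p - mean_p) * (u - mean_u) for p, u in zip(prices, units))
    sxx = sum((p - mean_p) ** 2 for p in prices)
    slope = sxy / sxx
    intercept = mean_u - slope * mean_p
    return intercept, slope

def revenue_maximizing_price(intercept, slope):
    """Revenue p * (intercept + slope * p) peaks at -intercept / (2 * slope)."""
    if slope >= 0:
        raise ValueError("demand must fall as price rises for a maximum to exist")
    return -intercept / (2 * slope)

# Hypothetical history: units sold at each past price point.
past_prices = [8.0, 9.0, 10.0, 11.0, 12.0]
past_units = [120.0, 110.0, 100.0, 90.0, 80.0]

intercept, slope = fit_linear_demand(past_prices, past_units)
best_price = revenue_maximizing_price(intercept, slope)
print(f"fitted demand: units = {intercept:.1f} + ({slope:.1f}) * price")
print(f"short-term revenue-maximizing price: {best_price:.2f}")
```

A model like this treats every period as independent. Capturing the discount-expectation effect would mean adding something like promotion history as an explicit input, which is the kind of business belief a data scientist has to learn to build in.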
Ad Age: What's the coolest or strangest type of
data set you've ever worked with and why?
Mr. Melkote: While I get to work with a lot of
interesting data in my day job, my favorite examples of cool or
strange data come from publicly available information. One such
example is a dataset of about 60,000 UFO sightings. It has data like
the location of the sighting, date of the sighting, the shape of the
UFO, along with some text describing the witness's experience. I first came
across this data set when I was reading a programming book last
year and now use it from time to time when I am evaluating new
statistical programming tools. If I recall correctly, most people
who sight UFOs tend to describe their sightings as just a flash of
light; quite a few people claim to have discerned the shape to be a
circle or triangle. White is the most popular color for UFO lights
while red and green are not far behind.
Ad Age: What educational fields of study and
professional backgrounds help develop the best data people?
Mr. Melkote: The best data people tend to come
from fields which have a good tradition of combining theory with
applications, which are computationally intensive. Examples include
applied mathematics, operations research, statistics and computer
science. There is increasing interest in creating data science and
analytics as a separate professional discipline as evidenced by the
number of universities now offering master's degrees in these
areas. As for relevant professional backgrounds, working on data
analysis in industries with a good history of collecting and
analyzing data helps a lot. Examples of such industries include
financial services, retail, travel and e-commerce.
Ad Age: What types of data-related services do
you expect more of in the coming year?
Mr. Melkote: A lot of consumer-facing services
are actually very data-intensive, but the key has always been to
make the user experience so seamless that people don't
realize these are data-related at all (e.g. the Google search bar).
All of the complexity is on the back end and not exposed to end
users. As someone who's involved in designing and developing these
back-end, data-driven processes, I feel that discovering insights
from the data is still a fairly manual and cumbersome process
requiring specialized skills. If you view the
data scientist as a "user" of the data and analysis tools, the user
experience is not very pleasant today. A data scientist spends a
lot of time extracting data from different sources, "cleaning" the
data and formatting it into something usable. This is then followed
by visualizing the processed data, leaving less time to frame and
test hypotheses and build algorithms or models. One needs to have
expertise in a variety of technologies just to get the data in a
usable format before the true value-added work can even be
done.
I would like to see some better tools which help ease this
burden and streamline the data scientist's workflow. There has been
progress in this area both in the open source world of R [a
statistics language], Python [a programming language], etc., and
also from established commercial vendors such as SAS. Also, several
startups such as BigML [machine learning] and Precog [analytics
platform] are addressing this problem from different angles. This
is not an easy problem and it's hard to make predictions as
technology is changing quite rapidly.