What the Heck Is Hadoop?

Each Week, DataWorks Defines a Data-Related Term Marketers Need to Know

By Published on .

Apache Hadoop is open-source software with its roots in Google's search-engine technology. It's often considered a key building block enabling the popularity of big-data analytics, in part because it parses data at high speeds, and allows analysts to crunch data that's stored across multiple servers. The platform lets data analyzers sift through huge sets of structured (think proprietary CRM data) and unstructured data (like posts mentioning brands in social media) to create clusters. For instance, the platform can be used to create audience segments or clusters for ad targeting. Companies including Facebook and Google have used Hadoop for years.

Matt Curcio, chief scientist at Aggregate Knowledge, on Hadoop:

Hadoop is simply a data-access pattern that was created due to the inadequacy of current approaches in solving the "big-data" problems of that time. In some ways Hadoop is similar to SQL by providing an approach to querying data, but on a big-data scale. There are many technologies that sit on top of it such as Hive, Pig, etc., which make querying easier. Arguably, Hadoop and Amazon's cloud services spawned the current data-science revolution we are experiencing.

Most Popular