While the industry is closer than ever to accepting a measurement standard, debate remains about discrepancies between Internet audience measurement figures and server log analysis.
During the past six months, the buzz has been about "reconciliation of the numbers."
In our opinion, talk of reconciliation is wrong. Reconciliation implies that one set of numbers is incorrect and needs to be reconciled to the correct set.
We've found a way to compare these sets of data, something that to our knowledge has never been done before. While both systems are accepted as accurate, each captures Web usage from a different perspective, using different methodologies.
COUNTING AT A GRANULAR LEVEL
Server log measurement monitors usage at a very granular level and provides an anonymous measure of activity at a single Web site.
Audience measurement, on the other hand, tracks usage among a representative sample of users and provides a demographically rich account of Webwide behavior.
To compare the two sources, without adjusting for their systemic differences, would be to compare apples to oranges.
To help the industry understand the differences between audience measurement and server side measurement, Media Metrix and Internet Profiles Corp. analyzed both sets of usage data across more than 30 sites over three months. We learned that there are three fundamental differences between the two data sets:
The methods measure different universes: Server log analysis includes international Web traffic, while audience measurement currently measures Internet usage only among the U.S. population.
Audience measurement systems include cached pages (pages stored and served to multiple users), while server logs do not.
Differences arise because of a certain degree of error that is unavoidable in both systems.
However, the most important finding in this work was that the path to reconciling these differences is a path to a clearer, more developed picture of Internet usage.
Both systems are valuable, but each plays a different role and is appropriate in different, non-overlapping decision-making situations.
As much as 40% of a typical Web site's traffic originates from outside the U.S. Today, Media Metrix's audience measurement system monitors only U.S.-based usage. Simply controlling for this difference in the measured universe eliminates a significant amount of the gap between server log data and audience measurement estimates.
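The universe adjustment can be illustrated with simple arithmetic. The sketch below is only an illustration, not the method used in the study: the function name, the one-million page view count and the use of a flat 40% international share are all hypothetical, chosen to match the upper figure cited above.

```python
def us_only_page_views(logged_page_views: float, international_share: float) -> float:
    """Restrict a server-log page view count to the U.S. universe.

    international_share is the fraction of logged traffic that comes from
    outside the U.S. (a hypothetical input; it varies from site to site).
    """
    if not 0.0 <= international_share < 1.0:
        raise ValueError("international_share must be in [0, 1)")
    return logged_page_views * (1.0 - international_share)


# Hypothetical site: 1,000,000 logged page views, 40% from outside the U.S.
logged = 1_000_000
us_views = us_only_page_views(logged, 0.40)
print(f"U.S.-only page views: {us_views:,.0f}")
```

Under these assumed inputs, only 600,000 of the logged page views fall inside the universe that a U.S.-only audience panel can see.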
Proxy level and browser caching hide activity from the server. Internet service providers typically store a page and serve it to multiple users, while browsers often store content and pull it from the cache multiple times.
Server log measurement could miss as much as 20% to 40% of a site's usage due to caching. We learned that within a site these estimates are relatively stable and predictable. But many factors affect this estimate.
For example, America Online's Internet traffic plays a large role in caching. Specifically, we found that sites with significant AOL traffic typically exhibited higher cache estimates than those with less traffic from AOL, since AOL caches routinely viewed content on its proxy servers as a way to improve performance across its entire network.
PERSONALIZATION CUTS CACHING
Conversely, sites with a high degree of personalization tend to have lower cache rates.
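A rough sketch of what the caching gap means in numbers follows. It assumes that "missing 20% to 40% of usage" means that share of total page views is served from a cache and never reaches the server log. The function name and the 800,000 page view figure are hypothetical, and the calculation is not the study's own method.

```python
def cache_adjusted_page_views(logged_page_views: float, cached_share: float) -> float:
    """Estimate total page views, including those served from caches.

    cached_share is the assumed fraction of total usage that never reaches
    the server log. If the log sees (1 - cached_share) of all page views,
    the total is the logged count divided by that visible fraction.
    """
    if not 0.0 <= cached_share < 1.0:
        raise ValueError("cached_share must be in [0, 1)")
    return logged_page_views / (1.0 - cached_share)


# Hypothetical site with 800,000 logged page views, at both ends of the range.
for share in (0.20, 0.40):
    total = cache_adjusted_page_views(800_000, share)
    print(f"cached share {share:.0%}: estimated total {total:,.0f} page views")
```

Because the cached share is relatively stable within a site but differs between sites, for instance with the amount of AOL traffic or personalization, a single industrywide figure would be a poor substitute for a per-site estimate.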
To the extent that these systems either use algorithms to estimate visits and page views (log files) or rely on samples and projections (audience measurement), there is a certain degree of error or variance around each estimate of activity. While the systems are constantly improving, these errors and variances cannot be completely eliminated.
Bottom line, these numbers don't match because they measure activity from different perspectives. However, when compared, the two systems can help enrich our understanding of Internet usage. For example, using server log data, it's possible to quantify international traffic in audience measurement terms.
Conversely, audience measurement estimates of cached data can help account for cached information now missing in log data. Comparing the two systems furthers understanding of the bias or errors that are systemic to each.
Mr. Ivins is senior VP at Web measurement company Media Metrix. Mr. Reed is director-marketing and business development at Web auditing company Internet Profiles Corp.
Copyright January 1999, Crain Communications Inc.