A question any practitioner of Internet-based analytics will be asked by many different stakeholders is “why don’t the numbers match?” Counts of identically named metrics from ad servers don’t match the web analytics tool, which doesn’t match the for-pay third-party audience measurement tools, which don’t match the free audience measurement tools, which never match any of the homegrown internal measurement tools. And none of them ever match each other.
So it’s a good question, and certainly a valid one to ask. The answers are fairly easy to understand, but the root causes are often difficult to pinpoint and even harder, if possible at all, to remedy. The fact of the matter is that data discrepancies in analytics arise for a multitude of reasons, such as:
- Panel-based data collection. On the audience measurement side, data is collected from self-selecting panels who install proprietary software (e.g., toolbars and so on) on their computers, perhaps at work or at their university, but most likely at home. The data collected from different panels is then rolled up and combined, and the limited subset of the Internet population that chooses to be monitored, in exchange for some incentive, is inflated and projected to the entire Internet audience using proprietary statistical methods. Data is also collected from a limited set of geographically specific ISPs. And regardless of whether we’re talking about audience measurement or web analytics, the different data collection methods often, but not always, involve cookies and all the inherent issues of cookie deletion.
- Unique data models. Ad servers aren’t focused on counting page views and the other dimensions of web analytics (visits, time, and so on). Rather, ad servers focus on serving and counting impressions (and loads of related derivative calculations, like CTR, CPC, and view-through). Metrics are based on an ad request and an ad code. Ads may not be targeted to a page at all, but instead to various constructs, like a “zone” or “keyword.” That means the “page” dimension may not even exist in your ad server’s data model. In other words, you aren’t looking at impressions measured on a page, but rather at the number of impressions served in a different conceptual construct. That’s one of the reasons people say web analytics and ad-serving systems “don’t measure the same thing.”
- Cookie issues. When you’re counting based on cookies, third-party cookies get blocked (often by privacy software). Many ad servers and web analytics tools still serve third-party cookies, and many corporations have not configured their DNS so those cookies can be set as first-party instead. And we all know how cookie deletion affects unique visitor counts, even if you use first-party cookies.
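The panel projection described above boils down to weighting each panelist up to the full population. A minimal sketch of the arithmetic, with entirely hypothetical numbers and a single uniform weight (real vendors use proprietary, demographically stratified weights):

```python
# Sketch: projecting panel counts to the full Internet audience.
# All numbers are hypothetical, and the single uniform weight is a
# simplification of the proprietary stratified weights vendors use.
panel_size = 50_000               # panelists with the meter installed
population = 200_000_000          # Internet users being modeled
weight = population / panel_size  # each panelist "stands for" 4,000 users

panel_visitors = 125              # panelists observed visiting a site
projected_uniques = panel_visitors * weight

print(int(projected_uniques))  # 500000
```

Note that with a weight of 4,000, miscounting a single panelist moves the projected figure by 4,000 visitors, which is one reason panel-based numbers rarely match tag-based counts.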
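The data-model mismatch can be made concrete by rolling up the same events two ways. This is an illustrative sketch, not any vendor's schema; the event fields, names, and numbers are all assumptions:

```python
# Sketch of why ad-server and web-analytics counts diverge: one
# hypothetical event log rolled up two ways.
from collections import Counter

events = [
    {"ad": "A1", "zone": "sports", "page": "/home", "clicked": False},
    {"ad": "A1", "zone": "sports", "page": "/scores", "clicked": True},
    # Run-of-site ad: the ad server records a zone but no page at all.
    {"ad": "B2", "zone": "run_of_site", "page": None, "clicked": False},
]

# Ad-server view: impressions and CTR keyed on the ad code.
impressions = Counter(e["ad"] for e in events)
clicks = Counter(e["ad"] for e in events if e["clicked"])
ctr = {ad: clicks[ad] / impressions[ad] for ad in impressions}

# Analytics-style view: impressions keyed on the page. The run-of-site
# impression has no page, so the two totals can never reconcile.
by_page = Counter(e["page"] for e in events if e["page"])

print(sum(impressions.values()), sum(by_page.values()))  # 3 vs. 2
```

The ad server correctly reports three impressions; a page-keyed report can only ever account for two of them, and no amount of auditing will make those totals match.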
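The effect of cookie deletion on unique-visitor counts can be seen with a toy model. This is a deliberately simplified assumption, not how any tool actually estimates it: suppose a fixed share of visitors clears cookies once mid-period and returns, so each of them is issued a second cookie ID.

```python
# Toy model of cookie deletion inflating "unique visitor" counts.
# Hypothetical assumption: each deleter clears cookies exactly once
# during the period and returns, receiving one extra cookie ID.
actual_people = 100_000
deletion_rate = 0.30            # share who clear cookies and come back

extra_ids = round(actual_people * deletion_rate)  # one extra ID each
reported_uniques = actual_people + extra_ids      # what the tool reports

overcount = reported_uniques / actual_people - 1
print(reported_uniques, f"{overcount:.0%}")  # 130000 30%
```

Even with first-party cookies, the tool would report 130,000 “uniques” for 100,000 actual people, and a panel-based tool measuring the same site would diverge in yet another direction.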
And of course, there’s always the nebulous issue of the complete lack of consensus-based, enforceable standards for online measurement. No industry organization can say what vendors or companies “must” do, only what they “should” do, and no industry body is going to get successful companies to change their secret sauce just because it said so.
So what’s a practitioner to do? Understand the potential sources of discrepancies. Work with your team (from IT to vendors) to prevent and minimize the root causes when possible. Educate your team when discrepancies are not remediable. Ensure you use the different sources of metrics judiciously in the context of your business goals. Finally, realize that none of the tools are more “correct” than any other. All of our analytics tools serve different, and sometimes overlapping, business purposes – from counting ads, to influencing media buying, to sizing audiences, to measuring business performance, and to optimizing the site.