The New York Times had a great piece yesterday by Louise Story, titled "To Aim Ads, Web is Keeping a Closer Eye on You". The piece enumerates data collection points: opportunities for internet companies to collect data on users -- data that might enable better ad targeting. The piece comes out with some eye-popping numbers, such as: Yahoo collects data on each user 2520 times a month, on average!
Now, that's a very interesting statistic, and this piece makes a great contribution by putting for the first time a tangible measure on data collection. Let's take a closer look at the methodology, outlined in a post by Story on the NYTimes blog Bits. The methodology was jointly developed with ComScore. At the highest level, many web companies operate their own web properties in addition to an ad network that serves ads on third-party sites, so let's first look at just the owned-and-operated properties. Here's the salient bit:
The comScore study tallied five types of “data collection events” on the Internet for 15 large media companies. Four of these events are actions that occur on the sites the media companies run: Pages displayed, search queries entered, videos played, and advertising displayed. Each time one of those four things occurs, there is a conversation between the user’s computer and the server of the company that owns the site or serves the ad. The fifth area that comScore looked at was ads served on pages anywhere on the Web by advertising networks owned by the media companies.
By this metric, the number of data collection points per person per month at Yahoo is 699 (the difference between this number and the previous 2520 number is due to Yahoo's off-site ad serving on its ad network). There are a few problems with this analysis:
1. It assigns the same value to every data collection event. Search events reveal way more intent than content page impressions, so that's certainly not accurate. However, let's leave this issue aside for the moment, and assume we are just enumerating data collection events, not actually measuring their value. More on that in a later post.
2. There is some double counting going on here. Advertising is displayed on pages, so counting both the page view and the ad impressions on it as separate data collection events is not accurate. In fact, the data collected based on the page view is often used to target the ad impression. Ads sometimes yield additional information, but only when they are actually clicked. Since clickthrough rates on display ads are very low (in the 0.1% range), we can safely ignore that additional information, given the gross approximations we are making in this analysis.
3. Finally, the same problem as in (2) applies to ads served by the ad networks on third-party sites, except it's probably even worse. Each page has 2-3 ad units on average, so the ads-served count might overstate actual page view events by a factor of 2-3.
If we recalculate the numbers after making the adjustments required by (2) and (3), it appears we will end up with numbers that are a factor of 2 to 3 (or even more) smaller than the comScore/NYTimes estimates. Also, the reduction will be larger for Yahoo and AOL/Ad.com than for Google, because Google serves many fewer display ads and is less affected by adjustment (2) -- although adjustment (3) is still applicable, since content pages often have more than one AdSense unit.
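To make the recalculation concrete, here is a rough sketch in Python using the two Yahoo figures quoted above (699 on-site events, 2520 total). The article does not break the 699 down by event type, so the share of on-site events that are ad impressions (`on_site_ad_share`) is purely my assumption, and the function name and parameters are made up for illustration. Treat the output as a back-of-the-envelope estimate, not a correction of comScore's data.

```python
def adjusted_events(on_site, total, on_site_ad_share, ads_per_page):
    """Rough estimate of distinct data collection events per user per month.

    on_site          -- reported events on owned-and-operated sites (699 for Yahoo)
    total            -- reported events including the ad network (2520 for Yahoo)
    on_site_ad_share -- ASSUMED fraction of on-site events that are ad
                        impressions; not given in the article
    ads_per_page     -- ad units per third-party page (2-3 per the estimate above)
    """
    # Events attributed to ads served on third-party sites.
    off_site_ads = total - on_site

    # Adjustment (2): drop on-site ad impressions, since the page view
    # that carried them is already counted.
    on_site_adjusted = on_site * (1 - on_site_ad_share)

    # Adjustment (3): collapse multiple ad units on one page into a
    # single page-view event.
    off_site_adjusted = off_site_ads / ads_per_page

    return on_site_adjusted + off_site_adjusted


if __name__ == "__main__":
    reported_total = 2520
    for ads_per_page in (2, 3):
        # 0.5 is a hypothetical ad share, chosen only to illustrate the method.
        estimate = adjusted_events(699, reported_total, 0.5, ads_per_page)
        factor = reported_total / estimate
        print(f"{ads_per_page} ads/page: ~{estimate:.0f} events, "
              f"{factor:.1f}x smaller than reported")
```

With the hypothetical 50% ad share, two ad units per page gives about 1260 events, half the reported figure; three ad units per page pushes the reduction further. A different ad share would shift these numbers, which is exactly why a breakdown from comScore would be useful.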
I would love to hear more from comScore and Louise Story about their methodology and whether I missed something in my analysis here.