Datawocky

On Teasing Patterns from Data, with Applications to Search, Social Media, and Advertising

Google Chrome: A Masterstroke or a Blunder?

The internet world has been agog over Google's entry into the browser wars with Chrome. When we look back on this event several years from now, with the benefit of hindsight, we might see it either as a masterstroke or as Google's biggest strategic misstep.

The potential advantages to the internet community as a whole are considerable. The web has evolved beyond its roots as a collection of HTML documents and dumb frontends to database applications. We now expect everything from a web application that we do from a desktop application, plus the added bonus of connectivity to vast computing resources in the cloud. In this context, browsers need to evolve from HTML renderers into runtime containers, much as web servers evolved from simple servers of static files and CGI scripts into modern application servers with an array of plugins providing a variety of services. Chrome is the first browser to explicitly acknowledge this transition and make it the centerpiece of its design, and it will force other browsers to follow suit. We will all benefit.

The potential advantages to Google are also considerable. If the stars and planets align, it can challenge Microsoft's dominance on the desktop by making the desktop irrelevant. Even if that doesn't happen, Google can hope to use its dominance in search to promote Chrome, gaining significant browser market share and ensuring that Microsoft cannot challenge Google's search dominance by building features into Internet Explorer and Windows that integrate MSN's search and other services.

Therein, however, lies the first and perhaps the biggest risk to Google. Until now, Microsoft has been unable to really use IE and Windows to funnel traffic to MSN services and choke off Google. Given its antitrust woes, Microsoft has been treading carefully on this front; any overt attempt would evoke cries of foul from many market participants. Google has been in a great position to lead the outcry, because it has been purely a service accessed from the browser, without any toehold in the browser market itself.

Chrome, however, eases some of the pressure on Microsoft. If Microsoft integrates MSN search or other services tightly into IE, it will be harder for Google to cry foul -- Microsoft could point to Chrome, and to any steps Google takes to integrate its own services into Chrome, as counter-arguments. In addition, any outcry from Google can now be characterized as sour grapes from a loser: Microsoft can say, we have a browser, they have one too, ours is just better -- let consumers decide for themselves.

In some sense, regardless of Chrome's actual market penetration, Google has lost the moral high ground in future arguments with Microsoft. I wonder whether Google might have achieved its aims better not by releasing a Google-branded browser, but by working with Mozilla to improve Firefox from within.

Second, while Google has shown impressive technological wizardry in search and advertising, the desktop application game is very different from the internet service game. Users are forgiving of beta tags that linger for years on services such as Gmail, but their expectations around compatibility and security bugs are very high for desktop applications. It remains to be seen whether Google has the culture to succeed in this game, going beyond whiz-bang features that thrill developers -- such as a blazingly fast JavaScript engine -- to deliver a mainstream browser that competes on stability, security, and features.

The third problem is one of data contagion. Google has the largest "database of intentions" in the world today: our search histories, which form the basis of Google's ad targeting. The thing that keeps me from freaking out that Google knows so much about me is that I access Google using a third-party browser. If Google has access to my desktop, and can tie my search history to that, the company can learn much about me that I keep isolated from my search behavior. The cornerstone of privacy on the web today is that we can use products from different companies to create isolation: desktop from Microsoft, browser from Mozilla, search from Google. These companies have no incentive to share information. This is one instance where information silos serve us well as consumers. Any kind of vertical integration has the potential to erode privacy.

I'm not suggesting that Google would do anything evil with this data, or indeed that the thought has even crossed anyone's mind there; thus far Google has shown admirable restraint in its use of the database of intentions, staying away, for example, from behavioral targeting. But we should all be cognizant of the fact that companies are in business purely to benefit their shareholders. At some point, someone at Google might realize that the contents of my desktop can be used to target advertising, and it might prove tempting in a period of slow revenue growth or under a different management team.

Two striking historical parallels come to mind, one a masterstroke and the other a blunder, in both cases setting into motion events that could not be undone. In 49 BC, Julius Caesar crossed the Rubicon with his army, triggering a civil war in which he triumphed over the forces of Pompey and became the master of Rome. And in 1812, Napoleon Bonaparte had Europe at his feet when he made the fateful decision to invade Russia, greatly weakening his power and leading ultimately to his defeat at Waterloo. It will be interesting to see whether Chrome ends up being Google's Rubicon or its Moscow. Alea iacta est.

September 07, 2008 in Advertising, Search

Why Google Doesn't Provide Earnings Forecasts

Most public companies provide forecasts of revenue and earnings in the upcoming quarters. These forecasts (sometimes called "guidance") form the basis of the work most stock analysts do to make buy and sell recommendations. Much to the consternation of these analysts, Google is among the few companies that have refused to follow this practice. As a result, estimates of Google's revenue by analysts using publicly available data, like comScore numbers, have often been spectacularly wrong. Today's earnings call may be no different.

A Google executive once explained to me why Google doesn't provide forecasts. To understand it, you have to think about the engineers at Google who work on optimizing AdWords. How do they know they're doing a good job? We know that Google constantly bucket-tests tweaks to its AdWords algorithms. An ad optimization project is considered successful if it has one of two results:

  • Increase revenue per search (RPS), while not using additional ad real estate on the search results page (SERP).
  • Reduce the ad real estate on each SERP, while not reducing RPS.

The tricky cases are the ones that increase RPS while also using more ad real estate. It then becomes a judgment call whether they should be rolled out across the site. If Google were to make earnings forecasts, the thinking went, there would be a huge temptation to roll out tweaks in this gray area to make the numbers. As the quarters rolled by, the area of the page devoted to ads would keep steadily increasing, leading to longer-term problems with customer retention.
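
To make the rollout criteria concrete, here is a minimal sketch of the decision rule as described above. The function, thresholds, and example numbers are my own illustration, not Google's actual process or code.

```python
def rollout_decision(rps_delta, ad_area_delta):
    """Classify a bucket-tested AdWords tweak (illustrative only).

    rps_delta:     change in revenue per search (RPS) vs. control
    ad_area_delta: change in ad real estate on the SERP vs. control
    """
    if rps_delta > 0 and ad_area_delta <= 0:
        return "launch"          # more revenue, no extra ad space
    if ad_area_delta < 0 and rps_delta >= 0:
        return "launch"          # less ad space, no revenue loss
    if rps_delta > 0 and ad_area_delta > 0:
        return "judgment call"   # the gray area: revenue up, but so is ad load
    return "reject"              # everything else fails both criteria

# Example: a tweak that lifts RPS 2% but uses 5% more ad space
print(rollout_decision(0.02, 0.05))  # -> "judgment call"
```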

Of course, this doesn't mean there is no earnings pressure. In reality, whether they issue guidance or not, Google's stock price does depend on whether they continue to deliver robust revenue and earnings growth. So implicitly, there is always pressure to beat the estimates. And for the first time, as Google's stock has taken a hammering in recent months, I've heard about hiring slowdowns at Google. So there is definitely pressure to cut costs as well. It will be interesting to observe the battle between idealism and expediency play itself out, with its progress reflected in the ad real estate on Google's search results. It's easy to be idealistic with the wind behind your back; the true test is whether you retain the idealism in the face of headwinds. Time will tell.

This brings us to today's earnings call. In my experience, the best predictor of Google earnings has been Efficient Frontier's excellent Search Engine Performance Report. EF is the largest ad agency for SEM advertisers, and manages the campaigns of several large advertisers on Google, Yahoo, and Microsoft. As I had noted earlier, in Q1 an estimate based on their report handily beat other forecasts, most of which use ComScore data. (Disclosure: My fund Cambrian Ventures is an investor in EF.)

EF's report for Q2, released this morning, indicates a strong quarter for Google. Google gained more than its fair share of advertising dollars in Q2 2008: for every new dollar spent on search advertising, $1.10 went to Google, at the expense of Yahoo and Microsoft. In addition, Google's average cost-per-click (CPC) increased by 13.8% in Q2 2008 versus Q2 2007, while click volume and CTR increased as well. There was also strong growth overseas, which should help earnings given the weak dollar.

I don't have the time right now to do the math and figure out whether the robust performance was sufficient to beat the Street's estimates. You should read the report for yourself and make that call.

Update: Google's results, although robust, were below expectations. The biggest moment in the earnings call for me was this quote from Sergey (via Silicon Alley Insider):

Sergey said the company may have overdone its quality control efforts in the quarter (reducing the number of ads), and the reversal of this could provide a modest accelerator to Q3

Quality efforts "overdone"? Apparently those pressures are telling after all, and Google is going to abandon its principles a wee bit to venture into the gray zone. Is this the start of a slippery slope?

July 17, 2008 in Advertising, Search

Is Search Advertising a Giffen Good?

The Giffen good is a strange beast from economic theory.  For most goods, demand decreases as price increases. A Giffen good defies this normal market behavior -- the demand for it increases even as its price increases.

Giffen goods have a very interesting history. They were originally postulated by Alfred Marshall in the 1895 edition of his Principles of Economics. The classic examples are staple foods such as rice, wheat, and potatoes: as their price goes up, poor people on a tight budget actually consume more of them, because they are forced to cut back on luxuries such as meat but still need the same number of calories to survive. For over a century, Giffen goods remained a theoretical beast with no documented real-world examples -- until 2007, when two Harvard economists demonstrated that rice and noodles behave as Giffen goods in certain poor parts of China.

Google's recent results raise the possibility that search advertising might be a Giffen good. Here's a simple model. Company X spends marketing dollars on two channels: search advertising and brand advertising (on the web, or on TV and in magazines). Search advertising drives customers directly to its site, resulting in immediate sales. Brand advertising drives organic traffic, albeit in a way that is much harder to measure.

In an economic downturn,  companies get more cautious with their marketing budgets, moving more dollars into measurable and direct channels such as search advertising while cutting back on less-measurable brand advertising. Thus, there is more competition for the clicks, driving up the price (cost-per-click, or CPC) of search ads.

Company X, therefore, finds that all its increased spend on search marketing actually drives the same or even fewer visitors to its site. At the same time, since it has cut back on brand advertising, organic traffic is decreasing. But wait -- we need to make this quarter's numbers! The easiest way to do that is to cut back even more on brand advertising and channel even more dollars into search, which can drive immediate clicks towards the end of the quarter. Brand marketing's ROI is longer-term, while this quarter's revenue is a more pressing concern.

Witness the result: Company X spends more on search marketing, driving more search ad clicks to its site, at a higher price point -- the definition of a Giffen good! Interestingly, unlike the rice-and-noodles example, the increased consumption here directly drives up the price, because of the auction pricing model.
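
To make the mechanism concrete, here is a toy simulation of the model described above. Every number in it is invented for illustration; it is a sketch of the dynamic, not a claim about any real advertiser's budget.

```python
# Toy simulation of the Giffen dynamic sketched above (all numbers invented).
# Company X must hit a fixed visitor target each quarter. Search clicks are
# bought at the prevailing CPC; organic visits depend (with a lag) on the
# previous quarter's brand spend. As the market-wide CPC rises, X raids the
# brand budget to keep hitting its target, which erodes organic traffic and
# forces it to buy MORE clicks at the HIGHER price -- the Giffen signature.

TARGET_VISITS = 200_000          # visits X must deliver each quarter
BUDGET = 150_000                 # total quarterly marketing budget ($)
ORGANIC_PER_BRAND_DOLLAR = 2.0   # organic visits per $ of last quarter's brand spend

brand_spend = 50_000             # last quarter's brand spend
cpc = 1.00                       # market cost-per-click ($)

for quarter in range(1, 5):
    organic = ORGANIC_PER_BRAND_DOLLAR * brand_spend      # lagged payoff of brand ads
    clicks_needed = max(TARGET_VISITS - organic, 0)
    clicks_bought = min(clicks_needed, BUDGET / cpc)      # budget caps what X can buy
    search_spend = clicks_bought * cpc
    brand_spend = BUDGET - search_spend                   # leftover funds next quarter's brand ads
    print(f"Q{quarter}: CPC=${cpc:.2f}  clicks bought={clicks_bought:,.0f}  "
          f"search spend=${search_spend:,.0f}  brand spend=${brand_spend:,.0f}")
    cpc *= 1.10   # everyone else shifts budget into search too, bidding up the price

# Output: clicks bought rise from 100,000 to 120,000 even as CPC climbs from
# $1.00 to $1.21, until the budget cap finally binds in Q4.
```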

Google's recent results seem to confirm this hypothesis: paid clicks increased by 20% from Q1 2007, while ad revenues increased by 40%, implying a CPC increase of roughly 17% (1.40 / 1.20 ≈ 1.17). Of course, there's a limit to this phenomenon: companies cannot pay more for their ad clicks than their profit margins allow. Until that limit is reached, the sucking sound you hear is everyone's profit margins going into Google. We're going to see a lot of low-margin revenue increases at online retailers and other companies that rely on paid search.

April 30, 2008 in Advertising

More data beats better algorithm at predicting Google earnings

Readers of this blog will be familiar with my belief that more data usually beats better algorithms. Here's another proof point.

Google announced earnings today, and it was a shocker -- for most of Wall Street, which was in a tizzy based on ComScore's report that paid clicks grew by a mere 1.8% year-over-year. In the event, paid clicks grew by a healthy 20% from last year and revenue grew by 30%.

In comparison, SEM optimizer Efficient Frontier released their Search Performance Report on their blog a few hours ahead of Google's earnings call. EF manages the SEM campaigns of some of the largest direct marketers, handling more SEM spend than anyone in the world outside of the search engines themselves. Their huge volumes of data give them more insight into Google's marketplace than anyone outside of Google.

EF reported a 19.2% increase in paid clicks and an 11.2% increase in CPCs at Google year-over-year. Do the math (1.192 * 1.112 ≈ 1.325): that's a 32.5% Y-O-Y revenue increase -- the closest anyone got to the real numbers! And this quarter is not a flash in the pan: in January, EF reported a 29% Y-O-Y increase in SEM spend, with 97% of the increased spend going to Google -- that is, about a 28% Y-O-Y revenue increase for Google, which compares very favorably with the actual reported increase of 30%.
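
The arithmetic behind that estimate is simply that revenue is clicks times CPC, so the two growth rates compound. A minimal sketch, using the figures quoted above:

```python
def implied_revenue_growth(click_growth, cpc_growth):
    """Revenue = clicks x CPC, so the growth rates compound multiplicatively."""
    return (1 + click_growth) * (1 + cpc_growth) - 1

# EF's Q1 2008 report: paid clicks +19.2% Y-O-Y, CPCs +11.2% Y-O-Y at Google
print(round(implied_revenue_growth(0.192, 0.112), 4))   # -> 0.3255, i.e. ~32.5%
```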

As Paul Kedrosky points out, this is a huge indictment of ComScore's methodology (ComScore's shares are trading down 8% after hours following the Google earnings call). ComScore sets a lot of store by its "panel-based" approach, which collects data from a panel of users, similar to Nielsen's method of measuring TV viewing using data from a few households that have its meters installed. ComScore has been in this business longer than anyone else, and arguably has the best methodology (i.e., algorithm) in town for analyzing the data. It's just not looking at the right data, or enough of it. Some simple math using the mountain of data from EF handily beats an analysis methodology developed over several years using data from a not-so-large panel.

To my mind, this also casts doubt on the validity of ComScore's traffic measurement numbers. For websites where I personally know the numbers (based on server logs), both Quantcast and Hitwise come far closer to reality than ComScore; the latter two don't rely as heavily on a small panel. ComScore's value today is largely driven by the fact that advertisers and ad agencies trust its numbers more than the upstarts'. Advertiser inertia will carry it for a while, but a few more high-profile misses could change that quickly.

Disclosure: Cambrian Ventures is an investor in EF. However, I don't have access to any information beyond that published in their public report.

April 18, 2008 in Advertising, Data Mining, Search

Affinity and Herding Determine the Effectiveness of Social Media Advertising

A recent piece in The Economist raises a provocative question: social networking sites such as Facebook, MySpace, and Bebo have grown tremendously in usage, but are they viable businesses? In other words, is it possible to monetize these services effectively? To answer this question, it helps to take a step back and look at the monetizability of social media as a whole.

Since most social media sites rely on advertising revenues, let us restrict ourselves to advertising as the monetization mechanism. Regardless of the model (CPM, CPC, or CPA), advertisers value three key measures: reach, frequency, and targeting. Many social media sites certainly score high on reach and frequency, but how do they fare on targeting? Targeting is key, because it determines the CPM rates advertisers are willing to pay. And CPM rates vary widely: from $16-20 for TripAdvisor to $0.10 for Facebook and MySpace. See, for example, this media plan.

What drives such a wide divergence in CPM rates among social media sites? Are the low rates at social networking sites a transient aberration, with higher rates around the corner as advertisers get more comfortable with the medium? And is there a simple model to predict the targetability of different forms of social media?

Remarkably, there appears to be a single factor that explains a great deal of the available data. Consider the difference between a Facebook profile and a TripAdvisor travel review. A typical pageview on the former is by someone known very well to the creator of the profile – a close friend or acquaintance. A TripAdvisor travel review, on the other hand, is seen by people completely unrelated to the person or persons who wrote the reviews on the page.

We quantify this distinction with a measure called affinity. The “affinity” of a social media service is the average closeness of relationship between a content creator and someone who views that content.  The affinity of Facebook is very high, while the affinity of TripAdvisor is very low.

Here’s the key observation: There is an inverse relationship between the affinity of a social media service and its targetability. Why is this true? The act of viewing a Facebook profile gives us very little information about the viewer, other than the fact that she is friends with the profile creator; when someone views a TripAdvisor travel review, she is definitely interested in traveling to that location.

I estimated the affinity of several forms of social media and plotted affinity against CPM (which I used as a proxy for targetability). The resulting graph shows the landscape of affinity versus targetability for several forms of social media; some of the data points are from published data, and others are extrapolated. We can see a strong inverse proportionality, with a couple of outliers. We'll get to the outliers in a moment; for now, note that social networks and photo-sharing sites have even higher affinity (and therefore lower targetability) than email. This is because we often email people we don't know or know only in passing. Instant messaging has the highest affinity of all: my IM buddy list includes only my very closest friends, whom I trust with the ability to interrupt me at any time of the day.
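
For the curious, here is a rough sketch of what that landscape looks like in code. The CPM figures are the published numbers cited in this post; the affinity scores are my own illustrative guesses on a 0-to-1 scale, so treat this as an illustration of the model rather than measured data.

```python
# Illustrative sketch of the affinity vs. CPM landscape described above.
# CPMs are the published figures cited in the post; the affinity scores
# (0 = viewers are strangers, 1 = viewers are close friends) are rough
# guesses of my own, so this illustrates the model rather than real data.
services = [
    # (name, assumed affinity, published/estimated CPM in $)
    ("TripAdvisor reviews",  0.05, 16.00),
    ("High-traffic blogs",   0.10, 10.00),
    ("Yahoo Mail",           0.40,  3.50),
    ("Facebook profiles",    0.85,  0.12),
    ("Instant messaging",    0.95,  0.05),   # guess: highest affinity, lowest CPM
]

# The model's claim: ranking by affinity (ascending) should rank CPM (descending).
for name, affinity, cpm in sorted(services, key=lambda s: s[1]):
    print(f"{name:22s} affinity={affinity:.2f}  CPM=${cpm:.2f}")
```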

What about the outliers? Video sharing sites, such as YouTube, have low affinity, because the majority of people see videos posted by people they don’t know. However, the targetability is lower than we would expect, because of a compensating factor: herding. Most people see videos featured on lists such as “Most Popular”, which reduces the targeting value of such videos. This is also true of social news sites such as Digg.

A couple of caveats:

  • This is a broad brush-stroke, and individual services might well differ from the overall category. For example, popular blogs have much lower affinity and therefore much higher CPMs than the typical blog.
  • Targetability is not the only factor determining CPM; there are others. For example, certain viewer intents are inherently more valuable than others.

But with these caveats, this simple model is highly instructive. We may conclude that, when all the dust settles, the CPM rates of instant messaging services will not exceed those of social networks, which will not exceed those of email. These are inherently low CPM businesses.

What can social media sites do to increase their CPMs? There appear to be two options:

  • Create sections of the network that are more topic-oriented, and less about individuals. For example, band pages and groups on MySpace, and Facebook groups.
  • Mine individuals’ profiles, or their off-site behaviors, to target them behaviorally rather than contextually. This approach carries with it dangers of privacy violations, as the Facebook Beacon fiasco demonstrates.

If social networks are to become a viable business, they need to pursue aggressively one or both of these approaches. Of course, it may be possible for some services to sidestep this question entirely and develop business models that don’t depend on advertising. We haven’t seen such a model emerge yet, but there is so much creativity and ferment in this space that it might just happen.

Update: I received some questions about the affinity versus targetability landscape, so here's a brief description of the methodology. I used published CPM numbers where they were available; e.g., Yahoo Mail ($3-4), TripAdvisor ($16), Facebook ($0.10-0.15). Note that published CPMs are generally to be taken with a pinch of salt, since they may apply only to small portions of the overall publisher inventory and may not represent real market-clearing prices; e.g., Google's stated goal of $20 CPM for YouTube -- only a very small number of YouTube videos show ads today. I've used Metacafe's $5 net CPM payout to video producers as a more reasonable benchmark; this likely represents a gross CPM of $10, assuming a 50% revenue share. For blogs, the numbers are all over the place: BlogAds rate cards for various blogs range from $1-$4 CPM, Valleywag reports $6.50-$9.75, and Federated Media has rate cards charging $7-$40. I took $10 to be a median for blogs with reasonably high traffic. Some of the other data points are based on guesses and informal conversations, since these sites typically don't publish their CPMs. Please email me if you have additional data on these; I will update the graph accordingly.

April 04, 2008 in Advertising, Social Media

Enumerating User Data Collection Points

The New York Times had a great piece yesterday by Louise Story, titled To Aim Ads, Web is Keeping a Closer Eye on You. The piece enumerates data collection points: opportunities for internet companies to collect data on users -- data that might enable better ad targeting. The piece comes out with some eye-popping numbers, such as: Yahoo collects data on each user 2520 times a month, on average!

Now, that's a very interesting statistic, and this piece makes a great contribution by putting for the first time a tangible measure on data collection. Let's take a closer look at the methodology, outlined in a post by Story on the NYTimes blog Bits. The methodology was jointly developed with ComScore. At the highest level, many web companies operate their own web properties in addition to an ad network that serves ads on third-party sites, so let's first look at just the owned-and-operated properties. Here's the salient bit:

The comScore study tallied five types of “data collection events” on the Internet for 15 large media companies. Four of these events are actions that occur on the sites the media companies run: Pages displayed, search queries entered, videos played, and advertising displayed. Each time one of those four things occurs, there is a conversation between the user’s computer and the server of the company that owns the site or serves the ad. The fifth area that comScore looked at was ads served on pages anywhere on the Web by advertising networks owned by the media companies.

By this metric, the number of data collection points per person per month at Yahoo is 699 (the difference between this number and the earlier 2520 figure is due to Yahoo's off-site ad serving on its ad network). There are a few problems with this analysis:

  1. It assigns the same value to every data collection event. Search events reveal way more intent than  content page impressions, so that's certainly not accurate. However, let's leave this issue aside for the moment, and assume we are just enumerating data collection events, not actually measuring their value. More on that in a later post.
  2. There is some doublecounting going on here. Advertising is displayed on pages, so counting both the page view and the ad impressions on it as unique data collection events is not accurate. In fact, the data collected based on the page view is often used to target the ad impression. There is sometimes additional information from ads, but only when ads are actually clicked -- but since clickthroughs on display ads are very low (in the 0.1% range), we can safely ignore that additional information, given the gross approximations we are making in this analysis.
  3. Finally, the same problem as in (2) applies to ads served by the ad networks on third-party sites, except it's probably even worse. Each page on average carries 2-3 ad units, so the ads-served count might overcount actual page view events by a factor of 2-3.

If we recalculate the numbers after making the adjustments required by (2) and (3), it appears we will end up with numbers that are a factor of 2 to 3 (or even more) smaller than the comScore/NYTimes estimates. The reduction will be larger for Yahoo and AOL/Ad.com than for Google, because Google serves many fewer display ads and is less affected by adjustment (2) -- although adjustment (3) still applies, since content pages often carry more than one AdSense unit.
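
Here is a back-of-the-envelope version of that recalculation, using the Yahoo figures quoted above. The split of on-site events between page views and ad impressions, and the 2.5 ads-per-page figure, are my own assumptions for illustration.

```python
# Rough recalculation of Yahoo's "data collection events" per user per month,
# applying adjustments (2) and (3) above. The 2,520 and 699 figures are from
# the comScore/NYT piece; the split of on-site events into page views vs. ad
# impressions, and the 2.5 ads-per-page figure, are illustrative guesses.
TOTAL_EVENTS      = 2520   # all data collection events per user per month
ONSITE_EVENTS     = 699    # events on Yahoo's owned-and-operated sites
OFFSITE_AD_EVENTS = TOTAL_EVENTS - ONSITE_EVENTS   # ad-network serving = 1,821

ONSITE_AD_SHARE = 0.5      # guess: half of on-site events are ad impressions
ADS_PER_PAGE    = 2.5      # typical ad units per page (2-3 in the post)

# Adjustment (2): on-site ad impressions duplicate the page views they ride on.
adjusted_onsite = ONSITE_EVENTS * (1 - ONSITE_AD_SHARE)

# Adjustment (3): off-site ad impressions overcount page-level events ~2-3x.
adjusted_offsite = OFFSITE_AD_EVENTS / ADS_PER_PAGE

adjusted_total = adjusted_onsite + adjusted_offsite
print(f"Adjusted events per user per month: ~{adjusted_total:.0f} "
      f"(vs. the headline {TOTAL_EVENTS})")   # ~1078 vs. 2520, a factor of ~2.3
```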

I would love to hear more from comScore and Louise Story about their methodology and whether I missed something in my analysis here.

March 11, 2008 in Advertising
