Datawocky

On Teasing Patterns from Data, with Applications to Search, Social Media, and Advertising

The Real Long Tail: Why both Chris Anderson and Anita Elberse are Wrong

A new study by Anita Elberse, published in the Harvard Business Review, raises questions about the validity of Chris Anderson's Long Tail theory. In case you're related to Rip Van Winkle: the Long Tail theory suggests that the dramatically lower distribution costs for media (such as music and movies) enabled by the internet have the potential to reshape the demand curve for media. Traditionally, these businesses have been hits-driven, with the majority of revenue and profits attributable to a small number of items (the hits). Anderson argues that the internet's ability to serve niches cost-effectively increases the demand for items further down the "tail" of the demand curve, making the aggregate demand for the tail comparable to that for the head.

Anderson's insight resonated instantly with the digerati. It is said that Helen of Troy's face launched a thousand ships; the Long Tail theory certainly launched more than a thousand startups, all with an obligatory Long Tail slide in their investor pitches. Recently, however, there has been a creeping suspicion that the data don't support the theory; the backlash has been spearheaded, among others, by Lee Gomes of the Wall Street Journal. In her piece, Anita Elberse does a deep dive into the data and concludes that the Long Tail theory is flawed.

Anderson has posted a rebuttal on his blog, pointing out a problem with Elberse's analysis: defining the head and tail in percentage terms. There is some truth to Anderson's rebuttal. But the heart of Elberse's criticism lies not in the definition of the head and the tail. It lies in using McPhee's theory of exposure to conclude that positive feedback effects reinforce the popularity of hits, while dooming items in the tail to perpetual obscurity. She presents data from Quickflix, an Australian movie rentals service, showing that movies in the tail are rated on average lower than movies in the head. Thus, movies in the tail are destined to remain in the tail. Elberse exhorts media executives to concentrate their resources on backing a small set of potential blockbusters, rather than fritter them away on niches.

The big problem with this argument is that it confuses cause and effect. Before the internet, distribution was expensive, and there was no way for consumers to provide instant feedback on products. Consumers had little say in which items were readily available and which were hard to find. Thus, the hits were picked by a few studio executives, publishers, or record producers who "greenlighted" projects they thought had hit potential. But when distribution is cheap, and consumer feedback loops are in place, the items that a lot of consumers like become popular and move into the head. It's not that items in the tail are inherently rated lower; items are in the tail precisely because they are rated lower.

It's as if we're comparing two systems of government, a hereditary aristocracy and a democracy, by comparing the sizes of the ruling elite in the two cases. That misses the point entirely. What matters is not the size of the ruling elite, it's how they got there. So, the big change wrought by the internet is not so much to change the shape of the demand curve for media products, as Anderson claims; nor has there been no change whatsoever, as Elberse posits. The big change is not in what fraction of the demand is in the head, it's in how the items that are in the head got there in the first place. Any change in the shape of the curve itself is incidental.

There's another market where we are seeing this phenomenon play out: the market for Facebook (and MySpace) apps. In earlier years, it took a lot of capital to get a company off the ground. The companies that got funded were the ones with good business plans that could convince VCs to take the plunge based on the people, the plan, and potentially some intellectual property. But it doesn't take much capital to write a Facebook app, which has led to a proliferation of them. This paves the way for the expected inversion. Facebook users don't use the apps that VCs fund. Instead, Facebook users decide which apps they like, and VCs fund the ones, such as Slide and RockYou, that gain popularity.

It is instructive to look at the Facebook app trends study published by Roger Magoulas and Ben Lorica at O'Reilly Research. The study shows that at last count, there were close to 30,000 Facebook apps. Usage, however, is highly concentrated among the top few apps, a classic example of a hits-driven industry (see graph), with no long tail. However, these hits have been produced by the collective action of millions of Facebook users, rather than by a small set of savvy media executives. And there's a lot of churn: new applications join the winners, and old winners die and are buried in the tail.

The real Long Tail created by the internet is not the long tail of consumption, but the long tail of influence. Earlier, the ability to influence the decisions on who the winners and losers were rested with a few media executives. Now every social network user has some potential influence, however small, on the result. The long tail of influence, combined with instant feedback loops, leads to a short tail of consumption. The Facebook app market is a leading indicator of the path the entire media industry will take in years to come.

Update: Chris Anderson has posted a rebuttal in the Comments. Thanks Chris! Please do read his comment and my response. Chris points out that Facebook apps still follow a power law distribution. It doesn't matter how long the tail is, what matters is how heavy it is. The area under the long tail is a function of both length and depth, and depends crucially on the power law exponent. For the mathematically minded, the details are here.
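
To make the point about tail weight concrete, here is a minimal sketch in Python. It assumes that demand for the item at popularity rank r is proportional to r^(-alpha), and it computes the fraction of total demand that falls outside the top items. The function name `tail_share`, the head size of 100, and the sample exponents are all illustrative assumptions, not figures from the study; only the catalog size of roughly 30,000 apps comes from the post above.

```python
def tail_share(n_items: int, head_size: int, alpha: float) -> float:
    """Fraction of total demand on items ranked below the top `head_size`.

    Assumes demand at rank r is proportional to r ** -alpha (a pure power law).
    """
    # Unnormalized demand for ranks 1..n_items.
    weights = [rank ** -alpha for rank in range(1, n_items + 1)]
    total = sum(weights)
    # Everything after the first `head_size` ranks counts as the tail.
    tail = sum(weights[head_size:])
    return tail / total


if __name__ == "__main__":
    # Same tail length in every case; only the exponent (hypothetical) changes.
    for alpha in (0.5, 1.0, 2.0):
        share = tail_share(n_items=30_000, head_size=100, alpha=alpha)
        print(f"alpha={alpha}: tail share of demand = {share:.3f}")
```

The tail is equally long in all three runs, yet its share of demand shrinks sharply as the exponent grows. A steep power law can have a very long tail that still carries almost no weight.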

July 09, 2008 in Social Media, Venture Capital

Affinity and Herding Determine the Effectiveness of Social Media Advertising

A recent piece in The Economist raises a provocative question: social networking sites such as Facebook, MySpace, and Bebo have grown tremendously in usage, but are they viable businesses? In other words, is it possible to monetize these services in an effective fashion? To answer this question, it helps to take a step back and look at the monetizability of social media as a whole.

Since most social media sites rely on advertising revenues, let us restrict ourselves to advertising as the monetization mechanism. Regardless of the model (CPM, CPC, CPA), advertisers value three key measures: reach, frequency, and targeting. Many social media sites certainly score high on reach and frequency, but how do they fare on targeting? Targeting is key, because it determines the CPM rates advertisers are willing to pay. And CPM rates vary very widely: from $16-20 for TripAdvisor to $0.10 for Facebook and MySpace. See, for example, this media plan.

What drives such a wide divergence in CPM rates among social media sites? Are the low rates at social networking sites a transient aberration, with higher rates around the corner as advertisers get more comfortable with the medium? And is there a simple model to predict the targetability of different forms of social media?

Remarkably, there appears to be a single factor that explains a great deal of the available data. Consider the difference between a Facebook profile and a TripAdvisor travel review. A typical pageview on the former is by someone known very well to the creator of the profile – a close friend or acquaintance. On the other hand, a TripAdvisor travel review is seen by people completely unrelated in any way to the person or persons who wrote the reviews on the page.

We quantify this distinction with a measure called affinity. The “affinity” of a social media service is the average closeness of relationship between a content creator and someone who views that content. The affinity of Facebook is very high, while the affinity of TripAdvisor is very low.

Here’s the key observation: There is an inverse relationship between the affinity of a social media service and its targetability. Why is this true? The act of viewing a Facebook profile gives us very little information about the viewer, other than the fact that she is friends with the profile creator; when someone views a TripAdvisor travel review, she is definitely interested in traveling to that location.

I estimated the affinity of several forms of social media and plotted affinity against CPM (which I used as a proxy for targetability). The resulting graph shows the landscape of Affinity versus Targetability for several forms of social media. Some of these data points are from published data and others are extrapolated. We can see that there is a strong inverse relationship, with a couple of outliers. We’ll get to the outliers in a moment; for now, note that Social Networks and Photo Sharing sites are even higher affinity (and therefore lower targetability) than email. This is because we often email people we don’t know or know only in passing. Instant messaging has the highest affinity of all: my IM buddy list includes only my very closest friends, whom I trust with the ability to interrupt me at any time of the day.

What about the outliers? Video sharing sites, such as YouTube, have low affinity, because the majority of people see videos posted by people they don’t know. However, the targetability is lower than we would expect, because of a compensating factor: herding. Most people see videos featured on lists such as “Most Popular”, which reduces the targeting value of such videos. This is also true of social news sites such as Digg.

A couple of caveats:

  • This is a broad brush-stroke, and individual services might well differ from the overall category. For example, popular blogs have much lower affinity and therefore much higher CPMs than the typical blog.
  • Targetability is not the only factor determining CPM; there are others. For example, certain viewer intents are inherently more valuable than others.

But with these caveats, this simple model is highly instructive. We may conclude that, when all the dust settles, the CPM rates of instant messaging services will not exceed those of social networks, which will not exceed those of email. These are inherently low CPM businesses.

What can social media sites do to increase their CPMs? There appear to be two options:

  • Create sections of the network that are more topic-oriented, and less about individuals. For example, band pages and groups on MySpace, and Facebook groups.
  • Mine individuals’ profiles, or their off-site behaviors, to target them behaviorally rather than contextually. This approach carries with it dangers of privacy violations, as the Facebook Beacon fiasco demonstrates.

If social networks are to become a viable business, they need to pursue aggressively one or both of these approaches. Of course, it may be possible for some services to sidestep this question entirely and develop business models that don’t depend on advertising. We haven’t seen such a model emerge yet, but there is so much creativity and ferment in this space that it might just happen.

Update: I received some questions about the affinity versus targetability landscape. Here's a brief description of the methodology. I used published CPM numbers where they were available; e.g., Yahoo Mail ($3-4), TripAdvisor ($16), Facebook ($0.10-0.15). Note that published CPMs are generally to be taken with a pinch of salt, since they may apply only to small portions of the overall publisher inventory and may not represent real market-clearing prices. For example, Google's stated goal is a $20 CPM for YouTube, yet only a very small number of YouTube videos show ads today. I've used Metacafe's $5 net CPM payout to video producers as a more reasonable benchmark; this likely represents a gross CPM of $10, assuming a 50% rev share. For blogs, the numbers are all over the place: BlogAds ratecards for various blogs vary from $1 to $4 CPM, Valleywag reports $6.50-$9.75, and Federated Media has ratecards charging $7-$40. I took $10 to be a median for blogs with reasonably high traffic. Some of the other data points are based on guesses and informal conversations, since these sites typically don't publish their CPMs. Please email me if you have additional data on these; I will update the graph accordingly.
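
For readers who want to check the Metacafe arithmetic, here is a minimal sketch. It assumes the payout to producers is a fixed fraction of the gross CPM; the function name `gross_cpm` is made up for illustration, and the 50% share is the same assumption stated above, not a published figure.

```python
def gross_cpm(net_cpm: float, rev_share: float) -> float:
    """Back out the gross CPM from the net payout and the producer's revenue share.

    Assumes net_cpm = gross_cpm * rev_share, with rev_share in (0, 1].
    """
    if not 0 < rev_share <= 1:
        raise ValueError("rev_share must be in the interval (0, 1]")
    return net_cpm / rev_share


if __name__ == "__main__":
    # $5 net payout to producers at an assumed 50% revenue share.
    print(f"Estimated gross CPM: ${gross_cpm(5.0, 0.5):.2f}")
```

If the true revenue share were lower than 50%, the implied gross CPM would be correspondingly higher, so the $10 figure is best read as a rough estimate.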

April 04, 2008 in Advertising, Social Media

About

  • Anand Rajaraman
  • Datawocky
