Slumdog Millionaire is one of my favorite movies of all time. I have also followed the career of A.R. Rahman, who composed the movie's music, ever since his debut in 1992. So I was quite thrilled when Slumdog was nominated for 10 Academy Awards -- and Rahman in two categories, Original Score and Original Song. Thrilled, and a little surprised: while I like Rahman's work in Slumdog, I don't think it's his best. There is of course nothing wrong with that, as long as Rahman's work is better than that of his competitors this year.
But it got me thinking: if Rahman had composed the same music for an obscure film this year, rather than for Slumdog Millionaire, would he have been nominated? And even if he had been nominated, what would his chances of winning have been? In other words, is there a Matthew Effect in Oscar nominations -- to them that have, more shall be given? And, once nominated, is there a halo surrounding movies with many nominations that improves the odds of winning across many award categories? I thought it might be fun to run the numbers on past years' nominees and winners to see if I could find answers to these questions; it turned out to be somewhat instructive as well, since it required an extension of the standard Market Basket analysis from the world of data mining.
To get the data, I went straight to the source: the official Academy Awards database, which lists all the nominations and winners for the past 80 years. Unfortunately there is no single page that lists all this information, but it was fairly straightforward to write Python scripts that queried the website a few times and collated the data in tabular form. The result: a table that lists every nomination and winner in every category between 1927 and 2007. There were 8616 nominations in the period, representing 4215 distinct movies, so each movie received about 2 nominations on average.
Let's start with the nominations, to see if there is any evidence of the Matthew Effect. Let's say N(k) is the number of movies with exactly k nominations. The table below shows k and N(k) for k between 1 and 10. If we ignore two outliers (k=1 and k=7), it appears that N(k+1)/N(k) is close to 0.6 for k between 2 and 10; the decay is certainly much slower than exponential. This indicates that the number of nominations roughly follows a power law; and a power law is the classic embodiment of the Matthew Effect, arising in contexts such as income and wealth distribution.
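For readers who want to reproduce this step, here is a minimal sketch of how N(k) and the successive ratios N(k+1)/N(k) could be computed. The function names and the toy input are my own illustration, not the scripts used for the post, and the toy counts are not the real Oscar numbers.

```python
from collections import Counter

def nomination_histogram(nominations_per_movie):
    """Return {k: N(k)}, the number of movies with exactly k nominations.

    nominations_per_movie maps a movie identifier to its nomination count.
    """
    return dict(Counter(nominations_per_movie.values()))

def successive_ratios(hist, k_min, k_max):
    """Return {k: N(k+1)/N(k)} for k_min <= k < k_max, skipping missing bins."""
    ratios = {}
    for k in range(k_min, k_max):
        if hist.get(k, 0) > 0 and (k + 1) in hist:
            ratios[k] = hist[k + 1] / hist[k]
    return ratios

# Toy data for illustration only (NOT the real Oscar counts).
toy_counts = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 2, "F": 2, "G": 3}
hist = nomination_histogram(toy_counts)
ratios = successive_ratios(hist, 1, 3)
print(hist)
print(ratios)
```

With the real table, the same two calls would give the N(k) column and the ratios discussed above.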
The next step is to ask whether there are Oscar categories for which the effect is much stronger than for others. To study this, we divide the nominated movies into two groups: movies with 4 or fewer nominations (the "poor" group) and movies with 5 or more nominations (the "rich" group). Overall, 5382 nominations, or 62.5%, went to movies in the poor group and 3234 nominations, or 37.5%, went to movies in the rich group. Now, let's look at the major Oscar categories. The major outliers are Best Picture and Best Director: in both, nominations went overwhelmingly to movies in the rich group (70% and 73%, respectively, compared to the average of 37.5%). This is not surprising, because the best picture is typically one that is strong in many disciplines. There is some bias in the acting categories as well, but the big surprise is Film Editing: 68% of the nominations in this category went to "rich" movies. At the other extreme are Music and Special Effects: approximately 70% of the nominations went to movies in the "poor" group. So it appears that in these categories at least, talent gets its due without help from Matthew.
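The per-category split can be sketched as follows. This is an illustrative outline rather than the original script: it assumes the data is available as one (movie, category) pair per nomination, with the movie identifier unique across years (for example a title plus year), and the toy rows are invented.

```python
from collections import Counter, defaultdict

RICH_THRESHOLD = 5  # "rich" = 5 or more nominations, as defined in the text

def rich_share_by_category(nominations):
    """Return {category: fraction of its nominations that went to rich movies}.

    nominations is a list of (movie, category) pairs, one per nomination.
    """
    per_movie = Counter(movie for movie, _ in nominations)
    rich = defaultdict(int)
    total = defaultdict(int)
    for movie, category in nominations:
        total[category] += 1
        if per_movie[movie] >= RICH_THRESHOLD:
            rich[category] += 1
    return {category: rich[category] / total[category] for category in total}

# Toy data for illustration only: "Big" has 5 nominations, the others fewer.
toy = [
    ("Big", "Picture"), ("Big", "Director"), ("Big", "Editing"),
    ("Big", "Music"), ("Big", "Actor"),
    ("Small", "Music"),
    ("Tiny", "Picture"), ("Tiny", "Music"),
]
shares = rich_share_by_category(toy)
print(shares)
```

Comparing each category's share against the overall rich share (37.5% in the real data) shows which categories lean toward heavily nominated movies.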
Moving from nominations to actual winners, the obvious question is: does being nominated in many categories boost the chances of winning in a disproportionate manner? To study this, I used the Market Baskets approach from data mining. In a classic Market Baskets scenario, we ask which items, such as milk and eggs, are often purchased together. In this case, we model each movie as a basket: the contents of a movie's basket are its nominations and wins. Do movies with many nominations in their baskets have a disproportionate number of wins?
We must first deal with a technicality. In a normal market basket scenario, the contents of each basket are independent of every other basket, but in this case there are dependencies. Consider the set of market baskets of the movies that have all been nominated in a single award category in a particular year; clearly, one of these has to be the winner in that category, and so the basket of that movie will also contain a win in that category.
It's easy to extend the Market Baskets model to capture this idea. I'll call the new model Constrained Market Baskets. Consider a subset S of market baskets; say, the set of market baskets corresponding to the "rich" movies with 5 or more nominations. Suppose movie M is in this set, and has been nominated in award category C. If there are (say) a total of 5 nominees in this category, then the prior probability of movie M's basket containing a win is 1/5 or 0.2. We can repeat this for all the categories M is nominated in, and add up the priors; this gives the "prior expected value" of the number of wins in M's basket. We add up the expected wins for all the movies in set S to get the total number of wins we expect the set S of movies to have; call this EW. Now, if OW is the actual number of "Observed Wins" across the movies in set S, we want to see if there is a discrepancy between EW and OW. In particular, we define the "win boost" of set S to be OW/EW. If the win boost is higher than 1, then the set S of market baskets has a disproportionate number of wins, and if it's much less than 1, then it has fewer wins than expected.
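The definition above translates into a few lines of code. The sketch below is one possible rendering under stated assumptions, not the original implementation: each nomination is a (year, category, movie, won) row, every nominee in a category has the same prior (1 divided by the number of nominees that year), and the function name and toy rows are hypothetical.

```python
from collections import Counter

def win_boost(nominations, in_subset):
    """Return (EW, OW, OW/EW) for the movies selected by in_subset.

    nominations: list of (year, category, movie, won) rows, one per nomination.
    in_subset:   function mapping a movie to True if it belongs to set S.
    """
    # Number of nominees competing in each (year, category) race.
    field_size = Counter((year, cat) for year, cat, _, _ in nominations)
    expected = 0.0
    observed = 0
    for year, cat, movie, won in nominations:
        if in_subset(movie):
            expected += 1.0 / field_size[(year, cat)]  # prior probability of a win
            observed += 1 if won else 0
    boost = observed / expected if expected > 0 else float("nan")
    return expected, observed, boost

# Toy data for illustration only: movie "A" wins both races it is entered in.
toy = [
    (2000, "X", "A", True), (2000, "X", "B", False),
    (2000, "Y", "A", True), (2000, "Y", "B", False),
    (2000, "Y", "C", False), (2000, "Y", "D", False),
]
ew, ow, boost = win_boost(toy, lambda movie: movie == "A")
print(ew, ow, boost)
```

Here movie A has priors of 1/2 and 1/4, so EW is 0.75 against 2 observed wins. For the rich/poor analysis, in_subset would test whether a movie's total nomination count is at least 5.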
When we do the analysis, the set of "poor" movies, with 4 or fewer nominations, has a total of 5382 nominations, with 1143 "expected wins" but only 840 "observed wins": a win boost of 0.73. The "rich" movies, by contrast, with 3234 nominations, were expected to win 657 Oscars but actually won 958, a win boost of 1.46. In other words: the rich movies, which represent only 37.5% of all nominations, won more than half of all the Oscars awarded! Matthew!
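As a quick sanity check, the arithmetic behind these figures can be redone directly from the totals quoted in this paragraph (the variable names are mine):

```python
# Totals quoted in the text for the two groups.
poor = {"nominations": 5382, "expected": 1143, "observed": 840}
rich = {"nominations": 3234, "expected": 657, "observed": 958}

poor_boost = poor["observed"] / poor["expected"]
rich_boost = rich["observed"] / rich["expected"]

total_nominations = poor["nominations"] + rich["nominations"]
total_wins = poor["observed"] + rich["observed"]
rich_nomination_share = rich["nominations"] / total_nominations
rich_win_share = rich["observed"] / total_wins

print(f"poor win boost: {poor_boost:.2f}")
print(f"rich win boost: {rich_boost:.2f}")
print(f"rich share of nominations: {rich_nomination_share:.1%}")
print(f"rich share of wins: {rich_win_share:.1%}")
```

The rich group's share of wins comes out at a little over 53%, which is the "more than half" in the text.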
Once again, we can break up the results by category, and look at the win boosts for specific award categories. For most major award categories, the win boosts for the rich and poor groups are in line with the overall average boosts. As in the case of nominations, the effect is very significant in the Best Picture and Best Director categories: in these categories, the "poor" movies have a win boost of just 0.30! We noted that the Music category seemed resilient to Matthew in the case of nominations; but in the case of wins, this category has a win boost of 1.7 for the rich movies, even higher than the overall average of 1.46. The surprising and significant outlier in this case is the Best Supporting Actor category, with win boosts very close to 1.0 for both the rich and the poor movies. It appears that the Best Supporting Actor award shows no evidence of Matthew; the other acting categories, however, are in line with the overall averages.
I don't have a deep enough understanding of the movie industry and the Academy Awards process to speculate on the reasons for these effects. Perhaps great talent attracts other great talent, and the Awards reflect that reality. And perhaps the difference between the behavior of wins and of nominations has to do with the fact that the former uses simple plurality voting while the latter uses a preferential voting scheme. In any case, I'm happy on two counts. The statistics on the Music category say that the Matthew effect likely did not help Mr Rahman in securing his nominations; but now that he has been nominated, his chances of winning are greatly boosted because he is associated with Slumdog's 10 nominations. Jai Ho!
Update: A big night for Slumdog, winning 8 awards, including both the music and song awards for A. R. Rahman. While 8 awards is not the best Oscar performance ever, it is the largest number of awards won by a movie with 10 nominations (the ones that won more awards had more nominations). Matthew must be pleased.