Erik Schonfeld over at TechCrunch has a very interesting post revealing that big enterprises are adopting Amazon's EC2/S3 services at a far faster pace than previously imagined:
A high-ranking Amazon executive told me there are 60,000 different customers across the various Amazon Web Services, and most of them are not the startups that are normally associated with on-demand computing. Rather the biggest customers in both number and amount of computing resources consumed are divisions of banks, pharmaceuticals companies and other large corporations who try AWS once for a temporary project, and then get hooked.
This is epochal stuff -- banks and pharma are notorious late adopters of early-stage technology, so to see them in the vanguard of cloud computing (or perhaps I should say utility computing, but everyone says cloud) is astonishing. But it illustrates a very important detail that's been overlooked: that there are significant network effects to the cloud computing business.
There are two basic underlying forces behind the network effects:
- Code that works with large amounts of data needs to be close to the data (in the network topology sense).
- Any processing that consumes data generates data.
So, once one enterprising group within a company decides to place some data in S3 and the code to process it on EC2, it becomes a whole lot easier for anyone else in the company who needs to run other code on that data to move their processing to EC2 as well. And since all this processing generates even more data, a virtuous cycle builds up. The stable state is for all of a company's data processing tasks to move into the same utility computing cloud, taking advantage of co-location to minimize data transfer latency and costs.
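To make the co-location argument concrete, here is a toy placement model in Python. It is a sketch under stated assumptions, not a description of how any real company or AWS actually schedules work: there are just two locations, moving data costs a hypothetical flat `TRANSFER_COST_PER_GB` (not real AWS pricing), each job runs wherever moving its inputs is cheapest (ties stay on-premises), and the job's output is stored where it ran. The names `Job`, `placement_cost` and `simulate` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical flat price for moving one GB between locations (illustrative only).
TRANSFER_COST_PER_GB = 0.10


@dataclass
class Job:
    name: str
    inputs: list[str]   # names of the datasets this job reads
    output_gb: float    # size of the dataset this job produces


def placement_cost(job: Job, location: str,
                   datasets: dict[str, tuple[str, float]]) -> float:
    """Cost of running `job` at `location`: pay to move every input stored elsewhere."""
    return sum(
        size * TRANSFER_COST_PER_GB
        for name, (loc, size) in datasets.items()
        if name in job.inputs and loc != location
    )


def simulate(jobs: list[Job], datasets: dict[str, tuple[str, float]],
             locations: tuple[str, ...] = ("onprem", "cloud")) -> dict[str, str]:
    """Place each job at its cheapest location; its output is stored there too.

    min() keeps the first location on a tie, so ties stay on-premises.
    """
    datasets = dict(datasets)  # don't mutate the caller's inventory
    placements = {}
    for job in jobs:
        best = min(locations, key=lambda loc: placement_cost(job, loc, datasets))
        placements[job.name] = best
        # The job's output now lives where the job ran, attracting later jobs.
        datasets[f"{job.name}_out"] = (best, job.output_gb)
    return placements


if __name__ == "__main__":
    # One group puts a large dataset in the cloud; CRM data stays on-premises.
    inventory = {"clickstream": ("cloud", 500.0), "crm": ("onprem", 20.0)}
    jobs = [
        Job("A", ["clickstream"], 200.0),
        Job("B", ["crm", "A_out"], 50.0),
        Job("C", ["crm"], 5.0),
        Job("D", ["B_out", "crm"], 1.0),
    ]
    print(simulate(jobs, inventory))
    # prints {'A': 'cloud', 'B': 'cloud', 'C': 'onprem', 'D': 'cloud'}
```

In this run, once job A lands next to the clickstream data, job B follows it into the cloud, and D follows B, even though both also read on-premises CRM data; job C, which reads only CRM data, stays put. Outputs accumulate where the processing happens, which makes the next job more likely to run there too.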
The network effect extends across companies as well. Often data created by company A is consumed by company B. When this "data interface" is voluminous, it makes economic sense for company B to move into the same utility cloud as company A. There are some ecosystems where utility computing players are already exploiting this trend; for example, AppNexus is creating a utility cloud optimized for the use of ad networks and their associated ecosystem: analytics for publishers and advertisers. There is so much data being shared here (on ad campaigns and their performance) that there is significant advantage to being in the same cloud.
The network effects argument leads to the interesting possibility that cloud computing becomes a winner-take-all game, like auctions: we might end up with one winner (maybe Amazon?). A more likely outcome is a couple of big general-purpose clouds (Amazon and Google, perhaps?) and a few niche clouds optimized for different ecosystems (such as ad networks and social networks).
You guys have written some great articles around data. Are there any books you can recommend on the subject?
Thanks,
Jeff
Posted by: Jeff | April 23, 2008 at 09:53 AM
@Jeff:
Would love to help you, but "data" is a very high-level term. Could you be more specific: databases, data mining, analytics, machine learning, etc? Or did you mean to suggest that I periodically write posts reviewing books on data-analysis related topics?
Posted by: Anand Rajaraman | April 23, 2008 at 01:37 PM
Yes, there are network effects, but I'm not sure they are strong enough, as they are in auctions or social networks, to make this a winner-takes-all game.
(1) Companies frequently need to move data between different applications and end up doing it in a whole bunch of ways (ETL tools, for example). I think in many cases the network bandwidth is not the bottleneck; usually the data transformation and load process itself is. So it may be feasible for two different apps to sit in two different data centers and consume each other's data.
(2) There are a large number of different data centers/hosting services out there today. If the network effects were so strong, I'd have expected there to be far fewer of those.
Posted by: Srinivas | April 25, 2008 at 06:49 PM
Sorry about the lack of specifics, Anand; that was like me asking you to write about the ocean. Maybe a better way to ask is: how does the class you teach unfold, or what subjects do you cover in it? Is there a logical flow from databases to data mining, and so on?
Thanks,
Jeff
Posted by: Jeff | April 28, 2008 at 09:31 AM