I recently started using Twitter and have become a big fan of the service. I've been appalled by the downtime the service has endured, but sympathetic, because I assumed usage was growing so fast that much might be excused. Then I read this TechCrunch post on Twitter's usage numbers, and sympathy turned to bafflement, because I'm intimately familiar with SMS GupShup, a startup in India that boasts usage numbers much, much higher than Twitter's but has scaled without a glitch.
I'll let the numbers speak for themselves:
- Users: Twitter (1+ million); SMS GupShup (7 million)
- Messages per day: Twitter (3 million); SMS GupShup (10+ million)
Actually, these numbers don't even tell the whole story. India is a land of few PCs and many mobile phones. Thus, almost all GupShup messages are posted from mobile phones via SMS, and almost every GupShup message is delivered simultaneously to the website and, via SMS, to the mobile phones of followers. That's why they have "SMS" in the name of the service. Contrast this with Twitter, where the majority of the posting and reading is done through the web. Twitter has said in the past that sending messages via the SMS gateway is one of their most expensive operations, so the fact that only a small fraction of their users use the SMS option makes their task a lot easier than GupShup's.
So I sat down with Beerud Sheth, co-founder of Webaroo, the company behind GupShup (the other founder Rakesh Mathur is my co-founder from a prior company, Junglee). I wanted to understand why GupShup scaled without a hitch while Twitter is having fits. Beerud tells me that GupShup runs on commodity Linux hardware and uses MySQL, the same as Twitter. But the big difference is in the architecture: right from day 1, they started with a three-tier architecture, with JBoss app servers sitting between the webservers and the database.
GupShup also uses an object architecture (called the "objectpool") which allows each task to be componentized and run separately - this helps immensely with reliability (can automatically handle machine failure) and scalability (can scale dynamically to handle increased load). The objectpool model allows each module to be run as multiple parallel instances - each of them doing a part of the work. They can be run on different machines, can be started/stopped independently, without affecting each other. So the "receiver", the "sender", and the "ad server" all run as multiple instances. As traffic scales, they can just add more hardware -- no re-architecting. If one machine fails, the instance is restarted on a different machine.
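To make the objectpool idea concrete, here is a minimal Python sketch. It is not GupShup's code: the `ObjectPool` and `Instance` names, the machine labels, and the least-loaded placement rule are all assumptions made for illustration. It only models the bookkeeping described above: modules run as several instances spread over machines, and when a machine fails its instances are restarted elsewhere.

```python
class Instance:
    """One running copy of a module (hypothetical model, not GupShup's)."""

    def __init__(self, module, machine):
        self.module = module
        self.machine = machine


class ObjectPool:
    """Tracks which module instances run on which machines."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.instances = []

    def _least_loaded(self):
        # Assumed placement rule: pick the machine with the fewest instances.
        if not self.machines:
            raise RuntimeError("no machines available")
        return min(
            self.machines,
            key=lambda m: sum(1 for i in self.instances if i.machine == m),
        )

    def start(self, module, count=1):
        # Run `count` parallel instances of the same module.
        for _ in range(count):
            self.instances.append(Instance(module, self._least_loaded()))

    def add_machine(self, machine):
        # Scaling out is just adding hardware; existing instances are untouched.
        self.machines.append(machine)

    def fail_machine(self, machine):
        # Restart every instance from the failed machine on a surviving one.
        self.machines.remove(machine)
        orphans = [i for i in self.instances if i.machine == machine]
        self.instances = [i for i in self.instances if i.machine != machine]
        for inst in orphans:
            inst.machine = self._least_loaded()
            self.instances.append(inst)


pool = ObjectPool(["m1", "m2"])
pool.start("receiver", 2)
pool.start("sender", 2)
pool.fail_machine("m1")
print(sorted((i.module, i.machine) for i in pool.instances))
```

A real system would also need health checks to detect the failure and some way to hand in-flight work to the restarted instance; the sketch leaves both out.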
In read/write applications, the database is often the bottleneck. To avoid this problem, the GupShup database is sharded: the tables are broken into parts, with, for example, users A-F in one instance, G-K in another, and so on. The shards are periodically rebalanced as the database grows. The JBoss middle tier contains the logic that hides this detail from the webserver tier.
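The routing logic a middle tier might hide can be sketched in a few lines of Python. This is an illustration under assumptions, not GupShup's implementation (which lives in JBoss): the `ShardRouter` class, the shard names, and the choice to key on the first letter of the username are hypothetical, taken only from the A-F / G-K example above.

```python
import bisect


class ShardRouter:
    """Maps a username to a database shard by its first letter."""

    def __init__(self, upper_bounds, shards):
        self.rebalance(upper_bounds, shards)

    def rebalance(self, upper_bounds, shards):
        # Swap in a new range map. Moving the rows between databases is a
        # separate job that this sketch does not model.
        if not shards or len(upper_bounds) != len(shards):
            raise ValueError("need one inclusive upper bound per shard")
        if list(upper_bounds) != sorted(upper_bounds):
            raise ValueError("upper bounds must be sorted")
        self.upper_bounds = list(upper_bounds)
        self.shards = list(shards)

    def shard_for(self, username):
        if not username:
            raise ValueError("empty username")
        first = username[0].upper()
        # Index of the first range whose inclusive upper bound is >= first.
        idx = bisect.bisect_left(self.upper_bounds, first)
        # Anything past the last bound falls into the last shard.
        return self.shards[min(idx, len(self.shards) - 1)]


# Hypothetical layout: A-F, G-K, L-Z.
router = ShardRouter(["F", "K", "Z"], ["db0", "db1", "db2"])
print(router.shard_for("alice"))  # db0
print(router.shard_for("gita"))   # db1
print(router.shard_for("zoe"))    # db2
```

The webserver tier would only ever ask the middle tier for a user's data; because the range map lives in one place, a rebalance changes the map without touching any web-tier code.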
I'm not familiar with the details of Twitter's architecture, beyond knowing they use Ruby on Rails with MySQL. It appears that the biggest difference between Twitter and GupShup is 3-tier versus 2-tier. RoR is fantastic for turning out applications quickly, but the way Rails works, the out-of-the-box approach leads to a two-tier architecture (webserver talking directly to database). We all learned back in the 1990s that this is an unscalable model, yet it is the model for most Rails applications. No amount of caching can help a 2-tier read/write application scale. The middle tier enables the database to be sharded, and that's what gets you the scalability. I believe Twitter has recently started using message queues as a middle tier to accomplish the same thing, but they haven't partitioned the database yet -- which is the key step here.
I don't intend this as a knock on RoR, rather on the way it is used by default. At my company Kosmix we use an RoR frontend for a website that serves millions of page views every day; we use a 3-tier model where the bulk of the application logic resides in a middle-tier coded in C++. Three-tier is the way to go to build scalable web applications, regardless of the programming language(s) you use.
Update: VentureBeat has a follow-up guest post by me, with some more details on SMS GupShup. Also my theory on why SMS GupShup is growing faster than Twitter: Microblogging is a nice-to-have in places with high PC penetration, like the US, but a must-have in places with very low PC penetration, like India.
Disclosure: My fund Cambrian Ventures is an investor in Webaroo, the company behind SMS GupShup. But these are my opinions as a database geek, not as an investor.