I recently started using Twitter and have become a big fan of the
service. I've been appalled by the downtime the service has endured,
but sympathetic because I assumed usage was growing so fast that
much might be excused. Then I read this TechCrunch post on the Twitter
usage numbers and sympathy turned to bafflement, because I'm
intimately familiar with SMS GupShup, a startup in India that boasts
usage numbers much, much higher than Twitter's, but has
scaled without a glitch.
I'll let the numbers speak for themselves:
- Users: Twitter (1+ million); SMS GupShup (7 million)
- Messages per day: Twitter (3 million); SMS GupShup (10+ million)
Actually, these numbers don't even tell the whole story. India is a
land of few PCs and many mobile phones. Thus, almost all GupShup
messages are posted via mobile phones using SMS. And almost every
GupShup message is posted simultaneously to the website and to the
mobile phones of followers via SMS. That's why they have the SMS in the
name of the service. Contrast this with Twitter, where the majority of the
posting and reading is done through the web. Twitter has said in the
past that sending messages via the SMS gateway is one of their most
expensive operations, so the fact that only a small fraction of their
users use the SMS option makes their task a lot easier than GupShup's.
So I sat down with Beerud Sheth, co-founder of Webaroo, the company
behind GupShup (the other founder, Rakesh Mathur, is my co-founder from a
prior company, Junglee). I wanted to understand why GupShup scaled
without a hitch while Twitter is having fits. Beerud tells me that
GupShup runs on commodity Linux hardware and uses MySQL, the same as
Twitter. But the big difference is in the architecture: right from day
1, they started with a three-tier architecture, with JBoss app servers
sitting between the webservers and the database.
GupShup also uses an object architecture (called the "objectpool")
which allows each task to be componentized and run separately - this
helps immensely with reliability (can automatically handle machine
failure) and scalability (can scale dynamically to handle increased
load). The objectpool model allows each module to be run as multiple
parallel instances - each of them doing a part of the work. They can be
run on different machines, can be started/stopped independently,
without affecting each other. So the "receiver", the "sender", and the
"ad server" all run as multiple instances. As traffic scales, they can
just add more hardware -- no re-architecting. If one machine fails, the
instance is restarted on a different machine.
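To make the idea concrete, here is a minimal sketch in Python of the bookkeeping such a pool needs. This is not GupShup's code: the ObjectPool class, its method names, and the "least loaded machine" placement rule are all my own assumptions for illustration. It only tracks which machine hosts each instance, and shows the two behaviors described above: adding hardware without touching running instances, and restarting a failed machine's instances elsewhere.

```python
from collections import defaultdict


class ObjectPool:
    """Toy scheduler (hypothetical): tracks which machine hosts each instance."""

    def __init__(self, machines):
        self.machines = set(machines)  # machines currently alive
        self.placement = {}            # instance id -> machine name

    def _least_loaded(self):
        # Assumed placement rule: pick the live machine hosting the fewest
        # instances, breaking ties by name. Raises ValueError if none are alive.
        load = defaultdict(int)
        for machine in self.placement.values():
            load[machine] += 1
        return min(sorted(self.machines), key=lambda m: load[m])

    def start(self, module, count):
        # Run `count` parallel instances of one module, spread across machines.
        for i in range(count):
            self.placement[f"{module}-{i}"] = self._least_loaded()

    def add_machine(self, machine):
        # Scaling up is just more hardware; existing instances are untouched.
        self.machines.add(machine)

    def fail_machine(self, machine):
        # Restart every instance from the dead machine on a surviving one.
        self.machines.discard(machine)
        orphans = [i for i, m in self.placement.items() if m == machine]
        for instance in orphans:
            del self.placement[instance]
        for instance in orphans:
            self.placement[instance] = self._least_loaded()
        return orphans
```

With two machines and two instances each of a "receiver" and a "sender", calling fail_machine on one machine moves its instances to the survivor and leaves the others where they were. A real system would also need failure detection and state handoff, which this sketch leaves out.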
In read/write applications, the database is often the bottleneck. To
avoid this problem, the GupShup database is sharded.
That is, the tables are broken into parts: for example, users A-F in one
instance, G-K in another, and so on. The shards are periodically rebalanced as
the database grows. The JBoss middle-tier contains the logic that hides
this detail from the webserver tier.
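Here is a rough sketch in Python of how a middle tier can hide range-based sharding from the tier above it. Again, this is an illustration under my own assumptions, not GupShup's implementation: the ShardRouter and UserService classes, the shard names, and the third range starting at "L" are hypothetical, and in-memory dicts stand in for separate MySQL instances.

```python
import bisect


class ShardRouter:
    """Middle-tier helper (hypothetical) mapping a username to its shard."""

    def __init__(self, boundaries):
        # boundaries: sorted list of (first_letter, shard_name); each shard
        # covers usernames from its letter up to the next shard's letter.
        self.starts = [start for start, _ in boundaries]
        self.shards = [shard for _, shard in boundaries]

    def shard_for(self, username):
        key = username[:1].upper()
        # bisect_right finds the last range whose starting letter is <= key.
        index = bisect.bisect_right(self.starts, key) - 1
        # Keys sorting before the first range fall back to the first shard.
        return self.shards[max(index, 0)]


class UserService:
    """What the webserver tier calls; it never sees shard names."""

    def __init__(self, router):
        self.router = router
        # In-memory dicts stand in for separate database instances.
        self.storage = {shard: {} for shard in router.shards}

    def save(self, username, profile):
        self.storage[self.router.shard_for(username)][username] = profile

    def load(self, username):
        return self.storage[self.router.shard_for(username)].get(username)


# Example: users A-F on one shard, G-K on another, the rest on a third.
router = ShardRouter([("A", "db1"), ("G", "db2"), ("L", "db3")])
service = UserService(router)
service.save("gita", {"followers": 3})
```

The webserver tier only ever calls save and load. Rebalancing would mean changing the boundaries and moving the affected rows, which the sketch does not attempt, but the callers above the middle tier would not need to change.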
I'm not familiar with the details of Twitter's architecture, beyond
knowing they use Ruby on Rails with MySQL. It appears that the biggest
difference between Twitter and GupShup is 3-tier versus 2-tier. RoR is
fantastic for turning out applications quickly, but the way Rails
works, the out-of-the-box approach leads to a two-tier architecture
(webserver talking directly to database). We all learned back in the
'90s that this is an unscalable model, yet it is the model for most
Rails applications. No amount of caching can help a 2-tier read/write
application scale. The middle-tier enables the database to be sharded,
and that's what gets you the scalability. I believe Twitter has
recently started using message queues as a middle-tier to accomplish
the same thing, but they haven't partitioned the database yet -- which
is the key step here.
I don't intend this as a knock on RoR, rather on the way it is used by
default. At my company Kosmix we use an RoR frontend for a website that
serves millions of page views every day; we use a 3-tier model where
the bulk of the application logic resides in a middle-tier coded in
C++. Three-tier is the way to go to build scalable web applications,
regardless of the programming language(s) you use.
Update: VentureBeat has a follow-up guest post by me, with some more details on SMS GupShup and my theory on why SMS GupShup is growing faster than Twitter: microblogging is a nice-to-have in places with high PC penetration, like the US, but a must-have in places with very low PC penetration, like India.
Disclosure: My fund Cambrian Ventures is an investor in Webaroo, the
company behind SMS GupShup. But these are my opinions as a database
geek, not as an investor.