I recently started using Twitter and have become a big fan of the service. I've been appalled by the downtime it has endured, but sympathetic, because I assumed usage was growing so fast that much might be excused. Then I read this TechCrunch post on Twitter's usage numbers, and sympathy turned to bafflement, because I'm intimately familiar with SMS GupShup, a startup in India that boasts usage numbers much, much higher than Twitter's but has scaled without a glitch.
I'll let the numbers speak for themselves:
- Users: Twitter (1+ million); SMS GupShup (7 million)
- Messages per day: Twitter (3 million); SMS GupShup (10+ million)
Actually, these numbers don't even tell the whole story. India is a land of few PCs and many mobile phones, so almost all GupShup messages are posted from mobile phones using SMS. And almost every GupShup message is delivered simultaneously to the website and, via SMS, to the mobile phones of followers; that's why SMS is in the service's name. Contrast this with Twitter, where the majority of posting and reading is done through the web. Twitter has said in the past that sending messages via the SMS gateway is one of its most expensive operations, so the fact that only a small fraction of its users use the SMS option makes its task a lot easier than GupShup's.
So I sat down with Beerud Sheth, co-founder of Webaroo, the company behind GupShup (the other co-founder, Rakesh Mathur, was my co-founder at a prior company, Junglee). I wanted to understand why GupShup scaled without a hitch while Twitter is having fits. Beerud tells me that GupShup runs on commodity Linux hardware and uses MySQL, the same as Twitter. The big difference is in the architecture: right from day one, they started with a three-tier architecture, with JBoss app servers sitting between the webservers and the database.
GupShup also uses an object architecture (called the "objectpool") that allows each task to be componentized and run separately. This helps immensely with reliability (machine failures are handled automatically) and scalability (capacity can be added dynamically to handle increased load). The objectpool model allows each module to run as multiple parallel instances, each doing part of the work. Instances can run on different machines and can be started or stopped independently without affecting each other. So the "receiver", the "sender", and the "ad server" all run as multiple instances. As traffic grows, they can just add more hardware; no re-architecting is needed. If one machine fails, its instances are restarted on a different machine.
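A minimal sketch of the placement and restart behaviour described above, written in Python. It is not GupShup's objectpool code: the class ObjectPool, its methods, and the least-loaded placement rule are hypothetical, and it only tracks which machine each module instance runs on rather than actually running anything.

```python
from collections import defaultdict


class ObjectPool:
    """Hypothetical sketch: records which machine runs each module instance."""

    def __init__(self, machines):
        self.machines = list(machines)   # machines currently alive
        self.placement = {}              # (module, instance_no) -> machine

    def _least_loaded(self):
        # Count instances per machine and pick the live machine with the fewest
        # (ties go to the earliest machine in the list).
        load = defaultdict(int)
        for host in self.placement.values():
            load[host] += 1
        return min(self.machines, key=lambda m: load[m])

    def start(self, module, count):
        """Start `count` more parallel instances of a module."""
        existing = sum(1 for (mod, _) in self.placement if mod == module)
        for i in range(existing, existing + count):
            self.placement[(module, i)] = self._least_loaded()

    def add_machine(self, machine):
        """Scaling out: new hardware just joins the pool, no re-architecting."""
        self.machines.append(machine)

    def fail(self, machine):
        """Restart every instance from a failed machine on a surviving machine."""
        self.machines.remove(machine)
        for key, host in list(self.placement.items()):
            if host == machine:
                self.placement[key] = self._least_loaded()
```

For example, starting two "receiver" and two "sender" instances on machines m1 and m2 spreads them across both; calling fail("m1") moves m1's instances to m2, and a machine added later picks up the next instance started.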
In read/write applications, the database is often the bottleneck. To avoid this problem, the GupShup database is sharded: the tables are broken into parts, for example users A-F in one database instance, G-K in another, and so on. The shards are periodically rebalanced as the database grows. The JBoss middle tier contains the logic that hides this detail from the webserver tier.
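A rough sketch of how a middle tier can hide range-based sharding from the webserver tier, assuming shards are split on the first letter of the username as in the A-F / G-K example. Python dicts stand in for the MySQL instances, and ShardedUserStore with its methods is a hypothetical name, not GupShup's code. The webserver tier would only ever call get() and put(); which shard holds a user is decided here.

```python
import bisect


class ShardedUserStore:
    """Middle-tier sketch: routes each user to a shard by first letter."""

    def __init__(self, upper_bounds):
        # upper_bounds[i] is the last initial letter held by shard i, e.g.
        # ["F", "K", "Z"] means A-F -> shard 0, G-K -> shard 1, L-Z -> shard 2.
        self.upper_bounds = list(upper_bounds)
        self.shards = [dict() for _ in self.upper_bounds]  # stand-ins for MySQL instances

    def shard_for(self, username):
        # Assumes usernames start with a letter.
        return bisect.bisect_left(self.upper_bounds, username[0].upper())

    def put(self, username, row):
        self.shards[self.shard_for(username)][username] = row

    def get(self, username):
        return self.shards[self.shard_for(username)].get(username)

    def rebalance(self, new_upper_bounds):
        """Adopt a new range map and move every row to its new shard."""
        rows = [item for shard in self.shards for item in shard.items()]
        self.upper_bounds = list(new_upper_bounds)
        self.shards = [dict() for _ in self.upper_bounds]
        for username, row in rows:
            self.put(username, row)
```

Rebalancing from ["F", "K", "Z"] to ["C", "K", "Z"], for instance, moves a user like "dave" from shard 0 to shard 1 without any change to the callers.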
I'm not familiar with the details of Twitter's architecture, beyond knowing that they use Ruby on Rails with MySQL. It appears that the biggest difference between Twitter and GupShup is 3-tier versus 2-tier. RoR is fantastic for turning out applications quickly, but the way Rails works, the out-of-the-box approach leads to a two-tier architecture (webserver talking directly to database). We all learned back in the 1990s that this is an unscalable model, yet it is the model for most Rails applications. No amount of caching can help a 2-tier read/write application scale. The middle tier enables the database to be sharded, and that's what gets you the scalability. I believe Twitter has recently started using message queues as a middle tier to accomplish the same thing, but they haven't partitioned the database yet, which is the key step here.
I don't intend this as a knock on RoR, rather on the way it is used by default. At my company Kosmix we use an RoR frontend for a website that serves millions of page views every day; we use a 3-tier model where the bulk of the application logic resides in a middle-tier coded in C++. Three-tier is the way to go to build scalable web applications, regardless of the programming language(s) you use.
Update: VentureBeat has a follow-up guest post by me, with some more details on SMS GupShup. Also my theory on why SMS GupShup is growing faster than Twitter: Microblogging is a nice-to-have in places with high PC penetration, like the US, but a must-have in places with very low PC penetration, like India.
Disclosure: My fund Cambrian Ventures is an investor in Webaroo, the company behind SMS GupShup. But these are my opinions as a database geek, not as an investor.
My Firefox session saver brought me back to your post. Good to see the number of comments. It seems that Twitter bashing is a really hot topic these days. And to each his own two words of wisdom!
I still don't agree with your title, though. :-)
Posted by: AI | June 16, 2008 at 09:51 PM
Sam: good point! Unless we read through the comments, we don't realize that RoR does support 3-tier. I feel Anand has misled the readers.
Posted by: Prasanth | June 17, 2008 at 03:10 AM
Prasanth: did you just not read the part of the post where I say clearly that RoR can be used in a 3-tier fashion, and in fact this is how we use it at Kosmix? As I've said many times, this is not a knock on RoR. It's a knock on 2-tier architectures.
Posted by: Anand Rajaraman | June 17, 2008 at 07:55 AM
Impressed by their achievement.
Posted by: T-Enterprise | June 17, 2008 at 10:08 AM
How about you talk about and invest in something innovative, instead of competing Indian services that are copies?
Posted by: Sam | June 17, 2008 at 10:57 PM
Sam: When I invested 3 years ago in Webaroo, the company behind SMS GupShup, Twitter was not even around. These services have developed in parallel. You should also look at the other companies I'm involved with, in the US and in India, and judge for yourself.
Posted by: Anand Rajaraman | June 18, 2008 at 07:39 AM
VentureBeat has a follow-up guest post by me, with some more details on SMS GupShup. Also my theory on why SMS GupShup is growing faster than Twitter: Microblogging is a nice-to-have in places with high PC penetration, like the US, but a must-have in places with very low PC penetration, like India:
http://venturebeat.com/2008/06/19/think-twitters-the-biggest-microblogging-service-take-a-look-at-sms-gupshup/
Posted by: Anand Rajaraman | June 19, 2008 at 03:45 PM
Anand,
Great post!
Should models like Twitter and GupShup adopt a CEP CCL (Continuous Query on Event Stream) system in order to sniff-act-create the next event broadcast?
I met with Mark from Coral8 and was quite impressed in terms of their offering in this space.
Same goes with StreamBase.
This will dissolve the "insert-into-database-and-then-act" paradigm.
Nati Shalom muses on Twitter scalability:
http://natishalom.typepad.com/nati_shaloms_blog/2008/05/twitter-as-an-e.html
I would be interested in your thoughts on this.
Posted by: Chetan Conikee | June 19, 2008 at 10:56 PM
As AI said, I think you are comparing apples to oranges.
OK, I will propose an architecture for SMS GupShup; tell me where I am wrong.
Functionality: there are many SMS groups. People can post to a group, and all the other people who have subscribed to that group will receive the SMS.
I will use a MySQL database and PHP and nothing else.
In MySQL I have 2 tables. The 1st maintains the phone numbers subscribed to each SMS group, with fields id, SMS group id, phone number. The 2nd table has the fields id, message, SMS group to be sent to.
I write 2 programs in PHP and use a message queue to pass messages asynchronously between the two programs.
The 1st program has the following classes:
class number:
-- add a number: subscribes a number to an SMS group, which means 1 database insert into the 1st table.
-- delete a number: unsubscribes a number from an SMS group, which means 1 delete from the 1st table.
class messages:
-- post message: someone posts a new message to an SMS group. This involves 1 MySQL read to check that the person is authorized to post, 1 write to make an entry in the 2nd table, and 1 more write to the message queue to signal that a new message was posted to that particular SMS group.
That's it for the 1st program.
The other program is a daemon that continuously checks the message queue for new messages.
class sendmessages:
-- send sms: needs 1 read from the message queue to get the message and the group id, and then 1 more read from the database to get all the numbers subscribed to that group. Once it has the message and the phone numbers, all it has to do is loop over the numbers calling the SMS gateway API.
That's it: you have SMS GupShup. With so few database calls, I don't think you need more than one machine for the database.
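A minimal Python sketch of the two-program design described in this comment, assuming SQLite in place of MySQL, a queue.Queue in place of a real message queue, and a hypothetical send_sms() stub in place of the SMS gateway. The authorization rule (only subscribers may post) is also an assumption, since the comment doesn't specify one, and both "programs" run in one process here for brevity.

```python
import queue
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, group_id TEXT, phone TEXT);
    CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT, group_id TEXT);
""")
mq = queue.Queue()  # stands in for the message queue between the two programs
outbox = []         # records what the hypothetical SMS gateway was asked to send

# ---- Program 1: handles subscriptions and posts ----
def add_number(group_id, phone):
    db.execute("INSERT INTO subscriptions (group_id, phone) VALUES (?, ?)", (group_id, phone))

def delete_number(group_id, phone):
    db.execute("DELETE FROM subscriptions WHERE group_id = ? AND phone = ?", (group_id, phone))

def post_message(group_id, poster, text):
    # 1 read: may this person post? (assumed rule: must be a subscriber)
    allowed = db.execute("SELECT 1 FROM subscriptions WHERE group_id = ? AND phone = ?",
                         (group_id, poster)).fetchone()
    if not allowed:
        return False
    db.execute("INSERT INTO messages (message, group_id) VALUES (?, ?)", (text, group_id))
    mq.put((group_id, text))  # 1 queue write to wake the sender
    return True

# ---- Program 2: the sending daemon ----
def send_sms(phone, text):
    outbox.append((phone, text))  # placeholder for a real SMS gateway API call

def drain_queue():
    """One pass of the daemon loop: fan out every queued message."""
    while True:
        try:
            group_id, text = mq.get_nowait()  # 1 read from the queue
        except queue.Empty:
            return
        rows = db.execute("SELECT phone FROM subscriptions WHERE group_id = ?", (group_id,))
        for (phone,) in rows.fetchall():      # 1 database read, then loop
            send_sms(phone, text)
```

A real daemon would block on the queue in a loop rather than draining it once; the per-message work is the same.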
Waiting to hear from you about where I went wrong.
Oh, in the case of Twitter, it is more complex because it is a pull mechanism, not push. So when a user logs in, Twitter has to pull messages from all the users they follow. If you want me to explain in more detail, I am glad to do so.
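To illustrate the pull model mentioned here, a hypothetical sketch: at login, the timeline is assembled by merging the newest posts of everyone the user follows. The data structures and function name are made up for illustration and say nothing about Twitter's actual implementation. The trade-off is that read-time work grows with the number of people followed, whereas in a push design that fan-out happens once per post.

```python
import heapq
from itertools import islice

# Hypothetical in-memory data: each user's posts, newest first, as (timestamp, text).
posts = {
    "alice": [(105, "lunch"), (101, "hello")],
    "bob": [(104, "deploying"), (102, "coffee")],
    "carol": [(103, "reading")],
}
following = {"dave": ["alice", "bob", "carol"]}


def timeline_on_login(user, limit=20):
    """Pull model: at read time, merge the posts of everyone `user` follows."""
    streams = (posts.get(f, []) for f in following.get(user, []))
    # Each stream is already sorted newest-first, so a k-way merge keeps that order.
    merged = heapq.merge(*streams, key=lambda p: p[0], reverse=True)
    return list(islice(merged, limit))
```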
Cheers,
Deepu.
Posted by: Deepu | June 23, 2008 at 11:52 AM
Oh, also:
I ignored the timestamp column in all the tables, as it has become pretty much mandatory in everything.
So add one more program that runs as a weekly cron job, backs up all the old messages, and deletes them: 2 database calls.
So your database won't grow past a certain size.
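A sketch of that weekly cleanup job, assuming "backs up" means copying rows into an archive table (the comment doesn't say where backups go) and a created_at timestamp column. SQLite stands in for MySQL, and the table and function names are hypothetical. The two statements run in one transaction so a crash between them can't drop messages that were never archived.

```python
import sqlite3
import time

WEEK_SECONDS = 7 * 24 * 3600


def make_db():
    """In-memory stand-in for the messages table, plus an archive table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT,
                               group_id TEXT, created_at REAL);
        CREATE TABLE messages_archive (id INTEGER PRIMARY KEY, message TEXT,
                                       group_id TEXT, created_at REAL);
    """)
    return db


def archive_old_messages(db, now=None, max_age=WEEK_SECONDS):
    """Weekly cron job: copy old messages to the archive, then delete them."""
    cutoff = (time.time() if now is None else now) - max_age
    with db:  # one transaction: rows are never deleted without being archived
        db.execute("INSERT INTO messages_archive "
                   "SELECT * FROM messages WHERE created_at < ?", (cutoff,))
        deleted = db.execute("DELETE FROM messages WHERE created_at < ?",
                             (cutoff,)).rowcount
    return deleted
```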
Posted by: Deepu | June 23, 2008 at 12:30 PM
Hi Anand,
Interesting post, but as far as I understand, it does not give the right picture. The two products are entirely different. Twitter allows two-way communication (in various ways: web, IM, phone, and countless tools), whereas SMSGupShup is only one-way communication (and that too via only a few channels: web and phone).
In SMSGupShup, one can cache the subscribers of the top/regular posters and get very good performance because of the "monologue". Whereas on Twitter, there is always a "dialogue" going on, which results in a lot of processing both on and off the database.
Your other points make sense, but comparing the two services is not doing justice to Twitter.
The number of users might be higher for SMSGupShup, but I doubt there are more groups than there are Twitter users (since the number of groups equals the number of people actually posting content).
-regards,
Raja
Posted by: Raja Agrawal | June 23, 2008 at 03:18 PM
OMG, please stop this penis contest. My framework is always better than yours. Just admit it!
Posted by: jbossonrails | July 10, 2008 at 10:31 PM
GupShup's founders probably knew they'd get hammered hard and fast from the start, so they built a service they knew would cope with such a load.
It's possible that Twitter's growth was a little more unexpected, and that scaling their Rails application is more complicated than scaling the one GupShup uses.
Posted by: David Lloyd | July 13, 2008 at 11:38 PM
Tell your group members about my group. To join this group, type JOIN Shaaad sn 567678
>hindi sayari
>tricks
>JOKES
>one line
>dont miss
Posted by: bilal | July 31, 2008 at 01:17 AM
The Indian IT sector has grown to such an extent that almost anything can be done from anywhere. I love RoR and use it to run my website.
Posted by: magnum | September 26, 2008 at 07:28 AM