Datawocky

On Teasing Patterns from Data, with Applications to Search, Social Media, and Advertising

Datawocky

My favorite poem of all time is Lewis Carroll's "Jabberwocky." Here's how it starts:

'Twas brillig, and the slithy toves
  Did gyre and gimble in the wabe:
All mimsy were the borogoves,
  And the mome raths outgrabe.

Here's the whole poem.

"Jabberwocky" is a unique poem because it uses many nonsense words, such as "brillig" and "slithy", that you won't find in any English dictionary. Yet if you read the poem and let yourself sink into it, you can get a pretty good sense of its meaning; sometimes you can even work out what a nonsense word means!

Likewise, it's often difficult to make sense of data one piece at a time. But if you have a large amount of data, you can often unearth surprising and useful patterns. And the more data you start with, the more likely you are to find something useful.

Thanks to the internet, the amount of data available for analysis is growing rapidly. The more popular websites accumulate terabytes of web server and activity logs every day. Out of these logs have emerged some interesting techniques, such as collaborative filtering, which Amazon.com uses to recommend products by comparing your activity with that of other users. The web itself is a big source of data, giving rise to techniques such as PageRank, which gauges a page's importance from the links pointing to it. And social media, including social networks and collaborative content efforts such as Wikipedia, provide more grist for the data mill.
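To make the two ideas above a little more concrete, here is a toy sketch, assuming tiny in-memory data: user-based collaborative filtering that scores items by the cosine similarity between users' sets of items, and PageRank computed by power iteration over a small link graph. The function names, the damping factor of 0.85, the iteration count and the sample data are all illustrative assumptions. This is a generic textbook version of each technique, not Amazon's or Google's production method, which must work on terabytes of logs rather than a Python dict.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two users' item sets (binary activity)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(activity, user, k=3):
    """User-based collaborative filtering (illustrative sketch).

    Score every item that other users touched but `user` has not,
    weighting each item by how similar that other user is to `user`.
    """
    mine = activity[user]
    scores = defaultdict(float)
    for other, items in activity.items():
        if other == user:
            continue
        sim = cosine(mine, items)
        if sim == 0:
            continue  # users with no overlap contribute nothing
        for item in items - mine:
            scores[item] += sim
    # Highest score first; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


def pagerank(links, damping=0.85, iterations=50):
    """PageRank by power iteration over a dict: page -> list of outlinks."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share of (1 - damping) / n.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical sample data: which books each user has bought.
activity = {
    "alice": {"dune", "neuromancer", "foundation"},
    "bob": {"dune", "foundation", "hyperion"},
    "carol": {"neuromancer", "snow crash"},
    "dave": {"cookbook"},
}
print(recommend(activity, "alice"))  # ['hyperion', 'snow crash']
```

Bob shares two of Alice's three books, so his "hyperion" outranks Carol's "snow crash", and Dave, who shares nothing with Alice, contributes nothing. The same spirit drives PageRank: a page linked from many important pages ends up with a high score, and the scores always sum to 1.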

There's a whole specialty of Computer Science devoted to analyzing large amounts of data: it's called Data Mining. I co-teach a class on Data Mining in the Computer Science Department at Stanford. Data Mining is closely related to Statistics and borrows ideas from it, but what sets the two apart is Data Mining's emphasis on large data: terabytes of it.

In this blog, I'll discuss data mining, but with special emphasis on the areas that I find most fascinating: search, social media, and advertising. I'll periodically talk about data mining techniques, but more often about their consequences.

March 10, 2008 in Lewis Carroll

About

  • Anand Rajaraman
  • Datawocky
