Datawocky

On Teasing Patterns from Data, with Applications to Search, Social Media, and Advertising

Datawocky

My favorite poem of all time is Lewis Carroll's "Jabberwocky." Here's how it starts:

'Twas brillig, and the slithy toves
  Did gyre and gimble in the wabe:
All mimsy were the borogoves,
  And the mome raths outgrabe.

Here's the whole poem.

Jabberwocky is unusual because it uses many nonsense words, such as "brillig" and "slithy", that you'll find in no English dictionary. Yet if you read the poem and let yourself sink into it, you can get a pretty good understanding of its meaning -- sometimes you can even figure out what a nonsense word means!

Likewise, it's sometimes difficult to analyze data one piece at a time. But if you have a large amount of data, there are often surprising and useful patterns you can unearth. And the more data you start with, the more likely you are to find something useful.

Thanks to the internet, the amount of data available to analyze is increasing at a rapid pace. Terabytes of web server and activity logs accumulate every day at the more popular websites. Out of these have emerged some interesting techniques, such as collaborative filtering (which Amazon.com uses to recommend products by comparing your activity with that of other users). The web itself is a big source of data, giving rise to techniques such as PageRank for determining popularity. And social media, including social networks and collaborative content efforts such as Wikipedia, provide more grist for the data mill.
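To make the collaborative filtering idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the user names, the items, and the choice of Jaccard similarity as the way to compare activity. It is not a description of how Amazon.com or any other site actually computes recommendations.

```python
def jaccard(a, b):
    """Similarity of two activity sets: shared items / all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, activity, top_n=3):
    """Suggest items the target has not seen, weighted by user similarity."""
    seen = activity[target]
    scores = {}
    for user, items in activity.items():
        if user == target:
            continue
        weight = jaccard(seen, items)
        if weight == 0.0:
            continue  # users with nothing in common contribute nothing
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + weight
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Hypothetical activity log: user -> set of items viewed or bought.
activity = {
    "alice": {"alice_in_wonderland", "looking_glass"},
    "bob": {"alice_in_wonderland", "looking_glass", "hunting_of_the_snark"},
    "carol": {"looking_glass", "sylvie_and_bruno"},
    "dave": {"cookbook"},
}

print(recommend("alice", activity))
```

Here bob overlaps with alice more than carol does, so the item only bob has seen ranks ahead of the item only carol has seen, and dave, who shares nothing with alice, has no influence. With real logs the same idea applies, but the data is far too large to compare every pair of users this naively.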

There's a whole specialty of Computer Science devoted to analyzing large amounts of data: it's called Data Mining. I co-teach a class on Data Mining in the Computer Science Department at Stanford. Data Mining is closely related to Statistics and borrows ideas from it, but what sets the two apart is the emphasis on large data -- terabytes of it.

In this blog, I'll discuss data mining, but with special emphasis on the areas that I find most fascinating: search, social media, and advertising. I'll periodically talk about data mining techniques, but more often about their consequences.

March 10, 2008 in Lewis Carroll

About

  • Anand Rajaraman
  • Datawocky
