My favorite poem of all time is Lewis Carroll's "Jabberwocky." Here's how it starts:
'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.
Here's the whole poem.
Jabberwocky is a unique poem because it uses many nonsense words, such as "brillig" and "slithy", that you'll find in no English dictionary. Yet despite this, if you read the poem and let yourself sink into it, you can get a pretty good understanding of its meaning -- sometimes you can even figure out what a particular nonsense word means!
Likewise, it's sometimes difficult to make sense of data one piece at a time. But if you have a large amount of data, there are often surprising and useful patterns you can unearth. And the more data you start with, the more likely you are to find something useful.
Thanks to the internet, the amount of data available to analyze is increasing at a rapid pace. Terabytes of web server and activity logs accumulate every day at the more popular websites. Out of these have emerged some interesting techniques, such as collaborative filtering (which Amazon.com uses to recommend products by comparing your activity with that of other users). The web itself is a big source of data, giving rise to techniques such as PageRank to determine popularity. And social media, including social networks and collaborative content efforts such as Wikipedia, provides more grist for the data mill.
There's a whole specialty of Computer Science devoted to analyzing large amounts of data: it's called Data Mining. I co-teach a class on Data Mining in the Computer Science Department at Stanford. Data Mining is closely related to Statistics and borrows ideas from it, but what sets the two apart is the emphasis on large data -- terabytes of it.
In this blog, I'll discuss data mining, but with special emphasis on the areas that I find most fascinating: search, social media, and advertising. I'll periodically talk about data mining techniques, but more often about their consequences.