
Comments


Anand Rajaraman

I've addressed some of the questions here in a followup post:

http://anand.typepad.com/datawocky/2008/06/how-google-measures-search-quality.html

Yaniv Bar-Lev

Excellent post!
It occurs to me that most of the data generated on the internet that I've had to analyze and predict (such as user value and behavior) falls into the Extremistan group, simply because the web and its websites change so rapidly and so extremely.

Ted Dunning


In this context there are a few special cases that the commenters (and the original author) are overlooking:

1) this is not a normal machine learning problem. It is machine learning in an adversarial environment. The other classic example of this is fraud detection. In both cases the adversaries (spammers or fraudsters) are always changing their methods according to what works for them (i.e., what fails for the white hats). It takes special care and lots of human oversight and QA to avoid getting nailed by the bastards (a toy sketch of this retrain-and-review loop appears at the end of this comment).

2) the "human" version of the algorithm is, no doubt, being crafted based on extensive examination of the output of machine learned models. In most large-data situations, human experts working on their own are pretty much guaranteed to not do as well on average as reasonably well built machine models. The augmented human working with machine models for reference, on the other hand, can be a fearsomely effective creature.
