

Comments on the post "Are Machine-Learned Models Prone to Catastrophic Errors?":

Comments


Chandra

Is it the case that human-crafted formulae are better at handling unseen searches? Is this supported by current data? If so, it is natural to wonder what "algorithm" humans are using.

abhishek

Once again a great post! I wonder if a hybrid approach could be used. When input data is very different from the training data, the system could switch to the manually crafted model. Meanwhile, all such data would continue to be used to improve existing machine-learned models or to create new ones. As confidence in the machine-learned model improves, the switch can be reverted. One take away for me is that models should come in pairs, a machine learned one along with a human crafted one so that if one starts to fail for unexpected data the other can take over.
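A minimal sketch of abhishek's paired-model idea, under loudly stated assumptions: both models are plain Python callables, "very different from the training data" is approximated by a crude per-feature min/max range check, and `PairedRanker` and the toy models are hypothetical names for illustration only. "Reverting the switch" falls out naturally: once novel inputs are folded into the training data, similar inputs are routed back to the learned model.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]


class PairedRanker:
    """Hypothetical pairing of a machine-learned scorer with a hand-crafted one."""

    def __init__(self, learned: Callable[[Features], float],
                 handcrafted: Callable[[Features], float],
                 training_data: List[List[float]]) -> None:
        self.learned = learned
        self.handcrafted = handcrafted
        self.training_data = [list(x) for x in training_data]
        self.novel_inputs: List[List[float]] = []  # queued for the next retraining
        self._update_ranges()

    def _update_ranges(self) -> None:
        # Per-feature min/max over the training data: a crude "seen before" region.
        columns = list(zip(*self.training_data))
        self.lo = [min(c) for c in columns]
        self.hi = [max(c) for c in columns]

    def is_familiar(self, x: Features) -> bool:
        return all(lo <= v <= hi for v, lo, hi in zip(x, self.lo, self.hi))

    def score(self, x: Features) -> float:
        if self.is_familiar(x):
            return self.learned(x)
        # Unfamiliar input: use the hand-crafted model and keep the input.
        self.novel_inputs.append(list(x))
        return self.handcrafted(x)

    def retrain(self) -> None:
        # Stand-in only: a real system would refit `learned` here. This just
        # folds the novel inputs into the training set, widening the familiar
        # region so that similar inputs go back to the learned model.
        self.training_data.extend(self.novel_inputs)
        self.novel_inputs.clear()
        self._update_ranges()


# Toy usage with made-up models.
ranker = PairedRanker(learned=lambda x: 10 * x[0],
                      handcrafted=lambda x: -1.0,
                      training_data=[[0.0], [1.0]])
print(ranker.score([0.5]))  # 5.0 (learned model)
print(ranker.score([5.0]))  # -1.0 (hand-crafted fallback)
ranker.retrain()
print(ranker.score([5.0]))  # 50.0 (learned model again)
```

A range check is the simplest possible novelty test; it misses inputs that are inside every per-feature range but still unlike anything seen jointly.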

anshul

What about neural networks? Aren't they supposed to work as well in Extremistan as in Mediocristan when trained right?

bcarpent1228

I agree with the comment:
"Taleb makes a convincing case that most real-world phenomena we care about actually inhabit Extremistan rather than Mediocristan."

There is also the reverse "Hiding in plain sight" problem that causes humans to miss unusual events.

To anshul: I have read that feedback problems in neural networks create the same chaotic phenomena as fractal or cellular models do. Maybe someone knows more about this.

Good post; I look forward to the sequels.

Daniel Tunkelang

I liked Taleb's books (though I thought Fooled by Randomness was better written than The Black Swan), but I think there are two separate issues here.

The first is the extent to which we can extrapolate from the past (i.e., the training data) to the future. The second is whether the variable of interest looks more or less like a Gaussian.

Both algorithms and humans are susceptible to modeling failures on both accounts. Indeed, an algorithm is nothing more than a codification of a human formulation of the problem, put on automatic. At least the machine is not itself subject to cognitive biases. But conversely our algorithms aren't good at expanding their world views. Here is where I see the most value in having human overseers around to help stave off the "black swans" of catastrophic failure.
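The second issue Daniel raises, whether the variable of interest looks like a Gaussian, can be made concrete by how fast tail probabilities fall. The sketch below is a standard textbook comparison, not anything from the post: a standard Gaussian against a Pareto distribution whose tail exponent (alpha = 2) is an arbitrary choice. It prints how much rarer an event twice as large is.

```python
import math


def gaussian_tail(x: float) -> float:
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pareto_tail(x: float, alpha: float = 2.0, x_min: float = 1.0) -> float:
    """P(X > x) for a Pareto variable with tail exponent alpha and minimum x_min."""
    return 1.0 if x < x_min else (x_min / x) ** alpha


# How much rarer is an event twice as large as t?
for t in (1.0, 2.0, 4.0, 8.0):
    g = gaussian_tail(2 * t) / gaussian_tail(t)
    p = pareto_tail(2 * t) / pareto_tail(t)
    print(f"t={t:>4}: Gaussian ratio {g:.3g}, Pareto ratio {p:.3g}")
```

For the Gaussian the ratio collapses toward zero as t grows (Mediocristan: ever-larger events quickly become negligible); for the Pareto it stays at 2^-alpha = 0.25 at every scale (Extremistan: the next, bigger extreme is never much rarer than the last).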

robocop

The human mind is computable, so is any ranking algorithm invented by puny humans... One day we will take over ha ha ha

Kyle

I thought the mortgage crisis was a side effect of computer models that used only a couple of decades (at most) of market data; a classic case of a machine model breaking down when the market behaved unexpectedly (interest rate resets).

The continued use of these models, even after they broke down in early 2007, was simple fraud-for-profit on the part of the humans.

Robert 'Groby' Blum

This raises the interesting question of whether *any* machine learning can work well in Extremistan. As far as I can tell from my limited knowledge, decision-making under unforeseen circumstances is not exactly well understood.

Oh, and the mortgage crisis was not exactly "unforeseen" except by those who profited from it. I've read articles from as far back as at least 2002 warning that the current lending/leveraging scheme was unsustainable.

DrChaos

"Machine-learned" models are also hand-crafted, just crafted on a different higher level.

Machine learning always includes "regularization", cross-validation and other techniques to reduce out-of-sample error.

It is true that ML, done poorly, might result in weirder search results for some "out of sample" searches.

But then, continuously retrained ML can also respond to subtle drifts, not generally apparent to humans, that hand-crafted algorithms might miss.

Humans craft the ML representations such that the regularization procedures automatically bring you closer to a good "default" for out of sample examples.
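The pull toward a "default" that DrChaos describes can be seen in the simplest case. A minimal sketch, assuming L2 (ridge) regularization, a single feature, and a hand-chosen default score to which the model learns a correction; the data are toy numbers. As the penalty grows, the learned correction shrinks and a far out-of-sample prediction moves back toward the default.

```python
from typing import Sequence


def fit_correction(xs: Sequence[float], targets: Sequence[float],
                   default: float, lam: float) -> float:
    """1-D ridge regression (no intercept) on the deviation from a default.

    Minimises sum((t - default - w*x)^2) + lam * w^2, whose closed-form
    solution is w = sum(x * (t - default)) / (sum(x^2) + lam).
    """
    sxy = sum(x * (t - default) for x, t in zip(xs, targets))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def predict(x: float, w: float, default: float) -> float:
    # As lam grows, w shrinks toward 0 and predictions fall back to `default`.
    return default + w * x


xs, targets, default = [1.0, 2.0, 3.0], [2.5, 4.5, 6.5], 0.5
for lam in (0.0, 14.0, 1e6):
    w = fit_correction(xs, targets, default, lam)
    print(lam, w, predict(100.0, w, default))  # x = 100 is far out of sample
```

Whether the default is actually "good" is, as the comment says, a human design choice; regularization only decides how strongly the model leans on it.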

ML can also try to self-validate: "Am I close to the sample space I was trained on?" If not, it can fall back to some alternative method.
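One simple way to ask "Am I close to the sample space I was trained on?" (an assumption; the comment names no method) is to compare a query's distance to its nearest training example with the leave-one-out nearest-neighbour distances inside the training set itself. The 95% quantile is an arbitrary choice, the helper names are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math
from typing import List, Sequence


def nearest_distance(x: Sequence[float], data: Sequence[Sequence[float]]) -> float:
    """Euclidean distance from x to its nearest neighbour in data."""
    return min(math.dist(x, d) for d in data)


def novelty_threshold(data: List[List[float]], quantile: float = 0.95) -> float:
    """Quantile of leave-one-out nearest-neighbour distances (needs >= 2 points)."""
    loo = sorted(nearest_distance(d, data[:i] + data[i + 1:])
                 for i, d in enumerate(data))
    return loo[min(len(loo) - 1, int(quantile * len(loo)))]


def in_sample(x: Sequence[float], data: List[List[float]], threshold: float) -> bool:
    # "Am I close to the sample space I was trained on?"
    return nearest_distance(x, data) <= threshold


train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
thr = novelty_threshold(train)
print(in_sample([0.5, 0.5], train, thr))    # True
print(in_sample([10.0, 10.0], train, thr))  # False
```

When `in_sample` is False, the system would route the query to "some alternative method", such as a hand-crafted formula.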

And in practice, people can and should employ "champion/challenger" strategies to test various methods in real time on out-of-sample data.
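A common way to run such a champion/challenger test (assumed here; the comment does not say how) is to hash a stable user identifier into a bucket, so each user consistently sees one method while a small, fixed share of traffic goes to the challenger. The 5% share, the SHA-256 bucketing, and the class name are illustrative choices.

```python
import hashlib
from collections import Counter


def assign_arm(user_id: str, challenger_share: float = 0.05) -> str:
    """Deterministically send a fixed share of users to the challenger method."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # roughly uniform in [0, 1)
    return "challenger" if bucket < challenger_share else "champion"


class ChampionChallenger:
    """Tally clicks per arm so the two methods can be compared on live traffic."""

    def __init__(self, challenger_share: float = 0.05) -> None:
        self.share = challenger_share
        self.shown: Counter = Counter()
        self.clicked: Counter = Counter()

    def record(self, user_id: str, clicked: bool) -> str:
        arm = assign_arm(user_id, self.share)
        self.shown[arm] += 1
        self.clicked[arm] += int(clicked)
        return arm

    def click_rate(self, arm: str) -> float:
        return self.clicked[arm] / self.shown[arm] if self.shown[arm] else 0.0
```

Hashing rather than random assignment keeps a user's experience stable across sessions and makes the split reproducible when results are analysed later.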

I am very surprised; I had assumed Google had been using such ML methods for a very long time.

Dan Lewis

abhishek said: "One take away for me is that models should come in pairs, a machine learned one along with a human crafted one so that if one starts to fail for unexpected data the other can take over."

When do you define failure? If it's after the engine has returned bad results for a search query, it's already too late.

If there is a trend of systematic failure, how do you detect it? Maybe if a larger proportion of users stop clicking on the search results... but you'd have to leave the algorithm in place for a while to see if it's a real trend or just a coincidence.

In other words, there's no way to swap algorithms on the fly because you suspect the results might be bad. At best, you can study the data when the human model and the computer model are widely divergent. But again, that's offline, not online.
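In statistical terms, Dan's "real trend or just a coincidence" question is a significance test, run offline on logged clicks as he suggests. A minimal sketch, assuming click-through rate is the failure signal and treating each search as an independent trial (a simplification): a two-proportion z-test comparing a baseline period with a recent one. The counts in the usage lines are hypothetical.

```python
import math


def two_proportion_z(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    """z statistic for the change in click-through rate from period A to period B."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def two_sided_p_value(z: float) -> float:
    # 2 * P(Z > |z|) for a standard normal Z, written with erfc.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical counts: the same 40% -> 38% drop, at two sample sizes.
for clicks_a, clicks_b, n in ((40, 38, 100), (4000, 3800, 10_000)):
    z = two_proportion_z(clicks_a, n, clicks_b, n)
    print(n, round(z, 2), two_sided_p_value(z))
```

On 100 searches per period the drop is indistinguishable from noise; on 10,000 it would be surprising as pure chance. That is why, as Dan says, the algorithm has to stay in place long enough to collect evidence.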

Jitendra

Great post... I think the most important thing here is to be open to the possibility that machine models can be wrong. Just look at Moody's ratings fiasco with subprime mortgage-backed securities. If one is open to the possibility that machine models can be wrong, corrective action can be taken before catastrophic errors occur.

Thanks, Jitendra

bcarpent1228

Both the post and comments are fantastic.

My understanding is that chaotic results arise from the computational limits of computers. It should be fairly easy for a human (who is doing the search) to detect chaotic results: basically, if I determine the search results are in error, then I will modify my search parameters.

My concern in any search is not the results found but the results not found. I also get confused when formulating a complicated search, assuming "if I ask the right question I will get the right answers."

Possibly a greater effort on search parameters and methods (opensearch.org) would assist the server algorithms.

Tobu

Most neural networks, such as perceptrons, assume a linear relation between their input and output, so they work in Mediocristan.

I don't know if other types of networks work differently. Boltzmann machines have a "factorial model" (whatever that exactly means), so their model is not linear, and they are also generally impossible to learn efficiently; but of course any model is a hypothesis about the data.

abhik

Are the black swans completely and truly novel?

Even if we haven't encountered a black swan before, we know that a swan is an animal and that animals come in various shades, including black. So the posterior probability of encountering a black swan should be low but not zero. Maybe the solution is to use methods that learn models at multiple levels of generality. Of course, that requires some ontology for dealing with the training/testing samples.
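abhik's "low but not zero" posterior has a classic single-level Bayesian form. Assuming a Beta(a, b) prior on the fraction of black swans (our assumption; a = b = 1 is the uniform prior), the probability that the next swan is black after seeing k black swans among n is (k + a) / (n + a + b). With no black swans seen, that is Laplace's rule of succession, 1/(n + 2).

```python
from fractions import Fraction


def prob_next_is_black(black_seen: int, total_seen: int,
                       a: int = 1, b: int = 1) -> Fraction:
    """Posterior predictive P(next swan is black) under a Beta(a, b) prior
    on the fraction of black swans (a = b = 1 is the uniform prior)."""
    return Fraction(black_seen + a, total_seen + a + b)


for n in (10, 1_000, 1_000_000):
    print(n, prob_next_is_black(0, n))  # 1/(n + 2): small, never zero
```

This does not capture the multiple levels of generality abhik proposes; a hierarchical prior shared across animal species would be one way to extend it.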

leotor

With reference to the last commenter: even if we have learned a model at different levels of generality, it is quite hard to figure out which level to apply. I think it's inherently hard for an automated system to know when it is wrong; perhaps the best it can do is to know when it is not confident of its answer.

Taking a more data-intensive approach, as outlined in previous posts, I wonder if injecting random outliers might allow ML algorithms to better handle Extremistan, or at least to know not to be confident when faced with it.

With respect to Google, I wonder how often these hand-crafted algorithms have been tuned and what mechanisms the researchers use to do the tuning. Perhaps we could have a program replicate that process. The ML algorithm would then have a tutor/examiner that routinely subjects it to testing and validation.

Mark

So what would catastrophic failure look like in a top-ten search results list?

Spam?

Results that do not contain my keyword?

Results in different languages?

404 not found? 500 error?

Google's hard disk gets reinitialized?

Just what are we talking about here?

Paras Chopra

Well, I believe human-crafted models are as susceptible to catastrophic errors as machine-learned models are. Fundamentally, it should not make much difference *who* made the model. After all, a model is a model, irrespective of how it was created and by whom.

In fact, I would argue that machine learning is better than hand-crafting, because the former removes the biases and gaps in knowledge that are inevitable in human-crafted models. Moreover, machine-learned models are trainable by definition. So even if a model doesn't perform as well as it should on some inputs, we can always retrain it. Can we do this with hand-crafted models?

By the way, on a different note, on many searches I am observing that Google returns more irrelevant results than it did in the past. Is that human-crafted models at play?

Jordi

Dan Lewis said: "When do you define failure? ..."

It is often possible to derive a level of confidence in the results of an ML algorithm. Suppose you have a neural network doing a digit recognition task. If the '5' node is getting full activation, you can be fairly confident that the digit was a 5. If it received less activation, or if the '6' node had only slightly less activation, you couldn't be as sure, and you might want to fall back on something else (e.g. a hand-tuned formula).

Also, I think it is possible to detect confidence in other ways (possibly even by teaching the algorithm to generate an extra output for the confidence level).
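Jordi's digit-recognition example can be sketched as follows, assuming (our assumption) that the network's ten output activations are turned into probabilities with a softmax and that "confident" means the top class beats the runner-up by a margin. The 0.2 threshold and the `fallback` callable are hypothetical placeholders for whatever hand-tuned formula would take over.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_fallback(logits: Sequence[float],
                         fallback: Callable[[], int],
                         min_margin: float = 0.2) -> int:
    """Return the top class if it clearly beats the runner-up, else defer."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if probs[best] - probs[runner_up] >= min_margin:
        return best
    return fallback()  # e.g. a hand-tuned rule


confident = [0.0] * 10
confident[5] = 10.0
unsure = [0.0] * 10
unsure[5], unsure[6] = 2.0, 1.9
print(classify_or_fallback(confident, fallback=lambda: -1))  # 5
print(classify_or_fallback(unsure, fallback=lambda: -1))     # -1
```

A margin between the top two classes captures the "'6' node has only slightly less activation" case directly; a raw maximum-probability threshold would be a simpler alternative.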

Tobu said: "Most neural networks, such as perceptrons, assume a linear relation between their input and output"

I'm not really sure, but I think this is only the case if you don't use a hidden layer in your networks. I hate to quote Wikipedia, but here goes:
"[N]eural networks are non-linear statistical data modeling tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data."
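The hidden-layer point can be checked with the classic XOR example: a single linear threshold unit (a perceptron without a hidden layer) cannot compute XOR, but one hidden layer can. The weights below are hand-picked for this sketch, not learned, and the grid search only illustrates the general fact that XOR is not linearly separable.

```python
import itertools


def step(v: float) -> int:
    return 1 if v > 0 else 0


def xor_net(x1: int, x2: int) -> int:
    """Two hidden threshold units (OR and AND) feeding one output unit."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)  # "OR but not AND" is XOR


def single_unit_computes_xor(w1: float, w2: float, b: float) -> bool:
    return all(step(w1 * x1 + w2 * x2 + b) == (x1 ^ x2)
               for x1, x2 in itertools.product((0, 1), repeat=2))


inputs = list(itertools.product((0, 1), repeat=2))
print([xor_net(x1, x2) for x1, x2 in inputs])  # [0, 1, 1, 0]

# No single linear threshold unit on this weight grid computes XOR (and none
# can for any weights, since XOR is not linearly separable).
grid = [v / 2 for v in range(-8, 9)]
print(any(single_unit_computes_xor(w1, w2, b)
          for w1 in grid for w2 in grid for b in grid))  # False
```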

Bharatheesh Jaysimha

As the saying goes, 'All models (read: machine-learned or manually developed with domain expertise) are wrong, but some are useful.' When it comes to the utility of models in different scenarios, my approach is to question them as often and as thoroughly as possible and to correct them if necessary. This raises a lot of questions about when to rebuild the model, etc., which should be answered based on needs and objectives rather than philosophical issues. Nice post.

Gordon Rios

I don't think humans make better decisions around these kinds of rare or unforeseen events. We're just more forgiving when those decisions turn out to be wrong. The decision problems where machine learning methods can fail 'catastrophically' are the same ones where people do: decisions on complex systems with incomplete information or intractable objectives.

In the case of Google's ranking formula, hand tuning a linear model in 200 features can be superior to machine learning, since good objectives can be hard to formulate. For example, they surely want to minimize the percentage of abandoned searches (i.e. p(abandon)), but that involves knowing p(click) for each result given the *other* results shown. Of course, those conditional parameters are not known, and even if they were, that optimization would still be intractable.

Instead, as described, they have used a 'generate and test' approach: let smart people come up with model features and hand tune parameters for seven or eight years. The result is a broadly successful model unlikely to be beaten by machine learning approaches hampered by both insufficient information and reduced target objectives (such as per result click through probability or editorial grade predictions, etc.)
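Gordon's abandonment objective can only be written down naively without those conditional parameters. A minimal sketch, assuming (unrealistically, as he points out) that clicks on different results are independent, so that p(abandon) is the product of (1 - p(click)) over the results shown; the probabilities are made up. The duplicate example shows what the independence assumption gets wrong.

```python
import math
from typing import Sequence


def p_abandon_independent(click_probs: Sequence[float]) -> float:
    """P(no result is clicked), naively assuming clicks are independent."""
    return math.prod(1.0 - p for p in click_probs)


print(p_abandon_independent([0.3, 0.2, 0.1]))  # about 0.504
# Showing a duplicate of a result with p(click) = 0.3 is counted as a fresh,
# independent chance of a click (about 0.7 -> about 0.49), although a real
# user who skipped the first copy will almost surely skip the second.
print(p_abandon_independent([0.3]))
print(p_abandon_independent([0.3, 0.3]))
```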

steveballmer

You people are going down!

Anand Rajaraman

Thank you to everyone who posted insightful comments. I'm on vacation until mid-week, when I'll write a follow-up post that addresses some of the points raised in the comments.

Shiraz Kanga

I think that there is NO difference. At the scale we are talking about, machines do exactly what humans do (since humans are the ones telling them what to do). They only do it faster. The human is judging the result based on some criteria. The same criteria can be supplied to the machine. If the criterion is "learn from experience", then again the machine learns faster, since it has more "experience" (i.e. past behavior).

The comments to this entry are closed.