## Thinking about thinking

This site concerns *ways of thinking* about some particularly important things: purpose, self, ethics, authority, and meaning, for instance. My aim is to point out common mistakes in thinking about those things, and how to do better.

I enjoy thinking about thinking. That’s one reason I spent a dozen years in artificial intelligence research. To make a computer think, you’d need to understand how *you* think. So AI research is a way of thinking about thinking that forces you to be specific. It calls your bluff if you think you understand thinking, but don’t.

I thought a lot about *how* to do AI.^{1} In 1988, I put together “How to do research at the MIT AI Lab,” a guide for graduate students. Although I edited it, it was a collaboration of many people. There are now many similar guides, some of them better, but this was the first. Most of its advice was not specific to AI or MIT, and for many years afterward I got thank-you emails from researchers in all sorts of fields.

Soon after, I realized that AI was a dead end, and left the field. Although my work in AI was influential, it seemed worthless in retrospect. I had a personal crisis: what should I do instead? The feedback on “How to do research” suggested that my thoughts about how to think would be useful more widely. And, I had worked in various fields besides AI, which had their own ways of thinking. My perspective was uncommonly broad.

Maybe the most useful thing I could do would be to write a book about how to think? I began. My jokey placeholder title was “How To Think Real Good.”^{2} I had a lot of ideas and some sketchy notes, but wound up abandoning the project.

### LessWrong

This post was prompted by discussions about Bayesianism and the LessWrong rationalist community. “How to do research,” like LW, was a broad collaboration. “How To Think Real Good” would probably also have become a community effort. All three projects were about how to think, with an emphasis on technical methods.

My fascination and frustration with LW comes from my long-standing interest in the same general project, plus liking much of what LW does, plus disliking some of it, plus the sense that LW simply overlooks *most* of what goes into effective, accurate thinking. LW suggests (sometimes, not always) that Bayesian probability is the main tool for effective, accurate thinking. I think it is only a small part of what you need.

I’ve been making myself obnoxious by griping about this, without explaining most of what my beef is. Several clarifying dialogues with LW community members have resulted. One question has come up repeatedly:

If not Bayesianism, then what?

The implicit assumption is that the problem Bayesianism solves is *most of rationality*, and if I’m unimpressed with Bayesianism, I must advocate some other solution to that problem. I do have technical doubts about Bayesianism, but that’s not my point. Rather, I think that the problem Bayesianism addresses is a small and easy one.

- Bayesianism is a theory of *probability*.
- Probability is only a small part of *epistemology*.
- Probability is only a small part of *rationality*.
- Probability is a solved problem. It’s easy. The remaining controversies in the field are arcane and rarely have any practical consequence.

My answer to “If not Bayesianism, then what?” is: *all of human intellectual effort*. Figuring out how things work, what’s true or false, what’s effective or useless, is “human complete.” In other words, it’s unboundedly difficult, and every human intellectual faculty must be brought to bear.^{3} We could call the study of that enterprise “epistemology”; and “rationality” is a collection of methods for it.^{4}

Mostly, we have no idea how people figure things out. The answer is certainly not going to be some simple bit of math like Bayes’ Rule. We’re not going to get a complete story any time soon. What we *can* do—what I was hoping to do with “How to think real good”—is find heuristics: rules of thumb that often work in particular sorts of situations.

### Like what?

In response to which, some LessWrong contributors rightly replied:

Like *what*? Be specific!

A proper answer would be a book (which would require suitable collaborators, and many more years of thinking about thinking with them).

What follows below is, I’m afraid, an off-the-cuff brain dump. I haven’t thought about “How to think real good” in 20 years, and have forgotten whatever I’d worked out then. [*Update, years later*: However, I have returned to the topic repeatedly—for instance in posts on meta-rationality.]

To be specific, I’ll tell some anecdotes about thinking. These concentrate on the application of formal methods of thought, mostly because that’s LW’s orientation. This is probably the wrong emphasis; most insights result from informal reasoning and observation, not technical rationality.

Understanding informal reasoning is probably more important than understanding technical methods.

That’s an anvilicious moral—an unsubtle take-away message. It’s rude to point these out so **boldly**, but I thought it might be useful to create a set of topics that a broader discussion of effective thinking could expand on. The list is totally unsystematic and certainly not exhaustive. Mostly I’ll provide no evidence or even explanation for these morals. And, they are probably annoyingly non-specific. Each one could expand into a book.

The anecdotes concern academic research, because that’s what “How to think real good” was going to be about. Nowadays, I’m more interested in the everyday understanding of non-academics. That’s the subject of Meaningness, and largely of LW too.

The anecdotes also concern research projects that I took part in, *not* because those are particularly good examples, but because they come easily to mind. We could do a better job by studying diverse examples of technical progress, but I don’t have time for that now.

Before the anecdotes, I’ll talk in general about problem formulation, because that’s an aspect of epistemology and rationality that I find particularly important, and that the Bayesian framework *entirely ignores*.

It happens that I’m not especially good at solving problems (at least, not as compared with other MIT PhDs). I’m unusually good at selecting and formulating them. So, I’m biased.

## Problem formulation

Many of the heuristics I collected for “How to think real good” were about how to take an unstructured, vague problem domain and get it to the point where formal methods become applicable.

Formal methods all require a formal specification of the problem. For example, before you can apply Bayesian methods, you have to specify what all the hypotheses are, what sorts of events constitute “evidence,” how you can recognize one of those events, and (in a decision theoretic framework) what the possible actions are. Bayesianism takes these *as given*, and has nothing to say about how you choose them. Once you *have* chosen them, applying the Bayesian framework is trivial. (It’s just arithmetic, for godssakes!)
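To make the “just arithmetic” point concrete, here is a minimal sketch that assumes the hard part is already done. The hypotheses, prior, and likelihoods below are hypothetical numbers chosen for illustration; the function `bayes_update` is likewise an illustrative name. Everything interesting (which hypotheses exist, what counts as evidence) arrives as input.

```python
def bayes_update(priors, likelihoods):
    """Return posterior probabilities from priors P(H) and likelihoods P(E | H).

    Both arguments are dicts keyed by hypothesis name. Choosing those keys,
    and deciding what the evidence E is, is the formulation work this
    function takes as given.
    """
    # Unnormalized posterior for each hypothesis: P(H) * P(E | H)
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # P(E): total probability of the evidence across all listed hypotheses
    evidence = sum(joint.values())
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical formulation: two hypotheses and one observed test result.
priors = {"condition": 0.01, "no_condition": 0.99}
likelihoods = {"condition": 0.9, "no_condition": 0.05}  # P(positive result | H)
posterior = bayes_update(priors, likelihoods)
print(round(posterior["condition"], 3))  # prints 0.154
```

Three multiplications, one sum, and a division: once the formulation is fixed, nothing else is left to do.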

Finding a good formulation for a problem is often most of the work of solving it.

A bewildered Bayesian might respond:

You should consider *all* hypotheses and types of evidence! Omitting some means you might get the wrong answer!

Unfortunately, there are too many. Suppose you want to understand the cause of manic depression. For every grain of sand in the universe, there is the hypothesis that this particular grain of sand is the sole cause of manic depression. Finding evidence to rule out each one individually is impractical.

But, obviously, all grains of sand are equivalent as far as manic depression is concerned! And anyway, sand *obviously* has nothing to do with manic depression.

Yes; but this is not *logically* necessary. It’s something we can reasonably suppose. But how do we do that? It requires intelligent background understanding.

This is something we have to do *without explicit thought*. We could consider and reject sand as a possible cause, but there is an infinite list of other *logically possible* causes. (Variations in the density of the letter “t” in Austrian government documents; chemical reactions that occur only above 873 kelvin; creatures that, at a distance, resemble flies, or are drawn with a very fine camel hair brush.) We can’t even imagine them all, much less evaluate the evidence for them.

So:

Before applying any technical method, you have to *already* have a pretty good idea of what the form of the answer will be.

Part of a “pretty good idea” is a vocabulary for describing relevant factors. Any situation can be described in infinitely many ways. For example, my thinking right now could be described as an elementary particle configuration, as molecules in motion, as neurons firing, as sentences, as part of a conversation, as primate signaling behavior, as a point in world intellectual history, and so on.

Choosing a good vocabulary, at the right level of description, is usually key to understanding.

A good vocabulary has to do two things. Let’s make them anvils:

1. A successful problem formulation has to make the distinctions that are used in the problem solution.

So it mustn’t categorize together things that are relevantly different. Trying to find an explanation of manic depression stated only in terms of emotions is unlikely to work, because emotions, though relevant, are “too big” as categories. “Sadness” is probably a complex phenomenon with many different aspects that get collapsed together in that word.

2. A successful problem formulation has to make the problem small enough that it’s easy to solve.

Trying to find an explanation of manic depression in terms of brain state vectors in which each element is the membrane potential of an individual neuron probably won’t work. That description is much too complicated. It makes billions of distinctions that are almost certainly irrelevant. It doesn’t collapse the state space *enough*; the categories are too small and therefore too numerous.

It’s important to understand that problem formulations are never right or wrong.

Truth does not apply to problem formulations; what matters is usefulness.

In fact,

*All* problem formulations are “false,” because they abstract away details of reality.

Any vocabulary pretends that the world is made of objectively separable “objects” (molecules, neurons, emotions, brains, conversations), with well-defined properties. But there *are no objects* in the real world.^{5}

This is going to be a major point in Meaningness; I’ve just begun to discuss it here. Since I haven’t had time yet to explain, let me quote Richard Feynman instead:

Consider an object… What is an object? Philosophers are always saying, “Well, just take a chair for example.” The moment they say that, you know that they do not know what they are talking about. Atoms are evaporating from it from time to time; dirt falls on it and gets dissolved in the paint; so to define a chair precisely, to say exactly which atoms are chair, and which atoms are air, or which atoms are dirt, or which atoms are paint is impossible…

There are not any single, left-alone objects in the world—every object is a mixture of a lot of things, so we can deal with it only as a series of approximations and idealizations.

The trick is the idealizations. One may prefer a mathematical definition; but those can never work in the real world. A mathematical definition will be good for mathematics, in which all the logic can be followed out completely, but the physical world is [too] complex. When we try to isolate pieces of it, to talk about one mass, the wine and the glass, how can we know which is which, when one dissolves in the other?

A system of discourse about the real world must involve approximations of some kind. This is quite unlike the case of mathematics, in which everything can be defined.

—The Feynman Lectures on Physics, Vol. 1: p. 12-2; some phrases omitted for concision.

Actually, I should probably just shut up and quote Feynman! His books are full of insights into thinking, and how formal methods work in practice.^{6}

Anyway, *the trick is the idealizations*—the ways you simplify and abstract away from reality to create a conceptual framework within which you can work on the problem. There’s no such thing as a *correct* idealization; what you need is one that’s good for a particular job.

There’s an obvious difficulty here: if you don’t know the solution to a problem, how do you know whether your vocabulary makes the distinctions it needs? The answer is: you can’t be sure; but there are many heuristics that make finding a good formulation more likely. Here are two very general ones:

Work through several specific examples before trying to solve the general case. Looking at specific real-world details often gives an intuitive sense for what the relevant distinctions are.

Problem formulation and problem solution are mutually recursive processes.

You need to go back and forth between trying to formulate the problem and trying to solve it. A “waterfall” approach, in which you take the formulation as set in stone and just try to solve it, is rarely effective.

The difficulty then is that you have to recognize incremental progress in both the formulation and the solution. It’s rare that you can use formal methods to evaluate that progress. So a planned major topic in “How to think real good” was informal, or intuitive, ways to evaluate progress.

Heuristics for evaluating progress are critical not only during problem solving, but also during problem formulation.

A highly general one:

Solve a simplified version of the problem first. If you can’t do even that, you’re in trouble.

A medium-specificity heuristic, applicable mainly in computer science:

If you are having a hard time, make sure you aren’t trying to solve an NP-complete problem. If you are, go back and look for additional sources of constraint in the real-world domain.

## Rationality without probability

When I say “Bayesian methods are a tiny fraction of rationality,” somehow people don’t get it. So let’s look at an example, taken from my Master’s thesis.

I was interested in “classical planning,” a technical problem in robotics research. Let’s say you have a robot that can only do one thing at a time, and you want to get it to make several things true at once. The classic example is: suppose there are three children’s blocks sitting on a table: red, green, and blue. The robot can pick up one block at a time and put it on another block. You want a stack with the red block on the green block, and the green block on the blue block. That’s two things (red on green, green on blue) you want to be true simultaneously. The robot could put the red block on the green block, accomplishing the first condition, but then it would be stuck: the green block has to go on the blue block, and the robot can only move one block at a time, so it can’t lift the green block with the red one sitting on top of it.

Apparently, the robot has to plan ahead. It needs to figure out that it has to move the green block first. More generally, classical planning means finding an ordered sequence of actions that accomplishes several goals at once. Once you’ve got that, you can execute the plan mindlessly, like running a program.
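To make the blocks example concrete, here is a small sketch that finds the plan by brute-force breadth-first search over world states. This is a swapped-in textbook technique chosen for clarity, not the planning algorithm described below; the state encoding (each block mapped to what it sits on) and the function names are assumptions for illustration.

```python
from collections import deque

BLOCKS = ("red", "green", "blue")


def clear(state, block):
    """A block is clear if no other block sits on it."""
    return all(support != block for _, support in state)


def moves(state):
    """Yield (action, next_state) for every legal single-block move."""
    on = dict(state)
    for block in BLOCKS:
        if not clear(state, block):
            continue  # the robot can only lift a block with nothing on top
        targets = ["table"] + [b for b in BLOCKS if b != block and clear(state, b)]
        for target in targets:
            if on[block] == target:
                continue  # not a move at all
            new_on = dict(on)
            new_on[block] = target
            yield (block, target), frozenset(new_on.items())


def plan(start, goals):
    """Breadth-first search for a shortest action sequence achieving all goals."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, actions = frontier.popleft()
        if goals <= state:
            return actions
        for action, nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [action]))
    return None


start = frozenset({("red", "table"), ("green", "table"), ("blue", "table")})
goals = {("red", "green"), ("green", "blue")}
print(plan(start, goals))  # [('green', 'blue'), ('red', 'green')]
```

Search like this looks ahead by trying everything, which is fine for three blocks and hopeless for realistic problems; the number of states grows explosively with the number of objects.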

Before my Master’s work, dozens of researchers had tackled the problem, and built complex heuristic planning systems that no one understood well, and that didn’t always work. I produced a simple planning algorithm that I proved always worked (and so definitively solved the problem). This involved a year of agony and false starts and half-right attempts. It might be interesting to go back through my lab notebook of the time to analyze how I eventually succeeded.

However, *part* of the process is reflected in my solution, and I intend to draw some anvilicious morals from it. (This analysis is quite technical. You can skip ahead to the morals if you like.)

The algorithm constructs a plan incrementally as a partial order on actions. When it discovers a constraint on what has to happen before what, it adds an arc to the time-order digraph.
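A partial-order plan can be sketched as a directed acyclic graph over actions. The sketch below, under the assumption that actions are just names and that preconditions are handled elsewhere, shows the two basic operations: adding a “before” arc (refusing one that would create a cycle, i.e. a contradiction) and producing one totalization to execute. The class `PartialPlan` and the action names are hypothetical.

```python
from collections import defaultdict


class PartialPlan:
    """Actions plus 'must happen before' arcs, kept acyclic (a DAG)."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.after = defaultdict(set)  # after[a]: actions constrained to follow a

    def reaches(self, start, goal):
        """True if following arcs from `start` leads to `goal`."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.after[node])
        return False

    def add_arc(self, before, later):
        """Record that `before` must precede `later`, refusing contradictions."""
        if self.reaches(later, before):
            raise ValueError(f"{before!r} before {later!r} would create a cycle")
        self.after[before].add(later)

    def totalize(self):
        """Return one total order consistent with every arc (Kahn's algorithm)."""
        indegree = {a: 0 for a in self.actions}
        for a in self.actions:
            for b in self.after[a]:
                indegree[b] += 1
        ready = [a for a in self.actions if indegree[a] == 0]
        order = []
        while ready:
            a = ready.pop(0)
            order.append(a)
            for b in sorted(self.after[a]):
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
        return order


p = PartialPlan(["green_on_blue", "red_on_green", "sweep_floor"])
p.add_arc("green_on_blue", "red_on_green")  # a constraint discovered while planning
print(p.totalize())  # ['green_on_blue', 'sweep_floor', 'red_on_green']
```

The unconstrained action (`sweep_floor`) can land anywhere; the plan commits only to the orderings it has found necessary.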

The key insight is a modal extension of temporal logic to partial time orders. The “necessary” operator corresponds to a proposition holding in *all totalizations* of the partial order; “possibly” corresponds to a proposition holding in *some* totalization. The algorithm depends on a model theory that makes it possible to compute possible and necessary truth in polynomial time.
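The modal definitions can be checked by brute force: enumerate every totalization (linear extension) of the partial order and test a proposition in each. That is exponential, which is exactly what a polynomial-time model theory avoids. The sketch below handles only the narrowest kind of proposition, “action a happens before action b,” for which graph reachability gives the same answers cheaply; the real planner reasons about richer propositions (such as whether a condition holds at a given point in the plan), none of which is modeled here. All function names are hypothetical.

```python
from itertools import permutations


def linear_extensions(actions, arcs):
    """All total orders of `actions` consistent with (before, after) arcs.

    Brute force over permutations: exponential, for illustration only.
    """
    for order in permutations(actions):
        pos = {a: i for i, a in enumerate(order)}
        if all(pos[x] < pos[y] for x, y in arcs):
            yield order


def necessarily_before(actions, arcs, a, b):
    """'Necessarily, a before b': true in every totalization."""
    return all(o.index(a) < o.index(b) for o in linear_extensions(actions, arcs))


def possibly_before(actions, arcs, a, b):
    """'Possibly, a before b': true in at least one totalization."""
    return any(o.index(a) < o.index(b) for o in linear_extensions(actions, arcs))


def reachable(arcs, a, b):
    """Is there a directed path of arcs from a to b? (polynomial time)"""
    succ = {}
    for x, y in arcs:
        succ.setdefault(x, []).append(y)
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        for nxt in succ.get(node, []):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


actions = ["a", "b", "c", "d"]
arcs = [("a", "b"), ("b", "c")]  # a before b before c; d unconstrained
# For distinct a, b: necessarily a<b iff path a->b; possibly a<b iff no path b->a.
print(necessarily_before(actions, arcs, "a", "c"), reachable(arcs, "a", "c"))  # True True
print(possibly_before(actions, arcs, "d", "a"), not reachable(arcs, "a", "d"))  # True True
```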

Given this logic, proving that the planner is complete (it can always find a plan if there is one) and correct (its claimed plans always work) corresponds closely to demonstrating the completeness and soundness of a proof theory.

Morals?

You can never know enough mathematics.

I wasn’t smarter than the other people who worked on this problem. (Gerry Sussman’s PhD thesis was one of the major previous works.) I happened to have taken several advanced courses in mathematical logic (due to my interest in rationality), and it happened to be the case that the classical planning problem was easy once it was recast in logical terms. Probably none of the previous researchers in the field happened to have that background.

Put another way,

An education in math is a better preparation for a career in intellectual field X than an education in X.

I thought Paul Graham said that, but I can’t find it on his web site. The closest I can find is:

Suppose you’re a college freshman deciding whether to major in math or economics. Well, math will give you more options: you can go into almost any field from math. If you major in math it will be easy to get into grad school in economics, but if you major in economics it will be hard to get into grad school in math.

[*Update, three years later:* I’ve found it! It was Gian-Carlo Rota: “When an undergraduate asks me whether he or she should major in mathematics rather than in another field that I will simply call X, my answer is the following: ‘If you major in mathematics, you can switch to X anytime you want to, but not the other way around’.”]

It was mostly dumb luck that modal logic and model theory turned out to be relevant to classical planning. If other people had realized they were relevant, they could have solved the problem years earlier. So:

You should learn as many different kinds of math as possible. It’s difficult to predict what sort will be relevant to a problem.

There *are* heuristics for guessing what formal methods will be relevant, though. I’ll mention some later.

### Look, Ma, no Bayes!

Before moving on: observations about Bayesianism and rationality, at two levels.

First, the classical planning problem is definitely a problem of rationality. Putting the red block on the green block first is irrational; putting the green block on the blue block first is rational. This is a problem Bayes won’t help with *at all*.

My solution was also surely an example of formal rationality; mathematical logic is the *standard* for that. But it involves no probability theory of any sort.

At the meta level: the year of hard thinking I did to solve the classical planning problem involved huge uncertainties. Was a general solution even possible? What sort of approach would work? Was I on the right track, as I pursued various alternatives? But none of these uncertainties could usefully be modeled with probabilities, I think. The issues were way too *amorphous* for that.

At any rate, I certainly wasn’t aware of using probabilistic reasoning. It’s possible that I used it unconsciously.

I find it problematic, though, when Bayesians posit unconscious probabilistic reasoning as an explanation for rationality in cases where there is no evidence. This is dangerously close to “the God of the gaps”:

You have no other explanation for the Big Bang (consciousness, ethics, whatever), therefore God did it.

Likewise:

You don’t know quite how you solved that problem, therefore you used Bayes.

## Reformulating rational action

My next example comes from work with Phil Agre, which led to both our PhD theses. Phil had an extraordinary series of insights into how effective action is possible (with some contributions from me).

In my Master’s thesis, I had proven that there can be no *efficient* solution to the classical planning problem. (Formally, it’s NP-complete.) Since people obviously do act rationally, this seemed a paradox.

One of Agre’s insights was that the problem formulation was wrong. That is, the classical planning problem is dissimilar to most actual situations in which people act rationally.

If a problem seems too hard, the formulation is probably wrong. Drop your formal problem statement, go back to reality, and *observe* what is going on.

Phil and I spent a couple years in careful observation, recording, and analysis of people actually doing things. From that, we developed an entirely different way of thinking about action—both what the problem is, and how to address it.

We applied as many different intellectual tools as we could find. In the end, ethnomethodology, an anthropological approach to describing action, was the single most useful. We also drew on (among others) Gibson’s perceptual psychology and Heidegger’s phenomenology of tool use. Each of these fields is highly “technical” in the sense of having elaborate, non-obvious methods, but none is “formal” in a mathematical sense.

Learn from fields very different from your own. They each have ways of thinking that can be useful at surprising times. Just learning to think like an anthropologist, a psychologist, and a philosopher will beneficially stretch your mind.

One key idea came from a cookbook. Fear of Cooking emphasizes “the IRIFOY principle”: *it’s right in front of you*. You know what scrambled eggs are supposed to be like; you can see what is happening in the pan; so you know what you need to do next. You don’t need to make a detailed plan ahead of time.

IRIFOY doesn’t always work; sometimes you paint yourself into a corner if you don’t think ahead. But mostly that doesn’t happen; and Phil developed a deep theory of why it doesn’t. One aspect is: we can’t solve NP-complete problems, so we organize our lives (and our physical environments) so we don’t have to.

### Dealing effectively with uncertainty without using probability

The classical formulation was unrealistically hard in some ways, but also artificially easy. It did not allow for any sort of uncertainty, for instance. We implemented a series of AI programs that were effective in complex, uncertain domains, where the planning approach failed. These domains involved both inherently random events and limited sensory access to relevant factors.

Our programs dealt competently with uncertainty despite *not representing it at all*. A Bayesian approach would have been overwhelmed by computational complexity; and belief probabilities wouldn’t have contributed to effective action anyway. This was the IRIFOY principle again: when our programs needed to make decisions, they could *actively investigate* to see what they needed to know. Most of the facts about their worlds were unknowable, but they could find out enough of what mattered, and ignored the rest.
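A deliberately tiny analogue of acting competently without representing uncertainty: a bang-bang heater that never models the random heat loss and holds no probabilities. It simply reads the current temperature whenever it must decide. This is a toy of my own construction for illustration, not a reconstruction of our programs; the function name and all numbers are hypothetical.

```python
import random


def run_reactive_heater(target=20.0, start=10.0, steps=200, seed=0):
    """Toy bang-bang controller that looks before it acts.

    It stores no model of the heat loss and no probabilities: at each tick it
    reads the current temperature and switches the heater on if it is too cold.
    """
    rng = random.Random(seed)
    temp = start
    history = []
    for _ in range(steps):
        if temp < target:              # decide from what is right in front of you
            temp += 2.0                # heater adds a fixed 2 degrees per tick
        temp -= rng.uniform(0.0, 1.0)  # hidden, random heat loss of 0 to 1 degree
        history.append(temp)
    return history


temps = run_reactive_heater()
# After a short warm-up, the temperature stays within [target - 1, target + 2).
print(min(temps[20:]), max(temps[20:]))
```

The guarantee comes from the structure of the situation, not from inference: when cold, each tick gains more than one degree; when warm, each tick loses less than one. Looking is cheaper than predicting.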

It’s possible to attribute unconscious Bayesian reasoning to me, but definitely not to our programs. Anyone could look at the code and verify a total absence of probabilities.

If all you have is a hammer, everything looks like an anvil. If you only know one formal method of reasoning, you’ll try to apply it in places it doesn’t work.

Probability theory is *sometimes* an excellent way of dealing with uncertainty, but it’s not the only way, and sometimes it’s a terrible way. One reason is that it collapses together many different sources of uncertainty. For example:

- inherent effective randomness, due to dynamical chaos
- physical inaccessibility of relevant events
- time-varying causes (so samples are drawn from different distributions)
- sensing/measurement error/noise
- model/abstraction approximations (as Feynman explained)
- one’s own cognitive/computational limitations

Each of these can be complex, and often they need to be dealt with in quite different ways. Summing them up in one number is unhelpful.
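A small illustration of why one number hides too much. Below, two idealized error sources produce readings that are each off by exactly 1, so any per-reading “uncertainty” figure is the same. One is zero-mean noise (alternating up and down); the other is a constant offset, standing in for a systematic error such as a model approximation. The data are synthetic and chosen so the arithmetic is exact.

```python
def average(xs):
    return sum(xs) / len(xs)


TRUE_VALUE = 10.0
N = 1000

# Sensing noise: each reading is off by 1, alternating up and down.
noisy = [TRUE_VALUE + (1.0 if i % 2 == 0 else -1.0) for i in range(N)]

# Systematic error: each reading is also off by exactly 1, always upward.
biased = [TRUE_VALUE + 1.0 for _ in range(N)]

# Identical per-reading error, so a single error figure can't tell them apart...
print(max(abs(x - TRUE_VALUE) for x in noisy), max(abs(x - TRUE_VALUE) for x in biased))  # 1.0 1.0
# ...but collecting more readings cures one and does nothing for the other.
print(average(noisy) - TRUE_VALUE, average(biased) - TRUE_VALUE)  # 0.0 1.0
```

Taking more measurements is the right response to the first source and useless for the second, which needs a better model or a recalibration instead.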

### How far will that go?

The work Phil and I did was highly influential for a while, and we could have turned that into tenured professorships at top universities. But we both walked away instead. We recognized that our approach could generate five or so years of further work, but would then fizzle out.

Evaluate the prospects for your field frequently. Be prepared to switch if it looks like it is approaching its inherent end-point.

One of Feynman’s books has a memorable ranty letter to his wife, written from a gravity conference, in which he complains that the field is dying, and he’s bored stiff, but somehow the oblivious gravity theorists are still taking it seriously:

I am not getting anything out of the meeting. I am learning nothing. Because there are no experiments this field is not an active one, so few of the best men are doing work in it. The result is that there are hosts of dopes here and it is not good for my blood pressure… There is great deal of “activity in the field” these days, but this “activity” is mainly in showing that the previous “activity” of somebody else resulted in an error or in nothing useful or in nothing promising.

I had his advice in mind when I left AI.

## An AI model of problem formulation

Leslie Kaelbling, working with Stan Rosenschein, independently developed a theory of action similar to Agre’s and mine; and then independently recognized the same limitations we did. Around 1990, she and I hoped these limitations could be overcome using machine learning techniques, and we did many experiments on that, independently and in collaboration.

“Machine learning” is basically a collection of statistical techniques. As with other formal methods, they can work well when a problem is framed in terms that expose relevant features. They don’t work if your formalization of the problem is not good enough. That is fine if you view them as tools a scientist can use to help understand a problem; but our interest was in *making minds*, autonomous creatures that could figure out how to act effectively by themselves.

We considered a reinforcement learning problem. A creature is thrown into a complicated world, and at times given a reward (cookies, or maybe utilons). Initially, it has no idea what conditions cause it to be rewarded, and no idea how to act to bring about those conditions. Through trial and error, can it learn to act effectively in order to maximize its utility?

The relevant framework was temporal difference methods. Those worked well if the experimenter abstracted the world into a handful of input values whose statistical relationship with reward was fairly obvious.
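For readers who haven’t seen temporal difference learning, here is a minimal textbook TD(0) value-prediction sketch, not the code from our experiments. The key assumption is that the experimenter has already abstracted the world into five numbered states along a chain; that hand-made abstraction is what makes the method work. All names and parameters are illustrative.

```python
def td0_chain(n_states=5, gamma=0.9, alpha=0.1, episodes=2000):
    """Tabular TD(0) value prediction on a tiny hand-abstracted world.

    States 0..n_states-1 form a chain; the creature always steps right.
    Stepping into the last (terminal) state earns reward 1; other steps earn 0.
    """
    V = [0.0] * n_states          # value estimate per state; terminal stays 0
    terminal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != terminal:
            s_next = s + 1
            reward = 1.0 if s_next == terminal else 0.0
            # TD(0): nudge V(s) toward the one-step bootstrapped target.
            target = reward + gamma * V[s_next]
            V[s] += alpha * (target - V[s])
            s = s_next
    return V


V = td0_chain()
print([round(v, 3) for v in V])  # [0.729, 0.81, 0.9, 1.0, 0.0]
```

Each estimate converges to the discounted reward $\gamma^{k}$ for a state $k+1$ steps from the reward. Nothing in the method helps decide what the states should be in the first place.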

But what we wanted was for the creature to figure out the abstraction itself. We didn’t want to have to formulate the problem; we wanted our program to find its own formulation.

Most sensory information is irrelevant to a task, and should be ignored. (It’s noise, relative to action and reinforcement.) But which are the relevant inputs? Without knowing that, the then-best available method would be instantly overwhelmed by the combinatorics of a realistically broad flow of sense data.

Our idea was that the creature could incrementally construct a formulation of the problem it faced by recognizing inputs that behaved statistically differently relative to action and reinforcement. Only those were relevant, and should be taken into consideration in figuring out an action policy.
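A crude sketch of the idea, assuming binary sensors and a fixed margin in place of a real statistical test; it is not the algorithm we actually used. For each input, compare the average reward when it is on against when it is off, and keep only inputs where that makes a difference. The world, the margin, and the function name are all hypothetical. The sketch also shows the combinatorial payoff: a lookup table over all inputs shrinks dramatically once the irrelevant ones are dropped.

```python
from itertools import product


def relevant_inputs(samples, margin=0.5):
    """Return indices of binary inputs whose on/off split changes mean reward.

    `samples` is a list of (inputs, reward) pairs, where `inputs` is a tuple of
    0/1 values. An input counts as relevant if mean reward differs by more than
    `margin` between samples where it is on and samples where it is off.
    """
    n_inputs = len(samples[0][0])
    relevant = []
    for i in range(n_inputs):
        on = [r for x, r in samples if x[i] == 1]
        off = [r for x, r in samples if x[i] == 0]
        if abs(sum(on) / len(on) - sum(off) / len(off)) > margin:
            relevant.append(i)
    return relevant


# Hypothetical world: 6 binary sensors, but reward depends only on sensor 2.
samples = [(bits, 1.0 if bits[2] else 0.0) for bits in product((0, 1), repeat=6)]
keep = relevant_inputs(samples)
print(keep)                           # [2]
print(2 ** 6, "->", 2 ** len(keep))   # 64 -> 2 (entries in a table over the inputs)
```

With noisy rewards, a raw difference in averages is not enough: you also need to know whether you have seen enough data to trust it, which is the sub-problem discussed below.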

With various refinements, this worked on problems that previous methods couldn’t handle.

### A little math goes a long way

When we did this research, neither of us knew much about statistics. In particular, we’d never heard of Student’s *t*-test, a basic statistical tool.

However, we did know enough about what statistics is *about*, and its vocabulary, that we could formulate one of our sub-problems statistically:

Given two sets of samples drawn from distributions $D_1$ and $D_2$, do we have enough data to know whether the two distributions are actually the same or different?

This was basically the test for whether a sensory input was relevant to action. And, having described it that way, it took half an hour of flipping through Leslie’s stats text together to find out that Student’s *t* was the tool for the job.
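Here is roughly what the tool looks like: a two-sample Student’s *t* statistic with pooled variance, which assumes both distributions share a variance. Rather than computing p-values, this sketch compares against standard two-sided 5% critical values for small degrees of freedom; for larger samples it reuses the df = 10 value, which is conservative because the critical value shrinks as df grows. The function names and the sample data are hypothetical, and each sample needs at least two values for the variance to be meaningful.

```python
import math

# Two-sided 5% critical values of Student's t, by degrees of freedom.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def students_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)  # sum of squared deviations
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean_a - mean_b) / se, df


def looks_different(a, b):
    """True if the samples differ at roughly the 5% level (conservative for df > 10)."""
    t, df = students_t(a, b)
    return abs(t) > T_CRIT_05.get(df, T_CRIT_05[10])


# Hypothetical rewards observed with a sensor on versus off.
reward_when_on = [1.0, 2.0, 3.0, 4.0, 5.0]
reward_when_off = [6.0, 7.0, 8.0, 9.0, 10.0]
print(students_t(reward_when_on, reward_when_off))
print(looks_different(reward_when_on, reward_when_off))
```

Here the means differ by 5 and the pooled standard error is 1, so *t* is about −5 on 8 degrees of freedom, well beyond the critical value of 2.306.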

It’s more important to know what a branch of math is *about* than to know the details. You can look those up, if you realize that you need them.

Combined with the earlier moral that it’s good to know many kinds of math, this suggests:

Get a superficial understanding of as many kinds of math as possible. That can be enough that you will recognize when one applies, even if you don’t know how to use it.

Quite possibly the *t*-test was actually the “wrong” tool for the job. Someone who actually knows statistics might say “Oh, no! You should use Teacher’s *u*-test, because blah blah.” And they’d be “right”; that might work better, or be more “correct.” But the *t*-test solved the problem for us: the program worked.

Math only has to be “correct” enough to get the job done.

One reason for this is that there are often other, larger sources of error than mathematical details. Approximations are fine in engineering, and even in physics (as Feynman pointed out above). Mathematics *never* perfectly describes the real world.^{7} Quoting “How to do research”:

You should be able to prove theorems and you should harbor doubts about whether theorems prove anything.

Of course, it’s often good to go on to figure out the “right” answer; it might be important for other, related jobs. Or it might just be interesting for its own sake.

## Surface thinking

After I decided that “strong” AI research (making minds) was going nowhere, and after the “what should I do with my life!?!” existential crisis, I figured I’d apply what I knew to something actually useful. Pharmaceutical drug discovery (finding new medicines) seemed the best bet.

Drugs work by fitting into slots in proteins. This is called the “lock and key” model: a particular protein slot has a very specific shape, and how well a molecule works depends on how nearly it fills the hole.^{8} If you know the shape of the slot, you can design molecules to fit. But often you don’t know. Instead, you have a collection of molecules that don’t fit very well, and some that don’t fit at all, and you want to find ones that fit better.

Actually making and testing new molecules is expensive—and the number of possible molecules is infinite. What you’d like is a statistical method that would take as input a set of molecules with known degrees of fit, and could predict how well a hypothetical new molecule would fit.
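The shape of the method wanted here can be sketched with the simplest possible predictor, *k*-nearest neighbors, assuming each molecule has already been summarized as a short tuple of numbers. This is not the method our team used. The two-number descriptors below are made up, and choosing descriptors that actually capture what matters is the formulation problem, not a detail.

```python
import math


def predict_fit(known, query, k=3):
    """k-nearest-neighbour estimate of how well a new molecule fits.

    `known` is a list of (descriptor, fit) pairs, where `descriptor` is a tuple
    of numbers summarizing a molecule. The prediction is the average fit of the
    k known molecules whose descriptors are closest to `query`.
    """
    by_distance = sorted(known, key=lambda item: math.dist(item[0], query))
    nearest = by_distance[:k]
    return sum(fit for _, fit in nearest) / len(nearest)


# Hypothetical training data: (descriptor, measured degree of fit).
known = [((0.0, 0.0), 0.1), ((0.1, 0.0), 0.2),
         ((1.0, 1.0), 0.9), ((1.1, 0.9), 0.8), ((0.9, 1.0), 1.0)]
print(predict_fit(known, (1.0, 0.95)))  # averages the three nearby good binders
```

Any such predictor can only be as good as its descriptors: if they don’t distinguish molecules that fit from ones that don’t, no amount of statistics helps.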

I worked on this problem in a team in the early ’90s. Many of our conceptual advances were due to Ajay Jain, who is perhaps the best problem solver I’ve collaborated with. I learned a lot from him.

I’ve found that pretty smart people are all smart in pretty much the same way, but *extremely* smart people have unique cognitive styles, which are their special “edge.”

Try to figure out how people smarter than you think.

Figure out what your own cognitive style is. Embrace and develop it as your secret weapon; but try to learn and appreciate other styles as well.

What I observed about Ajay is that he always went for the simplest, most obvious, least interesting approach, and made it work. That is not my style at all; I’m addicted to “interesting” approaches. Those usually wind up as baroque failures. Maybe I’m less prone to that after watching Ajay cut through complexity.

There’s a quote I’d like to include here that goes something like this:

Every supposed genius has a bag of tricks—a list of obscure technical methods that hardly anyone knows about, that they have mastered. Every time they hear about a problem, they go through the list mentally, to see if one of the tricks might work. They hardly ever do, but once every year or two, you get a match, and then you look brilliant, like you’ve had some staggering insight. But actually all you did was notice that percolation theory is applicable, or something.

(I thought Feynman said this, or maybe Gian-Carlo Rota, but I can’t find it.) Percolation theory was actually one of Danny Hillis’ tricks. I never saw him use it, but we used to compare our lists, and that one came up a couple of times. It stuck in my mind, and I’ve been hoping to find an application ever since.

Rota’s best trick was a method for solving a class of elliptic integrals that no one else could crack. These happened to come up a lot in hydrogen bomb design, so once every few months he’d fly to Los Alamos on a military jet and be locked in a room with some top-secret equations. He wasn’t allowed to take them away, of course, but he also refused to explain his method. He’d solve them entirely in his head and just write down the answers. He was paid well for this… I think I may have wandered off-topic.

Collect your bag of tricks.

Rota was the only professor I had who would actually explain how math works and how to do it. For some reason, mathematicians find that extremely embarrassing, like talking about their bowel movements or something, and they absolutely refuse to discuss it.

…Wait a minute! I’ve just found Rota’s “Ten lessons I wish I had been taught,” which includes the “bag of tricks” idea. It’s very funny, and has some good advice. (And it *was* Feynman, by the way! Except Feynman did it the other way around: keep a list of unsolved problems, and check them against any new technique you learn about.)

Find a teacher who is willing to go meta and explain how a field works, instead of lecturing you on its subject matter.^{9}

So anyway, back to drugs. Medicinal chemists think about a molecule in terms of its connectivity graph: its atoms and covalent bonds. That is *entirely irrelevant* to whether or not it fits into a hole. So, naturally, medicinal chemists are bad at predicting whether a molecule will work, and that is one of many reasons that pharmaceutical research is unbelievably inefficient.

Computational chemists had developed predictive models that also depended on the connectivity graph, and naturally didn’t work either. This was so even though everyone knew that what actually matters is the 3D shape.

This is an example of problem formulation failure. Thinking about molecular fit in terms of connectivity was doomed from the outset, because that vocabulary does not capture the relevant distinctions (shapes), and makes a lot of irrelevant distinctions (graph topologies).
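Here is a toy illustration of the mismatch, under loudly simplifying assumptions: a hypothetical four-atom chain (loosely modeled on a carbon backbone) in two made-up flat conformations. Both have exactly the same bonds, so any description built only from the connectivity graph treats them as identical, while even a crude shape measurement, the distance between the chain's two ends, tells them apart. The names and coordinates are invented for this sketch.

```python
import math

# Two made-up conformations of the same four-atom chain.
# Bonds are pairs of atom indices; coordinates are in angstroms.
extended = {
    "bonds": {(0, 1), (1, 2), (2, 3)},
    "coords": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.25, 1.3, 0.0), (3.75, 1.3, 0.0)],
}
folded = {
    "bonds": {(0, 1), (1, 2), (2, 3)},
    "coords": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.25, 1.3, 0.0), (1.5, 2.6, 0.0)],
}

def graph_descriptor(molecule):
    """Connectivity only: which atoms are bonded to which."""
    return frozenset(molecule["bonds"])

def shape_descriptor(molecule):
    """A crude shape measure: distance between the two ends of the chain."""
    coords = molecule["coords"]
    return math.dist(coords[0], coords[-1])

print(graph_descriptor(extended) == graph_descriptor(folded))
print(round(shape_descriptor(extended), 2), round(shape_descriptor(folded), 2))
```

The graph descriptors compare equal, while the end-to-end distances differ by nearly an angstrom: the connectivity vocabulary simply has no word for the difference.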

Part of the difficulty was that no one had a good idea about how to represent shape. One academic group *had* developed a prediction method based on shape, but it worked only slightly better than the connectivity-based methods. It used a Cartesian occupancy grid to represent shape. In other words, it divided space into a large number of voxels, checked each to see whether it was inside or outside the molecule, and used those results as the input to the statistical method. This didn’t work well: a grid fine enough to discriminate shapes accurately had so many voxels that it caused statistical overfitting.
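A minimal sketch of the voxel-grid idea, assuming the molecule is modeled as a union of hard spheres (a simplification; real molecular surfaces are more complicated). It marks each voxel center as inside or outside and shows how the number of inputs scales: the count grows with the cube of the grid resolution. The function `occupancy_grid`, the two-atom molecule, and the 10-angstrom box are all hypothetical.

```python
import itertools
import math

def occupancy_grid(atoms, box_min, box_max, spacing):
    """Return a flat list of 0/1 flags, one per voxel, marking whether the
    voxel's center lies inside any atom's sphere.

    `atoms` is a list of ((x, y, z), radius) pairs; treating a molecule as a
    union of hard spheres is a simplifying assumption.
    """
    n = math.ceil((box_max - box_min) / spacing)  # voxels per axis
    centers = [box_min + (i + 0.5) * spacing for i in range(n)]
    grid = []
    for point in itertools.product(centers, repeat=3):
        inside = any(math.dist(point, center) <= radius for center, radius in atoms)
        grid.append(1 if inside else 0)
    return grid

# Hypothetical two-atom "molecule" (radii in angstroms) in a 10-angstrom box.
atoms = [((4.0, 5.0, 5.0), 1.7), ((6.0, 5.0, 5.0), 1.7)]
for spacing in (2.0, 1.0, 0.5):
    print(spacing, len(occupancy_grid(atoms, 0.0, 10.0, spacing)))
```

Halving the spacing multiplies the number of features by eight (125, then 1,000, then 8,000 here). When there are far fewer training molecules than features, a statistical method can fit the training set almost perfectly while predicting new molecules poorly, which is what overfitting means.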

Ajay invented a much better shape representation, blindingly obvious in retrospect. (This was an instance of his trying the simplest thing first, and finding it worked.) It simply consisted of the distances from each of a set of fixed reference points to the nearest point on the molecule’s surface.
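Here is a sketch in the spirit of that representation, not Ajay’s actual implementation. It again assumes a hard-sphere molecule, plus reference points that all lie outside it. Under those assumptions, the distance from a point to the nearest surface point is just the smallest gap between the point and any atom’s sphere. The names `surface_distance_features` and `refs`, and all coordinates, are invented for illustration.

```python
import math

def surface_distance_features(atoms, reference_points):
    """For each fixed reference point, return the distance to the nearest point
    on the molecule's surface.

    Assumes the molecule is a union of hard spheres, given as ((x, y, z), radius)
    pairs, and that every reference point lies outside it. For such a point, the
    nearest surface distance is min over atoms of (center distance - radius).
    """
    features = []
    for point in reference_points:
        gap = min(math.dist(point, center) - radius for center, radius in atoms)
        if gap < 0:
            raise ValueError(f"reference point {point} lies inside the molecule")
        features.append(gap)
    return features

# Same hypothetical two-atom "molecule" as a grid would use.
atoms = [((4.0, 5.0, 5.0), 1.7), ((6.0, 5.0, 5.0), 1.7)]
# Hypothetical fixed reference points placed around it.
refs = [(0.0, 5.0, 5.0), (10.0, 5.0, 5.0), (5.0, 10.0, 5.0), (5.0, 5.0, 0.0)]
print(surface_distance_features(atoms, refs))
```

Four reference points give four features, however detailed the molecule, and every one of them is a measurement of the surface.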

One reason this worked (dramatically well, we showed) was that *every* measurement was directly relevant to what matters: the shape of the surface. In the voxel grid representation, nearly every measurement either tells you “this voxel is not part of the molecule” (in which case you don’t care) or “this voxel is somewhere inside the molecule” (but probably not on the surface, so again it doesn’t matter).
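To put rough numbers on that, this sketch sorts the voxels of a hypothetical hard-sphere grid into three buckets: empty; interior (occupied, with all six face neighbors also occupied); and boundary (occupied, with at least one empty or off-grid neighbor). Only the boundary voxels say much about the surface. The function `classify_voxels` and the toy molecule are assumptions made for this example.

```python
import itertools
import math

# Offsets to the six face-adjacent neighbors of a voxel.
NEIGHBOR_STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def classify_voxels(atoms, box_min, box_max, spacing):
    """Count empty, interior, and boundary voxels for a hard-sphere molecule.

    `atoms` is a list of ((x, y, z), radius) pairs. A voxel is 'boundary' if
    its center is inside the molecule and at least one face neighbor is
    outside it (or off the edge of the grid).
    """
    n = math.ceil((box_max - box_min) / spacing)  # voxels per axis

    def is_occupied(index):
        point = tuple(box_min + (i + 0.5) * spacing for i in index)
        return any(math.dist(point, center) <= radius for center, radius in atoms)

    occupied = {idx: is_occupied(idx) for idx in itertools.product(range(n), repeat=3)}
    counts = {"empty": 0, "interior": 0, "boundary": 0}
    for (i, j, k), full in occupied.items():
        if not full:
            counts["empty"] += 1
        elif all(occupied.get((i + di, j + dj, k + dk), False)
                 for di, dj, dk in NEIGHBOR_STEPS):
            counts["interior"] += 1
        else:
            counts["boundary"] += 1
    return counts

# Hypothetical two-atom "molecule" (radii in angstroms) in a 10-angstrom box.
atoms = [((4.0, 5.0, 5.0), 1.7), ((6.0, 5.0, 5.0), 1.7)]
counts = classify_voxels(atoms, 0.0, 10.0, 0.5)
total = sum(counts.values())
print(counts, f"boundary fraction: {counts['boundary'] / total:.3f}")
```

For this toy molecule, the empty bucket dominates and boundary voxels are a small fraction of the 8,000 inputs. For larger molecules or finer grids, the interior count tends to grow faster than the boundary count, since volume scales faster than surface area.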

So this is another instance of the principle that a good problem formulation is one that exposes the information relevant to the solution, and eliminates information that is irrelevant and results in meaningless complexity.

## Conclusions

Violating my main advice, this rambling brain dump included lots of irrelevant details (like how drugs work), and also failed to expose most of the key information you’d want (like interestingly specific heuristics for figuring stuff out).

In an attempt to salvage *some* value, let me try and make some of the main points again, concisely:

- Figuring stuff out is *way* hard.
- There is no general method.
- Selecting and formulating problems is as important as solving them; these each require different cognitive skills.
- Problem formulation (vocabulary selection) requires careful, non-formal observation of the real world.
- A good problem formulation includes the relevant distinctions, and abstracts away irrelevant ones. This makes problem solution easy.
- Little formal tricks (like Bayesian statistics) may be useful, but any one of them is only a tiny part of what you need.
- Progress usually requires applying several methods. Learn as many different ones as possible.
- Meta-level knowledge of how a field works—which methods to apply to which sorts of problems, and how and why—is critical (and harder to get).

If I had more time, I could do better. But, figuring out how to figure stuff out is *even way harder*. This is where the LessWrong internet collaborative approach shines brilliantly. It really needs to be a community effort.

Maybe we could start in the comment stream for this page?

How do you think about thinking? What heuristics have you found useful?

1. That was thinking about thinking about thinking. “Anything you can do, I can do meta,” AI folks often say. But I can do it meta meta!
2. My dissertation advisor wrote a book that got translated into Russian, and then translated back into English with the title “How To Hack Lisp Real Good”. He thought that was very funny and posted it on his office door.
3. The analogy is with NP-completeness.
4. Neither term is well-defined. The Stanford Encyclopedia of Philosophy gives two definitions for epistemology. The narrow definition is the study of “justified true belief”—an impoverished and unworkable framework. The wide definition is “issues in the creation and dissemination of knowledge in particular areas of inquiry.” “Rationality” is even less well defined, but often involves the use of formal, mathematical tools. This post is mostly about that.
5. Not above the level of elementary particles, anyway.
6. And he was the most important physicist of the mid-20th century, which makes him harder to argue with than me!
7. A unified field theory would, but only at a level that is useless for nearly all practical purposes.
8. And on charge distribution, and other factors; I’m simplifying this story because otherwise it will take forever, and you don’t care.
9. This is why I am a student of Ngak’chang Rinpoche, who is the only Buddhist teacher I’ve met who can do that.