# Comments on “How To Think Real Good”

Comments are for the page: How To Think Real Good

### Stuff

Yeah, I feel the same as Kaj. The example I was going to use was mathematics as a whole: “People say math is important. And one example given of math being important is that you can use it in baking, for example adding half a pound of sugar to another half pound of sugar. But this is only a tiny part of the problem. First you have to obtain the sugar, which may require growing sugarcane or locating a supermarket. Then you have to measure it. And sugar isn’t even an ontologically fundamental concept - defining ‘sugar’ is prior to this whole enterprise! So overall I think math is only a tiny part of baking, which means mathematicians are silly to ascribe so much importance to it, which means anyone who tries to teach math is part of a cult.”

As for your own epistemology, it reminds me a lot of virtue ethics, so much so that I’m tempted to call it “virtue epistemology”.

My critique of virtue ethics (see the part in italics after the edit here) seems like it could fit here as well (oh, man, I actually used that same grammar metaphor - apparently I am nothing if not predictable). Yes, the natural human epistemological faculty is extremely complicated and very good at what it does and any mathematization will be leaving a lot out. On the other hand, I think it is good to try to abstract out certain parts and fit them to mathematical formalism, both for the reasons I describe here and because a very inadequate partial understanding of some things is the first step towards a barely adequate understanding of more things.

Since I’ve already stretched that poor grammar metaphor to its limit, I’ll continue Kaj’s discussion of physics. Imagine if someone had reminded Archimedes that human mental simulation of physics is actually really really good, and that you could eyeball where a projectile would fall much more quickly (and accurately!) than Archimedes could calculate it. Therefore, instead of trying to formalize physics, we should create a “virtue physics” where we try to train people’s minds to better use their natural physics simulating abilities.

But in fact there are useful roles both for virtue physics and mathematical physics. As mathematical physics advances, it can gradually take on more of the domains filled by virtue physics (the piloting of airplanes seems like one area where this might have actually happened, in a sense, and medicine is in the middle of the process now).

So I totally support the existence of virtue epistemology but think that figuring out how to gradually replace it with something more mathematical (without going overboard and claiming we’ve already completely learned how to do that) is a potentially useful enterprise.

### Confusion about differences

I would like to start by pointing out that it’s really hard to understand claims that someone else makes about epistemology. We like to understand what someone else says through our own lens of how the world works. People don’t change their own epistemology fundamentally after reading a single blog post that illustrates a weakness in their way of viewing the world.

Most people take months for a process like that instead of a few hours.

In my perspective there seems to be a clear disagreement.

Eliezer says that Bayesianism is necessary while Chapman says it isn’t.

Chapman seems to argue that it’s often useful to reject the use of Bayes’ formula to reduce the complexity of a model of a problem.

According to Chapman that’s even true for some AI problems that are completely formalized in math.

Chapman described how he sometimes uses anthropology. If I look at how LessWrong relates to anthropology, I find post titles like “Anthropologists and ‘science’: dark side epistemology?” by Anna Salamon.

There are problems that are more likely to be solved by learning to be a better anthropologist than by learning to be a better bayesian.

On another note, the concern that it is problematic to boil down all concern about uncertainty to a single number is also absent from LessWrong.

Nassim Taleb writes a lot about how people become irrational by trying to treat all uncertainty as a matter of probability.

Scott writes in his latest post: “An Xist [Bayesian] says things like ‘Given my current level of knowledge, I think it’s 60% likely that God doesn’t exist.’ If they encounter evidence for or against the existence of God, they might change that number to 50% or 70%,” instead of saying “You can never really be an atheist, because you can’t prove there’s no God. If you were really honest you’d call yourself an agnostic.”

In a recent Facebook post Taleb wrote: “Atheists are just modern versions of religious fundamentalists: they both take religion too literally.” By this he means that atheists make an error because they think that religion is really about God.

If you start by thinking about the probability that God exists, you still treat the question as being about God, and you block yourself from framing the issue differently and perhaps getting a different understanding of it.

That also has interesting implications for whether LessWrongism/Bayesianism is a religion ;)

### One important distinction

I generally agree with what Kaj Sotala says. Bayes isn’t the whole picture, there’s obviously lots of other important parts to thinking, particularly things like forming the correct categories, asking useful questions, and deciding what to do with your answers. Obviously individual cases of thinking, in particular solving toy problems, may have the solutions to some or all of these steps handed to you on a platter, so only some subset are required.

I mentally picture thinking as a serial process, with steps like noticing the presence of an interesting problem at the start, steps like formulating a good vocabulary further in, updating on evidence still further in, and stuff like deciding what to actually do near the end. Trying to use Bayes’ theorem for any step other than updating_on_evidence makes about as much sense as modifying your car to use its engine as a wheel, but concluding that Bayes is therefore bad makes about as much sense as throwing out the engine because it isn’t circular enough.

Where we might diverge, if we diverge at all, is that I think Bayesian thinking (preferably applying the theorem, otherwise using it as an approximation or a constraint) is the right way to do an important part of the thinking process. If you do something else instead, you will get worse results.
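A minimal sketch of the one step this comment assigns to Bayes, updating on evidence, assuming a single yes/no hypothesis H and one observation E. The function name `bayes_update` and the example numbers are illustrative assumptions, not anything from the thread.

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return the posterior P(H | E) using Bayes' theorem.

    prior            -- P(H), the belief before seeing the evidence
    p_e_given_h      -- P(E | H), how expected the evidence is if H is true
    p_e_given_not_h  -- P(E | not H), how expected the evidence is if H is false
    """
    for p in (prior, p_e_given_h, p_e_given_not_h):
        if not 0.0 <= p <= 1.0:
            raise ValueError("all inputs must be probabilities in [0, 1]")
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    p_evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    if p_evidence == 0.0:
        raise ValueError("the evidence is impossible under both hypotheses")
    return p_e_given_h * prior / p_evidence


# Hypothetical example: start at 60% and observe evidence that is twice as
# likely if H is true (0.8) as if it is false (0.4).
posterior = bayes_update(0.6, 0.8, 0.4)
print(f"{posterior:.2f}")  # prints 0.75
```

Evidence that favours H by a factor of two moves 60% to 75%; swapping the two likelihoods moves it down to about 43% instead. Everything before this step (choosing H, deciding what counts as E) is taken as given, which is exactly the division of labour described above.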

There is, however, an important distinction to make between Bayes and the content of this post. Bayes is mathematised. It is both precise and provably optimal for its task. What you give here are heuristics; they are probably better than what people do naturally, but not precise enough for the question “are they optimal?” to even be meaningful. This is not a criticism, since you never claimed they were anything else, but it is a distinction.

I do not claim to have a mathematically precise and optimal method for most stages of the thinking process, but where we might diverge is that I (and I think most LessWrongers) would like to have one and think there is reasonable hope of getting one, whereas many people (including you?) either don’t think such a technique can exist or wouldn’t want it to.

This is what makes Bayes (the theorem combined with the idea of beliefs as probabilities) so important. It serves both as an example of what we want and as evidence that this might be a sane thing to look for: if we did it once, we’re more likely to be able to do it again (look! Bayes! … sorry).

In addition, I suspect that because the updating_on_uncertain_evidence part of the problem is solved, it looks easy, at least to smart people like you. The finding_a_good_way_of_talking_about_the_problem part hasn’t been solved, so it looks hard. This makes Bayes look less important than it is, because it only solves the ‘easy’ bit of the problem. The mistake in this thinking should be obvious.

I also don’t think I’m taking an atypical view for LessWrong anywhere here, since I’m pretty much just rephrasing some posts from the sequences (which sadly I can’t find). If you think that the above needs to be emphasised more, then I fully agree, and will add it to my long list of ‘things that aren’t very good about the actual day-to-day practice on the LessWrong message board’.

### The Best Introduction

First off, I want to say thanks for posting this as is, rather than spending more time editing it. I don’t think your worries about not spending enough time writing it were founded; I think it was clear enough to get your point across and I enjoyed the flow.

I agree with Kaj that what you’ve written so far would fit well into the LW consensus. I’ll also add my specific encouragement to post it on LW. I came to LW, was excited, but then looked around and asked myself “where’s the decision analysis?” It’s an existing academic and professional field devoted to correctly making decisions under uncertainty, which seems like it should be a core component of LW. I was familiar with it from my academic work, and realized that the easiest way to get it onto LW was to put it there myself. So I wrote a sequence on the mechanics of decision analysis, and reviews of important books related to DA, and they’ve all been positively received. I’m also exposed to much more in the way of potential collaborators and other interesting ideas from being on LW regularly.

In particular, I want to comment on:

Understanding informal reasoning is probably more important than understanding technical methods.

I’d modify this slightly; the self-help version is **improving your informal reasoning is probably more important than improving your technical methods**. In particular, the relevant LW post is The 5-Second Level, which argues that in order to actually change rationality in the field, you need perceptual triggers and procedures, rather than abstract understanding that is not obviously useful and thus never gets used. This quote in particular might interest you:

The point of 5-second-level analysis is that to teach the procedural habit, you don't go into the evolutionary psychology of politics or the game theory of punishing non-punishers (by which the indignant demand that others agree with their indignation), which is unfortunately how I tended to write back when I was writing the original Less Wrong sequences. Rather you try to come up with exercises which, if people go through them, causes them to experience the 5-second events - to feel the temptation to indignation, and to make the choice otherwise, and to associate alternative procedural patterns such as pausing, reflecting, and asking "What is the evidence?" or "What are the consequences?"

And now I’ll finally get around to the subject of this comment: I agree with you that Bayes is probably not the best introduction to LW-style rationality, because it’s only part of it, and not necessarily the core part. I’m not sure what the best introduction is, though. A common one seems to be heuristics and biases, but that’s not that great either, because the goal is not “find out how you’re dumb” but “find out how to be as smart as possible.” Finding errors to fix is important but insufficient.

What I think is best currently is just introducing the idea of ameliorative psychology (I was introduced to the term by Epistemology and the Psychology of Human Judgment, which I had read before finding LW). Basically, the premise seems to be “thinking real good is a skill that can be studied, researched, and developed.” If you could tell people in 15 minutes the fundamental secret that would make them awesome, there would be many more awesome people running around, but I think that in 15 minutes you can tell people “hey, decision-making is potentially your most important skill, and you might want to actively develop it.”

### @David

@David

I was tempted to just write “what everyone else said”, but instead of joining the agreement circlejerk, I’ma instead complain that even though I get what you mean, and I like the references, I still want moar.

It’s a flaw of the existing sequences that they already assume more-or-less complete formulations of what the question is, and only care much about picking the right choice once you already have some idea of what is sensible and relevant to the question. (The only real counter-example I can think of off the top of my head is Harry in chapter 16 of HPMOR enumerating as many ways as possible to use the objects in a room for combat purposes.) I don’t think most people on LW would substantially disagree with any of these points, or deny that this flaw exists, but still, so far no one has really complained much about it or attempted to fix it.

So I’d still like to see worked examples of problem formulation, if only because I’m a selfish jerk who wants to get as much practice out of you as possible. Thus, I’m totally unconvinced and disagree with everything, you’ll have to convince me more, with concrete examples and more anecdotes, and last time I counted the number of your long-term projects is still single-digit, so why aren’t you writing more books already?

(This is of course the same tactic as publicly declaring that “X SUCKS AND X-ISTS ARE STUPID, X DOESN’T EVEN Y” when you’re too lazy to figure out how to do Y yourself, ‘cause it will trigger many an X-ist to write such excruciatingly detailed explanations for you.)

### "Catherine G. Evans"

It’s Cath*a*rine, peasant.

### Virtue Epistemology Is A Thing

You, the generic reader, can read more about standard virtue epistemology in encyclopedias :-)

As a sometime student in this area who is aware that her idiosyncratic fandom puts her in the minority, I think it might be helpful to explain why I bother with it. In a nutshell, I think human beings can be understood by assuming they use a lot of reputational reasoning, and I think reputational reasoning is understudied relative to its importance, possibly because the area is both controversial and large, and hence hard to make progress in.

The more totalizing meta-ethical approaches both seem to help structure existing human debate practices that are relatively formal, like “planning stuff out as a group in order to achieve the goals of the group” (consequentialism) and “writing and interpreting laws for settling disputes” (deontology), and then for purposes of philosophy they turn the idealism up to 11 by “aiming for utopia” and trying to figure out “God’s laws”. Then the idealized reasoning can be backported into more pragmatic areas and used extensively in the day to day application of power.

My interest in virtue ethics stems from the observation that a huge amount of how actual people seem to actually behave in non-governmental contexts (e.g. as individuals and in small groups) is significantly accounted for by highly social reasoning (at least partially implemented by instincts) that builds off of an expectation that teams win against individuals and that the fundamental attribution error is a shortcut that many of the agents in the system will use and expect others to use. Theories of attribution, imputation, and reputation become relevant, and it seems that human beings do a lot of tacit reasoning about the extended consequences of these processes. From within discourse processes you can see evidence for this, as with exhortations to think in terms of goals rather than roles. Advice to think in terms of roles rather than goals is less common because people are often already doing that.

Or another example: the old forms of rhetoric included not just logos (appeals to logic) and pathos (appeals to passion) but also ethos (appeals from the character of the person speaking)… however, children don’t have very much distinctive character to practice and older people are usually less receptive to teaching. If you put ethos in your text in modern times, even justifiably (rather than leaving it safely in your subtext), you’ll often be accused of a “logical” fallacy (appeal to authority or ad hominem are the usual ones), especially if your interlocutor is arguing in bad faith… which is of course a conclusion people leap to so often, with such bad consequences, that some communities have formal advice to do the opposite. The whole area is ripe for trolling because it is emotionally and intellectually complicated.

If there is an obviously singularly awesome way to explicitly think about reputation in general and epistemic reputation in particular, it has not been formalized yet, to my knowledge.

Thus, if you want to understand and engage with most actual people *according to an explicit and helpful theory* rather than by feel, descriptive empirical virtue ethics research is probably indicated. Scott thinks virtue ethics isn’t that useful but I think that’s because he wants his theory to tell him which policies to insightfully advocate more than he is trying to formally Solve Ethics before the singularity arrives, while (when I’m in a rare crazy heroine mood) the latter seems more urgently important to me than the former.

A barrier to research (and potential object of study in itself) in virtue ethics in general and epistemic virtue ethics in particular is that there are a lot of conflations and confusions that happen in the tacit reasoning people actually engage in and in various theoretical accounts of it. If someone had a good theory, it would sort of be about who is the most authoritative (and thus who should be deferred to in a specific debate that falls within their jurisdiction?). It would also sort of be about who is best at discovery (and thus who deserves research grants?). It would also sort of be about pedagogic ideals (and thus curriculum design?). It would also sort of be about “fitness to rule”, which is an independent reason to predict that the subject will make people crazy :-(

For what it is worth, I thought David’s original essay was full of interesting thinking on the operationalized pedagogic side of the issue of “how to think good”, and I’m grateful to see it collected together (I’ve bookmarked a few things for later study). And I am sympathetic to the difficulty of talking about the general subject area (which sort of inherently involves various status moves) in public forums where all kinds of signalling/marketing/status stuff that relates to real world resource allocation is happening simultaneously. It seems possible that even if everyone party to the discussion was a Bodhisattva and knew that the others were as well, there might still be somewhat complicated interactions because of predictable pragmatic consequences of not doing at least a bit of zigging and zagging. Consider how weird dharma combat looks.

Despite the various constraints people are working within, I also suspect the ongoing back and forth process (including the previous posts on various blogs) has been educational for a lot of lurkers and is praiseworthy on that basis alone :-)

### Model comparison

You complain that probability theory “collapses together many different sources of uncertainty”. It doesn’t. People do that. Probability theory is perfectly capable of separating all of these, if programmed by a willing and well-educated user.

You say that you solved your brick-stacking problem using no probability theory, only logic, but have you noted that binary logic is a limiting case of probability theory? Further, have you noted that zero uncertainty on any issue (needed for Aristotelian logic to work) is typically only an approximate model for how well we really know the world?
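The limiting-case claim can be checked for one rule, modus ponens, using the law of total probability. This is a standard derivation offered as an illustration, not something from the original comment; ε and δ are symbols introduced here for the residual doubt about A, written P(A) = 1 − ε, and about the rule “if A then B”, written P(B | A) = 1 − δ.

$$
\begin{aligned}
P(B) &= P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A) \\
     &\ge P(B \mid A)\,P(A) = (1-\delta)(1-\varepsilon) \;\ge\; 1 - \varepsilon - \delta .
\end{aligned}
$$

Setting ε = δ = 0 gives P(B) = 1, which is Aristotelian modus ponens; for small nonzero ε and δ the conclusion is only very probable, which is the sense in which zero uncertainty is an approximation.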

You seem to characterize the typical Bayesian as somebody who believes that all inference must explicitly involve Bayes’ theorem, but I think you’ll struggle to actually find many people who hold that view. What I believe (along with many others) is that the success of any procedure of inference is derived from its capacity to mimic the outcomes of probability theory. Furthermore, we can often gain insight into the value of some heuristic procedure by comparing its structure to that of probability theory.

I applaud your efforts to develop efficient heuristics for problem solving. Problem formulation is certainly a fascinating and highly important topic, and simply being Bayesian doesn’t guarantee an effective toolkit for this. Your analysis of the relationship between probability theory and problem formulation is flawed, though.

You say:

“I’ll talk in general about problem formulation, because that’s an aspect of epistemology and rationality that I find particularly important, and that the Bayesian framework entirely ignores.”

In a comment on another thread you directed me here, saying: “Choosing a good hypothesis space is typically most of the work of using Bayes’ Theorem.” Certainly, without a suitable hypothesis space, we are stuck. To test whether a set of hypotheses is appropriate for solving some problem, we will (usually informally) look at the various P(D|H) - if they are all extremely low, then we should try again. But is there a way to formalize this procedure? Some methodology that our informal methods seek to approximate? By golly, I believe there is. It’s called Bayes’ theorem. See, for example, my articles on model comparison and model checking.
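To make the P(D|H) comparison concrete, here is a minimal sketch assuming a toy problem that is not from the thread: a coin is tossed 10 times, and two hypotheses are compared, “the coin is fair” and “the coin has some unknown bias, uniformly distributed a priori”. The function names are hypothetical. The Bayes factor is the ratio of the two marginal likelihoods P(D|H); multiplying it by the prior odds gives the posterior odds.

```python
from math import comb


def evidence_fair_coin(heads: int, tosses: int) -> float:
    """P(D | H_fair): probability of exactly `heads` heads if the bias is 0.5."""
    return comb(tosses, heads) * 0.5 ** tosses


def evidence_unknown_bias(heads: int, tosses: int) -> float:
    """P(D | H_unknown): binomial likelihood averaged over a uniform prior on the bias.

    The integral over p in [0, 1] of C(n, k) p^k (1 - p)^(n - k) equals 1 / (n + 1),
    so every head count is equally probable under this hypothesis.
    """
    return 1.0 / (tosses + 1)


def bayes_factor(heads: int, tosses: int) -> float:
    """Ratio P(D | H_fair) / P(D | H_unknown); values above 1 favour the fair coin."""
    if not 0 <= heads <= tosses:
        raise ValueError("need 0 <= heads <= tosses")
    return evidence_fair_coin(heads, tosses) / evidence_unknown_bias(heads, tosses)


for heads in (5, 9):
    print(heads, "heads in 10 tosses: Bayes factor", round(bayes_factor(heads, 10), 3))
```

With 5 heads the factor is about 2.7 in favour of the fair coin; with 9 heads it is about 0.11, so the data favour the unknown-bias hypothesis by roughly nine to one. If every hypothesis in the space gave the data a tiny P(D|H), that would be the signal to go back and reformulate.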

There is actually an active sub-discipline, within ‘Bayesian’ research, attempting to develop efficient techniques for hypothesis formulation. You should find out what is being done within a framework, before declaring what it ignores.

### Categorization is the true 'Beginning of Infinity'

Hey Tom, when you have a hammer, everything looks like a nail. I wish Bayesians good luck trying to use Bayes’ theorem to formalize a procedure for testing the hypothesis space (‘model comparison’ and ‘model checking’). They won’t succeed, because the process of problem formulation can never be reduced to probabilistic reasoning.

The main usage of probabilities to date has been prediction, but as David Deutsch convincingly argues in his two superb books ‘The Fabric of Reality’ and ‘The Beginning of Infinity’, science is not primarily about prediction, it is about *explanation*.

Yes, binary Aristotelian logic is indeed just a special case of Bayesian (probabilistic) logic, but this sort of thing should actually be highly alarming to Bayesians, because it demonstrates that a framework that everyone thought had ‘solved’ logic for centuries was, in fact, just a special case of something far more powerful, namely probabilistic (Bayesian) logic. What makes you think that Bayesianism won’t suffer the same fate?

“What framework could be more general than probability theory?” you cry. I can suggest an answer: Categorization/Analogical Inference.

Picture a mind as a space, with ‘the laws of mind’ analogous to the principles of cognitive science.

Now in this ‘mind space’, picture the ‘mind objects’; I suggest these are logical predicates, symbolic representations of real objects. How do these ‘mind objects’ interact? I suggest picturing ‘mind forces’ as analogous to the ‘strengths of relationships’ between the mind objects (predicates or variables), so ‘mind forces’ are probability distributions. But what about the background geometry of mind space? I suggest picturing ‘curvatures’ in the geometry of mind space as analogous to concepts (categories or analogies).

Then symbolic logic is the laws governing the mind objects (rules for manipulating predicates), Bayes (probability theory) is the laws governing the mind forces (rules about probability distributions), and analogical inference (categorization) is the laws governing the geometry of mind space itself (concept learning and manipulation).

Bayes is but a stepping stone; Categorization is the real ‘Beginning Of Infinity’.

### Single-number probabilities, etc

David, thanks for this post. The “brain dump” worked out well, and I think the comments show that thoughts were nicely provoked.

Christian notes “the concern that it is problematic to boil down all concern about uncertainty to a single number is also absent from less wrong.” I want to add a few more reasons to limit the application of Bayesian calculations. In the terms of Scott’s blog post, I advocate keeping one foot in Aristotelian certainty and one foot in “Wilsonian” relativism, while keeping a solid footing in probabilistic beliefs. (I’m a quadruped, or maybe a millipede.)

On the Aristotelian side, there’s plenty of stuff we just flat-out believe, and that’s as it should be. There’s no need to put a number on it, nor does that mean that the belief isn’t revisable. You just revise it when the need arises, without specifying in advance how much evidence would call for such revision.

On the relativistic side, sometimes just saying “I don’t know” is better than saying “50-50 chance” or any other number. This isn’t *just* to say that pulling numbers from one’s nether regions is too much bother with not enough payoff. After all, Bayesianism and (my real target) various utility theories are supposed to be normative, not prescriptive. So it’s no mark against them to admit that putting numbers on beliefs and crunching them in the Formula is seldom necessary. Rather, the point is that even if you had all the time in the world, or just a burning epistemological curiosity, sometimes there is still no point in putting odds on a proposition. We don’t always have to bet. We don’t always have a uniquely appropriate reference class with known statistics. There just isn’t any sufficiently general reason to generate probabilities, that I am aware of. And I’m willing to bet :) that the reason for the unawareness is the nonexistence of that sufficiently general reason.

### Physics

Marc,

when the methods of science work, it is because they successfully mimic (and sometimes even directly employ) probability theory. To say that “the main usage of probabilities to date has been prediction” is a gross oversimplification. See my posts Total Bayesianism and Natural Selection By Proxy as simple explanatory examples.

There may be an effective theory of inference that is more general than probability theory, but from your description, it won’t be categorization inference. You present this theory as a framework for modeling mind spaces and mind objects. That’s fine, if it really works, but what I’m concerned with is much more general than that: real spaces, and real objects, of which mind spaces and mind objects are quite a small sub-population. Since probability theory handles this wider scope quite well, I dare say whatever your theory of minds can get right, probability theory can replicate. These are physical entities we are talking about; if you want to infer their mechanics, observations and probability theory will do just fine.

### real-world vs. abstract reasoning

David,

Thanks for taking the time to reply in detail. You ask if I think that differential calculus is part of probability theory: No, it’s not. In fact differential and integral calculus are indispensable for deriving much of probability theory.

Yes, we can do some sort of inference with differential equations, but that inference is strictly limited to an abstract domain – the entities we learn about do not actually exist. When I say that probability is the ultimate model for solving problems, I really mean problems of inference (*including* problems of decision – inferring, ‘what is the correct thing to do here’) about the real world. You can’t solve such problems without using something that at least approximates probability theory.

You promise a counterexample involving airplane design, but you fail to deliver. No, I’m not saying that fluid mechanics and what-not are the wrong tools to employ, but these are models of reality. Inferring appropriate models and assigning values to the coefficients in such models are issues of judgment under uncertainty. Furthermore, the expected cost of getting the model or its parameters wrong is not known precisely. If we are strictly honest with ourselves, we will acknowledge the very small but non-zero probability associated with the divine-intervention theory of flight, and we will recognize that our knowledge of the physical parameters of any suitable model is best described by a probability distribution. We don’t need to explicitly use Bayes’ theorem in this case, but that is only because the probability distributions are sufficiently narrow (actually, would be sufficiently narrow, if we could be bothered with the analysis; probability distributions do not exist on their own). Thus, Bayes’ theorem is not the ‘best tool for the job’ in this case.

Thus, when we learn something about the real world using pure mathematics, it is probabilistic learning: e.g. if I build an airplane using this procedure, it’ll probably fly. We need probability, or some workable approximation, to bridge the gap between abstract reasoning and understanding of the real world.

Incidentally, if you wonder how we could ever have inferred the principles of abstract reasoning (needed to derive probability theory), I would suggest that our ability to effectively do so comes from using procedures that can be modeled as approximately probabilistic. Probability theory works because it works. Human brains work because they approximate probability theory. Abstract reasoning works because it is derived by human brains, figuring things out under uncertain empirical input.

You say that predicate calculus contains probability theory. Granted. But if you know a way to actually learn something about the real world using elements of predicate calculus that are not probability theory in disguise, then please show it to us. There are some simple statements that seem to be demonstrable in this way, such as the cogito, but if you make use of logic, how do you know that your formalism will deliver the truth?

I’m afraid all the references I have at the moment on problem formulation relate to model comparison, which I see as a major component, though you evidently don’t. I remember reading something that tried (not massively successfully, as I vaguely recall) to go deeper, in terms of how to actually originate hypotheses efficiently, but I can’t find it now, sorry. But let’s consider a problem like inferring the probable cause of global warming. It is inefficient for us to include in our hypothesis space something like “global warming is caused by this grain of sand.” Why am I confident in saying that? Because nobody has ever observed a correlation between the identified sand grain and global warming, or even any especially analogous correlation. (I might be wrong to eliminate this hypothesis, but the expectation is that I am right; hence the efficiency of not including the idea.) How do we arrive at a principle like that? It is something overwhelmingly reinforced by our experiences from the moment we get pushed out of somebody’s uterus. What is the model for learning from experience? Probability theory.

One last thing I’d like to emphasize: when I say that model comparison is crucial, I don’t mean that we must laboriously go through Bayes’ theorem each time we want to make progress. Valuable heuristics exist, and probably even more potent heuristics await discovery, but again, their success is, or will be, derived from their ability to mimic probability theory. We can use this fact to guide our search for these heuristics, to prove their value, and to establish their limited domains of applicability.

### Newsflash: there are no gods!

David,

We certainly have much common ground. Hopefully my following long-windedness will not break your blog:

Your clever analogy gives me the perfect opportunity to explain more clearly what probability theory (PT) is. I don’t need to explain the privileged status of my god. PT is just another formal model, like all other mathematical tools. It is necessarily approximate (even at the quantum level), and like all other models it refers only to abstract entities. I alluded to this earlier, when I said that probability distributions do not exist on their own.

It’s what probability is a model of that makes it the king of theories: it is a model of what a rational agent is entitled to believe about the real world. This is why I say that it is a bridge: I could have the most sophisticated physical theory, with the most elegant mathematics ever devised, yet without any basis for attaching credence to it, declaring it to be the truth would be perverse.

The abstract entities that probability speaks of are rational agents. These idealized entities need to know certain things with absolute certainty. This doesn’t describe real objects. (I discuss this in Extreme values: P = 0 and P = 1, which supports your apparent feeling that some Bayesians neglect the theory-ladenness of their art – I sympathize with your wish to correct technically competent people who make such rookie mistakes.)

In particular, the modelled rational agent must be absolutely confident (and rationally so) that some background information (a set of modeling assumptions) is true. Probability provides no means to guarantee this. You have correctly identified this, but you seem to see it as proof that there must be another theory, waiting to be developed, capable of plugging this gap. What you are asking for is omniscience.

This kind of infallible knowledge is impossible – what would it mean? How would I know that my knowledge was infallible, and not just a mistaken perception of infallibility? The omniscient deity disappears up its own backside, in a puff of logical incoherence.

This uncertainty about the validity of the hypothesis space, etc. is handled quite gracefully by PT, though – if I’m worried that my search space is badly formulated, I just apply the theory recursively, starting from a higher level. (This shows that there is actually an alternative to infallible knowledge, but it’s also impossible: calculation on an infinitely deep hierarchy.)

You say that an ability to recognize useful analogies (correlations) is the hard part of learning, and I couldn’t agree more. But I see no credible reason to even suspect that this is not part of learning by experience. For humans that experience can be assumed to be a combination of personal experience and evolutionary history (hardware predisposed to learn efficiently).

You say that “probability theory by itself is definitely not a full theory of learning from experience.”

Let me be pedantically clear:

(1) I said that PT is the model for learning from experience. By this, I don’t mean that PT describes the microscopic mechanics of brains or digital circuits, or whatever other devices you have in mind, much less that it describes biological evolution by natural selection! When I say model, I really mean aspiration: what would a rational mind, exposed to the same information, infer? PT prescribes reasoning that is optimal in a limited sense (computational overheads are not taken into account), such that if the output of some real machine differs significantly from a probabilistic calculation, then we should expect that our machine has not got the most out of the data. (This is subject to caveats about theory-ladenness, but recall that if we suspect such a difference can reasonably be accounted for by differences in effective model assumptions, we can check that without needing any technology beyond probability.)

(2) In order for PT to do any work, there needs to be some hardware – a substrate capable of implementing mathematical operations. Furthermore, in order to allow learning from experience, it is obviously necessary to have some mechanism for that hardware to receive input. PT, literally by itself, is literally nothing. Apart from these obvious prerequisites, you give no evidence that probability “by itself is definitely not a full theory” (i.e. an insufficient model for optimal inference), nor any reason to suspect this in the slightest.

Finally, you ask why I am so confident that PT successfully models rational inference. As described above, you cannot remove all uncertainty – probability is reasoning under uncertainty, so that’s a good start. My confidence (actually not 100% confidence, but I have found no credible reason for non-negligible doubt) in probability as the appropriate theory comes from the obvious undeniability of Jaynes’ desiderata, from which he derived the entirety of PT (obviously making use of pre-existing maths). These desiderata are:

- plausibility of propositions to be encoded using real numbers
- qualitative correspondence with common sense
- all ways of answering the same question must provide the same result
- all relevant evidence is taken into account
- equivalent states of knowledge are represented by the same probability assignment

To deny any of these, we need to demand things that are obviously perverse, such as that our theory should provide different procedures that give different answers to the same question. These desiderata quite simply describe rationality, which is PT’s stated goal.
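As a narrow illustration of the third desideratum (all ways of answering the same question must agree), here is a minimal Python sketch. It is a hypothetical example, not Jaynes’ derivation: a coin whose bias is assumed to be one of three values, updated by Bayes’ theorem. Exact rational arithmetic shows that updating on all the data at once, one flip at a time, or in a different order and grouping yields the same posterior.

```python
from fractions import Fraction

# Hypothetical example: a coin whose bias is one of three values, uniform prior.
PRIOR = {Fraction(1, 4): Fraction(1, 3),
         Fraction(1, 2): Fraction(1, 3),
         Fraction(3, 4): Fraction(1, 3)}


def likelihood(bias, flips):
    """Probability of an exact sequence of flips (1 = heads, 0 = tails)."""
    p = Fraction(1)
    for flip in flips:
        p *= bias if flip == 1 else 1 - bias
    return p


def update(prior, flips):
    """Bayes' theorem: posterior proportional to prior times likelihood."""
    unnormalized = {h: w * likelihood(h, flips) for h, w in prior.items()}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}


data = [1, 1, 0, 1, 0, 1]

batch = update(PRIOR, data)              # all data at once

sequential = PRIOR
for flip in data:                        # one observation at a time
    sequential = update(sequential, [flip])

regrouped = update(update(PRIOR, data[3:]), data[:3])  # different order and grouping

assert batch == sequential == regrouped
print(batch)
```

Exact fractions are used so that the equality check is not muddied by floating-point rounding; with floats the three posteriors would agree only to within rounding error.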

Nobody has given any evidence that PT is deficient in any avoidable way for inference under uncertainty. Nobody will ever demonstrate that uncertainty can be fully eliminated (or fully quantified). After a horribly long-winded explanation, these are the simple facts that account for my confidence in the appropriateness of PT.

Thanks for reading (if you got this far).

Cheers,

Tom

### Rapture of the nerds; Ainslie's cure for rationalism

I’m late to this party, but have to chime in because a puckish fate has landed me in the midst of the Silicon Valley LessWrongers, who congregate at the office of my new job. I have had a hard time holding my tongue at some of the nonsense. You are doing a good job of articulating the issues and problems and being reasonably polite, which I admire.

I don’t think the problem is Bayesianism so much as it is an extraordinarily nerdy picture of what life is all about. Bayesianism is a symptom of this more fundamental underlying problem. The nerd-rationalist views thinking as a matter of (a) figuring out true beliefs and (b) optimizing some utility function, which just is not how normal people spend their time.

Not that there is anything wrong with truth and utility and problem solving; they just don’t form a very complete picture; they don’t provide the universal acid of mind that the cultists would like. The overapplication of Bayesianism is just one manifestation of a more general tendency to glorify abstract and oversimplified thinking at the expense of the rest of life. Nerds tend to live in their brains and treat their bodies as something alien. The nerds of my era (and I am one myself, pretty much) tended to ignore their body entirely; the new breed treats it as another engineering optimization problem, and applies various diet and exercise fads while waiting for the problem to be solved properly by developing the technology to upload their minds into gleaming robot shells.

I am being too negative. In fact I have a complex relationship with all this stuff; I am simultaneously attracted to and repulsed by the hypernerdishness of the cult of rationality. To rescue this comment from being pure ad hominem nastiness: you said somewhere, I think, that you didn’t believe there could be a single simple totalizing model of thought, which I agree with of course. The work of George Ainslie has some tantalizing ideas about why this might be the case; not that I buy it or understand it fully, but his theory is that minds are inherently structured so as to be unpredictable to others and to themselves, for solid practical reasons. But its real strength is that it is a theory of mind that is rooted in desire and action, not abstract model building. Anyway, I recommend it.

### This critique of bayesianism is spot on

I have to admit I’ve learned a lot by reading these few blog posts and a bit of the book itself. After reading a good chunk of what Chapman wrote, I had to admit that I had fallen for a cult.

Which is weird, because I told myself I recognized the “cultish” parts of LW, and the importance of the language/model. But actually, I now see that I had attributed way too much “weight” to the power of Bayes’ rule. It is really just a wrench.

### Not quite sure we are speaking the same language

Hi David

Again, my lack of brevity feels like it borders on rudeness. My apologies.

You write:

My main point is that omniscience is impossible

yet a little later, you complain that,

Applying the theory recursively generates an epicyclic infinite regress, I think; it's unworkable.

It is only unworkable if you insist on trying to be omniscient - being absolutely certain that your model is correct. Why, in your attempted critique of probability theory (PT), do you keep drawing attention to the tentative nature of the modeling assumptions, if you accept that no other theory can eliminate this problem?

The part of this before the parenthesis, and the sentence in parentheses, seem to contradict each other.

Apologies for not being clearer. I meant that if we want to try to remove the weaknesses of PT, there are two avenues we might investigate, (1) omniscience and (2) some infinite computation. Both are impossible to achieve, therefore the weaknesses of PT just have to be lived with.

I'm skeptical that "analogies" is a useful way of thinking about learning.

All learning is a process of identifying useful analogies. Important examples include: analogies between an equation and a physical process, analogies between past behavior and present behavior, and analogies between different physical systems that appear to share important traits: this object looks like a human, therefore I expect it to behave like a human, maybe I should say “hello”. Learning is exactly the process that provides confidence that saying hello is a useful thing to do in the presence of such an object.

You don't seem to have made any argument that PT is a theory of learning. The burden of proof for that is on you; it's not my job to disprove it, unless you have some argument that it is one.

I think you are still talking about something different. PT is not a “theory of learning” (a description of how real machines acquire information); it is a description of optimal learning, given some (unavoidable) assumptions. My argument for this is (1) Jaynes’ desiderata, and (2) the glorious success of the resulting theory: e.g. everything that is good about science can be shown to be an approximation of PT (you’ll perhaps argue that science relies on imagination, but I’ll counter-argue that imagination is useless without some means to compare its products to reality, and any efficient imaginative process is very likely derived from experience), and every case where science goes wrong can be traced to an important violation of PT.

If PT were a model of learning, then the machine learning field would consist of a single well-defined problem

Wrong, PT is not an efficient way to actually solve all inference problems. Even using PT, a suitable set of assumptions is required to set up each problem – this requires trial and error, subject to the randomness of how nature decides to reveal herself. There is no way around this.

How, by the way, do you propose PT is a theory of "recognizing useful analogies", if you think that's the essence of learning?

Um, do you want me to start from the beginning again? Recognizing and implementing useful analogies are exactly the functions of PT.

The last part of your comment is about PT as a theory of inference, not as a theory of learning. Those two are not at all the same thing.

Getting weirder and weirder. May I recommend a dictionary? Am I missing something?

[PT] has nothing to say about logical quantification, which is where serious inference starts.

I’m not completely sure what you mean by logical quantification, or why you think PT doesn’t address it. I assume you are referring to statements like the canonical “all men are mortal, Socrates is a man….,” but this kind of reasoning is derived trivially, if I’m not mistaken, from Bayes’ theorem. Please give an example of any situation where learning takes place in a way that PT can’t encompass (honestly, I’m interested).

Maybe we are wrong, but at minimum you can't just assert it as an unquestionable axiom.

Actually, Jaynes was quite emphatic that it is not an axiom – there is no attempt to assert its truth, hence ‘desideratum.’ I’m somewhat open to the possibility of a useful extension to PT using complex representation (and I privately predicted that this would be the point at which you would attack my *argumentum ad desideratum*), but I have yet to see any strong evidence. PT provides expected frequencies, and I’d need very strong evidence to accept the necessity of representing frequencies as imaginary numbers. I confidently predict that any improvement would be merely a result of making the algebra more efficient. In any case, I further predict confidently that whatever works about PT now will not be significantly changed by such a move – wherever PT justifies making a strong connection between probabilities and frequencies, PT always provides the goods.

On the contrary, it is you making the strong assertions, declaring PT to be definitely unable to do the job it sets itself, which you haven’t supported at all.

### Utilitarianism

David,

In your reply to mtraven, you say

Utilitarianism is a more important target, because it's the rationalist-eternalist's account of ethics.

I think I understand you here, and if so, there are important ways that I agree, though I’m not sure that your characterization of utilitarianism is strictly correct. I’ll give you my version of non-eternalist non-nihilism (which I call utilitarianism); perhaps you agree with it:

I understand that eternalism here refers to a belief that value is set by some external mechanism, which of course is nonsense. Please note, though, that this is not a necessary (or even plausible) belief of rationalism. I consider myself a rationalist and a utilitarian, but I certainly don’t hold any ‘eternalist’ principle. The word ‘utilitarianism’ may have a narrower meaning in common usage than I am attributing to it, in which case, forgive me (I think that Bentham, for example, did hold some kind of eternalist axioms about how to infer value). In any case, I believe I’m using the only meaning that the word *should* have: the value of an action is determined by the value of its outcomes.

I accept that very many are confused about the issues here. There is frequently a conflation of the statements “value has no magical origin” and “value does not exist.” The resulting conflict between the obvious fact that value has no magical origin and the obvious existence of value is ‘resolved’ by various tortuous concatenations of logical fallacy (“I don’t want to be a nihilist!” they scream, without noticing the implications of those words). Many professing rationalists do incoherently hold an eternalistic notion of ethics, which you are right to criticize (very much so), but it isn’t a necessary feature of rationalism, in fact the opposite.

Value is evidently a property of minds (and thus evidently exists). Learning about value, therefore, entails inference upon empirical data concerning minds. Inferring what actions will most probably bring us the things we value is similarly an operation on empirical experience. Thus determining ethics is a rational undertaking: to desire something (to attribute value to it) is to desire a means to maximize one’s expectation of achieving it, which is rationality. (We have no omniscient access to the actual value of an action, so we must make do with its expectation.) There is no external or eternal principle here: my values are inside my mind, and when I die, my values disappear with me.

It’s a matter of considerable puzzlement and sadness to me that so many people have difficulty accepting this simple logic, even when it is laid out for them in detail. Far too many scientists, for example, are happy to sit back and say “well that’s the science, what you do with it is another matter, that’s politics,” as if humanity is somehow separate from the ‘normal’ constituents of reality.

### Utilitarianism

The general approach is: "you acknowledge that no existing brand of utilitarianism works, but you have religious faith that some version somehow must work; why is that?"

I’ve found that utilitarianism works for practical purposes - yes, there are all kinds of weird corner cases like “utilitarianism says we should re-wire everyone’s brains for maximum pleasure” or the various paradoxes that it produces in population ethics… but I don’t particularly care about those, because I’m not in a position where I could re-wire everyone’s brains, so the question of whether or not I should do so is irrelevant. In the kinds of ordinary situations that we tend to encounter in real life, utilitarianism works fine, and that’s enough in my book.

### Probabilistic language acquisition

It's conceivable that probabilistic/statistical methods could help guide that process. You could consider proposed rules "hypotheses" and score them probabilistically. I don't know the current state of research in this area; when I last checked 20 years ago, that wasn't the approach anyone took.

Probabilistic models of language processing and acquisition is one relevant overview, though it dates from 2006.

### Probabilistic Language Acquisition

Suppose that instead of a corpus of known grammatical sentences, we have a corpus in which almost all of the sentences are grammatical, but mistakes are possible.

Then it seems obvious that we should call in probabilistic reasoning, and that the case where all of the sentences in the corpus are known to be grammatical is a special case where the probabilities are all 1s or 0s.
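To make the “special case” concrete, here is a minimal Python sketch under loudly stated assumptions: a toy “language” of four-letter strings over {a, b}, two hypothetical candidate grammars (each just the set of strings it accepts), and an assumed noise model in which a speaker makes a mistake with probability `eps`. Setting `eps = 0` recovers the all-or-nothing case, where a single ungrammatical sentence rules a grammar out entirely.

```python
import math
from itertools import product

ALPHABET = "ab"
LENGTH = 4
ALL_STRINGS = ["".join(t) for t in product(ALPHABET, repeat=LENGTH)]  # 16 strings

# Two hypothetical "grammars", each represented by the set of strings it accepts.
GRAMMARS = {
    "a*b*": {s for s in ALL_STRINGS if "ba" not in s},  # all a's before all b's
    "anything": set(ALL_STRINGS),
}


def sentence_prob(s, language, eps):
    """Assumed noisy generative model: with probability 1 - eps the speaker utters
    a uniformly chosen grammatical string; with probability eps, a uniformly
    chosen string of any kind (a "mistake")."""
    grammatical = (1 - eps) / len(language) if s in language else 0.0
    return grammatical + eps / len(ALL_STRINGS)


def log_likelihood(corpus, language, eps):
    total = 0.0
    for s in corpus:
        p = sentence_prob(s, language, eps)
        if p == 0.0:
            return -math.inf  # with eps = 0, one ungrammatical sentence is fatal
        total += math.log(p)
    return total


corpus = ["aabb", "aaab", "abbb", "aabb", "baab"]  # the last one is a "mistake"

for eps in (0.0, 0.1):
    scores = {name: log_likelihood(corpus, lang, eps) for name, lang in GRAMMARS.items()}
    print(eps, scores)
```

Because a narrower grammar concentrates probability on fewer sentences, the permissive “anything” grammar does not win by default: with `eps = 0.1` the `a*b*` grammar scores higher despite the one mistake, while with `eps = 0` it is eliminated outright.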

I do agree that the benefit from making the hypothesis generation system clever is larger than making the scoring system clever. I don’t have much trouble seeing this as an approximation of PT, though: you’re guessing where the probabilities will be high, and then looking there to check.

I do wonder if this is mostly a discussion over terminology, and if so, to what extent refining the terminology is useful. This is, I think, a reflection of my seeing “probability theory” as a large component of “reasoning under uncertainty” which is deeply connected to the rest of “reasoning under uncertainty”. The deep connections make it hard to draw crisp lines separating disciplines, and make me pessimistic about the benefits of attempting to draw crisp lines. The question of whether tractable approximations of PT fall under PT sounds to me like the question of whether Machine Learning is a branch of statistics, and I have a hard time caring about either question.

To use an example from decision analysis (DA), the scope of DA goes from a problem description in natural language to a recommended action, but this covers a wide range of obstacles and techniques.

The easiest part of DA to discuss is the math, specifically expected value estimates and VNM-utility, *because* it’s math and thus easily formalizable. A hard part of DA to formalize is how to make formal models of real-world situations, and to populate those models with probability estimates from real-world experts who are generally unfamiliar with probabilistic thinking. Another hard part of DA is determining what the objective function actually is (decision-making under *certainty* is often hard!). An even harder part of DA to formalize is how to look at a situation and recognize alternatives which may be better choices than any of the known choices.

And so when someone reads about expected value estimates and sees a bunch of neat math which presumes neat inputs, they say “DA doesn’t solve my messy problem!” Well, VNM doesn’t solve your messy problem, but it does point towards the *right* parts of that messy problem to solve (namely, hunting for a procedure that gets the necessary neat inputs from your messy problem). Compressing different values onto the same scale, instead of endlessly listing pros and cons, makes decision-making much easier. DA as a whole *does* have messy parts, and so DA can solve messy problems.
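A minimal Python sketch of the “neat math” part: each option is a lottery over outcomes, outcomes with several attributes are compressed onto one utility scale, and the option with the highest expected utility is chosen. Everything here is hypothetical: the two job offers, the probabilities, the log-of-salary utility, and the 0.5 weight on commute time are all assumptions, and eliciting them is exactly the messy part of DA.

```python
import math

# Hypothetical decision: two job offers. Each option is a lottery, i.e. a list of
# (probability, outcome) pairs; outcomes have attributes on different natural scales.
OPTIONS = {
    "startup": [(0.3, {"salary_k": 200, "commute_min": 20}),
                (0.7, {"salary_k": 90, "commute_min": 20})],
    "big_co": [(1.0, {"salary_k": 130, "commute_min": 60})],
}


def utility(outcome):
    """Assumed utility function putting money and commute time on one scale."""
    money = math.log(outcome["salary_k"])   # diminishing returns on salary
    commute = -outcome["commute_min"] / 60  # hours per day, linear cost
    return money + 0.5 * commute            # 0.5 is an assumed trade-off weight


def expected_utility(lottery):
    assert math.isclose(sum(p for p, _ in lottery), 1.0)  # probabilities must sum to 1
    return sum(p * utility(outcome) for p, outcome in lottery)


scores = {name: expected_utility(lottery) for name, lottery in OPTIONS.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Under these assumed inputs, expected salary alone would favor `big_co` (123k versus 130k), but once commute time is put on the same scale the ranking reverses. Change the weight or the utility curve and the answer can change too, which is the point about where the real work lies.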

I don’t think it makes sense to walk around saying that VNM-utility solves all decision-making problems. VNM-utility is a useful component of decision-making, and a good guide as to what techniques should and shouldn’t make it into your decision-making toolbox, but it’s not the entirety of DA. So if your point is “these particular formulas aren’t everything you need for reasoning under uncertainty,” then I think we’re in agreement, and I think the claim you’re objecting to is meant to be interpreted as “formal reasoning under uncertainty is the overarching framework which all thought should fit in (often as an approximation).”

Galileo comes to mind: “Measure what is measurable, and make measurable what is not so.” The growth in usefulness of DA as a field has come from people recognizing the parts that are not formal yet and working on formalizing them. Elicitation of probabilities is in much better shape now than it was 20 years ago, and I expect that in another 20 years it will be even better. If hypothesis formulation is where probabilistic epistemology seems weakest today, well, roll up your sleeves and start making it *less wrong*.

### Probabilistic learning

So I looked a bit more for theories of probabilistic learning and found, among other things, a paper named How to Grow a Mind: Statistics, Structure, and Abstraction. I’d be curious about whether or not David feels that it contradicts his claim that probability theory contributes nothing to the model generation part.

In particular, the section from the subheading “The Origins of Abstract Knowledge” onwards argues that hierarchical Bayesian models can discover the best structural representations for organizing the data that they perceive:

Kemp and Tenenbaum (36, 47) showed how HBMs defined over graph- and grammar-based representations can discover the form of structure governing similarity in a domain. Structures of different forms — trees, clusters, spaces, rings, orders, and so on — can all be represented as graphs, whereas the abstract principles underlying each form are expressed as simple grammatical rules for growing graphs of that form. Embedded in a hierarchical Bayesian framework, this approach can discover the correct forms of structure (the grammars) for many real-world domains, along with the best structure (the graph) of the appropriate form (Fig. 2). [...]

Getting the big picture first — discovering that diseases cause symptoms before pinning down any specific disease-symptom links — and then using that framework to fill in the gaps of specific knowledge is a distinctively human mode of learning. It figures prominently in children’s development and scientific progress but has not previously fit into the landscape of rational or statistical learning models. Although this HBM imposes strong and valuable constraints on the hypothesis space of causal networks, it is also extremely flexible: It can discover framework theories defined by any number of variable classes and any pattern of pairwise regularities on how variables in these classes tend to be connected. Not even the number of variable classes (two for the disease-symptom theory) need be known in advance. This is enabled by another state-of-the-art Bayesian tool, known as “infinite” or nonparametric hierarchical modeling. These models posit an unbounded amount of structure, but only finitely many degrees of freedom are actively engaged for a given data set (49). An automatic Occam’s razor embodied in Bayesian inference trades off model complexity and fit to ensure that new structure (in this case, a new class of variables) is introduced only when the data truly require it. [...]

Across several case studies of learning abstract knowledge — discovering structural forms, causal framework theories, and other inductive constraints acquired through transfer learning — it has been found that abstractions in HBMs can be learned remarkably fast from relatively little data compared with what is needed for learning at lower levels. This is because each degree of freedom at a higher level of the HBM influences and pools evidence from many variables at levels below. We call this property of HBMs “the blessing of abstraction.” It offers a top-down route to the origins of knowledge that contrasts sharply with the two classic approaches: nativism (59, 60), in which abstract concepts are assumed to be present from birth, and empiricism or associationism (14), in which abstractions are constructed but only approximately, and only slowly in a bottom-up fashion, by layering many experiences on top of each other and filtering out their common elements. Only HBMs thus seem suited to explaining the two most striking features of abstract knowledge in humans: that it can be learned from experience, and that it can be engaged remarkably early in life, serving to constrain more specific learning tasks. HBMs may answer some questions about the origins of knowledge, but they still leave us wondering: How does it all start? Developmentalists have argued that not everything can be learned, that learning can only get off the ground with some innate stock of abstract concepts such as “agent,” “object,” and “cause” to provide the basic ontology for carving up experience (7,61). Surely some aspects of mental representation are innate, but without disputing this Bayesian modelers have recently argued that even the most abstract concepts may in principle be learned. 
For instance, an abstract concept of causality expressed as logical constraints on the structure of directed graphs can be learned from experience in a HBM that generalizes across the network structures of many specific causal systems (Fig. 3D). Following the “blessing of abstraction,” these constraints can be induced from only small samples of each network’s behavior and in turn enable more efficient causal learning for new systems (62).
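The “automatic Occam’s razor” mentioned in the quoted passage can be seen in a much simpler setting than a full HBM. The Python sketch below is a toy example, not taken from the paper: it compares a “fair coin” model with no free parameter against a “unknown bias” model whose bias has an assumed uniform prior, using each model’s marginal likelihood (evidence) for a specific sequence of flips. Equal prior odds on the two models are also assumed, so the Bayes factor equals the posterior odds.

```python
from fractions import Fraction
from math import factorial


def evidence_fair(k, n):
    """P(a specific sequence with k heads in n flips | coin is exactly fair)."""
    return Fraction(1, 2 ** n)


def evidence_flexible(k, n):
    """P(same sequence | unknown bias with a uniform prior), i.e. the integral of
    theta^k * (1 - theta)^(n - k) over [0, 1], which equals k! (n - k)! / (n + 1)!."""
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))


for k, n in [(5, 10), (9, 10)]:
    bayes_factor = evidence_flexible(k, n) / evidence_fair(k, n)
    print(f"{k}/{n} heads: flexible vs fair Bayes factor = {float(bayes_factor):.2f}")
```

With 5 heads in 10 flips the extra parameter is penalized (a factor of about 0.37, i.e. roughly 2.7 to 1 in favor of the fair coin), because the flexible model spreads its probability over many outcomes it did not need. With 9 heads in 10 the extra parameter earns its keep (about 9.3 to 1 in its favor). No explicit complexity penalty is added; it falls out of the marginalization.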

### Floating obliviously

Hi David

I accept that inference includes matters of mathematics and logic, and that these are not contained in PT. Further, I’m happy to concede that PT is incapable of anything without these necessary components. I consider inference to be the drawing of conclusions by reasoning from assumed premises. Usually, when I think of inference, I naturally think of inference as being *about something*, which excludes pure mathematics and logic. If I’ve inadvertently claimed PT as a general theory of inference, this was mistaken, and not my intention. But I stand by the claim that to attempt to draw inferences about the real world is to attempt to at least replicate the function of PT. You might be the world’s top expert in differential equations, but without some capacity to make some at least approximate probability assignments, those equations will allow you to say nothing informative about the real world. Nothing, for example, that will enhance the effectiveness of your decision making. This is my central claim, and that (I presume) of the prototypical Bayesian that you criticize.

Probability theory extends only propositional inference to real values. It has nothing to say about other sorts of inference, such as instantiation.

You were right that I’m not well acquainted with formal logic, so thanks for the explanation of instantiation. Forgive my ongoing ignorance, but what you’ve described seems to be achievable with PT: given P(mortal | man) = 1, and P(man | socrates) = 1, it trivially follows that P(mortal | socrates) = 1. Anyway, I’m not sure that logic is really the topic of this debate.
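Spelled out with the sum rule, the propositional version of this step looks like the following. This treats “socrates” as a proposition that can be conditioned on, as in the comment above, and assumes $P(\text{socrates}) > 0$ so that the conditional probabilities are defined:

```latex
\begin{aligned}
P(\text{mortal} \mid \text{socrates})
  &= P(\text{mortal} \mid \text{man}, \text{socrates})\, P(\text{man} \mid \text{socrates}) \\
  &\quad + P(\text{mortal} \mid \neg\text{man}, \text{socrates})\, P(\neg\text{man} \mid \text{socrates}) \\
  &= P(\text{mortal} \mid \text{man}, \text{socrates}) \cdot 1
     + P(\text{mortal} \mid \neg\text{man}, \text{socrates}) \cdot 0 \\
  &= 1 .
\end{aligned}
```

The last step uses $P(\text{mortal} \mid \text{man}) = 1$, which means $P(\neg\text{mortal} \wedge \text{man}) = 0$, hence $P(\neg\text{mortal} \wedge \text{man} \wedge \text{socrates}) = 0$ and so $P(\text{mortal} \mid \text{man}, \text{socrates}) = 1$ (the conditioning event has positive probability because $P(\text{man} \mid \text{socrates}) = 1$).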

As for the problem you describe as emerging from the process of instantiation, logical inference becoming “infinitely difficult – uncomputable,” I really don’t understand how you can cite that as evidence for the shortcomings of PT. You seem to link it to hypothesis generation. Ultimately, the problem of hypothesis generation must come down to trial and error. To get started, we just need to assume some form of symmetry in nature. This assumption is *a priori* sound (though actually this doesn’t matter), as without symmetry there can be no physics. Going from naïve trial and error to some kind of guided trial and error is itself made possible by learning from experience.

At the risk of being rude, I think you are probably out of your depth here.

Please don’t be in the least concerned about being rude. We can proceed better by being clear.

I freely admit that I’m ignorant of most of cognitive science, which no doubt has many different classifications of analogies, and models of how they are generated and manipulated in minds, but, floating obliviously above a bottomless ocean, I’ll stubbornly stick to my guns that the symmetries that PT investigates are exactly analogies. The Stanford Encyclopedia of Philosophy seems to understand what I mean.

I feel like I’m repeating myself a lot, but if we do this in enough ways, maybe we’ll come to understand each other.

We can't have a perfect theory of model formation, but we can say lots about it. What we can say will necessarily be non-formal.

Are you claiming there are things you can say about model formation that are examples of things that can be learned about the real world in a way that isn’t capturable by PT? Please expand!

"Everything that is good about science" is a very vague and broad category, and reducing it to a simple formalism does not seem feasible.

This was supposed to be an approximate statement; I’m not claiming a rigorous proof. I mean that we can look into many of the elements of scientific method and find this property of striving for the Bayesian ideal common to all. Again, if you can offer a counterexample, that would be great. That would be your strongest argument, but so far it is lacking.

I deny the first one, as do many other people.

Then it would be a service to me to explain why you think frequencies are best modeled as complex numbers.

### Probability Theory - 'The facts on the ground'

Here are ‘facts on the ground’ about the current state of probability theory:

-Despite a huge surge in popularity of Bayesian methods since the 90s, there have been no spectacular breakthroughs in artificial general intelligence. Despite tens of thousands of the world’s best and brightest researchers in cog-sci and stats deploying Bayesian methods consistently for more than 20 years, these methods have produced no spectacular breakthroughs in scientific progress; there is no sudden surge in Nobel prize winners deploying Bayesian methods, and there is no evidence that AGI is imminent.

-The main success of probabilistic methods to date has been prediction; in cognitive science the main success has been the Memory-Prediction framework as popularized by Jeff Hawkins (‘On Intelligence’, 2004). But as top physicist and science philosopher David Deutsch comprehensively explains in his popular books (‘The Fabric Of Reality’, ‘The Beginning Of Infinity’), prediction is not the primary role of science (the central role of science is ‘explanation’, not ‘prediction’). The idea that all of scientific inference is an approximation to probability theory is an article of faith. As AI researcher Ben Goertzel argues, formal Bayesian methods have no broad track record.

-There is no doubt that *some* of what the brain does in inference could be described as approximating probability theory. But there is still much to be learned; the idea that *everything* the brain does is an approximation to probability theory is an article of faith.

-Probability theory can’t handle mathematical reasoning. Ongoing research at MIRI has not yet shown that it can be generalized enough to deal with mathematical reasoning. Claims that it can be done are articles of faith.

-Model formulation and knowledge representation is a key problem which probability theory has not yet got to grips with. Claims that ‘hierarchical Bayesian models’ and ‘model checking’ etc., will be able to solve these problems are articles of faith.

### Re: Tenenbaum

David, thanks for your comments on the Tenenbaum et al. paper. The way they described their results made it sound like their models would be a lot more general than what you say, but I guess that’s always the case with AI. And all of the most impressive results are always on toy problems that don’t scale. (SHRDLU, anyone?) :)

Actually, I should probably have dug into their references to check for the scale of the examples, myself, before bothering you with it. Sorry about that.

I just contacted you with the Approaching Aro form.

### I've forgotten what we're arguing about

We can use differential equations to make inferences about the real world. E.g. we can estimate temperatures using the heat equation.

Only by making use of probability theory, or some surrogate! Differential equations provide no flow of information about the real world - they only refer to abstract entities. Differential equations allow me to learn things about x’s and y’s, but those x’s and y’s don’t exist - not even in your head, they are at best only represented in your head.

you think probability theory necessarily captures that slop

As I have said, I don’t think this. For example, every hypothesis in our search space might be false - and in fact “probably” is (!). We could even be wildly wrong (e.g. a simulation hypothesis is true, but we never come to suspect this). My point, which is supported by highly compelling theoretical arguments, is that no other process, in principle, can handle that slop better than PT. In fact, the elegance with which PT (and its many approximations) allows us to escape complete epistemological crisis is a wonderful thing, entirely.

Now, I have sympathy with your objection that statements about the supremacy of PT sound like a quasi-religious faith, but let’s be practical. I accept that there is a non-zero possibility that the Earth doesn’t go around the Sun (e.g. simulation hypothesis, or simply, dirty lying astronomers), but I don’t need a syllogism employing 100% known premises to say confidently in a loud clear voice: “the Earth goes around the Sun”. Nobody reasonable will accuse me of blind faith for that. Even a syllogism, by the way, won’t prove that I’m not insane - any demands for absolute proof are simply misguided. Give me a reason to doubt my position, though, and I’ll listen.

Re complex numbers: I thought you said that you deny the assumption that probabilities should be represented by real numbers. Since probabilities model frequencies (though not in the naive way thought by many frequentists), I was curious to know why you think frequencies might be better modeled as complex.

### PT and inference

David,

If PT is not a general theory of inference, then there must be types of inference that it doesn't include. Differential equations are an example. If differential equations allow you to make inferences about the real world, then using them in that way does not replicate the function of PT.

I have no idea of whether this is what Tom has in mind, but it occurs to me that there could exist at least one way of reconciling “differential equations allow you to make inferences about the real world” with “to attempt to draw inferences about the real world is to attempt to at least replicate the function of PT”.

It involves kind of treating differential equations as a black box. In essence, you ask, “I have this particular tool that I have previously used to make predictions about the world; how certain am I that this tool works for that task, either in general or in this particular situation?”. Then you look at the track record that differential equations have, and use PT to estimate the probability that you will get the correct result this time around.
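Kaj’s ‘black box’ idea can be made concrete with a standard Bayesian device, a Beta-Bernoulli model of how often a tool gives the right answer. This is only an illustrative sketch: the class name `ToolReliability`, the uniform Beta(1, 1) starting prior, and the 18-out-of-20 track record are all hypothetical choices, not anything Kaj or Tom specified.

```python
from dataclasses import dataclass


@dataclass
class ToolReliability:
    """Beta(alpha, beta) belief about how often a tool gives correct results.

    Hypothetical illustration: starting from Beta(1, 1), a uniform prior,
    is an assumption; any Beta prior would work the same way.
    """
    alpha: float = 1.0  # prior pseudo-count plus observed successes
    beta: float = 1.0   # prior pseudo-count plus observed failures

    def update(self, succeeded: bool) -> None:
        # Conjugate update: each observed outcome adds one pseudo-count.
        if succeeded:
            self.alpha += 1
        else:
            self.beta += 1

    def p_next_success(self) -> float:
        # Posterior predictive probability that the next use works
        # (Laplace's rule of succession when the prior is Beta(1, 1)).
        return self.alpha / (self.alpha + self.beta)


# Hypothetical track record: 18 correct predictions, 2 wrong ones.
diff_eq = ToolReliability()
for outcome in [True] * 18 + [False] * 2:
    diff_eq.update(outcome)

print(round(diff_eq.p_next_success(), 3))
```

With 18 successes and 2 failures on top of the uniform prior, the estimate is 19/22, a little below the raw 18/20 because the prior pulls it toward 1/2. The same bookkeeping could in principle score any method on Kaj’s list (asking parents, trying things out, and so on).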

One might argue that (in a non-conscious and informal way) human children learn to do something like this: they experiment with different ways of learning about the world (asking their parents, asking their older siblings, asking that uncle who likes to pull their leg, trying things out themselves, taking the stuff that’s said in comics as gospel, etc.). Some of those end up providing better results, some of them provide worse results. And (some approximation of) probability theory is ultimately used to figure out which of these methods have worked the best, and which should be used further.

Of course, in real life we also rely on all kinds of different logical arguments and causal theories instead of just looking at the track record of something… but one could argue that the reliability of different logical arguments and causal theories, too, is verified using probability theory.

Again, I don’t know whether this is anything like the thing that Tom has in mind, but it would seem like a somewhat plausible argument for subsuming all inference under PT…

### Another critique of bayesianism (one a little bit less "nice")

http://plover.net/~bonds/cultofbayes.html

This critique attacks Bayesianism from a different angle. It’s not very nice, but I think it’s spot-on too.

I have to admit… I’ve been addicted to LessWrong in the past. No matter what I now think about the actual power of Bayes and probability theory, or what I’ve always thought about the singularity and cryonics (a delusion), I think the website has a lot of good ideas, many of them lacking sources, but good nonetheless. Those good ideas tend to build up confidence, and the halo effect applies in full force. I also have to admit it’s really easy to attribute to Bayes powers that the theorem does not actually have. For example, I have explained some deductions I’ve made with Bayes’ theorem, such as figuring out people’s sexual orientation from small characteristics that people with certain orientations tend to have. It’s true that Bayes explains why this isn’t a logical fallacy, but the real reason I figured those things out was entirely intuitive pattern matching of the sort everybody has.

There are also many bad ideas. I think the worst is the dismissal of the importance of politics. I’d wager there is no such thing as an apolitical person, and none of the LW crowd are actually apolitical; rather, they are in general conservative. The fact that people seem to think that the best thing we can do with “rationality” is “optimizing charities” is really telling.

### What are the x's and y's?

Kaj,

I have no idea of whether this is what Tom has in mind

Surprised this needs clarifying. I suppose it’s double evidence that I haven’t communicated effectively.

Confirming the formalism (or usefulness) of differential equations, as a field of mathematics, is not really what I had in mind, though you are perfectly right, it is part of it. (Arguably, this normally works the other way around: we choose the axioms that seem to fit our experience, and derive the formalism from them - in a way, by the time the maths is developed, much of the confirmation has already taken place.)

I meant more, though, that to say anything about the world using differential equations, I must know the form of the equations that effectively model the process I’m interested in, and (depending on the nature of the problem) I have to know a good candidate set of coefficients. No amount of mathematical brilliance will jump this gap, if there is no empirical data to work on. PT, and its surrogates, are the only tools translating empirical experience into information that can be used in this way.
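Tom’s point, that an equation says something about the world only once empirical data has pinned down its coefficients, can be illustrated with a simpler cousin of the heat equation. This sketch swaps in Newton’s law of cooling, dT/dt = -k (T - T_ambient), for the heat equation, uses made-up readings, and fits k by least squares on the log of the temperature difference. That fit coincides with maximum likelihood only under an assumed Gaussian error model; `estimate_cooling_rate` is a hypothetical helper, not a library function.

```python
import math


def estimate_cooling_rate(times, temps, ambient):
    """Estimate k in dT/dt = -k (T - ambient) from measured temperatures.

    The exact solution is T(t) = ambient + (T0 - ambient) * exp(-k t), so
    log(T - ambient) is a straight line in t with slope -k. Ordinary least
    squares on that line is the maximum-likelihood estimate if the errors in
    log(T - ambient) are independent and Gaussian (an assumption).
    """
    ys = [math.log(temp - ambient) for temp in temps]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return -sxy / sxx  # slope is -k


# Hypothetical measurements: a cup cooling in a 20-degree room, generated
# from k = 0.3 with a small alternating +/-1% "measurement error".
AMBIENT = 20.0
times = list(range(10))
temps = [AMBIENT + 70.0 * math.exp(-0.3 * t) * (1.01 if t % 2 == 0 else 0.99)
         for t in times]

k_hat = estimate_cooling_rate(times, temps, AMBIENT)
print(f"estimated k = {k_hat:.3f}")
```

The differential equation fixes only the *form* of the curve; without the readings, nothing in the mathematics tells you that k is about 0.3 for this particular cup.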

PT bridges the gap from x’s and y’s to things in the real world. It’s not perfect, but it’s the best we can hope for.

Hope this is clearer.

### Nice quote

How is this different to what I’ve been saying all along? There is a hypothesis to which I’m afraid I’m having to assign ever greater weight: that you have not stopped to honestly scrutinize either your position or the one you criticize.

### Best possible

PT provides the *best possible* connection to the real world. Jaynes doesn’t contradict this. You seem to be stuck in a false dichotomy. I never claimed that the connection would be free of uncertainty. In fact, I explicitly contradicted this position, several times. PT allows management of uncertainty.

### with all due respect tom...

With all due respect, Tom… you seem to keep repeating that David just does not get it. But perhaps it is you who does not get it.

Every cult or cult-like belief system has what I call “defenses against opposing beliefs”. For example, “the devil confuses the hearts of the people against God” is one. The LessWrong defense is “I’m smarter, therefore I’m more likely to believe you’re wrong rather than me”.

### pt does not give you the best

PT does not give you the best possible connection to the real world. PT is unconnected to the real world; it’s up to you to make the connection between the real world and the mathematical model!

For example… I think PT would aid me if I said that I find it unlikely that SI can actually reach the singularity, given that they have done nothing to advance AI in baby steps by constructing something marginally useful today. Why? Because I think that if people at SI actually did know their shit, it would be probable that they could and would have done it. The fact that they didn’t is a good sign that they don’t. But that’s my model, right?

### Polya

Have you read Polya’s “Mathematics and Plausible Reasoning”? It overlaps with these ideas quite a bit, and breaks things down to pretty specific things you can actually do. It’s a nice book.

### An idea I've been developing concerning informal reasoning

I could bore you with the details of how I came to think about this, but I thought I’d get right to the point about how I feel humans reason things out.

Brains can, by ‘design’, only hold a finite number of ‘things’ at any given time. The way we typically get around that limitation is by grouping these ‘things’ together and then referring to the grouping as a ‘thing’, thereby allowing us to reduce the number of ‘things’ held in mind at any one time.

I call these things elements of ‘context’. The defining aspect of a context is that it fits neatly into our minds and can be worked with using various ‘tools’. I worked out that all of these ‘tools’ can be boiled down to a short sampling of ‘verses’. I have identified four primary verses. Inverse, converse, reverse, obverse. They kind of fit together like yin and yang. The inverse is the stuff ‘around’ the context, the reverse is all the stuff that’s not the context. The converse is the part of the reverse that relates somehow to the context, and the obverse is the rest of the reverse of the context.

All words we use invoke the verses of context in various ways. The verses are the basic building blocks of meaning, they are how we construct reality. You can determine how smart someone is by how fluidly and usefully they can model a problem in ‘contextual space’ and then apply forms of reasoning, which again boil down to complex applications of context verses, hereby called ‘logic’, to get somewhere nobody’s really thought of yet.

I’ve put a lot of thought into this over the last few years, and intend to eventually put together a web application that’s similar to mind-mapping software, but organized around verses. One might choose to represent the “United States” in context logic. The first step would be to identify various elements of “United States of America”. So you might start with the states. Eventually you might realize that you need “The Star Spangled Banner” in there too. So you group the states together as elements of “States of the United States of America”. So “USA” has one element “States”, which itself, as a first-class element of context in its own right, has fifty elements of its own. Elements are part of a context’s inverse.
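The grouping step in the USA example resembles building a tree in which any node can be handled as a single item. This sketch models only that part (a context and its elements); it does not attempt the four ‘verses’, and the `Context` class and the placeholder state names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Context:
    """A named grouping that can itself be used as a single 'thing'."""
    name: str
    elements: list[Context] = field(default_factory=list)

    def add(self, *children: Context) -> Context:
        # Attach children as elements and return self so calls can chain.
        self.elements.extend(children)
        return self

    def count_leaves(self) -> int:
        # Number of ungrouped 'things' this context ultimately stands for.
        if not self.elements:
            return 1
        return sum(child.count_leaves() for child in self.elements)


# Placeholder names stand in for the fifty actual states.
states = Context("States of the United States of America").add(
    *(Context(f"state #{i}") for i in range(1, 51))
)
usa = Context("United States of America").add(
    states, Context("The Star Spangled Banner")
)

print(len(usa.elements))   # items held at the top level
print(usa.count_leaves())  # items the grouping stands in for
```

At the top level only two items need to be held in mind, while the grouping stands in for 51 ungrouped ones; that reduction is the point of the chunking described earlier.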

It’s a contrived example, but it doesn’t have to be. Once you start exploring the converse and the obverse, things can get interesting quickly. What are things that are not the USA? Well, the rest of the countries, of course. And then the converse might be, “the relationship of the USA to other countries.” You can take each of these definitions, and start throwing things into the elements buckets. What are elements of the relationship between the United States and Mexico?

Eventually you might find that, subconsciously, you’re actually exploring a contextual ‘world’. Say, “international relations of North America”: the entire graph you’ve constructed can be given that title, and now you have a complex element of context of the kind humans typically reason with informally. Then you can proceed with defining contexts with that topic in mind.

Our usual way of communicating information is through a device we call a narrative. Blog posts, articles, encyclopedia entries, and the like. It’s all about reducing the contextual elements to that which is important, and then presenting them in comforting ‘story mode’. But there’s nothing really unimportant about the things we omit for the sake of a narrative. A logic graph offers, in my opinion, a tantalizing ‘explorative’ mode of interacting with knowledge, like browsing Wikipedia, but also allows you to display, graphically, the crucial aspect of how things relate to each other. Narrative formation can make it difficult to determine what’s really the most important aspect of the story. What the story-teller intends to communicate is often very different from what the person hearing the story takes away from it.

But a knowledge graph, interspersed with links to narrative-style content like Wikipedia pages, allows someone to represent contexts as he or she sees them, as directly as possible.

### Jerry Weinberg

The stuff about problem formulation vs problem solution, the “anvils” and “collect your bag of tricks” reminds me of what Gerald Weinberg teaches in the experiential workshop *Problem Solving Leadership,* and his *Secrets of Consulting* books (particularly the second one, with the idea of creating a personal toolkit of problem-solving strategies, building on the work of Virginia Satir).

### Feynman quotes

I was wondering if the Feynman cryogenics quote actually came from his visit to a general relativity conference? I mean, I wouldn’t be surprised if Feynman ranted to his wife about more than one conference, but it fits your description quite well:

I am not getting anything out of the meeting. I am learning nothing. Because there are no experiments this field is not an active one, so few of the best men are doing work in it. The result is that there are hosts of dopes here (126) and it is not good for my blood pressure...

Interestingly the field had a bit of a renaissance not long after, despite the lack of experiments (sometimes known as the “golden age of general relativity”.) It’s still doing pretty well for itself… that quote was from GR3 in Warsaw, and at GR21 in a few weeks the big news will be the recent discovery of gravitational waves.

Oh and while I’m spouting Feynman quotes, the version I remember of the ‘bag of tricks’ thing was from *Surely You’re Joking…*, where he talks about his ‘box of tools’ for integrals:

The result was, when guys at MIT or Princeton had trouble doing a certain integral, it was because they couldn’t do it with the standard methods they had learned in school. If it was contour integration, they would have found it; if it was a simple series expansion, they would have found it. Then I come along and try differentiating under the integral sign, and often it worked. So I got a great reputation for doing integrals, only because my box of tools was different from everybody else’s, and they had tried all their tools on it before giving the problem to me.
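For readers who haven’t met the trick Feynman mentions, here is a standard textbook illustration of differentiating under the integral sign (chosen for this note; it is not an example from Feynman’s book). Introduce a parameter a, differentiate with respect to a inside the integral (the interchange is justified here), solve the easy integral that results, and fix the constant with a known value:

```latex
\begin{aligned}
I(a)  &= \int_0^1 \frac{x^a - 1}{\ln x}\,dx, \qquad a > -1,\\
I'(a) &= \int_0^1 \frac{\partial}{\partial a}\,\frac{x^a - 1}{\ln x}\,dx
       = \int_0^1 x^a\,dx = \frac{1}{a+1},\\
I(0)  &= 0 \quad\Longrightarrow\quad I(a) = \ln(1 + a).
\end{aligned}
```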

### Assuming the 100% rational...

I have a few questions for Tom…

Can Bayesianism, or probability theory, etc., account for human *irrationality*, *confirmation bias*, etc. in thinking?

Also, can these probabilistic ways of thinking account for the presence (and the sum) of knowledge gaps? (As in, not one single human possesses all-encompassing knowledge about everything)

Those variables seem to factor in unknowns, and how can you quantify an unknown to get an accurate probability, when thinking?

To me, these thinking tools are just that, **tools.** Each thinking tool has a purpose, but is limited to that (or those) purpose(s).

Just because you can use a hammer for more than one purpose doesn’t make it the best tool, or the right tool for the job. There isn’t a tool that does everything.

### I know almost nothing about

I know almost nothing about maths beyond high school level (I’m just here for the Buddhism) but… I think I know the Feynman quote re: genius tools.

Dan Dennett quotes it in ‘Intuition Pumps and Other Tools for Thinking’.

I think Feynman is the colleague and von Neumann the genius. I like this because Feynman is doing what you’re suggesting re: learning other people’s styles and tricks.

“A colleague approached one day John Von Neumann with a puzzle that had two paths to a solution, a laborious, complicated calculation and an elegant, Aha!-type solution. This colleague had a theory: in such a case, mathematicians work out the laborious solution while the (lazier, but smarter) physicists pause and find the quick-and-easy solution. Which solution would von Neumann find? You know the sort of puzzle: Two trains, 100 miles apart, are approaching each other on the same track, one going 30 miles per hour, the other going 20 miles per hour. A bird flying 120 miles per hour starts at train A (when they are 100 miles apart), flies to train B, turns around and flies back to the approaching train A, and so forth, until the two trains collide. How far has the bird flown when the collision occurs? “Two hundred and forty miles,” von Neumann answered almost instantly. “Darn,” replied his colleague, “I predicted you’d do it the hard way, summing the infinite series.” “Ay!” von Neumann cried in embarrassment, smiting his forehead. “There’s an easy way!””
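The two routes in the puzzle are easy to check numerically. This sketch (function names are hypothetical) computes the ‘Aha!’ answer from the collision time, and the ‘laborious’ answer by adding up the bird’s legs one at a time until the remaining gap is negligible; both should give 240 miles for the numbers in the quote.

```python
def bird_distance_quick(gap, speed_a, speed_b, bird_speed):
    # The trains close the gap at speed_a + speed_b, so they collide after
    # gap / (speed_a + speed_b) hours, and the bird flies the whole time.
    return bird_speed * gap / (speed_a + speed_b)


def bird_distance_series(gap, speed_a, speed_b, bird_speed, tol=1e-12):
    # The "hard way": sum the legs of the bird's zig-zag, one by one.
    total = 0.0
    toward_b = True  # the bird starts at train A and flies toward train B
    while gap > tol:
        oncoming = speed_b if toward_b else speed_a
        leg_time = gap / (bird_speed + oncoming)  # bird meets oncoming train
        total += bird_speed * leg_time
        gap -= (speed_a + speed_b) * leg_time     # trains keep approaching
        toward_b = not toward_b
    return total


quick = bird_distance_quick(100, 30, 20, 120)
slow = bird_distance_series(100, 30, 20, 120)
print(quick, round(slow, 6))
```

Each round trip shrinks the gap by a constant factor, so the legs form a convergent geometric-like series; the loop simply stops once the leftover gap is far below any meaningful precision.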

### Box of tricks again

You were right that Rota also had a ‘box of tricks’ type piece of advice.

I was trying to look up some other Rota quote that I half remembered and found this (spam filter didn’t like the link but should be easily googleable):

Every mathematician has only a few tricks.

A long time ago an older and well known number theorist made some disparaging remarks about Paul Erdos’ work. You admire contributions to mathematics as much as I do, and I felt annoyed when the older mathematician flatly and definitively stated that all of Erdos’ work could be reduced to a few tricks which Erdos repeatedly relied on in his proofs. What the number theorist did not realize is that other mathematicians, even the very best, also rely on a few tricks which they use over and over. Take Hilbert. The second volume of Hilbert’s collected papers contains Hilbert’s papers in invariant theory. I have made a point of reading some of these papers with care. It is sad to note that some of Hilbert’s beautiful results have been completely forgotten. But on reading the proofs of Hilbert’s striking and deep theorems in invariant theory, it was surprising to verify that Hilbert’s proofs relied on the same few tricks. Even Hilbert had only a few tricks!

Why is it always either Rota or Feynman who says one of these things? I wish it was more common for people to talk about their work like this.

### Rugose

I learned a new word!

Interesting… I would never have considered this explanation, but then I’ve managed to stay amazingly ignorant of the foundations of mathematics (yes I know this is bad). I might read that Chaitin paper you linked on twitter though, thanks for the mention there!

I might tentatively favour a similar idea, though, which also has its tentacles reaching out of the early 20th century… Everyone knows now that mathematics should be Very Logical and Rigorous. Unfortunately the stuff that actually comes into your head while doing maths is not like that (for me at least…), so it’s easy to worry that people will pick it apart for not being rigorous enough.

Don’t know why this would stop all the physicists though, we don’t care quite so much.

I think the situation has got a bit better with more casual blogs where people try and explain how they’re actually thinking, rather than what they tidy up for their papers. John Baez’s *This Week’s Finds in Mathematical Physics* must be the best example, but Tumblr can be surprisingly good because it’s full of procrastinating grad students. (But also it’s Tumblr, so you have to pick through all the other rubbish.)

Looking forward to more on ethnomethodology too btw. Another new word for me, even though it’s obviously very relevant to what I’m interested in!

### Nothing happening

I find a certain enjoyable hilarity in the conversation. Thank you!

Everyone knows now that mathematics should be Very Logical and Rigorous. Unfortunately the stuff that actually comes into your head while doing maths is not like that (for me at least…), so it’s easy to worry that people will pick it apart for not being rigorous enough.

Reading this, I realized that I often do not actually know what my mind does when I am working on theoretical physics stuff. Mostly I just stare out of the window, at the screen or at the whiteboard, and something comes up from somewhere which cannot be described (Dungeon Dimensions?). Surface thinking is often just meaningless fluff; busywork so that you can convince yourself that you are working. It does seem more unpredictable than rigorous.

However, after one delirious evening of science making, I wrote down this remark:

*The sensation of intellect is in communication with the environment/sense fields like any other sense experience. Similarly as a warrior chooses his next action based on the communication with the situation at hand; a scientist formulates new images, structures and perspectives in communication with the forms and movements of the universe.*

### Re: Nothing happening

Reading this, I realized that I often do not actually know what my mind does when I am working on theoretical physics stuff. Mostly I just stare out of the window, at the screen or at the whiteboard, and something comes up from somewhere which cannot be described (Dungeon Dimensions?). Surface thinking is often just meaningless fluff; busywork so that you can convince yourself that you are working. It does seem more unpredictable than rigorous.

Yes, this is very close to my experience! I was actually trying to write a post that was partly about this last weekend. Where by ‘post’ I currently mean ‘structureless braindump’ :(

I remember Hadamard’s *The Psychology of Invention in the Mathematical Field* being good on this. Might have to give it another look.

### Probably true isn't much better than absolutely true

I wrote about this on my blog. The thing is that probability tricks you into thinking that you can still change your mind if you find a better theory. But then your evaluation is also performed in terms of probability theory, and your theory of mind is also Bayesian, and now it’s turtles all the way down.

“This is how it goes: the Bayes theorem (and probability theory in general) might not be the one true model, but it probably is because I think so and my mind must be Bayesian so because I think that the Bayes theorem is probably the one true model, because…. Gödel what? As long as you think that the probability theory has the highest probability of all other models you are stuck. Your only chance to get out is the one that is irrational in this framework, probably when you stumble on something that just feels clearly wrong.

The annoying thing is not so much the unprovability; the annoying thing is that it blinds you to everything that doesn’t conform to your theory. It is not a confirmation bias, it is a confirmation loop. Your sources are self-selecting for the ones using Bayesian reasoning. Confirmation bias can be corrected for; this cannot.”

### Why is always either Rota or Feynman that says one of these things?

I believe Feynman also said something very much along these lines about his own tricks with geometry in *Surely You’re Joking, Mr. Feynman!*

### On "unique cognitive styles"

You write: “I’ve found that pretty smart people are all smart in pretty much the same way, but extremely smart people have unique cognitive styles, which are their special “edge.””

One possible hypothesis for *why* this is the case is that pretty smart people are essentially products of institutions. 20 people whose intellectual development is dominated by being at MIT will tend to end up rather similar, simply because they’ve been put through the same sausage grinder.

Someone sufficiently smart will find that relatively easy, and may have time to develop in other ways, very different from what is produced by the standard sausage grinder.

It is, of course, impolite to consider specific people in a context such as this. Still, it checks out with many specific examples. Though I wonder to what extent I am confusing correlation and causation here.

### What's in your bag?

Are you able to share your bag of tricks?

### Talking past one another

So my problem with the substantial advice on thinking that you give in this post is that… I don’t disagree with it. Nor do I really think that it contradicts anything that has been said on LW. In fact, if it was somewhat polished, cut into a set of smaller posts and posted on LW, I expect that it might get quite upvoted.

One thing that has increasingly started to bother me in this debate is that you seem to be saying, essentially, “yes, Bayesianism is nice, but there’s a lot more to rationality too”. And - like I hinted at in my earlier comments, but could have emphasized more - I’m not sure you would find anyone on LW who would disagree! Eliezer spends a bunch of posts talking about how wonderful Bayesianism is, true… but while he certainly does make clear his position that Bayesianism is *necessary*, I challenge you to find anywhere he would claim that it is *sufficient*. And ultimately there isn’t *that* much content on LW that talks about Bayesianism as such - I would in fact claim that there are more posts on LW focused on providing “heuristics for informal reasoning”, like in this post, than there are posts talking about Bayesianism.

Given that, I find that this post is somewhat talking past the people you are responding to. As I see it, the argument went something like this:

This seems to me similar to this (hypothetical) debate: