Comments on “Probability theory does not extend logic”
Ask the professor . . .
Thank you for this post. I have not had so much fun with math since I was a freshman, when I checked out the Raymond Smullyan logic puzzles from the college library. I would have loved to study logic and computer programming, but obviously not as much as I liked my fuzzy liberal arts curriculum.
I hope you will not send me to the corner, wearing a pointy hat, for asking “Logic 101” questions in the graduate forum. I spent a few hours reading “Probability theory does not extend logic,” and I would like to run through my layman’s conclusions based on your article. Then you can tell me how badly I’m astray.
First, I understand that your practical interest in logic is in designing AI computer systems. (I hope at least that assumption is correct.)
You start your post by referring to the five traditional bases for knowledge: Empiricism, rationalism, tradition, scripture, and intuition. In terms of artificial intelligence, “empiricism” would correspond to “observation” or “primary data input” (like through a camera). “Rationalism” would correspond to “logic” or “mathematical operations” (whether the system used is propositional calculus, predicate calculus, or some uber-system of FTL generalized rationalism that does not yet exist). “Tradition” would correspond to propositions that have been taught or placed into memory as programming instructions. I’m going to ignore “scripture,” because it’s a subset of “tradition.” I’m also going to ignore “intuition,” partly because the cognitive people seem to think that “intuition” is a shortcut we use for logic, based on pattern recognition and maybe statistics, because human beings don’t have infinite processing time to make truly logical decisions. Like that old joke (“we’re late because Dad took a shortcut”), to me it doesn’t make sense to worry about intuition until we know where we are on the “logic” map.
When you compared probability notation to predicate calculus, you made the point that predicate calculus is more powerful because it allows logical quantification, instead of relying on implicit generalizations. This reminds me of some examples from the study of human language.
First example: In high-school English, your teacher may have made you diagram sentences, and jumped on you for using pronouns that don’t clearly point to the referent object. “Lindbergh was talking to Einstein, then he flew away.” You may infer that Lindbergh was the “he” who flew away, because Lindbergh was an aviator, but this is bad grammar because you haven’t logically quantified the “x.” People manage to communicate all the time using bad grammar, but sometimes misunderstandings do arise; and when they do, we resort to higher-level logic or grammar to sort things out.
Another example from the study of human language: Baby talk is thought to be a natural or organic precedent for formal language; that is, human infants develop a simplified form of communication (“baby talk”) before they master formal grammar. Baby talk doesn’t have much logical quantification, but it is possible to understand what your infant is trying to communicate through “abuse of notation and intelligent application.” For example, the joke tee-shirt you can buy on the web: “Let’s eat Gramma . . . commas save lives.” Propositional calculus (even extended by probability theory) could be seen as a form of baby talk, which allows people to swap useful information before they understand predicate calculus. By the same token, maybe predicate calculus could be seen as the “baby talk” preliminary to a more highly developed system–“probabilistic” (or maybe “proba-ballistic”) logic–that would allow us to reason more precisely about the probabilities of probabilities. (To use your FTL analogy, we might not hit warp speed with an incomplete theory of rationality, but at least we would reach near-Earth orbit.)
(By the way, I appreciated your borrowing all the “snarks” and the “boojums” to bring this more down to my level.)
Okay, let’s go back to the tripod of “empiricism,” “rationality,” and “tradition.” Sticking with our metaphor of a human child learning language, what comes first is the physical plant, or the embryo. In a way, biology is designed by code just as surely as any computer brain, because everything is encoded in a DNA “language” of C-T-A-G, and the DNA blueprint is built up into a physical structure. The physical structure of the brain allows it to perform certain “logical” operations. In addition to a brain, the embryo also develops eyes, ears, and sensory nerve endings—these are its “inputs,” which allow it to receive new data. In terms of AI, it doesn’t really matter if you’re talking about synthetic biology, or something made of silicon—you still end up with a “brain” and “inputs,” however different these might look depending on the technology.
Now that we have a physical plant—brain to perform logical operations, and sensory inputs to receive information—we are able to add two more legs to our tripod–“empiricism” and “tradition.”
Use the variable “Pr” to refer to “programmer” or “parent” (not “P,” so as to avoid confusion with “Probability”). Use the variable “C” to refer to either “computer” or “child” (I hope “C” is not arbitrarily linked to something else that would throw a monkey wrench into my examples). Use the variable “E” to refer to “environment” or “entry”–in other words, “E” is shorthand for whatever empiric experience is thrown into the mix, in the particular situation. Maybe we should also have a variable “L,” which would refer to “language” or “logic” interchangeably.
The first inputs received by “C” are basic stimuli (light, dark, heat, or single-syllabic utterances). “C” comes equipped with logic potentialities, but no actual programming, and it takes a while for these stimuli to be encoded into logical propositions (“mom = food.”) Presumably, this encoding involves what you called “statistical inference,” which allows reasoning from specifics to generalities.
However, there is more to developing “L” than just making statistical inferences based on “E.” Specifically, “C” receives inputs from both “Pr” and “E,” and both types of input will affect the development of “L.” If you’re trying to find an exception, where “Pr” does not apply, the stereotype of the “wild man in the woods,” or the “boy raised by wolves,” might come close. In this scenario—where the child has no parents–“L” is developed only through inputs from “E,” since there is no “Pr” in the equation. (Okay, this example doesn’t really work. Wolves are good parents—they teach valuable hunting behaviors, and they even have some language ability, as expressed in the form of howls. I just brought it up to show the importance of “Pr” in real-life situations.)
In the development of intelligence, the function of “Pr” is to introduce traditions (or “logical propositions”) to “C.” In the case of a human child, these “traditions” are supposed to socialize the child and prevent it from getting hurt. (“If you hit your sister, you will go to bed without supper.” “If you wash the dishes, you may watch TV until bedtime.”) These “traditions” include both form and content. The “if, then” logic form is conveyed to “C” by example, while the content (value judgments, rules, and consequences) is passed on explicitly.
Very often, “Pr” wishes these “traditions” to be accepted as “scripture”–that is, as something which may not be questioned. However, “C” is often presented with situations in which “E” contradicts “Pr.” In assessing these situations, “C” has to learn to assign probabilities to the different statements which “Pr” might make. Take these statements as examples:
Proposition 1: “If you happened to be looking out the window, you would see Santa land on the roof with a bag of presents.”
Proposition 2: “If you touch the stove, you will get burned.”
Let’s take the case where “C” tests Proposition 1, by secretly staying up all night on Christmas Eve. Santa does not appear. The reliability of “Pr” is called into question. “C” does not have sufficient empiric data to assess the truth-value of Proposition 2. “C” also lacks an advanced understanding of probabilistic logic, and is still using some kind of “baby talk” (propositional calculus) where things either “are” or they “aren’t.” By incorrectly applying logic (false syllogism), “C” may conclude that the truth-value of Proposition 2 is “false”; or “C” may correctly infer that further empirical data is needed to make an estimate of probability that “Pr” is reliable in any given case.
The result? “C” touches the stove and gets burnt, which is an example of empirical learning, and also provides some statistical data for assessing the truth-value of propositions put forward by “Pr.”
(Okay, right: GIGO. Garbage in, garbage out. If “Pr” is statistically unreliable, then “C” will have trouble learning to apply logical propositions to empirical data in such a way as to yield positive outcomes. In other words, “C” will be poorly socialized. In other words: “Like father, like son,” or “The apple doesn’t fall far from the tree.” The HAL computer in “2001: A Space Odyssey” went crazy because its programmers taught it to lie.)
I know these ideas won’t help you to perfect a mathematical formulation of rationality. I hope I haven’t wasted too much of your time! But I did want to say, I found your ideas tremendously helpful in clarifying my own thoughts about language, logic, and artificial intelligence. Thanks again for sharing on the web, in bite-sized pieces that even a non-expert can find time to digest.
Who says all math can be expressed in predicate calculus!?
“All math can (in principle) be expressed in predicate calculus”
That seems like a crazily overconfident, eternalist statement to me.
What about theorems in intuitionistic logic, which lose shades of their meaning if you force them into a boolean world?
What about modal logics?
What about type theories? Are you so sure that no type theories exist that are worth calling “math”, but cannot be interpreted in a predicate logic?
What if someone does figure out a way to include probabilistic uncertainty natively into logic? This would be an achievement which is not the same thing as doing probability theory as a kind of measure theory under ZFC!
embedding, interpreting and expressing
Thanks for the response.
There’s a lot of really interesting territory contained in the question of what it means for one language to be embedded or expressed in another. For example, by the standard of embedding you gave, ZFC is embeddable in PA, by writing a sentence φ of ZFC as “ZFC proves φ” in PA. But we don’t say PA can express everything ZFC can express, because if ZFC proves some statement about integers, PA can say that ZFC proved it, but not believe it’s true about the “real” integers.
So while it’s true that some modal or intuitionistic theories can be fully expressed inside first order theories like ZFC as statements about Kripke models, I don’t know if there’s any way of interpreting a more powerful intuitionistic theory like the Calculus of Constructions into ZFC without either treating it as a language game, or squishing it down into a boolean theory.
Here’s a really neat paper that shows how CoC + universes can be interpreted in ZFC + inaccessible cardinals and vice versa. It squishes intuitionistic propositions into True and False though.
Curtis Franks has a really interesting discussion of these sorts of questions, as they relate to provability in weak theories of arithmetic, in http://www.cambridge.org/us/academic/subjects/philosophy/philosophy-science/autonomy-mathematical-knowledge-hilberts-program-revisited. The rest of the book is great too.
So yea, “can be expressed*”, but the asterisk reads “actually what it means to express something from one language in another language is really subtle and interesting in its own right”.
What about fuzzy logic?
Isn’t Fuzzy Logic something similar to probability theory that can be used to reason about uncertainty?
Especially given the engineering question, to quote:
“Are there times when we should use one of the alternatives, instead of probability theory?”
In practice it is probably often easier to implement simple Fuzzy controllers rather than trying to model the involved uncertainties using probability theory.
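A minimal sketch of what such a simple fuzzy controller might look like, in Python. Everything specific here is an assumption made for illustration: the temperature-to-fan-speed task, the triangular membership functions and their breakpoints, and the zero-order Sugeno-style weighted average used to combine the rules.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at or outside a and c, rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fan_speed(temp_c: float) -> float:
    """Map a temperature (deg C) to a fan speed in percent (hypothetical rules)."""
    # Degree to which the temperature counts as cold / warm / hot.
    cold = tri(temp_c, -10.0, 10.0, 20.0)
    warm = tri(temp_c, 15.0, 22.0, 30.0)
    hot = tri(temp_c, 25.0, 35.0, 50.0)

    # Each rule pairs a firing strength with a fixed output level.
    rules = [(cold, 0.0), (warm, 50.0), (hot, 100.0)]
    total = sum(weight for weight, _ in rules)
    if total == 0.0:
        return 0.0  # no rule fires; fall back to "off"
    # Weighted average of the rule outputs (defuzzification step).
    return sum(weight * out for weight, out in rules) / total


for t in (10.0, 22.0, 27.5, 35.0):
    print(t, fan_speed(t))
```

Note that nothing in this sketch models uncertainty as such; the memberships are degrees of "warm" or "hot", not probabilities.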
Fuzzy Online Places
Like shoulder pads and the cone bra the fuzzy hype did not make it much past the 80s, but the field carried on regardless.
On the engineering side fuzzy control became just another tool to deploy, and it found application in a wide variety of use cases.
This paper presents a nice representative cross section: http://www.researchgate.net/profile/Hani_Hagras/publication/267097192_T2_Applications_2013/links/544545110cf2f14fb80ef88f.pdf
On the theory side the underlying mathematics has been made quite rigorous: http://www.mathfuzzlog.org/index.php/Handbook_of_Mathematical_Fuzzy_Logic
My Bayesian friends really don’t like Fuzzy logic, arguing that Cox’s theorem means there’s no need for it, but to me this seems to be based on a profound misunderstanding.
Errors in scientific practice
I’m interested in your observation that misunderstanding the relationship between logic and probability leads to logical errors in scientific practice and to probabilistic methods being misapplied. It chimes in with some thoughts I had, for example about how the concept of a confidence interval is very commonly misunderstood and misinterpreted. Do you have any other examples of what you mean, or perhaps some references you can point me to? Much appreciated.
Some help to a non-STEMer
I tried to make it through this page, but a lack of a STEM education makes this difficult going. I’m hoping you can help tell me if I’m on the right track. My understanding is something like this:
Both logic and probability theory can be used to assign certainties to atomic claims (“that bucket is full of water”) and use those certainties to derive measures of certainty for compound claims (“that bucket is full of water and also red”). Probability theory can make finer distinctions in certainty than logic can.
However logic can also assign certainties to universal claims (“all buckets are red”) and existential ones (“some buckets are red”). Probability theory cannot do that in general. Either you’d have to treat universal statements as infinite conjunctions (and then when you multiply the probabilities you’d get 0?) or else it’d be unclear what to do with a universal statement with a probability other than 1 or 0 (if there’s a 50% chance that there exists a red bucket, what does that tell me about any particular bucket?).
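The worry about infinite conjunctions can be made concrete with a small Python sketch. It assumes, purely for illustration, that each bucket is red independently with the same probability p; the function name is made up. Under those assumptions the probability that the first n buckets are all red is p^n, which shrinks toward 0 for any p below 1.

```python
def prob_first_n_all_red(p: float, n: int) -> float:
    """P(first n buckets are all red), assuming independence and a common p."""
    return p ** n


for n in (1, 10, 100, 1000):
    print(n, prob_first_n_all_red(0.99, n))
```

Without independence, or if the individual probabilities approach 1 quickly enough, the product need not go to 0, so this only illustrates the simplest case.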
One way I’m trying to move this toward things I understand better is thinking about other “systems”. For example, is the Marxist account of economics true? What about Friedman’s account? Both make universal claims about all economies (of a given sort); both have an internal logic and allow inferring some propositions from others. How do we describe our certainty in them? It seems ridiculous to say that either is “true” or “false”, but it also seems ridiculous to say that one is, say, “30% true”. A description of our certainty in one of these intellectual frameworks would have to describe where it applies, to what degree, what parts of the environment it ignores, what variations and scales and times it applies over.
Is this understanding and example accurate for what you were trying to say, David?
This post is stuck in my head
(This was going to be an email until I realised you don’t publish an email address, so it’s a little unfocussed. Which is probably exactly why you don’t publish an email address, but well, I’ve written it now.)
I just wanted to say that I’ve been reading your writing over your various sites a lot lately and getting a lot out of it. I originally got here via Slate Star Codex and the people of rationalist-adjacent Tumblr, which I’ve been hanging around the edge of. This post in particular has got severely stuck in my head, and I keep rereading it.
I have to admit that I’m not convinced on the ‘civilisational collapse’ framing. For example, I’d definitely like more context for your claim that ‘major institutions seem increasingly willing to abandon systemic logic: rationality, rule of law, and procedural justice’ - are there any concrete examples you’re thinking of here? But I find the broad outline of the stages and the paths between them really inspiring, and I’m really looking forward to seeing where you go with it.
I’m particularly fixated on that figure you drew with ‘past, current, and potential future ways beyond stage 3’… it makes a lot of sense to me. Just as a bit of context, here’s a description of my own paths through that diagram.
My parents studied languages at university in the sixties, back before pomo got its claws into the curriculum, and ended up with something very like your ‘Stage 4 via humanities education’. I never got much of an arts education at all outside of music, but what I do have is mostly from reading their books, and so I think I have some understanding of what this is. The bit I picked up was heavy on analytic philosophy and the New Critics - up-to-the-minute stuff like Bertrand Russell, A.J. Ayer, I. A. Richards, William Empson, T.S. Eliot, L. Susan Stebbing. And I got a lot out of it - there’s a lot that’s plain wrong (logical positivism! the objective correlative!), but they all wrote so clearly that at least you can tell where they’re wrong. I’m still in love with their writing style. And I don’t know, I’m really grateful to the New Critics for giving me some framework for enjoying literature, even if it’s a limited one and my tastes are still more shaped by it than they maybe should be.
So that’s my experience of the top line of your diagram. For the bottom line, I did get a really decent science education - my science and maths teachers were great at school, I did a maths degree and then a physics phd. I’ve definitely managed ‘Stage 4 via STEM education’. And then also as a student I read a shit-ton of pop science, Pinker and Dennett and Penrose and so on, and drank in plenty of the New-Atheism-and-laughing-at-homeopathy atmosphere that the internet was filled with ten years ago.
All this is kind of a long-winded way of saying that I really went to town on Stage 4. And was insufferable about it to exactly the level you’d expect - pomo was obvious nonsense, Sokal had shown them all up as charlatans, religion was a pointless source of woo, all the usual. It’s probably good that I didn’t get a modern arts education because I’d just have been obnoxious and argued all the time.
Obviously by now I’d like to move on. I guess I have been for at least the last five years or so, but not in a very organised way. I haven’t managed any full-blown nihilist STEM depression (don’t really have the temperament for it) but I did have a good line in aimless confusion for a while. I probably do have just enough of a background for the ‘genuine pomo critique’, and actually I’ve been vaguely intrigued by postmodernism and earlier continental philosophy for a while, but I never really know where to make inroads - Foucault sounds interesting and some of the suggestions above are excellent. Whitehead sounds like a particularly good path for me, but one I’d never thought of myself. And a native STEM bridge beyond stage 4 would be wonderful - I’m definitely up for tagging along on that project!
Anyway thanks very much for writing it all!
Re: probability and logic
Here’s the formalization that you appear to be looking for:
Given a first-order language L and an underlying set X, fix a probability distribution mu over interpretations of the constants, functions, and relations of L in X. Now expand L to a language L’, a two-sorted language with sorts X and ℝ. L’ should consist of all the symbols of L (which apply to the sort X), the language of arithmetic (which apply to the sort ℝ), and an additional logical symbol P whose syntax is that if phi is a formula, then P(phi) is a term of sort ℝ. Then sentences of L’ can be assigned truth-values in the model (X, mu) in the obvious inductive manner.
As for whether “P(phi) = 0.4” is true if P(phi) is actually 0.400001, no, of course it’s false, because 0.4 and 0.400001 are different. This isn’t actually a problem because you can just use inequalities instead, like “0.39 < P(phi) < 0.41”.
That formalization appears to give you what you were asking for, but it is still no good, because you were asking for the wrong thing. You claimed that “P(boojum|snark) = 0.4” means the same thing as “∀x: P(boojum(x)|snark(x)) = 0.4”, but it does not. “P(boojum|snark) = 0.4” means that if you randomly select a snark, there is probability 0.4 that it is a boojum. This does not imply that P(boojum(Edward)|snark(Edward)) = 0.4. Maybe you have some prior reason to believe that Edward is less likely than average to be a boojum, and P(boojum(Edward)|snark(Edward)) = 0.3 instead. Maybe you know for certain that Edward is not a snark, and then P(boojum(Edward)|snark(Edward)) is undefined.
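A small Python sketch of the distinction, under made-up assumptions: a population of three snarks with hypothetical names and hypothetical individual probabilities, with the "randomly selected snark" drawn uniformly. The population-level figure comes out at 0.4 even though no such figure is forced on any one individual.

```python
from fractions import Fraction

# Hypothetical per-individual probabilities of being a boojum, given snarkhood.
boojum_prob = {
    "Edward": Fraction(3, 10),
    "Fiona": Fraction(5, 10),
    "Gus": Fraction(4, 10),
}


def population_rate(probs: dict) -> Fraction:
    """P(boojum | snark) for a snark picked uniformly at random."""
    return sum(probs.values(), Fraction(0)) / len(probs)


print(population_rate(boojum_prob))  # the population-level statement
print(boojum_prob["Edward"])         # the statement about Edward in particular
```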
Re: Not quite obvious
Toy example: Let the language be the language of arithmetic together with a constant symbol c and a unary relation symbol R, and let ℕ be the underlying set. Let the symbols from the language of arithmetic (0, 1, +, and ×) be interpreted in the standard way with probability 1. Let c be interpreted as the number n with probability 2^(-n-1). And let R be interpreted such that R(n) holds with probability 3^(-n), with R(n) and R(m) being independent for distinct n and m, and R(n) being independent of c. Then P(R(c)) = sum over natural numbers n of 2^(-n-1)3^(-n) = sum … of (1/2)6^(-n) = (1/2)(1/(1-(1/6))) = 3/5.
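For anyone who wants to check the arithmetic, here is a minimal Python sketch that adds up the series with exact fractions. The function name and the cutoff of 40 terms are illustrative choices; the natural numbers are taken to start at 0, which is what makes the probabilities 2^(-n-1) sum to 1.

```python
from fractions import Fraction


def prob_R_of_c(num_terms: int) -> Fraction:
    """Partial sum of P(c = n) * P(R(n)) for n = 0 .. num_terms - 1."""
    total = Fraction(0)
    for n in range(num_terms):
        p_c_is_n = Fraction(1, 2 ** (n + 1))   # P(c = n) = 2^(-n-1)
        p_R_holds_at_n = Fraction(1, 3 ** n)   # P(R(n)) = 3^(-n)
        # R(n) is independent of c, so the joint probability is the product.
        total += p_c_is_n * p_R_holds_at_n
    return total


partial = prob_R_of_c(40)
print(float(partial))                  # numerically very close to 0.6
print(Fraction(3, 5) - partial)        # the exact (tiny) remaining tail
```

The partial sums are (3/5)(1 - 6^(-N)), so they approach 3/5 from below.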
Another possible probability distribution over interpretations of c and R is that c is interpreted as n with probability 2^(-n-1) and R is interpreted as “is prime” with probability 1/2 and as “is a multiple of c” with probability 1/2. Then P(R(4)) = (1/2)(probability 4 is prime) + (1/2)(probability that c is 1, 2, or 4) = (1/2)(1/4 + 1/8 + 1/32).
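The same kind of check works for this second distribution. A sketch in Python with exact fractions; the helper names are made up, and the function is only meant for k ≥ 1 (where "k is a multiple of c" holds exactly when c is a positive divisor of k).

```python
from fractions import Fraction


def is_prime(k: int) -> bool:
    """Trial-division primality test, fine for small k."""
    return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))


def p_c(n: int) -> Fraction:
    """P(c = n) = 2^(-n-1)."""
    return Fraction(1, 2 ** (n + 1))


def prob_R(k: int) -> Fraction:
    """P(R(k)) for k >= 1 under the mixture described above."""
    # With probability 1/2, R means "is prime": contributes 1 or 0.
    prime_part = Fraction(1 if is_prime(k) else 0)
    # With probability 1/2, R means "is a multiple of c": sum P(c = d)
    # over the positive divisors d of k.
    multiple_part = sum(
        (p_c(d) for d in range(1, k + 1) if k % d == 0), Fraction(0)
    )
    return Fraction(1, 2) * prime_part + Fraction(1, 2) * multiple_part


print(prob_R(4))
```

For k = 4 the divisors 1, 2, and 4 contribute 1/4 + 1/8 + 1/32 = 13/32, and 4 is not prime, so the total is 13/64.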
significant open problem?
What precisely is it that you’re claiming is a significant open problem? Combining probability measures and first order structures has been done in several ways. For instance, Keisler measures, which are probability measures over the Boolean algebra of definable sets of a first-order structure, are commonly used in model theory (technically, a Keisler measure is only required to be finitely additive, but it doesn’t make a difference in countably saturated structures, and there’s no reason you couldn’t consider countably additive Keisler measures anyway). I think descriptive set theorists sometimes talk about probability measures on isomorphism-classes of countable models of first-order theories. You haven’t convinced me that you’ve identified something important that’s missing from already known concepts that combine logic and probability.
The suggestion you gave for a direction in which to try to combine logic and probability is misguided (as I pointed out in my first comment), enough so that I wouldn’t believe that capable researchers thought it worthwhile to explore that direction. And the fact that I quickly came up with a formal framework (which certainly isn’t worthy of a PhD thesis, or even a publication) that follows your suggestions also indicates it is unlikely that people were trying to do that and failing.
To be clear, it wouldn’t surprise me if smart researchers have thought of something unsatisfactory about the already known ways to combine logic and probability, which they have failed to resolve. But if so, I don’t know what these problems are, and your post didn’t adequately describe them.
Well this is embarrassing; I
Well this is embarrassing; I just realized the formalism I suggested does not behave like what you were looking for after all. The sentences in the language I was calling L’ have probabilities rather than truth-values like I incorrectly said they did (not really a problem so far), but if phi is a sentence and r is a real number, then sentences of the form P(phi)=r always have probability either 1 or 0 in any model, whereas you suggested that you’d want it to be possible to express uncertainty about such claims.
http://intelligence.org/files/DefinabilityTruthDraft.pdf seems like the framework it uses is very similar to the problem you stated, but IIRC that paper involves various unsatisfying weirdnesses like relying heavily on nonstandard models. Perhaps you already knew about it; idk.
It seems I misinterpreted you as making stronger claims about what a unification of predicate logic and probability theory would look like than you actually were. I guess I shouldn’t have given you such a hard time over the difference between “P(boojum|snark)=0.4” and “∀x: P(boojum(x)|snark(x)) = 0.4” if you were simply illustrating one conceivable way that such a logic-probability unification could look, rather than claiming it would definitely look like that. You’re definitely right that probability theory does not itself extend predicate logic, though I think you overestimate how often people who say “probability extends logic” are confused about this rather than just using “logic” to refer to propositional logic.
Anyway, sorry if I was making an ass of myself.
Jaynes
“Jaynes is just saying ‘I don’t understand this, so it must all be nonsense.’”
Which is itself a form of mental projection fallacy.
Proof of title?
Could you source the statement “probability theory is mathematically proven not to extend logic”? I’d like to see those mathematical proofs, or at least see abstracts about them, at the graduate math level or below.
It seems you have no issue with probability theory as a theory of belief and confidence, of how they are attached to a claim (or a family of claims), and of how they interact with, and evolve with, new and prior information. Is that correct?
Also, I’m curious if you’ve finished reading ET Jaynes’ book.
Confusion
So... I can define a “Probability Monad” in some strongly typed programming language... and then use the Curry-Howard “isomorphism” to translate those types back into first-order logic.
Which seems to solve the issue to me… what does this not handle?
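For readers who have not met the idea, here is a rough sketch of a finite probability monad, written in Python rather than a strongly typed language. The class name `Dist` and its methods are illustrative assumptions, not any particular library’s API, and the sketch says nothing about whether the Curry-Howard translation would do what is hoped.

```python
from collections import defaultdict
from fractions import Fraction


class Dist:
    """A finite probability distribution: outcome -> probability."""

    def __init__(self, weights):
        self.weights = dict(weights)

    @staticmethod
    def unit(x):
        """The monad's 'return': a distribution certain to yield x."""
        return Dist({x: Fraction(1)})

    def bind(self, f):
        """Sequence a dependent step: f maps each outcome to a new Dist."""
        out = defaultdict(Fraction)  # missing keys start at Fraction(0)
        for x, px in self.weights.items():
            for y, py in f(x).weights.items():
                out[y] += px * py
        return Dist(out)

    def prob(self, pred):
        """Total probability of the outcomes satisfying pred."""
        return sum((p for x, p in self.weights.items() if pred(x)), Fraction(0))


# Usage: two fair coin flips, and the chance that both come up heads.
coin = Dist({True: Fraction(1, 2), False: Fraction(1, 2)})
both_heads = coin.bind(lambda a: coin.bind(lambda b: Dist.unit(a and b)))
print(both_heads.prob(lambda outcome: outcome))
```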
You should just know, this
You should just know, this post sounds incredibly snobbish and dismissive of people who are very respected and knowledgeable about these subjects. As you have said, this is “undergraduate-level” math; I am an undergraduate (full disclosure!), but this post is dramatized and overwrought in a way that is an immediate red flag for any level of rigor or clarity. If the point you were making was really fundamental (and the tradeoffs, limitations, or unverified assumptions whose existence you are convinced of really were mathematically basic statements about the limits of probability), this wouldn’t have come out sounding like such a pompous screed. This post simultaneously exudes confidence, clarifies nothing, and bandies about understanding of the most basic mathematical concepts like it’s the ultimate proof of the author’s superiority. This is a manifesto devoid of results. If it does contain any, they are either too underdeveloped or faintly-sketched to register as anything at all. Please, you are making the lives of people who are actually seeking understanding harder with your arrogance. I’ve seen how you respond to other posts; I am not interested in your response to this. I have selected not to be notified of further comments on this forum. Best of luck with this whole massive, completely indecipherable enterprise this blog seems to represent.
RE: Wrong Way Reduction
Propositional logic embeds trivially in Probability theory, and the extension to Predicates is orthogonal.
Ontology (in this context, figuring out what space of ideas is both usable, and contains the helpful ideas) is in neither Logic nor Probability Theory.
Do you have an example of a logical statement that can’t be embedded in Probability Theory?
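The first claim, that propositional logic embeds in probability theory, can be illustrated with a short Python sketch. The function names are made up. The idea is to restrict probabilities to 0 and 1 and check that the usual rules for negation, conjunction, and disjunction reproduce the Boolean truth tables. It covers the propositional connectives only and does not touch predicates or quantifiers.

```python
from itertools import product


def p_not(pa):
    return 1 - pa


def p_and(pa, pb):
    # In general this product rule needs independence; with probabilities
    # restricted to 0 and 1 it is always exact.
    return pa * pb


def p_or(pa, pb):
    return pa + pb - p_and(pa, pb)


def matches_truth_tables() -> bool:
    """Check all four truth assignments against the 0/1 probability rules."""
    for a, b in product([False, True], repeat=2):
        pa, pb = int(a), int(b)  # True -> probability 1, False -> probability 0
        if p_not(pa) != int(not a):
            return False
        if p_and(pa, pb) != int(a and b):
            return False
        if p_or(pa, pb) != int(a or b):
            return False
    return True


print(matches_truth_tables())
```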
Ontology is logic with additional structure
In the computer science context, ontology is a model of selected or constructed entities that are useful for some given domain, and the logical relationships between them.
Any effective ontology has to use logic, but ontology adds the above elements. So it’s logic with additional structure that enables it to be applied to the real world.
The real challenge is how to combine deduction and induction
David,
It seems that we have deductive logic on the one hand (pure math such as 1st-order logic) and inductive logic on the other (applied math such as for instance probability theory).
But something is missing from both... as David Deutsch emphasizes in his books (‘The Fabric Of Reality’ and ‘The Beginning of Infinity’), the art of generating good explanations remains a mystery.
Could it be that this missing art (which you call ‘meta-rationality’) is equivalent to abduction, inference to the best explanations?
And perhaps the way to obtain this missing art of abduction is to learn how to combine deduction and induction into an integrated system.
Putting it together. Combine Induction&Deduction for Abduction!
Deepmind just posted a link to a new paper illustrating exactly what I’ve been suggesting. Link here:
https://arxiv.org/abs/1711.04574
In the paper above, the authors start with classical logic as applied in logic programming, then they allow for many-valued (non-classical, non-monotonic) logic. They then attempt to mix this with neural networks!
So they’re starting to attempt exactly what I suggested above.
Deductive methods (logic programming), mixed with Inductive methods (non-monotonic many-valued logic and machine learning), yield the beginning of an entirely new kind of method... Abduction!
Correct David, Deductive logic is more general than inductive!
I understand everything so much better now!
Inductive reasoning (of which ‘probability theory’ is actually only a part) is just a ‘fragment’ of deductive logic. That is to say, every aspect of inductive logic can be embedded in deductive logic, but not vice-versa.
In fact, probability theory is not even the most general form of inductive reasoning! I now think that the crown goes to a marriage of ‘type theory’ (a form of logic programming) with ‘non-monotonic logic’ (many-valued logic).
That is to say, I think type-theory/many-valued logic is the most general form of inductive logic! (probability theory is only a ‘fragment’ of this).
In turn, type-theory/many-valued logic is only a ‘fragment’ of deductive logic! This follows from the Curry-Howard correspondence!
https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence
Note, every aspect of type-theory and many-valued logic can be embedded in category theory (pure deductive logic), but not vice-versa!
So inductive logic is a fragment of (embedded in) deductive logic.
Think of cognition as the filtered light of ultimate reality!
OK, imagine that ‘reality’ is ultimately just the categories of category theory! That is to say, imagine that mathematics exists ‘out there’ and it’s all just ‘categories’! Think of the categories of category theory as the ultimate reality and imagine them to be ‘sunlight’.
But the blinding light of the sun is too ‘pure’ to be sighted directly. It can never be captured by ‘thought’ in its entirety. The metaphor I’m making here is that reality can never be understood in its entirety (it’s beyond any formal system), so we need to ‘filter it’. This, I think, is the key insight of ‘meta-rationality’.
OK, now imagine ‘Inductive Logic’ as a pair of sun-glasses that filters the pure light of ‘Deductive Logic’ (Mathematical Categories). The filter blocks out some of reality, in effect, ‘structuring it’.
I’m suggesting that this filtering of reality is equivalent to ‘cognition’ (meta-rationality or intelligence or whatever you want to call it), which is ultimately abduction, the art of generating good explanations.
In my equation relating the 3 types of logic then, the ‘x’ sign here refers to the filtering operation.
Deduction x Induction = Abduction (Cognition)
The pure light of ultimate reality (the ‘categories’ of deduction) is structured or filtered by Induction, and the result is Abduction, or Cognition.
“Gnaw and tug at the posts, and you will slowly loosen them up!”
If we take a Cartesian Closed Category, this is equivalent to Typed Lambda Calculus. So there’s our initial deductive structure.
Now we’re going to perform a filtering operation, by transforming the above structure using inductive structure.
To do this, deploy fuzzy logic! Instead of bivalent (two-value) truth-conditions, we use a continuum of truth conditions (infinite-valued logic).
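As a rough illustration of what replacing two truth values with a continuum looks like, here is a minimal sketch using the min/max/complement connectives commonly associated with Zadeh-style fuzzy logic. The function names are made up for this example, and other families of connectives (for instance Łukasiewicz ones) would behave differently.

```python
# Truth values are floats in [0, 1] instead of just True/False.
# Function names are illustrative, not taken from any library.

def f_not(a: float) -> float:
    """Fuzzy negation: complement of the truth degree."""
    return 1.0 - a

def f_and(a: float, b: float) -> float:
    """Fuzzy conjunction: the smaller of the two truth degrees."""
    return min(a, b)

def f_or(a: float, b: float) -> float:
    """Fuzzy disjunction: the larger of the two truth degrees."""
    return max(a, b)

def agrees_with_classical() -> bool:
    """Check that on the endpoints 0 and 1 the connectives match Boolean logic."""
    for a in (0.0, 1.0):
        if f_not(a) != float(not bool(a)):
            return False
        for b in (0.0, 1.0):
            if f_and(a, b) != float(bool(a) and bool(b)):
                return False
            if f_or(a, b) != float(bool(a) or bool(b)):
                return False
    return True
```

On the endpoints these connectives reduce to the ordinary bivalent ones, but in between they do not: with a truth degree of 0.5, "A or not A" only reaches 0.5, so the law of the excluded middle is no longer fully true.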
The filtering will transform this into a model of a dynamical system by deploying Temporal Modal Logic. The result is an ontology - an abductive structure that is the first component of general intelligence!
Let’s use the same trick with another type of deductive structure – a hyper-graph.
Now transform this using an inductive structure – in this case simply put a probability distribution on it to obtain random graphs.
The filtering will transform this into a model of dynamical systems by deploying stochastic models. The result is a network - an abductive structure that is the second component of general intelligence!
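To make "put a probability distribution on it" concrete, here is a small sketch of the simplest random-graph model I know of, where each possible edge is included independently with probability p (the Erdős–Rényi G(n, p) model). Note the assumptions: it uses an ordinary graph rather than a hyper-graph, and the function name and parameters are invented for the example.

```python
import itertools
import random

def random_graph(n: int, p: float, seed=None) -> set:
    """Sample an undirected graph on nodes 0..n-1.

    Each of the n*(n-1)/2 possible edges is kept independently
    with probability p. Returns a set of (i, j) pairs with i < j.
    """
    rng = random.Random(seed)  # private generator, so a seed gives repeatable samples
    return {
        edge
        for edge in itertools.combinations(range(n), 2)
        if rng.random() < p
    }
```

With p = 0 the result is always the empty graph and with p = 1 it is always the complete graph, so the fixed (deductive) structure shows up as the two degenerate ends of the distribution.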
Finally, we’ll deploy the trick with a third type of deductive structure – a manifold.
Take the manifold and transform this using the last type of inductive structure – an information geometry.
The filtering will transform this into a model of dynamical systems by deploying data compression coding. The result is a search algorithm - an abductive structure that constitutes the third component of general intelligence!
…
“One day you’ll break the fence that held your forebears captive!”
Bayes is a domain of mathematics, not cognitive science
David,
Yes, I think I can now explain much more clearly what’s wrong with ‘Less Wrong’ style rationality. As you say, it’s just ludicrous to take Bayes theorem as some sort of ‘magic key’ that explains all rationality. But now I think I’ve pin-pointed exactly where the Bayesian cultists are going wrong, and I can explain it.
OK, imagine that you want to learn about psychology, and you go along to a lecture series run by a guy named ‘Sliezer Budkowsky’, who’s claiming to have the ‘magic key’ to all psychology. You sit through all the lectures and are amazed to hear that CHEMISTRY is the key to all psychology!
Imagine that you’re interested in insights into personality traits such as ‘Open-ness to Experience’, ‘Introversion’, ‘Extroversion’ etc., you sit through the lecture series, but are astounded to hear that Budkowsky spends the entire lecture series on psychology discussing CHEMISTRY - chemical bonds, the periodic table, acids-bases etc., and then triumphantly concludes with ‘And that’s the key to all of psychology!’
In reality of course, you’ve actually learned nothing at all about psychology. The reason is that psychology operates on a different level of abstraction from chemistry, and for real explanations of psychology, you need concepts and models that are appropriate for the domain you want to learn about.
Similarly, if I want to learn about machine learning, unsurprisingly I need to study well…machine learning. Bayes theorem won’t help much.
If I’m specifically learning about statistics and probability theory (branches of applied mathematics), then Bayesian models are appropriate. But if I switch domains, Bayes theorem quickly loses its relevance.
For one thing, in pure mathematics, the models are usually only appropriate for very simple (idealized) situations. For example, in probability and statistics university courses, they’re usually dealing with situations where they’re only looking at one variable (univariate statistics), and pure math can help. But of course, in the real world, you have multiple interacting variables (multivariate statistics), and an entirely different set of concepts and models would need to be deployed for that more general situation. And indeed, the numerical methods deployed in machine learning to deal with multivariate situations bear little resemblance to the stuff you learn in probability+stats courses.
In a nutshell, probability and statistics is about…well probability and statistics ;) There’s a set of concepts appropriate for a given level of abstraction. But change the level of abstraction you’re looking at, and these concepts no longer apply.
If I want to crack intelligence, I need methods of knowledge representation that can generate new concepts and models across many levels of abstraction. If we imagine the space of all knowledge domains, then probability and statistics is only a very limited sub-set of this. So it can’t possibly be the key to general intelligence.
Intuition
I’d be curious to hear your thoughts on how to fit intuitive leaps into an empiricist rationalism, as it seems to me that intuitive leaps are how much progress is made in that framework. The data doesn’t speak for itself, the human mind connects it in fits of intuition.
Is there a sense that we subject intuitive leaps to rational techniques in order to filter out the correct leaps from the incorrect ones, and thus the intuitive part of it becomes irrelevant? That it might as well be randomly generated? That seems inadequate.
Interesting! Thanks for your
Interesting! Thanks for your thoughts.
A useful account of productive non-rational mental processes needs to be more specific about how and when and why they work.
Well sure, but then we’d understand cognition and possibly consciousness. No?
More broadly, my point is that the scientific worldview doesn’t encompass, understand, or even seem to be interested in all of the processes that allow that worldview to make scientific progress, but that doesn’t seem to be a big talking point except among a few religious thinkers.
Epistemology =/= Theory of Knowledge
Hello David.
Thanks for your Relevant Article :)
Just asking for a Semantic Differentiation:
Could you please distinguish “Epistemology” in the sense of “the study of scientific knowledge” (what you were talking about) from “Epistemology” in the sense of “the theory of knowledge”? The latter is a larger concept ;) Differentiating helps to better understand how things work, imho :p
Regards,
Didier
A Bridge Uncrossed
I will pick a bone here, and I apologize for thrashing a sleeping post. I think there is a gap in my understanding of philosophical (as opposed to mathematical) logic. My contention regards my interpretation of this statement:
“Non-Aristotelian logic turned out to be a dead end, and is a mostly-forgotten historical curiosity.”
When philosophers claim to “know” logic without reading modern logic, they appear to make very curious statements. For one, they seem to regard all of logic beyond Russell to be an extraneous unusable mishmash. But it is not!
Here are examples of non-Aristotelian logic in modern use. Many appear well after the 1930s. All receive major semantics and proof theoretic overhauls after the 1930s, and all are extensively applied:
- Fuzzy logic (Łukasiewicz and Tarski, 1920; Zadeh 1965)
- Computability logic (Japaridze, 2003)
- Lattice value logics (Belnap, 1975; Xu, 2003)
- Intuitionist logic (Brouwer, 1923; Gentzen, 1934; Kleene, 1952; Heyting, 1964)
- Modal logic (Lewis, 1910; Barcan, 1946; Prior, 1957)
- Bayesian & Causal networks (Pearl, 1985, 2000; Neapolitan, 1989)
- Quantum logic (Birkhoff/von Neumann 1936; Pratt, 1992; Abramsky/Coecke 2004)
- Relevance logic, Linear Logic, Petri nets, Non-Monotonic Logic, Closed Monoidal diagrams, and more.
From where can we dismiss these as a mostly forgotten curiosity? I can show you hundreds of papers in each topic from the last decade. I can show you technical projects that succeeded because I had some or other of these logics to hand. So why does it seem that analytic philosophers hold modern logic in such ignorant contempt?
A Possible Explanation: A bridge uncrossed
Let’s take the statement:
“When philosophers claim to “know” logic without reading modern logic, they appear to make very curious statements.”
And transform it into the statement:
“When mathematicians and computer scientists and formal logicians claim to “know” logic without reading modern philosophical logic, they appear to make very curious statements indeed!”
Perhaps - and I am learning this - what philosophers mean as logic is simply not what modern topics in logic have grown into. That means that to bridge the gap, the logician must study philosophical logic, while the philosopher must take some care to study modern nonstandard logics, deductive theories, and semantics.
Non-Aristotelian logic is dead?!
Same as previous commenter.
Intuitionistic logic seems to be all the rage with the more functional programming and type theory minded in my bubble. Hardly dismissible as uninteresting, or indeed useless. These are real engineers building real theorem proving solutions to real mathematical problems… well, at least as a fun hobby.
And they always (well, if logic becomes the topic) make a point of constructivism and how double negation can’t be reduced to identity. Intuitionist logic seems to be, in fact, the more useful one in theories of computation. I wouldn’t know, I’ve never been a fan of logic anyway.
Programming is an artificial context
sigs,
The utility of mathematical logic in programming is sort of an exception-that-proves-the-rule situation: programming is an artificial context that is systematically logical by design. This is especially true of functional programming: all of the core mathematical concepts of functional programming were worked out decades ago, but it’s taken a long time to find problem domains other than AI where it’s a good fit.
I personally find intuitionistic logic fascinating but likely a dead end. On the one hand, it seems like a good idea to de-emphasize proving that a solution must exist without finding one. On the other hand, sqrt(2)^sqrt(2) obviously either is rational or it isn’t, and the fact that we don’t (and probably can’t) know which one doesn’t seem like it should change that.
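For readers who have not seen it, the sqrt(2)^sqrt(2) example usually refers to the following textbook non-constructive proof, written out here as a sketch. It leans on the law of the excluded middle and never decides which case holds. As far as I know, the Gelfond–Schneider theorem does settle the question (the number is transcendental, hence irrational), but the argument below makes no use of that.

```latex
\textbf{Claim.} There exist irrational numbers $a, b$ such that $a^b$ is rational.

\textbf{Proof (classical).} Let $x = \sqrt{2}^{\sqrt{2}}$. Either $x$ is rational or it is not.
\begin{itemize}
  \item If $x$ is rational, take $a = b = \sqrt{2}$, so that $a^b = x$ is rational.
  \item If $x$ is irrational, take $a = x$ and $b = \sqrt{2}$; then
  \[
    a^b = \left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}}
        = \sqrt{2}^{\sqrt{2}\cdot\sqrt{2}}
        = \sqrt{2}^{2}
        = 2 .
  \]
\end{itemize}
In both cases a suitable pair exists, but the proof does not say which pair it is.
```

An intuitionist would reject this as a proof of existence because the case split is not backed by a decision procedure, which is the point at issue in the paragraph above.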
I think computability is a more important property than constructibility anyway.
Coding Theory (Computational Complexity) is the true foundation of rationality!
I’m updating my thoughts a few years later. I thought long and hard and eventually I saw in principle how to unify logic with probability.
The ‘laws of thought’ (as a consequence of computer science) basically do come down to 3 main areas I think: probability theory, coding theory (complexity) and constructive logic.
In order to determine the nature of the relationship between them, I tried to trace them back to their roots in pure math, finding direct analogies with linear algebra (probabilities), analysis (complexity) and category theory (logic).
Eventually David, I came to the conclusion that logic and probability are on an equal footing, one doesn’t in any sense ‘extend’ the other. You simply have 2 different foundations. However I believe they can both be unified in computational complexity theory (or coding theory).
Ultimately I think both probabilities and truth values convert to complexity measures. Coding theory is about efficient encoding of knowledge (complexity), and ultimately I think this is what encapsulates both probability theory and logic.
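One standard link between probabilities and coding theory, which may be part of what is meant here, is Shannon's ideal code length: an outcome with probability p can be encoded in about -log2(p) bits. The sketch below only illustrates that textbook relation; the function names are invented for the example, and it is not a demonstration of the broader unification claim.

```python
import math

def code_length_bits(p: float) -> float:
    """Ideal (Shannon) code length, in bits, for an outcome of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

def entropy_bits(dist) -> float:
    """Expected code length of a distribution given as a list of probabilities."""
    return sum(p * code_length_bits(p) for p in dist if p > 0)
```

Under this reading, a statement that is certainly true (p = 1) costs zero bits, a fair coin flip costs one bit, and the less probable an outcome is, the longer its encoding.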
The 3 'keys to thought' are Causality, Complexity & Compositionality
Ultimately, I realized that technical facts about this or that system of inference are not as important as trying to explore the fundamental explanatory principles behind those technical systems. As you’ve always said David, we must ‘go meta’ ;) What queries about thinking were these technical systems of inference trying to answer?
After thinking long and hard, I concluded that each of the 3 main areas of computer science I mentioned (probability theory, coding theory and constructive logic), encapsulate deeper principles, of sufficient generality that they may indeed in some sense ‘solve’ intelligence when they’re combined together. But the resulting system will never be fully formalizable. It will, I think always be open-ended and amenable to further revision.
Here then, are what I think the 3 keys to thought are:
CAUSALITY: Probability theory in its fullest sense is really about cause and effect and how to do prediction, retrodiction and imputation. We don’t just want to know about correlations between things, we want to know about causes and counterfactuals, which outcomes are possible, and how those outcomes would change if we intervene in some way.
COMPLEXITY: Coding theory in its fullest sense is about dealing with complexity. We want to compress our representations of the world, to find efficient encodings to deal with limited resources in terms of space and time and limited information. In the real world, we are confronted with complex adaptive systems, and these embody a mix of randomness and determinism that makes them complex. How do such systems achieve open-endedness, efficiently exploring and creating new possibilities?
COMPOSITIONALITY: Constructive logic in its fullest sense is about compositionality: how are large systems built from smaller ones, and going in the other direction, how do we manage to split the world into smaller parts, objects and the relations between them? Mereology studies the relationship between the whole and its parts. We want to know how to engineer and combine ontologies based on the principle of compositionality.
So there you have it! The keys to all thought are
CAUSALITY, COMPLEXITY and COMPOSITIONALITY!
Extending Logic
This blog post is really quite excellent and I also enjoyed all of the comments. I just wanted to add something that I found in a book which may be of interest.
“Probability Logics”, Ognjanovic, Raskovic and Markovic, Springer 2016, ISBN: 978-3-319-47011-5
On page 49, there is some discussion of John Maynard Keynes, and then “Thus probability extends classical logic” (ostensibly according to Keynes).
So I thought that this was an interesting thing to find in a book. Please bear in mind that I personally am in agreement with your overall thesis here, but the reference to Keynes is interesting and I just wanted to share that. Of course, Keynes’s interpretation of probability is pretty bizarre, so it might come as no surprise that he would take that view.
The marriage of predicate
The marriage of predicate logic and probability is indeed an ongoing problem. Most current work in Artificial Intelligence focuses on quantifying over probabilities. Partially due to neglectedness, and partially because we suspect that quantifying over probabilities is not going to end up a good approach, MIRI’s workshops on logical uncertainty have tended to attack the problem from the opposite direction, assigning probabilities to logical statements, e.g:
But none of this is challenging the Kolmogorov axioms or denying Cox’s Theorem, because once you assign probabilities to logical formulas, those probabilities obey the standard probability axioms. It might be better to say that the theory of uncertain reasoning extends the theory of certain reasoning, if you object to the phraseology which says that probability extends logic.
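A toy version of "assign probabilities to logical formulas" can be sketched by putting weights on truth assignments (possible worlds) and summing the weights of the worlds where a formula holds. This is only the propositional case with two atoms, and the weights and names are made up for illustration. It says nothing about the harder problem of logical uncertainty over arithmetic.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("a", "b")

# All truth assignments over the atoms, in the order TT, TF, FT, FF.
WORLDS = [dict(zip(ATOMS, values))
          for values in product([True, False], repeat=len(ATOMS))]

# Hypothetical weights, one per world; they must sum to 1.
WEIGHTS = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]

def prob(formula) -> Fraction:
    """Probability of a formula: total weight of the worlds where it is true."""
    return sum((w for world, w in zip(WORLDS, WEIGHTS) if formula(world)),
               Fraction(0))

# Formulas are plain functions from a world to a bool.
a = lambda world: world["a"]
b = lambda world: world["b"]
a_and_b = lambda world: world["a"] and world["b"]
a_or_b = lambda world: world["a"] or world["b"]
tautology = lambda world: world["a"] or not world["a"]
contradiction = lambda world: world["a"] and not world["a"]
```

With this setup the usual axioms fall out automatically: tautologies get probability 1, contradictions get 0, and prob(a_or_b) equals prob(a) + prob(b) - prob(a_and_b). If all the weight sits on a single world, every formula gets probability 0 or 1, which is ordinary two-valued logic.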
Another obvious example of an element of rationality that’s not contained in the Kolmogorov axioms for probability is, of course, the prior. But since logic never addressed priors either, this doesn’t mean that the theory of uncertain reasoning fails to extend the theory of certain reasoning.
What you’re really talking about, I think, is a mixture of the problem of extending our priors to logical theories that might be true of the empirical world, and the problem of relating logical theories to the empirical world at all. If I knew that, given some logical set of beliefs, it was 60% likely for a coin to come up heads, and then I saw the coin come up tails, I would know what to do with that. The problem is all in going from the logical set of beliefs to the 60% probability of seeing the coin come up heads. But if you can’t follow that link, you don’t have a problem with “extending logic to probabilities” so much as you have a problem with “relating anything phrased in predicate calculus to an experimental theory that makes predictions”. Even if I could tell you the probability of the Goldbach Conjecture being a semantic tautology of second-order arithmetic after updating on observation of the first trillion examples being true, I might not have solved the problem of going from any set of axioms phrased in predicate calculus to an experimental prediction. You’re free to say of this that “probability can’t extend logic!” but I wouldn’t describe that as being the main issue.
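The easy half of this, knowing what to do once a set of beliefs yields a 60% chance of heads and the coin then comes up tails, is a plain Bayes update. Here is a minimal sketch of that step. The rival "fair coin" hypothesis and the 50/50 prior are assumptions added purely so that there is something to update against. The hard half, getting from axioms to the 60%, is not addressed by this code at all.

```python
from fractions import Fraction

def update(prior: dict, likelihood: dict) -> dict:
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())  # probability of the observation
    return {h: v / total for h, v in unnormalized.items()}

# Hypothetical setup: two candidate theories and their predicted chance of heads.
p_heads = {"theory": Fraction(3, 5), "fair": Fraction(1, 2)}
prior = {"theory": Fraction(1, 2), "fair": Fraction(1, 2)}

# We observe tails, so each hypothesis is scored by its probability of tails.
p_tails = {h: 1 - p for h, p in p_heads.items()}
posterior = update(prior, p_tails)
```

Here seeing tails shifts belief in the 60%-heads theory from 1/2 down to 4/9: a mild shift, as expected from a single flip.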
An AI-grade solution to this problem, I suspect, will tackle that issue head-on and give a naturalistic account of physical worlds that obey logical axioms, assign priors to those different observers, and try to locate observers or agents inside those worlds. But this is not the same problem as assigning a probability to the Goldbach Conjecture.
Jaynes clearly didn’t get everything right. For example, Probability Theory: The Logic of Science discusses Jaynes’s disbelief in the Copenhagen interpretation (justified) but it’s clear that Jaynes has not understood the import of the very standard argument from Bell’s Theorem which rules out Jaynes’s suggested resolution. Even so, the part of the argument where probability extends logic and (the repaired forms of) Cox’s Theorem gives us reason to suspect we won’t find any other way to do it, seems quite clear and cogent to me; even more so when you consider that in decision theory we have no good alternatives to utility functions and utility functions want scalar quantitative weights on outcomes, so useful uncertainty has to be a scalar quantity. I think you would probably just be happier with the wording if we said “the theory of scalar uncertainty extends the theory of qualitative belief and disbelief, and there are no good alternatives to scalar probabilities when it comes to AI-grade theories of uncertainty”.