probability a numerical value that can attach to items of various kinds (e.g., propositions, events, and kinds of events) that is a measure of the degree to which they may or should be expected – or the degree to which they have ‘their own disposition,’ i.e., independently of our psychological expectations – to be true, to occur, or to be exemplified (depending on the kind of item the value attaches to). There are both multiple interpretations of probability and two main kinds of theories of probability: abstract formal calculi and interpretations of the calculi. An abstract formal calculus axiomatically characterizes formal properties of probability functions, where the arguments of the function are often thought of as sets, or as elements of a Boolean algebra. In application, the nature of the arguments of a probability function, as well as the meaning of probability, are given by interpretations of probability. The most famous axiomatization is Kolmogorov’s (Foundations of the Theory of Probability, 1933). The three axioms for probability functions Pr are: (1) Pr(X) ≥ 0 for all X; (2) Pr(X) = 1 if X is necessary (e.g., a tautology if a proposition, a necessary event if an event, and a ‘universal set’ if a set); and (3) Pr(X ∨ Y) = Pr(X) + Pr(Y) (where ‘∨’ can mean, e.g., logical disjunction, or set-theoretical union) if X and Y are mutually exclusive (X & Y is a contradiction if they are propositions, they can’t both happen if they are events, and their set-theoretical intersection is empty if they are sets). Axiom (3) is called finite additivity, which is sometimes generalized to countable additivity, involving infinite disjunctions of propositions, or infinite unions of sets. Conditional probability, Pr(X/Y) (the probability of X ‘given’ or ‘conditional on’ Y), is defined as the quotient Pr(X & Y)/Pr(Y), provided that Pr(Y) is greater than 0.
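The three axioms and the definition of conditional probability can be illustrated on a small finite case. The following is only a sketch under assumed conditions: the six-outcome space OMEGA (one roll of a fair die), the equal weighting, and the names pr, cond_pr, and satisfies_axioms are all hypothetical choices made for illustration, not part of any standard calculus or library.

```python
from fractions import Fraction
from itertools import combinations

# Assumed example: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))


def pr(event):
    """Probability of an event (a subset of OMEGA) under equal weights."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def cond_pr(x, y):
    """Pr(X/Y) = Pr(X & Y) / Pr(Y); undefined when Pr(Y) is 0."""
    if pr(y) == 0:
        raise ValueError("Pr(Y) is 0, so Pr(X/Y) is undefined")
    return pr(x & y) / pr(y)


def all_events():
    """Every subset of OMEGA (64 events in this small example)."""
    items = sorted(OMEGA)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def satisfies_axioms():
    """Check axioms (1)-(3) by brute force over all events."""
    events = all_events()
    nonnegative = all(pr(x) >= 0 for x in events)        # axiom (1)
    normalized = pr(OMEGA) == 1                          # axiom (2)
    additive = all(pr(x | y) == pr(x) + pr(y)            # axiom (3)
                   for x in events for y in events
                   if not (x & y))                       # mutually exclusive
    return nonnegative and normalized and additive


even = frozenset({2, 4, 6})
high = frozenset({4, 5, 6})
print(satisfies_axioms())    # True
print(cond_pr(even, high))   # 2/3
```

Exact fractions are used instead of floating-point numbers so that the equalities in axioms (2) and (3) can be tested without rounding error.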
An item X is said to be positively or negatively statistically (or probabilistically) correlated with an item Y according to whether Pr(X/Y) is greater than or less than Pr(X/-Y) (where -Y is the negation of a proposition Y, or the non-occurrence of an event Y, or the set-theoretical complement of a set Y); in the case of equality, X is said to be statistically (or probabilistically) independent of Y. All three of these probabilistic relations are symmetric, and sometimes the term ‘probabilistic relevance’ is used instead of ‘correlation’. From the axioms, familiar theorems can be proved: e.g., (4) Pr(-X) = 1 – Pr(X); (5) Pr(X ∨ Y) = Pr(X) + Pr(Y) – Pr(X & Y) (for all X and Y); and (6) (a simple version of Bayes’s theorem) Pr(X/Y) = Pr(Y/X)Pr(X)/Pr(Y). Thus, an abstract formal calculus of probability allows for calculation of the probabilities of some items from the probabilities of others. The main interpretations of probability include the classical, relative frequency, propensity, logical, and subjective interpretations. According to the classical interpretation, the probability of an event, e.g. of heads on a coin toss, is equal to the ratio of the number of ‘equipossibilities’ (or equiprobable events) favorable to the event in question to the total number of relevant equipossibilities. On the relative frequency interpretation, developed by Venn (The Logic of Chance, 1866) and Reichenbach (The Theory of Probability, 1935), probability attaches to sets of events within a ‘reference class.’ Where W is the reference class, and n is the number of events in W, and m is the number of events in (or of kind) X, within W, then the probability of X, relative to W, is m/n.
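The correlation relations and theorems (4) through (6) stated above can be checked numerically on a toy case. This is a minimal sketch, assuming a fair six-sided die with equally weighted outcomes; the names OMEGA, pr, cond_pr, correlation, and bayes are hypothetical helpers introduced only for this illustration, and every event conditioned on is assumed to have nonzero probability.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))


def pr(event):
    """Probability of a subset of OMEGA under equal weights."""
    return Fraction(len(event), len(OMEGA))


def cond_pr(x, y):
    """Pr(X/Y) = Pr(X & Y) / Pr(Y); assumes Pr(Y) > 0."""
    return pr(x & y) / pr(y)


def complement(x):
    """-X, the set-theoretical complement of X within OMEGA."""
    return OMEGA - x


def correlation(x, y):
    """Classify X relative to Y by comparing Pr(X/Y) with Pr(X/-Y)."""
    given_y = cond_pr(x, y)
    given_not_y = cond_pr(x, complement(y))
    if given_y > given_not_y:
        return "positive"
    if given_y < given_not_y:
        return "negative"
    return "independent"


def bayes(x, y):
    """Pr(X/Y) computed by theorem (6): Pr(Y/X) Pr(X) / Pr(Y)."""
    return cond_pr(y, x) * pr(x) / pr(y)


even = frozenset({2, 4, 6})
odd = complement(even)
high = frozenset({4, 5, 6})
low_two = frozenset({1, 2})

print(correlation(even, high))     # positive
print(correlation(high, even))     # positive (the relation is symmetric)
print(correlation(odd, high))      # negative
print(correlation(even, low_two))  # independent
print(pr(complement(even)) == 1 - pr(even))                     # theorem (4): True
print(pr(even | high) == pr(even) + pr(high) - pr(even & high))  # theorem (5): True
print(bayes(even, high) == cond_pr(even, high))                  # theorem (6): True
```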
For various conceptual and technical reasons, this kind of ‘actual finite relative frequency’ interpretation has been refined into various infinite and hypothetical infinite relative frequency accounts, where probability is defined in terms of limits of sequences of relative frequencies in finite (nested) populations of increasing sizes, sometimes involving hypothetical infinite extensions of an actual population. The reasons for these developments involve, e.g.: the artificial restriction, for finite populations, of probabilities to values of the form i/n, where n is the size of the reference class; the possibility of ‘mere coincidence’ in the actual world, where the actual frequencies may not reflect the true physical dispositions involved in the relevant events; and the fact that probability is often thought to attach to possibilities involving single events, while probabilities on the relative frequency account attach to sets of events (this is the ‘problem of the single case,’ also called the ‘problem of the reference class’). These problems also have inspired ‘propensity’ accounts of probability, according to which probability is a more or less primitive idea that measures the physical propensity or disposition of a given kind of physical situation to yield an outcome of a given type, or to yield a ‘long-run’ relative frequency of an outcome of a given type.
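The actual finite relative frequency m/n, and the restriction of its values to the form i/n, can be made concrete as follows. This is only an illustrative sketch: the record of ten tosses is invented for the example, and the function names relative_frequency and running_frequencies are hypothetical.

```python
from fractions import Fraction


def relative_frequency(reference_class, is_x):
    """m/n: the share of events in the reference class that are of kind X."""
    n = len(reference_class)
    m = sum(1 for event in reference_class if is_x(event))
    return Fraction(m, n)


def running_frequencies(outcomes, is_x):
    """Relative frequency of X in each initial segment (nested populations)."""
    frequencies = []
    m = 0
    for n, event in enumerate(outcomes, start=1):
        if is_x(event):
            m += 1
        frequencies.append(Fraction(m, n))
    return frequencies


def is_heads(toss):
    return toss == "H"


# Invented record of ten coin tosses, used only for illustration.
tosses = list("HTTHHTHTTH")

print(relative_frequency(tosses, is_heads))        # 1/2
print(running_frequencies(tosses, is_heads)[:3])   # [Fraction(1, 1), Fraction(1, 2), Fraction(1, 3)]

# With a reference class of size 3, only the values 0, 1/3, 2/3, 1 are available,
# so a probability of 1/2 cannot be represented.
attainable = {Fraction(i, 3) for i in range(4)}
print(Fraction(1, 2) in attainable)                # False
```

The running frequencies show the sense in which infinite accounts look to the limit of such a sequence rather than to any one finite value.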
A theorem of probability proved by Jacob Bernoulli (Ars Conjectandi, 1713) and sometimes called Bernoulli’s theorem or the weak law of large numbers, and also known as the first limit theorem, is important for appreciating the frequency interpretation. The theorem states, roughly, that in the long run, frequency settles down to probability. For example, suppose the probability of a certain coin’s landing heads on any given toss is 0.5, and let e be any number greater than 0. Then the theorem implies that as the number of tosses grows without bound, the probability approaches 1 that the frequency of heads will be within e of 0.5. More generally, let p be the probability of an outcome O on a trial of an experiment, and assume that this probability remains constant as the experiment is repeated. After n trials, there will be a frequency, fn, of trials yielding outcome O. The theorem says that for any numbers d and e greater than 0, there is an n such that the probability (P) that |p – fn| < e is within d of 1 (P > 1 – d). Bernoulli also showed how to calculate such n for given values of d, e, and p. It is important to notice that the theorem concerns probabilities, and not certainty, for a long-run frequency. Notice also the assumption that the probability p of O remains constant as the experiment is repeated, so that the outcomes on trials are probabilistically independent of earlier outcomes. The kinds of interpretations of probability just described are sometimes called ‘objective’ or ‘statistical’ or ‘empirical’ since the value of a probability, on these accounts, depends on what actually happens, or on what actual given physical situations are disposed to produce – as opposed to depending only on logical relations between the relevant events (or propositions), or on what we should rationally expect to happen or what we should rationally believe. In contrast to these accounts, there are the ‘logical’ and the ‘subjective’ interpretations of probability.
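The content of Bernoulli’s theorem can be seen by computing, for independent trials with constant probability p, the exact probability that the frequency fn lies within e of p, and watching it approach 1 as n grows. The following is a sketch under those stated assumptions; it uses the binomial distribution directly and a simple search rather than Bernoulli’s own method of calculation, and the names prob_within and smallest_n (and the search limit) are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_within(n, p, e):
    """Exact probability that |p - fn| < e after n independent trials.

    p and e should be Fractions so that the strict inequality is exact.
    """
    total = Fraction(0)
    for k in range(n + 1):                       # k = number of trials yielding O
        if abs(Fraction(k, n) - p) < e:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total


def smallest_n(p, e, d, limit=500):
    """First n up to `limit` with P(|p - fn| < e) > 1 - d, or None if not found."""
    for n in range(1, limit + 1):
        if prob_within(n, p, e) > 1 - d:
            return n
    return None


p = Fraction(1, 2)    # assumed fair coin
e = Fraction(1, 10)
d = Fraction(1, 20)

print(prob_within(10, p, e))             # 63/256
for n in (10, 100, 500):
    print(n, float(prob_within(n, p, e)))
print(smallest_n(p, e, d))
```

For small n the probability does not rise steadily (it depends on which values k/n happen to fall inside the interval), which is why the search reports the first n that qualifies instead of assuming monotonic growth.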
Carnap (‘The Two Concepts of Probability,’ Philosophy and Phenomenological Research, 1945) has marked this kind of distinction by calling the second concept probability₁ and the first probability₂. According to the logical interpretation, associated with Carnap (see also Logical Foundations of Probability, 1950; and Continuum of Inductive Methods, 1952), the probability of a proposition X given a proposition Y is the ‘degree to which Y logically entails X.’ Carnap developed an ingenious and elaborate set of systems of logical probability, including, e.g., separate systems depending on the degree to which one happens to be, logically and rationally, sensitive to new information in the reevaluation of probabilities. There is, of course, a connection between the ideas of logical probability, rationality, belief, and belief revision. It is natural to explicate the ‘logical-probabilistic’ idea of the probability of X given Y as the degree to which a rational person would believe X having come to learn Y (taking account of background knowledge). Here, the idea of belief suggests a subjective (sometimes called epistemic or partial belief or degree of belief) interpretation of probability; and the idea of probability revision suggests the concept of induction: both the logical and the subjective interpretations of probability have been called ‘inductive probability’ – a formal apparatus to characterize rational learning from experience. The subjective interpretation of probability, according to which the probability of a proposition is a measure of one’s degree of belief in it, was developed by, e.g., Ramsey (‘Truth and Probability,’ in his Foundations of Mathematics and Other Essays, 1926); de Finetti (‘Foresight: Its Logical Laws, Its Subjective Sources,’ 1937, translated by H. Kyburg, Jr., in H. E. Smokler, Studies in Subjective Probability, 1964); and Savage (The Foundations of Statistics, 1954). Of course, subjective probability varies from person to person.
Also, in order for this to be an interpretation of probability, so that the relevant axioms are satisfied, not all persons can count – only rational, or ‘coherent’ persons should count. Some theorists have drawn a connection between rationality and probabilistic degrees of belief in terms of dispositions to set coherent betting odds (those that do not allow a ‘Dutch book’ – an arrangement that forces the agent to lose come what may), while others have described the connection in more general decision-theoretic terms.
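A simple case shows how incoherent degrees of belief expose an agent to a Dutch book. The sketch below assumes one standard betting setup, which is an assumption of the example and not the only way the argument is formulated: the agent treats a price of credence times stake as fair for a bet that pays the stake if the proposition is true and nothing otherwise. The names net_payoff and dutch_book_payoffs are hypothetical.

```python
from fractions import Fraction


def net_payoff(credence, stake, wins):
    """Agent's net result from buying one bet at the price credence * stake."""
    winnings = stake if wins else 0
    return winnings - credence * stake


def dutch_book_payoffs(cr_x, cr_not_x, stake=1):
    """Agent's total payoff in each case after buying bets on both X and -X."""
    return {
        "X true": net_payoff(cr_x, stake, True) + net_payoff(cr_not_x, stake, False),
        "X false": net_payoff(cr_x, stake, False) + net_payoff(cr_not_x, stake, True),
    }


# Incoherent degrees of belief: Pr(X) + Pr(-X) = 6/5, violating theorem (4).
print(dutch_book_payoffs(Fraction(3, 5), Fraction(3, 5)))
# {'X true': Fraction(-1, 5), 'X false': Fraction(-1, 5)}

# Coherent degrees of belief: no guaranteed loss from this pair of bets.
print(dutch_book_payoffs(Fraction(3, 5), Fraction(2, 5)))
# {'X true': Fraction(0, 1), 'X false': Fraction(0, 1)}
```

When the two degrees of belief sum to less than 1, the same loss can be forced by having the agent sell the two bets instead of buying them; that case is omitted here for brevity.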
See also BAYES’S THEOREM, CARNAP, DUTCH BOOK, INDUCTION, PROPENSITY, REICHENBACH. E.Ee.