information-theoretic semantics See PHILOSOPHY OF MIND.

information theory, also called communication theory, a primarily mathematical theory of communication. Prime movers in its development include Claude Shannon, H. Nyquist, R. V. L. Hartley, Norbert Wiener, Boltzmann, and Szilard. Original interests in the theory were largely theoretical or applied to telegraphy and telephony, and early development clustered around engineering problems in such domains. Philosophers (Bar-Hillel, Dretske, and Sayre, among others) are mainly interested in information theory as a source for developing a semantic theory of information and meaning. The mathematical theory has been less concerned with the details of how a message acquires meaning and more concerned with what Shannon called the ‘fundamental problem of communication’ – reproducing at one point, either exactly or approximately, a message (that already has a meaning) selected at another point. Therefore, the two interests in information – the mathematical and the philosophical – have remained largely orthogonal.

Information is an objective (mind-independent) entity. It can be generated or carried by messages (words, sentences) or other products of cognizers (interpreters). Indeed, communication theory focuses primarily on the conditions involved in the generation and transmission of coded (linguistic) messages. However, almost any event can (and usually does) generate information capable of being encoded or transmitted. For example, Colleen’s acquiring red spots can contain information about Colleen’s having the measles, and graying hair can carry information about her grandfather’s aging. This information can be encoded into messages about measles or aging (respectively) and transmitted, but the information would exist independently of its encoding or transmission. That is, this information would be generated (under the right conditions) by the occurrence of the measles-induced spots and the age-induced graying themselves – regardless of anyone’s actually noticing. This objective feature of information explains its potential for epistemic and semantic development by philosophers and cognitive scientists. For example, in its epistemic dimension, a signal (event, message, or Colleen’s spots) that contains (carries) the information that Colleen has the measles is something from which one (mom, doctor) can come to know that Colleen has the measles. Generally, an event (signal) that contains the information that p is something from which one can come to know that p is the case – provided that one’s knowledge is indeed based on the information that p. Since information is objective, it can provide what we want from knowledge – a fix on the way the world objectively is configured. In its semantic dimension, information can have intentionality or aboutness. What is happening at one place (a thermometer reading rising in Colleen’s mouth) can carry information about what is happening at another place (Colleen’s body temperature rising). The fact that messages (or mental states, for that matter) can contain information about what is happening elsewhere suggests an exciting prospect of tracing the meaning of a message (or of a thought) to its informational origins in the environment. To do this in detail is what a semantic theory of information is about.
The mathematical theory of information is concerned purely with information in its quantitative dimension. It deals with how to measure and transmit amounts of information, and it leaves to others the work of saying what (or how) meaning or content comes to be associated with a signal or message. In regard to amounts of information, we need a way to measure how much information is generated by an event (or message) and a way to represent that amount. Information theory provides the answer.
Since information is an objective entity, the amount of information associated with an event is related to the objective probability (likelihood) of the event. Events that are less likely to occur generate more information than those more likely to occur. Thus, to discover that the toss of a fair coin came up heads contains more information than to discover this about the toss of a coin biased (.8) toward heads. Or, to discover that a lie was knowingly broadcast by a censored, state-run radio station contains less information than to discover that a lie was knowingly broadcast by a non-censored, free radio station (say, the BBC). A (perhaps surprising) consequence of associating amounts of information with objective likelihoods of events is that some events generate no information at all. That is, that 5⁵ = 3,125 or that water freezes at 0°C (on a specific occasion) generates no information at all – since these things cannot be otherwise (their probability of being otherwise is zero). Thus, their occurrence generates zero information. Shannon was seeking to measure the amount of information generated by a message and the amount transmitted by its reception (or the average amounts transmissible over a channel). Since his work, it has become standard to think of the measure of information in terms of reductions of uncertainty. Information is identified with the reduction of uncertainty or elimination of possibilities represented by the occurrence of an event or state of affairs. The amount of information is identified with how many possibilities are eliminated. Although other measures are possible, this quantity is most conveniently and intuitively represented as a logarithm (to the base 2) and measured in bits (short for binary digits), the number of binary decisions involved in the reduction or elimination of possibilities. If person A chooses a message to send to person B from among 16 equally likely alternative messages (say, which number came up in a fair drawing from 16 numbers), the choice of one message would represent 4 bits of information (16 = 2⁴, or log₂ 16 = 4). Thus, for a selection from n equally likely messages (signals, events), the amount of information I generated by the message s is calculated as I(s) = log₂ n. If there is a range of messages (s₁ . . . sₙ) not all of which are equally likely (letting p(sᵢ) = the probability of sᵢ’s occurrence), the amount of information generated by the selection of any message sᵢ is calculated as I(sᵢ) = log₂ 1/p(sᵢ) = –log₂ p(sᵢ) [since log 1/x = –log x]. While each of these formulas says how much information is generated by the selection of a specific message, communication theory is seldom primarily interested in these measures. Philosophers are interested, however. For if knowledge that p requires receiving the information that p occurred, and if p’s occurrence represents 4 bits of information, then S would know that p occurred only if S received information equal to (at least) 4 bits. This may not be sufficient for S to know p – for S must receive the right amount of information in a non-deviant causal way and S must be able to extract the content of the information – but this seems clearly necessary.
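As a brief illustrative sketch only (the helper name information is a hypothetical choice, not part of the entry), the formula I(sᵢ) = –log₂ p(sᵢ) can be computed directly from the probabilities used in the examples above:

```python
import math

def information(p):
    """Bits of information generated by an event of probability p:
    I = log2(1/p) = -log2(p)."""
    return math.log2(1 / p)

print(information(0.5))     # fair coin lands heads: 1 bit
print(information(0.8))     # coin biased .8 toward heads lands heads: ~0.32 bits
print(information(1 / 16))  # one of 16 equally likely messages: 4 bits
print(information(1.0))     # an event that cannot be otherwise: 0 bits
```

The last line reflects the point above that an event whose probability of being otherwise is zero generates no information at all.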
Other measures of information of interest in communication theory include the average information, or entropy, of a source, I(s) = Σ p(sᵢ) · I(sᵢ); a measure for noise (the amount of information that person B receives that was not sent by person A); and a measure for equivocation (the amount of information A wanted or tried to send to B that B did not receive). These concepts from information theory, and the formulas for measuring these quantities of information (and others), provide a rich source of tools for communication applications as well as philosophical applications. See also COMPUTER THEORY, EPISTEMOLOGY, PERCEPTION. F.A.
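By way of illustration only, and in the same spirit as the sketch above (again with a hypothetical helper name), the average information, or entropy, of a source weights each message’s information by its probability, I(s) = Σ p(sᵢ) · I(sᵢ):

```python
import math

def entropy(probabilities):
    """Average information of a source, in bits: sum of p(si) * log2(1/p(si))."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

print(entropy([1 / 16] * 16))  # sixteen equally likely messages: 4 bits on average
print(entropy([0.8, 0.2]))     # coin biased .8 toward heads: ~0.72 bits per toss
```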