the specified terms. The later phase, impressed not so much by our problem-solving prowess as by how well we get along with ‘simple’ common sense, has shifted the emphasis from search and reasoning to knowledge. The motivation for this shift can be seen in the following two sentences: We gave the monkey the banana because it was ripe. We gave the monkey the banana because it was hungry.
The word ‘it’ is ambiguous, as the terminal adjectives make clear. Yet listeners effortlessly understand what is meant, to the point, usually, of not even noticing the ambiguity. The question is, how? Of course, it is ‘just common sense’ that monkeys don’t get ripe and bananas don’t get hungry, so . . . But three further observations show that this is not so much an answer as a restatement of the issue. First, sentences that rely on common sense to avoid misunderstanding are anything but rare: conversation is rife with them. Second, just about any odd fact that ‘everybody knows’ can be the bit of common sense that understanding the next sentence depends on; and the range of such knowledge is vast. Yet, third, dialogue proceeds in real time without a hitch, almost always. So the whole range of commonsense knowledge must be somehow at our mental fingertips all the time.
The underlying difficulty is not with speed or quantity alone, but with relevance. How does a system, given all that it knows about aardvarks, Alabama, and ax handles, ‘home in on’ the pertinent fact that bananas don’t get hungry, in the fraction of a second it can afford to spend on the pronoun ‘it’? The answer proposed is both simple and powerful: common sense is not just randomly stored information, but is instead highly organized by topics, with lots of indexes, cross-references, tables, hierarchies, and so on. The words in the sentence itself trigger the ‘articles’ on monkeys, bananas, hunger, and so on, and these quickly reveal that monkeys are mammals, hence animals, that bananas are fruit, hence from plants, that hunger is what animals feel when they need to eat – and that settles it. The amount of search and reasoning is minimal; the issue of relevance is solved instead by the antecedent structure in the stored knowledge itself. While this requires larger and more elaborate systems, the hope is that it will make them faster and more flexible.
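The idea that stored structure, rather than search, settles the reference of ‘it’ can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny hierarchy `ISA`, the property table `CAN_BE`, and the function names are hypothetical, and no actual system is being reproduced.

```python
# Hypothetical toy 'articles': each term points to the category above it.
ISA = {
    "monkey": "mammal",
    "mammal": "animal",
    "banana": "fruit",
    "fruit": "plant product",
}

# Hypothetical property table: which states each category can be in.
CAN_BE = {
    "animal": {"hungry"},
    "fruit": {"ripe"},
}


def categories(term):
    """Yield the term itself and every category above it in the hierarchy."""
    while term is not None:
        yield term
        term = ISA.get(term)


def can_be(term, adjective):
    """True if the term, or any category it falls under, admits the adjective."""
    return any(adjective in CAN_BE.get(c, ()) for c in categories(term))


def resolve_pronoun(candidates, adjective):
    """Return the candidate referents that the adjective can sensibly describe."""
    return [c for c in candidates if can_be(c, adjective)]


print(resolve_pronoun(["monkey", "banana"], "ripe"))
print(resolve_pronoun(["monkey", "banana"], "hungry"))
```

The lookup follows only the few links reachable from the words in the sentence, so nothing about aardvarks or Alabama is ever consulted; that is the sense in which relevance is handled by the organization of the knowledge rather than by search.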
The other main orientation toward artificial intelligence, the pattern-based approach – often called ‘connectionism’ or ‘parallel distributed processing’ – reemerged from the shadow of symbol processing only in the 1980s, and remains in many ways less developed. The basic inspiration comes not from language or any other psychological phenomenon (such as imagery or affect), but from the microstructure of the brain. The components of a connectionist system are relatively simple active nodes – lots of them – and relatively simple connections between those nodes – again, lots of them. One important type (and the easiest to visualize) has the nodes divided into layers, such that each node in layer A is connected to each node in layer B, each node in layer B is connected to each node in layer C, and so on. Each node has an activation level, which varies in response to the activations of other, connected nodes; and each connection has a weight, which determines how strongly (and in what direction) the activation of one node affects that of the other. The analogy with neurons and synapses, though imprecise, is intended. So imagine a layered network with finely tuned connection weights and random (or zero) activation levels. Now suppose the activations of all the nodes in layer A are set in some particular way – some pattern is imposed on the activation state of this layer. These activations will propagate out along all the connections from layer A to layer B, and activate some pattern there. The activation of each node in layer B is a function of the activations of all the nodes in layer A, and of the weights of all the connections to it from those nodes. But since each node in layer B has its own connections from the nodes in layer A, it will respond in its own unique way to this pattern of activations in layer A. 
Thus, the pattern that results in layer B is a joint function of the pattern that was imposed on layer A and of the pattern of connection weights between the two layers. And a similar story can be told about layer B’s influence on layer C, and so on, until some final pattern is induced in the last layer. What are these patterns? They might be any number of things; but two general possibilities can be distinguished. They might be tantamount to (or substrata beneath) representations of some familiar sort, such as sentencelike structures or images; or they might be a kind (or kinds) of representation previously unknown. Now, people certainly do sometimes think in sentences (and probably images); so, to the extent that networks are taken as complete brain models, the first alternative must be at least partly right. But, to that extent, the models are also more physiological than psychological: it is rather the implemented sentences or images that directly model the mind. Thus, it is the possibility of a new genus of representation – sometimes called distributed representation – that is particularly exciting. On this alternative, the patterns in the mind represent in some way other than by mimetic imagery or articulate description. How? An important feature of all network models is that there are two quite different categories of pattern. On the one hand, there are the relatively ephemeral patterns of activation in various groups of nodes; on the other, there are the relatively stable patterns of connection strength among the nodes. Since there are in general many more connections than nodes, the latter patterns are richer; and it is they that determine the capabilities of the network with regard to the former patterns. 
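The propagation of an activation pattern from layer to layer, and the distinction between activation patterns and connection-weight patterns, can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how a node combines its inputs, so the weighted sum passed through `tanh` is an assumed choice, and the names `propagate` and `run_network` are hypothetical.

```python
import math


def propagate(activations, weights):
    """Compute the activation pattern induced in the next layer.

    weights[j][i] is the weight of the connection from node i in the
    sending layer to node j in the receiving layer. Each receiving node
    has its own row of weights, so each responds in its own way.
    The tanh squashing function is an assumption, not from the text.
    """
    return [
        math.tanh(sum(w * a for w, a in zip(row, activations)))
        for row in weights
    ]


def run_network(input_pattern, layers):
    """Impose a pattern on the first layer and propagate it to the last.

    layers is a list of weight matrices, one per pair of adjacent layers.
    """
    pattern = input_pattern
    for weights in layers:
        pattern = propagate(pattern, weights)
    return pattern


# Hypothetical weights: layer A (2 nodes) to B (3 nodes), then B to C (1 node).
a_to_b = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.25]]
b_to_c = [[1.0, -1.0, 0.5]]
print(run_network([1.0, 0.0], [a_to_b, b_to_c]))
```

The lists passed between calls are the ephemeral activation patterns; the weight matrices are the stable connection patterns, and they alone fix what output any given input will produce.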
Many of the abilities most easily and ‘naturally’ realized in networks can be subsumed under the heading pattern completion: the connection weights are adjusted – perhaps via a training regime – such that the network will complete any of the activation patterns from a predetermined group. So, suppose some fraction (say half) of the nodes in the net are clamped to the values they would have for one of those patterns (say P) while the remainder are given random (or default) activations. Then the network, when run, will reset the latter activations to the values belonging to P – thus ‘completing’ it. If the unclamped activations are regarded as variations or deviations, pattern completion amounts to normalization, or grouping by similarity. If the initial or input nodes are always the same (as in layered networks), then we have pattern association (or transformation) from input to output. If the input pattern is a memory probe, pattern completion becomes access by content. If the output pattern is an identifier, then it is pattern recognition. And so on. Note that, although the operands are activation patterns, the ‘knowledge’ about them, the ability to complete them, is contained in the connection patterns; hence, that ability or know-how is what the network represents.
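Pattern completion with clamped nodes can be sketched with a small network in which every node is connected to every other. The text does not specify a training regime or an update rule, so the Hebbian weight rule, the activations of +1 and -1, and the threshold update used here are assumptions (they follow the familiar Hopfield-style network), and the function names are hypothetical.

```python
def train(patterns):
    """Set connection weights so the given +1/-1 patterns can be completed.

    Assumed Hebbian rule: the weight between two nodes grows when they
    agree in a stored pattern and shrinks when they disagree.
    """
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def complete(weights, start, clamped, sweeps=5):
    """Run the network, holding the clamped nodes fixed.

    start is the initial activation of every node; clamped is the set of
    node indices whose values may not change. Each unclamped node is
    repeatedly reset to the sign of its weighted input.
    """
    state = list(start)
    for _ in range(sweeps):
        for i in range(len(state)):
            if i in clamped:
                continue
            total = sum(weights[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if total >= 0 else -1
    return state


# Two hypothetical stored patterns over eight nodes.
P = [1, 1, 1, 1, -1, -1, -1, -1]
Q = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([P, Q])

# Clamp the first half to P's values; give the rest a default activation of 1.
probe = P[:4] + [1, 1, 1, 1]
print(complete(weights, probe, clamped={0, 1, 2, 3}))
```

Nothing about P or Q is stored as such once training is done; the ability to restore either pattern from half of it resides entirely in the weights.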
There is no obvious upper bound on the possible refinement or intricacy of these pattern groupings and associations. If the input patterns are sensory stimuli and the output patterns are motor control, then we have a potential model of coordinated and even skillful behavior. In a system also capable of language, a network model (or component) might account for verbal recognition and content association, and even such ‘nonliteral’ effects as trope and tone. Yet at least some sort of ‘symbol manipulation’ seems essential for language use, regardless of how networklike the implementation is. One current speculation is that it might suffice to approximate a battery of symbolic processes as a special subsystem within a cognitive system that fundamentally works on quite different principles.
The attraction of the pattern-based approach is, at this point, not so much actual achievement as it is promise – on two grounds. In the first place, the space of possible models, not only network topologies but also ways of construing the patterns, is vast. Those built and tested so far have been, for practical reasons, rather small; so it is possible to hope beyond their present limitations to systems of significantly greater capability. But second, and perhaps even more attractive, those directions in which pattern-based systems show the most promise – skills, recognition, similarity, and the like – are among the areas of greatest frustration for language-based AI. Hence it remains possible, for a while at least, to overlook the fact that, to date, no connectionist network can perform long division, let alone play chess or solve symbolic logic problems. See also COGNITIVE SCIENCE, COMPUTER THEORY, CONNECTIONISM, FORMAL LOGIC, GRAMMAR, PHILOSOPHY OF LANGUAGE, PHILOSOPHY OF MIND. J.Hau.