Semantic influences on phonetic identification and lexical decision.

Number 883
Year 1975
Drawer 16
Entry Date 11/19/1999
Authors Rubin, P. E.
Contact
Publication A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
url http://www.haskins.yale.edu/Reprints/HL0883.pdf
Abstract [Introduction] It is commonly known that the perception of the sounds of speech is significantly influenced by the contexts in which they occur. In part, this is owing to the nature of the speech code. Research conducted by the Haskins Laboratories group (cf. Liberman, 1970, Cooper, Shankweiler and Studdert-Kennedy, 1967) has revealed that the information needed to make decisions about selected portions of the speech signal (e.g. identifying the phoneme /b/ in /bag/) is oftentimes neither discrete nor localized to a specific portion of the signal. In the acoustic stream, speech information is structured in a highly encoded form due, in part, to the mechanical constraints of the production system. Recent evidence points directly to the influence that contextual or higher-order structure has on making decisions about lower-level aspects of the stimulus pattern. Pisoni and Tash (1974) report that same-different reaction-time judgments to consonants and vowels in pairs of consonant-vowel utterances depend upon the information carried in the entire syllable. If in syllable pairs the portions of the signals not being compared were the same (e.g. if vowels were being compared, then consonants were the same), latency for the same-different judgment was significantly less than when the non-target portions of the syllable were different. A further demonstration of the influence of syllabic context on phonetic identification is that the intelligibility of vowels in a consonant environment is superior to that of vowels presented in isolation (Shankweiler, Strange and Verbrugge, in press). As we might suppose, contextual determinants of the perception of speech sounds extend beyond a simple consideration of syllabic structure. For example, Liberman (1963) and Pollack and Pickett (1963) presented to listeners words excised from sentences and found a subsequent decrement in their recognition.
Furthermore, there is evidence that judgments about the rise and fall of final pitch contours in utterances are not so much dependent upon the actual physical description of contours as they are upon whether the total contour is perceived as a question or as a statement (Studdert-Kennedy and Hadding, 1973; Hadding-Koch and Studdert-Kennedy, 1964). In short, the contextual base provided by such factors as syntax, semantics, prosody, etc. plays an important part in the overall speech recognition process. For a more complete discussion of such contextual considerations see Darwin. Of course, the effect of higher-order structure on recognition of particulars of the information nested within that structure is not limited to problems in speech. Evidence from experiments in visual perception presents an analogous situation. Biederman (1972) and his associates (e.g. Biederman, Glass and Stacy, 1973) have shown that the detection and/or identification of an object in a representation of a real world scene is negatively affected if the overall coherency of the context in which the object is presented is disrupted, with subsequent decrements in detectability. A further illustration is provided by Reicher (1968) and Wheeler (197): in a forced-choice test of letter recognition, under conditions of visual masking, subjects recognize a letter more accurately in a word than in a nonword (see also Johnston and McClelland, 1974). As previously noted, contextual influences in speech recognition have been demonstrated in a variety of different paradigms. In general, however, the problem has not been addressed of how higher-order properties (i.e. lexical, semantic) can influence the processing of lower-level particulars (i.e. acoustic, phonetic, phonological) in word recognition.
The present author and his associates (Rubin, Turvey and Van Gelder, 1975) looked at the effect of manipulating lexical membership on the detection of specified phonemes appearing in initial position in spoken consonant-vowel-consonant syllables. A brief description of these experiments and their theoretical interpretation is in order, for they provide the departure point for the present series of experiments. Rubin et al. (1975) employed a phoneme-monitoring task in two experiments. Participants were presented with spoken consonant-vowel-consonant syllables, both words and nonwords, with this distinction determined solely by a difference in final consonant (e.g. /bit/ versus /bip/). Their task was to press a key whenever a syllable began with a particular consonant. The first experiment used sequences of either all words or all nonwords, in which one item in a sequence began with the target consonant. In order to avoid contextual effects due to the use of homogeneous sequences (all words or all nonwords), words and nonwords were randomly intermixed within blocks in the second experiment. In both experiments it was found that consonant targets that began words were detected faster than consonant targets that began nonwords. Most commonly, accounts of the speech recognition process (cf. Studdert-Kennedy, 1974) have been hierarchically based, that is, the recognition process has been viewed as a mapping from less to more abstract linguistic levels, e.g. auditory, phonetic, phonological, lexical, syntactic and semantic. While a hierarchy of this sort can allow for conversation between these levels (for example, higher-order factors may be used in correcting or determining information at the lower end), the basic order of the hierarchy is preserved. In this view, we can interpret the word advantage effect for phoneme detection by assuming that the various linguistic levels interrelate prior to phonetic identification.
On this assumption we are not surprised by lexical and other higher-order influences on phonetic identification. But if we continue to assume that the order of levels of representation in the hierarchy is maintained, and that phonetic detection occurs at a low level, then why should the registration of lower- and higher-level influences be necessary before a simple response, contingent upon phoneme detection, can be made? Let us put this question aside for a moment and consider one other set of relevant findings. It has been shown that, in a latency-of-detection task, two-syllable word targets can be detected faster than one-syllable word targets. These, in turn, can be detected faster than individual phonemes (Foss and Swinney, 1973). In addition, there is evidence that certain three-word sentences can be detected faster than single words (McNeill and Lindig, 1973). Observations of this kind have motivated legitimate reservations about the relevance of the detection task to the analysis of perceptual stages. Perhaps we should be dubious about the claim that the word advantage effect for phoneme detection actually reflects perceptual processes. To the contrary, we might wish to entertain the idea that this effect and those reported by Foss and Swinney, McNeill and Lindig and others, are more accurately interpreted as manifestations of processes subsequent to perception. This point of view is elegantly expressed by Foss and Swinney (1973). In accounting for their results (and the results of an experiment similar to that of McNeill and Lindig, above), they found an explanation in terms of perceptual processes to be implausible: the search for the critical element of perception in speech, they argued, is reduced to absurdity when sentences and multi-syllabic words are shown to be more easily identified than phonemes.
For an alternative interpretation, they sought to distinguish between perception and identification and looked at their results as addressing the question of how, subsequent to perception, some “identification units” become accessible to consciousness more readily than others. The different rate of entry into awareness, they argue, is determined by the size of the linguistic unit. Thus two-syllable words are detected more rapidly than single-syllable words because two-syllable words, as larger units, become available to awareness at shorter delays. If the notion of “larger units” is taken literally, however, it is difficult to see how such a conception can explain the facilitatory effect of words over nonwords in the phoneme detection task. In this case the words and nonwords were both monosyllables; there was no “larger unit”. Perhaps differences in latencies of availability to consciousness are determined not by “size”, but by a metric of familiarity or meaning. The word advantage effect can be seen as reflecting processes subsequent to perception which extract salient information and bring it to consciousness. A theoretical framework for this point of view can be found in Morton’s (1969, 1970) Logogen Model which seeks to account for performance in a variety of word recognition tasks (e.g. the word frequency effect, the word apprehension effect) and in more complex language behavior. The starting point for the Logogen Model is the assumption that when a response becomes available, the same final unit has operated to produce that output regardless of the source of information that led to the response. The term “logogen” refers to the supposed origin of this response. Each logogen is best described in terms of its output, which can be represented as a collection of sets of attributes of the following kind: visual, acoustic, phonological and semantic.
The information of relevance to a logogen is multi-modal; further, it is detected in a relatively direct fashion (cf. Morton and Broadbent, 1967). The detection of relevant attributes in the structured energy at the receptors increments a counter which has various thresholds or critical values. When critical values are reached, the logogen sends information to two locations: the Cognitive System and the Response Buffer (see Figure 1). The former is responsible for semantic and syntactic analysis, and is the source of contextually influenced inputs to the logogen; the latter, the Response Buffer, is the only means of entry into the response systems (Morton, 1970). We should remark that, for our purposes, the Response Buffer is analogous to the identification unit of Foss and Swinney (1973). The output from the logogen to the Cognitive System predates the output from the logogen to the Response Buffer. This follows from the assumption that the critical value for a logogen’s semantic output is lower than that for its phonological output. The semantic output goes to the Cognitive System for contextual (and other) elaboration, and the phonological output, which occurs subsequent to a reciprocal interchange between the Cognitive System and the logogen, goes to the Response Buffer. We may summarize these comments as follows: the thresholds of the logogens differ (e.g. it is argued that they may vary inversely with word frequency); moreover, they exhibit short-term as well as long-term variation. The threshold of a logogen can be reduced temporarily by the facilitative effect of prior word presentation (Neisser, 1954) and more permanently by morphemic pretraining (Murrell and Morton, 1974). We can now return to the word advantage effect for phoneme detection. In the Logogen Model it is explained in terms of a differential availability to consciousness of words and nonwords.
The lower threshold for outputs from word logogens as compared with nonword logogens will result in differences in the latencies with which sets of attributes of these items (e.g. phonetic, semantic) arrive in the Response Buffer. Thus, decisions about these attributes can be made sooner for words than for nonwords. The explanatory framework provided by the Logogen Model can be an extremely powerful device for viewing problems in word recognition and, similarly, problems in other related areas, for example, higher linguistic processes, memory, etc. It should be pointed out, however, that the language Morton uses to describe this model is metaphorical and that the system is not a rigid and final statement. One notes, for example, that it has undergone constant modification (cf. Morton and Broadbent, 1967; Morton, 1964, 1969, 1970). Several aspects of the model are open to a variety of interpretations while still leaving the basic structure and utility of the system unchanged. By way of example, Posner (1972) views logogens in a more general sense as conceptual units that can be activated by many inputs whose physical characteristics can vary. He says, “These concepts serve as a means of storing the general character of past experience.” Along similar lines, the present author feels that integral concepts of the Logogen Theory (the amodal nature of influences on the logogen, the contribution of the “Cognitive System”) may be spoken of in another light. The logogen can be considered to be a device differentially sensitive to specifications of information provided by invariant aspects of stimulation (cf. Gibson, 1966). Moreover, logogens are not triggered solely by sensory input; they can be directly addressed by the Cognitive System (e.g. endogenous speech acts).
It appears that the logogen may be profitably viewed as an instantiation of a “coalitional” system, one that abstracts particulars from information provided, in an interactive way, from a variety of sources. In terms of a Logogen Model, the present experiments considered both lexical decision and contextual influences on phonetic and lexical decision, specifically, influences of a semantic nature.
Notes
