How do infants know where word boundaries are?

When you listen to a foreign language, it sounds like one continuous stream of speech. Why don't its speakers put breaks between words?

As it turns out, neither do speakers of your language, but you've had years to develop quite sophisticated strategies to recognise where the word boundaries fall.

So how do infants figure it out?

Lexical cues: is it a word?

The most important type of cue for adults is lexical: do I recognise that word? If not, do I recognise the words on either side? (Mattys et al. 2005)

Infants "as young as 6 months can use knowledge of familiar words to segment input speech" in a similar way to adults (Bortfeld et al. 2005).
7.5-month-old infants can detect previously familiarised words in continuous speech (Jusczyk and Aslin 1995), and by 8 months they can still remember those words two weeks later, and so “are beginning to engage in long term storage of words” (Jusczyk and Hohne 1997).

How do infants know that these are words in the first place? They are exposed to, and later reproduce, a significant number of isolated words (Brent and Siskind 2001). But despite what you may think, isolated words make up no more than about 10% of the input. Since infants don't grow their vocabulary from isolated words alone, they must be relying on other cues to work out which parts of a stream of speech are words.

The Possible Word Constraint

As well as recognising whole words, both adults and infants obey the "possible word constraint": if breaking up continuous speech leaves bits and pieces that you know can't be real words, then that's not the right way to break it up.

Your understanding of what can and can't be a word in your language improves as you get older. But a single consonant on its own can't be a word in any language, and 12-month-old infants can use this to spot word boundaries just like adults (Johnson et al. 2003).
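The constraint amounts to a filter over candidate segmentations. A minimal sketch in Python, assuming a toy orthographic representation in which "contains a vowel" stands in for "is a possible word" (real possible-word checks are phonological and language-specific):

```python
VOWELS = set("aeiou")

def is_possible_word(chunk):
    # A lone consonant (or any vowel-less fragment) can't be a word
    # in any language; this toy check stands in for real phonology.
    return any(ch in VOWELS for ch in chunk)

def pwc_allows(stream, boundaries):
    # A segmentation passes the possible word constraint only if
    # every chunk it produces could itself be a word.
    cuts = [0] + sorted(boundaries) + [len(stream)]
    return all(is_possible_word(stream[a:b]) for a, b in zip(cuts, cuts[1:]))

# "fastsinger" split as "fast|singer" leaves two possible words...
print(pwc_allows("fastsinger", [4]))      # True
# ...but "fast|s|inger" strands the lone consonant "s", so it is rejected.
print(pwc_allows("fastsinger", [4, 5]))   # False
```

The point of the sketch is that the constraint never says where the boundaries are; it only rules out segmentations whose leftovers could not be words.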

Sublexical cues: does it match the sounds of the language?

There are several types of sublexical cues which both adults and infants use if they can't recognise a word. The most basic of these are prosodic cues.

Prosodic cues

Prosody is the rhythm and melody of speech. Infant “preferences develop between six and nine months for the prosodic characteristics of native language words” (Jusczyk 1999).

And at birth, infants whose native language is English - who have been exposed to English in the womb, through their mother's speech and the speech around her - can already tell the difference between English and French speakers (Mehler and Christophe 1994). English is a stress-timed language: its rhythm is governed by its stressed syllables. French is a syllable-timed language: every syllable is pronounced with roughly equal duration.

The prosody of Polish is very similar to that of English, and English infants don't distinguish between English and Polish speech at birth.

English infants learn to segment speech by stress, showing a preference for the predominant prosodic structure of English words by 9 months old (Jusczyk et al. 1993). In a syllable-timed language such as French, infants learn at a similar age to segment speech by syllable (Nazzi et al. 2006).


Acoustic cues

Infants develop sensitivity to acoustic cues slightly later than prosodic cues. Here are some examples of acoustic cues.

Syllable transitions

Some transitions from one syllable to another are more likely than others in a given language. If I give you the sound [eɪʃ], 'aysh', then it is quite likely that the next sound will be [ən], as in "-ation". If you hear [eɪʃən], it is unlikely to be one word ending in [eɪʃ] followed by another starting with [ən].

By listening to speech and noticing which sounds frequently occur together, infants can learn the probability of one syllable transitioning to another. An unusual transition is more likely to signal a word boundary than a common one: if you usually hear two syllables together, they are probably part of the same word.

8-month-old infants are capable of segmenting speech based purely on the transitional probabilities of syllables (Saffran et al. 1996; Aslin et al. 1998), although if prosodic information is also present in the input, they favour prosodic cues over statistical ones (Johnson and Jusczyk 2001).
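This statistical strategy is easy to simulate. A sketch, assuming a Saffran-style toy "language" of three made-up words (bidaku, padoti, golabu) concatenated without pauses; the 0.75 boundary threshold is an arbitrary illustrative choice, not a claim about infants:

```python
from collections import Counter

def transition_probs(syllables):
    # Estimate P(next syllable | current syllable) from bigram counts.
    pairs = list(zip(syllables, syllables[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment(syllables, tp, threshold=0.75):
    # Insert a word boundary wherever the transitional probability
    # between adjacent syllables dips below the threshold.
    words, word = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < threshold:
            words.append("".join(word))
            word = []
        word.append(b)
    words.append("".join(word))
    return words

# Three made-up words concatenated with no pauses: within-word transitions
# always co-occur (probability 1.0), between-word transitions do not.
stream = ("bi da ku pa do ti go la bu pa do ti bi da ku go la bu "
          "bi da ku pa do ti go la bu bi da ku go la bu pa do ti").split()
tp = transition_probs(stream)
print(segment(stream, tp))  # recovers bidaku / padoti / golabu in order
```

With this stream the within-word probabilities are all 1.0 and the between-word ones at most 2/3, so every boundary falls exactly between the made-up words.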

Phonotactics

There are also rules governing which sounds can occur together within a syllable. Unlike the transitions between syllables, which are merely more or less probable, phonotactic constraints can be absolute. So English phonotactics forbids syllables containing "tl", like "tlan" or "batl", but you can have phrases like "I got land" or "I bat left-handed", where the "t" and "l" fall in separate words.
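An absolute constraint like this can be sketched as a simple lookup: any sound pair that no syllable of the language may contain forces a boundary between its two sounds. A minimal sketch, assuming a tiny illustrative cluster list (a real model would need a full phonotactic grammar):

```python
# Clusters no English syllable may contain - a tiny, illustrative subset.
ILLEGAL_WITHIN_SYLLABLE = {"tl", "dl"}

def forced_boundaries(phonemes):
    # Wherever an illegal cluster appears in the stream, a syllable
    # (and potentially word) boundary must fall between its two sounds.
    return [i + 1 for i in range(len(phonemes) - 1)
            if phonemes[i] + phonemes[i + 1] in ILLEGAL_WITHIN_SYLLABLE]

# "got land" heard as one stream: the t+l sequence can't sit inside one
# syllable, so a boundary is forced between "got" and "land".
print(forced_boundaries(list("gotland")))  # [3]
```

Unlike the probabilistic syllable-transition cue, this check gives a hard yes or no: it never misfires on a legal cluster, but it only finds the boundaries that happen to sit inside an illegal one.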

At 9 months, infants "have discovered that certain sound sequences typically occur at word boundaries, while others are more likely to be found within words" (Mattys et al. 1999).

Adults favour phonotactic cues over stress cues when both are clearly present (Mattys et al. 2005).
But infants prefer prosodic cues to acoustic ones: Mattys et al. (1999) demonstrated that infants listened longer to syllable pairs with a typical (strong-weak) stress pattern and an atypical consonant cluster than to the reverse. This resembles adults' reliance on prosodic cues when acoustic information is poor (Smith et al. 1989).

Allophony

Allophony refers to the subtle differences in the pronunciation of a sound depending on where it falls in a syllable or word.

Many accents of English have two different allophones for /l/: [l] and [ɫ], commonly referred to as "light l" and "dark l".
(Try saying "light" and "dull". Are those "l"s different in your accent? Try saying "dull" and leave your tongue where it is during the "l". Now, without moving your tongue for the "l", say "light". In certain London dialects, the dark l is so dark that it basically sounds like "ooh" or "w": try saying words like "dull" and "full" with this vowel on the end instead of an "l". Does that sound familiar?)

Most two-syllable words in English are trochees, that is, strong-weak words. Iambs are the reverse: words like gui-TAR or gi-RAFFE. Relying on prosody alone, English infants can only detect trochees, because iambs break the prosodic pattern. At 9 months, English infants cannot yet use allophonic differences, but by 10.5 months they can use a combination of acoustic cues - including allophones - to detect iambs in fluent speech (Jusczyk et al. 1999).

When it goes wrong, go back a step

Infants start out using prosodic cues - things like rhythm and melody, which are much more universal, even if their exact implementation varies between languages - then gradually replace them with acoustic and then lexical cues.

Adults prefer to use lexical cues, followed by acoustic cues, followed by prosodic cues (Mattys et al. 2005). As speech gets more difficult to understand - e.g. in noisy environments - we fall back onto the more basic strategies we used as infants.


References 


Aslin, R. N., J. R. Saffran and E. L. Newport, 1998. Computation of Conditional Probability Statistics by 8-Month-Old Infants. Psychological Science Vol. 9, No. 4, pages 321-324.

Bortfeld, H., J. L. Morgan, R. M. Golinkoff and K. Rathbun, 2005. "Mommy" and Me: Familiar Names Help Launch Babies into Speech-Stream Segmentation. Psychological Science Vol. 16, No. 4, pages 298-304.

Brent, M.R., and J. M. Siskind 2001. The role of exposure to isolated words in early vocabulary development. Cognition Vol. 81, Issue 2, pages B33-B44.

Cutler, A., 1994. Segmentation problems, rhythmic solutions. The Acquisition of the Lexicon, pages 81-104.

Dilley, L.C., S. L. Mattys, L. Vinke, 2010. Potent prosody: Comparing the effects of distal prosody, proximal prosody, and semantic context on word segmentation. Journal of Memory and Language Vol. 63, Issue 3, pages 274-294.

Johnson, E. K., and P. W. Jusczyk, 2001. Word Segmentation by 8-Month-Olds: When Speech Cues Count More Than Statistics. Journal of Memory and Language, Vol. 44, Issue 4, pages 548-567

Johnson, E. K., P. W. Jusczyk, A. Cutler & D. Norris, 2003. Lexical viability constraints on speech segmentation by infants. Cognitive Psychology, Vol. 46, Issue 1, pages 65-97.

Jusczyk, P.W., A. Cutler, and N. J. Redanz, 1993. Infants' Preference for the Predominant Stress Patterns of English Words. Child Development Vol. 64, No. 3, pages 675-687.

Jusczyk, P.W., and R. N. Aslin, 1995. Infants′ Detection of the Sound Patterns of Words in Fluent Speech. Cognitive Psychology Vol. 29, Issue 1, pages 1-23.

Jusczyk, P. W., and E. A. Hohne, 1997. Infants' Memory for Spoken Words. Science Vol. 277, No. 5334, pages 1984-1986.

Jusczyk, P. W., 1999. How infants begin to extract words from speech. Trends in Cognitive Sciences Volume 3, Issue 9, pages 323-328

Jusczyk, P. W., E. A. Hohne, A. Bauman, 1999. Infants’ sensitivity to allophonic cues for word segmentation. Attention, Perception and Psychophysics, Vol. 61, Number 8, pages 1465-1476.

Mattys, S. L., P. W. Jusczyk, P. A. Luce, & J. L. Morgan (1999). Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology, 38, pages 465–494.

Mattys, S. L., L. White and J. Melhorn, 2005. Integration of Multiple Speech Segmentation Cues: A Hierarchical Framework. Journal of Experimental Psychology: General Vol. 134, No. 4, pages 477-500.

Mehler, J., and A. Christophe, 1994. Language in the Infant's Mind. Philosophical Transactions: Biological Sciences Vol. 346, No. 1315 (The Acquisition and Dissolution of Language), pages 13-20.

Nazzi, T., G. Iakimova, J. Bertoncini, S. Frédonie, and C. Alcantara, 2006. Early segmentation of fluent speech by infants acquiring French: Emerging evidence for crosslinguistic differences. Journal of Memory and Language Vol. 54, Issue 3, pages 283-299.

Saffran, J. R., R. N. Aslin and E. L. Newport, 1996. Statistical Learning by 8-Month-Old Infants. Science Vol. 274, No. 5294, pages 1926-1928.

Smith, M.R., A. Cutler, S. Butterfield, I. Nimmo-Smith, 1989. The Perception of Rhythm and Word Boundaries in Noise-Masked Speech. Journal of Speech and Hearing Research Vol.32, pages 912-920.
Thiessen, E.D., and J. R. Saffran, 2004. Spectral tilt as a cue to word segmentation in infancy and adulthood. Attention, Perception and Psychophysics, Vol. 66, Number 5, pages 779-791.