Jeffrey Heinz

UCLA Linguistics Dept.
3125 Campbell Hall
Los Angeles CA 90095-1543
jheinz@humnet.ucla.edu



Dissertation

Inductive Learning of Phonotactic Patterns

Dissertation [pdf] (1.5 MB) (for two-sided printing, it is better to use this pdf)

Edward P. Stabler and Kie Zuraw, advisors.



Committee

Chairs: Edward P. Stabler, Kie Zuraw

Members: Bruce Hayes, Stott Parker, Colin Wilson



Abstract

This dissertation demonstrates that significant classes of phonotactic patterns---patterns found over contiguous sounds, patterns found over non-contiguous segments (i.e. long distance agreement), and stress patterns---belong to small subsets of logically possible patterns whose defining properties naturally provide inductive principles learners can use to generalize correctly from limited experience.

This result is obtained by studying the hypothesis spaces different formulations of locality in phonology naturally define in the realm of regular languages, that is, those patterns describable with finite state machines. Locality expressed as contiguity (adjacency) restrictions provides the basis for n-gram-based patterns which describe phonotactic patterns over contiguous segments. Locality expressed as precedence---where distance between segments is not measured at all---defines a hypothesis space for long distance agreement patterns. Finally, both of these formulations of locality are shown to be subsumed by a more general formulation---that each relevant phonological environment is defined `locally' and is unique---which I call neighborhood-distinctness.
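As a rough illustration of the first two formulations of locality, the following Python sketch represents a grammar as a set of permitted factors: adjacent pairs (bigrams) for contiguity, and ordered pairs at any distance for precedence. The function names, the `#` word-boundary symbol, and the toy sibilant alphabet (`s`, `S`, `a`) are assumptions made for this example only; they are not taken from the dissertation.

```python
from itertools import combinations, product

def bigrams(word, boundary="#"):
    """Contiguity: the set of adjacent symbol pairs, including word edges."""
    padded = [boundary] + list(word) + [boundary]
    return set(zip(padded, padded[1:]))

def precedence_pairs(word):
    """Precedence: every pair (a, b) where a occurs anywhere before b."""
    return set(combinations(word, 2))

def accepts(grammar, word, factors):
    """A word is well-formed if all of its factors are permitted."""
    return factors(word) <= grammar

# Hypothetical toy harmony pattern: 's' and 'S' may not co-occur in a word.
alphabet = "sSa"
harmony = set(product(alphabet, repeat=2)) - {("s", "S"), ("S", "s")}

print(accepts(harmony, "sasas", precedence_pairs))  # agreeing sibilants
print(accepts(harmony, "saSa", precedence_pairs))   # disagreeing sibilants

# A bigram grammar built from two harmonic words cannot exclude "saSa",
# because every adjacent pair in "saSa" already occurs in "sasa" or "aSa".
local = bigrams("sasa") | bigrams("aSa")
print(accepts(local, "saSa", bigrams))
```

In this toy case the precedence grammar rejects `saSa` while the bigram grammar accepts it, which is one way to see why long distance agreement calls for a different notion of locality than adjacency.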

In addition to patterns over contiguous and non-contiguous segments, it is shown that all stress patterns described in recent comprehensive typologies are, for small neighborhoods, neighborhood-distinct. In fact, it is shown that 414 out of the 422 languages in the typologies have stress patterns which are neighborhood-distinct for even smaller neighborhoods called `1-1'. Furthermore, it is shown that significant classes of logically possible unattested patterns are not neighborhood-distinct. Thus, 1-1 neighborhood-distinctness is hypothesized to be a universal property of phonotactic patterns, a hypothesis confirmed for all but a few stress patterns which merit further study.
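The idea of neighborhood-distinctness can be sketched in code for a single finite state acceptor. The sketch below assumes that the `1-1' neighborhood of a state consists of whether it is a start state, whether it is a final state, the set of symbols on its incoming transitions, and the set of symbols on its outgoing transitions, and that an acceptor is neighborhood-distinct when no two states share a neighborhood. This reading of the definition, the acceptor encoding, and both toy stress patterns (`0` for an unstressed syllable, `1` for a stressed one) are assumptions of the example; the dissertation gives the precise definitions.

```python
def neighborhood(state, transitions, starts, finals):
    """Assumed 1-1 neighborhood: start?, final?, incoming and outgoing symbols."""
    incoming = frozenset(sym for (src, sym, dst) in transitions if dst == state)
    outgoing = frozenset(sym for (src, sym, dst) in transitions if src == state)
    return (state in starts, state in finals, incoming, outgoing)

def is_neighborhood_distinct(states, transitions, starts, finals):
    """True if no two states of this acceptor share the same neighborhood."""
    seen = set()
    for q in states:
        n = neighborhood(q, transitions, starts, finals)
        if n in seen:
            return False
        seen.add(n)
    return True

# Hypothetical acceptor for final stress: any number of 0s, then one 1.
final_stress = dict(
    states={"A", "B"},
    transitions={("A", "0", "A"), ("A", "1", "B")},
    starts={"A"},
    finals={"B"},
)

# Hypothetical acceptor for stress on exactly the fourth syllable.
fourth_syllable = dict(
    states={"q0", "q1", "q2", "q3", "q4"},
    transitions={("q0", "0", "q1"), ("q1", "0", "q2"), ("q2", "0", "q3"),
                 ("q3", "1", "q4"), ("q4", "0", "q4")},
    starts={"q0"},
    finals={"q4"},
)

print(is_neighborhood_distinct(**final_stress))
print(is_neighborhood_distinct(**fourth_syllable))
```

In the second acceptor, states `q1` and `q2` have identical neighborhoods, so the check fails. Note that this function tests one particular acceptor; whether a pattern counts as neighborhood-distinct may depend on whether some acceptor for it passes.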

It is shown that there are learners which provably learn these hypothesis spaces in the sense of Gold (1967) and which exemplify two general classes of learners: string extension and state merging. Thus the results obtained here provide techniques which allow other hypothesis spaces possibly relevant to phonology, or other cognitive domains, to be explored. Also, the hypothesis spaces and learning procedures developed here provide a basis which can be enriched with additional, substantive phonological structure. Finally, this basis is readily transferable into a variety of statistical learning procedures.
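A minimal sketch of the string extension idea, assuming the grammar is the set of k-factors (contiguous substrings of length k, padded with a `#` boundary symbol) observed in the positive data: the learner simply takes the union of the factors of every word it has seen. The function names, the boundary symbol, and the sample words are illustrative assumptions, and the sketch does not cover state merging.

```python
def k_factors(word, k=2, boundary="#"):
    """All contiguous length-k factors of the word, padded with boundaries."""
    padded = [boundary] * (k - 1) + list(word) + [boundary] * (k - 1)
    return {tuple(padded[i:i + k]) for i in range(len(padded) - k + 1)}

def learn(sample, factors=k_factors):
    """String extension learner: the grammar is the union of observed factors."""
    grammar = set()
    for word in sample:
        grammar |= factors(word)
    return grammar

def generates(grammar, word, factors=k_factors):
    """A word is generated if every one of its factors is in the grammar."""
    return factors(word) <= grammar

# Hypothetical positive sample.
grammar = learn(["ab", "abab"])

print(generates(grammar, "ababab"))  # unseen word built from observed factors
print(generates(grammar, "aab"))     # contains the unobserved factor ('a', 'a')
```

The learner generalizes beyond its sample (it admits `ababab`) yet never admits a word containing an unobserved factor. Swapping in a different factor function, such as one returning precedence pairs, would give a learner for a different hypothesis space under the same scheme.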



Stress Typology

Online access to the database is almost ready. In the meantime, the database is available in MySQL format. The file is really an SQL file, so you should change its extension to .sql after downloading it. If MySQL is running on your system, create a database called stress and then simply type from the command line:
mysql stress < stress-typology-heinz-21-06-2007.sql

The file with the stress typology is here: [txt] (300 KB)

The finite state acceptors in the database are designed to be used with the fsa program which is available here.

Until the online access is available, please contact me if you have any questions.


Last updated: June 21, 2007