Bruce P. Hayes

Department of Linguistics

UCLA

Short version (23 pages, to appear in *Annual Review of Linguistics*)

Long version (45 pages)

The wug-shaped curve depicts a frequency pattern widely found in
quantitative studies of variable phenomena in linguistics. Indeed, it is
so widespread that I believe its appearance may be meaningful from a
theoretical viewpoint. Visually, the wug-shaped curve takes the form of two or more identical sigmoid (logistic)
curves, spaced apart.

The wug-shaped curve is a natural consequence of probabilistic
versions of Harmonic Grammar, such as MaxEnt.
Here is how the analysis is set up. We divide the constraint set
into two families with different forms or teleologies: Baseline
constraints and Perturbers. We then plot the empirical data points
as probabilities (zero to one) on the vertical axis, against
Baseline Harmony (the summed weighted violations of the Baseline
constraints) on the horizontal axis. This is done separately, in a
different color, for each data series defined by violations of the
Perturber constraints. We also plot the sigmoid curves themselves,
which show the model fit; ideally, the data points will cling to
their respective sigmoids.
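Here is a minimal Python sketch that makes the geometry concrete. It rests on three assumptions: each input has two candidates; Harmony is a penalty (weighted violations, with the other candidate's violations entered as negatives); and the plotted probability is 1/(1 + e^H). The function names, the grid, and the Perturber offsets are all hypothetical. Each Perturber value slides the same sigmoid along the Baseline axis without changing its shape.

```python
import math


def maxent_prob(base_h: float, pert_h: float) -> float:
    """Probability of the plotted candidate in a two-candidate MaxEnt grammar.

    Assumes Harmony is a penalty (sum of weight * violations, with the other
    candidate's violations entered as negatives), so p = 1 / (1 + exp(H)),
    where H = Baseline Harmony + Perturber Harmony.
    """
    return 1.0 / (1.0 + math.exp(base_h + pert_h))


def wug_curve(base_grid, pert_values):
    """One sigmoid per Perturber Harmony value, over a shared Baseline grid."""
    return {p: [maxent_prob(b, p) for b in base_grid] for p in pert_values}


# Hypothetical Baseline Harmony grid (-6.0 to 6.0) and Perturber offsets.
grid = [x / 2 for x in range(-12, 13)]
curves = wug_curve(grid, [0.0, 2.0, 4.0])

# Each curve crosses 0.5 where base_h == -pert_h: same shape, shifted sideways.
for pert, probs in curves.items():
    print(pert, [round(p, 2) for p in probs[::4]])
```

With this sign convention the curves fall as Baseline Harmony rises. Flipping the sign of the horizontal axis gives rising curves of the same shape.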

Here is the underlying research agenda: along with some of my colleagues, I suspect that MaxEnt, or something like it, is correct for natural language, and that this is why wug-shaped curves are ubiquitous in language data. You can judge for yourself by browsing through the images in this gallery, or by analyzing your own data in this way (see the last section for how).

Various people see various things in multiple sigmoids. It was Dustin
Bowers who suggested to me that they look like wugs. The wug was invented
and first drawn in 1958 by Jean
Berko Gleason, in one of the most famous
papers ever written in linguistics. In recent years, the wug has
been adopted by the field of linguistics as a sort of mascot. The real
wug is cuter than the mathematical one.

What is this web page for?

I've written an article about wug-shaped curves in linguistics, which you
can download from the links at the top of this
page, in either a long or short version. Even the long version
doesn't have all of the cases I've compiled, and it seems that a web site
would be the best format to display them all together, perhaps adding more
in the future. Below, I've included all
the cases I have studied, including the ones where the data don't
look entirely pretty.

Browsing hints

For the scholarly reference sources behind all these curves, please follow the links or look at the bibliography section of my paper. In most cases, a link to a spreadsheet is included, which will tell you how I obtained the data, did the MaxEnt reanalysis, and plotted the curve. For a few cases, my spreadsheet is still messy and can't be shared yet, though you could ask.

Directory

Wug-shaped curves in phonology

Wug-shaped curves in phonetics

Wug-shaped curves in syntax

Wug-shaped curves in sociolinguistics

Wug-shaped curves in semantics/pragmatics

Wug-shaped curves in language change

Wug-shaped curves in sound symbolism

Some graphs used to diagnose theories

How I made the curves

Wug-shaped curves in phonology

Hungarian vowel harmony

Sources: Hayes
and Londe (2006), Hayes
et al. (2009), Zuraw
and Hayes (2017)

Y-axis: how often a stem will take back suffixes in a wug-test
experiment

Baseline constraints: based on stem vowels that influence harmony; B
is any back vowel, F any front rounded vowel.

Perturber constraints: stem-final consonant environments (e.g.,
after sibilants) that favor front harmony

Spreadsheet (forthcoming), plotting
script

French liaison

Source: Zuraw
and Hayes (2017)

Y-axis: likelihood of elision or liaison; for example, use of [l]
instead of [la] for the feminine definite article

Baseline constraints: lexical propensity of Word 2 to act as an
h-aspiré word

Perturber constraints: lexical propensity of Word 1 to appear in its
isolation form

Spreadsheet, plotting
script

Tagalog Nasal Substitution

Sources: Zuraw (2000,
2010), Zuraw
and Hayes (2017)

Y-axis: how often a stem of a given type will undergo the process of
Nasal Substitution

Baseline constraints: related to place and manner of stem-initial
consonant

Perturber constraints: propensity of a particular prefix to trigger
the process

Spreadsheet, plotting
script

Inversion of Final Devoicing in Dutch

Source: Ernestus
and Baayen (2003)

Y-axis: how often speakers guess that a stem-final obstruent (always
voiceless when word-final) will appear as voiced when suffixed

Baseline constraints: place and manner of stem-final consonants, and
the preceding consonant, if any

Perturber constraints: based on three degrees of vowel length in the
stem

Spreadsheet (forthcoming), plotting
script

Finnish genitive plurals

Sources: Anttila (1997), Boersma
and Hayes (2001), Goldwater
and Johnson (2003), Hayes
(in progress)

Y-axis: how often a stem will take the longer [-den] allomorph of
the genitive plural

Baseline constraint: whether allomorph choice will result in two
consecutive light syllables

Perturber constraints: based on vowel height and weight of stem
syllables. One perturber is inviolable (infinite weight) and
therefore produces a flat line, not a sigmoid.
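A short numerical check shows why an inviolable Perturber gives a flat line. The Python sketch below assumes the two-candidate convention p = 1/(1 + e^H), with H a penalty, and uses invented Harmony values. It models infinite weight as `math.inf`. If the inviolable constraint penalized the other candidate instead, the line would sit flat at 1 rather than 0.

```python
import math


def maxent_prob(base_h: float, pert_h: float) -> float:
    # Assumed convention: H is a penalty and p = 1 / (1 + exp(H)).
    return 1.0 / (1.0 + math.exp(base_h + pert_h))


# Hypothetical Baseline Harmony values.
baseline_grid = [x / 2 for x in range(-12, 13)]

# An ordinary Perturber with a finite (invented) Harmony: a sigmoid.
finite = [maxent_prob(b, 1.5) for b in baseline_grid]

# An inviolable Perturber penalizing the plotted candidate: infinite Harmony.
# math.exp(inf) is inf, so p is exactly 0.0 at every Baseline value.
inviolable = [maxent_prob(b, math.inf) for b in baseline_grid]

print(round(min(finite), 4), round(max(finite), 4))
print(set(inviolable))  # {0.0}
```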

Spreadsheet,
plotting script

Schwa/zero alternations in French: Smith/Pater

Source: Smith
and Pater (2020)

Y-axis: how often zero shows up in French schwa/zero
alternations

Baseline constraints: whether schwa is inserted or deleted,
consonants in environment

Perturber constraint: whether deletion of a schwa creates clashing
(adjacent) stressed syllables

Spreadsheet, plotting
script

Schwa/zero alternations in French: Storme

Source: Storme
(2021)

Y-axis: how often zero shows up in French schwa/zero
alternations

Baseline constraints: the markedness of the cluster into which schwa
is inserted (Hard, Medium, Easy), whether morphology is derivational or
inflectional.

Perturber constraint: a difference between Swiss and French native
speakers, treated by Storme as stricter Dep(schwa) in the Swiss dialect.
Thanks to Benjamin Storme for help with these data.

Stress placement in Hupa

Source: Ryan
(2019)

Y-axis: probability of initial stress rather than second syllable
stress

Baseline constraints: weight of initial syllable

Perturber constraints: weight of second syllable

Spreadsheet, plotting
script

Wug-shaped curves in phonetics

Perception of voicing based on closure duration and length of preceding vowel

Source: Kluender
et al. (1988)

Y-axis: likelihood an experimental participant will perceive a
voiced instead of a voiceless stop

Baseline constraint: gradient, based on closure duration

Perturber constraint: long vs. short preceding vowel

Spreadsheet, plotting
script

Perception of liquids based on F3 and phonotactic constraints

Source: Massaro
and Cohen (1983)

Y-axis: likelihood that an experimental participant will perceive
[r] as opposed to [l]

Baseline constraints: F3 value of a synthesized liquid consonant

Perturber constraints: violation of various phonotactic constraints,
based on choice of preceding consonant (*[tl, *[sr, ?[vl, ??[vr)

Spreadsheet, plotting script

Wug-shaped curves in syntax

Datives in English

Source: Szmrecsanyi
et al. (2017)

Y-axis: how often the meaning of the dative construction will be
expressed using NP NP rather
than NP to NP

Baseline constraints: governing various properties of the Recipient

Perturber constraints: status of the Theme

Spreadsheet (forthcoming), plotting
script

Genitives in English

Source: Szmrecsanyi
et al. (2017)

Y-axis: how often the meaning of the possessive will be
expressed using NP's NP rather
than NP of NP

Baseline constraints: an amalgam; consult the Szmrecsanyi et al.
paper

Perturber constraints: based on length of possessor in words

Spreadsheet (forthcoming), plotting
script

One can also plot the same data the other way around, with length as the Baseline rather than the Perturber, like this:

Wug-shaped curves in sociolinguistics

Contraction of the copula in Black English

Sources: Labov
(1969), Cedergren
and Sankoff (1974)

Y-axis: how often the speaker uses a contracted (vowelless)
allomorph of the copula

Baseline constraints: left side environment, including pronominal
portmanteaux like *he's*

Perturber constraints: right side syntactic environment

This case is unusual in my experience in that the data are fitted solely
by the "tail" of the wug; cases further forward on the wug are empirically
missing.

Spreadsheet, plotting
script

Deletion of the copula in Black English

Sources: Labov
(1969), Cedergren
and Sankoff (1974)

Y-axis: how often the speaker uses a null allomorph of
the copula, assuming they have already chosen to contract.

Baseline constraints: left side environment, including pronominal
portmanteaux like *he's*

Perturber constraints: right side syntactic environment

Spreadsheet, plotting
script

This is perhaps the messiest case I have seen; could the use of
conditional probability be the problem?

Deletion of [l] in Quebec French

Source: G. Sankoff (1972), cited and discussed in Bailey
(1973)

Y-axis: deletion rate of [l]

Baseline constraints: varying propensity of various function words
to lose their [l]

Perturber constraints: sex and social class of speaker, taken as a
proxy for Max(l) varying by speaking style

Spreadsheet, plotting
script

This case stands out as problematic for Stochastic OT, which is critiqued in Zuraw and Hayes (2017) and in my own paper. Here is a graph of a best-fit model of these data in Stochastic OT:

Omission of que in Quebec French

Source: Cedergren
and Sankoff (1974)

Y-axis: retention rate for que

Baseline constraints: surrounding consonants

Perturber constraints: formality of style, as varied by type of
speaker

Spreadsheet, plotting
script

R-Spirantization in Panamanian Spanish

Source: Cedergren
and Sankoff (1974)

Y-axis: probability of realizing /r/ as a spirant

Baseline constraints: phrasal position, whether /r/ is part of the
infinitive ending, speaking style

Perturber constraints: following segment

This is a rather messy one, I admit, and in particular lacks extreme
values of probability.

Spreadsheet, plotting
script

R-Dropping in New York City English

Source: William Labov, via Cedergren
and Sankoff (1974)

Y-axis: probability of deleting /r/ in syllable codas

Baseline constraints: speaking context

Perturber constraints: designating different dialects spoken in the
same speech community.

Spreadsheet, plotting
script

Cluster Simplification in Detroit Black English

Source: Wolfram
(1969)

Y-axis: probability of deleting one of a pair of adjacent consonants

Baseline constraints: neighboring vowel/consonant, whether deleting
consonant is part of past tense suffix

Perturber constraints: social class, assumed to be a proxy for
speaking style

This curve has a puzzlingly close vertical grouping for the _ C/ -ed
case.

Spreadsheet,
plotting script

Wug-shaped curves in language change

Portuguese definite articles

Source: Kroch
(1989), ultimately from Oliveira e Silva (1982)

Y-axis: probability of use of a definite article when an NP also has an NP
possessor

Baseline constraints: a constraint preferring this usage, whose weight
rises over the centuries

No perturber

Periphrastic do in English

Source: Kroch
(1989), ultimately from Ellegård (1953).

Y-axis: probability of employing the inserted aux do.

Baseline constraints: a preference constraint shifting over time

Perturber constraints: governing various syntactic contexts.

Spreadsheet, plotting
script

Evolution of have from Aux to main verb in English

Source: Zimmermann
(2017)

Y-axis: probability of employing have syntactically
as a main verb rather than as an Aux

Baseline constraint: an Aux-preferring constraint shifting over time

Perturber constraints: governing various distinct uses of auxiliary
verbs

The right-hand graph plots the same data in different coordinates (harmony
difference), showing identical slopes.
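In log-odds (logit) coordinates a MaxEnt sigmoid becomes a straight line; for two candidates, the log-odds equal the negated Harmony difference. Sigmoids that share a slope therefore appear as parallel lines. The Python sketch below illustrates this under invented assumptions: a Baseline constraint whose Harmony changes linearly with time, and three hypothetical contexts, each adding a fixed Perturber Harmony. It is not a reanalysis of the Kroch or Zimmermann data.

```python
import math


def logit(p: float) -> float:
    """Log-odds: the coordinates in which a logistic curve is a straight line."""
    return math.log(p / (1.0 - p))


def prob_new_form(year: float, context_h: float,
                  rate: float = 0.02, midpoint: float = 1500.0) -> float:
    """Hypothetical MaxEnt grammar for a change in progress.

    Baseline Harmony falls linearly with time (rate and midpoint are invented);
    each context contributes a fixed, invented Perturber Harmony.
    Probability of the new form: p = 1 / (1 + exp(H)).
    """
    base_h = -rate * (year - midpoint)
    return 1.0 / (1.0 + math.exp(base_h + context_h))


contexts = {"context A": 0.0, "context B": 1.0, "context C": -1.5}  # invented
years = [1400, 1450, 1500, 1550, 1600]

slopes = {}
for name, h in contexts.items():
    pts = [logit(prob_new_form(y, h)) for y in years]
    slopes[name] = (pts[-1] - pts[0]) / (years[-1] - years[0])
    print(name, round(slopes[name], 4))  # 0.02 in every context
```

The contexts differ only in their intercepts, which is what the vertical spacing of the sigmoids corresponds to in the original coordinates.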

Wug-shaped curves in semantics/pragmatics

Quantifier scope

Source: AnderBois
et al. (2012)

Y-axis: probability subjects will
prefer narrow scope

Baseline constraint: whether the
target quantifier is in first or second position

Perturber constraint: whether the target quantifier is in subject or
object position

Spreadsheet,
plotting script

Wug-shaped curves in sound symbolism

Classification of Pokemon character names

Source: Kawahara
(2020), which in this case provides a full analysis and discussion from a
wug-shaped point of view

Y-axis: probability subjects will rate a Pokemon name as appropriate for
an "unevolved," smaller Pokemon creature

Baseline constraints: length of name in moras

Perturber constraints: whether name includes an initial voiced obstruent
(such as [d])

Spreadsheet (forthcoming), plotting
script

Graphs used to diagnose theories

The MaxEnt sigmoid

This is discussed extensively in the main text of my paper and is plotted here to permit the comparisons that follow.

The asymmetrical sigmoid of classical Noisy Harmonic Grammar

In the classical version of Noisy Harmonic Grammar (Boersma and Pater 2016), the "noise" that makes the theory stochastic is added to the constraint weights, prior to Harmony computation. This ends up producing a sigmoid curve quite different from that of MaxEnt: it is asymmetrical, and its long tail can be shown to asymptote to a value above zero.
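A small simulation shows the tail behavior. This Python sketch uses a hypothetical two-candidate tableau: Candidate A violates a Baseline constraint C1 some number of times x, and Candidate B violates a constraint C2 once. Gaussian noise (standard deviation 1) is added to each weight at every evaluation, and the sketch assumes noisy weights are not clipped at zero; the weights themselves are invented. A large x multiplies C1's noisy weight, so A still wins whenever that noisy weight dips below zero. The curve therefore levels off above zero instead of approaching it.

```python
import random
from statistics import NormalDist


def early_noise_prob_a(x: int, w1: float, w2: float, sd: float = 1.0,
                       trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(candidate A wins) in classical (early-noise) NHG.

    Hypothetical tableau: A violates C1 x times; B violates C2 once.
    Noise is added to each weight before Harmony is computed; no clipping
    at zero is assumed. The candidate with the lower penalty Harmony wins.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        noisy_w1 = w1 + rng.gauss(0.0, sd)
        noisy_w2 = w2 + rng.gauss(0.0, sd)
        if x * noisy_w1 < noisy_w2:  # A's Harmony below B's: A wins
            wins += 1
    return wins / trials


w1, w2 = 1.0, 4.0  # invented weights
for x in range(0, 21, 2):
    print(x, round(early_noise_prob_a(x, w1, w2), 3))

# As x grows, A wins only when the noisy w1 is negative, so the curve
# approaches P(w1 + noise < 0) rather than zero.
print("asymptote:", round(NormalDist(0.0, 1.0).cdf(-w1), 3))
```

For comparison, the MaxEnt probability for the same tableau, 1/(1 + e^(x*w1 - w2)), goes to zero as x grows.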

The symmetrical, oddly-similar sigmoid of late-noise Noisy Harmonic Grammar

If, in designing a Noisy Harmonic Grammar framework, you add the "noise" to the completed Harmony values of candidates, you get a sigmoid that is remarkably similar to the MaxEnt sigmoid (even though the math is completely different). Here are the MaxEnt and late-noise Noisy Harmonic Grammar sigmoids superposed, with constraint weights suitably scaled to make the resemblance clear.
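The resemblance has a simple analytic basis, which a short Python sketch can check. Assume two candidates and independent Gaussian noise added to each candidate's Harmony. Late-noise NHG then gives a normal (probit) curve in the Harmony difference, while MaxEnt gives a logistic curve. Rescaling by the conventional logistic-to-normal constant 1.702 makes the two nearly coincide. The noise scale and the grid are choices made for this illustration.

```python
import math
from statistics import NormalDist


def maxent_prob(h: float) -> float:
    """Two-candidate MaxEnt; h is A's penalty Harmony minus B's."""
    return 1.0 / (1.0 + math.exp(h))


def late_noise_prob(h: float, sd: float) -> float:
    """Late-noise NHG: Gaussian noise (std. dev. sd) is added to each
    candidate's Harmony, so A wins iff e_A - e_B < -h, where the noise
    difference e_A - e_B has std. dev. sd * sqrt(2)."""
    return NormalDist(0.0, sd * math.sqrt(2.0)).cdf(-h)


# Scale the noise so the two curves have matching spread.
sd = 1.702 / math.sqrt(2.0)

grid = [x / 10 for x in range(-80, 81)]
gap = max(abs(maxent_prob(h) - late_noise_prob(h, sd)) for h in grid)
print(round(gap, 4))  # under 0.01 everywhere on this grid
```

The math differs (normal versus logistic distribution), but the two curves never come more than about one percentage point apart.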

Can a wug-shaped curve be fitted to any data?

Well, no; not under any sensible meaning of the word "fitted". For instance, I opened up the spreadsheet for the Massaro-Cohen experiment reported above and replaced the empirical values from their experiment with random values. What happened was that the best-fit MaxEnt weights came out very low, the scattergram of model fit emerged as cloud-shaped, and the fitted sigmoids look like this:

The slope of the sigmoid ever-so-vaguely fits an entirely random minor skewing of the data cloud, and the vertical arrangement of the sigmoids fits the randomly imposed differences in the average values for /r/ after different consonants.

So, no, good fit with wug-shaped curves doesn't come for free; it is a
contingent fact about the data. :-)
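The random-data check can be reproduced in miniature. The Python sketch below is an illustration, not the spreadsheet used above. It fits a two-candidate MaxEnt grammar by plain gradient ascent to two data sets over the same cells. The first holds probabilities generated by an invented wug-shaped grammar. The second holds random probabilities drawn between 0.05 and 0.95, avoiding exact 0 and 1. The bias column, the generating weights, and the seed are assumptions made for the demonstration.

```python
import math
import random


def fit_maxent(rows, lr=0.1, steps=10000):
    """Fit two-candidate MaxEnt weights by gradient ascent on log-likelihood.

    Each row is (a_row_violations, observed_prob_of_A), with B's violations
    entered as negatives on A's row, so p_A = 1 / (1 + exp(w . v)).
    """
    k = len(rows[0][0])
    w = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for v, obs in rows:
            pred = 1.0 / (1.0 + math.exp(sum(wi * vi for wi, vi in zip(w, v))))
            for i in range(k):
                # d/dw_i of obs*log(pred) + (1 - obs)*log(1 - pred)
                grad[i] += (pred - obs) * v[i]
        w = [wi + lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def true_prob(x, pert):
    """Invented 'wug-shaped' grammar used to generate structured data."""
    return 1.0 / (1.0 + math.exp(1.5 * x + 2.0 * pert - 4.0))


rng = random.Random(0)
# Violation vectors: [Baseline count, Perturber (0/1), constant "bias" column].
cells = [[x, p, 1.0] for x in range(7) for p in (0, 1)]
structured = [(v, true_prob(v[0], v[1])) for v in cells]
scrambled = [(v, rng.uniform(0.05, 0.95)) for v in cells]

w_struct = fit_maxent(structured)
w_rand = fit_maxent(scrambled)
print("structured:", [round(w, 2) for w in w_struct])
print("random:    ", [round(w, 2) for w in w_rand])
```

On the structured data the fit should land near the generating Baseline weight of 1.5. On the random data the Baseline weight typically stays near zero, which flattens the sigmoids as described above.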

How I made the curves

I'm sure there are better ways (for instance, R is probably good), but this was the method I arrived at on an ad hoc basis. You can see how all this works by downloading the spreadsheets and plotting scripts for the individual cases above.

1. **Obtain data**. Some authors web-post their
data, others print the data in their articles, and still others
give just a graph. Even with a graph, it is not hard to use Microsoft
Paint to get the values: the bottom of the screen shows the
horizontal and vertical coordinates of the cursor in pixels. Hover over
the data points, and over the axis tick marks, and put their pixel
coordinates into a spreadsheet. Then you can use arithmetic (or the
handy Excel FORECAST() function) to convert pixels into real values.

2. If necessary, **reduce data from individual tokens to
types-plus-counts**. I do this by applying my little Typizer
program to the rows of a spreadsheet, saved in plain text form and
containing just the constraint violations.

3. Do a **MaxEnt analysis** of the data, which is easily
done in spreadsheet form. The spreadsheets above show you how to do this;
it also helps to know the basics of MaxEnt, for which you can read my
paper. The key step requires you to deploy the Excel
Solver (which is free, but must be activated) to calculate
constraint weights. During this stage, you should calculate
Harmony in two columns, one for Baseline constraints and one for Perturber
constraints, then use their sum to give the overall Harmony from which
probabilities are calculated.

In doing the MaxEnt analysis, use this trick, assuming a particular input
has two candidates A and B: if Candidate B has one violation of
Constraint X, record the violation in the spreadsheet not as a 1 in
Candidate B's row, but rather as a -1 in Candidate A's row. Then the B row
ends up blank, other than the crucial frequency value for B. The
math comes out the same, because with only two candidates the
probabilities depend only on the difference between their Harmonies.
It also gives you the harmony values in a form plottable as a single
number, as described in the longer version of my paper.

4. **Collate the data**, keeping only Candidate A for each
pair. I perform this collation with formulas in the space below the
main MaxEnt analysis. You must also collate the values for Observed
Frequency, Base Harmony, and Perturber Harmony. Optionally, you can
include the Counts, if you'd like to plot poorly attested data points
as smaller dots. It is also good to gather values for Predicted
Frequency; then you can make a scattergram of Observed against
Predicted, calculate the correlation, and in general assess whether your
MaxEnt model is a good one.

5. Within the spreadsheet, **fill in the necessary fields to make a
plot**. These are shown in blue in the spreadsheets posted here,
and also can be seen in the downloadable plotting scripts.

6. Clip the blue material out of your spreadsheet and **save it as
a text file**, which is the plotting script.

7. Download my **PlotSigmoids.exe** program (Windows only, sorry!) and
put it in a new folder of your choice. Run it and drag a plotting
script file onto the designated blank area of the interface. It will
make a .bmp image and put it into the "out" subfolder.
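Steps 2 through 4 can also be scripted instead of run in a spreadsheet, as in the minimal Python sketch below. It is not the Typizer program or the Excel setup described above. Plain gradient ascent stands in for Solver, and nonnegative weights are not enforced. The toy tokens are invented, and the split into two Baseline columns and one Perturber column (`N_BASE`) is a hypothetical choice. Like the spreadsheet recipe, it enters B's violations as negatives on A's row and keeps only Candidate A per pair. It reports Base Harmony and Perturber Harmony separately, alongside observed and predicted frequencies and counts.

```python
import math
from collections import Counter

N_BASE = 2  # hypothetical: first two columns Baseline, the rest Perturbers


def a_row(viol_a, viol_b):
    """A-row trick: B's violations become negatives on A's row, leaving B blank.
    With two candidates only the Harmony difference matters."""
    return tuple(va - vb for va, vb in zip(viol_a, viol_b))


def typize(tokens):
    """Collapse (vector, chose_a) tokens into {vector: [count_A, count_B]}.
    A stand-in for the Typizer step, not that program."""
    types = {}
    for (vec, chose_a), n in Counter(tokens).items():
        types.setdefault(vec, [0, 0])[0 if chose_a else 1] += n
    return types


def fit(types, lr=0.5, steps=5000):
    """Count-weighted gradient ascent on MaxEnt log-likelihood (stands in for
    Excel Solver; nonnegativity of weights is not enforced)."""
    vecs = list(types)
    k = len(vecs[0])
    total = sum(a + b for a, b in types.values())
    w = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for v in vecs:
            n_a, n_b = types[v]
            p = 1.0 / (1.0 + math.exp(sum(wi * vi for wi, vi in zip(w, v))))
            for i in range(k):
                # d/dw_i of n_a*log(p) + n_b*log(1 - p)
                grad[i] += (p * (n_a + n_b) - n_a) * v[i]
        w = [wi + lr * g / total for wi, g in zip(w, grad)]
    return w


def collate(types, w):
    """Rows of (Base Harmony, Perturber Harmony, observed, predicted, count)."""
    rows = []
    for v, (n_a, n_b) in sorted(types.items()):
        base_h = sum(wi * vi for wi, vi in zip(w[:N_BASE], v[:N_BASE]))
        pert_h = sum(wi * vi for wi, vi in zip(w[N_BASE:], v[N_BASE:]))
        predicted = 1.0 / (1.0 + math.exp(base_h + pert_h))
        rows.append((base_h, pert_h, n_a / (n_a + n_b), predicted, n_a + n_b))
    return rows


# Invented toy tokens: (violations of A, violations of B, whether A was chosen).
raw = ([((1, 0, 0), (0, 0, 0), True)] * 4 + [((1, 0, 0), (0, 0, 0), False)] * 6
       + [((2, 0, 1), (0, 0, 0), True)] * 1 + [((2, 0, 1), (0, 0, 0), False)] * 9
       + [((0, 0, 0), (0, 1, 0), True)] * 8 + [((0, 0, 0), (0, 1, 0), False)] * 2)
tokens = [(a_row(va, vb), chose) for va, vb, chose in raw]
types = typize(tokens)
weights = fit(types)
for row in collate(types, weights):
    print([round(x, 3) for x in row])
```

The collated columns are the ones step 4 asks for. Writing them out in the plotting-script format is not attempted here.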

Questions: bhayes@humnet.ucla.edu.

Last updated July 2021