Quantitative patterns in constraint interaction: FRENCH

Introduction

This is an R Markdown document. Markdown is a simple formatting syntax for authoring web pages.

If you are viewing this in a web browser, then an .Rmd file has been “knit” into a web page that includes the results of running embedded chunks of R code.

If you are viewing this in RStudio, clicking the Knit HTML button will generate a web page that includes both the content and the output of any embedded R code chunks within the document. First use the Session menu to set the working directory to Source file location.

This document accompanies one section of a manuscript by Anonymous.

In terms of reproducibility, this document falls short in many respects. A fair amount of work was done by hand or in GUIs, before we created the present script. We have done our best to document and explain this work here, but have not gone back and reimplemented everything in script form. We hope that the partial reproducibility here will be beneficial to readers wishing to test or modify our analysis.

Pipeline to the input file for this script

  1. Google n-grams files are downloaded from Google

  2. grep takes these to a set of smaller files:

    grep -E '^CE |^Ce |^ce |^CET |^Cet |^cet ' googlebooks-fre-all-2gram-20120701-ce > ce_cet_2grams.txt
    grep -E '^BEAU |^Beau |^beau |^BEL |^Bel |^bel ' googlebooks-fre-all-2gram-20120701-be > beau_bel_2grams.txt
    grep -E '^NOUVEAU |^Nouveau |^nouveau |^NOUVEL |^Nouvel |^nouvel ' googlebooks-fre-all-2gram-20120701-no > nouveau_nouvel_2grams.txt
    grep -E '^VIEUX |^Vieux |^vieux |^VIEIL |^Vieil |^vieil ' googlebooks-fre-all-2gram-20120701-vi > vieux_vieil_2grams.txt
    grep -E '^FOU |^Fou |^fou |^FOL |^Fol |^fol ' googlebooks-fre-all-2gram-20120701-fo > fou_fol_2grams.txt
    grep -E '^MOU |^Mou |^mou |^MOL |^Mol |^mol ' googlebooks-fre-all-2gram-20120701-mo > mou_mol_2grams.txt
    grep -E '^LE |^Le |^le ' googlebooks-fre-all-2gram-20120701-le > le_2grams.txt
    grep "^[Ll]'" googlebooks-fre-all-1gram-20120701-l > l_1grams.txt
    grep "^[Ll]' " googlebooks-fre-all-2gram-20120701-l_ > l_2grams.txt
    grep -E '^LA |^La |^la ' googlebooks-fre-all-2gram-20120701-la > la_2grams.txt
    grep -E '^MA |^Ma |^ma ' googlebooks-fre-all-2gram-20120701-ma > ma_2grams.txt
    grep -E '^MON |^Mon |^mon ' googlebooks-fre-all-2gram-20120701-mo > mon_2grams.txt
    grep -E '^TA |^Ta |^ta ' googlebooks-fre-all-2gram-20120701-ta > ta_2grams.txt
    grep -E '^TON |^Ton |^ton ' googlebooks-fre-all-2gram-20120701-to > ton_2grams.txt
    grep -E '^SA |^Sa |^sa ' googlebooks-fre-all-2gram-20120701-sa > sa_2grams.txt
    grep -E '^SON |^Son |^son ' googlebooks-fre-all-2gram-20120701-so > son_2grams.txt
    grep -E '^DE |^De |^de ' googlebooks-fre-all-2gram-20120701-de > de_2grams.txt
    grep "^[Dd]'" googlebooks-fre-all-1gram-20120701-d > d_1grams.txt
    grep "^[Dd]' " googlebooks-fre-all-2gram-20120701-d_ > d_2grams.txt
    grep -E "^QUE |^Que |^que |^QU' |^Qu' |^qu' " googlebooks-fre-all-2gram-20120701-qu > que_qu_2grams.txt
    grep -E "^QUE |^Que |^que " googlebooks-fre-all-2gram-20120701-qu > que_2grams.txt
    grep -E "^QU'|^Qu'|^qu'" googlebooks-fre-all-1gram-20120701-q > qu_1grams.txt
    grep -E '^NE |^Ne |^ne ' googlebooks-fre-all-2gram-20120701-ne > ne_2grams.txt
    grep "^[Nn]'" googlebooks-fre-all-1gram-20120701-n > n_1grams.txt
    grep "^[Nn]' " googlebooks-fre-all-2gram-20120701-n_ > n_2grams.txt
    grep -E '^SE |^Se |^se ' googlebooks-fre-all-2gram-20120701-se > se_2grams.txt
    grep "^[Ss]'" googlebooks-fre-all-1gram-20120701-s > s_1grams.txt
    grep "^[Ss]' " googlebooks-fre-all-2gram-20120701-s_ > s_2grams.txt
    grep -E '^JE |^Je |^je ' googlebooks-fre-all-2gram-20120701-je > je_2grams.txt
    grep "^[Jj]'" googlebooks-fre-all-1gram-20120701-j > j_1grams.txt
    grep "^[Jj]' " googlebooks-fre-all-2gram-20120701-j_ > j_2grams.txt
    grep "^[Cc]'" googlebooks-fre-all-1gram-20120701-c > c_1grams.txt
    grep "^[Cc]' " googlebooks-fre-all-2gram-20120701-c_ > c_2grams.txt
    grep -E '^ME |^Me |^me ' googlebooks-fre-all-2gram-20120701-me > me_2grams.txt
    grep "^[Mm]'" googlebooks-fre-all-1gram-20120701-m > m_1grams.txt
    grep "^[Mm]' " googlebooks-fre-all-2gram-20120701-m_ > m_2grams.txt
    grep -E '^TE |^Te |^te ' googlebooks-fre-all-2gram-20120701-te > te_2grams.txt
    grep "^[Tt]'" googlebooks-fre-all-1gram-20120701-t > t_1grams.txt
    grep "^[Tt]' " googlebooks-fre-all-2gram-20120701-t_ > t_2grams.txt
    grep -E '^DU |^Du |^du ' googlebooks-fre-all-2gram-20120701-du > du_2grams.txt
    grep -E "^DE L' |^De l' |^de l' " googlebooks-fre-all-3gram-20120701-de > del_3grams.txt
    grep -E "^DE L'|^De l'|^de l'" googlebooks-fre-all-2gram-20120701-de > del_2grams.txt
    grep -E '^AU |^Au |^au ' googlebooks-fre-all-2gram-20120701-au > au_2grams.txt
    grep -E "^À L' |^A L' |^À l' |^A l' |^à l' " googlebooks-fre-all-3gram-20120701-a_ > al_3grams.txt
    grep -E "^À L'|^A L'|^À l'|^A l'|^à l' " googlebooks-fre-all-2gram-20120701-a_ > al_2grams.txt
    grep -E '^EN |^En |^en ' googlebooks-fre-all-2gram-20120701-en > en_2grams.txt

  3. compile_ngrams3.py takes all of these files, in directory NGramExtracts, to collated_ALL.txt

  4. classify_phonology3.py takes collated_ALL.txt to collated_ALL_phon_and_morph.txt

  5. Partly manually, items are extracted from collated_ALL if and only if they meet all three criteria below, and the results are put in collated_SELECTED.txt:

  • part of speech (POS), gender, and pronunciation information were available through dictionary look-up

  • Word1 is suitable for Word2’s part of speech (POS) and gender. For example, “le_l” is not allowed with feminines.

  • Word2 is in the region of interest: it begins with the letter ‘h’, or with a glide sound, or occurs in lists of other aspirated words (‘uhlan’, ‘ululement’, etc.)

  6. collate_years2.py takes collated_SELECTED.txt to table_1900_2010_Min_10_wordMin_20_SELECT3.txt. This means that results are used from the years 1900 to 2010 only, that the word1+word2 combination must have a frequency of at least 10 during that period, and that word2, summed over all combinations, must have a frequency of at least 20 during that period.

  7. The following word2s were excluded because they had too many false hits (poor OCR), too many hits in the wrong language, or too many hits with a different meaning:

  • horizontale
  • hure
  • hie
  • Hauteville
  • oie
  • hombre
  • hast
  • hère
  • horion
  • hidalgo
  • ion
  • Io
  • le/l’, la/l’ for most person names (pragmatically unlikely, and danger of being actually L. Lastname)

…and the remainder was converted into a version with one row per word1-word2 combination (instead of one row per word2, with a separate column for each word1), with apostrophes removed: table_1900_2010_Min_10_wordMin_20_SELECT3_inRows_noApostrophe.csv.

  8. The following columns were added: initial vowel letter, sound, and coarsely classified sound of word2; Google n-grams token frequency for the voculent and consulent versions of the word1+word2 combination; frequencies looked up in Lexique (lemma film/book, form film/book); and the random intercept for Word2 obtained from the SAS beta regression (see below). The result is the input file for this script, table_1900_2010_Min_10_wordMin_20_SELECT3_inRows_Oct_2013.csv.
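The two frequency thresholds applied by collate_years2.py can be pictured with a small R sketch. This is only an illustration on made-up counts, not the logic of the Python script itself; the data frame `toy`, its columns, and the names `min_combination` and `min_word2` are hypothetical.

```r
# Hypothetical toy counts for 1900-2010; not taken from the real data.
toy <- data.frame(
  word1 = c("le_l", "de_d", "le_l", "de_d", "le_l"),
  word2 = c("hibou", "hibou", "hapax", "hapax", "yole"),
  count = c(15, 12, 9, 14, 11),
  stringsAsFactors = FALSE
)

min_combination <- 10  # minimum frequency of a word1+word2 combination
min_word2 <- 20        # minimum frequency of word2 summed over all word1s

# Total frequency of each word2 across every word1, aligned to the rows of toy
word2_total <- ave(toy$count, toy$word2, FUN = sum)

# A row survives only if both thresholds are met
keep <- toy$count >= min_combination & word2_total >= min_word2
selected <- toy[keep, ]
print(selected)
```

In this toy example, le_l+hapax is dropped because the combination itself is too rare, and le_l+yole is dropped because yole is too rare overall.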

Prepare for analysis and plotting

set.seed(1234) #just in case we do anything with random numbers
require(lme4) || install.packages("lme4") 
## Loading required package: lme4
## Loading required package: Matrix
## Loading required package: Rcpp
## [1] TRUE
require(lme4)
require(car) || install.packages("car") 
## Loading required package: car
## [1] TRUE
require(car)
require(multcomp) || install.packages("multcomp") 
## Loading required package: multcomp
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: splines
## Loading required package: TH.data
## [1] TRUE
require(multcomp)
require(stringr) || install.packages("stringr") 
## Loading required package: stringr
## [1] TRUE
require(stringr)
require(plotrix) || install.packages("plotrix") 
## Loading required package: plotrix
## Warning: package 'plotrix' was built under R version 3.1.3
## [1] TRUE
require(plotrix)

#set font family for plots
myFontFamily="serif" 
par(family=myFontFamily) #will work for most plots

#how much to increase resolution
myResMultiplier <- 5 #default is 72 ppi; using this in every call to png() will make it 360

Read in data

Read in and inspect data, processed from Google NGrams:

french <- read.table("table_1900_2010_Min_10_wordMin_20_SELECT3_inRows_Oct_2013.csv", 
  header=TRUE, sep=",")
head(french)
##      word2 w2_begins_with w2_phon wd2_morph  word1 word1.different.format
## 1   habeas              V    h__V         M   le_l                   le/l
## 2   habeas              V    h__V         M du_del                 du/del
## 3   habeas              V    h__V         M  au_àl                  au/àl
## 4   habeas              V    h__V         M ce_cet                 ce/cet
## 5   habeas              V    h__V         M   de_d                   de/d
## 6 habileté              V    h__V         F   la_l                   la/l
##   concatenation voculence vowel_letter vowel_sound vowel_sound_coarse
## 1   le/l+habeas    0.9770            a           a                  a
## 2 du/del+habeas    0.9539            a           a                  a
## 3  au/àl+habeas    1.0000            a           a                  a
## 4 ce/cet+habeas    1.0000            a           a                  a
## 5   de/d+habeas    0.9807            a           a                  a
## 6 la/l+habileté    1.0000            a           a                  a
##   phrase_V_count phrase_C_count phrase_frequency lemma_freq_film
## 1           1741             41             1782              NA
## 2           1201             58             1259              NA
## 3            179              0              179              NA
## 4             42              0               42              NA
## 5           7630            150             7780              NA
## 6         223505              0           223505            2.05
##   lemma_freq_books form_freq_film form_freq_books random_intercept_SAS
## 1               NA             NA              NA               0.4348
## 2               NA             NA              NA               0.4348
## 3               NA             NA              NA               0.4348
## 4               NA             NA              NA               0.4348
## 5               NA             NA              NA               0.4348
## 6            10.88           2.03           10.54               0.3725
str(french)
## 'data.frame':    1741 obs. of  19 variables:
##  $ word2                 : Factor w/ 358 levels "habeas","habileté",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ w2_begins_with        : Factor w/ 3 levels "G","V","V~G": 2 2 2 2 2 2 2 2 2 2 ...
##  $ w2_phon               : Factor w/ 24 levels "h__aspj","h__aspV",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ wd2_morph             : Factor w/ 18 levels "F","M","M_Int",..: 2 2 2 2 2 1 1 1 1 1 ...
##  $ word1                 : Factor w/ 20 levels "au_àl","beau_bel",..: 9 5 1 3 4 8 10 15 16 18 ...
##  $ word1.different.format: Factor w/ 20 levels "au/àl","beau/bel",..: 9 5 1 3 4 8 10 15 16 18 ...
##  $ concatenation         : Factor w/ 1741 levels "au/àl+habeas",..: 973 661 1 197 331 866 1154 1338 1510 1627 ...
##  $ voculence             : num  0.977 0.954 1 1 0.981 ...
##  $ vowel_letter          : Factor w/ 7 levels "a","C","e","i",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ vowel_sound           : Factor w/ 16 levels "a","A","a_nas",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ vowel_sound_coarse    : Factor w/ 6 levels "a","e","i","o",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ phrase_V_count        : int  1741 1201 179 42 7630 223505 1312 363 64712 384 ...
##  $ phrase_C_count        : int  41 58 0 0 150 0 0 0 0 0 ...
##  $ phrase_frequency      : int  1782 1259 179 42 7780 223505 1312 363 64712 384 ...
##  $ lemma_freq_film       : num  NA NA NA NA NA 2.05 2.05 2.05 2.05 2.05 ...
##  $ lemma_freq_books      : num  NA NA NA NA NA ...
##  $ form_freq_film        : num  NA NA NA NA NA 2.03 2.03 2.03 2.03 2.03 ...
##  $ form_freq_books       : num  NA NA NA NA NA ...
##  $ random_intercept_SAS  : num  0.435 0.435 0.435 0.435 0.435 ...
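As a quick sanity check on the columns, the printed rows are consistent with voculence being the share of tokens that show vowel-initial behavior, that is, phrase_V_count divided by phrase_V_count + phrase_C_count. That definition is our reading of the output above rather than something stated in the input file, so the sketch below simply tests it on the six rows shown by head(french).

```r
# First six rows of the counts shown by head(french) above
phrase_V_count <- c(1741, 1201, 179, 42, 7630, 223505)
phrase_C_count <- c(41, 58, 0, 0, 150, 0)
reported_voculence <- c(0.9770, 0.9539, 1.0000, 1.0000, 0.9807, 1.0000)

# Assumed definition: share of tokens showing vowel-initial behavior
computed_voculence <- phrase_V_count / (phrase_V_count + phrase_C_count)

# Compare with the printed values, allowing for rounding to four decimals
print(round(computed_voculence, 4))
all_match <- all(abs(computed_voculence - reported_voculence) < 1e-4)
print(all_match)
```

On the full data set the same comparison could be run as `all.equal(french$voculence, french$phrase_V_count / french$phrase_frequency)`, with a tolerance suited to however the column was rounded.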

Descriptive statistics

What is the range of overall (raw, unweighted) voculence across Word2s?

sort(tapply(french$voculence, french$word2, FUN=mean))
##        hachette         haddock           hadji         hallier 
##       0.000e+00       0.000e+00       0.000e+00       0.000e+00 
##          hamada          hammam           harle          harmel 
##       0.000e+00       0.000e+00       0.000e+00       0.000e+00 
##           harpe            hart          hennin        hennuyer 
##       0.000e+00       0.000e+00       0.000e+00       0.000e+00 
##       hérissant           hertz           hesse             hlm 
##       0.000e+00       0.000e+00       0.000e+00       0.000e+00 
##           hotte       hottentot          huchet           huron 
##       0.000e+00       0.000e+00       0.000e+00       0.000e+00 
##        huronien       hurricane            yang            yoga 
##       0.000e+00       0.000e+00       0.000e+00       0.000e+00 
##           haine           yacht         houille          héraut 
##       1.708e-05       4.624e-05       7.214e-05       1.526e-04 
##           hâter           haute           honte        honteuse 
##       2.859e-04       3.178e-04       3.373e-04       3.512e-04 
##        hongrois         hauteur           Hesse           hêtre 
##       5.704e-04       7.137e-04       7.417e-04       7.467e-04 
##         houblon           hache         hérisse            hile 
##       8.767e-04       1.061e-03       1.133e-03       1.284e-03 
##          heaume            hâte           halte         haubert 
##       1.291e-03       1.350e-03       1.434e-03       1.605e-03 
##           héron           havre          Yunnan           henné 
##       1.717e-03       1.837e-03       1.877e-03       1.925e-03 
##            huit            yard            haïr      halètement 
##       2.016e-03       2.399e-03       2.489e-03       2.967e-03 
##            yole         Hongrie     harcèlement        Yokohama 
##       2.986e-03       3.037e-03       3.105e-03       3.287e-03 
##            hall      hiérarchie         haoussa         hamster 
##       3.596e-03       3.601e-03       4.528e-03       4.669e-03 
##            hait     hollandaise    hiérarchiser         heurter 
##       4.802e-03       5.108e-03       5.222e-03       5.244e-03 
##           halle       hardiesse hiérarchisation            halo 
##       6.648e-03       7.218e-03       8.010e-03       8.036e-03 
##        harceler          hurler            hase         Hainaut 
##       8.175e-03       8.488e-03       8.651e-03       8.710e-03 
##         hasarde       hiérarque         honteux         haricot 
##       9.027e-03       9.396e-03       1.010e-02       1.015e-02 
##        hanneton            haro        hasarder          hameau 
##       1.121e-02       1.189e-02       1.292e-02       1.297e-02 
##          hideux            haut         harnois      hussitisme 
##       1.406e-02       1.425e-02       1.456e-02       1.530e-02 
##          hasard           héros         yatagan         hideuse 
##       1.715e-02       1.842e-02       1.851e-02       1.967e-02 
##        hérisson       hermandad      hallebarde          hernie 
##       2.026e-02       2.217e-02       2.285e-02       2.302e-02 
##       huguenote         hussard          hangar           hibou 
##       2.500e-02       2.514e-02       2.754e-02       2.805e-02 
##          hazard           heurt            yack            haie 
##       2.844e-02       2.860e-02       2.954e-02       2.956e-02 
##           haste        hérisser       hérissent            onze 
##       3.004e-02       3.094e-02       3.302e-02       3.511e-02 
##        Hambourg        hobereau          honnir       haranguer 
##       4.087e-02       4.229e-02       4.259e-02       4.323e-02 
##            hune          ouolof        Haguenau      hautboïste 
##       4.346e-02       4.365e-02       4.762e-02       4.803e-02 
##           humer      houspiller        Houssaye     harassement 
##       4.913e-02       5.313e-02       5.352e-02       5.392e-02 
##        harasser      hornblende        huguenot      hollandais 
##       5.534e-02       5.649e-02       5.968e-02       6.877e-02 
##          hausse           Hulot       hasardeux          hareng 
##       7.263e-02       7.560e-02       8.246e-02       8.435e-02 
##             yen        ouistiti            haïs    huguenotisme 
##       8.442e-02       1.024e-01       1.035e-01       1.048e-01 
##           Yémen       haridelle            Huon     hiérophanie 
##       1.056e-01       1.155e-01       1.196e-01       1.216e-01 
##          Huguet        hollande        handicap           hardi 
##       1.225e-01       1.229e-01       1.311e-01       1.429e-01 
##       harnacher      hasardeuse        Hokkaïdo           Hanoï 
##       1.527e-01       1.688e-01       1.735e-01       2.144e-01 
##    harnachement            huis      hiératisme           henry 
##       2.206e-01       2.652e-01       2.732e-01       2.800e-01 
##       handicape            home          Hugues     hyaloplasme 
##       2.829e-01       3.021e-01       3.139e-01       3.182e-01 
##          hiatus        Hautmont             ouï           Ouadi 
##       3.939e-01       4.012e-01       4.101e-01       4.231e-01 
##      hiérogamie           Henry          Hubert           hyène 
##       4.505e-01       4.587e-01       4.614e-01       4.756e-01 
##          huiler    hallstattien         Herbert           hindi 
##       4.970e-01       5.000e-01       5.113e-01       5.282e-01 
##           oille     hiéroglyphe            iota           Henri 
##       5.368e-01       5.759e-01       5.830e-01       6.012e-01 
##           Hervé         hyalite         huilage            oint 
##       6.128e-01       6.302e-01       6.388e-01       6.571e-01 
##            iule           ouadi          Harmel        Huguette 
##       6.629e-01       6.766e-01       7.094e-01       7.185e-01 
##             oye          habité             oil            hoir 
##       7.290e-01       7.679e-01       7.755e-01       7.789e-01 
##        hospodar          Hudson            oing         heureux 
##       7.806e-01       7.808e-01       7.815e-01       7.821e-01 
##        hacienda        hippisme   hiérogrammate          Hathor 
##       7.864e-01       7.872e-01       7.879e-01       7.898e-01 
##     hiérophante            Héra          humain     hauterivien 
##       7.979e-01       8.174e-01       8.207e-01       8.250e-01 
##         hickory            Iéna          Hécate           Hadès 
##       8.332e-01       8.429e-01       8.435e-01       8.534e-01 
##           iambe       Hauterive        hégélien         Hérault 
##       8.635e-01       8.686e-01       8.705e-01       8.778e-01 
##          hetman         honneur           huile       haïtienne 
##       8.840e-01       8.855e-01       8.856e-01       8.859e-01 
##           herbe    hégélianisme         huileux           habit 
##       8.888e-01       8.906e-01       8.913e-01       8.943e-01 
##          habita           homme      Hispaniola         holmium 
##       9.035e-01       9.036e-01       9.040e-01       9.071e-01 
##           heure       hydrogène          oindre         haïtien 
##       9.139e-01       9.195e-01       9.204e-01       9.224e-01 
##         Hermite       hyacinthe        humidité         hameçon 
##       9.292e-01       9.354e-01       9.372e-01       9.373e-01 
##        Iénisséi     Hauterivien           hyper       Henriette 
##       9.393e-01       9.394e-01       9.454e-01       9.458e-01 
##           hapax         hallali            heur           Horus 
##       9.464e-01       9.487e-01       9.517e-01       9.522e-01 
##        humiliât            Iowa        Himalaya      hinterland 
##       9.570e-01       9.571e-01       9.611e-01       9.627e-01 
##       honnêteté          Horace        habituel        hérédité 
##       9.642e-01       9.670e-01       9.687e-01       9.698e-01 
##         habitus           Utica           hôtel          oiseau 
##       9.707e-01       9.725e-01       9.725e-01       9.734e-01 
##          Hector         hermite         hôpital       habiliter 
##       9.743e-01       9.744e-01       9.752e-01       9.767e-01 
##       Héraclite           Hygie     homographie            ouïr 
##       9.783e-01       9.784e-01       9.788e-01       9.801e-01 
##         habiter         humaine     Herzégovine          habeas 
##       9.805e-01       9.806e-01       9.818e-01       9.823e-01 
##        humanité          Hélène           hiver         haltère 
##       9.825e-01       9.829e-01       9.835e-01       9.837e-01 
##           yeuse          Hérode         Hadrien      hitlérisme 
##       9.849e-01       9.851e-01       9.860e-01       9.861e-01 
##      habitation        Hypérion           Yonne        habitude 
##       9.870e-01       9.873e-01       9.876e-01       9.877e-01 
##         hosanna         hommage         hectare       humanisme 
##       9.878e-01       9.881e-01       9.891e-01       9.892e-01 
##        histoire        Hérodote      ionosphère      habituelle 
##       9.896e-01       9.900e-01       9.900e-01       9.901e-01 
##        habilité         humilia        Honorine         horreur 
##       9.901e-01       9.902e-01       9.903e-01       9.908e-01 
##          hoirie        Héraclès            oued        habitant 
##       9.910e-01       9.922e-01       9.924e-01       9.926e-01 
##          iodure      Hippocrate       harmonica        harmonie 
##       9.927e-01       9.930e-01       9.931e-01       9.941e-01 
##           Ionie      hectolitre         hérésie       harmattan 
##       9.943e-01       9.943e-01       9.944e-01       9.951e-01 
##        heureuse        héritage       hypothèse      ionisation 
##       9.953e-01       9.960e-01       9.962e-01       9.967e-01 
##           hydro         haleine      hypothèque         honorer 
##       9.969e-01       9.974e-01       9.974e-01       9.977e-01 
##            Oise          iodate            iode        héritier 
##       9.979e-01       9.979e-01       9.980e-01       9.984e-01 
##        huilerie         Héloïse       iodoforme       historien 
##       9.985e-01       9.985e-01       9.986e-01       9.987e-01 
##          habite           humus      hydrologie        hématite 
##       9.992e-01       9.992e-01       9.993e-01       9.993e-01 
##         humilie          herpès    hagiographie        humilier 
##       9.994e-01       9.994e-01       9.994e-01       9.996e-01 
##          oignon        habitent          hélice       hydrolyse 
##       9.996e-01       9.997e-01       9.997e-01       9.998e-01 
##         hésiter    hypertrophie       hégémonie     hémoglobine 
##       9.998e-01       9.998e-01       9.998e-01       9.998e-01 
##         hospice          hélium         horloge     habillement 
##       9.998e-01       9.998e-01       9.998e-01       9.999e-01 
##         hydrate         habitat        humilité     hospitalité 
##       9.999e-01       9.999e-01       9.999e-01       9.999e-01 
##        héroïsme          humeur     hébergement         horizon 
##       9.999e-01       9.999e-01       9.999e-01       9.999e-01 
##          humour       hostilité         héroïne        habileté 
##       9.999e-01       1.000e+00       1.000e+00       1.000e+00 
##         hygiène          hésite        harpagon           hebdo 
##       1.000e+00       1.000e+00       1.000e+00       1.000e+00 
##           hélio          hélion           hélix        herbette 
##       1.000e+00       1.000e+00       1.000e+00       1.000e+00 
##         hercule          hermès         hermine        hibiscus 
##       1.000e+00       1.000e+00       1.000e+00       1.000e+00 
##          hindou       hipparque           hippo      hirondelle 
##       1.000e+00       1.000e+00       1.000e+00       1.000e+00 
##      histologie      holocauste         homélie         honorée 
##       1.000e+00       1.000e+00       1.000e+00       1.000e+00 
##       hortensia    horticulture         huilier     humiliation 
##       1.000e+00       1.000e+00       1.000e+00       1.000e+00 
##    hydrographie           oison 
##       1.000e+00       1.000e+00
dim(tapply(french$voculence, french$word2, FUN=mean)) #how many distinct Word2s
## [1] 358
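To make “raw, unweighted” concrete, here is a small sketch that contrasts the unweighted mean used above with a token-weighted alternative. It uses the first five rows of the data (all for Word2 “habeas”) as printed by head(french); the weighted version is shown only for comparison and is not used elsewhere in this document.

```r
# Rows for Word2 "habeas", copied from head(french) above
habeas <- data.frame(
  voculence = c(0.9770, 0.9539, 1.0000, 1.0000, 0.9807),
  phrase_V_count = c(1741, 1201, 179, 42, 7630),
  phrase_frequency = c(1782, 1259, 179, 42, 7780)
)

# Unweighted: each Word1 combination counts equally (as in the tapply call)
unweighted <- mean(habeas$voculence)

# Token-weighted alternative: pool the counts over all Word1s first
weighted <- sum(habeas$phrase_V_count) / sum(habeas$phrase_frequency)

print(c(unweighted = unweighted, weighted = weighted))
```

The unweighted value agrees with the 9.823e-01 listed for habeas above; the weighted value is slightly lower because the most frequent combination, de/d+habeas, has below-average voculence.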

Histograms of voculence for word1+word2 combination, and for each word2 averaging over all word1s it occurs with:

hist(french$voculence, col="grey",xlab="rate of unasp. behavior", main="word1+word2 combinations", breaks=20)

plot of chunk histogramVoculences

hist(tapply(french$voculence, french$word2, FUN=mean), col="grey", xlab="rate of non-alignment", ylab="number of Word2s", main="Word2s, averaging over Word1s they occur with", breaks=20)

plot of chunk histogramVoculences

#Since this second one will appear in the paper, also do a PNG file:
png(file="histo_Word2_voculence.png",width=myResMultiplier*460,height=myResMultiplier*300, res=myResMultiplier*72, family=myFontFamily)
par(mar=c(4,4,3,1)+0.1)
hist(tapply(french$voculence, french$word2, FUN=mean), col="grey", xlab="rate of non-alignment", ylab="number of Word2s", main="", breaks=20)
dev.off()
## pdf 
##   2
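The reason for multiplying width, height, and res by the same myResMultiplier is that the physical size of the figure stays the same while the pixel density goes up. A minimal sketch of the arithmetic, using the numbers from the png() call above (the variable names other than myResMultiplier are ours):

```r
# Same multiplier and nominal dimensions as in the png() call above
myResMultiplier <- 5
base_width_px <- 460
base_height_px <- 300
base_res_ppi <- 72

# Physical size in inches at the default resolution
base_size_in <- c(width = base_width_px, height = base_height_px) / base_res_ppi

# Physical size after scaling pixels and resolution by the same factor
scaled_size_in <- c(width = myResMultiplier * base_width_px,
                    height = myResMultiplier * base_height_px) /
  (myResMultiplier * base_res_ppi)

print(base_size_in)
print(scaled_size_in)
```

Both vectors are identical, so text and margins keep their proportions and only the resolution changes.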

Regression model

A regression with “voculence” (the rate of behaving as though Word2 is vowel-initial; or, as we call it in the paper, non-alignment) as the dependent variable. The model uses the binomial family because voculence has a U-shaped distribution. Because voculence is a proportion rather than a count of successes, this produces a warning message about non-integer successes.

Some independent variables were eliminated because they made little contribution (vowel quality, V vs. glide). Result is that any idiosyncrasy of Word2 is taken care of in its random intercept value, not attributed to more general properties of Word2, such as beginning with a vowel vs. a glide.

my_formula <- "voculence ~ word1 + (1|word2) + log(phrase_frequency)"
my_data <- subset(french, french$word1 != "fou_fol" &
                    french$word1 != "mou_mol" &
                    french$word1 != "je_j" &
                    french$word1 != "te_t" &
                    french$word1 != "me_m" &
                    french$word1 != "ne_n" &
                    french$word1 != "se_s" &
                    french$word1 != "ce_cet" &
                    french$word1 != "que_qu" &
                    french$word1 != "ta_ton" &
                    french$word1 != "nouveau_nouvel" &
                    french$word1 != "sa_son")

french1.glmer.type <- glmer(my_formula, data=my_data, family="binomial")
## Warning: non-integer #successes in a binomial glm!
## Warning: Model failed to converge with max|grad| = 0.34544 (tol = 0.001, component 3)
#try to improve convergence

french2.glmer.type <- update(french1.glmer.type,start=getME(french1.glmer.type,c("theta","fixef")))
## Warning: non-integer #successes in a binomial glm!
## Warning: Model failed to converge with max|grad| = 0.0207578 (tol = 0.001, component 8)
french3.glmer.type <- update(french2.glmer.type,start=getME(french2.glmer.type,c("theta","fixef")))
## Warning: non-integer #successes in a binomial glm!
## Warning: Model failed to converge with max|grad| = 0.0170359 (tol = 0.001, component 7)
french4.glmer.type <- update(french3.glmer.type,start=getME(french3.glmer.type,c("theta","fixef")))
## Warning: non-integer #successes in a binomial glm!
## Warning: Model failed to converge with max|grad| = 0.00695753 (tol = 0.001, component 2)
#that's about as good as it's going to get

summary(french4.glmer.type)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: voculence ~ word1 + (1 | word2) + log(phrase_frequency)
##    Data: my_data
## 
##      AIC      BIC   logLik deviance df.resid 
##    925.2    975.4   -452.6    905.2     1108 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -4.362 -0.223  0.059  0.172  1.320 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  word2  (Intercept) 17.6     4.2     
## Number of obs: 1118, groups:  word2, 345
## 
## Fixed effects:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -2.3202     0.7297   -3.18   0.0015 ** 
## word1beau_bel           2.4568     0.9309    2.64   0.0083 ** 
## word1de_d               0.3090     0.4898    0.63   0.5282    
## word1du_del            -0.4875     0.4640   -1.05   0.2934    
## word1la_l              -0.1634     0.7021   -0.23   0.8159    
## word1le_l              -0.3680     0.5064   -0.73   0.4674    
## word1ma_mon             2.3977     0.8118    2.95   0.0031 ** 
## word1vieux_vieil        1.7650     0.7627    2.31   0.0207 *  
## log(phrase_frequency)   0.4122     0.0925    4.46  8.3e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) wrd1b_ word1d_d wrd1d_dl word1la_l word1le_l wrd1m_
## word1bea_bl -0.422                                                    
## word1de_d   -0.127  0.140                                             
## word1du_del -0.172  0.182  0.592                                      
## word1la_l   -0.015  0.045  0.687    0.448                             
## word1le_l   -0.052  0.119  0.619    0.573    0.486                    
## word1ma_mon -0.454  0.237  0.408    0.260    0.397     0.226          
## word1vix_vl -0.437  0.333  0.214    0.252    0.102     0.187     0.268
## lg(phrs_fr) -0.776  0.338 -0.342   -0.223   -0.395    -0.348     0.230
##             wrd1v_
## word1bea_bl       
## word1de_d         
## word1du_del       
## word1la_l         
## word1le_l         
## word1ma_mon       
## word1vix_vl       
## lg(phrs_fr)  0.306
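The “non-integer #successes” warning above arises because the response is a proportion and no number of trials is supplied. For readers who want to experiment, the sketch below shows two standard ways of giving a binomial model the underlying counts, using base R’s glm() on made-up data. This is not the model fitted in this document: it has no random effects, and weighting by token counts changes what is estimated (frequent combinations dominate), so it should be read as an illustration of the warning only. All names and values in `toy` are hypothetical.

```r
# Toy counts, loosely modeled on phrase_V_count / phrase_C_count
toy <- data.frame(
  v_count = c(40, 3, 120, 7, 55, 1),
  c_count = c(2, 30, 0, 60, 5, 44),
  group = factor(c("a", "b", "a", "b", "a", "b"))
)
toy$n <- toy$v_count + toy$c_count
toy$prop <- toy$v_count / toy$n

# Option 1: two-column response of successes and failures
fit_counts <- glm(cbind(v_count, c_count) ~ group, data = toy,
                  family = binomial)

# Option 2: proportion response with the number of trials as weights
fit_weights <- glm(prop ~ group, data = toy, family = binomial,
                   weights = n)

print(coef(fit_counts))
print(coef(fit_weights))
```

The two options are equivalent parameterizations of the same likelihood, so their coefficients agree.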

Compare to the result of a beta regression in SAS (why not in R? R’s betareg package does not appear to allow random effects):

Effect                Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept             0.5337     0.03140          344   16.99     <.0001
word1 = au_àl         -0.08175   0.01965          765   -4.16     <.0001
word1 = beau_bel      0.007171   0.02419          765   0.30      0.7670
word1 = de_d          -0.06596   0.02220          765   -2.97     0.0031
word1 = du_del        -0.1114    0.02106          765   -5.29     <.0001
word1 = la_l          -0.08278   0.02794          765   -2.96     0.0031
word1 = le_l          -0.1016    0.02280          765   -4.46     <.0001
word1 = ma_mon        0.000791   0.02585          765   0.03      0.9756
word1 = vieux_vieil   0          .                .     .         .          (reference level)
logphrase             0.01291    0.002958         765   4.36      <.0001
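Note that the SAS output uses vieux_vieil as the reference level, whereas the R model above uses au_àl (the first factor level), so the Word1 coefficients are not directly comparable row by row. One way to line them up would be to change the reference level in R before refitting. The sketch below shows only the re-leveling step on a toy factor; we have not refitted the model this way.

```r
# Toy factor with the eight Word1 levels retained in the model
word1 <- factor(c("au_àl", "beau_bel", "de_d", "du_del",
                  "la_l", "le_l", "ma_mon", "vieux_vieil"))

# R's default treatment contrasts use the first level as the reference
default_reference <- levels(word1)[1]

# Re-order so that vieux_vieil is the reference, as in the SAS table
word1_sas <- relevel(word1, ref = "vieux_vieil")
sas_reference <- levels(word1_sas)[1]

print(c(default = default_reference, releveled = sas_reference))
```

With the real data, the corresponding step would be `my_data$word1 <- relevel(droplevels(my_data$word1), ref = "vieux_vieil")` before calling glmer().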

Check some model properties:

#Test normality of residuals
plot(density(resid(french4.glmer.type))) #A density plot of residuals--bimodal, not surprisingly

plot of chunk modelValidation

qqnorm(resid(french4.glmer.type)) # A quantile normal plot
qqline(resid(french4.glmer.type)) # Points fall pretty close to line--looks largely OK to me

plot of chunk modelValidation

#Test homogeneity
plot(french4.glmer.type) #ideally should be uniform distribution in both dimensions.

plot of chunk modelValidation

  #Except for two extreme outliers, looks...OK?

#Check for independence of residuals and factors
plot(french4.glmer.type@frame$word1,resid(french4.glmer.type)) #residuals should look similar for each Word1

plot of chunk modelValidation

plot(french4.glmer.type@frame[,4],resid(french4.glmer.type), xlab="log of phrase frequency") #residuals should look similar across frequencies

plot of chunk modelValidation

Extract random intercepts

For comparison with the random intercepts from the beta regression, add random intercepts from model above as a column of dataset.

for (i in 1:length(french$word2)) {
  french$random_intercept_R[i] <- ranef(french4.glmer.type)$word2[as.character(french$word2[i]),]
}
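The same lookup can also be written without a loop. The sketch below uses a toy stand-in for ranef(french4.glmer.type)$word2 (a one-column data frame whose row names are Word2s) and match(), which requires an exact name match. Word2s that were not in the model, for example because all of their rows were excluded from my_data, come out as NA. The object names and the level “unmodeled” are hypothetical; the three intercept values are copied from the listing of random intercepts in this document.

```r
# Toy stand-in for ranef(model)$word2: one column, row names are Word2s
toy_ranef <- data.frame(
  `(Intercept)` = c(2.47582, -4.86124, -4.16121),
  row.names = c("habeas", "hache", "haie"),
  check.names = FALSE
)

# Word2 column of a toy data set; "unmodeled" is absent from the model
toy_word2 <- factor(c("habeas", "habeas", "hache", "unmodeled", "haie"))

# Find each Word2's row in the random-effects table (NA if absent)
row_index <- match(as.character(toy_word2), rownames(toy_ranef))

# Pull out all intercepts in one step
toy_intercept <- toy_ranef[["(Intercept)"]][row_index]
print(toy_intercept)
```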

Compare random intercepts from the two models. Although the relationship is not linear (as expected), it is very nearly monotonic.

plot(french$random_intercept_R, french$random_intercept_SAS)

plot of chunk compareRandomInterceptsLogisticVsBeta

List all random intercepts from the logistic model:

ranef(french4.glmer.type)
## $word2
##                 (Intercept)
## habeas              2.47582
## habileté            1.05627
## habilité            1.59589
## habiliter           1.13797
## habillement         2.24807
## habit               1.73489
## habitant            2.07797
## habitat             1.88350
## habitation          0.90139
## habité              2.68956
## habiter             0.52921
## habitude            0.73261
## habituel            2.42502
## habituelle          1.64925
## habitus             2.20784
## hache              -4.86124
## hachette           -3.86189
## hacienda           -0.44610
## haddock            -2.72921
## Hadès               1.04652
## hadji              -3.05231
## Hadrien             0.63323
## hagiographie        1.47891
## Haguenau           -3.04429
## haie               -4.16121
## Hainaut            -4.38506
## haine              -5.54273
## haïr               -3.36261
## haïtien             1.88941
## haïtienne           1.46805
## haleine             1.27459
## halètement         -3.34442
## hall               -4.50154
## hallali             1.90177
## halle              -3.66867
## hallebarde         -3.57084
## hallier            -3.02356
## hallstattien        0.96437
## halo               -4.19705
## halte              -4.20817
## haltère             2.66244
## hamada             -2.94097
## Hambourg           -3.81114
## hameau             -4.71389
## hameçon             1.47962
## hammam             -3.66294
## hamster            -3.53569
## handicap           -4.08589
## hangar             -4.36808
## hanneton           -3.18932
## Hanoï              -2.35755
## haoussa            -3.05605
## hapax               2.10377
## haranguer          -2.19219
## harassement        -2.02283
## harasser           -1.39435
## harcèlement        -4.23142
## harceler           -3.09722
## hardi              -3.57921
## hardiesse          -4.76329
## hareng             -3.08212
## haricot            -3.83309
## haridelle          -1.84517
## harle              -1.95209
## harmattan           2.58251
## harmel             -2.47434
## Harmel              0.14189
## harmonica           2.58131
## harmonie            1.05649
## harnachement       -3.24035
## harnacher          -1.10939
## harnois            -3.19059
## haro               -2.91667
## harpagon            3.01157
## harpe              -4.59506
## hart               -2.88297
## hasard             -5.41993
## hasarder           -2.88028
## hasardeuse         -2.56223
## hasardeux          -2.73530
## hase               -3.56290
## haste              -2.55517
## hâte               -4.93658
## hâter              -3.78325
## Hathor              0.49741
## haubert            -3.30092
## hausse             -3.82606
## haut               -5.34782
## hautboïste         -1.76383
## haute              -5.83745
## Hauterive           0.17030
## hauterivien         2.13328
## Hauterivien         1.81309
## hauteur            -5.56961
## Hautmont           -0.73204
## havre              -4.39937
## hazard             -3.09751
## heaume             -3.34039
## hebdo               3.17355
## hébergement         2.00879
## Hécate              0.95723
## hectare             1.63661
## hectolitre          2.14654
## Hector              3.13568
## hégélianisme        0.53038
## hégélien            1.85854
## hégémonie           0.95709
## Hélène              0.62641
## hélice              1.39762
## hélio               2.48481
## hélion              3.05228
## hélium              2.02232
## hélix               3.14379
## Héloïse             1.38388
## hématite            1.43941
## hémoglobine         1.13211
## henné              -3.87635
## hennin             -3.03772
## hennuyer           -1.41976
## Henri              -1.12030
## Henriette           0.41294
## henry              -0.77530
## Henry              -1.16198
## Héra                0.12930
## Héraclès            2.84212
## Héraclite           2.88475
## Hérault             1.12950
## héraut             -3.80378
## herbe               1.10276
## Herbert            -1.74527
## herbette            2.62594
## hercule             3.27467
## hérédité            1.11775
## hérésie             1.28224
## hérissant          -1.33410
## hérisser           -1.85024
## hérisson           -3.29645
## héritage            1.63160
## héritier            2.12922
## hermandad          -2.35393
## hermès              3.38805
## hermine             1.79014
## hermite             2.89549
## Hermite             2.67165
## hernie             -4.30203
## Hérode              2.88032
## Hérodote            3.37763
## héroïne             1.16304
## héroïsme            2.12074
## héron              -3.84958
## héros              -5.31925
## herpès              2.50005
## hertz              -2.80848
## Hervé              -1.59697
## Herzégovine         1.20069
## hésiter             0.75554
## hesse              -1.70834
## Hesse              -4.14403
## hetman              1.09885
## hêtre              -4.80175
## heur                2.07112
## heure               0.65967
## heureuse            1.23628
## heureux             1.88588
## heurt              -3.41871
## heurter            -3.48448
## hiatus             -1.38486
## hibiscus            3.10429
## hibou              -3.74010
## hickory             1.53526
## hideuse            -2.94451
## hideux             -3.38643
## hiérarchie         -5.09399
## hiérarchisation    -3.95129
## hiérarchiser       -3.29760
## hiérarque          -2.75761
## hiératisme         -1.21999
## hiérogamie         -0.58118
## hiéroglyphe        -0.29311
## hiérogrammate       2.26696
## hiérophanie        -1.71916
## hiérophante         1.08109
## hile               -3.69882
## Himalaya            1.17470
## hindi              -0.39295
## hindou              3.06525
## hinterland          1.80240
## hipparque           3.26132
## hippisme            1.57394
## hippo               3.35943
## Hippocrate          3.29066
## hirondelle          1.68149
## Hispaniola          1.01269
## histoire            0.44521
## histologie          1.28126
## historien           1.70426
## hitlérisme          2.09908
## hiver               1.76130
## hlm                -1.99518
## hobereau           -3.19252
## hoir                1.40070
## hoirie              1.66914
## Hokkaïdo           -1.62560
## hollandais         -3.13114
## hollandaise        -2.69320
## hollande           -1.96172
## holmium             2.28608
## holocauste          2.32113
## home               -1.34371
## homélie             1.83412
## hommage             1.84298
## homme               0.81552
## homographie         1.71201
## Hongrie            -4.78689
## hongrois           -3.93742
## honnêteté           1.12021
## honneur             1.54025
## honnir             -1.90284
## honorée             3.04619
## honorer             0.57890
## Honorine            1.11301
## honte              -5.39774
## honteuse           -3.93796
## honteux            -3.56338
## hôpital             1.41554
## Horace              2.52649
## horizon             1.81161
## horloge             1.25485
## hornblende         -3.23706
## horreur             0.94948
## hortensia           3.20895
## horticulture        1.13220
## Horus               2.55210
## hosanna             3.45944
## hospice             2.07655
## hospitalité         1.18800
## hospodar            0.69430
## hostilité           0.98979
## hôtel               1.38358
## hotte              -4.23187
## hottentot          -2.32222
## houblon            -4.13273
## houille            -4.79077
## houspiller         -1.65355
## Houssaye           -2.67500
## Hubert             -1.60305
## huchet             -2.34764
## Hudson              0.93888
## huguenot           -2.81736
## huguenote          -2.19977
## huguenotisme       -0.71881
## Hugues             -1.90992
## Huguet             -1.72258
## Huguette           -0.12896
## huilage             0.82283
## huile               0.94620
## huiler              0.04690
## huilerie            1.50299
## huileux             2.21631
## huilier             3.44064
## huis               -1.68718
## huit               -4.72459
## Hulot              -2.05556
## humain             -0.66271
## humaine             1.13064
## humanisme           1.89102
## humanité            0.79115
## humer              -2.23513
## humeur              0.96512
## humidité            0.78035
## humiliation         1.18865
## humilier            0.83805
## humilité            1.12184
## humour              2.03857
## humus               2.17523
## hune               -3.30803
## Huon               -1.99026
## hurler             -3.20283
## huron              -2.65468
## huronien           -1.25394
## hurricane          -1.35586
## hussard            -3.85179
## hussitisme         -2.80258
## hyacinthe           1.32925
## hyalite             0.81747
## hyaloplasme        -0.97974
## hydrate             2.27583
## hydro               2.58822
## hydrogène           1.64377
## hydrographie        1.35932
## hydrologie          1.29932
## hydrolyse           1.18153
## hyène              -0.53442
## Hygie               1.29489
## hygiène             1.03058
## hyper               2.32579
## Hypérion            2.60070
## hypertrophie        1.18647
## hypothèque          1.35583
## hypothèse           0.81000
## iambe               1.83185
## Iéna                1.80603
## Iénisséi            2.34164
## iodate              2.95709
## iode                1.76400
## iodoforme           2.59348
## iodure              1.84038
## Ionie               1.51108
## ionisation          1.02853
## ionosphère          1.92435
## iota                0.10705
## Iowa                2.11501
## iule                1.17399
## oignon              2.42514
## oil                 2.45779
## oille               0.46094
## oindre              1.10373
## oing                1.78488
## oint               -0.08364
## Oise                1.06746
## oiseau              1.83992
## oison               3.35400
## onze               -3.81962
## ouadi               0.41779
## Ouadi              -0.90688
## oued                1.80133
## ouï                -0.48206
## ouïr                0.91302
## ouistiti           -1.45823
## ouolof             -1.94655
## oye                 1.98197
## Utica               2.09973
## yacht              -4.49306
## yack               -2.58996
## yang               -3.51015
## yard               -3.12323
## yatagan            -2.75690
## Yémen              -3.03316
## yen                -4.19899
## yeuse               1.92918
## yoga               -3.97894
## Yokohama           -3.39922
## yole               -3.53277
## Yonne               1.22757
## Yunnan             -4.06532

Histogram of random intercepts from logistic model:

hist(ranef(french4.glmer.type)$word2[,1], main="histogram of random intercepts", xlab="random intercept value",col="grey")

plot of chunk histogramRandomIntercepts

Which Word1s are significantly different?

Take a look at the results: which levels of Word1 differ significantly from one another?

Anova(french4.glmer.type)
## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: voculence
##                       Chisq Df Pr(>Chisq)    
## word1                  21.1  7     0.0036 ** 
## log(phrase_frequency)  19.9  1    8.3e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
french4.glmer.type.glht <- glht(french4.glmer.type, linfct = mcp(word1 = "Tukey"))
summary(french4.glmer.type.glht)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: glmer(formula = voculence ~ word1 + (1 | word2) + log(phrase_frequency), 
##     data = my_data, family = "binomial", start = getME(french3.glmer.type, 
##         c("theta", "fixef")))
## 
## Linear Hypotheses:
##                             Estimate Std. Error z value Pr(>|z|)   
## beau_bel - au_àl == 0         2.4568     0.9309    2.64    0.123   
## de_d - au_àl == 0             0.3090     0.4898    0.63    0.998   
## du_del - au_àl == 0          -0.4875     0.4640   -1.05    0.959   
## la_l - au_àl == 0            -0.1634     0.7021   -0.23    1.000   
## le_l - au_àl == 0            -0.3680     0.5064   -0.73    0.995   
## ma_mon - au_àl == 0           2.3977     0.8118    2.95    0.054 . 
## vieux_vieil - au_àl == 0      1.7650     0.7627    2.31    0.255   
## de_d - beau_bel == 0         -2.1478     0.9895   -2.17    0.335   
## du_del - beau_bel == 0       -2.9443     0.9616   -3.06    0.039 * 
## la_l - beau_bel == 0         -2.6202     1.1404   -2.30    0.264   
## le_l - beau_bel == 0         -2.8248     1.0054   -2.81    0.080 . 
## ma_mon - beau_bel == 0       -0.0591     1.0808   -0.05    1.000   
## vieux_vieil - beau_bel == 0  -0.6918     0.9873   -0.70    0.996   
## du_del - de_d == 0           -0.7964     0.4316   -1.85    0.550   
## la_l - de_d == 0             -0.4724     0.5102   -0.93    0.980   
## le_l - de_d == 0             -0.6769     0.4350   -1.56    0.746   
## ma_mon - de_d == 0            2.0888     0.7580    2.76    0.092 . 
## vieux_vieil - de_d == 0       1.4560     0.8136    1.79    0.589   
## la_l - du_del == 0            0.3240     0.6454    0.50    1.000   
## le_l - du_del == 0            0.1195     0.4502    0.27    1.000   
## ma_mon - du_del == 0          2.8852     0.8236    3.50    <0.01 **
## vieux_vieil - du_del == 0     2.2524     0.7864    2.86    0.069 . 
## le_l - la_l == 0             -0.2045     0.6353   -0.32    1.000   
## ma_mon - la_l == 0            2.5612     0.8366    3.06    0.039 * 
## vieux_vieil - la_l == 0       1.9284     0.9826    1.96    0.469   
## ma_mon - le_l == 0            2.7657     0.8540    3.24    0.023 * 
## vieux_vieil - le_l == 0       2.1329     0.8328    2.56    0.149   
## vieux_vieil - ma_mon == 0    -0.6328     0.9535   -0.66    0.997   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
french4.glmer.type.glht.cld <- cld(french4.glmer.type.glht)
#plot(french4.glmer.type.glht)
#french.lmer.type.glht.confint <- confint(french.lmer.type.glht)

Summary, taking absolute z value of 2.5 as cutoff:

  • beau > au, du, le
  • ma > au, de, du, la, le
  • vieux > du, le

Or, using the adjusted p-values that are now supplied, with 0.05 as the cutoff (parentheses mark 0.05 < p < 0.10):

  • beau > du, (le)
  • ma > (au), (de), du, la, le
  • (vieux > du)
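The two summaries can be re-derived mechanically from the Tukey table. The sketch below hard-codes the z values and adjusted p-values printed in the glht summary above. The column names `hi` and `lo` and the helper `show` are illustrative. The p-value printed as `<0.01` is stored as 0.01, an upper bound.

```r
# Contrasts (hi - lo), z values and adjusted p-values copied from the glht output.
# The ma - du p-value is printed as "<0.01"; 0.01 is used here as an upper bound.
tukey <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
hi     lo    z      p
beau   au    2.64   0.123
de     au    0.63   0.998
du     au   -1.05   0.959
la     au   -0.23   1.000
le     au   -0.73   0.995
ma     au    2.95   0.054
vieux  au    2.31   0.255
de     beau -2.17   0.335
du     beau -3.06   0.039
la     beau -2.30   0.264
le     beau -2.81   0.080
ma     beau -0.05   1.000
vieux  beau -0.70   0.996
du     de   -1.85   0.550
la     de   -0.93   0.980
le     de   -1.56   0.746
ma     de    2.76   0.092
vieux  de    1.79   0.589
la     du    0.50   1.000
le     du    0.27   1.000
ma     du    3.50   0.01
vieux  du    2.86   0.069
le     la   -0.32   1.000
ma     la    3.06   0.039
vieux  la    1.96   0.469
ma     le    3.24   0.023
vieux  le    2.56   0.149
vieux  ma   -0.66   0.997
")

# Orient every contrast as "greater > smaller" according to the sign of z
tukey$greater <- ifelse(tukey$z > 0, tukey$hi, tukey$lo)
tukey$smaller <- ifelse(tukey$z > 0, tukey$lo, tukey$hi)

by_z     <- tukey[abs(tukey$z) > 2.5, ]                 # |z| cutoff
by_p     <- tukey[tukey$p < 0.05, ]                     # adjusted p < 0.05
marginal <- tukey[tukey$p >= 0.05 & tukey$p < 0.10, ]   # 0.05 <= p < 0.10

# For each "greater" level, list the levels it exceeds
show <- function(d) tapply(d$smaller, d$greater, paste, collapse = ", ")
print(show(by_z))
print(show(by_p))
print(show(marginal))
```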

Plot the significantly different levels of Word1. Word1s that share a letter label at the top are not significantly different from each other.

opar <- par(mai=c(1,1,2,1))
plot(french4.glmer.type.glht.cld)

plot of chunk plotWord1Levels

par(opar)
#sort(table(french$word1)) #just checking what all the levels are and how many items in each

Compare to SAS beta regression results:

  • {beau, ma, vieux} > {au, le, du}
  • ma > la
  • de > {du, le}

Basically same three strata: beau/ma/vieux, au/de/la, du/le

Make the wug

Classify Word1s. Note that this is hard-coding the differences found above:

french["word1_group"] <- NA

for(i in 1:nrow(french)) {
  if(french$word1[i] == "beau_bel" | french$word1[i] == "ma_mon" |
     french$word1[i] == "vieux_vieil") {
    french$word1_group[i] <- "A: beau/bel, ma/mon,\n     vieux/vieil"
  }
  if(french$word1[i] == "au_àl" | french$word1[i] == "de_d" |
     french$word1[i] == "la_l") {
    french$word1_group[i] <- "B: au/à l', de/d', la/l'"
  }
  if(french$word1[i] == "du_del" | french$word1[i] == "le_l") {
    french$word1_group[i] <- "C: du/de l', le/l'"
  }
}
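The same hard-coded classification can also be expressed as a named lookup vector, which keeps the level-to-group mapping in one place. This is a sketch on a toy vector; `word1_groups`, `toy_word1` and `toy_group` are illustrative names, and `"ce_cet"` is included only to show that an unlisted level comes out as NA.

```r
# Named lookup vector: Word1 level -> group label (labels as in the loop above)
group_A <- "A: beau/bel, ma/mon,\n     vieux/vieil"
group_B <- "B: au/à l', de/d', la/l'"
group_C <- "C: du/de l', le/l'"
word1_groups <- c(
  "beau_bel"    = group_A,
  "ma_mon"      = group_A,
  "vieux_vieil" = group_A,
  "au_àl"       = group_B,
  "de_d"        = group_B,
  "la_l"        = group_B,
  "du_del"      = group_C,
  "le_l"        = group_C
)

# Toy input; with a factor column, wrap it in as.character() before indexing
toy_word1 <- c("le_l", "ma_mon", "de_d", "ce_cet")

# Indexing by name does the classification; unknown levels yield NA
toy_group <- unname(word1_groups[toy_word1])
print(toy_group)
```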

Add in a column with random intercepts, from the non-beta regression model, for each word2, and a column for Word2’s aspiratedness rank:

french["aspire"] <- NA
counter <- 1
for(j in rownames(ranef(french4.glmer.type)$word2)) {
  #print(j)
  for(i in 1:nrow(french)) {
    if(french$word2[i] == j) {
      french$aspire[i] <- ranef(french4.glmer.type)$word2[,1][counter]
    }
  }
  counter <- counter + 1
}


#Aspire-ness rank of each word2 (listed as a property of each phrase), binomial model
#We need to go down to 5 slices to get a decent number of Word2s in each slice.

french["aspire_rank_word"] <- NA
counter <- 1
for(j in rownames(ranef(french4.glmer.type)$word2)) {
  #print(j)
  for(i in 1:nrow(french)) {
    if(french$word2[i] == j) {
      french$aspire_rank_word[i] <- as.numeric(cut(ranef(french4.glmer.type)$word2[,1], breaks=5))[counter]
    }
  }
  counter <- counter + 1
}
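Note that `cut(x, breaks=5)` splits the range of the intercepts into five intervals of equal width, not into five groups of equal size. The sketch below illustrates this on seven intercepts copied from the listing of random intercepts; `haute` and `hosanna` appear to be the smallest and largest values in that listing, so the toy range matches the full one. The names `x` and `bin` are illustrative.

```r
# Seven random intercepts copied from the ranef() listing above
x <- c(haute = -5.83745, hache = -4.86124, hachette = -3.86189,
       hacienda = -0.44610, habitude = 0.73261, habit = 1.73489,
       hosanna = 3.45944)

# Five equal-width intervals over range(x); as.numeric() gives the bin index
bin <- as.numeric(cut(x, breaks = 5))
print(data.frame(word2 = names(x), intercept = unname(x), rank = bin))

# Width of each interval: (max - min) / 5
print(diff(range(x)) / 5)
```

Equal-width slices explain why the counts per rank are uneven in the tables that follow.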

#table to show how many in each group
table(french$word1_group, french$aspire_rank_word)
##                                         
##                                            1   2   3   4   5
##   A: beau/bel, ma/mon,\n     vieux/vieil  33  18   9  42  50
##   B: au/à l', de/d', la/l'                68 138  58 159 172
##   C: du/de l', le/l'                      36  98  46  40 151
#what is the average alignancy rate in each aspire-rank group?
sort(tapply(french$voculence, french$aspire_rank_word, FUN=mean))
##       1       2       3       4       5 
## 0.01295 0.03090 0.33012 0.92409 0.96851

Make an interaction plot, using random intercepts from the pseudo-logistic regression model:

par(mar=c(5,5,2,6))
with(french, {
  interaction.plot(x.factor=as.factor(french$aspire_rank_word),
                   trace.factor=factor(word1_group),
                   response=voculence,
                   xlab="Word2 alignancy group", ylab="rate of non-alignment",
                   trace.label="Word1 alignancy group", fixed=TRUE,
                   xpd=TRUE)
})

plot of chunk interactionPlot
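Each point in the interaction plot is a cell mean: the mean of the response for one combination of x-axis group and trace group. With the default `fun = mean`, those values can be inspected directly with `tapply()`. The sketch below uses a small invented data frame; `toy`, its values, and the short group labels "A" and "C" are hypothetical.

```r
# Invented toy data: two Word2 rank groups crossed with two Word1 groups
toy <- data.frame(
  aspire_rank_word = c(1, 1, 5, 5, 1, 1, 5, 5),
  word1_group      = rep(c("A", "C"), each = 4),
  voculence        = c(0, 0.2, 0.9, 1, 0, 0, 0.7, 0.9)
)

# Rows = x-axis groups, columns = traces; each entry is one plotted point
cell_means <- with(toy, tapply(voculence,
                               list(aspire_rank_word, word1_group),
                               mean))
print(cell_means)
```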

#Since this will appear in the paper, also do a PNG file:
png(file="Word2_Word1_interaction_plot.png",width=myResMultiplier*460,height=myResMultiplier*300, res=myResMultiplier*72, family=myFontFamily)
#par(mar=c(5,3,2,0)+0.1, mgp=c(2,1,0))
par(mar=c(5,4,2,0)+0.1)
with(french, {
  interaction.plot(x.factor=as.factor(french$aspire_rank_word),
                   trace.factor=factor(word1_group),
                   response=voculence,
                   xlab="Word2 alignancy group", ylab="rate of non-alignment",
                   trace.label="Word1 alignancy group", fixed=TRUE)
})
dev.off()
## pdf 
##   2
table(french$aspire_rank_word, french$word1_group) #shows how many phrases (rows of french) fall in each bin
##    
##     A: beau/bel, ma/mon,\n     vieux/vieil B: au/à l', de/d', la/l'
##   1                                     33                       68
##   2                                     18                      138
##   3                                      9                       58
##   4                                     42                      159
##   5                                     50                      172
##    
##     C: du/de l', le/l'
##   1                 36
##   2                 98
##   3                 46
##   4                 40
##   5                151

List of the ranks:

french[,c("word2","aspire_rank_word")]
##                word2 aspire_rank_word
## 1             habeas                5
## 2             habeas                5
## 3             habeas                5
## 4             habeas                5
## 5             habeas                5
## 6           habileté                4
## 7           habileté                4
## 8           habileté                4
## 9           habileté                4
## 10          habileté                4
## 11          habileté                4
## 12          habilité                4
## 13          habilité                4
## 14          habilité                4
## 15          habilité                4
## 16          habilité                4
## 17         habiliter                4
## 18         habiliter                4
## 19         habiliter                4
## 20         habiliter                4
## 21         habiliter                4
## 22       habillement                5
## 23       habillement                5
## 24       habillement                5
## 25       habillement                5
## 26       habillement                5
## 27       habillement                5
## 28       habillement                5
## 29       habillement                5
## 30       habillement                5
## 31             habit                5
## 32             habit                5
## 33             habit                5
## 34             habit                5
## 35             habit                5
## 36             habit                5
## 37             habit                5
## 38             habit                5
## 39             habit                5
## 40             habit                5
## 41            habita               NA
## 42            habita               NA
## 43            habita               NA
## 44          habitant                5
## 45          habitant                5
## 46          habitant                5
## 47          habitant                5
## 48          habitant                5
## 49          habitant                5
## 50          habitant                5
## 51          habitant                5
## 52          habitant                5
## 53           habitat                5
## 54           habitat                5
## 55           habitat                5
## 56           habitat                5
## 57           habitat                5
## 58           habitat                5
## 59           habitat                5
## 60           habitat                5
## 61           habitat                5
## 62        habitation                4
## 63        habitation                4
## 64        habitation                4
## 65        habitation                4
## 66        habitation                4
## 67        habitation                4
## 68            habite               NA
## 69            habite               NA
## 70            habite               NA
## 71            habite               NA
## 72            habite               NA
## 73            habite               NA
## 74            habité                5
## 75            habité                5
## 76            habité                5
## 77            habité                5
## 78            habité                5
## 79          habitent               NA
## 80          habitent               NA
## 81          habitent               NA
## 82          habitent               NA
## 83          habitent               NA
## 84           habiter                4
## 85           habiter                4
## 86           habiter                4
## 87           habiter                4
## 88           habiter                4
## 89           habiter                4
## 90          habitude                4
## 91          habitude                4
## 92          habitude                4
## 93          habitude                4
## 94          habitude                4
## 95          habitude                4
## 96          habituel                5
## 97          habituel                5
## 98          habituel                5
## 99          habituel                5
## 100         habituel                5
## 101         habituel                5
## 102       habituelle                5
## 103       habituelle                5
## 104       habituelle                5
## 105       habituelle                5
## 106       habituelle                5
## 107       habituelle                5
## 108          habitus                5
## 109          habitus                5
## 110          habitus                5
## 111          habitus                5
## 112          habitus                5
## 113          habitus                5
## 114          habitus                5
## 115            hache                1
## 116            hache                1
## 117            hache                1
## 118            hache                1
## 119            hache                1
## 120            hache                1
## 121         hachette                2
## 122         hachette                2
## 123         hachette                2
## 124         hachette                2
## 125         hacienda                3
## 126         hacienda                3
## 127         hacienda                3
## 128         hacienda                3
## 129          haddock                2
## 130          haddock                2
## 131          haddock                2
## 132            Hadès                4
## 133            Hadès                4
## 134            Hadès                4
## 135            Hadès                4
## 136            Hadès                4
## 137            hadji                2
## 138            hadji                2
## 139            hadji                2
## 140            hadji                2
## 141          Hadrien                4
## 142          Hadrien                4
## 143          Hadrien                4
## 144     hagiographie                4
## 145     hagiographie                4
## 146     hagiographie                4
## 147         Haguenau                2
## 148         Haguenau                2
## 149         Haguenau                2
## 150             haie                1
## 151             haie                1
## 152             haie                1
## 153             haie                1
## 154             haie                1
## 155             haie                1
## 156          Hainaut                1
## 157          Hainaut                1
## 158          Hainaut                1
## 159          Hainaut                1
## 160          Hainaut                1
## 161            haine                1
## 162            haine                1
## 163            haine                1
## 164            haine                1
## 165            haine                1
## 166            haine                1
## 167             haïr                2
## 168             haïr                2
## 169             haïr                2
## 170             haïr                2
## 171             haïr                2
## 172             haïs               NA
## 173             haïs               NA
## 174             haïs               NA
## 175             haïs               NA
## 176             hait               NA
## 177             hait               NA
## 178             hait               NA
## 179             hait               NA
## 180          haïtien                5
## 181          haïtien                5
## 182          haïtien                5
## 183          haïtien                5
## 184        haïtienne                4
## 185        haïtienne                4
## 186          haleine                4
## 187          haleine                4
## 188          haleine                4
## 189          haleine                4
## 190          haleine                4
## 191       halètement                2
## 192       halètement                2
## 193       halètement                2
## 194       halètement                2
## 195       halètement                2
## 196             hall                1
## 197             hall                1
## 198             hall                1
## 199             hall                1
## 200             hall                1
## 201             hall                1
## 202             hall                1
## 203             hall                1
## 204          hallali                5
## 205          hallali                5
## 206          hallali                5
## 207          hallali                5
## 208          hallali                5
## 209            halle                2
## 210            halle                2
## 211            halle                2
## 212            halle                2
## 213            halle                2
## 214       hallebarde                2
## 215       hallebarde                2
## 216       hallebarde                2
## 217       hallebarde                2
## 218          hallier                2
## 219          hallier                2
## 220          hallier                2
## 221          hallier                2
## 222          hallier                2
## 223     hallstattien                4
## 224     hallstattien                4
## 225             halo                1
## 226             halo                1
## 227             halo                1
## 228             halo                1
## 229             halo                1
## 230             halo                1
## 231            halte                1
## 232            halte                1
## 233            halte                1
## 234            halte                1
## 235            halte                1
## 236            halte                1
## 237          haltère                5
## 238          haltère                5
## 239          haltère                5
## 240           hamada                2
## 241           hamada                2
## 242         Hambourg                2
## 243         Hambourg                2
## 244         Hambourg                2
## 245         Hambourg                2
## 246         Hambourg                2
## 247           hameau                1
## 248           hameau                1
## 249           hameau                1
## 250           hameau                1
## 251           hameau                1
## 252           hameau                1
## 253           hameau                1
## 254           hameau                1
## 255           hameau                1
## 256          hameçon                4
## 257          hameçon                4
## 258          hameçon                4
## 259          hameçon                4
## 260          hameçon                4
## 261           hammam                2
## 262           hammam                2
## 263           hammam                2
## 264           hammam                2
## 265           hammam                2
## 266          hamster                2
## 267          hamster                2
## 268          hamster                2
## 269          hamster                2
## 270         handicap                1
## 271         handicap                1
## 272         handicap                1
## 273         handicap                1
## 274         handicap                1
## 275         handicap                1
## 276         handicap                1
## 277        handicape               NA
## 278        handicape               NA
## 279           hangar                1
## 280           hangar                1
## 281           hangar                1
## 282           hangar                1
## 283           hangar                1
## 284           hangar                1
## 285           hangar                1
## 286           hangar                1
## 287         hanneton                2
## 288         hanneton                2
## 289         hanneton                2
## 290         hanneton                2
## 291         hanneton                2
## 292            Hanoï                2
## 293            Hanoï                2
## 294            Hanoï                2
## 295            Hanoï                2
## 296          haoussa                2
## 297          haoussa                2
## 298          haoussa                2
## 299          haoussa                2
## 300            hapax                5
## 301            hapax                5
## 302            hapax                5
## 303            hapax                5
## 304        haranguer                2
## 305        haranguer                2
## 306        haranguer                2
## 307        haranguer                2
## 308      harassement                3
## 309      harassement                3
## 310      harassement                3
## 311      harassement                3
## 312      harassement                3
## 313         harasser                3
## 314         harasser                3
## 315         harasser                3
## 316      harcèlement                1
## 317      harcèlement                1
## 318      harcèlement                1
## 319      harcèlement                1
## 320      harcèlement                1
## 321         harceler                2
## 322         harceler                2
## 323         harceler                2
## 324         harceler                2
## 325            hardi                2
## 326            hardi                2
## 327            hardi                2
## 328            hardi                2
## 329            hardi                2
## 330            hardi                2
## 331            hardi                2
## 332        hardiesse                1
## 333        hardiesse                1
## 334        hardiesse                1
## 335        hardiesse                1
## 336        hardiesse                1
## 337        hardiesse                1
## 338           hareng                2
## 339           hareng                2
## 340           hareng                2
## 341           hareng                2
## 342           hareng                2
## 343           hareng                2
## 344           hareng                2
## 345          haricot                2
## 346          haricot                2
## 347          haricot                2
## 348          haricot                2
## 349          haricot                2
## 350          haricot                2
## 351        haridelle                3
## 352        haridelle                3
## 353        haridelle                3
## 354            harle                3
## 355            harle                3
## 356        harmattan                5
## 357        harmattan                5
## 358        harmattan                5
## 359        harmattan                5
## 360           Harmel                4
## 361           harmel                2
## 362           harmel                2
## 363           harmel                2
## 364           Harmel                4
## 365        harmonica                5
## 366        harmonica                5
## 367        harmonica                5
## 368        harmonica                5
## 369        harmonica                5
## 370        harmonica                5
## 371         harmonie                4
## 372         harmonie                4
## 373         harmonie                4
## 374         harmonie                4
## 375         harmonie                4
## 376         harmonie                4
## 377     harnachement                2
## 378     harnachement                2
## 379     harnachement                2
## 380     harnachement                2
## 381     harnachement                2
## 382     harnachement                2
## 383        harnacher                3
## 384        harnacher                3
## 385          harnois                2
## 386          harnois                2
## 387          harnois                2
## 388          harnois                2
## 389          harnois                2
## 390          harnois                2
## 391             haro                2
## 392             haro                2
## 393             haro                2
## 394             haro                2
## 395             haro                2
## 396         harpagon                5
## 397            harpe                1
## 398            harpe                1
## 399            harpe                1
## 400            harpe                1
## 401            harpe                1
## 402            harpe                1
## 403             hart                2
## 404             hart                2
## 405           hasard                1
## 406           hasard                1
## 407           hasard                1
## 408           hasard                1
## 409           hasard                1
## 410           hasard                1
## 411           hasard                1
## 412           hasard                1
## 413          hasarde               NA
## 414          hasarde               NA
## 415          hasarde               NA
## 416          hasarde               NA
## 417          hasarde               NA
## 418         hasarder                2
## 419         hasarder                2
## 420         hasarder                2
## 421         hasarder                2
## 422         hasarder                2
## 423       hasardeuse                2
## 424       hasardeuse                2
## 425       hasardeuse                2
## 426       hasardeuse                2
## 427       hasardeuse                2
## 428        hasardeux                2
## 429        hasardeux                2
## 430        hasardeux                2
## 431        hasardeux                2
## 432        hasardeux                2
## 433        hasardeux                2
## 434             hase                2
## 435             hase                2
## 436             hase                2
## 437            haste                2
## 438            haste                2
## 439            haste                2
## 440             hâte                1
## 441             hâte                1
## 442             hâte                1
## 443             hâte                1
## 444             hâte                1
## 445             hâte                1
## 446             hâte                1
## 447             hâte                1
## 448             hâte                1
## 449             hâte                1
## 450            hâter                2
## 451            hâter                2
## 452            hâter                2
## 453            hâter                2
## 454            hâter                2
## 455           Hathor                4
## 456           Hathor                4
## 457          haubert                2
## 458          haubert                2
## 459          haubert                2
## 460          haubert                2
## 461          haubert                2
## 462           hausse                2
## 463           hausse                2
## 464           hausse                2
## 465           hausse                2
## 466           hausse                2
## 467           hausse                2
## 468           hausse                2
## 469           hausse                2
## 470           hausse                2
## 471           hausse                2
## 472             haut                1
## 473             haut                1
## 474             haut                1
## 475             haut                1
## 476             haut                1
## 477             haut                1
## 478             haut                1
## 479             haut                1
## 480             haut                1
## 481       hautboïste                3
## 482       hautboïste                3
## 483            haute                1
## 484            haute                1
## 485            haute                1
## 486            haute                1
## 487            haute                1
## 488            haute                1
## 489        Hauterive                4
## 490        Hauterive                4
## 491      hauterivien                5
## 492      Hauterivien                5
## 493      Hauterivien                5
## 494      hauterivien                5
## 495      Hauterivien                5
## 496      Hauterivien                5
## 497          hauteur                1
## 498          hauteur                1
## 499          hauteur                1
## 500          hauteur                1
## 501          hauteur                1
## 502          hauteur                1
## 503         Hautmont                3
## 504         Hautmont                3
## 505            havre                1
## 506            havre                1
## 507            havre                1
## 508            havre                1
## 509            havre                1
## 510            havre                1
## 511            havre                1
## 512            havre                1
## 513            havre                1
## 514           hazard                2
## 515           hazard                2
## 516           hazard                2
## 517           hazard                2
## 518           hazard                2
## 519           heaume                2
## 520           heaume                2
## 521           heaume                2
## 522           heaume                2
## 523           heaume                2
## 524            hebdo                5
## 525            hebdo                5
## 526            hebdo                5
## 527            hebdo                5
## 528            hebdo                5
## 529            hebdo                5
## 530      hébergement                5
## 531      hébergement                5
## 532      hébergement                5
## 533      hébergement                5
## 534      hébergement                5
## 535      hébergement                5
## 536           Hécate                4
## 537           Hécate                4
## 538          hectare                5
## 539          hectare                5
## 540          hectare                5
## 541          hectare                5
## 542          hectare                5
## 543          hectare                5
## 544       hectolitre                5
## 545       hectolitre                5
## 546       hectolitre                5
## 547       hectolitre                5
## 548       hectolitre                5
## 549           Hector                5
## 550           Hector                5
## 551           Hector                5
## 552           Hector                5
## 553           Hector                5
## 554           Hector                5
## 555           Hector                5
## 556     hégélianisme                4
## 557     hégélianisme                4
## 558     hégélianisme                4
## 559     hégélianisme                4
## 560     hégélianisme                4
## 561     hégélianisme                4
## 562         hégélien                5
## 563         hégélien                5
## 564         hégélien                5
## 565         hégélien                5
## 566         hégélien                5
## 567         hégélien                5
## 568         hégélien                5
## 569        hégémonie                4
## 570        hégémonie                4
## 571        hégémonie                4
## 572        hégémonie                4
## 573           Hélène                4
## 574           Hélène                4
## 575           Hélène                4
## 576           Hélène                4
## 577           Hélène                4
## 578           hélice                4
## 579           hélice                4
## 580           hélice                4
## 581           hélice                4
## 582            hélio                5
## 583            hélio                5
## 584           hélion                5
## 585           hélion                5
## 586           hélium                5
## 587           hélium                5
## 588           hélium                5
## 589           hélium                5
## 590           hélium                5
## 591            hélix                5
## 592            hélix                5
## 593            hélix                5
## 594            hélix                5
## 595          Héloïse                4
## 596          Héloïse                4
## 597          Héloïse                4
## 598          Héloïse                4
## 599         hématite                4
## 600         hématite                4
## 601      hémoglobine                4
## 602      hémoglobine                4
## 603      hémoglobine                4
## 604            henné                2
## 605            henné                2
## 606            henné                2
## 607            henné                2
## 608            henné                2
## 609           hennin                2
## 610           hennin                2
## 611           hennin                2
## 612           hennin                2
## 613         hennuyer                3
## 614            Henri                3
## 615            Henri                3
## 616            Henri                3
## 617            Henri                3
## 618            Henri                3
## 619            Henri                3
## 620            Henri                3
## 621            Henri                3
## 622        Henriette                4
## 623        Henriette                4
## 624        Henriette                4
## 625        Henriette                4
## 626        Henriette                4
## 627            henry                3
## 628            Henry                3
## 629            Henry                3
## 630            Henry                3
## 631            Henry                3
## 632            henry                3
## 633            Henry                3
## 634            Henry                3
## 635            Henry                3
## 636            Henry                3
## 637             Héra                4
## 638             Héra                4
## 639         Héraclès                5
## 640         Héraclès                5
## 641         Héraclès                5
## 642         Héraclès                5
## 643         Héraclès                5
## 644         Héraclès                5
## 645        Héraclite                5
## 646        Héraclite                5
## 647        Héraclite                5
## 648        Héraclite                5
## 649        Héraclite                5
## 650          Hérault                4
## 651          Hérault                4
## 652          Hérault                4
## 653          Hérault                4
## 654          Hérault                4
## 655          Hérault                4
## 656           héraut                2
## 657           héraut                2
## 658           héraut                2
## 659           héraut                2
## 660           héraut                2
## 661           héraut                2
## 662           héraut                2
## 663            herbe                4
## 664            herbe                4
## 665            herbe                4
## 666            herbe                4
## 667            herbe                4
## 668            herbe                4
## 669            herbe                4
## 670            herbe                4
## 671            herbe                4
## 672          Herbert                3
## 673          Herbert                3
## 674          Herbert                3
## 675          Herbert                3
## 676         herbette                5
## 677         herbette                5
## 678          hercule                5
## 679          hercule                5
## 680          hercule                5
## 681          hercule                5
## 682          hercule                5
## 683         hérédité                4
## 684         hérédité                4
## 685         hérédité                4
## 686         hérédité                4
## 687         hérédité                4
## 688         hérédité                4
## 689          hérésie                4
## 690          hérésie                4
## 691          hérésie                4
## 692          hérésie                4
## 693          hérésie                4
## 694          hérésie                4
## 695        hérissant                3
## 696        hérissant                3
## 697          hérisse               NA
## 698          hérisse               NA
## 699          hérisse               NA
## 700          hérisse               NA
## 701        hérissent               NA
## 702        hérissent               NA
## 703        hérissent               NA
## 704        hérissent               NA
## 705         hérisser                3
## 706         hérisser                3
## 707         hérisser                3
## 708         hérisser                3
## 709         hérisson                2
## 710         hérisson                2
## 711         hérisson                2
## 712         hérisson                2
## 713         hérisson                2
## 714         hérisson                2
## 715         héritage                5
## 716         héritage                5
## 717         héritage                5
## 718         héritage                5
## 719         héritage                5
## 720         héritage                5
## 721         héritage                5
## 722         héritage                5
## 723         héritage                5
## 724         héritier                5
## 725         héritier                5
## 726         héritier                5
## 727         héritier                5
## 728         héritier                5
## 729         héritier                5
## 730         héritier                5
## 731         héritier                5
## 732         héritier                5
## 733        hermandad                2
## 734        hermandad                2
## 735           hermès                5
## 736           hermès                5
## 737           hermès                5
## 738           hermès                5
## 739           hermès                5
## 740          hermine                5
## 741          hermine                5
## 742          hermine                5
## 743          hermine                5
## 744          Hermite                5
## 745          Hermite                5
## 746          hermite                5
## 747          Hermite                5
## 748          hermite                5
## 749          hermite                5
## 750          hermite                5
## 751          Hermite                5
## 752          hermite                5
## 753          hermite                5
## 754           hernie                1
## 755           hernie                1
## 756           hernie                1
## 757           hernie                1
## 758           hernie                1
## 759           hernie                1
## 760           Hérode                5
## 761           Hérode                5
## 762           Hérode                5
## 763           Hérode                5
## 764           Hérode                5
## 765           Hérode                5
## 766         Hérodote                5
## 767         Hérodote                5
## 768         Hérodote                5
## 769         Hérodote                5
## 770         Hérodote                5
## 771         Hérodote                5
## 772         Hérodote                5
## 773          héroïne                4
## 774          héroïne                4
## 775          héroïne                4
## 776          héroïne                4
## 777          héroïne                4
## 778          héroïne                4
## 779         héroïsme                5
## 780         héroïsme                5
## 781         héroïsme                5
## 782         héroïsme                5
## 783         héroïsme                5
## 784         héroïsme                5
## 785         héroïsme                5
## 786         héroïsme                5
## 787         héroïsme                5
## 788         héroïsme                5
## 789            héron                2
## 790            héron                2
## 791            héron                2
## 792            héron                2
## 793            héron                2
## 794            héron                2
## 795            héros                1
## 796            héros                1
## 797            héros                1
## 798            héros                1
## 799            héros                1
## 800            héros                1
## 801            héros                1
## 802            héros                1
## 803            héros                1
## 804           herpès                5
## 805           herpès                5
## 806           herpès                5
## 807           herpès                5
## 808           herpès                5
## 809            hertz                2
## 810            hertz                2
## 811            hertz                2
## 812            Hervé                3
## 813            Hervé                3
## 814            Hervé                3
## 815            Hervé                3
## 816      Herzégovine                4
## 817      Herzégovine                4
## 818           hésite               NA
## 819           hésite               NA
## 820           hésite               NA
## 821          hésiter                4
## 822          hésiter                4
## 823          hésiter                4
## 824            hesse                3
## 825            Hesse                1
## 826            Hesse                1
## 827            Hesse                1
## 828           hetman                4
## 829           hetman                4
## 830           hetman                4
## 831           hetman                4
## 832           hetman                4
## 833           hetman                4
## 834           hetman                4
## 835            hêtre                1
## 836            hêtre                1
## 837            hêtre                1
## 838            hêtre                1
## 839            hêtre                1
## 840            hêtre                1
## 841            hêtre                1
## 842            hêtre                1
## 843             heur                5
## 844             heur                5
## 845             heur                5
## 846             heur                5
## 847             heur                5
## 848             heur                5
## 849            heure                4
## 850            heure                4
## 851            heure                4
## 852            heure                4
## 853            heure                4
## 854            heure                4
## 855         heureuse                4
## 856         heureuse                4
## 857         heureuse                4
## 858         heureuse                4
## 859         heureuse                4
## 860         heureuse                4
## 861          heureux                5
## 862          heureux                5
## 863          heureux                5
## 864          heureux                5
## 865          heureux                5
## 866          heureux                5
## 867          heureux                5
## 868          heureux                5
## 869          heureux                5
## 870            heurt                2
## 871            heurt                2
## 872            heurt                2
## 873            heurt                2
## 874            heurt                2
## 875            heurt                2
## 876          heurter                2
## 877          heurter                2
## 878          heurter                2
## 879          heurter                2
## 880          heurter                2
## 881          heurter                2
## 882           hiatus                3
## 883           hiatus                3
## 884           hiatus                3
## 885           hiatus                3
## 886           hiatus                3
## 887           hiatus                3
## 888         hibiscus                5
## 889         hibiscus                5
## 890         hibiscus                5
## 891         hibiscus                5
## 892         hibiscus                5
## 893            hibou                2
## 894            hibou                2
## 895            hibou                2
## 896            hibou                2
## 897            hibou                2
## 898            hibou                2
## 899          hickory                4
## 900          hickory                4
## 901          hideuse                2
## 902          hideuse                2
## 903          hideuse                2
## 904          hideuse                2
## 905          hideuse                2
## 906           hideux                2
## 907           hideux                2
## 908           hideux                2
## 909           hideux                2
## 910           hideux                2
## 911           hideux                2
## 912       hiérarchie                1
## 913       hiérarchie                1
## 914       hiérarchie                1
## 915       hiérarchie                1
## 916       hiérarchie                1
## 917       hiérarchie                1
## 918  hiérarchisation                2
## 919  hiérarchisation                2
## 920  hiérarchisation                2
## 921     hiérarchiser                2
## 922     hiérarchiser                2
## 923     hiérarchiser                2
## 924        hiérarque                2
## 925        hiérarque                2
## 926        hiérarque                2
## 927        hiérarque                2
## 928        hiérarque                2
## 929       hiératisme                3
## 930       hiératisme                3
## 931       hiératisme                3
## 932       hiératisme                3
## 933       hiératisme                3
## 934       hiérogamie                3
## 935       hiérogamie                3
## 936      hiéroglyphe                3
## 937      hiéroglyphe                3
## 938      hiéroglyphe                3
## 939      hiéroglyphe                3
## 940      hiéroglyphe                3
## 941    hiérogrammate                5
## 942    hiérogrammate                5
## 943      hiérophanie                3
## 944      hiérophanie                3
## 945      hiérophante                4
## 946      hiérophante                4
## 947      hiérophante                4
## 948      hiérophante                4
## 949      hiérophante                4
## 950             hile                2
## 951             hile                2
## 952             hile                2
## 953             hile                2
## 954             hile                2
## 955         Himalaya                4
## 956         Himalaya                4
## 957         Himalaya                4
## 958            hindi                3
## 959            hindi                3
## 960            hindi                3
## 961            hindi                3
## 962           hindou                5
## 963           hindou                5
## 964           hindou                5
## 965           hindou                5
## 966           hindou                5
## 967           hindou                5
## 968       hinterland                5
## 969       hinterland                5
## 970       hinterland                5
## 971       hinterland                5
## 972       hinterland                5
## 973        hipparque                5
## 974        hipparque                5
## 975         hippisme                4
## 976         hippisme                4
## 977         hippisme                4
## 978         hippisme                4
## 979            hippo                5
## 980            hippo                5
## 981            hippo                5
## 982            hippo                5
## 983       Hippocrate                5
## 984       Hippocrate                5
## 985       Hippocrate                5
## 986       Hippocrate                5
## 987       Hippocrate                5
## 988       Hippocrate                5
## 989       Hippocrate                5
## 990       hirondelle                5
## 991       hirondelle                5
## 992       hirondelle                5
## 993       hirondelle                5
## 994       Hispaniola                4
## 995       Hispaniola                4
## 996       Hispaniola                4
## 997         histoire                4
## 998         histoire                4
## 999         histoire                4
## 1000        histoire                4
## 1001        histoire                4
## 1002        histoire                4
## 1003      histologie                4
## 1004      histologie                4
## 1005      histologie                4
## 1006       historien                5
## 1007       historien                5
## 1008       historien                5
## 1009       historien                5
## 1010       historien                5
## 1011       historien                5
## 1012       historien                5
## 1013       historien                5
## 1014       historien                5
## 1015      hitlérisme                5
## 1016      hitlérisme                5
## 1017      hitlérisme                5
## 1018      hitlérisme                5
## 1019           hiver                5
## 1020           hiver                5
## 1021           hiver                5
## 1022           hiver                5
## 1023           hiver                5
## 1024           hiver                5
## 1025           hiver                5
## 1026           hiver                5
## 1027           hiver                5
## 1028             hlm                3
## 1029        hobereau                2
## 1030        hobereau                2
## 1031        hobereau                2
## 1032        hobereau                2
## 1033        hobereau                2
## 1034        hobereau                2
## 1035            hoir                4
## 1036            hoir                4
## 1037            hoir                4
## 1038            hoir                4
## 1039          hoirie                5
## 1040          hoirie                5
## 1041          hoirie                5
## 1042          hoirie                5
## 1043        Hokkaïdo                3
## 1044        Hokkaïdo                3
## 1045        Hokkaïdo                3
## 1046      hollandais                2
## 1047      hollandais                2
## 1048      hollandais                2
## 1049      hollandais                2
## 1050      hollandais                2
## 1051      hollandais                2
## 1052      hollandais                2
## 1053     hollandaise                2
## 1054     hollandaise                2
## 1055     hollandaise                2
## 1056        hollande                3
## 1057        hollande                3
## 1058         holmium                5
## 1059         holmium                5
## 1060         holmium                5
## 1061      holocauste                5
## 1062      holocauste                5
## 1063      holocauste                5
## 1064      holocauste                5
## 1065      holocauste                5
## 1066      holocauste                5
## 1067            home                3
## 1068            home                3
## 1069            home                3
## 1070            home                3
## 1071            home                3
## 1072            home                3
## 1073            home                3
## 1074            home                3
## 1075            home                3
## 1076         homélie                5
## 1077         homélie                5
## 1078         homélie                5
## 1079         homélie                5
## 1080         homélie                5
## 1081         hommage                5
## 1082         hommage                5
## 1083         hommage                5
## 1084         hommage                5
## 1085         hommage                5
## 1086         hommage                5
## 1087         hommage                5
## 1088         hommage                5
## 1089           homme                4
## 1090           homme                4
## 1091           homme                4
## 1092           homme                4
## 1093           homme                4
## 1094           homme                4
## 1095           homme                4
## 1096           homme                4
## 1097           homme                4
## 1098           homme                4
## 1099           homme                4
## 1100     homographie                5
## 1101     homographie                5
## 1102         Hongrie                1
## 1103         Hongrie                1
## 1104         Hongrie                1
## 1105         Hongrie                1
## 1106         Hongrie                1
## 1107        hongrois                2
## 1108        hongrois                2
## 1109        hongrois                2
## 1110        hongrois                2
## 1111        hongrois                2
## 1112        hongrois                2
## 1113       honnêteté                4
## 1114       honnêteté                4
## 1115       honnêteté                4
## 1116       honnêteté                4
## 1117       honnêteté                4
## 1118       honnêteté                4
## 1119         honneur                4
## 1120         honneur                4
## 1121         honneur                4
## 1122         honneur                4
## 1123         honneur                4
## 1124         honneur                4
## 1125         honneur                4
## 1126         honneur                4
## 1127         honneur                4
## 1128         honneur                4
## 1129         honneur                4
## 1130          honnir                3
## 1131          honnir                3
## 1132          honnir                3
## 1133         honorée                5
## 1134         honorée                5
## 1135         honorée                5
## 1136         honorée                5
## 1137         honorée                5
## 1138         honorer                4
## 1139         honorer                4
## 1140         honorer                4
## 1141         honorer                4
## 1142         honorer                4
## 1143         honorer                4
## 1144        Honorine                4
## 1145        Honorine                4
## 1146           honte                1
## 1147           honte                1
## 1148           honte                1
## 1149           honte                1
## 1150           honte                1
## 1151           honte                1
## 1152        honteuse                2
## 1153        honteuse                2
## 1154        honteuse                2
## 1155        honteuse                2
## 1156        honteuse                2
## 1157        honteuse                2
## 1158         honteux                2
## 1159         honteux                2
## 1160         honteux                2
## 1161         honteux                2
## 1162         honteux                2
## 1163         honteux                2
## 1164         hôpital                4
## 1165         hôpital                4
## 1166         hôpital                4
## 1167         hôpital                4
## 1168         hôpital                4
## 1169         hôpital                4
## 1170         hôpital                4
## 1171         hôpital                4
## 1172         hôpital                4
## 1173          Horace                5
## 1174          Horace                5
## 1175          Horace                5
## 1176          Horace                5
## 1177          Horace                5
## 1178          Horace                5
## 1179          Horace                5
## 1180          Horace                5
## 1181         horizon                5
## 1182         horizon                5
## 1183         horizon                5
## 1184         horizon                5
## 1185         horizon                5
## 1186         horizon                5
## 1187         horizon                5
## 1188         horizon                5
## 1189         horizon                5
## 1190         horloge                4
## 1191         horloge                4
## 1192         horloge                4
## 1193         horloge                4
## 1194         horloge                4
## 1195         horloge                4
## 1196      hornblende                2
## 1197      hornblende                2
## 1198         horreur                4
## 1199         horreur                4
## 1200         horreur                4
## 1201         horreur                4
## 1202         horreur                4
## 1203         horreur                4
## 1204       hortensia                5
## 1205       hortensia                5
## 1206       hortensia                5
## 1207       hortensia                5
## 1208    horticulture                4
## 1209    horticulture                4
## 1210    horticulture                4
## 1211           Horus                5
## 1212           Horus                5
## 1213           Horus                5
## 1214           Horus                5
## 1215           Horus                5
## 1216           Horus                5
## 1217         hosanna                5
## 1218         hosanna                5
## 1219         hosanna                5
## 1220         hosanna                5
## 1221         hosanna                5
## 1222         hospice                5
## 1223         hospice                5
## 1224         hospice                5
## 1225         hospice                5
## 1226         hospice                5
## 1227         hospice                5
## 1228         hospice                5
## 1229         hospice                5
## 1230         hospice                5
## 1231     hospitalité                4
## 1232     hospitalité                4
## 1233     hospitalité                4
## 1234     hospitalité                4
## 1235     hospitalité                4
## 1236     hospitalité                4
## 1237        hospodar                4
## 1238        hospodar                4
## 1239        hospodar                4
## 1240        hospodar                4
## 1241        hospodar                4
## 1242        hospodar                4
## 1243       hostilité                4
## 1244       hostilité                4
## 1245       hostilité                4
## 1246       hostilité                4
## 1247       hostilité                4
## 1248       hostilité                4
## 1249           hôtel                4
## 1250           hôtel                4
## 1251           hôtel                4
## 1252           hôtel                4
## 1253           hôtel                4
## 1254           hôtel                4
## 1255           hôtel                4
## 1256           hôtel                4
## 1257           hôtel                4
## 1258           hotte                1
## 1259           hotte                1
## 1260           hotte                1
## 1261           hotte                1
## 1262           hotte                1
## 1263       hottentot                2
## 1264       hottentot                2
## 1265       hottentot                2
## 1266         houblon                1
## 1267         houblon                1
## 1268         houblon                1
## 1269         houblon                1
## 1270         houblon                1
## 1271         houblon                1
## 1272         houille                1
## 1273         houille                1
## 1274         houille                1
## 1275         houille                1
## 1276         houille                1
## 1277         houille                1
## 1278      houspiller                3
## 1279      houspiller                3
## 1280      houspiller                3
## 1281        Houssaye                2
## 1282        Houssaye                2
## 1283        Houssaye                2
## 1284        Houssaye                2
## 1285          Hubert                3
## 1286          Hubert                3
## 1287          Hubert                3
## 1288          Hubert                3
## 1289          Hubert                3
## 1290          huchet                2
## 1291          huchet                2
## 1292          huchet                2
## 1293          huchet                2
## 1294          Hudson                4
## 1295          Hudson                4
## 1296          Hudson                4
## 1297          Hudson                4
## 1298          Hudson                4
## 1299        huguenot                2
## 1300        huguenot                2
## 1301        huguenot                2
## 1302        huguenot                2
## 1303        huguenot                2
## 1304        huguenot                2
## 1305        huguenot                2
## 1306       huguenote                2
## 1307       huguenote                2
## 1308    huguenotisme                3
## 1309    huguenotisme                3
## 1310          Hugues                3
## 1311          Hugues                3
## 1312          Hugues                3
## 1313          Hugues                3
## 1314          Hugues                3
## 1315          Hugues                3
## 1316          Huguet                3
## 1317          Huguet                3
## 1318        Huguette                4
## 1319        Huguette                4
## 1320         huilage                4
## 1321         huilage                4
## 1322         huilage                4
## 1323         huilage                4
## 1324           huile                4
## 1325           huile                4
## 1326           huile                4
## 1327           huile                4
## 1328           huile                4
## 1329           huile                4
## 1330          huiler                4
## 1331          huiler                4
## 1332        huilerie                4
## 1333        huilerie                4
## 1334        huilerie                4
## 1335         huileux                5
## 1336         huileux                5
## 1337         huilier                5
## 1338         huilier                5
## 1339         huilier                5
## 1340         huilier                5
## 1341            huis                3
## 1342            huis                3
## 1343            huis                3
## 1344            huis                3
## 1345            huis                3
## 1346            huit                1
## 1347            huit                1
## 1348            huit                1
## 1349            huit                1
## 1350            huit                1
## 1351            huit                1
## 1352            huit                1
## 1353           Hulot                3
## 1354           Hulot                3
## 1355          humain                3
## 1356          humain                3
## 1357          humain                3
## 1358          humain                3
## 1359          humain                3
## 1360          humain                3
## 1361          humain                3
## 1362          humain                3
## 1363         humaine                4
## 1364         humaine                4
## 1365         humaine                4
## 1366         humaine                4
## 1367         humaine                4
## 1368         humaine                4
## 1369       humanisme                5
## 1370       humanisme                5
## 1371       humanisme                5
## 1372       humanisme                5
## 1373       humanisme                5
## 1374       humanisme                5
## 1375       humanisme                5
## 1376       humanisme                5
## 1377       humanisme                5
## 1378        humanité                4
## 1379        humanité                4
## 1380        humanité                4
## 1381        humanité                4
## 1382        humanité                4
## 1383        humanité                4
## 1384           humer                2
## 1385           humer                2
## 1386           humer                2
## 1387          humeur                4
## 1388          humeur                4
## 1389          humeur                4
## 1390          humeur                4
## 1391          humeur                4
## 1392          humeur                4
## 1393        humidité                4
## 1394        humidité                4
## 1395        humidité                4
## 1396        humidité                4
## 1397        humidité                4
## 1398         humilia               NA
## 1399         humilia               NA
## 1400         humilia               NA
## 1401        humiliât               NA
## 1402        humiliât               NA
## 1403     humiliation                4
## 1404     humiliation                4
## 1405     humiliation                4
## 1406     humiliation                4
## 1407     humiliation                4
## 1408     humiliation                4
## 1409         humilie               NA
## 1410         humilie               NA
## 1411         humilie               NA
## 1412         humilie               NA
## 1413         humilie               NA
## 1414         humilie               NA
## 1415        humilier                4
## 1416        humilier                4
## 1417        humilier                4
## 1418        humilier                4
## 1419        humilier                4
## 1420        humilier                4
## 1421        humilité                4
## 1422        humilité                4
## 1423        humilité                4
## 1424        humilité                4
## 1425        humilité                4
## 1426        humilité                4
## 1427          humour                5
## 1428          humour                5
## 1429          humour                5
## 1430          humour                5
## 1431          humour                5
## 1432          humour                5
## 1433          humour                5
## 1434          humour                5
## 1435          humour                5
## 1436           humus                5
## 1437           humus                5
## 1438           humus                5
## 1439           humus                5
## 1440           humus                5
## 1441           humus                5
## 1442           humus                5
## 1443            hune                2
## 1444            hune                2
## 1445            hune                2
## 1446            hune                2
## 1447            Huon                3
## 1448            Huon                3
## 1449          hurler                2
## 1450          hurler                2
## 1451          hurler                2
## 1452          hurler                2
## 1453           huron                2
## 1454           huron                2
## 1455           huron                2
## 1456        huronien                3
## 1457       hurricane                3
## 1458         hussard                2
## 1459         hussard                2
## 1460         hussard                2
## 1461         hussard                2
## 1462         hussard                2
## 1463         hussard                2
## 1464         hussard                2
## 1465      hussitisme                2
## 1466      hussitisme                2
## 1467      hussitisme                2
## 1468      hussitisme                2
## 1469       hyacinthe                4
## 1470       hyacinthe                4
## 1471         hyalite                4
## 1472         hyalite                4
## 1473     hyaloplasme                3
## 1474     hyaloplasme                3
## 1475     hyaloplasme                3
## 1476         hydrate                5
## 1477         hydrate                5
## 1478         hydrate                5
## 1479         hydrate                5
## 1480         hydrate                5
## 1481         hydrate                5
## 1482         hydrate                5
## 1483         hydrate                5
## 1484           hydro                5
## 1485           hydro                5
## 1486           hydro                5
## 1487           hydro                5
## 1488           hydro                5
## 1489       hydrogène                5
## 1490       hydrogène                5
## 1491       hydrogène                5
## 1492       hydrogène                5
## 1493       hydrogène                5
## 1494       hydrogène                5
## 1495       hydrogène                5
## 1496    hydrographie                4
## 1497    hydrographie                4
## 1498    hydrographie                4
## 1499      hydrologie                4
## 1500      hydrologie                4
## 1501      hydrologie                4
## 1502       hydrolyse                4
## 1503       hydrolyse                4
## 1504       hydrolyse                4
## 1505       hydrolyse                4
## 1506       hydrolyse                4
## 1507           hyène                3
## 1508           hyène                3
## 1509           hyène                3
## 1510           Hygie                4
## 1511         hygiène                4
## 1512         hygiène                4
## 1513         hygiène                4
## 1514         hygiène                4
## 1515         hygiène                4
## 1516           hyper                5
## 1517           hyper                5
## 1518           hyper                5
## 1519           hyper                5
## 1520           hyper                5
## 1521           hyper                5
## 1522        Hypérion                5
## 1523        Hypérion                5
## 1524        Hypérion                5
## 1525    hypertrophie                4
## 1526    hypertrophie                4
## 1527    hypertrophie                4
## 1528    hypertrophie                4
## 1529      hypothèque                4
## 1530      hypothèque                4
## 1531      hypothèque                4
## 1532      hypothèque                4
## 1533      hypothèque                4
## 1534      hypothèque                4
## 1535      hypothèque                4
## 1536      hypothèque                4
## 1537      hypothèque                4
## 1538       hypothèse                4
## 1539       hypothèse                4
## 1540       hypothèse                4
## 1541       hypothèse                4
## 1542       hypothèse                4
## 1543       hypothèse                4
## 1544           iambe                5
## 1545           iambe                5
## 1546           iambe                5
## 1547           iambe                5
## 1548            Iéna                5
## 1549            Iéna                5
## 1550            Iéna                5
## 1551        Iénisséi                5
## 1552        Iénisséi                5
## 1553        Iénisséi                5
## 1554        Iénisséi                5
## 1555          iodate                5
## 1556          iodate                5
## 1557          iodate                5
## 1558          iodate                5
## 1559          iodate                5
## 1560            iode                5
## 1561            iode                5
## 1562            iode                5
## 1563            iode                5
## 1564            iode                5
## 1565            iode                5
## 1566       iodoforme                5
## 1567       iodoforme                5
## 1568       iodoforme                5
## 1569       iodoforme                5
## 1570          iodure                5
## 1571          iodure                5
## 1572          iodure                5
## 1573          iodure                5
## 1574          iodure                5
## 1575          iodure                5
## 1576           Ionie                4
## 1577           Ionie                4
## 1578      ionisation                4
## 1579      ionisation                4
## 1580      ionisation                4
## 1581      ionosphère                5
## 1582      ionosphère                5
## 1583            iota                4
## 1584            iota                4
## 1585            iota                4
## 1586            iota                4
## 1587            Iowa                5
## 1588            Iowa                5
## 1589            Iowa                5
## 1590            Iowa                5
## 1591            iule                4
## 1592            iule                4
## 1593            iule                4
## 1594          oignon                5
## 1595          oignon                5
## 1596          oignon                5
## 1597          oignon                5
## 1598          oignon                5
## 1599          oignon                5
## 1600          oignon                5
## 1601             oil                5
## 1602             oil                5
## 1603             oil                5
## 1604           oille                4
## 1605           oille                4
## 1606          oindre                4
## 1607          oindre                4
## 1608          oindre                4
## 1609          oindre                4
## 1610            oing                5
## 1611            oing                5
## 1612            oing                5
## 1613            oing                5
## 1614            oint                4
## 1615            oint                4
## 1616            oint                4
## 1617            oint                4
## 1618            oint                4
## 1619            oint                4
## 1620            oint                4
## 1621            oint                4
## 1622            oint                4
## 1623            oint                4
## 1624            Oise                4
## 1625            Oise                4
## 1626          oiseau                5
## 1627          oiseau                5
## 1628          oiseau                5
## 1629          oiseau                5
## 1630          oiseau                5
## 1631          oiseau                5
## 1632          oiseau                5
## 1633          oiseau                5
## 1634          oiseau                5
## 1635           oison                5
## 1636           oison                5
## 1637           oison                5
## 1638           oison                5
## 1639           oison                5
## 1640            onze                2
## 1641            onze                2
## 1642            onze                2
## 1643            onze                2
## 1644            onze                2
## 1645            onze                2
## 1646            onze                2
## 1647           Ouadi                3
## 1648           Ouadi                3
## 1649           ouadi                4
## 1650           ouadi                4
## 1651           Ouadi                3
## 1652           ouadi                4
## 1653           Ouadi                3
## 1654           ouadi                4
## 1655           ouadi                4
## 1656            oued                5
## 1657            oued                5
## 1658            oued                5
## 1659            oued                5
## 1660            oued                5
## 1661             ouï                3
## 1662             ouï                3
## 1663             ouï                3
## 1664             ouï                3
## 1665             ouï                3
## 1666             ouï                3
## 1667            ouïr                4
## 1668            ouïr                4
## 1669            ouïr                4
## 1670            ouïr                4
## 1671            ouïr                4
## 1672            ouïr                4
## 1673        ouistiti                3
## 1674        ouistiti                3
## 1675        ouistiti                3
## 1676        ouistiti                3
## 1677        ouistiti                3
## 1678          ouolof                3
## 1679          ouolof                3
## 1680             oye                5
## 1681             oye                5
## 1682             oye                5
## 1683             oye                5
## 1684           Utica                5
## 1685           Utica                5
## 1686           yacht                1
## 1687           yacht                1
## 1688           yacht                1
## 1689           yacht                1
## 1690           yacht                1
## 1691           yacht                1
## 1692           yacht                1
## 1693           yacht                1
## 1694            yack                2
## 1695            yack                2
## 1696            yack                2
## 1697            yang                2
## 1698            yang                2
## 1699            yang                2
## 1700            yang                2
## 1701            yard                2
## 1702            yard                2
## 1703            yard                2
## 1704            yard                2
## 1705         yatagan                2
## 1706         yatagan                2
## 1707         yatagan                2
## 1708         yatagan                2
## 1709           Yémen                2
## 1710           Yémen                2
## 1711           Yémen                2
## 1712           Yémen                2
## 1713           Yémen                2
## 1714             yen                1
## 1715             yen                1
## 1716             yen                1
## 1717             yen                1
## 1718             yen                1
## 1719             yen                1
## 1720             yen                1
## 1721           yeuse                5
## 1722           yeuse                5
## 1723            yoga                1
## 1724            yoga                1
## 1725            yoga                1
## 1726            yoga                1
## 1727            yoga                1
## 1728        Yokohama                2
## 1729        Yokohama                2
## 1730        Yokohama                2
## 1731            yole                2
## 1732            yole                2
## 1733            yole                2
## 1734            yole                2
## 1735           Yonne                4
## 1736           Yonne                4
## 1737          Yunnan                1
## 1738          Yunnan                1
## 1739          Yunnan                1
## 1740          Yunnan                1
## 1741          Yunnan                1

Preliminaries to fitting grammar models

For the 8 Word1s examined above and the 5 Word2 categories, make a table of token-weighted V and C type counts: sum (rather than average) the voculence values to get the V counts, and sum the (1-voculence) values to get the C counts:

#Get the target subset
mySubset <- subset(french, word1 %in% c("au_àl", "beau_bel", "de_d", "du_del", "la_l", "le_l", "ma_mon", "vieux_vieil"))

#Get rid of unused levels
mySubset$word1 <- factor(mySubset$word1)

#Make table for this subset
v_counts_subset <- tapply(mySubset$voculence, list(mySubset$word1, mySubset$aspire_rank_word), FUN=sum, na.rm=TRUE)
c_counts_subset <- tapply((1-mySubset$voculence), list(mySubset$word1, mySubset$aspire_rank_word), FUN=sum, na.rm=TRUE)
v_and_c_table_subset <- cbind(as.data.frame.table(v_counts_subset),as.data.frame.table(c_counts_subset))
colnames(v_and_c_table_subset) <- c("word1", "word2_group", "v_count", "word1_dup", "word2_group_dup", "c_count")
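
As a minimal sketch of what the two tapply() calls compute, here is the same summing on a toy data frame. All values are made up for illustration, and the sketch assumes that voculence is a per-item proportion between 0 and 1.

```r
# Toy data (hypothetical values, not from the corpus): one row per Word1+Word2 item
toy <- data.frame(
  word1 = factor(c("le_l", "le_l", "le_l", "de_d", "de_d")),
  aspire_rank_word = c(1, 1, 2, 1, 2),
  voculence = c(0.1, 0.3, 0.9, 0.2, NA)
)

# Sum voculence within each word1 x Word2-group cell (the "V" count) ...
toy_v <- tapply(toy$voculence, list(toy$word1, toy$aspire_rank_word), FUN = sum, na.rm = TRUE)
# ... and sum (1 - voculence) within the same cells (the "C" count)
toy_c <- tapply(1 - toy$voculence, list(toy$word1, toy$aspire_rank_word), FUN = sum, na.rm = TRUE)

toy_v
toy_c
```

Note that with na.rm=TRUE a cell whose only values are NA sums to 0, whereas a cell with no rows at all comes out as NA.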

Multiplicative model

One of the models to fit to the data is a multiplicative model: give each Word1 a probability of creating a configuration compatible with resyllabification/unaspiratedness, and give each Word2 a probability of behaving as unaspirated if it is in the right configuration. The probability of a Word1+Word2 combination behaving as unaspirated is then the product of those two probabilities (hence ‘multiplicative’). The problem is that a model like this can create a pinch at only one end, resulting in a “claw” shape rather than a “wug” shape (Dustin Bowers’s term).
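
The one-sided pinch can be seen in a small sketch. The probabilities below are hypothetical and chosen only for illustration; they are not fitted values.

```r
# Hypothetical probabilities, for illustration only
p_word1 <- c(a = 0.95, b = 0.80, c = 0.60)   # three made-up Word1s
p_word2 <- c(0.01, 0.25, 0.50, 0.75, 0.99)   # five made-up Word2 groups

# The predicted rate for every combination is the product of the two probabilities
pred <- outer(p_word1, p_word2)

# Spread between the highest and lowest Word1 within each Word2 group
spread <- apply(pred, 2, max) - apply(pred, 2, min)
spread
```

Because every Word1 is scaled by the same Word2 probability, the spread shrinks toward zero only where the Word2 probability is near zero. At the other end the Word1s stay as far apart as their own probabilities are.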

Fit the optimal probabilities:

myMultiplicative <- function(x) {   #define the error function that is to be minimized
    
    #log likelihood
    log_likelihood <- 0
    for (i in 1:8) { #loop over word1s
      for(j in 1:5) { #loop over word2 groups
        #increment log likelihood by the (token-weighted) number of V items times the log of their probability, plus the (token-weighted) number of C items times the log of their probability.
        #x[1:8] are the probabilities for the Word1s, and x[9:13] are the probabilities for the Word2 groups; row i+8*(j-1) of v_and_c_table_subset holds the counts for Word1 i and Word2 group j (column 3 is v_count, column 6 is c_count)
        log_likelihood <- log_likelihood + v_and_c_table_subset[i+8*(j-1),3]*log(x[i]*x[8+j]) + v_and_c_table_subset[i+8*(j-1),6]*log(1-(x[i]*x[8+j]))
      }
    }
    return(-1*log_likelihood) #by default optim() minimizes, so we minimize the negative log likelihood (that is, get log likelihood as close to zero as possible)
}

#run the optimizer
myOptimization <- optim(par=c(0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5, 0.5,0.5,0.5,0.5,0.5), fn=myMultiplicative, lower=0.000001, upper=1-0.000001, method="L-BFGS-B") #we have to keep the parameters away from actual 0 and 1 or else undefined log()s will result and an error is thrown

#view the parameters
myOptimization$par
##  [1] 0.995604 0.999999 0.982975 0.967096 0.995730 0.960713 0.999999
##  [8] 0.955863 0.005954 0.027870 0.340271 0.937182 0.999999
#get the log likelihood
-1*myOptimization$value
## [1] -207.9
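
The loop-based objective above can also be written without explicit indexing. The following is a sketch only, run on made-up counts: the function name neg_log_lik and the toy matrices are hypothetical, and the counts are assumed to be arranged as a Word1-by-Word2-group matrix with no NA cells.

```r
# Negative log likelihood of the multiplicative model, vectorized.
# v_counts and c_counts are n_word1 x n_word2 matrices of V and C counts;
# x holds the Word1 probabilities followed by the Word2-group probabilities.
neg_log_lik <- function(x, v_counts, c_counts) {
  n1 <- nrow(v_counts)
  n2 <- ncol(v_counts)
  p <- outer(x[1:n1], x[(n1 + 1):(n1 + n2)])  # p[i, j] = x[i] * x[n1 + j]
  -sum(v_counts * log(p) + c_counts * log(1 - p))
}

# Toy counts, made up for illustration: 2 Word1s x 3 Word2 groups
v_toy <- matrix(c(1, 2, 10, 12, 30, 40), nrow = 2)
c_toy <- matrix(c(30, 40, 10, 9, 1, 2), nrow = 2)

# Same optimizer settings as above: bounded away from 0 and 1 so log() stays finite
fit <- optim(par = rep(0.5, 5), fn = neg_log_lik,
             v_counts = v_toy, c_counts = c_toy,
             method = "L-BFGS-B", lower = 1e-6, upper = 1 - 1e-6)
fit$par
-1 * fit$value
```

Passing the counts as arguments, rather than reading a global table inside the function, avoids hard-coding the 8 and 5 of the real data.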

Take a look at the resulting model:

#make a matrix of predicted probabilities
multVector <- c()
for (i in 1:8) {
  for (j in 9:13) {
    multVector <- c(multVector,myOptimization$par[i]*myOptimization$par[j])
  }
}

multMatrix <- matrix(multVector, nrow=8, ncol=5, byrow=TRUE)

#put it in a data frame, and fix the column names and levels of word1 and word2Group
multDataFrame <- as.data.frame.table(multMatrix)
colnames(multDataFrame) <- c("word1","word2Group", "multiplicativePrediction")
levels(multDataFrame$word1) <- c("au_àl", "beau_bel",  "de_d", "du_del", "la_l","le_l","ma_mon", "vieux_vieil")
levels(multDataFrame$word2Group) <- c(1,2,3,4,5)

#make an interaction plot
interaction.plot(x.factor=multDataFrame$word2Group, trace.factor=multDataFrame$word1, response=multDataFrame$multiplicativePrediction, main="Fitted multiplicative model", xlab="Word2 unaspiratedness group",ylab="rate of non-alignment", trace.label="Word1 group", fixed=TRUE, xaxt="n")
axis(1, at=1:5, labels=1:5)

plot of chunk viewMultiplicativeResults

Make an interaction plot that groups the Word1s into three groups (one can’t just take the average, because each Word1 occurs with a different number of Word2s in each group). To do this, add a new column to the french dataframe that contains the multiplicative model’s prediction, then make an interaction plot on that new variable:

#initialize new column
french$multiplicativePrediction <- NA

#look up values
for (i in 1:length(french$word1)) {
  #get the value; we need to use as.character() because "word1" has more levels in the full "french" data frame than in the "multDataFrame" data frame.
  multiplicativeValue <- multDataFrame$multiplicativePrediction[as.character(multDataFrame$word1)==as.character(french$word1[i]) & multDataFrame$word2Group==french$aspire_rank_word[i]]
  
  #use it if it's not NA
  if(is.na(multiplicativeValue[1])==FALSE) { #use just the first element of multiplicativeValue, to avoid throwing a warning when multiplicativeValue is a vector of NAs, e.g. c(NA,NA,NA); when it's not NA, it will be a single number
    french$multiplicativePrediction[i] <- multiplicativeValue
  }
}
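
The row-by-row lookup above could also be done in one step with match(). This is a sketch on toy stand-ins for the two data frames; pred_toy, data_toy, and their values are hypothetical, and the sketch assumes each Word1 and Word2-group pair occurs once in the prediction table.

```r
# Toy stand-ins (hypothetical values) for the prediction table and the main data frame
pred_toy <- data.frame(
  word1 = c("le_l", "le_l", "de_d", "de_d"),
  word2Group = c(1, 2, 1, 2),
  multiplicativePrediction = c(0.10, 0.90, 0.05, 0.80),
  stringsAsFactors = FALSE
)
data_toy <- data.frame(
  word1 = c("de_d", "le_l", "un_un", "le_l"),
  aspire_rank_word = c(2, 1, 1, NA),
  stringsAsFactors = FALSE
)

# Build one text key per row on each side, then look up every row at once
key_pred <- paste(as.character(pred_toy$word1), pred_toy$word2Group)
key_data <- paste(as.character(data_toy$word1), data_toy$aspire_rank_word)
data_toy$multiplicativePrediction <-
  pred_toy$multiplicativePrediction[match(key_data, key_pred)]

data_toy
```

Rows with no counterpart in the prediction table (an unmodelled Word1, or a missing group) are left as NA, as in the loop.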

#Make interaction plot as before:
par(mar=c(5,5,3,6))
with(french, {
  interaction.plot(x.factor=as.factor(french$aspire_rank_word), 
  trace.factor=factor(word1_group), 
  response=multiplicativePrediction,
  xlab="Word2 alignancy group",ylab="Multiplicative model predicted rate of non-alignment"
  , trace.label="Word1 group", fixed=TRUE)
})

plot of chunk multiplicativeInteractionPlot

#Since this will appear in the paper, also do a PNG file:
png(file="multiplicative_predictions_plot.png",width=myResMultiplier*460,height=myResMultiplier*300, res=myResMultiplier*72, family=myFontFamily)
par(mar=c(5,4,2,0)+0.1)
with(french, {
  interaction.plot(x.factor=as.factor(french$aspire_rank_word), 
  trace.factor=factor(word1_group), 
  response=multiplicativePrediction,
  xlab="Word2 alignancy group",ylab="Multiplicative model predicted rate of non-alignment"
  , trace.label="Word1 group", fixed=TRUE)
})
dev.off()
## pdf 
##   2

Making OTSoft input file

To aid reproducibility and record-keeping, we generate an OTSoft input file automatically from this script.

Print first two rows (constraint names):

write(c("\t\t\tAlign_1\tAlign_2\tAlign_3\tAlign_4\tAlign_5\tNoHiatus\tUseAu\tUseBeau\tUseDe\tUseDu\tUseLa\tUseLe\tUseMa\tUseVieux","\t\t\tAlign_1\tAlign_2\tAlign_3\tAlign_4\tAlign_5\tNoHiatus\tUseAu\tUseBeau\tUseDe\tUseDu\tUseLa\tUseLe\tUseMa\tUseVieux"), file="French_for_OTSoft_targetWord1sOnly.txt",append=FALSE) #overwrite anything there already

A function to turn each word1 into a number (will be useful in the following code chunk). This works because the order of Use constraints is alphabetical, as is the order of word1 levels in the data frame v_and_c_table_subset.

#Yes, I know I could make this more general by making the table another input argument to the function, but I didn't bother. It's a little different for each output-file format, so it's probably best to do this afresh for each model below.
word1_to_num_subset <- function(myString) {
  for(i in 1:length(levels(v_and_c_table_subset$word1))) {
    if(myString==levels(v_and_c_table_subset$word1)[i]) {
      return(i)
    }
  }
}
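
The loop above returns the position of a string among the factor levels. As a minimal sketch, the same lookup can be written with base R's match(). The toy factor and the name word1_to_num_toy are illustrative only; they are not part of the original script.

```r
# Toy stand-in for v_and_c_table_subset$word1 (illustrative levels only)
toy_word1 <- factor(c("de_d", "beau_bel", "vieux_vieil", "beau_bel"))

# Position of a string among the (alphabetically sorted) factor levels
word1_to_num_toy <- function(myString) {
  match(as.character(myString), levels(toy_word1))
}

word1_to_num_toy("beau_bel")     # first level
word1_to_num_toy("vieux_vieil")  # third level
word1_to_num_toy("not_a_level")  # match() gives NA; the loop version would return NULL
```

One difference worth knowing about: an unmatched string yields NA here, which fails loudly when used as an index offset, whereas the loop version silently returns NULL.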

Print each row of v_and_c_table_subset as two tableau rows

for(i in 1:dim(v_and_c_table_subset)[1]) {
  
  #don't use this row if it has NAs
  if(is.na(v_and_c_table_subset$v_count[i])==FALSE) {
  
      #put together the input string
      myInput <- paste(v_and_c_table_subset$word1[i],"+","group_",v_and_c_table_subset$word2_group[i],sep="")
    
      #get the violation vector for the "voculent" (unaspirated) candidate
      #one violation of Align and one of UseX
      voc_violations <- rep(0,14) #initialize to all zeros--note that there are only 14 now
      voc_violations[as.numeric(v_and_c_table_subset$word2_group[i])] <- 1 # add 1 for the Align constraint
      voc_violations[word1_to_num_subset(v_and_c_table_subset$word1[i])+6] <- 1 #add 6: 5 for the Align constraints and 1 for NoHiatus; substitute 1 violation for the relevant UseX constraint
      voc_violations <- paste(as.character(voc_violations),collapse="\t")
      
      #get the violation vector for the "consulent" (aspirated) candidate
      #just a violation of NoHiatus (#6 constraint)
      cons_violations <- rep(0,14) #initialize to all zeros
      cons_violations[6] <- 1
      cons_violations <- paste(as.character(cons_violations),collapse="\t")
      
  
    write(c(paste(myInput,"voculent",v_and_c_table_subset$v_count[i],voc_violations,sep="\t"),paste("","consulent",v_and_c_table_subset$c_count[i],cons_violations,sep="\t")),file="French_for_OTSoft_targetWord1sOnly.txt",append=TRUE)
  }
}

#While the above produces an output file that *looks* fine, and that the MaxEnt Grammar Tool has no problem with, OTSoft ignores its last line for some reason. This seems to fix the problem:
write(c("\t\t\t\t\t\t\t\t\t"),file="French_for_OTSoft_targetWord1sOnly.txt",append=TRUE)
##BUT! The line with the tabs has to be removed in order to run Bruce's multifold GLA program, or else it acts as an additional candidate (with no constraint violations, so always wins)
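
To make the column arithmetic in the loop above concrete, here is a small sketch that builds the two violation strings for a single input. The helper name make_violation_rows is hypothetical; the column layout (5 Align constraints, then NoHiatus, then 8 Use constraints) is the one written to the header rows.

```r
# Hypothetical helper: build the two tab-separated violation strings for one input.
# word1_index: position of word1 among the 8 Use constraints (1-8)
# group: Word2 group (1-5)
make_violation_rows <- function(word1_index, group, n_constraints = 14) {
  voc <- rep(0, n_constraints)
  voc[group] <- 1            # one violation of Align_<group>
  voc[word1_index + 6] <- 1  # one violation of Use<word1>: skip 5 Align columns and NoHiatus
  cons <- rep(0, n_constraints)
  cons[6] <- 1               # one violation of NoHiatus
  c(voculent  = paste(voc, collapse = "\t"),
    consulent = paste(cons, collapse = "\t"))
}

# Example: second word1 (UseBeau is the 2nd Use constraint) with a group-3 Word2
rows <- make_violation_rows(word1_index = 2, group = 3)
cat(rows, sep = "\n")
```

For this input the "voculent" row has its 1s in columns 3 (Align_3) and 8 (UseBeau), and the "consulent" row has its single 1 in column 6 (NoHiatus).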

Constraint models

Outside of this script, the resulting file French_for_OTSoft_targetWord1sOnly.txt (moved to folder \\French\OTModels_new) is then used as input for a few constraint models. (With some easy modifications, we could also use French_for_OTSoft.txt if we wanted, but I decided it makes more sense to fit the model just to the data that we’re talking about in the empirical section.)

MaxEnt

The MaxEnt Grammar Tool (http://www.linguistics.ucla.edu/people/hayes/MaxentGrammarTool/) is used, with default settings. In particular, we use the default values of mu and sigma for every constraint (mu=0, sigma=10000). The large sigma means that, in effect, there is no regularization/smoothing/prior: the fit is as close as possible. The output file is named French_for_OTSoft_targetWord1sOnly_MaxEnt_output.txt.

Read in the MaxEnt output file:

conn <- file("French_for_OTSoft_targetWord1sOnly_MaxEnt_output.txt",open="r")
maxEntLines <- readLines(conn)
close(conn)

Find and parse the lines at end that give probabilities for candidates:

#Initialize data frame
maxEntPredictions <- data.frame(word1=character(0), word2Group=character(0), unaspProb = numeric(0))

#Define function to find the header row for this part
getHeaderRowIndex <- function() {
  for(i in 1:length(maxEntLines)) {
    if(maxEntLines[i]=="Input:\tCandidate:\tObserved:\tPredicted:") {
      return(i)
    }
  }
}

#Use the function to see where to start for loop
headerRowIndex <- getHeaderRowIndex()

for(i in headerRowIndex:length(maxEntLines)){
  #split into columns
  lineParts <- strsplit(maxEntLines[i],split="\t")
  
  #Don't bother proceeding further if it's a "consulent" candidate line
  if(lineParts[[1]][2]=="voculent") {
    
    #extract word1, word2 group, and probability of unaspirated ("voculent") candidate
    myWord1 <- strsplit(lineParts[[1]][1],split="+",fixed=TRUE)[[1]][1]
    myGroup <- str_sub(lineParts[[1]][1],-1)
    myVoculenceProbability <- as.numeric(lineParts[[1]][4])
    
    #add to data frame
    maxEntPredictions <- rbind(maxEntPredictions,data.frame(word1=myWord1,word2Group=myGroup,unaspProb=myVoculenceProbability))
  }

}
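
For readers who prefer a vectorized style, the header search and line parsing above can be sketched in base R as follows. The toy lines are invented and only imitate the layout the loop assumes (candidate name in the second tab-separated field, predicted probability in the fourth); toyLines and toyPredictions are illustrative names.

```r
# Toy lines imitating the tail of the MaxEnt output file (values are made up)
toyLines <- c("some earlier output",
              "Input:\tCandidate:\tObserved:\tPredicted:",
              "beau_bel+group_1\tvoculent\t3\t0.02",
              "\tconsulent\t97\t0.98",
              "beau_bel+group_5\tvoculent\t90\t0.95",
              "\tconsulent\t10\t0.05")

# Header row: first exact match
headerRow <- which(toyLines == "Input:\tCandidate:\tObserved:\tPredicted:")[1]

# Split the candidate lines into fields and keep only the "voculent" rows
fields <- strsplit(toyLines[(headerRow + 1):length(toyLines)], split = "\t", fixed = TRUE)
voc <- Filter(function(f) f[2] == "voculent", fields)

toyPredictions <- data.frame(
  word1      = sapply(voc, function(f) strsplit(f[1], "+", fixed = TRUE)[[1]][1]),
  word2Group = sapply(voc, function(f) substr(f[1], nchar(f[1]), nchar(f[1]))),
  unaspProb  = sapply(voc, function(f) as.numeric(f[4])),
  stringsAsFactors = FALSE
)
toyPredictions
```

Building the data frame in one step also avoids growing it with rbind() inside a loop, though at this data size the difference is negligible.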

Make an interaction plot:

par(mar=c(5,5,2,6))
with(maxEntPredictions, {
  interaction.plot(x.factor=as.factor(maxEntPredictions$word2Group), 
  trace.factor=factor(word1), 
  response=unaspProb,
  xlab="Word2 unaspiratedness group",ylab="rate of behaving as if Word2 is unaspirated"
  , trace.label="Word1 group", fixed=TRUE, main="MaxEnt model")
})

plot of chunk maxEntResultsPlot

To match the presentation of the observed data, we want to group these word1s into groups A, B, and C as before for the observed data. But, we can’t just take averages over all the word1s in a group, because in the original interaction plot of the observed data, average rate is weighted by number of items in that cell. In other words, if there are a lot more beau than vieux that occur with Group2 word2s, then beau contributes more heavily to the Group A+Group 2 average unaspiratedness rate. So instead we add MaxEnt prediction as a column to the main data frame, and plot it in the same way as the observed probabilities.
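
A toy illustration of why the plain average would not match the plotted average (all numbers here are invented, not taken from the data):

```r
# Hypothetical cell: Group A word1s with Group 2 word2s (all numbers invented)
toy <- data.frame(word1      = c("beau_bel", "vieux_vieil"),
                  n_items    = c(30, 5),      # how many Word2 items each Word1 occurs with
                  prediction = c(0.10, 0.40)) # model's predicted rate for each Word1

unweighted <- mean(toy$prediction)                        # treats both Word1s equally
weighted   <- weighted.mean(toy$prediction, toy$n_items)  # what averaging over rows yields
c(unweighted = unweighted, weighted = weighted)
```

Because interaction.plot() averages over the rows of the data frame it is given, attaching the prediction to every row of french reproduces the weighted figure (5/35 in this toy case) rather than the unweighted one (0.25).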

Add a column to french that is MaxEnt predicted prob:

#initialize new column
french$MaxEntPrediction <- NA

#look up values
for (i in 1:length(french$word1)) {
  #get the value; we need to use as.character() because "word1" has more levels in the full "french" data frame than in the "maxEntPredictions" data frame.
  maxEntValue <- maxEntPredictions$unaspProb[as.character(maxEntPredictions$word1)==as.character(french$word1[i]) & maxEntPredictions$word2Group==french$aspire_rank_word[i]]
  
  #use it if it's not NA
  if(is.na(maxEntValue[1])==FALSE) { #use just first element of maxEntValue, to avoid throwing warning when maxEntValue is vector of NAs, e.g. c(NA,NA,NA); when it's not NA, it will be a single number
    french$MaxEntPrediction[i] <- maxEntValue
  }
}
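
The row-by-row lookup above could also be done in one step by matching on a combined key. This is only a sketch on toy data frames (toyFrench and toyPred are invented stand-ins that reuse the script's column names), not a replacement we have run on the real data.

```r
# Toy stand-ins (values invented); column names follow the script
toyFrench <- data.frame(word1 = c("beau_bel", "beau_bel", "le_l", "other"),
                        aspire_rank_word = c(1, 2, 1, 1),
                        stringsAsFactors = FALSE)
toyPred <- data.frame(word1 = c("beau_bel", "beau_bel", "le_l"),
                      word2Group = c("1", "2", "1"),
                      unaspProb = c(0.02, 0.30, 0.10),
                      stringsAsFactors = FALSE)

# Build a shared key (word1 plus group) and look every row up at once;
# rows of toyFrench with no counterpart in toyPred stay NA
key_french <- paste(toyFrench$word1, toyFrench$aspire_rank_word)
key_pred   <- paste(toyPred$word1, toyPred$word2Group)
toyFrench$MaxEntPrediction <- toyPred$unaspProb[match(key_french, key_pred)]
toyFrench
```

The paste() step turns both factors and numbers into plain strings, so the mismatch in factor levels that motivates as.character() in the loop does not arise.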

Make interaction plot as before:

par(mar=c(5,4,2,0)+0.1)
with(french, {
  interaction.plot(x.factor=as.factor(french$aspire_rank_word), 
  trace.factor=factor(word1_group), 
  response=MaxEntPrediction,
  xlab="Word2 alignancy group",ylab="MaxEnt predicted rate of non-alignment"
  , trace.label="Word1 alignancy group", fixed=TRUE)
})

plot of chunk plotMaxEntPredictionsThreeTraces

#Since this will appear in the paper, also do a PNG file:
png(file="MaxEnt_predictions_plot.png",width=myResMultiplier*460,height=myResMultiplier*300, res=myResMultiplier*72, family=myFontFamily)
par(mar=c(5,4,2,0)+0.1)
with(french, {
  interaction.plot(x.factor=as.factor(french$aspire_rank_word), 
  trace.factor=factor(word1_group), 
  response=MaxEntPrediction,
  xlab="Word2 alignancy group",ylab="MaxEnt predicted rate of non-alignment"
  , trace.label="Word1 alignancy group", fixed=TRUE)
})
dev.off()
## pdf 
##   2

Get a table of constraint names and weights:

#Initialize data frame
maxEntGrammar <- data.frame(constraintName=character(0), constraintWeight = numeric(0))

#Define function to find the header row for this part
getConstraintHeaderRowIndex <- function() {
  for(i in 1:length(maxEntLines)) {
    if(maxEntLines[i]=="|weights| after optimization:") {
      return(i)
    }
  }
}

#Use the function to see where to start for loop
constraintHeaderRowIndex <- getConstraintHeaderRowIndex()

#Extract names and weights and add to data frame
for(i in (constraintHeaderRowIndex+1):(headerRowIndex-1)){
  lineParts <- strsplit(maxEntLines[i],split="\t")
  myConstraintName <- strsplit(lineParts[[1]][1],split=" ")[[1]][1]
  myConstraintWeight <- lineParts[[1]][2]

  maxEntGrammar <- rbind(maxEntGrammar,data.frame(constraintName=myConstraintName, constraintWeight=myConstraintWeight))
  
}

#Print to console, for pasting into Word document
maxEntGrammar
##    constraintName   constraintWeight
## 1         Align_1 10.131961658471631
## 2         Align_2  8.008200752338379
## 3         Align_3  4.950545256495552
## 4         Align_4 2.0633383837085844
## 5         Align_5                0.0
## 6        NoHiatus  6.198762474587922
## 7           UseAu  2.260387406596071
## 8         UseBeau 0.2826195950363789
## 9           UseDe 1.5600878659640336
## 10          UseDu 2.7769451224452797
## 11          UseLa 1.3403914301234388
## 12          UseLe 2.5436458101675825
## 13          UseMa                0.0
## 14       UseVieux 0.8793520626192605
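
As a sanity check on how these weights translate into predictions, here is a minimal sketch for a single input, beau_bel with a group-3 Word2. It assumes the standard MaxEnt formulation, in which a candidate's probability is proportional to exp of minus its weighted violation sum; it is not the MaxEnt Grammar Tool's own code, and the weights are rounded copies of the table above.

```r
# Weights copied from the table above (rounded to 4 decimals)
w <- c(Align_3 = 4.9505, NoHiatus = 6.1988, UseBeau = 0.2826)

# Penalty (weighted violation sum) of each candidate for beau_bel + group_3
penalty <- c(voculent  = w[["Align_3"]] + w[["UseBeau"]],  # violates Align_3 and UseBeau
             consulent = w[["NoHiatus"]])                  # violates NoHiatus only

# MaxEnt: probability proportional to exp(-penalty)
p <- exp(-penalty) / sum(exp(-penalty))
round(p, 3)
```

Under these assumptions the "voculent" candidate comes out at roughly 0.72, because its total penalty is about one unit lower than that of the "consulent" candidate.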

Noisy Harmonic Grammar

We use OTSoft (http://www.linguistics.ucla.edu/people/hayes/otsoft/) to fit a Noisy Harmonic Grammar model using the Gradual Learning Algorithm. Default options were used everywhere:

  • number of times to run learning: 10,000,000
  • apply noise by tableau cell not constraint? NO
  • apply noise after multiplication of weights by violation? NO
  • Allow constraint weights to go negative? NO
  • initial plasticity: 0.01 (used to be 2, but we reduced it to get smaller weights)
  • final plasticity: 0.001
  • number of times to test grammar: 1,000,000
  • default initial rankings (all same)

The only non-default option is that in the main screen, Options > Sort candidates in tableaux by harmony is turned off, to make dealing with the OTSoft output, below, easier.

Read in the Noisy HG output file:

conn <- file("French_for_OTSoft_targetWord1sOnlyTabbedOutput_NHG.txt",open="r")
NHGLines <- readLines(conn)
close(conn)

Find and parse the lines at end that give probabilities for candidates. To do this, we need to find the third line that starts with " 1 " :

#Initialize data frame
NHGPredictions <- data.frame(word1=character(0), word2Group=character(0), unaspProb = numeric(0))

#Define function to find the header row for this part
howManyTimes1Seen <- 0 #initialize counter
getNHGStartingRowIndex <- function() {
  for(i in 1:length(NHGLines)) {
    if(str_sub(NHGLines[i],1,4)==" 1 \t") {
      howManyTimes1Seen <- howManyTimes1Seen + 1
      if(howManyTimes1Seen==3) {
        return(i) #once we hit the third one, return the index
      }
    }
  }
}

#Use the function to see where to start for loop
startingRowIndex <- getNHGStartingRowIndex()

for(i in startingRowIndex:length(NHGLines)){
  #split into columns
  lineParts <- strsplit(NHGLines[i],split="\t")
    
  #extract word1, word2 group, and probability of unaspirated ("voculent") candidate
  myWord1 <- strsplit(lineParts[[1]][2],split="+",fixed=TRUE)[[1]][1]
  myGroup <- str_sub(lineParts[[1]][2],-1)
  
  #different lines have candidates in different orders
  if(lineParts[[1]][4]=="consulent") {
    myVoculenceProbability <- 1 - as.numeric(lineParts[[1]][7])/1000000
  }
  else if(lineParts[[1]][4]=="voculent") {
    myVoculenceProbability <- as.numeric(lineParts[[1]][7])/1000000
  }
    
  #add to data frame
  NHGPredictions <- rbind(NHGPredictions,data.frame(word1=myWord1,word2Group=myGroup,unaspProb=myVoculenceProbability))

}

Make an interaction plot:

par(mar=c(5,5,2,6))
with(NHGPredictions, {
  interaction.plot(x.factor=as.factor(NHGPredictions$word2Group), 
  trace.factor=factor(word1), 
  response=unaspProb,
  xlab="Word2 unaspiratedness group",ylab="rate of behaving as if Word2 is unaspirated"
  , trace.label="Word1 group", fixed=TRUE, main="Noisy Harmonic Grammar model")
})

plot of chunk NHGResultsPlot

Just as we did for MaxEnt, we now group the Word1s into groups A, B, C.

Add a column to french that is NHG predicted prob:

#initialize new column
french$NHGPrediction <- NA

#look up values
for (i in 1:length(french$word1)) {
  #get the value; we need to use as.character() because "word1" has more levels in the full "french" data frame than in the "NHGPredictions" data frame.
  NHGValue <- NHGPredictions$unaspProb[as.character(NHGPredictions$word1)==as.character(french$word1[i]) & NHGPredictions$word2Group==french$aspire_rank_word[i]]
  
  #use it if it's not NA
  if(is.na(NHGValue[1])==FALSE) { #use just first element of NHGValue, to avoid throwing warning when NHGValue is vector of NAs, e.g. c(NA,NA,NA); when it's not NA, it will be a single number
    french$NHGPrediction[i] <- NHGValue
  }
}

Make interaction plot:

par(mar=c(5,5,3,6))
with(french, {
  interaction.plot(x.factor=as.factor(french$aspire_rank_word), 
  trace.factor=factor(word1_group), 
  response=NHGPrediction,
  xlab="Word2 unaspiratedness group",ylab="NHG predicted rate of behaving as if Word2 is unasp."
  , trace.label="Word1 group", fixed=TRUE)
})

plot of chunk plotNHGPredictionsThreeTraces

#Since this will appear in the paper, also do a PNG file:
png(file="NHG_predictions_plot.png",width=myResMultiplier*460,height=myResMultiplier*300, res=myResMultiplier*72, family=myFontFamily)
par(mar=c(5,4,2,0)+0.1)
with(french, {
  interaction.plot(x.factor=as.factor(french$aspire_rank_word), 
  trace.factor=factor(word1_group), 
  response=NHGPrediction,
  xlab="Word2 alignancy group",ylab="NHG predicted rate of non-alignment"
  , trace.label="Word1 alignancy group", fixed=TRUE)
})
dev.off()
## pdf 
##   2

Get a table of constraint names and weights:

#Initialize data frame
NHGGrammar <- data.frame(constraintName=character(0), constraintWeight = numeric(0))

#Define function to find the header row for this part
getNHGConstraintEndingRowIndex <- function() {
  for(i in 1:length(NHGLines)) {
    if(str_sub(NHGLines[i],1,4)==" 1 \t") { #find first instance of row that starts with "1" (rather than constraint name)
      return(i)
    }
  }
}

#Use the function to see where to start for loop
constraintEndingRowIndex <- getNHGConstraintEndingRowIndex()

#Extract names and weights and add to data frame
for(i in 1:(constraintEndingRowIndex-1)){
  lineParts <- strsplit(NHGLines[i],split="\t")
  myConstraintName <- lineParts[[1]][1]
  myConstraintWeight <- as.numeric(lineParts[[1]][2])

  NHGGrammar <- rbind(NHGGrammar,data.frame(constraintName=myConstraintName, constraintWeight=myConstraintWeight)) 
}

#Print to console, for pasting into Word document
NHGGrammar
##    constraintName constraintWeight
## 1         Align_1          17.5130
## 2         Align_2          14.9080
## 3         Align_3           9.5782
## 4         Align_4           3.7487
## 5         Align_5           0.1941
## 6        NoHiatus          12.1251
## 7           UseAu           4.6129
## 8         UseBeau           0.2151
## 9           UseDe           3.3023
## 10          UseDu           5.5242
## 11          UseLa           2.9153
## 12          UseLe           5.0722
## 13          UseMa           0.0130
## 14       UseVieux           1.9459
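
To give a feel for how a Noisy HG grammar turns these weights into rates, here is a rough simulation for one input, beau_bel with a group-3 Word2. This is a sketch under loudly stated assumptions, not OTSoft's procedure: independent Gaussian noise is added to each constraint weight on every trial, the noise standard deviation of 1 is our guess, and any clamping of weights at zero is ignored.

```r
set.seed(1)  # for a reproducible sketch

# Weights copied from the table above
w <- c(Align_3 = 9.5782, NoHiatus = 12.1251, UseBeau = 0.2151)
n_trials <- 100000
noise_sd <- 1  # ASSUMPTION: not taken from OTSoft's documentation

# Independent noise on each constraint weight, drawn afresh on every trial
# (result: matrix with one column per constraint)
noisy <- sapply(w, function(wt) wt + rnorm(n_trials, sd = noise_sd))

# Penalties for beau_bel + group_3; the candidate with the smaller penalty wins
pen_voc  <- noisy[, "Align_3"] + noisy[, "UseBeau"]
pen_cons <- noisy[, "NoHiatus"]
p_voc_sim <- mean(pen_voc < pen_cons)

# Closed form under the same assumptions: the penalty difference is normal
# with variance 3 * noise_sd^2 (three independent noise terms)
p_voc_exact <- pnorm((w[["NoHiatus"]] - w[["Align_3"]] - w[["UseBeau"]]) / (sqrt(3) * noise_sd))
c(simulated = p_voc_sim, exact = p_voc_exact)
```

The simulated and closed-form values should agree closely, which is a check on the sketch itself rather than on OTSoft's output.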

Stochastic OT and Stratal OT with multiple runs

The GLA is not deterministic, and results can vary substantially from one run to the next. We used a (non-distributed) version of OTSoft that can run the GLA 10, 100, or 1000 times on the same input file, collating the results in a single file. We can then choose the best-fit model from this set.

We did this three times, twice in an attempt to find the best StOT model, and once to find the best stratal Anttilan model. In each case, the learner was run 1000 times:

  • NoMagri:
      • number of times to go through data: 1000000
      • starting plasticity: 2
      • ending plasticity: 0.001
      • number of times to test grammar: 100000
      • use Magri update rule: no

  • WithMagri:
      • number of times to go through data: 1000000
      • starting plasticity: 2
      • ending plasticity: 0.001
      • number of times to test grammar: 100000
      • use Magri update rule: yes

  • Stratal:
      • number of times to go through data: 1000000
      • starting plasticity: 20
      • ending plasticity: 20
      • number of times to test grammar: 100000
      • use Magri update rule: no

To examine the results, we read in the collated files:

conn <- file("CollateRuns_NoMagri.txt",open="r")
collated_NoMagri <- readLines(conn)
close(conn)

conn <- file("CollateRuns_WithMagri.txt",open="r")
collated_WithMagri <- readLines(conn)
close(conn)

conn <- file("CollateRuns_Stratal.txt",open="r")
collated_Stratal <- readLines(conn)
close(conn)

A function to read in all 1000 runs from a collated file and find the one with the best log likelihood. (I wrote this function in a lazy way, so the very last grammar gets ignored even if it’s the best one; this is equivalent to doing 999 runs rather than 1000.)

find_best_grammar <- function(collatedFile, backoff=1/100000) {
  #initialize
  best_log_like <- -Inf #the log likelihood to beat; starts out infinitely bad
  current_log_like <- 0
  
  best_grammar <- data.frame(constraintName=character(0), rankingValue=numeric(0))
  current_grammar <- data.frame(constraintName=character(0), rankingValue=numeric(0))
  
  best_probabilities <- data.frame(input=character(0), output=character(0), observedFreq=numeric(0), predictedProb=numeric(0))
  current_probabilities <- data.frame(input_word1=character(0), input_word2_group=character(0), output=character(0), observedFreq=numeric(0), predictedProb=numeric(0))
  
  current_index <- 1
  
  #step through lines of collated file
  for(i in 1:length(collatedFile)) {
    #parse the line
    myLine <- strsplit(collatedFile[i], split="\t")
    
    #if index has gone up, then before going on to the next grammar, it's time to assess the one just finished
    if(as.numeric(myLine[[1]][2]) > current_index) {
      if(current_log_like > best_log_like) { #this grammar is the new winner
        best_log_like <- current_log_like
        best_grammar <- current_grammar
        best_probabilities <- current_probabilities
      }
      #either way [i.e., whether latest grammar was new winner or not], start the grammar, probabilities, and log likelihood fresh and update the index
      current_grammar <- data.frame(constraintName=character(0), rankingValue=numeric(0))
      current_probabilities <- data.frame(input_word1=character(0), input_word2_group=character(0), output=character(0), observedFreq=numeric(0), predictedProb=numeric(0))
      current_log_like <- 0
      current_index <- as.numeric(myLine[[1]][2])
    }
    
    #process current line
    if(myLine[[1]][1] == "G") { #if starts with G, add to grammar
        current_grammar <- rbind(current_grammar, data.frame(constraintName=myLine[[1]][3], rankingValue=myLine[[1]][4]))  
    } else if(myLine[[1]][1] == "O" & (myLine[[1]][6]=="consulent" | myLine[[1]][6]=="voculent")) { #if starts with O [that's a capital letter, not a number], add to probabilities, and update log likelihood; don't bother if it's one of those weird lines at the end of each group with a nonexistent output
        #split out the word1 and the word2-group
        myWord1 <- strsplit(myLine[[1]][4],split="+",fixed=TRUE)[[1]][1]
        myGroup <- str_sub(myLine[[1]][4],-1)
      
      #add line to data frame
        current_probabilities <- rbind(current_probabilities, data.frame(input_word1=myWord1, input_word2_group=myGroup, output=myLine[[1]][6], observedFreq=myLine[[1]][7], predictedProb=myLine[[1]][8]))
        myProb <- as.numeric(myLine[[1]][8])
        if(myProb == 0) {
          myProb <- backoff
        } else if (myProb==1) {
          myProb <- 1 - backoff
        }                
        current_log_like <- current_log_like +  as.numeric(myLine[[1]][7])*log(myProb)                              
    }
  }
  #return best grammar, probabilities, and log likelihood
  return(c(best_grammar,best_probabilities,best_log_like))
}
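
Note that the function returns c() of two data frames and a number. In R this flattens everything into a single list of columns, which is why the code further down picks out elements 1:2 (grammar), 3:7 (probabilities), and 8 (log likelihood). A small sketch with invented toy values:

```r
# Toy objects with the same columns as the real ones (values invented)
toy_grammar <- data.frame(constraintName = c("Align_1", "NoHiatus"),
                          rankingValue = c(100, 98))
toy_probs <- data.frame(input_word1 = "beau_bel", input_word2_group = "1",
                        output = "voculent", observedFreq = 3, predictedProb = 0.02)
toy_result <- c(toy_grammar, toy_probs, -12.5)  # same shape as the return value above

length(toy_result)  # 2 grammar columns + 5 probability columns + 1 number
names(toy_result)   # the last element has an empty name

# Rebuilding a data frame from the first two elements
as.data.frame(toy_result[1:2])
```

Returning a named list, e.g. list(grammar=..., probabilities=..., logLike=...), would make the later indexing less fragile; we mention it only as a possible tidy-up, since the script as written relies on the positional layout.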

Find the best grammar from each group:

best_NoMagri <- find_best_grammar(collated_NoMagri) #-233.6096
best_WithMagri <- find_best_grammar(collated_WithMagri) #-238.9304
best_Stratal <- find_best_grammar(collated_Stratal) #-410.639

Print the grammars and their log likelihoods:

best_NoMagri_grammar <- data.frame(matrix(unlist(best_NoMagri[1:2]), nrow=14, byrow=F))
colnames(best_NoMagri_grammar) <- names(best_NoMagri)[1:2]
best_NoMagri_grammar
##    constraintName       rankingValue
## 1         Align_1   193.52133194562 
## 2         Align_2  192.393226945096 
## 3         Align_3  188.270604312033 
## 4         Align_4  182.861897681478 
## 5         Align_5 -344.226598038655 
## 6        NoHiatus   187.17953715443 
## 7           UseAu  181.113234466165 
## 8         UseBeau  11.7665862139545 
## 9           UseDe  90.6703754255064 
## 10          UseDu  183.058838889673 
## 11          UseLa  76.7507514227784 
## 12          UseLe  182.891195260894 
## 13          UseMa -34.5212974679407 
## 14       UseVieux  21.0907786345409
best_NoMagri[[8]][1] #log likelihood
## [1] -233.6
best_WithMagri_grammar <- data.frame(matrix(unlist(best_WithMagri[1:2]), nrow=14, byrow=F))
colnames(best_WithMagri_grammar) <- names(best_WithMagri)[1:2]
best_WithMagri_grammar
##    constraintName       rankingValue
## 1         Align_1   64.183428237863 
## 2         Align_2 -145.500889005994 
## 3         Align_3 -151.031979970691 
## 4         Align_4 -222.331583684614 
## 5         Align_5 -1087.46722000522 
## 6        NoHiatus -151.105564779652 
## 7           UseAu -155.137618634362 
## 8         UseBeau -154.041881165981 
## 9           UseDe -157.627182669382 
## 10          UseDu -154.967224313928 
## 11          UseLa -155.116729593185 
## 12          UseLe -155.193752930466 
## 13          UseMa  -154.76460367086 
## 14       UseVieux -155.299251450449
best_WithMagri[[8]][1] #log likelihood
## [1] -238.9
best_Stratal_grammar <- data.frame(matrix(unlist(best_Stratal[1:2]), nrow=14, byrow=F))
colnames(best_Stratal_grammar) <- names(best_Stratal)[1:2]
best_Stratal_grammar
##    constraintName rankingValue
## 1         Align_1        3460 
## 2         Align_2        3460 
## 3         Align_3        3420 
## 4         Align_4        3420 
## 5         Align_5      -16580 
## 6        NoHiatus        3420 
## 7           UseAu        3400 
## 8         UseBeau       -3100 
## 9           UseDe       -1820 
## 10          UseDu        3380 
## 11          UseLa       -1260 
## 12          UseLe        3400 
## 13          UseMa       -4640 
## 14       UseVieux       -1880
best_Stratal[[8]][1] #log likelihood
## [1] -410.6
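
For intuition about how ranking values become output rates in Stochastic OT, here is a rough simulation for one input, beau_bel with a group-3 Word2, using the NoMagri ranking values above. It is a sketch under stated assumptions rather than OTSoft's own procedure: each ranking value gets independent Gaussian noise at evaluation time, and the noise standard deviation of 2 is our assumption, not a setting listed above.

```r
set.seed(1)  # for a reproducible sketch

# Ranking values copied from the NoMagri grammar above (rounded)
r <- c(Align_3 = 188.27, NoHiatus = 187.18, UseBeau = 11.77)
n_trials <- 100000
noise_sd <- 2  # ASSUMPTION: evaluation noise; not among the settings listed above

# Noisy ranking values, one column per constraint
noisy <- sapply(r, function(rv) rv + rnorm(n_trials, sd = noise_sd))

# "voculent" violates Align_3 and UseBeau; "consulent" violates NoHiatus.
# Under strict ranking, "voculent" wins when NoHiatus outranks both of its constraints.
voc_wins <- noisy[, "NoHiatus"] > pmax(noisy[, "Align_3"], noisy[, "UseBeau"])
p_voc_sim <- mean(voc_wins)

# UseBeau is ranked far too low to matter, so this is close to P(NoHiatus > Align_3)
p_voc_approx <- pnorm((r[["NoHiatus"]] - r[["Align_3"]]) / (sqrt(2) * noise_sd))
c(simulated = p_voc_sim, approx = p_voc_approx)
```

The point of the sketch is that only the difference between nearby ranking values matters; constraints more than a few noise units below the rest, such as UseBeau here, are effectively never decisive.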

Plot the wugs for the NoMagri StOT model:

#Get the items and probabilities into a data frame
best_NoMagri_probs <- data.frame(matrix(unlist(best_NoMagri[3:7]), nrow=80, byrow=F))
colnames(best_NoMagri_probs) <- names(best_NoMagri)[3:7]

#Make an interaction plot:
par(mar=c(5,5,2,6))
with(best_NoMagri_probs[best_NoMagri_probs$output=="voculent",], {
  interaction.plot(x.factor=as.factor(best_NoMagri_probs[best_NoMagri_probs$output=="voculent",]$input_word2_group), 
  trace.factor=factor(input_word1), 
  response=as.numeric(as.character(predictedProb)),
  xlab="Word2 unaspiratedness group",ylab="rate of Word2 unasp., best-fit StOT model"
  , trace.label="Word1 group", fixed=TRUE)
})

plot of chunk plotWugForBestNoMagri

And for the Stratal model:

#Get the items and probabilities into a data frame
best_Stratal_probs <- data.frame(matrix(unlist(best_Stratal[3:7]), nrow=80, byrow=F))
colnames(best_Stratal_probs) <- names(best_Stratal)[3:7]

#Make an interaction plot:
par(mar=c(5,5,2,6))
with(best_Stratal_probs[best_Stratal_probs$output=="voculent",], {
  interaction.plot(x.factor=as.factor(best_Stratal_probs[best_Stratal_probs$output=="voculent",]$input_word2_group), 
  trace.factor=factor(input_word1), 
  response=as.numeric(as.character(predictedProb)),
  xlab="Word2 unaspiratedness group",ylab="rate of Word2 unasp., best-fit Stratal model"
  , trace.label="Word1 group", fixed=TRUE)
})

plot of chunk plotWugForBestStratal

As with MaxEnt, we now group the Word1s into groups A, B, C.

#Add columns to `french` for the StOT (no Magri) and Stratal predicted probabilities:
#initialize new column
french$StOTPrediction_best_NoMagri <- NA
french$StOTPrediction_best_Stratal <- NA

#look up values
for (i in 1:length(french$word1)) {
  #get the values; we need to use as.character() because "word1" has more levels in the full "french" data frame than in the "best_NoMagri_probs" and "best_Stratal_probs" data frames.
    StOTPrediction_best_NoMagri <- best_NoMagri_probs$predictedProb[as.character(best_NoMagri_probs$input_word1)==as.character(french$word1[i]) & best_NoMagri_probs$input_word2_group==french$aspire_rank_word[i]  & best_NoMagri_probs$output=="voculent"]

    StOTPrediction_best_Stratal <- best_Stratal_probs$predictedProb[as.character(best_Stratal_probs$input_word1)==as.character(french$word1[i]) & best_Stratal_probs$input_word2_group==french$aspire_rank_word[i]  & best_Stratal_probs$output=="voculent"]

  #use it if it's not NA
  if(is.na(StOTPrediction_best_NoMagri[1])==FALSE) { #use just first element of StOTPrediction_best_NoMagri, to avoid throwing warning when it is a vector of NAs, e.g. c(NA,NA,NA); when it's not NA, it will be a single number
    french$StOTPrediction_best_NoMagri[i] <- as.numeric(as.character(StOTPrediction_best_NoMagri))
  }
  if(is.na(StOTPrediction_best_Stratal[1])==FALSE) { 
    french$StOTPrediction_best_Stratal[i] <- as.numeric(as.character(StOTPrediction_best_Stratal))
  }
}

#Make interaction plots as before:
par(mar=c(5,4,2,0)+0.1)
with(french, {
  interaction.plot(x.factor=as.factor(french$aspire_rank_word), 
  trace.factor=factor(word1_group), 
  response=StOTPrediction_best_NoMagri,
  xlab="Word2 unaspiratedness group",ylab="best StOT model's predicted unasp. rate"
  , trace.label="Word1 group", fixed=TRUE)
})

plot of chunk groupedWugsForBestGrammars

with(french, {
  interaction.plot(x.factor=as.factor(french$aspire_rank_word), 
  trace.factor=factor(word1_group), 
  response=StOTPrediction_best_Stratal,
  xlab="Word2 unaspiratedness group",ylab="best Stratal model's predicted unasp. rate"
  , trace.label="Word1 group", fixed=TRUE)
})

plot of chunk groupedWugsForBestGrammars

#Since this will appear in the paper, also do a PNG file:
png(file="best_StOT_plot.png",width=myResMultiplier*460,height=myResMultiplier*300, res=myResMultiplier*72, family=myFontFamily)
par(mar=c(5,4,2,0)+0.1)
with(french, {
  interaction.plot(x.factor=as.factor(french$aspire_rank_word), 
  trace.factor=factor(word1_group), 
  response=StOTPrediction_best_NoMagri,
  xlab="Word2 alignancy group",ylab="best StOT model's predicted non-alignment rate"
  , trace.label="Word1 alignancy group", fixed=TRUE)
})
dev.off()
## pdf 
##   2
png(file="best_Stratal_plot.png",width=myResMultiplier*460,height=myResMultiplier*300, res=myResMultiplier*72, family=myFontFamily)
par(mar=c(5,4,2,0)+0.1)
with(french, {
  interaction.plot(x.factor=as.factor(french$aspire_rank_word), 
  trace.factor=factor(word1_group), 
  response=StOTPrediction_best_Stratal,
  xlab="Word2 alignancy group",ylab="best Stratal model's predicted non-alignment rate"
  , trace.label="Word1 alignancy group", fixed=TRUE)
})
dev.off()
## pdf 
##   2

Pseudo-Hasse diagram for StOT model

Vertical axis represents ranking value, with all the bottom constraints clumped together.

#Note: the Align 4 value is 182.86, matching the best NoMagri grammar printed above
constraint_names <- c("A\U029F\U026A\U0262\U0274\u20141 (193.52)","A\U029F\U026A\U0262\U0274\u20142 (192.39)","A\U029F\U026A\U0262\U0274\u20143 (188.27)","A\U029F\U026A\U0262\U0274\u20144 (182.86)","N\U1D0FH\U026A\U1D00\U1D1B\U1D1Cs (187.18)","Us\U1D07 A\U1D1C (181.11)","Us\U1D07 L\U1D07\n(182.89)","Us\U1D07 D\U1D1C\n(183.06)", "Us\U1D07 D\U1D07 (90.67)\nUs\U1D07 L\U1D00 (76.75)\nUs\U1D07 V\U026A\U1D07\U1D1Cx (21.09)\nUs\U1D07 B\U1D07\U1D00\U1D1C (11.77)\nUs\U1D07 M\U1D00 (-34.52)","A\U029F\U026A\U0262\U0274\u20145 (-344.23)")
                      
dummyValueForBottom <- 180
#x <- c(20,20,20,5,20,20,20,35,20) #horizontal positions
x <- c(30,30,30,30, 20, 10,8,12,10, 30) #horizontal positions
y <- c(193.52,192.39,188.27,182.86,187.18,181.11,182.89,183.06,dummyValueForBottom-6,dummyValueForBottom-8) #ranking values
vert_increment <- 0.4

plot(x,y, xlab="",ylab="ranking value",type="n",xaxt="n",yaxt="n",xlim=c(5,35),ylim=c(dummyValueForBottom-8,194))
suppressWarnings(text(x,y,labels=constraint_names, cex=1))
yat <- pretty(y)
yat <- yat[yat>dummyValueForBottom]
axis(2,at=yat)
axis.break(2,dummyValueForBottom-2,style="slash")

#Add in line segments
segments(x[1],y[1]-vert_increment,x[2],y[2]+vert_increment) #add line segment from Align1 to Align2
segments(x[2],y[2]-vert_increment,x[3],y[3]+vert_increment) #from Align2 to Align3
segments(x[3],y[3]-vert_increment,x[4],y[4]+vert_increment) #from Align3 to Align4
segments(x[4],y[4]-vert_increment,x[10],y[10]+vert_increment) #from Align4 to Align5
segments((x[7]+x[8])/2,(y[7]+y[8])/2-vert_increment-0.5,x[6],y[6]+vert_increment) #from UseLe/UseDu to UseAu
segments(x[6],y[6]-vert_increment,x[9],y[9]+2+vert_increment-0.5) #from UseAu to big clump

plot of chunk pseudoHasse2

#write it to a file
png(file="French_PseudoHasse2.png",width=myResMultiplier*460,height=myResMultiplier*460, res=myResMultiplier*72,family=myFontFamily)
par(mar=c(0,4,0,0)+0.1)
plot(x,y, xlab="",ylab="ranking value",type="n",xaxt="n",yaxt="n",xlim=c(5,35),ylim=c(dummyValueForBottom-8,194))
#text(x,y,labels=constraint_names, cex=c(1,1,1,1,1.3,1,1,1,1,1,1,1,1,1))
text(x,y,labels=constraint_names, cex=1)
yat <- pretty(y)
yat <- yat[yat>dummyValueForBottom]
axis(2,at=yat)
axis.break(2,dummyValueForBottom-2,style="slash")

#Add in line segments
segments(x[1],y[1]-vert_increment,x[2],y[2]+vert_increment) #add line segment from Align1 to Align2
segments(x[2],y[2]-vert_increment,x[3],y[3]+vert_increment) #from Align2 to Align3
segments(x[3],y[3]-vert_increment,x[4],y[4]+vert_increment) #from Align3 to Align4
segments(x[4],y[4]-vert_increment,x[10],y[10]+vert_increment) #from Align4 to Align5
segments((x[7]+x[8])/2,(y[7]+y[8])/2-vert_increment-0.5,x[6],y[6]+vert_increment) #from UseLe/UseDu to UseAu
segments(x[6],y[6]-vert_increment,x[9],y[9]+2+vert_increment-0.5) #from UseAu to big clump
dev.off()
## pdf 
##   2

Log likelihood of each constraint model

Note that we already retrieved the log likelihood of the multiplicative model above. For reference, it is -1 * myOptimization$value. We also already got them for the best StOT and Stratal models.

Make a function for getting log Likelihood using v_and_c_table_subset and french:

#for StOT and NHG models, where we use simulation to get predicted probabilities, we'll get some zeroes. Since we run 1,000,000 trials, we back off zeros to 1/1000000 by default.

getLogLike <- function(colName, backoff = 1/1000000) {
  log_likelihood <- 0 #initialize
  for(i in levels(v_and_c_table_subset$word1)) {
    for(j in levels(v_and_c_table_subset$word2_group)) {
      v_value <- french[,colName][french$word1==i & french$aspire_rank_word==j][1]
      if(v_value==0) { #avoid log of 0
        v_value <- v_value + backoff
      }
      else if(v_value == 1) { #avoid log of 1-1=0
        v_value <- v_value - backoff
      } #multiply log prob (according to model) by (token-weighted type) frequency
      log_likelihood <- log_likelihood + v_and_c_table_subset$v_count[v_and_c_table_subset$word1==i & v_and_c_table_subset$word2_group==j] * log(v_value) + v_and_c_table_subset$c_count[v_and_c_table_subset$word1==i & v_and_c_table_subset$word2_group==j] * log(1-v_value)
      }
    }
  return(log_likelihood)
}
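
In equation form, the quantity that getLogLike accumulates can be written as below. This is only a restatement of the loop: b ranges over the word1 / word2-group bins, p_b is the model's predicted probability for bin b (after any backoff), and v_b and c_b are that bin's v_count and c_count. The symbols are shorthand introduced here for illustration and are not used elsewhere in the script.

$$\log L = \sum_{b} \left[ v_b \log p_b + c_b \log (1 - p_b) \right]$$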

Now get them all, including the multiplicative model, as a check that the function is correct. I’m doing it with the default backoff (1/1,000,000), appropriate for NHG, where the grammar was checked 1,000,000 times, and also with the backoff used for picking the best StOT and Stratal grammars (1/100,000, since those grammars were checked 100,000 times).

#multiplicative:
getLogLike("multiplicativePrediction") #-207.9451
## [1] -207.9
# MaxEnt:
getLogLike("MaxEntPrediction") #-197.7132
## [1] -197.7
#NHG:
getLogLike("NHGPrediction") #-198.7964
## [1] -198.8
getLogLike("NHGPrediction", backoff=1/100000) #-198.7964: no 0s or 1s, so backoff doesn't matter
## [1] -198.8
#best StOT (no Magri):
getLogLike("StOTPrediction_best_NoMagri") #-241.1755
## [1] -241.2
getLogLike("StOTPrediction_best_NoMagri", backoff=1/100000) #-233.6096
## [1] -233.6
#best Stratal (no Magri):
getLogLike("StOTPrediction_best_Stratal") #-443.9495
## [1] -443.9
getLogLike("StOTPrediction_best_Stratal", backoff=1/100000) #-410.639
## [1] -410.6
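
The two values for each of the StOT and Stratal models differ only because of bins where the simulated probability is exactly 0 or 1, since those are the only bins where getLogLike applies the backoff. The sketch below shows how one such bin behaves. The counts and the helper name binLogLikeWithBackoff are hypothetical, chosen for illustration, and not taken from our data.

```r
# Hypothetical bin: the model never produced the V form in simulation (p = 0),
# but the corpus has 3 V tokens and 97 C tokens. These numbers are invented.
v_count <- 3
c_count <- 97
p <- 0

# Without backoff the bin contributes -Inf, because log(0) is -Inf
ll_no_backoff <- v_count * log(p) + c_count * log(1 - p)

# Same bin, applying the backoff the same way getLogLike does
binLogLikeWithBackoff <- function(p, backoff) {
  if (p == 0) {
    p <- p + backoff
  } else if (p == 1) {
    p <- p - backoff
  }
  v_count * log(p) + c_count * log(1 - p)
}
ll_million <- binLogLikeWithBackoff(p, 1/1000000)
ll_hundredK <- binLogLikeWithBackoff(p, 1/100000)

# The gap is close to v_count * log(10): each observed token that the model
# assigned probability 0 costs about log(10) more under the smaller backoff
ll_hundredK - ll_million
v_count * log(10)
```

This is why the choice of backoff matters for models with predicted probabilities of 0 or 1, and has no effect on NHG here.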

Baseline (“perfect”) model

For comparison, we also include a baseline model, with perfect frequency matching for each combination of word2-group and word1. It will still not fit the data perfectly, because if it predicts voculence 60% of the time for a category that has 60% voculence, it will still sometimes predict voculence when the datum is non-voculent (0.6 * 0.4), and vice versa (0.4 * 0.6), for a total error rate of 48%. This provides a ceiling on how well any model (that makes the same distinctions ours do) could do.

To do this, we add another column to the french data frame that has the overall rate for that word1-word2group combination.

french$voculence_for_this_bin <- NA #initialize the new column
for(i in seq_along(french$voculence)) { 
  #find the row of v_and_c_table_subset for this word1 / word2-group combination
  isMatch <- as.character(v_and_c_table_subset$word1)==as.character(french$word1[i]) & v_and_c_table_subset$word2_group==french$aspire_rank_word[i]
  #observed rate for that bin: v_count / (v_count + c_count)
  myValue <- v_and_c_table_subset$v_count[isMatch] / (v_and_c_table_subset$v_count[isMatch] + v_and_c_table_subset$c_count[isMatch])
  #use it only if match achieved; otherwise leave it NA
  if(length(myValue) > 0) {
    french$voculence_for_this_bin[i] <- myValue
  }
}

Now we can use the function above to get the log likelihood:

getLogLike("voculence_for_this_bin") #-189.3649
## [1] -189.4
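
Both the 48% figure and the ceiling claim can be checked on a single bin. The sketch below uses a hypothetical bin with 60 V tokens and 40 C tokens. The numbers and the helper name binLogLikeAt are invented for illustration. It computes the expected error rate of a frequency-matching model, then searches a grid of candidate probabilities to confirm that the observed rate gives the highest log likelihood for that bin.

```r
# Hypothetical bin with 60% voculence; counts are invented for illustration
v_count <- 60
c_count <- 40
observed_rate <- v_count / (v_count + c_count)

# Expected error rate of a frequency-matching model on this bin:
# it predicts V for a C datum, or C for a V datum
error_rate <- observed_rate * (1 - observed_rate) +
  (1 - observed_rate) * observed_rate
error_rate # 0.48

# Log likelihood of the bin as a function of the predicted probability p
binLogLikeAt <- function(p) v_count * log(p) + c_count * log(1 - p)

# Search a grid of candidate probabilities for the one that fits best
candidates <- seq(0.01, 0.99, by = 0.01)
best_p <- candidates[which.max(binLogLikeAt(candidates))]
best_p
```

Because the total log likelihood is a sum over bins, maximizing each bin separately in this way maximizes the total, which is why no model that makes only these distinctions can exceed the baseline.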