Email Correspondence with Omer Preminger, Feb./Mar. 2019

Thanks to Omer for his thoughtful attention to our paper; here we provide the text of our email correspondence.

OMER, 2/26/19

Hi Canaan and Bruce,

I read with interest the paper you just posted to lingbuzz, "Phonological markedness effects in sentence formation", and I have a question about your methodological and architectural assumptions – in particular as they play out in Section 9 of the paper.

If I understand correctly, you are assuming that if speakers can be shown to systematically (where "systematically" just means "in a statistically significant way that doesn't reduce to other known factors") prefer syntactic structure S' over syntactic structure S (where "structure" includes both the choice of terminals and their arrangement), and the choice of S' over S is phonologically motivated, then this constitutes evidence in favor of a parallel architecture (à la Jackendoff, say) – and against fully modularized theories (à la Zwicky & Pullum, say).

I might be confused, but this seems to me to be a conflation of competence with performance. There are all kinds of things we can do with our competence systems; for example, suppose that people modulate their VOT based on contextual and social variables (I haven't been keeping up with this particular literature, but at least at some point this was considered an established result). Does this mean that the cognitive component that keeps track of social variables is not fully encapsulated from the component that governs VOT, and, instead, the two are part of one single gigantic parallel architecture (which, at that point, might as well encompass all of cognition)? An alternative view of the latter case, which is more attractive in my mind, would be to say that the competence system is compatible with various settings for VOT; and within the bounds that the competence system permits, which 'derivations' are actually utilized is something that can be modulated by all kinds of things, both linguistic and extra-linguistic.
But then, the autonomy of syntax is the proverbial gander to our VOT goose: just because syntax allows S and S', and S' is chosen over S for phonological reasons, in no way means that syntax is part of a single parallel architecture that includes phonology.

One could wonder what, then, would constitute evidence in favor of a parallel architecture. My answer would be: nothing. There can be no evidence in favor of the weakest possible assumption, only evidence against stronger assumptions. So the question, I'd say, is what would be evidence against the autonomy of syntax. This, I think, would have to take the form of a demonstration that syntax is fundamentally shaped by phonological and/or semantic forces. And I don't think such a demonstration is forthcoming.

Now: I presume, given that you wrote the paper that you wrote, that you wouldn't endorse what I have said here. If you could spare the time & energy, could you pinpoint for me where we part ways (and why)?

Thanks in advance,
– Omer

BRUCE, 2/26/19

Thanks, Omer. Let me get together with Canaan soon and we can pen a somewhat less off-the-cuff reply than would otherwise be possible.

Best regards,
Bruce

OMER, 2/26/19

I really appreciate that – thanks!

- Omer

OMER, 3/2/19

Hi again, Canaan and Bruce,

I've been putting together a slightly longer post on this matter that I plan to post on my personal blog at some point. It's not up, yet, as I'm still tinkering with it, but here's a private link to the current draft: https://omer.lingsite.org/?p=1986&shareadraft=baba1986_5c791f29c30eb

If you want, I'd be glad to wait until you send me a reply, and I'll post your reply together with the post itself. Or we can have the debate over the comments underneath the blogpost. (Or on any other platform, if you prefer.) I've become convinced, though, that this is probably a discussion to be had publicly. That's because, to the extent that what I've identified is a real problem in the logic of the argument (and you may disagree that it is), it's not unique to your piece, but extends to the literature you build on (for example, Stephanie Shih's work on this topic has the same logical structure; though, notably, not Arto Anttila's 2016 paper, which I found quite reasonable in its assessment of the relationship between his results and phonology-free theories of syntax).

Anyway, needless to say (but I'll say it anyway), you are under no obligation to respond to my message or my posts one way or another! My blog is, in the grand scheme of things, a tiny, seldom-visited corner of the linguistics landscape; and you might both have more pressing and important things to tend to right now. Keeping all of this in mind, if you would nevertheless like to go the joint route, with the post and your reply published together, please let me know in the next couple of days so I can make sure to hold off on publishing the post until I get your reply.
Yours,
– Omer

BRUCE AND CANAAN, 3/4/19

Dear Omer,

Thanks for your interest in our paper; we finally got some time to read your email together and discuss it. Until the third read-through it seemed paradoxical to us, but now we think we understand better. There seem to be three points under discussion.

1. What the evidence is telling us

We take your main point to be:  we and the like-minded researchers we cite have no argument at all in favor of a parallel model of grammatical organization. On reflection, we decided we actually agree with you on this point. In truth, our real argument is simply that any serious model of feed-forward grammatical architecture (i.e., one that is set up rigorously enough to have empirical teeth) faces severe problems with language data. Then the rest of the story, at present, is perhaps just a matter of intellectual taste. Our own taste as Optimality theorists leads us to favor the parallel model, though it would be interesting to try to figure out alternative models that also would be compatible with the facts (see also below).

2. Language and the grand cognitive system

You also raise the question of how big we should assume the cognitive system underlying language to be, and particularly whether factors such as speaking style or attitude should play a role. Again, it’s conceivable that (as you suggest) a colossal parallel model, embracing all of human thought and feeling, is appropriate; but in this case our view is that a fairly constrained sort of interface might be possible. On the basis of sociolinguistic work (see particularly the classic paper by Labov 1972, which we have attached), it seems to us that style-sensitive linguistic processes may characteristically operate in lockstep, responding to style in parallel ways. For instance, Labov found that several phonological processes of New York City English (“aw” Diphthongization, “ae” Diphthongization, Theta Hardening, R Dropping) responded in parallel as he systematically varied the speaking style he elicited from his NYC consultants. The lockstep phenomenon suggests to us that there might be some kind of “knob”, located on the outside of the “grammar box”, and interfacing with the broader cognitive system. Speaking style, governed by extralinguistic cognition, turns the knob, and the knob simultaneously governs all the stylistically sensitive processes of grammar.

Generative research on knobs is in its infancy. But we attach a paper, published in Phonology, that we find intriguing, in which Coetzee and Kawahara formalize a different knob (the word frequency knob), incorporating it into a grammar that predicts the pattern of Alveolar Dropping in English and Geminate Devoicing in Japanese.
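
To make the knob idea concrete, here is a toy sketch (ours alone, not Labov's analysis or Coetzee and Kawahara's actual model; all constraint names, weights, candidates, and violation counts are invented for illustration). In a MaxEnt grammar, a single extragrammatical scalar can multiply the weights of every style-sensitive constraint at once, so that one turn of the knob shifts all such processes in lockstep:

```python
import math

# Toy MaxEnt grammar: P(candidate) is proportional to
# exp(-(sum of weighted constraint violations)). Everything here
# (constraints, weights, candidates) is hypothetical.
BASE_WEIGHTS = {"*rCoda": 1.0, "*Theta": 1.0, "Faith": 2.0}
STYLE_SENSITIVE = {"*rCoda", "*Theta"}  # the processes that move in lockstep

def probs(candidates, knob=1.0):
    """candidates: {name: {constraint: violation count}}.
    `knob` scales the weight of every style-sensitive constraint at
    once -- the single style setting sitting outside the grammar box."""
    scores = {}
    for cand, viols in candidates.items():
        h = sum(BASE_WEIGHTS[c] * (knob if c in STYLE_SENSITIVE else 1.0) * n
                for c, n in viols.items())
        scores[cand] = math.exp(-h)
    z = sum(scores.values())
    return {cand: s / z for cand, s in scores.items()}

cands = {
    "faithful": {"*rCoda": 1, "*Theta": 1},  # keeps coda /r/ and theta
    "reduced":  {"Faith": 2},                # r-dropped and theta-hardened
}
for knob in (0.5, 1.0, 3.0):  # more careful -> more casual, say
    print(knob, probs(cands, knob))
```

As the knob value grows, violations of *rCoda and *Theta both become costlier, so R Dropping and Theta Hardening gain probability together, with no constraint-by-constraint stipulation needed.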

The style knob is probably connected to syntactic phenomena as well; for instance in English it seems clear that Pied Piping is stylistically governed.

3. “Grammar offers a range of options, and Some Other Component chooses among them.”

At one point in your email you put forth this view; it has also been suggested by Anttila (2016). For us, the key is our belief that both syntax and phonology are probabilistic; each assigns probabilities to candidates. If so, the sequential and parallel approaches are merely notational variants. This is because probabilities multiply, and multiplication is a commutative operation. The putative intermediate level (output of syntax, input to phonology) therefore has no observable consequences.
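
A toy numerical illustration of this point (all probabilities invented): if each component assigns a probability to each candidate, and the combined grammar multiplies the two distributions pointwise and renormalizes, the result is the same whichever component is taken to apply "first":

```python
# Invented numbers: syntax and phonology each assign a probability
# to the candidates S and S'; the combined grammar multiplies the
# two distributions pointwise and renormalizes. Multiplication
# commutes, so "syntax feeding phonology" is empirically
# indistinguishable from the reverse order, or from a parallel model.
p_syntax    = {"S": 0.7, "S'": 0.3}
p_phonology = {"S": 0.2, "S'": 0.8}

def combine(p, q):
    joint = {c: p[c] * q[c] for c in p}
    z = sum(joint.values())
    return {c: v / z for c, v in joint.items()}

assert combine(p_syntax, p_phonology) == combine(p_phonology, p_syntax)
print(combine(p_syntax, p_phonology))  # {'S': 0.368..., "S'": 0.631...}
```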

There is a perhaps-related view:  some scholars think that a chunk of the grammatical system is not probabilistic (e.g., a “pure” grammar dealing with nothing but categories, feeding into a “performance component” that is probabilistic). Such suggestions are usually answered by gradience-oriented scholars thus:  look inside your putative “performance component”, and you will discover that its substantive content is regulated by the very same constraints that are found in the “pure grammar”, but with lower weights. There is no gain in positing separate components, and there is loss of generality.

Once again, thanks for taking an interest in our paper.

Yours very truly,
Canaan Breiss
Bruce Hayes

OMER, 3/4/19

Hi,

This is great! Thanks for both of the papers, I look forward to having a look.

Your response mostly makes sense to me, except for these parts (and, unfortunately, I think a lot of the logic rides on the fate of these parts):

> In truth, our real argument is simply that any serious model of feed-forward grammatical architecture (i.e., one that is set up rigorously enough to have empirical teeth) faces severe problems with language data.

I still don't see how this follows from your data. No physicist has the faintest idea of how to model leaves blowing in the wind. Is this a sign that physics doesn't have "empirical teeth" or that it "faces severe problems with [physical] data"?

Explanatory theories are developed against data collected in highly artificial settings. This is a feature, not a bug. If you believe the natural world is a massive interaction effect (which I do, and I think this most certainly extends to language) then looking at naturalistic data is as facile in theoretical linguistics as in theoretical physics.

> The lockstep phenomenon suggests to us that there might be some kind of “knob”, located on the outside of the “grammar box”, and interfacing with the broader cognitive system.

The "knob" idea strikes me as intriguing, but I don't quite see how it solves the modularity issue. The "knob" is either itself an object of social cognition, or it is modulated by some object of social cognition – and either way, this defeats the idea of a firewall between the grammar and social cognition if the grammar is treated as a performance system. My suggestion remains as it was: separate social cognition from the language module entirely, and within the latter, separate phonology from syntax entirely. If the language module in general, and syntax in particular, are treated as competence systems, then the fact that things outside the competence module can affect the choices among the candidates that the competence module makes available is not an obstacle to modularity.

NB: Raising the possibility that "a colossal parallel model, embracing all of human thought and feeling, is appropriate" was intended, on my part, as a reductio ad absurdum of a particular approach to modularity. (The one where if module Y gets to affect the choice between multiple representations that module X makes available, then Y is not encapsulated from X.) But it seems you took it as a serious suggestion...? I guess one person's modus ponens is another's modus tollens...!

Anyway, it sounds like you both are busy, so I won't belabor these issues further right now – looking forward to hearing from you,
– Omer

OMER, 3/5/19

Short follow-up: I think I'm going to go ahead and publish my blog post, and you can reply whenever and wherever you see fit, incl. on the pages of my blog itself (where you have a standing invitation to post your rebuttal) – sound good?

BRUCE, 3/5/19

Hi Omer,

As you please.

I have a sneaking suspicion that the disagreements would already arise in cases simpler than the one in our paper. Are we disagreeing about probabilistic approaches to grammar? Then

Bresnan, J., Cueni, A., Nikitina, T., & Baayen, R. H. (2007). Predicting the dative alternation. In G. Bouma, I. Kraemer, & J. Zwarts (Eds.), Cognitive foundations of interpretation (pp. 69-94). Amsterdam: KNAW.

would be a clearer case to discuss. Are we disagreeing about feed-forward architecture per se? Then Shih and Zuraw, cited by us, would be a clearer case. Our work presupposes both of these earlier types of research results and so is perhaps a bit further down the path.

Regards,
Bruce

OMER, 3/5/19

No, I don't think probabilistic grammars are the content of our disagreement, or at least not the crux of it. I am happy to assume that the output of syntax is not a set of <PF, LF> pairings (as the post assumes), but rather a probability distribution over said pairs (including, perhaps, a zero probability for some), and that this probability distribution is then convolved with a second probability distribution gleaned from the phonological component. This is still a fully feed-forward system, in that nothing about the syntactic computation needs to know anything about the phonological system in the competence grammar. In production, it is a different story; but like we said, in production you would also need to know what the sexual orientation of your interlocutors is to modulate your VOT, or, in a language like Korean, what the relative age of you and your interlocutors is to choose the proper honorifics, etc. etc. – things whose integration into the grammar I consider to be beyond the pale in the first place. (Hence, in a sense, we have in these results yet another reason to distinguish competence from performance.)

OMER, 3/5/19

Sorry, I was trying to add something to the reply below and instead just sent it again :-/ What I meant to add was the following:

I can imagine the response to <content of last email> being that convolution is commutative, and so there is no empirical content to calling the picture I sketched "feed-forward." Let me say that even if that's mathematically true (and it of course is), that's not an argument in the modularity debate, for the reasons we've already covered. Social cognition is also probabilistic, I'd assume, and so its contributions – if they combine via convolution with the outputs of the grammar – also commute in the same manner. The result of this line of thinking is, then, mental globalism (or at least globalism for all probabilistic components of the mind that connect to one another at all). Since we have plenty of evidence from neuropathology that the mind is modular, the argument from commutativity must be incorrect (since it leads to a consequent that we know to be false). I therefore assume that it is.
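
To put (made-up) numbers on this: add a third distribution for social cognition, and the combined result is identical under every ordering of the three components – which is precisely why commutativity cannot be what settles the modularity question:

```python
import math
from functools import reduce
from itertools import permutations

# Made-up numbers: a third, "social" distribution combines by the
# same pointwise multiplication and renormalization, and every
# ordering of the three components yields the same result
# (up to floating-point rounding).
p_syn  = {"S": 0.7, "S'": 0.3}
p_phon = {"S": 0.2, "S'": 0.8}
p_soc  = {"S": 0.6, "S'": 0.4}

def combine(p, q):
    joint = {c: p[c] * q[c] for c in p}
    z = sum(joint.values())
    return {c: v / z for c, v in joint.items()}

results = [reduce(combine, order)
           for order in permutations([p_syn, p_phon, p_soc])]
assert all(math.isclose(r[c], results[0][c]) for r in results for c in r)
```
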
You could then ask what the content of the feed-forward approach is, in view of this commutativity. My answer would be that its content lies in the explanatory avenues it opens up, e.g. the explanation for the vast swaths of syntax in which phonology seems to play no role whatsoever, as well as all the evidence adduced in favor of "late insertion" within the Distributed Morphology framework (stated somewhat more theory-neutrally, this is evidence that the insertion context for "morphemes" is derived syntactic structure, which implies ordering).

