Email Correspondence with Omer Preminger, Feb./Mar. 2019
Thanks to Omer for his thoughtful attention to our paper; here we provide the text of our email correspondence.
OMER, 2/26/19
Hi Canaan and Bruce,
I read with interest the paper you just posted to lingbuzz,
Phonological markedness effects in sentence formation, and I have a
question about your methodological and architectural assumptions – in
particular as they play out in Section 9 of the paper.
If I understand correctly, you are assuming that if
speakers can be shown to systematically (where "systematically" just
means "in a statistically significant way that doesn't reduce to other
known factors") prefer syntactic structure S' over syntactic structure
S (where "structure" includes both the choice of terminals and their
arrangement), and the choice of S' over S is phonologically motivated,
then this constitutes evidence in favor of a parallel architecture (à
la Jackendoff, say) – and against fully modularized theories (à la
Zwicky & Pullum, say).
I might be confused, but this seems to me to be a conflation of
competence with performance. There are all kinds of things we can do
with our competence systems; for example, suppose that people modulate
their VOT based on contextual and social variables (I haven't been
keeping up with this particular literature, but at least at some point
this was considered an established result). Does this mean that the
cognitive component that keeps track of social variables is not fully
encapsulated from the component that governs VOT, and, instead, the two
are part of one single gigantic parallel architecture (which, at that
point, might as well encompass all of cognition)? An alternative view
of the latter case, which is more attractive to my mind, would be to
say that the competence system is compatible with various settings for
VOT; and within the bounds that the competence system permits, which
'derivations' are actually utilized is something that can be modulated
by all kinds of things, both linguistic and extra-linguistic.
But then, the autonomy of syntax is the proverbial gander to our VOT
goose: just because syntax allows S and S', and S' is chosen over S for
phonological reasons, in no way means that syntax is part of a single
parallel architecture that includes phonology.
One could wonder what, then, would constitute evidence in favor of a
parallel architecture. My answer would be: nothing. There can be no
evidence in favor of the weakest possible assumption, only evidence
against stronger assumptions. So the question, I'd say, is what would
be evidence against the autonomy of syntax. This, I think, would have
to take the form of a demonstration that syntax is fundamentally shaped
by phonological and/or semantic forces. And I don't think such a
demonstration is forthcoming.
Now: I presume, given that you wrote the paper that you
wrote, that you wouldn't endorse what I have said here. If you could
spare the time & energy, could you pinpoint for me where we part
ways (and why)?
Thanks in advance,
– Omer
BRUCE, 2/26/19
Thanks, Omer. Let me get together with Canaan soon and we can pen a somewhat less off-the-cuff reply.
Best regards,
Bruce
OMER, 2/26/19
I really appreciate that – thanks!
– Omer
OMER, 3/2/19
Hi again, Canaan and Bruce,
I've been putting together a slightly longer post on this matter that I
plan to post on my personal blog at some point. It's not up, yet, as
I'm still tinkering with it, but here's a private link to the current
draft:
https://omer.lingsite.org/?p=1986&shareadraft=baba1986_5c791f29c30eb
If you want, I'd be glad to wait until you send me a reply, and I'll
post your reply together with the post itself. Or we can have the
debate over the comments underneath the blogpost. (Or on any other
platform, if you prefer.) I've become convinced, though, that this is
probably a discussion to be had publicly. That's because, to the extent
that what I've identified is a real problem in the logic of the
argument (and you may disagree that it is), it's not unique to your
piece, but extends to the literature you build on (for example,
Stephanie Shih's work on this topic has the same logical structure;
though, notably, not Arto Anttila's 2016 paper, which I found quite
reasonable in its assessment of the relationship between his results
and phonology-free theories of syntax).
Anyway, needless to say (but I'll say it anyway), you are under no
obligation to respond to my message or my posts one way or another! My
blog is, in the grand scheme of things, a tiny, seldom-visited corner
of the linguistics landscape; and you might both have generally more
pressing and important things to tend to right now. Keeping all of this
in mind, if you would nevertheless like to go with the joint,
post-and-reply published together route, please let me know in the next
couple of days so I can be sure to hold off on publishing the post
until I get your reply.
Yours,
– Omer
BRUCE AND CANAAN, 3/4/19
Dear Omer,
Thanks for your interest in our paper; we finally got some time to read
your email together and discuss it. Until the third read-through it
seemed paradoxical to us, but now we think we understand better. There
seem to be three points under discussion.
1. What the evidence is telling us
We take your main point to be: we and the like-minded researchers
we cite have no argument at all in favor of a parallel model of
grammatical organization. On reflection, we decided we actually agree
with you on this point. In truth, our real argument is simply that any
serious model of feed-forward grammatical architecture (i.e., one that
is set up rigorously enough to have empirical teeth) faces severe
problems with language data. Then the rest of the story, at present, is
perhaps just a matter of intellectual taste. Our own taste as
Optimality theorists leads us to favor the parallel model, though it
would be interesting to try to figure out alternative models that also
would be compatible with the facts (see also below).
2. Language and the grand cognitive system
You also raise the question of how big we should assume that the
cognitive system underlying language ought to be, and particularly
whether factors such as speaking style or attitude should play a role.
Again, it’s conceivable that (as you suggest) a colossal parallel
model, embracing all of human thought and feeling, is appropriate; but
in this case our view is that a fairly constrained sort of interface
might be possible. On the basis of sociolinguistic work (see
particularly the classic paper by Labov 1972, which we have
attached), it seems to us that style-sensitive linguistic
processes may characteristically be in lockstep, responding to style in
parallel ways. For instance, Labov found that several phonological
processes of New York City English (“aw” Diphthongization, “ae”
Diphthongization, Theta Hardening, R Dropping) responded in parallel as
he systematically varied the speaking style he elicited from his NYC
consultants. The lockstep phenomenon suggests to us that there might be
some kind of “knob”, located on the outside of the “grammar box”, and
interfacing with the broader cognitive system. Speaking style, governed
by extralinguistic cognition, turns the knob, and the knob
simultaneously governs all the stylistically-sensitive processes of
grammar.
Generative research on knobs is in its infancy. But we attach a paper
(published in Phonology) that we find intriguing, in which Coetzee and Kawahara formalize
a different knob (the word frequency knob), incorporating it into a
grammar that predicts the pattern of Alveolar Dropping in English and
Geminate Devoicing in Japanese.
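To make the knob idea concrete, here is a minimal sketch in MaxEnt terms; the single style scalar $s$ and the split into style-sensitive and style-insensitive constraints are our own illustration, not Coetzee and Kawahara's actual formalization:

$$P(x) \;=\; \frac{\exp(-H(x))}{\sum_{y \in \Omega} \exp(-H(y))}, \qquad H(x) \;=\; \sum_{i \,\in\, \text{style}} s\,w_i\,C_i(x) \;+\; \sum_{i \,\notin\, \text{style}} w_i\,C_i(x)$$

Here $w_i \geq 0$ is the weight of constraint $C_i$, $C_i(x)$ counts the violations that candidate $x$ incurs, and $\Omega$ is the candidate set. Extralinguistic cognition sets $s$ from outside the grammar box; because this one scalar rescales every style-sensitive constraint at once, all the corresponding processes shift together, which is the lockstep pattern Labov observed.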
The style knob is probably connected to syntactic phenomena as well;
for instance in English it seems clear that Pied Piping is
stylistically governed.
3. “Grammar offers a range of options, and Some Other Component chooses among them.”
At one point in your email you put forth this view; it has also
been suggested by Anttila (2016). For us, the key is our belief that
both syntax and phonology are probabilistic; they assign
probabilities to candidates. If so, the sequential and parallel
approaches are merely notational variants. This is because
probabilities multiply, and multiplication is a commutative operation.
The putative intermediate level (output of syntax, input to phonology)
therefore has no observable consequences.
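To spell out the arithmetic, under the simplifying assumption that the two components' probabilities combine by pointwise multiplication followed by renormalization over the candidate set:

$$P(c) \;=\; \frac{P_{\text{syn}}(c)\,P_{\text{phon}}(c)}{\sum_{c'} P_{\text{syn}}(c')\,P_{\text{phon}}(c')} \;=\; \frac{P_{\text{phon}}(c)\,P_{\text{syn}}(c)}{\sum_{c'} P_{\text{phon}}(c')\,P_{\text{syn}}(c')}$$

Since the two factors commute, the observable distribution $P(c)$ is identical whether syntax applies first and phonology second, the reverse, or both at once; no data over $P(c)$ can diagnose the intermediate level.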
There is a perhaps-related view: some scholars think that a chunk
of the grammatical system is not probabilistic (e.g., a “pure” grammar
dealing with nothing but categories, feeding into a “performance
component” that is probabilistic). Such suggestions are usually
answered by gradience-oriented scholars thus: look inside your
putative “performance component”, and you will discover that its
substantive content is regulated by the very same constraints that are
found in the “pure grammar”, but with lower weights. There is no gain
in positing separate components, and there is loss of generality.
Once again, thanks for taking an interest in our paper.
Yours very truly,
Canaan Breiss
Bruce Hayes
OMER, 3/4/19
Hi,
This is great! Thanks for both of the papers, I look forward to having a look.
Your response mostly makes sense to me, except for these parts (and, unfortunately, I think a lot of the logic rides on the fate of these parts):
> In truth, our real argument is simply that any serious model of feed-forward grammatical architecture (i.e., one that is set up rigorously enough to have empirical teeth) faces severe problems with language data.
I still don't see how this follows from your data. No physicist has the faintest idea of how to model leaves blowing in the wind. Is this a sign that physics doesn't have "empirical teeth" or that it "faces severe problems with [physical] data"?
Explanatory theories are developed against data collected in highly artificial settings. This is a feature, not a bug. If you believe the natural world is a massive interaction effect (which I do, and I think this most certainly extends to language) then looking at naturalistic data is as facile in theoretical linguistics as in theoretical physics.
> The lockstep phenomenon suggests to us that there might be some kind of “knob”, located on the outside of the “grammar box”, and interfacing with the broader cognitive system.
The "knob" idea strikes me as intriguing, but I don't quite see how it solves the modularity issue. The "knob" is either itself an object of social cognition, or it is modulated by some object of social cognition – and either way, this defeats the idea of a firewall between the grammar and social cognition if the grammar is treated as a performance system. My suggestion remains as it was: separate social cognition from the language module entirely, and within the latter, separate phonology from syntax entirely. If the language module in general, and syntax in particular, are treated as competence systems, then the fact that things outside the competence module can affect the choices among the candidates that the competence module makes available is not an obstacle to modularity.
NB: Raising the possibility that "a colossal parallel model, embracing all of human thought and feeling, is appropriate" was intended, on my part, as a reductio ad absurdum of a particular approach to modularity. (The one where if module Y gets to affect the choice between multiple representations that module X makes available, then Y is not encapsulated from X.) But it seems you took it as a serious suggestion...? I guess one person's modus ponens is another's modus tollens...!
Anyway, it sounds like you both are busy, so I won't
belabor these issues further right now – looking forward to hearing
from you,
– Omer
OMER, 3/5/19
Short follow-up: I think I'm going to go ahead and publish my blog
post, and you can reply whenever and wherever you see fit, incl. on the
pages of my blog itself (where you have a standing invitation to post
your rebuttal) – sound good?
BRUCE, 3/5/19
Hi Omer,
As you please.
I have a sneaking suspicion that the disagreements would arise in
simpler cases than in our paper. Are we disagreeing about probabilistic
approaches to grammar? Then
Bresnan, J., Cueni, A., Nikitina, T., & Baayen, R. H. (2007). Predicting the dative alternation. In G. Bouma, I. Kraemer, & J. Zwarts (Eds.), Cognitive foundations of interpretation (pp. 69-94). Amsterdam: KNAW.
would be a clearer case to discuss. Are we disagreeing about
feed-forward per se? Then Shih and Zuraw, cited by us, would be a
clearer case. Our work presupposes both of these earlier types of
research result, and so is perhaps a bit further down the path.
Regards,
Bruce
OMER, 3/5/19
No, I don't think probabilistic grammars are the content of our
disagreement, or at least not the crux of it. I am happy to assume that
the output of syntax is not a set of <PF, LF> pairings (as the
post assumes), but rather a probability distribution over said pairs
(including, perhaps, a zero probability for some), and that this
probability distribution is then convolved with a second probability
distribution gleaned from the phonological component. This is still a
fully feed-forward system, in that nothing about the syntactic
computation needs to know anything about the phonological system in the
competence grammar. In production, it is a different story; but as
we've discussed, in production you would also need to know what the
sexual orientation of your interlocutors is to modulate your VOT, or,
in a language like Korean, what the relative ages of you and your
interlocutors are to choose the proper honorifics, etc. etc. – things
whose integration into the grammar I consider to be beyond the pale in
the first place. (Hence, in a sense, we have in these results
yet-another-reason to distinguish competence from performance.)
OMER, 3/5/19
Sorry, I was trying to add something to the reply below and instead
just sent it again :-/ What I meant to add was the following:
I can imagine the response to <content of last email> being that
convolution is commutative, and so there is no empirical content to
calling the picture I sketched "feed-forward." Let me say that even if
that's mathematically true (and it of course is), that's not an
argument in the modularity debate, for the reasons we've already
covered. Social cognition is also probabilistic, I'd assume, and so its
contributions – if they combine via convolution with the outputs of the
grammar – also commute in the same manner. The result of this line of
thinking is, then, mental globalism (or at least globalism for all
probabilistic components of the mind that connect to one another at
all). Since we have plenty of evidence from neuropathology that the
mind is modular, the argument from commutativity must be incorrect
(since it leads to a consequent that we know to be false). I therefore
assume that it is.
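For concreteness, the commutativity being granted above, stated for discrete distributions $f$ and $g$ (the continuous case follows by the same change of variables):

$$(f * g)(z) \;=\; \sum_{k} f(k)\,g(z-k) \;=\; \sum_{j} g(j)\,f(z-j) \;=\; (g * f)(z), \qquad \text{where } j = z - k.$$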
You could then ask what the content of the feed-forward approach is, in
view of this commutativity. My answer would be that its content lies in
the explanatory avenues it opens up, e.g. the explanation for the vast
swaths of syntax in which phonology seems to play no role whatsoever,
as well as all the evidence adduced in favor of "late insertion" within
the Distributed Morphology framework (stated somewhat more
theory-neutrally, this is evidence that the insertion context for
"morphemes" is derived syntactic structure, which implies ordering).