Syntactic Structures Downfall

So I’ve pretty much given up on Syntactic Structures. It wasn’t as great as I thought it would be, and there was some linguistics jargon I didn’t feel like learning. Before I stopped, though, there were some interesting, if unsurprising, things (especially for its time).

Chomsky talked about modeling languages using what he called (I think, I’m not actually looking it up) “finite state Markov processes.” Apparently this was how linguists thought at the time. By today’s standards, I’m not sure “Markov process” is the phrase he wanted, since that usually implies randomly switching from state to state. Clearly when people speak, it isn’t a random stream of words that comes out (although it may seem that way sometimes).
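To make concrete what a word-level Markov model of language looks like, here is a minimal sketch. The toy corpus and everything else in it are my own invented example, not anything from the book: each word is a state, and the next word is chosen at random from the transitions observed in the corpus.

```python
import random

# Toy word-level Markov chain. Each word is a state; the next word is
# drawn at random from the transitions seen in a (made-up) corpus.
corpus = "the dog ran and the cat ran and the dog sat".split()

transitions = {}
for a, b in zip(corpus, corpus[1:]):
    transitions.setdefault(a, []).append(b)

def generate(start, length):
    word, out = start, [start]
    for _ in range(length - 1):
        # If a word has no observed successors, just stay on it.
        word = random.choice(transitions.get(word, [word]))
        out.append(word)
    return " ".join(out)

print(generate("the", 8))
```

The output is locally plausible but globally aimless, which is exactly the “random stream of words” flavor of such models.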

Nevertheless, I assume that in today’s jargon he meant modeling language with “nondeterministic finite state automata,” since that is what his description amounts to. Now from basic theory of computation we know that any language recognized by one of these must be a regular language, i.e., one describable by a regular expression. Natural languages are not regular, thus the myth of being able to model them this way is debunked.
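For the record, here is what “finite state” means in practice. This is a sketch of my own, not anything from the book: a DFA is just a finite set of states plus a transition table, and it can only recognize regular languages.

```python
# A DFA is a finite set of states plus a transition function. This one
# accepts strings over {a, b} with an even number of a's -- a regular
# language. No DFA/NFA can accept {a^n b^n : n >= 0}, the textbook
# non-regular example, which is analogous to arbitrarily nested
# "if ... then ..." clauses in English: the machine would have to
# count, and finitely many states can't count arbitrarily high.
dfa = {
    ("even", "a"): "odd",
    ("odd", "a"): "even",
    ("even", "b"): "even",
    ("odd", "b"): "odd",
}

def accepts(s, start="even", accept={"even"}):
    state = start
    for ch in s:
        state = dfa[(state, ch)]
    return state in accept

print(accepts("abba"))  # two a's -> True
print(accepts("abb"))   # one a  -> False
```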

Chomsky obviously did not have that tool at his disposal: he went on for pages considering different cases and why they wouldn’t work, concluding (in a fairly non-rigorous way) that you can’t model a natural language using an NFA (or a DFA, for that matter). Not surprising, but noteworthy, I’d say. It really was a paradigm shift to claim that no finite-state grammar can model language. Plenty of people are still trying to get around it: things like online translators, AI, and many others work from the assumption that it is possible to get really close.

Since this little gem was in there, I feel like quitting is depriving me of some other interesting little tidbit that I hadn’t thought about, but oh well.



I’ve decided to start reading Chomsky’s pivotal book Syntactic Structures, since I’m into the philosophy of language thing and it was a really important work. This post is going to be silliness, but it was something I couldn’t get out of my head while attempting to read past the first couple of sentences.

“Each language has a finite number of phonemes (or letters in its alphabet) and each sentence is representable as a finite sequence of these phonemes (or letters), though there are infinitely many sentences.”

Now we encounter “free structures” all the time in math. It is perfectly legitimate to create something with an infinite number of elements just by stringing together a finite alphabet. In fact, you can impose lots of structure, such as the free group on two generators. This “language” has an alphabet of only two generators (plus their inverses), must satisfy the group axioms, and ignores triviality (every word must be fully reduced), yet it still achieves an infinite number of words (not even sentences, but words).
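The free group example can even be enumerated. This is a little sketch of my own: writing the generators as a, b and their inverses as A, B, a word is reduced if no letter sits next to its own inverse, and the count of reduced words of length n is 4·3^(n−1), growing without bound.

```python
# Enumerate reduced words in the free group on two generators a, b
# (inverses written A, B). A word is reduced when no letter is
# adjacent to its own inverse. There are 4 * 3**(n-1) reduced words
# of length n: infinitely many words from a four-symbol alphabet.
letters = "aAbB"
inverse = {"a": "A", "A": "a", "b": "B", "B": "b"}

def reduced_words(n):
    words = [""]
    for _ in range(n):
        words = [w + c for w in words for c in letters
                 if not w or inverse[w[-1]] != c]
    return words

for n in range(1, 5):
    print(n, len(reduced_words(n)))  # 4, 12, 36, 108
```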

I must argue, though, that there are not “infinitely many sentences.” I don’t think it would be controversial to claim that there are finitely many words in a language. Take English, for example. Use the good old OED, plus maybe a slang dictionary, and throw in a couple thousand for good measure, as an upper bound on the number of words in the language.

This number of words is huge, though finite. When we generate sentences, if we do so in the “free” way, then we clearly get an infinite number. Now I’m not so concerned with “grammatically” correct sentences as I am with imposing conditions on repetition. The sentence “the dog ran dog ran” is pointless. If we rule out such repetition, I argue that there must then be some upper bound on the length of the longest possible sentence (to continue the group analogy, this is like a presentation with relations, such as \langle a : a^3=1 \rangle=\mathbb{Z}_3).

To make this easier, let’s restrict our attention to sentences that are not conjunctions of two complete sentences (if the former set is finite, then so is the latter). Now a non-conjunctive sentence can only be so long: say you use basically every word in the language a couple of times (and I find it hard to believe you would still have a “sentence” at that point). So now we have an upper bound of, I don’t know, a couple billion words in a sentence. This would give us on the order of a couple-billion-factorial sentences. That is absurdly large (and an absurd overestimate, in my opinion), but still finite.
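The counting argument above can be sketched in miniature. This is my own toy version, not anything from the post’s sources: with a vocabulary of N words and no word repeated, a sentence has length at most N, so the number of possible sentences is the finite sum of N!/(N−k)! over all lengths k.

```python
from math import factorial

# Toy version of the counting bound: vocabulary of N words, no word
# repeated, so sentence length <= N and the number of sentences is
# sum over k of the ordered selections N!/(N-k)! -- huge but finite.
def sentence_bound(N):
    return sum(factorial(N) // factorial(N - k) for k in range(1, N + 1))

print(sentence_bound(3))   # 3 + 6 + 6 = 15 for a 3-word vocabulary
print(sentence_bound(20))  # already astronomically large, yet finite
```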

Despite this having zero relevance to your book, Mr. Chomsky, I must respectfully disagree with your opening lines. What does everyone else think?