Wednesday, March 15, 2006

The Use of Simplification in Scientific Research

There is an approach to science according to which the path to enlightenment is to simplify some complex real-world problem in some way and research the resulting problem, then add back some complexity and research that problem, and so on, gradually working up until one has accounted for the problem as a whole. One expression of this approach is found in Pat Langley's PowerPoint presentation of an outline of a talk on Herbert A. Simon's views on how research proceeds in science.
Science is a gradual process. Build incrementally on your previous results, extending them to cover ever more phenomena.
This is a mind-bogglingly naive account of how any but lollipop scientific investigations proceed.

There is some recent research that nicely illustrates the hazards of simplification in science, especially simplification by people unqualified to engage in it. This research alleges that different sorts of language structures are processed in different parts of the brain. I have access only to a press release from the Max Planck Society, which you can access through the title link. One doesn't need to know any more about this research than is presented in the press release for the purposes of this blog.

This research holds to the following view of the syntax of human languages:
When analysing language rules (syntax), one discovers two fundamentally different grammatical patterns. A simple rule governs the establishment of typical (probable) connections between words, like between article and noun ("a song") in contrast to article and verb ("a pleases"). The probability for a noun to follow an article is very high, while the probability of a following verb is very low. However, in order to understand longer sentences, a complex structural model is required - what is called a "hierarchy". Hierarchical dependencies serve to connect parts of a sentence - for example "around" an inserted subordinate clause: "The song [the boy sang] pleased the teacher". The Max Planck study aimed to compare brain activities during the processing of both models - simple "local probability" and complex "hierarchy".
Reading this is like "deja vu all over again." Back in my grad school days, grammars consisting of local probability rules were referred to as "Finite State Grammars," while grammars containing hierarchical rules of the particular sort the Max Planck folks studied were referred to as "Context Free Phrase Structure Grammars." Neither has anything whatever to do with the grammar of German or Chinese or any other language, and Noam Chomsky set about trying to demonstrate that grammars of much greater power were required to account for human languages.
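To make the distinction concrete, here is a minimal sketch (my illustration, not anything from the Max Planck study) of the two kinds of recognizer: a finite-state machine that checks only local transitions, and a counting recognizer for a^n b^n strings, which requires a memory that no finite-state machine has:

```python
# Finite-state recognizer: accepts strings of the local-probability
# pattern "ab ab ab ...", tracking only the current state.
def finite_state_ab(tokens):
    state = "expect_a"
    for t in tokens:
        if state == "expect_a" and t == "a":
            state = "expect_b"
        elif state == "expect_b" and t == "b":
            state = "expect_a"
        else:
            return False
    return state == "expect_a"

# a^n b^n recognizer: must *count* the a's -- unbounded memory,
# which is exactly what a finite-state machine lacks.
def anbn(tokens):
    n = 0
    i = 0
    while i < len(tokens) and tokens[i] == "a":
        n += 1
        i += 1
    return n > 0 and tokens[i:] == ["b"] * n

print(finite_state_ab(list("abab")))  # True
print(finite_state_ab(list("aabb")))  # False
print(anbn(list("aabb")))             # True
print(anbn(list("aab")))              # False
```

The point of the sketch is only that the two rule types differ in the memory they demand, not that either has anything to do with real human grammar.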

A linguistics course site at Michigan State University provides some information on finite state and context free grammars as well as a very remarkable claim. I apologize for going technical on you folks and also being typically wordy. The claim is:
English is just like a^n b^n [Note: any number of a's followed by the same number of b's, as in the Planck "hierarchical" rules -- Mike]
(2) a. The cat died.
b. The cat the dog chased died.
c. The cat the dog the rat bit chased died.
d. The cat the dog the rat the elephant admired bit chased died.
The MSU folks have entered a minefield. Clearly, if uttered aloud (reading doesn't count here), speakers of English would instantly understand examples (2a) and (2b). The problem is that we fall apart when we encounter (2c) and (2d). They are flat out not acceptable. The question is, then, "Are they grammatical?" Some say that they are and some say that they are not. And some people like me don't care one way or the other.
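The a^n b^n analogy treats the n noun phrases and the n verbs of a center-embedded sentence as the a's and the b's. A toy generator (my own illustration; the noun and verb lists are simply lifted from the MSU examples) makes the count-matching pattern explicit:

```python
# Center-embedded sentences of depth n: n noun phrases followed by
# n verbs, mirroring the a^n b^n shape (lowercase, for simplicity).
NOUNS = ["the cat", "the dog", "the rat", "the elephant"]
VERBS = ["admired", "bit", "chased", "died"]

def center_embedded(n):
    nps = NOUNS[:n]       # the "a" side: n noun phrases
    vps = VERBS[-n:]      # the "b" side: n matching verbs
    return " ".join(nps + vps) + "."

print(center_embedded(2))  # the cat the dog chased died.
print(center_embedded(3))  # the cat the dog the rat bit chased died.
```

Note that the generator happily produces depth 3 and 4, even though actual speakers fall apart at exactly that point, which is part of what makes the "English is just like a^n b^n" claim a minefield.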

The more important question is whether or not English is "just like a^n b^n" type languages. The answer is, "No, of course not." Compare the b-c examples above with the following:
(3b) The cat that the dog chased died.
(3c) The cat that the dog that the rat bit chased died.
We have added just the word "that," and the nasty consequence of doing this is that these new sentences are not of the form a^n b^n. Why do I say this? To force these sentences into the a^n b^n Procrustean bed, we would have to say that "that the dog" and "that the rat" are units or constituents of these sentences, and they are not. A phrase like "the dog" is a noun phrase and is a unit or constituent of these sentences. The sequence "that the dog" consists of "that" plus a noun phrase and is not a phrase in and of itself. In these sentences, "that" can be thought of (for our purposes) as a pronominal element that functions as (or refers back to) the direct object of the verbs "chased" (3b & 3c) and "bit" (3c). In short, in (3b) we have something like the following gross organization: "(the cat (that (the dog) (chased e)) died)," where "e" is a "place holder" serving to indicate what the grammatical function of "that" is in the sentence. In this case, "e" is in direct object position and "that" functions as the object of the verb. It is an abstract element that is not pronounced. However, I should note that there have always been people who believe in a sharp distinction between the various components of a linguistic description and who would argue that the function of "that" in our sentence is not a syntactic matter, but rather involves rules of semantic interpretation that will link "that" to the verb "chased" in some appropriate way. In my opinion, this is nonsense, but I can't go into that here. I'm just warning you of this possible critique of my position.

Consider the sentences "the boys left town who are being chased by the cops" and "the boys left town who is being chased by the cops." The former is grammatical but the latter is not. In this case we have a nonlocal dependency between the plural noun phrase "the boys" and the verb "are" way off to the right. This sort of sentence cannot be produced by context free rules (nor by finite state rules, for that matter).
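A rough sketch of why this dependency is nonlocal: a recognizer must carry the head noun's number feature across all the intervening words and check it against the distant auxiliary. The tiny feature table below is invented purely for illustration:

```python
# Toy number-agreement check across an extraposed relative clause.
# The subject's number feature must be remembered until the distant
# auxiliary ("are"/"is") is reached: a nonlocal dependency.
NUMBER = {"boys": "plural", "boy": "singular",
          "are": "plural", "is": "singular"}

def agrees(sentence):
    words = sentence.replace(".", "").split()
    subject_num = next(NUMBER[w] for w in words if w in ("boy", "boys"))
    aux_num = next(NUMBER[w] for w in words if w in ("is", "are"))
    return subject_num == aux_num

print(agrees("the boys left town who are being chased by the cops"))  # True
print(agrees("the boys left town who is being chased by the cops"))   # False
```

The check is trivial once you allow yourself a memory for the subject's features; the point is that purely local transition rules provide no such memory.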

We (sensible) linguists have long since abandoned thinking about the syntax of natural languages in these terms. We stopped sometime in the mid-sixties, if not sooner. This, unfortunately, has not stopped the brain researchers of the Max Planck Society from studying how humans process finite state and context free "syntactic" rules. They get a result, as one can see from the link above, but the results they get tell us nothing about human natural language processing. Why? Because they left out all of the really interesting stuff. The press release says:
The advantage of experimenting with artificial grammars - as opposed to naturally spoken grammars - lies in the fact that other elements of language (semantics, phonology, morphology) do not have additional influences on neurological processing.
This is anything but an advantage. In fact, if one is interested in human language understanding one must absolutely take "other elements of language" into consideration. See my book, "Speech Acts and Conversational Interaction" for proof of this.

In that book, I cite an example of a phone conversation in which a male is trying to get a female friend to pick him up where his car has stalled and take him to the bank he works at so he can open it up. The interesting thing is that in this conversation, no reference is made to someone's giving another person a ride and no overt request is made. Moreover, the woman does not overtly reject the request. The entire conversation consists of indirect communication. The need for a ride is communicated (if I may be permitted a simplification) by "My car is stalled...and I'm up in the Glen" followed by "And I don't know if it's possible but see I haveta open up the bank in Brentwood." The response is "Yeah, I know you want... and I would but except I've got to leave in about five minutes." I have heard a tape recording of this conversation and it is unremarkable sounding (i.e., doesn't sound deviant in any way) though it is, as we shall see, a quite remarkable conversation.

This is a very important conversation for understanding human language processing. We have a ride request with no mention of a ride, nor is a request overtly made. All the requester does is state a problem that he hopes his friend can solve. The responder communicates that she knows what he wants but provides a reason why she cannot solve the problem. We may usefully flesh out his request as "My car is stalled...and I'm up in the Glen" followed by "And I don't know if it's possible for you to pick me up and give me a ride to my bank but see I need for you to give me a ride because I haveta open up the bank in Brentwood." We can flesh out Marcia's response as follows: "I know you want a ride and I would give you one but except I can't give you a ride because I've got to leave in about five minutes." Somehow the people involved in this conversation know how to go about filling in the blanks, and this requires, as I demonstrate in my book, reference to a cognitive representation of the essential elements of requests, even something as specific as ride requests, and a mechanism for interpreting what is said in terms of this cognitive representation, relevant facts about their past experiences, and the present context (e.g., that it is early in the day). This is what we might call the "hard language processing problem."

The fact is that English is not a context free language, and though some English sentences can be wedged, kicking and screaming, into the a^n b^n mold, not every kind of English sentence can. Moreover, as my ride request example demonstrates, one must, if one wants to understand language processing, take into consideration the "other elements of language" the Max Planck folks feel free to omit.

We come then to the issue with which we began, namely the idea that science proceeds by simplifying a "hard" problem to the point that one can get traction with it, i.e., make it an "easy" problem, and then gradually working up to the "hard" stuff. The following illustrates the sort of "linguistic" (snigger quotes) examples the Max Planck people dealt with (in short, meaningless baby talk):
The simple rule involved alternating sequences from categories A and B (e.g., AB AB = de bo gi ku); the complex rules on the other hand required hierarchies to link both categories (e.g., AA BB = de gi ku bo).
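These two rule types can be checked mechanically. In the sketch below, the assignment of syllables to the classes A and B is inferred from the press release's examples and should be taken as an assumption on my part:

```python
# Syllable classes inferred from the press release examples
# ("AB AB = de bo gi ku", "AA BB = de gi ku bo"); an assumption.
A = {"de", "gi"}   # class-A syllables
B = {"bo", "ku"}   # class-B syllables

def simple_rule(seq):
    """(AB)^n: strictly alternating A B A B ...; finite-state checkable."""
    return (len(seq) % 2 == 0 and len(seq) > 0 and
            all(s in (A if i % 2 == 0 else B) for i, s in enumerate(seq)))

def hierarchy_rule(seq):
    """A^n B^n: all the A's, then the same number of B's; needs counting."""
    n = len(seq) // 2
    return (len(seq) == 2 * n and n > 0 and
            all(s in A for s in seq[:n]) and all(s in B for s in seq[n:]))

print(simple_rule(["de", "bo", "gi", "ku"]))     # True  (AB AB)
print(hierarchy_rule(["de", "gi", "ku", "bo"]))  # True  (AA BB)
```

Both checks are a few lines of code over meaningless syllables, which is precisely the worry: whatever brain activity they evoke, it is activity over strings with no meaning, no morphology, and no context.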
In my opinion, the sort of language processing study that the Max Planck people are engaged in will not end up contributing anything to our understanding of human language processing; that is, we will not be able to work up from easy stuff like studying how people process "linguistic" (snigger quotes again) examples like "de bo gi ku" and "de gi ku bo," which occur in no language on earth, to examples like "Yeah, I know you want and I would but except I've got to leave in about five minutes." And you can take that to the bank. In my opinion, it is sometimes easier to understand the "hard" problem by addressing it head on. In the case of the ride request, doing so forces one to understand that cognitive and contextual information is critical to understanding language and to understanding language understanding.

For those interested, there is a study comparing how humans vs. certain nonhuman primates handle the phenomena that interested the Max Planck people. I suggest you go to The Language Log for links as well as discussion.

Tweet This!


Blogger The MetaKong said...

Though I'm totally ignorant as re: theoretical linguistics; as usual, you make your topic easy to understand and make a compelling and logical argument for your stance.

I think there's a natural tendency to avoid facing "big" problems head on because, generally, the larger the problem, the more variables involved, and the greater energy that must be exerted in order to find a solution...I've thought a lot recently that physical laws (i.e. conservation of energy) apply to human beings in their behavior, to some extent, at a deeply rooted psychological level...

On a slight tangent; while cleaning an office last night, I found the following posted on a cubicle, and I copied it so I could present it to everyone here (there's an irony that has to elicit at least a small grin here):


I cdnuolt blveiee taht I cluod aulacity uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the hmuan mnid, aoccdrnig to rscheearch taem at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers ina wrod are, the olny oprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sittl raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Such a cdonition is arppoiately cllaed Typoglycemia :)-

Amzanig huh? Yaeh and yuo awlyas tohugt siplleng was ipmorantt.


funny; i notice, while typing, that my speed was increased if I attempted to pronounce the misspelled word as is in my mind, wierd...



5:51 PM

Blogger Mr K said...

I think it's pretty clear that their method is not very useful; an over-application of Occam's razor, I suspect.

I've seen that note before, it's rather cool.

7:10 PM

Blogger IbaDaiRon said...

LG, maybe it's just me not reading carefully enough, but it seemed a little like you're arguing at the beginning against simplification/abstraction in the scientific method, although it becomes clear later that the real danger lies in excessive or inappropriate abstraction. (After all, controlling extraneous factors [like STP in a chemistry or physics experiment] and idealizing aspects of phenomena [assuming that the earth moves in perfectly circular orbit; discussing "ideal" gases] represent simplifications of the objects of study.)

A few years back I oversaw a student whose senior thesis project examined the relation between recognizing "word form patterns" and ESL reading fluency. His results clearly showed that the first-years at our school (=US high school sophomore level) were still, for the most part, reading every letter of every word but that the majority of fourth- and fifth-years (it was five-year polytechnic) had switched to recognition of the overall form of words. Pretty interesting.

This is another reason why it's so hard to catch your own typos and why it's always a good idea to have someone else read through your important writings!

(BTW, the Langley PPT link has an extra period at the end that will confuse most browsers [and users?!] and end up on an erreur page!)

12:14 AM

Blogger The Language Guy said...

IbaDaiRon, thanks for the tip about the Langley URL. In fact, my browser cracks under the pressure.

You are right that I am really talking about inappropriate simplification. Though it will piss off people in the "hard" sciences, I would say that if they find that the method of simplifying and then gradually increasing the complexity of a problem works, it is because the "hard" science isn't in fact hard. What is hard is research on cognition. That is way harder than physics or chemistry or other "hard" sciences.

I worked with a famous British Royal Scholar who earned his title through work in chemistry. He had turned to language processing and I was invited to assist in the research, which was done at Edinburgh University. He found it very hard going indeed. We at that time (1970) were in the infancy of research on language processing. For all practical purposes we still are, thanks in part to all the wrong-headed approaches that have been taken. I discovered when working on computationally modeling language production that people involved in that had an opposite approach to grammar than did people working on language understanding. It was this work that informed much of my last book where I argued that computational language understanding needs to take the same approach to grammar as does computational language production.

6:58 AM

Blogger Mr K said...

"You are right that I am really talking about inappropriate simplification. Though it will piss off people in the "hard" sciences, I would say that if they find that the method of simplifying and then gradually increasing the complexity of a problem it is because the "hard" science isn't in fact hard. What is hard is research on cognition. That is way harder than physics or chemistry or other "hard" sciences."

Please, everyone knows it goes maths>physics>chemistry>the rest

10:02 AM

Blogger The Language Guy said...

Actually, Mr. K you have it backwards. The easiest is math. Physics is harder than math because it involves finding the math that is appropriate to understanding the data of interest. Calculus was not invented by a mathematician for purely intellectual mathematical reasons but by a physicist. Chemistry is still tougher since it has been much harder to make it explicit enough to bring mathematics to bear on the subject. Linguistics, which is older than physics in fact, has the extremely difficult problem that we haven't (if we are honest) a clue as to the mathematics appropriate to syntax. Semantics has proved more tractable but again, the mathematics (logic, in this case) appropriate for semantics had to be invented by semanticians, not mathematicians.

How would you respond to that line of argument?

1:13 PM


Post a Comment

<< Home