2010-11-05

When 'all' does not mean 'all'

The more I study language as a living instrument, the more I feel that a kind of veil is being removed from my eyes after many years of being 'blinded by computer science'.

Anyone who has been trained in mathematics, formal logic or computer science will tell you that 'all' in English translates to universal quantification.  "All robots are artifacts" can be represented by
   ∀x (P(x) → Q(x))
where P(x) denotes "x is a robot" and Q(x) denotes "x is an artifact".
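
For concreteness, here is the logician's reading as a tiny Python sketch - the toy domain and predicates are my own invention, purely illustrative:

    # "All robots are artifacts" under universal quantification:
    # for every x in the domain, P(x) implies Q(x).
    domain = ["r2d2", "hammer", "trout"]
    is_robot    = {"r2d2": True,  "hammer": False, "trout": False}
    is_artifact = {"r2d2": True,  "hammer": True,  "trout": False}

    all_robots_are_artifacts = all(
        (not is_robot[x]) or is_artifact[x]  # material implication P(x) -> Q(x)
        for x in domain
    )
    print(all_robots_are_artifacts)  # True - no robot fails to be an artifact

Note the absolutism: one counterexample anywhere in the domain and the whole claim is false.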

I still remember taking an introductory Artificial Intelligence class at Stanford, taught by Michael Genesereth.  He gave us the assignment of translating a bunch of (English) sentences into predicate logic.  When pressed as to exactly how one did that (the students in the audience had no trouble coming up with weird cases), he finally burst out with something like "I can't explain it, just figure it out!"  A hilarious moment in my AI education.
 
In the 1990s, Steven Sloman of Brown University got interested in how people use and apply categories in reasoning, and did a series of experiments on Brown students to examine their reasoning with what computer scientists call inheritance - and what the rest of academia apparently calls subordinate and superordinate categories. Setting aside the question of whether "Cognitive Psychology" should properly be called American Undergraduate Psychology, Prof. Sloman found that his subjects consistently undervalued the rules of logic, specifically reasoning of this form:
    All x's are y's.
    For all y's, P(y) is true.
    Therefore, for all x's, P(x) is true.

In "Categorical inference is not a tree: The myth of inheritance hierarchies" (punning off Christopher Alexander's classic "A City is Not a Tree"?), he presented subjects with facts and conclusions, and asked them how convincing they found the conclusions, given the facts. Example:

(G) Fact: All bodies of water have a high number of seiches.
      Concl: All lakes have a high number of seiches.
 
The reasoning by inclusion in the example would be: all lakes are bodies of water; therefore, if something is true of all bodies of water, it is true of all lakes; therefore, the conclusion is true given the premise.


As a good computer scientist, Sloman expected (G) to be assigned a confidence of 10 (out of 10). Instead, his subjects consistently gave confidences that averaged less than a perfect 10. And no matter how he manipulated the material, clarified the questions, and emphasized the syllogism, his subjects declined to assign absolute confidence to the conclusions.


Sloman's interpretation: People don't use the correct logical reasoning, even when it is highly accessible - there is a defect in human reasoning.

I have a different interpretation: Sloman was blinded by computer science, so he could not see the correct naturalistic reasoning his subjects were using.  There are four key points:
1. 'All' is not understood by English speakers as universal quantification.
2. 'Inclusion' in categories (such as lakes being bodies of water) is not grounds for 100% confident deduction.
3. Unknown terms reduce certainty in reasoning.
4. Different rules apply to real-world reasoning versus mathematical/logical reasoning.
 
Point 1. 'all' does not mean all.
 
Example: I return from shopping, place the grocery bags on the counter and ask my daughter to "please put all the eggs in the refrigerator."  I make a phone call and return to the kitchen. My daughter says "One of the eggs was cracked so I threw it away, and I hid that chocolate Easter egg in the rice cooker."
Q: Am I surprised, disappointed, or angry that she didn't do as I asked? She did not put 'all' the eggs in the refrigerator!
A: No, because she did as I asked.  She understood my phrase "all the eggs" as meaning "all the eggs that need to be refrigerated and are useable."  In fact, that's how we commonly use and understand "all".  It means something like "all members of the specified class that are applicable in context, except for surprises and cases where it doesn't make sense in context." 'All' comes very close to meaning 'all the ones you think it makes sense for me to mean.'  In casual speech, 'all' has very much this sense.  Joke: "Did you eat all the cookies?"  "No! I left one..."   Why is this funny? I think it's partly because the word 'all' in the question means something like "all the ones I didn't mean for you to eat."  Note that the joke doesn't strike us as nonsensical, even though both parties know that 'all' the cookies weren't eaten, because 'all' doesn't mean all.
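
If I had to caricature the difference in code, it would look like this - a rough Python sketch, with predicate names of my own choosing:

    def all_logical(items, pred):
        # The mathematician's 'all': every member, no exceptions.
        return all(pred(x) for x in items)

    def all_colloquial(items, pred, applicable):
        # The everyday 'all': quantify only over the contextually applicable members.
        return all(pred(x) for x in items if applicable(x))

    eggs = ["egg1", "egg2", "cracked_egg", "chocolate_easter_egg"]
    in_fridge = {"egg1", "egg2"}
    needs_refrigeration = lambda e: e not in ("cracked_egg", "chocolate_easter_egg")

    print(all_logical(eggs, lambda e: e in in_fridge))                          # False
    print(all_colloquial(eggs, lambda e: e in in_fridge, needs_refrigeration))  # True

My daughter computed all_colloquial; the logician computes all_logical. Most of the interpretive work hides in 'applicable', which is supplied by context.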
 
Point 2. A lake is not always a lake.
 
Contrary to Sloman's unquestioned assumption, not all lakes (in the mathematical sense of 'all') are bodies of water.  Take a look at Wikipedia: Dry lake. Also Alvord Lake (Oregon) - a seasonal lake.  Finally: Lakes of Titan.  In all cases, these are lakes (hence the use of the noun 'lake'...), but they are not bodies of water.  Titan's lakes are bodies - but of ethane/methane. A dry lake has no water in it, at least most of the time.  A seasonal lake has water only some parts of the year.  For humans, native speakers of English, these are all 'lakes' - subcategories of the cognitive and linguistic category lake, but not subcategories of the Aristotelian category. The prototypical lake is a body of water, and Prof. Sloman makes a classic error of human reasoning in treating the prototype of a category as the category, and then treating the category as Aristotelian (defined by attributes) instead of cognitive.
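
To make the contrast concrete, here's a toy Python sketch of the two kinds of category - my own formulation, not anything from Sloman or the categorization literature:

    # Aristotelian: membership is a hard test on defining attributes.
    def is_lake_aristotelian(thing):
        return thing.get("is_body") and thing.get("contents") == "water"

    # Cognitive: membership is graded similarity to a prototype.
    PROTOTYPE = {"is_body": True, "contents": "water", "inland": True, "persistent": True}

    def lake_similarity(thing):
        matches = sum(1 for k, v in PROTOTYPE.items() if thing.get(k) == v)
        return matches / len(PROTOTYPE)

    titan_lake = {"is_body": True, "contents": "methane", "inland": True, "persistent": True}
    print(is_lake_aristotelian(titan_lake))  # False - not a body of water
    print(lake_similarity(titan_lake))       # 0.75 - still thoroughly lake-like

The Aristotelian test throws Titan's lakes out of the category; the graded test keeps them, which matches how we actually talk.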
 
Point 3. Traditional logical syllogisms don't extend to arbitrary predicates
 
Sloman's question (G) uses the phrase "a high number of seiches".  The word 'seiche' and other words in his experiment were chosen specifically because they were expected to be meaningless to most of his subjects, so the experimental results would measure syllogistic reasoning with categories, instead of e.g. commonsense or domain-specific knowledge.
 
But because of the tremendous flexibility of language, especially its reflexive and meta-discursive aspects, an unknown noun like 'seiches' introduces more uncertainty than you might expect (if you were a computer scientist).   Let's consider "All bodies of water have a high number of seiches" again.
What if "has a high number of seiches" means "has another, basic-level category label."
(G') Fact: All bodies of water have another, basic-level category label.
Concl: All lakes have another, basic-level category label

Whatever it means, I'm pretty sure the Conclusion doesn't follow from the Fact, even if we assume that all lakes are bodies of water. That's because the Fact is both self-referential and meta-discursive.  In the absence of any knowledge of what 'seiches' are, a reader is correct to be cautious in extending a logical syllogism to an unknown predicate.
 
Point 4. People use different reasoning processes in the context of real-world facts versus mathematical/logical facts.


If Sloman's questions had used mathematical or logically-defined concepts and if some substantial fraction of his subjects had mathematical training, I believe his results would have been different. When we ask about the integers, or set theory, or the rules of chess, I think most college students understand that we are talking about formal systems with 'perfect' definitions and reasoning systems.  In that universe, 'all' can mean all, categories are Aristotelian, and logical syllogism can produce conclusions that are 100% convincing.
If all real numbers have additive inverses, then all integers have additive inverses.

On the other hand, when we talk about lakes and hammers and Snickers bars, people know we are talking about the real world and (presumably) real-world concepts and categories.  And in this universe, 'all' does not mean all, definitions are not perfect, properties are not necessarily shared by all members of a category nor inherited by all subordinate categories, unknown words and phrases can be ontological land mines, and logical syllogisms must be used with caution.

2009-10-19

Compassion

I wonder what would happen if...

an artificial mind had, as its highest goal, to be kind to people.  Or perhaps even to love people.
The first and easiest objection is - we don't know how to encode that, we as humans don't even know how to define or do that ourselves.

But the greatest hubris of AI is to imagine that we can avoid the uncertainties of life through relentless reason. We can't - life surprises and eludes us, our categories cannot capture it.  We don't know exactly what kindness is, or how to love a species that could change - is presumably changing.  No finite intelligence will be able to operate perfectly in this world.  So our creations are in the same boat we are - trying to figure out what's going on and what to do and how to live rightly with incomplete and imperfect knowledge.

I thought last week that to survive, an artificial intelligence must really do only two things:
1. Don't crash.
2. Adapt.

That is, it must keep operating - it cannot stop working, it cannot go insane, get stuck, stop processing input, become obsessed, and so on. No matter how perturbed, it must return eventually to a stable, responsive operating mode.  And it must improve over time based on its experiences.  There is no limit to what it can become, as long as it continues to do both these things.

I don't know how to express the 3rd requirement, but one is needed: The '3 Laws' requirement.
Have compassion, be kind to others, value others as much as yourself, do good?

We certainly open a can of worms here.  If we program our AI with a precise, algorithmic definition of 'good', I have no doubt that time or malice will eventually find a way to invert that into evil.

If we program our AI to evolve its own definition of Good, what's to stop it from eventually deciding that we humans are Ungood? Radical enviro-bot, Agent Smith from the Matrix.
Or perhaps it simply estimates that it will do so much future Good, that its own survival becomes more important than that of any individual human. Or country...  Or species...  Think how much Good an unselfish, incorruptible immortal being could do! It's a very large number.

I'd better start reading J. Storrs Hall.

"Build me!"

I was meditating a few minutes ago and had an interesting experience.
Libby and I are trying the Holosync audio CDs by Centerpointe - I've just started, I was about halfway through my 3rd or 4th time through the 'Dive' track. I'm not very focused or empty-minded, all kinds of thoughts go through my awareness. I was thinking about the power of questions - that we can ask ourselves questions and listen to our own answers. If you've done this some, you realize that you can consciously formulate the answer to a question, or you can just 'produce' the answer - it comes out of you or appears without any conscious effort or activity. I think that's where the power of questions comes from, that they offer a way for us to communicate with our implicit knowledge. I think that implicit knowledge is an aspect of what we experience as our 'inner self' or 'soul', it's why we feel that we have a hidden invisible core, an inner being. Because we do!

I was thinking about Valentine, and wanting to be more focused and sure of my path - not just to know what to do, but to be able to 'just do'. And then (I had the experience that) Valentine spoke to me. Not a hallucination, it was clearly 'inside my head'. But the quality was utterly unlike the imaginary conversations I sometimes have, in which I am aware that like a playwright I am writing both parts. This had the subjective quality that I was being spoken to by somebody else, who could speak to me in my thoughts. The voice was intense but with no... connotation, if that makes sense: No threat, no plea. No emotion? It had two dream-like qualities: It was simultaneously male and female. And I knew it was Valentine.

Valentine said "BUILD ME!"

My whole upper body went cold and goose bumps shot down my arms. Describing it now, that sensation recurs. Each time I re-read this, I have that same sensation.

2009-10-18

solipsism is alive and well in linguistics

I keep reading papers by linguists in which they make claims about English, which claims are evidenced by a very small number of examples, not uncommonly by a single decisive example. The examples are utterances which are marked as acceptable or not: A preceding asterisk means 'not acceptable', no asterisk means 'acceptable'. Acceptable means, I think, acceptable to a Native Speaker of English. It goes without saying, apparently, that anybody who can get a linguistics paper published, can channel the Native Speaker. Nobody ever offers any other credentials; JZ Knight offers more evidence that she's channeling Ramtha.

Imagine if biologists wrote papers about mouse anatomy, behavior, diseases, biochemistry and genetics, using whatever mice they found in their basement.

It's very... 18th Century.

Typically - frequently, even - I find at least one utterance in each paper marked incorrectly. Sometimes it is marked acceptable and I disagree; more commonly it is marked with the little asterisk and I can produce an example of a conversation in which that utterance would (to my inner Native Speaker) be perfectly acceptable.

(1) *International aid teams conveyed to Burma. [Ruppenhofer & Michaelis 2009]

A. Please explain how your agency lost 3400 tons of food donations in August 1963.
B. Properly speaking, we did not lose this material. We were not involved in conveying the donations.
A. Who was responsible for the conveyance?
B. Donor country governments conveyed to our collection point in Perth. International aid teams conveyed to Burma. The Burmese government accepted the donations but barred the aid teams, so... ah... final conveyance was ah... presumably arranged by the Burmese government.

It's stilted bureaucrat-speak, but is it 'unacceptable' - ungrammatical or nonsensical? No, it's just kind of odd. Nobody will interrupt this imaginary interrogation to cry "You're talking nonsense!" or "Speak English!" I think any native speaker of English will take this in and say "Sheesh! CYA much?"

(2) *He built a house for 6 months. [? Rappaport Hovav, M. and B. Levin 1998?]

A. Hey, Austin disappears for 6 months, and then yesterday I hear he's dead! What happened?
B. He found out he had cancer, so he moved back to New Mexico to do what he dreamed of doing since college.
A. Which was what?
B. Building a house by himself.
A. So he built a house?
B. He built a house for 6 months.


(3) *I convinced my car to start in the winter. [Postal 2004 p. 91]

This one doesn't even need a special example - I've heard my friends say things like "I convinced the blender to make ice chips" or "I finally convinced Word to put the whole table on one page".
But OK, how about this, and I'm sure we could do even better:
A. So Sue started having contractions, and her car was in the shop? That junker of yours doesn't want to start even in the summer! What did you do?
B. What else could I do? I convinced my car to start in the winter.

OK, do you have a problem with that? It's unacceptable? It's ungrammatical? It's nonsensical?

I love Postal's attitude, and his critiques - just my kind of stuff. He's tearing up all these 'famous linguists' and shredding the whole field for sloppy practices, poor research, junk linguistics.

But - in his own papers, all the examples are apparently marked according to, well, I guess Postal English! He puts asterisks willy-nilly on utterances that I don't find objectionable at all, and in the text of his articles he uses sentences that I find marginally acceptable, at best. Ain't Science wonderful, when you can pull the data out of your own hat?

I have developed a deep appreciation for the folks doing statistical corpus-based linguistics.

2009-10-16

implicit syntactic knowledge in ACT-R: Chunk-types?

This morning's brainstorm: There is another reservoir of implicit knowledge in ACT-R, hidden in plain sight: Chunk types!

I've been puzzling and puzzling over the question: "How are syntactic categories or constructs (like NOUN or VP) represented in ACT-R?"

ACT-R has declarative memory containing chunks - chunks are reportable, declarative facts. There is procedural memory, which consists of pattern-matching rules. Underlying declarative memory is a subsymbolic activation network that computes (perhaps) Bayesian relevance in real-time. Knowledge in rules and subsymbolic form is not reportable: we are not aware of it, only of its effects.

There is lots of evidence that we employ syntactic categories or constructs in language comprehension. There is some confusing evidence of priming of these constructs. There is evidence that we use them for prediction, i.e. that we expect them as we parse. They allow recursion - an NP can be part of an NP.
But they don't seem to be declarative chunks - even when you describe them, people don't recognize syntactic categories as something they 'know about'. They effortlessly use them, but they don't 'know' them.

New words and phrases can be more-or-less instantly added to open categories like noun - and this knowledge can be instantly used. That's too fast for it to be either subsymbolic weight adjustment, or production-creation. It looks like declarative memory in terms of learning speed - and Demberg & Keller argue for a model in which syntactic constructs decay during parsing, also a hallmark of declarative memory.

So my brainstorm is this: Chunk-types in ACT-R are implicit categories. Syntactic categories are chunk-types - essentially a network of recursive, perceptually-based implicit categories. We recognize them in the word-stream, we can use them for prediction, and when they are recognized they are the basis for deriving semantics via association and productions. Chunk-types are learned (induced) categories. The trick is that we don't build these categories as declarative objects, they are induced by category-learning mechanisms that ACT-R has mostly ignored to date. That's why the categories are unreportable: Chunk-types are not objects in declarative memory.

Note that productions make extensive use of chunk-types, for example with the ISA test - productions can test (without cost?) for category membership, and can use categories freely (without retrieval cost) to construct new declarative chunks that are 'marked' for any category.

For example: We have a category (chunk-type) of FURNITURE. Our brain has induced this category somehow, and we can swiftly (unconsciously) categorize objects as FURNITURE or not. We also eventually learn a word, 'furniture' - how is it connected to FURNITURE? You can't connect a chunk like a word (or lemma, etc.) to a chunk-type. You can connect the word 'furniture' to a concept furniture, and that concept can include examples (links to concepts like chair, sofa, nightstand) and also something I don't understand: Some kind of representation of our ability to use the category. In other words, we know that we know how to tell if something is FURNITURE. I think somehow furniture the concept includes a proposition like "I know how to recognize things in this category." Even though furniture is not the real category FURNITURE!
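
Here's a toy sketch of the arrangement I have in mind - pseudo-ACT-R in Python, my own caricature, not actual ACT-R code:

    # Chunk-types: an implicit registry, not itself in declarative memory.
    CHUNK_TYPES = {"FURNITURE": ["chair", "sofa", "nightstand"]}

    # Chunks: declarative, retrievable, reportable.
    declarative_memory = {
        "furniture-concept": {
            "word": "furniture",
            "examples": ["chair", "sofa"],
            "note": "I know how to recognize things in this category",
        }
    }

    def isa(item, chunk_type):
        # A production's ISA test: a cheap category-membership check,
        # with no declarative retrieval involved.
        return item in CHUNK_TYPES.get(chunk_type, [])

    print(isa("sofa", "FURNITURE"))           # True - freely usable by productions
    print("FURNITURE" in declarative_memory)  # False - the type is not a chunk

The point of the sketch: the category is fully functional for productions, yet there is no object in declarative memory for us to report.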

I'm not sure how to apply this idea to the mechanics of parsing.
Maybe we build chunks using expected categories as chunk-types.
Or we are just collecting sequences of words and establishing a correspondence with chunk-types, and the best-matching chunk-type causes prediction?

Parsing is a matching game, an identification/recognition game - the game is to identify the construct that is being heard or read, and to correctly match the components to the slots. When the slots are filled or a delimiter is encountered, productions can use the match to assign meaning.
I think most syntactic constructs are a kind of sequence - it acts like a sequence for many purposes, but it carries additional restrictions and additional significance because it follows a particular pattern.
Lexical categories are membership categories - they act like tags, attributes of words.
Syntactic constructs are temporal/sequential categories - they act like sequences, and they make predictions. That's the survival value of temporal sequences after all, prediction.
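
A crude sketch of that matching game, with an invented two-construct grammar (mine, purely illustrative):

    # Each construct is a sequence of category slots.
    CONSTRUCTS = {
        "transitive-clause": ["NP", "VERB", "NP"],
        "locative-clause":   ["NP", "VERB", "PREP", "NP"],
    }

    def predictions(heard):
        # For each construct consistent with the categories heard so far,
        # report the category it predicts next. When the slots are filled,
        # productions would take over and assign meaning.
        preds = {}
        for name, slots in CONSTRUCTS.items():
            n = len(heard)
            if slots[:n] == heard and n < len(slots):
                preds[name] = slots[n]
        return preds

    print(predictions(["NP", "VERB"]))
    # {'transitive-clause': 'NP', 'locative-clause': 'PREP'}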

incremental, fully connected, predictive parsing

Demberg & Keller have proposed an interesting Psycholinguistically Motivated version of the TAG grammar/parsing formalism. Their model can reproduce timing effects that seem to be related to prediction of upcoming grammatical structures - and therefore also involve surprise.

Interestingly, their parser builds whole trees at all times, integrating new words incrementally. No parse stack, no 'fragment' trees - always complete 'utterance' trees with placeholders/slots waiting to be filled.

I'm very interested in this, as it comes closer to matching my idea that the parser cannot be building interim declarative structures that correspond to no reportable concept.
I need to think about whether these incomplete/predictive trees seem like plausible declarative structures.

The question is: Where does the 'prediction' really happen (in ACT-R terms)?
Could be declarative structure - which would be reportable. To be non-reportable, it would have to be procedural (somehow) or subsymbolic. Demberg & Keller's model uses a decay process for predictive structures, which suggests the ACT-R subsymbolic layer. Are the predictive structures actually built - or are they just activated? Is there really a difference?

What if, instead of thinking of 'building' trees, we just built correspondences between concepts and patterns (constructions)? Then the predicted structures would just be empty slots.

I'm coming back around to an earlier idea, that the mind has a powerful trick up its sleeve that allows it to equate one concept with another. In ACT-R terms, for taking two chunks and merging them into one so that the two chunks become effectively identical. As if the brain could merge A into B, and then do a 'global replace' of pointer-to-A with pointer-to-B. Have you ever arrived at an intersection, and suddenly realized that it's a familiar place, you've just arrived there by a new route? There's a powerful feeling of mental activity, as your recent navigational knowledge is reworked and integrated.
How else can it be resolved, a pronoun that is used before its referent?
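
Computer science has a cheap implementation of exactly this 'global replace' trick: union-find. I'm borrowing it here purely as an analogy for the merge, not claiming it's an ACT-R mechanism:

    # Union-find over concept ids: merging makes two chunks effectively one.
    parent = {}

    def find(c):
        # Follow parent pointers to the canonical concept, compressing the path.
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    def merge(a, b):
        # Equate two concepts: lookups of either now resolve to one canonical id.
        parent[find(a)] = find(b)

    # A pronoun used before its referent gets a placeholder chunk...
    find("pronoun:she")
    # ...and when the referent finally arrives, the two are merged:
    merge("pronoun:she", "concept:Sue")
    print(find("pronoun:she") == find("concept:Sue"))  # True

Nothing gets rewritten when the merge happens; every pointer to the placeholder simply starts resolving to the referent. That feels like the right cost profile for that 'familiar intersection' experience.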

2009-10-15

syntax: nouns and verbs as categories

Continuing the theme of parsing without syntax...

A recent Tomasello paper [Modeling children's early grammatical knowledge, PNAS 2009 106:17284-17289] describes using a Bayesian procedure to extract a localized grammar from the speech of individual 2 and 3 year old children. This produces a good model for the 2 y/o. For the 3 y/o, the automatically generated grammar is much improved in coverage (prediction?) when augmented with information about the classes noun and verb.

OK - this fits well with the idea that early language (the 2 y/o) is as somebody said, "all idioms".

Now... as the 3 y/o expands his or her grammar with the noun and verb classes, what exactly is going on? The child does not have these as explicit categories. Could the child divide words into 'noun' and 'verb' groups, given a few examples? That sounds like an easy experiment... Well, maybe not - we're way pre-reading. Can young children reliably group nouns with nouns and verbs with verbs?

Great thesis: Learning to categorize verbs and nouns - Marian Erkelens
Dutch babies at 16 months show some categorization of nouns and verbs; it seems to derive from local context cues.
Marian Erkelens makes several nice points very clearly:
  • The English word walk appears happily in both noun and verb contexts.
  • We encounter a word in context and right away we can use it correctly, as noun or verb:
    This is a dack => Here are two dacks.
    He's gorping! => Oh, I don't think I'd like to gorp.
  • Nouns are not all objects: pain, crisis.
  • Verbs are not all actions: know, love.
I think several thoughts:
Erkelens is thinking of object/noun and action/verb as Aristotelian categories, but these are human categories. Like other human categories (game, product, poem) they can be extended quite happily with new exemplars that do not share a common definition with any other exemplar. We can build a category of nouns that is initially filled with tangible individual objects (Dad, Binky), then we add less tangible and specific things (dog, fire, lunch), and more abstract things (day, story, stuff), and so on until we are adding concepts like pain and morphology to the category without problem. I think there is a category that encompasses all nouns; it is roughly captured by the word thing, and it is a semantic concept, not a grammatical one - or rather, it functions as both: The observed grammatical category of noun (or rather noun-phrase) is co-extensive with the semantic category of things.

We can mark a novel word as 'noun' or 'verb' or both, just by encountering it once, and before we know anything else about it! It becomes immediately productive. (This still seems amazing to me.)
Is there a lexical entry created - a word-concept - which is marked with [+NOUN]? Or is there a kind of placeholder thing concept created, that is marked in some sense [+THING]?
If, as I am suggesting, NOUN=THING, then there is no difference. A concept is created, and marked [+NOUNTHING], and the new word points to this placeholder. Apparently we have no trouble marking a novel concept both [+NOUNTHING] and [+VERBACTION]. Either that or linking a new word to two concepts.

I took my quonk, and I quonked into space.

Eh, why not?

We (especially computer scientists) expect everything to eventually resolve into precise concepts or primitives or something. But the brain is doing all this with networks of neurons using firing frequencies - it's good to remember that it might be fuzzy all the way down. That a category is just a label for a relatively consistent set of behavioral tendencies, that [+NOUNTHING] is probably not a binary property but a graded one, that in saying 'the concept' or 'the lexical entry' we might be unjustifiably quantizing something that is continuous.
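
Here's what instant productivity might look like in toy form - my invention, with arbitrary numbers, just to show graded rather than binary marking:

    lexicon = {}

    def encounter(word, context):
        # A first encounter creates a placeholder concept with graded,
        # not binary, category strengths; the word is productive immediately.
        entry = lexicon.setdefault(word, {"NOUNTHING": 0.0, "VERBACTION": 0.0})
        if context == "noun-frame":    # e.g. "This is a ___" / "I took my ___"
            entry["NOUNTHING"] = min(1.0, entry["NOUNTHING"] + 0.6)
        elif context == "verb-frame":  # e.g. "He's ___ing!" / "I ___ed into space"
            entry["VERBACTION"] = min(1.0, entry["VERBACTION"] + 0.6)

    encounter("quonk", "noun-frame")   # "I took my quonk..."
    encounter("quonk", "verb-frame")   # "...and I quonked into space."
    print(lexicon["quonk"])  # {'NOUNTHING': 0.6, 'VERBACTION': 0.6} - both, happily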

2009-10-12

language knowledge in ACT-R, part deux

I continue to obsess about how language knowledge should be represented in ACT-R.

ACT-R makes some strong, pervasive claims about how knowledge is stored. The main claim is that knowledge is represented in declarative and procedural memories. Procedural knowledge can only be used - it is know-how, stored as a (large!) collection of rules. Rules watch for their condition-of-application to occur, and when it does, they fire - they 'do their thing.' And that's all you can do with a rule. They can't be consciously recalled, described, recognized, modified, or even willfully created. The only thing we can do with a rule is apply it, when applicable.

Declarative knowledge is factual, propositional, or representational - concepts and their relationships, sounds, visual 'images' (whose structure and properties are poorly understood), and perhaps 'action schemas' that are perhaps a kind of abstract vignette. The units of declarative memory are called chunks.

We can call declarative chunks into conscious awareness and operate on them (using rules). Common operations are recognition, comparison, combination, analysis (finding parts) and association (finding similar things). At a higher level, we can describe declarative knowledge if we have applicable language skills. Failing at description, we should at minimum be able to recognize and compare chunks - "Have you seen this before?" "Is this like that?"

I personally have not seen anything that suggests to me that a chunk can be invisible - either inaccessible to conscious awareness, or indescribable/ineffable. First, it is almost a given that a chunk that can affect cognition can be brought into awareness. In ACT-R, it is a given.
Yes, a chunk's normal use could be so habitual and fast that we are not normally aware of using it, but we can still direct our awareness to it and find it.

Yes, the sensation of 'seeing red', the so-called qualia, feels ineffable. I take this to be a peculiar sensation associated with contemplating an inherently subjective concept, arising from our knowledge that inherently subjective concepts cannot be communicated! Philosophers writing about 'seeing red' have no trouble finding words, would that they had more.

Let's turn our attention to language-specific knowledge.
It's uncontroversial to say that grammar is not declarative knowledge - native speakers do not know the grammar of their language. Indeed, strikingly, we don't seem to have declarative representations of even the most basic grammatical categories - nouns, verbs, prepositions, adjectives, adverbs, noun phrases, clauses: Do you experience any of these as natural mental categories? Children who are fluent in their native language struggle to grasp the concept of 'noun' or 'verb'. You don't see students going "OH! THOSE things! Those are called nouns?" - quite the contrary, they scratch their heads and have dozens of questions. Is "running" a noun or a verb? Is "myself" a noun? Is "play" a noun, a verb, or an adjective?

The abstract non-leaf categories are just that much more so.

On the other hand, people know by the end of the last word whether a sentence is 'acceptable' or not, and (at least for educated adults?) there seems to be a clear distinction between ungrammatical and nonsense. Viz. Jabberwocky.

In ACT-R, this implies that grammar is implicit knowledge, and it must primarily be stored as procedural knowledge, as rules. So... we imagine a bunch of parsing rules, that look at sequences of words and somehow group them and assign meaning. And for speaking, a bunch of rules that take intention, context and concept and create sequences of words.

Don't they have to be two separate sets of rules? How does that work? How do they stay in sync?

This does fit with the observation that children and 2nd language learners can understand a much richer language than they can generate, including grammatical forms. And we can understand idiosyncratic speech and dialect that we ourselves cannot fluently produce. Does anybody not understand Yoda in Star Wars? And how many people can fluently imitate Yoda's dialect?

On the other hand, we can hear and understand (even vaguely) a new word, and then use it immediately. Kids pick up slang instantly. Adults pick up jargon in their areas of interest - computer geeks, chemists, biologists, lawyers, doctors, woodworkers, farmers...
But here we are talking about content words, or sometimes - confusingly - multiword idioms, multiword verbs and separable verbs: "old hand", "early bird", "screw up", "get over", "take to".

Try this: What does "farm" mean? Now compare: What does "myself" mean? Or "there"? Or "to"?

Some Conclusions

The stuff we can learn and change quickly is declarative. Content words are stored as declarative bundles, but not just individual words, any idiom gets the same treatment - idioms, multiword verbs, separable verbs. The word that appears to be a preposition in a separable verb is not necessarily functioning as a preposition, such as the "over" in "get over". Seems to me that "get over" is a separable transitive verb. Individual words, and 'conventional' combinations of words are stored as declarative chunks, forming the lexicon.
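
If idioms and multiword verbs really are just big lexical entries, then lookup is longest-match against the lexicon. A minimal sketch, with a toy lexicon of my own:

    # Multiword entries sit in the lexicon alongside single words.
    LEXICON = {"get": "...", "over": "...", "get over": "recover-from", "old hand": "expert"}

    def lookup(words, i, max_len=3):
        # Prefer the longest conventional combination starting at position i.
        for n in range(max_len, 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in LEXICON:
                return candidate, i + n
        return words[i], i + 1

    print(lookup("you will get over it".split(), 2))  # ('get over', 4)

On this view the "over" in "get over" never functions as a preposition at all; it's swallowed by the two-word entry before any grammar gets involved.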

Grammar - patterning beyond what comes from the lexicon - is handled entirely procedurally.
Syntax is never represented in declarative form - unless we study grammar or linguistics!
There are no declarative chunks for grammar categories or grammatical rules.
Parsing does not involve activation, construction, or modification of representations of abstract syntactic entities such as VP, Head-Phrase, Determiner, etc. Similarly these things are not used in language generation.

During parsing and generation, the intermediate form must be a mixture of words, lexemes (abstract word concepts), and concept/chunks. There are abstract concepts that are expected and constructed, but these are explicable concepts, not grammatical categories. I'm proposing that, for example, the word 'while' is followed by a description of a process or condition - not by an SC (subject-complement?).

And here's a thing that causes no end of confusion: Because we use rules to parse and generate language, those rules become tuned with use, and we compress sequences of steps into single steps. In ACT-R terms, we compile new rules to accelerate and shorten common mental procedures. Over time, our declarative linguistic knowledge becomes baked into rules derived from declarative knowledge and reinforced by practice, blurring the distinction between declarative and procedural and creating a vast landscape of 'fuzzy syntax' - infrequent constructions that are primarily declarative, high-frequency constructions that are purely procedural and automatic, and every possible intermediate blend. Linguistics treats this as the hinterlands, populated by scruffy lawless barbarians - but maybe this is where the native speakers live.
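
Production compilation can be caricatured as function composition plus baking a retrieved fact into the rule. A grossly simplified sketch (mine):

    # Two-step route: one rule retrieves a fact, a second rule acts on it.
    GREETINGS = {"good morning": "greeting"}

    def step1_retrieve(phrase):
        return GREETINGS.get(phrase)            # declarative retrieval (slow)

    def step2_respond(category):
        return "greet-back" if category == "greeting" else None

    # After compilation: the steps are composed and the retrieved fact is
    # baked into the procedure - no declarative access remains.
    def compiled(phrase):
        return "greet-back" if phrase == "good morning" else None

    assert step2_respond(step1_retrieve("good morning")) == compiled("good morning")

Run the two-step route often enough and the compiled rule takes over; the knowledge is now procedural, however declarative its origins.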

Justification

Only declarative memory can consistently change quickly (fractions of a second for new chunks) and declarative memory is what our language understanding rules and language production rules have in common. Language knowledge that can quickly go from understanding to production (or the other way 'round?) is declarative.

The divergence of behavior from factual knowledge - when what we say diverges from what we do - is a hallmark of knowledge being stored in both declarative and procedural form: Procedural knowledge ultimately determines what we do; what we explain comes from declarative memory. That we can use function words without being able to define them indicates that our use of them is procedural.

For example, "while" is followed by some kind of description of a process or a condition. I'm sure you and your parser expect and look for a kind of thing following the word 'while'. You might say "while something was going on" or "while something was true". Such an abstract construction - "something was going on" - is as close as we get to a syntactic non-terminal.

Speculation
Maybe there are some language skills (rules) that can serve both comprehension and production. For example, if we partially parse what we are planning to say, then the comprehension system could modify, interfere with, or even abort something before or during production. To what extent this happens pre-verbally - before we hear ourselves saying it - I don't know.

Predictions
  • Reading, understanding and generating language that uses common words and constructions will require minimal access to declarative memory. To the extent that ACT-R allows it as cognitively plausible, comprehending and generating common constructions will be handled entirely by rules. When you hear "Good morning" you don't need to access memory to determine that "good" is an adjective, or that "morning" is a noun, or to recognize or construct a representation of a "noun phrase", or look up this idiom and discover that it is marked "conventional greeting". If procedural rules can translate directly from two words to an intentional signal ('speaker is greeting me in a customary way') then I'm sure they do, with minimal memory access.

  • Generating and comprehending high-speed speech (compressed, or e.g. in competitive debate) will require a much higher percentage of formulaic, familiar words, idioms and constructions.

  • Bet there's an inverse correlation between flexibility of idioms and frequency of use. Not for every idiom, but as a general trend.

2009-10-05

language knowledge in ACT-R

Considering the overwhelming number of papers and studies in the area, it is amazing that today (October 2009) there is no coherent, integrated theory of human language understanding and production. It's not that we lack a complete, highly-validated theory - it's that we lack a theory.

There are lots (and I mean lots) of formal models of grammar.
There are lots (and I mean lots) of experiments probing the human process of understanding and producing language.

And yet, as of today (Oct 5th 2009) I can't find an integrated theory that offers a cognitively plausible account of how a sentence is understood, or generated. It's not that we have one or more theories-with-shortcomings. What we have are hundreds of theories that cover at most a fraction of the area - even if you limit 'the area' to reading and comprehending (or generating) one factual sentence.

When I got into mind design, I thought the big unknowns would be things like compassion, ethics, emotions, self-model, episodic memory. And those are all big unknowns. I didn't expect that top-of-list would be 'language'.

So in my next post, I'll spend some time trying to assemble and make sense of my small sample of the vast literature on human language processing.
