Chomsky and the Two Cultures of Statistical Learning (norvig.com)
58 points by atomicnature 8 hours ago | 34 comments




This essay is missing the words “cause” and “causal”. There is a difference between discovering causes and fitting curves. The search for causes guides the design of experiments, and with luck, the derivation of formulae that describe the causes. Norvig seems to be mistaking the map (data, models) for the territory (causal reality).

A related* essay (2010) by a statistician on the goals of statistical modelling that I've been procrastinating on:

https://www.stat.berkeley.edu/~aldous/157/Papers/shmueli.pdf

To Explain Or To Predict?

Nice quote:

We note that the practice in applied research of concluding that a model with a higher predictive validity is “truer,” is not a valid inference. This paper shows that a parsimonious but less true model can have a higher predictive validity than a truer but less parsimonious model.

Hagerty & Srinivasan (1991)

*like TFA it's a sorta review of Breiman
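For anyone who finds that quote counterintuitive, here's a toy simulation (my own construction, not from the paper): the true process is mostly linear with a small cubic term, and with few noisy samples the "wrong" linear fit usually predicts held-out data better than the fit matching the true cubic form.

    import numpy as np

    rng = np.random.default_rng(0)

    def true_f(x):
        # The true process: mostly linear, with a small cubic term.
        return 2.0 * x + 0.3 * x**3

    x_train = rng.uniform(-1, 1, 15)
    y_train = true_f(x_train) + rng.normal(0, 1.0, x_train.size)
    x_test = rng.uniform(-1, 1, 1000)
    y_test = true_f(x_test)

    for degree in (1, 3):  # degree 3 matches the true functional form
        coeffs = np.polyfit(x_train, y_train, degree)
        mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree}: test MSE = {mse:.3f}")

On most seeds the degree-1 ("less true") model has the lower test error here: the small cubic term buys less bias reduction than the extra parameters cost in variance.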


This essay frequently uses the word "insight", and its primary topic is whether an empirically fitted statistical model can provide that (with Norvig arguing for yes, in my opinion convincingly). How does that differ from your concept of a "cause"?

> I agree that it can be difficult to make sense of a model containing billions of parameters. Certainly a human can't understand such a model by inspecting the values of each parameter individually. But one can gain insight by examing (sic) the properties of the model—where it succeeds and fails, how well it learns as a function of data, etc.

Unfortunately, studying the behavior of a system doesn't necessarily provide insight into why it behaves that way; it may not even provide a good predictive model.


(this is from 2017)

Here's Chomsky quoted in the article, from 1969:

> But it must be recognized that the notion of "probability of a sentence" is an entirely useless one, under any known interpretation of this term.

He was impressively early to the concept, but I think even those skeptical of the ultimate value of LLMs must agree that his position has aged terribly. If he couldn't imagine any framework in which a novel sentence had a probability other than zero, that seems to have been a fundamental theoretical failing rather than a consequence of the computational limits of the time.

I guess that position hasn't aged worse than his judgment of the Khmer Rouge (or Hugo Chavez, or Epstein, or ...) though. There's a cult of personality around Chomsky that's in no way justified by any scientific, political, or other achievements that I can see.


I agree that Chomsky's influence, especially in this century, has done more harm than good.

There's no point minimizing his intelligence and achievements, though.

His linguistics work (eg: grammars) is still relevant in computer science, and his cynical view of the West has merit in moderation.


If Chomsky were known only as a mathematician and computer scientist, then my view of him would be favorable for the reasons you note. His formal grammars are good models for languages that machines can easily use, and that many humans can use with modest effort (i.e., computer programming languages).

The problem is that they're weak models for the languages that humans prefer to use with each other (i.e., natural languages). He seems to have convinced enough academic linguists otherwise to doom most of that field to uselessness for his entire working life, while the useful approach moved to the CS department as NLP.

As to politics, I don't think it's hard to find critics of the West's atrocities with less of a history of denying or excusing the atrocities of the West's enemies. He's certainly not always wrong, but on net he's an unfortunate choice of figurehead.


> novel sentence

The question then becomes one of actual novelty versus the learned joint probabilities of internalised sentences/phrases/etc.

Generation or regurgitation? Is there a difference to begin with..?


I'm not sure what you mean? As the length of a sequence increases (from word to n-gram to sentence to paragraph to ...), the probability that it actually ever appeared (in any corpus, whether that's a training set on disk, or every word ever spoken by any human even if not recorded, or anything else) quickly goes to exactly zero. That makes it computationally useless.

If we define perplexity in the usual way in NLP, then that probability approaches zero as the length of the sequence increases, but it does so smoothly and never reaches exactly zero. This makes it useful for sequences of arbitrary length. This latter metric seems so obviously better that it seems ridiculous to me to reject all statistical approaches based on the former. That's with the benefit of hindsight for me; but enough of Chomsky's less famous contemporaries did judge correctly that I get that benefit, that LLMs exist, etc.
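To make the contrast concrete, here's a toy sketch (corpus and sentences invented): counting whole sentences assigns a novel sentence probability exactly zero, while even a crude add-one-smoothed bigram model assigns it a small but strictly positive probability, from which a finite perplexity follows.

    import math
    from collections import Counter

    corpus = [
        "the cat sat on the mat",
        "the dog sat on the rug",
        "a cat saw the dog",
    ]
    novel = "the dog sat on the mat"   # never appears verbatim above

    # 1. "Probability of a sentence" as whole-sentence relative frequency:
    #    exactly zero for anything outside the corpus.
    print(Counter(corpus)[novel] / len(corpus))        # -> 0.0

    # 2. Per-token bigram model with add-one (Laplace) smoothing:
    #    small, but never exactly zero.
    sents = [s.split() for s in corpus]
    unigrams = Counter(w for s in sents for w in s)
    bigrams = Counter(p for s in sents for p in zip(s, s[1:]))
    vocab = len(unigrams)

    logp = 0.0
    words = novel.split()
    for prev, cur in zip(words, words[1:]):
        logp += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))

    n_transitions = len(words) - 1
    print(math.exp(logp))                   # tiny but strictly positive
    print(math.exp(-logp / n_transitions))  # perplexity: finite, comparable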


My point is that, even in the new paradigm where probabilistic sequences do offer a sensible approximation of language, does novelty become an emergent feature of such a system, or does the system remain bound to the learned joint probabilities, generating sequences that appear novel but are in fact (complex) recombinations of existing system states?

And again the question being, whether there is a difference at all between the two? Novelty in the human sense is also often a process of chaining and combining existing tools and thought.


He did say 'any known' back in 1969, though, so judging the claim against today's knowledge isn't a fair test of how the idea has aged.

Shannon first proposed Markov processes to generate natural language in 1948. That's inadequate for the reasons discussed extensively in this essay, but it seems like a pretty significant hint that methods beyond simply counting n-grams in the corpus could output useful probabilities.
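(For the curious, Shannon's idea fits in a dozen lines; the corpus and seed word below are made up:)

    import random
    from collections import defaultdict

    text = ("the cat sat on the mat and the dog sat on the rug "
            "and the cat saw the dog").split()

    # Count which words follow each word (a first-order Markov chain).
    successors = defaultdict(list)
    for prev, cur in zip(text, text[1:]):
        successors[prev].append(cur)

    random.seed(0)
    word, out = "the", ["the"]
    for _ in range(12):
        word = random.choice(successors[word])
        out.append(word)
    print(" ".join(out))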

In any case, do you see evidence that Chomsky changed his view? The quote from 2011 ("some successes, but a lot of failures") is softer but still quite negative.


wasn't his grammar classification revolutionary at the time? it seems to have influenced parsing theory later on

His grammar classification is really useful for formal grammars of formal languages. Like what computers and programming languages do.

It's of rather limited use for natural languages.
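For contrast, here's the kind of language where they shine: a toy arithmetic grammar (my own example, not from the thread) with a recursive-descent recognizer, the pattern behind most programming-language parsers.

    import re

    def tokenize(s):
        return re.findall(r"\d+|[()+\-]", s)

    def parse_expr(toks, i=0):
        # expr -> term (('+'|'-') term)*
        i = parse_term(toks, i)
        while i < len(toks) and toks[i] in "+-":
            i = parse_term(toks, i + 1)
        return i

    def parse_term(toks, i):
        # term -> NUMBER | '(' expr ')'
        if i >= len(toks):
            raise SyntaxError("unexpected end of input")
        if toks[i].isdigit():
            return i + 1
        if toks[i] == "(":
            i = parse_expr(toks, i + 1)
            if i >= len(toks) or toks[i] != ")":
                raise SyntaxError("expected ')'")
            return i + 1
        raise SyntaxError(f"unexpected token {toks[i]!r}")

    toks = tokenize("(1+2)-3")
    assert parse_expr(toks) == len(toks)  # whole input derives from expr
    print("parses")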


Is this essay from 2011?

Chomsky is truly exceptional in the following sense:

I have yet to witness a man so smart who ended up being so profoundly wrong on everything he did in his life.

Both on the linguistics side of things and on his politics.

And to see him at such an advanced age still rejecting what is an absolutely clear and painful proof that all he's done in linguistics was wrong ... how sad.

What a terrible waste of an intellect.


Is this bayesian vs. frequentist?

In one word: no.

In more detail: Chomsky is/was not concerned with the models themselves, but rather with the distinction between statistical modelling in general ("clean slate" models in particular) on the one hand, and structural models discovered through human insight on the other.

With "clean slate" I mean models that start with as little linguistically informed structure as possible. E.g., Norvig mentions hybrid models: these can start out as classical rule based models, whose probabilities are then learnt. A random neural network would be as clean as possible.


Dude is literally in the Epstein Files.

Dude would talk about manufacturing consent, elitist circles, and what Israel is doing to poor Palestinians, and then go aboard the private jet of Epstein: Israeli spy, super-elitist, consent-manufacturing sex trafficker and rapist. What a total insult to everyone who ever read his work.

Hopefully he'll have something to say on it

And?

The article by Peter Norvig is still interesting.


it's just kinda weird and sus.

honestly, I'm surprised Noam is even still alive (aged 97); he is not long for this world and will be gone very soon.


I won't try to defend Chomsky. (Not really a big fan even before this.) But if the mere mention of him is sus to you, then I advise you not to study either linguistics or computer science, because it's Chomsky normal forms and Chomsky hierarchies all the way down. There are still people clinging to some iteration of universal grammar despite the beating it has taken lately.

He's also one of the most prominent political thinkers on the American hard left for the last half century.

There's a joke that's been going around for a while now: you either know Chomsky for his politics or for his work in linguistics and discrete mathematics, and you're shocked to discover his other work. I guess we can extend that to a third category of fame, or infamy.


The merge operation in Chomsky's later linguistics program (the Minimalist Program) is similar in a lot of ways to a transformer's softmax merging of representations into the next layer.

There's also still a lot to his argument that we are much more sample-efficient. And it isn't as if monkeys learn language at a GPT-2 level and bigger brains simply take us to GPT-8 or whatever; there's a step change where they don't really pick things up linguistically at all and we do. But with far more data than we ever get, LLMs seem to distill some of the broad mechanisms of what may be our innate ability, though there still seems to be a large learned component in us.
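For readers who haven't seen it spelled out, that "softmax merging" looks roughly like this (numpy, toy sizes; a real transformer adds multiple heads, masking, and more):

    import numpy as np

    rng = np.random.default_rng(0)
    n_tokens, d = 4, 8
    X = rng.normal(size=(n_tokens, d))      # token representations

    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    scores = Q @ K.T / np.sqrt(d)           # pairwise affinities
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over tokens

    merged = weights @ V  # each row: a weighted merge of all tokens
    print(merged.shape)   # (4, 8) -> representations for the next layer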


Not sure that's relevant? People still discuss what Einstein did, and he's long dead.

(I don't like Chomsky for other reasons, but having an obituary ain't no reason to disregard someone's thoughts.)


Does it matter?

A lot of Chomsky's appeal, I believe, is due to his politics, as his universal grammar theories turned out to be an academic dead end.

But his politics centers around the moral failings of the West so I think yes, if he was involved in the sexual exploitation of trafficked children, then this would devalue his criticism of the morality of the Western political system.


> But his politics centers around the moral failings of the West so I think yes, if he was involved in the sexual exploitation of trafficked children, then this would devalue his criticism of the morality of the Western political system.

Why would it devalue his criticism assuming he was right?


His criticism of the Western political system was always way too simplistic, which is why it has immense appeal to college students.

Essentially it can be summed up as: any Western action must be rationalized as evil, and any anti-Western action is therefore good. This is also in line with Christian dualism, so the cultural building blocks are already in place.

Then you get apologism for, or outright support of, the Khmer Rouge, Putin, Hezbollah, and Iran.


I am not a fan of Chomsky (the opposite, in fact). I was deliberately avoiding judging his actual arguments, to make the point that his own moral record undermines his lecturing others on their moral failings.

Who else in tech/AI did they whale?

Are you implying Norvig is a victim or otherwise not responsible for their choices and actions?


