Now the Humanities Can Disrupt "AI" - Public Books https://t.co/USKez3Q4QF
— Josh Marshall (@joshtpm) February 21, 2023
According to an influential paper (that Google sought to quash), large language models of this kind are best understood as “stochastic parrots”—programs that generate plausible text in response to a user’s prompt without benefit of any human-like understanding.
Why? Because:
Much of what is now hyped as “AI” is still a form of machine learning (in which learning denotes a program’s ability to update the weights in a statistical calculation in order to “optimize” for useful prediction). At bottom such technology entails mining vast troves of data whether the task in question is predicting a consumer’s creditworthiness, the next move in a game of Go, the likely progress of a hurricane, or the next few sentences in a sequence of words following a user’s prompt. The “intelligence” of such systems depends on fitting human-generated data to statistical functions that readers might imagine as vastly complex and multi-dimensional variations on an old-fashioned bell curve.
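In plainer terms: the "learning" described there is a loop that repeatedly nudges the weights of a statistical function until its predictions fit the data. Here is a minimal sketch in Python, purely my own toy illustration and not anything from the article or from any production system:

import random

def train(data, steps=5000, lr=0.01):
    """Fit y roughly equal to w*x + b by nudging weights to shrink squared error."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        x, y = random.choice(data)   # pick one (input, label) example
        error = (w * x + b) - y      # how far off the current prediction is
        w -= lr * error * x          # update the weights to reduce that error
        b -= lr * error
    return w, b

# "Training data": noisy samples of a hidden pattern, y = 2x + 1.
data = [(i / 10, 2 * (i / 10) + 1 + random.uniform(-0.1, 0.1)) for i in range(50)]
w, b = train(data)
print(f"learned weights: w={w:.2f}, b={b:.2f}")   # lands near 2 and 1

Swap the straight line for a function with billions of weights and the fifty points for a scraped internet, and you have the scale the article is describing; the underlying move, adjusting weights to "optimize" prediction, is the same.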
Basically it’s my cellphone, but the dataset is all the sentences I’ve written on this phone. Other than that…
Actually my cellphone is part of this story:
In a watershed moment more than a decade ago, Image Net (an archive of millions of images scraped from the web and labeled by an army of human pieceworkers hired through Amazon’s “Mechanical Turk”) provided a benchmark dataset to determine which machine vision classifier could advance the state-of-the-art for correctly identifying, say, a kangaroo. The success of a particular technique that came to be known as “deep learning” (with “deep” connoting the layers in a virtual architecture for adjusting statistical weights) ushered in a new paradigm for data-mining at scale.
With the explosion of human data on the web, smartphones, social media, and an ever-expanding "internet of things," "Big Data" became a form of capital, while scale—the quantity of training data and the size of the model that mined it—promised to deliver new kinds of power and knowledge. But these celebrated advances, as AI Now Institute co-founder Meredith Whittaker writes in an important paper, "were not due to fundamental scientific breakthroughs." They were instead "the product of significantly concentrated data and compute resources that reside in the hands of a few large tech corporations." Such AI, to borrow the title of Shoshana Zuboff's best-seller, was both the handmaiden to and product of the "Age of Surveillance Capitalism." When, in 2016, a deep learning model defeated the world's best Go player (after training on millions of the best human plays), the term "AI" was ready to return in a blaze of glory.

So "Big Tech" is pushing AI because Big Tech owns AI. Which isn't really intelligent. It's not really as smart as a parrot. But there's money to be made hyping AI. Just like there was in virtual reality. Until Meta invested heavily in it and it went…nowhere. And took Meta with it.
That is not to deny that ChatGPT benefits from enlarged scale.
….
But the real secret sauce behind OpenAI’s makeover is grueling human labor. The effort to detoxify ChatGPT required Kenyans earning less than $2 per hour to label graphic content at high speed (including “child sexual abuse, bestiality, murder, suicide, torture, self harm, and incest”). Although little has been written about the conditions for the people who provided the necessary “reinforcement learning through human feedback” to “align” various models since GPT-3, we know that OpenAI recently hired hundreds of contract workers to upgrade the autogeneration of code, furthering an effort to “disrupt” the professional labor of programmers which began with the controversial “scraping” of open source code.
So much for AI replacing human labor.
AI is described by computer scientist Yejin Choi as "a mouth without a brain." Supplying the brain is the stumbling block; the brain is the key. You probably saw the hype that Microsoft's ChatGPT-powered Bing will vanquish Google as a search engine. Not so fast:
Nor, for a number of reasons, can ChatGPT reliably take the place of Wikipedia or a conventional search engine. As the novelist Ted Chiang shows in his elegant analysis, to constitute a trustworthy replacement for search, an LLM, unlike ChatGPT, would need to train on high-quality data and avoid "outright fabrication." As if to illustrate Chiang's point, when Google unveiled its new chatbot, Bard, the company somehow neglected to fact-check the erroneous content displayed on its demo. This bizarre failure of Google to "google" cost the company more than $100 billion in market capitalization.

The adage from high school English classes was: "If you can't dazzle 'em with brilliance, baffle 'em with bullshit." It's actually good advice for recognizing hype and puffery and even trolls. AI is neither a mouth nor a brain: it's a series of programmed responses, admittedly a complex series, but programmed all the same. If you stop here and go watch this video, you'll get a quick primer on set theory and Russell's paradox. You'll also get a lesson in predicates, in language and, by analogy, in mathematics. I don't think the analogy is strong enough to prove Russell's paradox inescapable (though I think the paradox is inescapable), and the reason is much the same reason that logic (Russell's real pursuit) is not the limit or the foundation of human thinking. Basically, predicates can (per the presentation) do what set theory can't: generate "sets" which confirm the paradox but don't violate the rules of language (creating contradictions in language doesn't undermine the grammar of language, or even its usefulness). But Russell's paradox undermines the ultimate value of set theory, and so of mathematics, and so of the use of logic Russell was pursuing. This is one reason modern Continental philosophy follows Wittgenstein (partly) and focuses on language, not mathematics and formal logic.
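For anyone who skips the video, the paradox itself fits in one line of standard set-builder notation (my rendering of the usual textbook statement, not a quote from the video):

% Russell's paradox: let R be the set of all sets that are not members of themselves.
\[
  R \;=\; \{\, x \mid x \notin x \,\}
  \qquad\Longrightarrow\qquad
  R \in R \iff R \notin R .
\]

Asking whether R contains itself yields a contradiction either way. That is exactly the self-reference the predicate version survives: "the set of all sets that don't contain themselves" is an odd but perfectly grammatical phrase, and language carries on undamaged.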
A human toddler usually requires just a few examples to recognize that a kangaroo is not an elephant, and that both real-world animals are different than, say, pictures of animals on a sippy cup. And yet, the powerful statistical models now driving “artificial intelligence” (AI)—such as the much-discussed large language model ChatGPT—have no such ability.
The human brain evolved over 500 million years to help people make sense of a world of multifarious objects, within the lived contexts that embed their learning in social relations and affective experiences. Deprived of any such biological or social affordances, today’s machine learning models require arsenals of computer power and thousands of examples of each and every object (pictured from many angles against myriad backgrounds) to achieve even modest capabilities to navigate the visual world. “No silly! The cup is the thing that I drink from. It doesn’t matter that there’s a kangaroo on it–that’s not an animal, it’s a cup!,” said no statistical model ever. But then no toddler will ever “train on” and effectively memorize—or monetize—the entirety of the scrapable internet.
The brain is not just an "organic" computer, nor a biological machine. Although the idea of monetizing the "scrapable" internet is a key part of AI as we now know it. Which is the really interesting story. SkyNet isn't coming to get us; we are.*
*And if you think Tesla never should have put kinda self-driving cars on the road because AI was never going to get to the point of operating one the way a human can, you win an inspired No Prize, True Believer!