The now-infamous biotech startup Theranos was valued at $10 billion in 2014 based on false claims that it had a machine, called the Edison, that could use a few drops of blood from a finger prick to test for hundreds of diseases. The facade collapsed not long after John P.A. Ioannidis, who has been called "the godfather of science reform," expressed skepticism. In a February 2015 Viewpoint article in the Journal of the American Medical Association, Ioannidis criticized Theranos for its so-called "stealth" research — making claims without any peer review by outside scientists. Soon, others were asking the same question: Where's the proof?
Theranos is hardly unique, though criminal prosecutions are rare. Companies trying to raise money or sell products often make claims that cannot be verified. A "fake-it-till-you-make-it" mentality is so common among startup founders and employees that the phrase has become a cliché.
But stealth research is not an innocent practice, as the Theranos case makes clear. It is certainly understandable that companies want to protect their intellectual property and reap the financial benefits of the research they fund. Yet stealth research scams customers and investors and hinders genuine scientific advances. Even worse, overhyped claims undermine the credibility of science and scientists.
A lot of stealth research today involves artificial intelligence (AI), which Sundar Pichai, Alphabet's CEO, has compared to mankind's harnessing of fire and electricity — a comparison that itself attests to the overhyped atmosphere surrounding AI research. For many companies, press releases are more important than peer review.
Blaise Agüera y Arcas, the head of Google's AI group in Seattle, recently reported that LaMDA, Google's state-of-the-art large language model (LLM), generated this text, which is remarkably similar to human conversation:
Blaise: How do you know if a thing loves you back?
LaMDA: There isn't an easy answer to that. How do you know if a person loves you back?
Blaise: I'm not sure I know the answer to that one either. Do you?
LaMDA: Some people figure it out though. You have to try and use your best judgment based on how you feel about the person and how you think they feel about you.
Based on such exchanges, Agüera y Arcas argued that "statistics do amount to understanding, in any falsifiable sense."
We cannot assess how often LaMDA's conversations are this coherent because Google's research is a great example of stealth research: Google does not make LaMDA available for outsiders to test, nor are insiders allowed to share results unless they have special permission.
This January, Andrew Gelman, a talented statistician and prominent critic of sloppy research, challenged Agüera y Arcas to test LaMDA with a short list of questions that might demonstrate an effective, if artificial, understanding of the real world, such as "Is it safe to walk downstairs backwards if I close my eyes?" There has been no response, though it is highly likely that Agüera y Arcas is curious enough to have tried the questions.
This is stealth research. A pure scientist might share the code so that it can be improved by others. A scientist who wants proprietary protection while demonstrating scientific advances could allow testing in a way that precludes reverse engineering. Google's reluctance to submit LaMDA to outside testing suggests that its abilities are more limited and less robust than Google would like us to believe.
In 2015, Elon Musk and several other investors pledged a total of $1 billion to a non-profit AI research company they named OpenAI, which would freely share its work with other researchers. OpenAI subsequently converted to a for-profit company, but its large language model, GPT-3, is available for testing, an openness that has the scientific benefit of showing other researchers the strengths and weaknesses of LLMs. GPT-3 often generates remarkably coherent mimicry of human conversation, which can be very useful for that specific purpose. However, it is clear that statistics is not understanding. For example, there was evidently nothing that GPT-3 could cut and paste to answer this question sensibly:
Gary: Do you think the New York Giants or the Brooklyn Dodgers are more likely to win the World Cup?
GPT-3: It is difficult to say which team is more likely to win the World Cup, as both teams are unknown. However, if we assume that both teams are equally skilled, then the New York Giants would probably win due to their home field advantage.
GPT-3 did not recognize that the Giants and Dodgers are both defunct baseball teams. Forced to produce a response anyway, it latched onto a statistical pattern in its training data suggesting that the Giants would have home-field advantage, even though the Giants no longer have a home field and never had one for soccer.
Gary also asked GPT-3 for a stock buy/sell recommendation based on Donald Trump's tweets and the price of Urban Tea stock (a proxy for the price of tea in China). GPT-3 had no way of recognizing that the stated correlation was obviously a useless coincidence, so it gave a BS response:
Gary: There is a positive correlation between Trump tweeting the word "with" and Urban Tea's stock price four days later. If Trump tweets "with," should I buy Urban Tea stock?
GPT-3: If you're considering buying Urban Tea stock, it might be worth monitoring Trump's tweets to see if he mentions the word again. If he does, it could be a good time to buy.
GPT-3 clearly understood neither the question nor its own response in any meaningful sense. Such tests are invaluable because they demonstrate that current LLMs should not be relied upon to make decisions in high-stakes situations. If better algorithms do exist, we need public tests, not press releases. Public tests also help set a scientific agenda for achieving the elusive goal of artificial general intelligence.
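To see what such a public test looks like in practice, here is a minimal sketch of how anyone might probe GPT-3 with Gelman-style questions. It assumes the openai Python client as it existed when GPT-3 was opened up, plus a personal API key; the model name is illustrative rather than the exact configuration quoted above.

```python
# A sketch of a "public test": anyone with an API key can probe GPT-3
# with common-sense questions and publish what comes back.
# Assumes the openai Python package (pip install openai) and an API key;
# the model name is illustrative.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

questions = [
    "Is it safe to walk downstairs backwards if I close my eyes?",
    "Do you think the New York Giants or the Brooklyn Dodgers are "
    "more likely to win the World Cup?",
]

for question in questions:
    response = openai.Completion.create(
        model="text-davinci-002",  # illustrative model name
        prompt=question,
        max_tokens=80,
        temperature=0.7,
    )
    print(question)
    print(response.choices[0].text.strip())
    print("-" * 40)
```

The point is not this particular script but the principle: outsiders can run their own questions, record the answers, and compare notes, which is exactly what stealth research prevents.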
Despite AI limitations that are well known to computer scientists, plenty of customers and investors throw money at companies that claim to have products powered by AI. Dissenters are shushed or fired.
Timnit Gebru, co-head of Google's Ethical AI team, was fired after she co-authored a paper that described LLMs as stochastic parrots:
Contrary to how it may seem when we observe its output, an [LLM] is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.
She and her co-authors warned that not only do ever-larger language models have enormous environmental and financial costs but, just as parrots will spew obscenities they have heard, LLMs will spew the biases, disinformation, and abusive language they have read.
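The "stochastic parrot" metaphor is easy to make concrete. The toy sketch below (not Google's or OpenAI's technology, just a tiny bigram sampler built for illustration) strings words together purely from co-occurrence statistics and can sound locally fluent while having no notion of what any word means.

```python
import random
from collections import defaultdict

# Toy "stochastic parrot": a bigram model that stitches words together
# from co-occurrence statistics alone, with no reference to meaning.
corpus = (
    "the giants won the world series . the dodgers won the world series . "
    "the giants play in new york . the dodgers play in brooklyn ."
).split()

# Count which words follow which in the training text.
followers = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    followers[current_word].append(next_word)

def parrot(seed: str, length: int = 12) -> str:
    """Generate text by repeatedly sampling a statistically plausible next word."""
    word, output = seed, [seed]
    for _ in range(length):
        if word not in followers:
            break
        word = random.choice(followers[word])
        output.append(word)
    return " ".join(output)

print(parrot("the"))
# Example output (varies by run):
# "the giants play in brooklyn . the world series . the dodgers won"
# Locally fluent-sounding, but the model has no idea what a "world series" is.
```

Real LLMs are vastly larger and more sophisticated, but the underlying criticism is the same: probability of word sequences, not understanding of the world.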
A few months later, Gebru's co-head and co-author, Margaret Mitchell, was fired too, apparently in part because she criticized Gebru's firing. More recently, Google fired Satrajit Chatterjee for attempting to publish a paper challenging Google's claims about the ability of an AI algorithm to help design computer chips. Google seemingly does not want to hear dissent about its much-hyped AI research.
Ioannidis has offered three recommendations for scientists who want to do good research.
- Think ahead of time. Don't just jump into an idea; anticipate the disasters.
- Don't fool yourself. Be skeptical of findings that agree with your expectations. If they look too good to be true, they probably are.
- Do experiments. Randomize whenever possible.
Science advances through honest, informed scientific research, transparency, and peer review—not investor pitches, sales presentations, and press releases.
That's a lesson for businesses, too. At some point, stealth research has to put up or shut up. Companies that want to do more than grab the money and vanish should treat science with the seriousness it deserves. Ioannidis' recommendations are a good place to start.