Medical News

Google Is Playing a Dangerous Game With AI Search

Doctors often have a piece of advice for the rest of us: Don’t Google it. The search giant tends to be the first stop for people hoping to answer every health-related question: Why is my scab oozing? What is this pink bump on my arm? Search for symptoms, and you might click through to WebMD and other sites that can provide an overwhelming number of possible reasons for what’s ailing you. The experience of freaking out about what you find online is so common that researchers have a word for it: cyberchondria.

Google has introduced a new feature that effectively allows it to play doctor itself. Although the search giant has long included snippets of text at the top of its search results, now generative AI is taking things a step further. As of last week, the search giant is rolling out its “AI overview” feature to everyone in the United States, one of the biggest design changes in recent years. Many Google searches will return an AI-generated answer right underneath the search bar, above any links to outside websites. This includes questions about health. When I searched Can you die from too much caffeine?, Google’s AI overview spit out a four-paragraph answer, citing five sources.

But this is still a chatbot. In just a week, Google users have pointed out all kinds of inaccuracies with the new AI tool. It has reportedly asserted that dogs have played in the NFL and that President Andrew Johnson had 14 degrees from the University of Wisconsin at Madison. Health answers have been no exception; a number of flagrantly wrong or outright weird responses have surfaced. Rocks are safe to eat. Chicken is safe to eat once it reaches 102 degrees. These search fails can be funny when they are harmless. But when more serious health questions get the AI treatment, Google is playing a risky game.

Google’s AI overviews don’t trigger for every search, and that’s by design. “What laptop should I buy?” is a lower-stakes query than “Do I have cancer?”, of course. Even before the introduction of AI search results, Google said that it treats health queries with special care to surface the most reputable results at the top of the page. “AI overviews are rooted in Google Search’s core quality and safety systems,” a Google spokesperson told me in an email, “and we have an even higher bar for quality in the cases where we do show an AI overview on a health query.” The spokesperson also said that Google tries to show the overview only when the system is most confident in the answer. Otherwise it will just show a regular search result.

When I tested the new tool on more than 100 health-related queries this week, an AI overview popped up for most of them, even the sensitive questions. For real-life inspiration, I used Google Trends, which gave me a sense of what people actually tend to search for on a given health topic. Google’s search bot advised me on how to lose weight, how to get diagnosed with ADHD, what to do if someone’s eyeball is popping out of its socket, whether menstrual-cycle tracking works to prevent pregnancy, how to know if I’m having an allergic reaction, what the weird bump on the back of my arm is, how to know if I’m dying. (Some of the AI responses I found have since changed, or no longer show up.)

Not all the advice seemed bad, to be clear. Signs of a heart attack pulled up an AI overview that basically got it right—chest pain, shortness of breath, lightheadedness—and cited sources such as the Mayo Clinic and the CDC. But health is a sensitive area for a technology giant to be operating what is still an experiment: At the bottom of some AI responses is small text saying that the tool is “for informational purposes only … For medical advice or diagnosis, consult a professional. Generative AI is experimental.” Many health questions contain the potential for real-world harm, if answered even just partially incorrectly. AI responses that stoke anxiety about an illness you don’t have are one thing, but what about results that, say, miss the signs of an allergic reaction?

Even if Google says it is limiting its AI-overviews tool in certain areas, some searches might still slip through the cracks. At times, it would refuse to answer a question, presumably for safety reasons, and then answer a similar version of the same question. For example, Is Ozempic safe? did not unfurl an AI response, but Should I take Ozempic? did. When it came to cancer, the tool was similarly finicky: It would not tell me the symptoms of breast cancer, but when I asked about symptoms of lung and prostate cancer, it obliged. When I tried again later, it reversed course and listed out breast-cancer symptoms for me, too.

Some searches would not result in an AI overview, no matter how I phrased the queries. The tool did not appear for any queries containing the word COVID. It also shut me down when I asked about drugs (fentanyl, cocaine, weed) and sometimes nudged me toward calling a suicide and crisis hotline.

The risk with generative AI isn’t just about Google spitting out blatantly wrong, eye-roll-worthy answers. As the AI research scientist Margaret Mitchell tweeted, “This isn’t about ‘gotchas,’ this is about pointing out clearly foreseeable harms.” Most people, I hope, should know not to eat rocks. The bigger concern is smaller sourcing and reasoning errors, especially when someone is Googling for an immediate answer and might be more likely to read nothing more than the AI overview. For instance, it told me that pregnant women could eat sushi as long as it doesn’t contain raw fish. Which is technically true, but basically all sushi has raw fish. When I asked about ADHD, it cited an irrelevant website about school quality.

When I Googled How effective is chemotherapy?, the AI overview said that the one-year survival rate is 52 percent. That statistic comes from a real scientific paper, but it’s specifically about head and neck cancers, and the survival rate for patients not receiving chemotherapy was far lower. The AI overview confidently bolded and highlighted the stat as if it applied to all cancers.

In certain instances, a search bot might genuinely be helpful. Wading through a huge list of Google search results can be a pain, especially compared with a chatbot response that sums it up for you. The tool might also get better with time. Still, it may never be perfect. At Google’s size, content moderation is incredibly challenging even without generative AI. One Google executive told me last year that 15 percent of daily searches are ones the company has never seen before. Now Google Search is stuck with the same problems that other chatbots have: Companies can create rules about what they should and shouldn’t respond to, but they can’t always be enforced with precision. “Jailbreaking” ChatGPT with creative prompts has become a game in itself. There are so many ways to phrase any given Google search—so many ways to ask questions about your body, your life, your world.

If these AI overviews are seemingly inconsistent for health advice, a space that Google is committed to going above and beyond in, what about all the rest of our searches?