Complete News World

Language model: AI answers medical questions

One of the best-known examples of large AI language models is ChatGPT from OpenAI. Trained on huge amounts of data, the model can answer a wide range of questions and generate easy-to-understand texts from just a few inputs. Google's AI subsidiary DeepMind now wants to achieve something similar in medicine.

Clemens Heitzinger, one of the heads of the Center for Artificial Intelligence and Machine Learning at the University of Technology Graz, certainly sees potential in such language models for answering medical questions. "The advantage is that everyone can use them and patients can interact with these models in natural language," he explains, adding: "Of course, you have to pay close attention to how reliable the AI recommendations and the responses that are actually generated are."

Heitzinger was not involved in the DeepMind model, but he worked on developing another AI model that suggests treatment steps for patients with sepsis and thus increases their chances of survival.

A new benchmark

To check the performance of AI language models, experts often use evaluation methods in the form of benchmarks. These tests make it possible to see how useful a model is in practice.

In a study recently presented in the journal "Nature", however, the experts at DeepMind note that existing benchmarks often have only limited significance in medicine. Most of them assess the performance of language models only on individual medical tests. The experts therefore introduce a new benchmark: MultiMedQA. It consists of a total of seven datasets: six contain questions from medical research and from patients, and one new dataset comprises more than 3,000 medical questions frequently searched online.
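The idea of combining several question datasets into one benchmark can be illustrated with a minimal sketch. The dataset names and the toy model below are purely illustrative assumptions, not the real MultiMedQA data or the Med-PaLM model:

```python
# Illustrative sketch: scoring a QA model across several datasets,
# in the spirit of a combined benchmark like MultiMedQA.
# Dataset names and the toy model are hypothetical.

def evaluate(model, datasets):
    """Return per-dataset accuracy of `model` on (question, answer) pairs."""
    results = {}
    for name, items in datasets.items():
        correct = sum(1 for question, answer in items if model(question) == answer)
        results[name] = correct / len(items)
    return results

# Toy model that always answers "A", plus two tiny mock datasets.
toy_model = lambda question: "A"
datasets = {
    "exam_questions": [("Q1", "A"), ("Q2", "B")],
    "consumer_questions": [("Q3", "A")],
}
print(evaluate(toy_model, datasets))  # {'exam_questions': 0.5, 'consumer_questions': 1.0}
```

Reporting accuracy per dataset rather than as a single number is what lets a benchmark like this show where a model is strong and where it fails.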

The revamped AI model

Based on Google's PaLM language model, the experts at DeepMind created a revised model for medical questions that performs at least as well as other modern AI language models on most datasets of the MultiMedQA benchmark. The new model, called Med-PaLM, was tested with questions similar to those used in medical licensing exams in the USA. On average, it was 17 percent more accurate than comparable language models.

In an evaluation by physicians, Med-PaLM performed as well as medical professionals in many respects. Nine physicians assessed the performance of the model. In each case, one person evaluated an answer from the model to randomly selected questions from the benchmark datasets. As a result, 92.6 percent of Med-PaLM's responses matched the scientific consensus, close to the 92.9 percent achieved by the physicians' responses.
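How such a consensus figure is computed can be sketched in a few lines. This is a minimal, assumed rating scheme (one boolean judgment per answer), not the study's actual evaluation protocol:

```python
# Minimal sketch (assumed rating scheme): share of answers that a rater
# marked as matching the scientific consensus, analogous to the
# 92.6 percent figure reported for Med-PaLM.

def consensus_rate(ratings):
    """ratings: list of booleans, True if an answer matched consensus."""
    return 100 * sum(ratings) / len(ratings)

ratings = [True, True, True, False]  # illustrative rater judgments
print(f"{consensus_rate(ratings):.1f} percent")  # 75.0 percent
```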

In many other areas, however, the quality of the AI-generated information did not reach the expertise of medical professionals. About 19 percent of Med-PaLM's answers contained incorrect or inappropriate content, compared with only 1.4 percent of the experts' answers.

Business interests versus scientific interests

According to Heitzinger, the reason for this lies in the data used to train the model: "How good these large language models are depends on the datasets used for learning." To understand why the model includes inappropriate and incorrect content in its answers, the datasets used would have to be examined carefully.

It will therefore be important for research in general to gain insight into this data. However, the commercial interests of large corporations often stand in the way, and the training data of Med-PaLM cannot be viewed either. "At the end of the day, these are also trade secrets, and not every company will be happy to show its cards," says Heitzinger.

First attempts at regulation

The fact that datasets remain hidden is not only a problem in medicine. Swiss researchers recently showed that artificial intelligence language models can generate highly convincing false reports that are almost indistinguishable from reports from real people on platforms like Twitter.

In order to regulate the use of artificial intelligence in the future, the European Parliament is planning the Artificial Intelligence Act, the world's first comprehensive AI law. AI systems are divided into four categories, depending on the risks posed by the different systems. Face recognition software for real-time surveillance of the population is considered particularly risky; its use is to be banned altogether. The risks of language models such as ChatGPT and Med-PaLM, however, are not yet strictly regulated.

There is still much work to be done

In any case, it is still too early to use Med-PaLM in everyday medical practice, and the developers at DeepMind are aware of this. There are still many limitations, and the approach can be improved in several areas. In the trials, each response from clinicians and from Med-PaLM was evaluated by only one person, which could bias the results. The referencing of medical sources should also be improved.