ChatGPT Models Surpass Human Benchmark in Neurology Exams

In a study published in JAMA Network Open, two versions of the ChatGPT large language model (LLM) demonstrated a remarkable ability to outperform human neurology students on board-style examinations. The result marks a significant milestone for the application of artificial intelligence (AI) in medicine, particularly in neurology.

AI’s strides in neurology exams

Researchers evaluated LLM 1 (ChatGPT version 3.5) and LLM 2 (ChatGPT version 4) on questions from the American Board of Psychiatry and Neurology (ABPN) question bank. The key finding was that LLM 2 answered 85% of the questions correctly, surpassing the human average of 73.8%. Notably, this performance was achieved without internet access or any neurology-specific fine-tuning of the models.

The study adhered to rigorous scientific protocols, including the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines. The comparison with human neurology students spanned questions classified as either lower-order, testing basic understanding and recall, or higher-order, requiring application, analysis, and evaluative thinking.

The implications of AI in medical fields

The superior performance of LLM 2, especially on higher-order questions, underscores the rapid advance of AI and its potential in clinical settings. This is particularly relevant as AI continues to move into domains traditionally reserved for human expertise, such as medicine, defense, education, and research.

The use of AI in clinical neurology has been expanding, with tasks ranging from diagnosis to treatment planning and prognosis. The study highlights how AI, especially transformer-based architectures such as ChatGPT, can aid, and in some cases stand in for, human roles in these areas.

Balancing AI and human expertise

While the results are promising, they also open a discussion about the balance between AI and human expertise in sensitive fields such as medicine. The study’s authors emphasize that AI’s relative strength on memory-based tasks, compared with those requiring deeper cognition, points to a complementary role rather than a replacement for human medical experts.

The study’s findings are a testament to AI’s potential to enhance medical practice and educational tools. However, they also underscore the need for ongoing evaluation and refinement of these AI systems to ensure they augment human expertise effectively.

The JAMA Network Open study reveals a significant leap in AI capabilities within the medical field of neurology. The results demonstrate AI’s ability to handle complex analytical tasks and open the door to new possibilities in medical education and practice. The future of AI in medicine appears bright, with these technologies poised to play an increasingly supportive role alongside human professionals.

Source: https://www.cryptopolitan.com/chatgpt-surpass-human-in-neurology-exams/