IIT Delhi and FSU Jena find AI models excel at basic tasks but struggle with scientific reasoning, highlighting limits for lab safety and research applications.
Study finds leading AI models excel at basic tasks but struggle with scientific reasoning. (Representational Image)
Researchers from the Indian Institute of Technology (IIT) Delhi and Friedrich Schiller University Jena (FSU Jena), Germany, have found that while leading Artificial Intelligence (AI) models perform well on basic tasks, they struggle with scientific reasoning. Their findings, published in Nature Computational Science, show that these AI models have important limitations that could be risky if they are used in research without proper supervision.
The team, led by NM Anoop Krishnan, associate professor at IIT Delhi, and Kevin Maik Jablonka, professor at FSU Jena, developed “MaCBench”, the first benchmark designed to test how vision-language AI models handle real-world tasks in chemistry and materials science.
The results revealed a notable paradox. AI models achieved near-perfect results on basic perception tasks such as identifying lab equipment, but struggled with spatial reasoning, combining information from multiple sources, and multi-step logical thinking: skills necessary for real scientific discovery.
“Our findings represent a crucial reality check for the scientific community. While these AI systems show remarkable capabilities in routine data processing tasks, they are not yet ready for autonomous scientific reasoning. The strong correlation we observed between model performance and internet data availability suggests these systems may be relying more on pattern matching than genuine scientific understanding,” Krishnan explained.
One concerning finding was related to laboratory safety. "While models excelled at identifying laboratory equipment with 77 per cent accuracy, they performed poorly when evaluating safety hazards in similar laboratory setups, achieving only 46 per cent accuracy. This disparity between equipment recognition and safety reasoning is particularly alarming," said Kevin Maik Jablonka.
“It suggests that current AI models cannot bridge the gaps in tacit knowledge that are crucial for safe laboratory operations. Scientists must understand these limitations before integrating AI into safety-critical research environments,” he added.
The researchers also conducted ablation studies to understand where AI models fail. They found that models performed much better when information was presented as text rather than images, showing that current AI struggles with multimodal integration, a key requirement for scientific work.
These findings have implications beyond chemistry and materials science, pointing to broader challenges for AI in scientific research. Developing reliable AI assistants will require improvements in training methods that focus on real understanding rather than just pattern recognition.
“Our work provides a roadmap for both the capabilities and limitations of current AI systems in science. While these models show promise as assistive tools for routine tasks, human oversight remains essential for complex reasoning and safety-critical decisions. The path forward requires better uncertainty quantification and frameworks for effective human-AI collaboration,” said Indrajeet Mandal, IIT Delhi PhD scholar.
A team of reporters, writers and editors brings you news, analyses and information on college and school admissions, board and competitive exams, career options, topper interviews, job notifications, latest in …Read More
October 14, 2025, 3:46 PM IST