TY - JOUR
T1 - Hallucitation in Scientific Writing
T2 - Exploring Evidence from ChatGPT Versions 3.5 and 4o in Responses to Selected Questions in Librarianship
AU - Oladokun, Bolaji David
AU - Enakrire, Rexwhite Tega
AU - Emmanuel, Adefila Kolawole
AU - Ajani, Yusuf Ayodeji
AU - Adetayo, Adebowale Jeremy
N1 - Publisher Copyright:
© 2025 Bolaji Oladokun, Rexwhite Enakrire, Adefila Emmanuel, Yusuf Ajani, Adebowale Adetayo.
PY - 2025
Y1 - 2025
N2 - The rapid adoption of AI in academic writing, particularly with tools like ChatGPT, has raised significant concerns regarding the accuracy of generated content. This study explores the phenomenon of “hallucitation” in scientific writing, where AI models fabricate citations, analyzing responses from ChatGPT versions 3.5 and 4o in the context of librarianship. Through an experimental design, scientific content with citations was generated and systematically verified using Google Scholar and the publisher’s website. The findings reveal a disturbingly high frequency of false or non-existent citations—42.9% in ChatGPT-3.5 and 51.8% in ChatGPT-4o. Despite slight improvements in citation accuracy from version 3.5 to 4o, with accuracy rates of 3.92% and 6.35%, respectively, both versions exhibit significant limitations. Notably, ChatGPT-3.5 frequently generated completely fabricated sources, while ChatGPT-4o introduced subtle errors, such as mismatched journals. The study indicates no significant difference in accuracy between the two versions, underscoring the persistent risks associated with AI-generated citations. These findings highlight the urgent need for rigorous verification of AI-generated content to safeguard the integrity of scholarly work.
AB - The rapid adoption of AI in academic writing, particularly with tools like ChatGPT, has raised significant concerns regarding the accuracy of generated content. This study explores the phenomenon of “hallucitation” in scientific writing, where AI models fabricate citations, analyzing responses from ChatGPT versions 3.5 and 4o in the context of librarianship. Through an experimental design, scientific content with citations was generated and systematically verified using Google Scholar and the publisher’s website. The findings reveal a disturbingly high frequency of false or non-existent citations—42.9% in ChatGPT-3.5 and 51.8% in ChatGPT-4o. Despite slight improvements in citation accuracy from version 3.5 to 4o, with accuracy rates of 3.92% and 6.35%, respectively, both versions exhibit significant limitations. Notably, ChatGPT-3.5 frequently generated completely fabricated sources, while ChatGPT-4o introduced subtle errors, such as mismatched journals. The study indicates no significant difference in accuracy between the two versions, underscoring the persistent risks associated with AI-generated citations. These findings highlight the urgent need for rigorous verification of AI-generated content to safeguard the integrity of scholarly work.
KW - academic integrity
KW - AI-generated citations
KW - ChatGPT
KW - hallucination
KW - librarianship
UR - https://www.scopus.com/pages/publications/105003189116
U2 - 10.1080/19322909.2025.2482093
DO - 10.1080/19322909.2025.2482093
M3 - Article
AN - SCOPUS:105003189116
SN - 1932-2909
VL - 19
SP - 62
EP - 92
JO - Journal of Web Librarianship
JF - Journal of Web Librarianship
IS - 1
ER -