ChatGPT: The End of Online Exam Integrity?

Background: This study addresses the growing impact of Large Language Models (LLMs) such as ChatGPT on the integrity of online examinations, and seeks to demonstrate latent, advanced reasoning capabilities in LLMs. The research devises an iterative self-reflective strategy that invokes critical thinking and higher-order reasoning in LLMs when they answer multimodal exam questions combining visuals and text.

Methods: The proposed strategy was demonstrated on real exam questions and evaluated by subject experts. In addition, the performance of ChatGPT (GPT-4) with vision was estimated on a separate dataset of 600 text descriptions of multimodal exam questions, focusing on its responses to complex visual and textual prompts.

Results: The results indicate that the proposed self-reflective strategy can invoke latent multi-hop reasoning capabilities and can effectively steer ChatGPT toward correct answers by integrating critical thinking from each modality into the final response. ChatGPT also demonstrated considerable proficiency in answering multimodal exam questions across 12 subjects.

Conclusions: This work challenges findings from recent studies that claim existing LLMs are unable to reason on complex multimodal tasks. It therefore underscores the need for enhanced online exam security measures, such as proctoring software, that can mitigate the potential misuse of AI technologies in educational settings.
