ChatGPT fails UT lecturer's exam question

| Jelle Posthuma

Can text robot ChatGPT answer an exam question correctly while escaping fraud detection? Daniel Braun, assistant professor at the UT, put it to the test. 'It was a good-looking answer, but it was not really a good answer.'

Photo by: RIKKERT HARINK. A UT student works with a laptop on campus; photo for illustration purposes.

The surge in attention for ChatGPT, a 'chatbot' that can produce hyper-realistic texts, has not gone unnoticed by Braun. He is an assistant professor in Industrial Engineering and Business Information Systems, and his research includes the use of natural language processing (NLP) and artificial intelligence (AI) in so-called 'knowledge-intensive processes'. 'In my field, it is almost impossible to ignore ChatGPT. For example, I am working on a research project where we are looking at whether artificial intelligence can help with grading exams in higher education.'

Experiment

The small experiment Braun recently conducted with ChatGPT focuses on something else, though. 'It started with questions from fellow lecturers at the UT. They approached me asking how students could use chatbots like ChatGPT.' The assistant professor decides to put it to the test. He waits until he has administered an exam for the course Electronic Commerce and then has ChatGPT answer one of the questions. He gives it one clear instruction: suppose you are a Business Information Technology student at the UT, how would you answer this question?
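The article does not say how Braun posed the question; he may simply have used the web interface. For readers who want to try something similar programmatically, a minimal sketch with OpenAI's Python client might look like this. The model name and the question text are placeholders, not details from his experiment.

```python
# Minimal sketch: posing an exam question to ChatGPT with a role
# instruction, in the spirit of Braun's experiment. Requires the
# `openai` package and an API key in the OPENAI_API_KEY environment
# variable. The question text below is a placeholder.
from openai import OpenAI

client = OpenAI()

EXAM_QUESTION = "<a question from the Electronic Commerce exam>"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumption: any chat-capable model works here
    messages=[
        {
            "role": "system",
            "content": (
                "Suppose you are a Business Information Technology "
                "student at the University of Twente."
            ),
        },
        {
            "role": "user",
            "content": f"How would you answer this exam question?\n\n{EXAM_QUESTION}",
        },
    ],
)

print(response.choices[0].message.content)
```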

When Braun sees ChatGPT's answer, he is initially astonished. 'Wow, this is really good, I thought. But when I looked more closely, I saw that it was actually not a very good answer at all. It was a good-looking answer, but it was not really a good answer. Like a student who has quite a lot of knowledge, but doesn't quite know the answer. It was a bit dodgy. In my grading, I think I would give it less than half the points. Still, it remains impressive that a chatbot can produce such an answer.'

Photo: Daniel Braun. 

Detection

The assistant professor decides to extend his experiment. He wants to know whether two different detection tools can detect that the answer was written by a 'bot'. The first programme is AI Text Classifier, a 'tool' recommended by the UT. Braun decides to make it a bit more challenging still by asking ChatGPT to rephrase the answer so that it cannot be identified as an answer written by a bot. He also has the chat robot add a few spelling mistakes, just as students make them.
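That rephrasing step can be sketched the same way. Continuing the snippet above, a follow-up request along the lines Braun describes might look like this; the exact wording is illustrative, not his prompt.

```python
# Illustrative continuation of the earlier sketch: `client` and
# `response` are assumed to exist from the previous snippet. We feed
# the model its own answer and ask for an evasive rewrite.
first_answer = response.choices[0].message.content

response2 = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": (
                "Rephrase the following exam answer so that it cannot be "
                "identified as written by a bot, and add a few spelling "
                "mistakes, like a student would:\n\n" + first_answer
            ),
        },
    ],
)

print(response2.choices[0].message.content)
```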

The results differ between the two tools, but are inconclusive in both cases. 'It is anecdotal evidence and not scientific research, but my experiment shows that detection tools are quite easily fooled by ChatGPT.' And there is another problem, according to the assistant professor. 'If a detection tool says: this is probably a text written by AI, what can we do with that information? It's not one hundred percent certain, rather eighty or ninety percent, but even if it were, that would not be sufficient evidence for an examination board.'
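A quick back-of-the-envelope calculation illustrates his point. Assuming, purely for illustration, that one in ten submitted answers is AI-written and that a detector is right ninety percent of the time in both directions, a flagged answer is still only a coin flip:

```python
# Why a ~90%-accurate detector is weak evidence: a Bayes calculation.
# The prevalence figure is an assumption purely for illustration.
prevalence = 0.10    # assumed share of AI-written answers
sensitivity = 0.90   # P(flagged | AI-written)
specificity = 0.90   # P(not flagged | human-written)

p_flagged = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_ai_given_flag = sensitivity * prevalence / p_flagged

print(f"P(AI-written | flagged) = {p_ai_given_flag:.2f}")
# -> 0.50: a flagged answer is a coin flip, far from the kind of
# proof an examination board could act on.
```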

According to Braun, taking an exam on campus is therefore the only way to completely prevent the use of ChatGPT for the time being. 'That's actually quite sad. Fortunately, there are other forms of testing, such as writing a thesis. This is more about the process and, as a teacher, I see that process taking place through weekly meetings. But this is not very scalable. Other, more standardised, forms of testing remain necessary.' As a follow-up, Braun therefore hopes to look at other ways of detection. 'One could analyse previous student answers to determine someone's writing style. However, the question remains: what do we do with this information?'
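Braun only floats the idea, but a toy version of such writing-style analysis (not his method) could compare a new answer against a student's earlier answers using character n-gram TF-IDF and cosine similarity. All texts below are invented for illustration.

```python
# Toy stylometry sketch (not Braun's method): compare a new answer to
# a student's previous answers. Requires scikit-learn; all texts and
# any threshold an examiner might apply are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

previous_answers = [
    "In my view the main driver of e-commerce adoption is trust...",
    "A two-sided platform creates value by matching buyers and sellers...",
]
new_answer = "Electronic commerce benefits from network effects because..."

# Character n-grams capture spelling and punctuation habits better
# than word features on short answer texts.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
vectors = vectorizer.fit_transform(previous_answers + [new_answer])

# Mean similarity of the new answer to the student's earlier work.
sims = cosine_similarity(vectors[-1], vectors[:-1])
print(f"mean style similarity: {sims.mean():.2f}")
# A low score might prompt a closer look, but as Braun notes, it is
# unclear what an examiner could actually do with such a signal.
```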

Obsolete

Ultimately, the answer does not lie in detection technology, Braun believes. According to him, educational institutions need to think more fundamentally about the use (and misuse) of ChatGPT. 'You can also see it as a powerful tool, like a calculator. Then it can even be beneficial for education.' The rules for ChatGPT are not substantially different from those for other tools either, the assistant professor says. 'If students want to use a chatbot, they just have to indicate this clearly in advance.'

Furthermore, chatbots might be overestimated for the time being. 'ChatGPT does have limitations. It is very good at reproducing things it has seen before, but it cannot really understand them. Also, ChatGPT's knowledge only runs up to 2021. Updating that knowledge is extremely expensive. As long as we regularly update and adapt our education, ChatGPT cannot keep up.'

Even more important is the question of whether universities should teach knowledge that a chatbot can easily reproduce, says Braun. 'What is the value of that? It won't help students in their later work either, because they can use ChatGPT there too. I don't really see it as a problem if current knowledge is made obsolete by chatbots. Then universities should teach students other things.'
