
GPT-4 Fails the Turing Test: Challenges Remain in the Field of Intelligent Conversation

Nov 06, 2023

ChatGPT, the superstar of artificial intelligence, faces a persistent question as it advances: has it met the Turing test's standard of generating output indistinguishable from human responses? The latest research suggests that ChatGPT, despite its impressive performance, has not yet fully crossed that threshold.

Two researchers at the University of California, San Diego, took up this question: Cameron Jones, who specializes in language, semantics, and machine learning, and Benjamin Bergen, a professor of cognitive science. They drew on the work Alan Turing published more than 70 years ago, in which he proposed a procedure for determining whether a machine could achieve a level of intelligence and conversational ability sufficient to fool people into believing it was human.

Their report, titled "Does GPT-4 Pass the Turing Test?", is available on the arXiv preprint server. For the study, they recruited 650 participants to play 1,400 "games": brief conversations with either another human or a GPT model, after which the participants judged whether they had been talking to a person or a machine.

What the researchers found was striking. GPT-4 fooled participants 41 percent of the time, while GPT-3.5 fooled them only 5 to 14 percent of the time. Interestingly, human interlocutors convinced participants that they were not machines in only 63 percent of trials.

"We found no evidence that GPT-4 passed the Turing test," the researchers concluded. However, they note that the Turing Test still has value in assessing the effects of machine conversations, as a framework to measure smooth social interactions and deception, and in understanding human strategies for adapting to these devices.

They also warn, however, that in many cases chatbots can still communicate convincingly. "The 41 percent success rate suggests that AI models may already have the ability to deceive, especially in situations where humans are less alert to the possibility that they may not be talking to a human," the researchers note. "AI models that robustly mimic humans could have broad social and economic implications."

The researchers observed that participants who correctly distinguished AI from humans focused on several factors. A model that sounded too formal or too informal raised suspicion. Expression that was too wordy or too terse, and grammar or punctuation that was unusually good or "unconvincingly" poor, were also key signals of whether participants were interacting with a human or a machine. In addition, participants were sensitive to responses that sounded too generic.

The researchers suggest that detecting AI models will become increasingly important as they grow more fluent and absorb more human-like quirks. "Identifying factors that lead to deception, and strategies for mitigating it, will become increasingly important," they said. The study shows that the field of intelligent conversation still faces challenges, but it also offers useful insight into how AI models can be improved.
