Surpassing Expectations: GPT-3 and Reasoning Tasks
Recent research by UCLA psychologists shows that the artificial intelligence language model GPT-3 performs about as well as college undergraduates on the kind of reasoning problems that typically appear on intelligence tests and standardized tests such as the SAT. The findings, published in Nature Human Behaviour, raise the question of whether GPT-3 is mimicking human reasoning or using a fundamentally new kind of cognitive process.
The study’s authors caution, however, that GPT-3 has significant limitations and can fail spectacularly on certain tasks. For instance, it struggles with problems that involve physical tasks or tool use, which humans usually find easy.
Testing GPT-3’s Problem-Solving Abilities
To test GPT-3’s reasoning capabilities, the UCLA researchers used a set of problems inspired by Raven’s Progressive Matrices. They converted the images into a text format that GPT-3 could process and compared its performance with that of 40 undergraduate students. GPT-3 solved 80% of the problems correctly, well above the human subjects’ average score of just below 60%, and it made mistakes similar to those of the human participants.
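To make the idea of a text-format matrix problem concrete, here is a minimal Python sketch. It is an illustration only: the bracketed digit layout, the `[?]` placeholder, and the function names `format_matrix_problem` and `accuracy` are assumptions made for this example and are not taken from the study’s materials.

```python
def format_matrix_problem(rows):
    """Render a grid of digits as text, replacing the final cell with a blank.

    Hypothetical layout: each cell is wrapped in brackets and the
    bottom-right cell is shown as [?] for the solver to fill in.
    """
    lines = []
    for r, row in enumerate(rows):
        cells = [f"[{value}]" for value in row]
        if r == len(rows) - 1:
            cells[-1] = "[?]"  # the missing cell the solver must complete
        lines.append(" ".join(cells))
    return "\n".join(lines)


def accuracy(answers, correct):
    """Return the fraction of answers that match the answer key."""
    hits = sum(a == c for a, c in zip(answers, correct))
    return hits / len(correct)


# A toy problem: each row and each column increases by one.
problem = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(format_matrix_problem(problem))
```

A solver’s answers can then be scored with `accuracy`, which is how a figure such as an 80% success rate would be computed from a list of responses and an answer key.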
The researchers then tested GPT-3 on SAT analogy questions that they believed had never been published online, which made it unlikely that the questions were part of the model’s training data. On these questions, GPT-3 scored higher than the average human score.
Comparing GPT-3 and GPT-4 with Humans on Story Analogies
When tested on analogies based on short stories, GPT-3 did not perform as well as human students. Its successor, GPT-4, did better than GPT-3 on these problems. The UCLA researchers are also developing their own AI model, inspired by human cognition, and comparing its abilities with those of commercial AI models.
Understanding the Cognitive Processes of AI Models
While GPT-3’s performance on reasoning tasks is impressive, the researchers are still unsure whether it reasons the way humans do or uses a completely different process. To find out, they would need to determine the underlying cognitive processes the models employ, which would require access to the software and to the data used to train them. This step is crucial for deciding the future direction of artificial intelligence research.
The UCLA scientists hope to explore large language models further and determine whether they are starting to “think” like humans or are merely mimicking human thought. Access to the backend of the GPT models would be invaluable for that goal and would allow more decisive research.