Anthropic research: skilled devs make better use of AI, but using AI is bad for learning skills
An Anthropic-sponsored research experiment found that developers need coding skills to benefit from AI tools, but junior developers using AI assistance acquire skills more slowly than those who don’t.
The research, enabled by the Anthropic Fellows Program for investigating AI safety issues, addresses a critical concern: “the problem of supervising more and more capable AI systems becomes more difficult if humans have weaker capabilities.”
In the experiment, 52 mainly junior software engineers, each with more than one year of Python experience, completed two coding tasks using the Trio library for async and concurrent programming – a library none of them had used before. Half of the group were allowed to use AI assistance, and half were not. Following these tasks, participants completed a quiz with 14 questions designed to assess skills including debugging, code reading and comprehension, code writing, and conceptual understanding of the Trio library. No AI assistance was allowed for either group when completing the quiz.
The researchers ran four pilot studies with different participants before conducting the main study, in order to learn how to avoid issues including non-compliance (using AI despite being asked not to) and problems not relevant to the study, such as struggles with Python syntax.
The results of the study raise awkward questions for proponents of AI-assisted development. The first problem, according to the researchers, is that “contrary to previous work finding significant uplift or speedup of AI assistance for coding, our results do not show a significant improvement in productivity if we only look at the total completion time.” The reason was the time spent deciding what to ask the AI to do and then composing the queries.
The second problem is that the participants using AI scored significantly worse in the quiz than those not using AI, on average “a 17 percent score difference of 2 grade points.” The biggest difference, the researchers said, was in the debugging questions.
The research considered the question of whether agentic AI, where AI is given more autonomy to complete tasks, would improve productivity. The researchers think that it might, but that “the loss of knowledge is likely even greater in an agentic or autocomplete setting where composing queries is not required.”
All of this leads the researchers to the bleak conclusion that “as companies transition to more AI code writing with human supervision, humans may not possess the necessary skills to validate and debug AI-written code if their skill formation was inhibited by using AI in the first place.”
Anthropic has posted about the research and deserves credit for reporting on these negative results. The company does add that the study is only a first step, that the sample was small, and “there remain many unanswered questions.”
Regarding the lack of productivity gain, Anthropic noted its previous research showing reductions of up to 80 percent in the time taken to complete tasks, and suggested the difference was because in these earlier studies “participants already had the relevant skills.”
According to Anthropic, the research will help it design AI assistance that enables both higher productivity and the development of new skills. From this study, though, it seems that moment has not yet arrived.