AI is becoming an increasingly popular tool in many industries, and its potential to improve efficiency can seem almost limitless. Many organisations are trying to use it wherever they can. In education, however, more caution is needed.
Our research shows there is still a long way to go before AI can be relied upon for high-stakes testing. In his 2023 article on algorithms for assessment, Cesare Aloisi, Head of Research at AQA, identified some significant issues around trust and accountability in the use of AI.
With these issues still up in the air, jumping straight into AI for question writing could be a risky strategy.
Here we address some of the key issues that need to be resolved before AI can be considered for authoring high-stakes tests. We also explain how forward-thinking awarding bodies can use existing, reliable technologies to make efficiency gains now and lay the groundwork for more efficient processes as AI develops.
Training AI to behave ethically is not easy
AI is only ever a product of the material it has been trained on, and our research has shown that it can repeat and reflect societal biases:
“An AI system could treat some groups of people more favourably or discriminate against them, based on characteristics such as sex, ethnicity or religious beliefs.”
– Cesare Aloisi, Head of Research at AQA
Using such a system to generate questions for high-stakes assessment would be a step backwards at a time when organisations are actively working to improve the accessibility and fairness of their tests.
The problem of intellectual property and AI-generated content
Ownership of AI-generated content is currently a legal grey area, and the complication is twofold.
Firstly, there are ongoing, unresolved legal challenges over the data that AI models have been trained on, as most of the models currently available were trained with unrestricted access to copyrighted material from across the internet.
The question is: are the original authors of the material used to train AI owed compensation for, or protection against, the use of their work? Do they have any claim over AI output that clearly imitates their style or, in some cases, even reproduces their work verbatim?
If an AI tool generates material similar to the existing material it was trained on, an organisation using that output could be vulnerable to claims of intellectual property infringement. Legal cases are currently testing this point and remain unresolved. Equally, if a model is trained only on data owned by an organisation, it may be too limited to produce a truly useful result.
Secondly, there is the question of who owns the generated material. This is, again, an undeveloped area of law, and the position varies from country to country. Assessment organisations working internationally will therefore need to be confident that their materials are adequately protected in every region they operate in before using AI-generated materials there. Without this assurance, organisations could lose control of their work after publication.
Until a substantial number of these cases have been resolved and proper legal assurances are in place, using AI for content generation is likely to remain a risky strategy.
AI output is unreliable and insecure
“Newer generative AI technologies can’t tell fact from fiction, and often repeat falsehoods.”
– Cesare Aloisi, Head of Research at AQA
Having AI generate assessments could speed up the creation of varied tests; however, AI tools often create questions that look right but are incorrect or unverified, or that even cite entirely invented sources.
Because accuracy is vital in any high-stakes assessment, this is a major hurdle to introducing AI into the production process.
The way AI generates content also poses inherent security risks, for several reasons:
- If questions are produced by AI, there is no guarantee of their originality. Some could be lifted wholesale from existing sources where candidates are already practising them.
- People using publicly available AI tools can regenerate answers the tool has previously provided. This means any question you generate for a test would not be exclusive to you and could, in theory, also be obtained by anyone else using the same AI model.
- Popular AI tools also learn from the inputs they are given, so any information you provide when asking for questions could be reproduced for others. For example, if you ask a publicly available AI tool to produce distractor answers for a question, that question may be fed back into the model for future users to find.
The AQA Approach
At AQA, we have started by further increasing the efficiency of our question writing in a way that builds quality and validity directly into each test, without the risks currently inherent in AI.
We are only at the outset of extensive research into the ethical use of AI in assessment. However, one guiding principle we are confident in is to always keep a human in the loop.
This need for extra human checks hasn’t stopped us from pushing forward with innovations.
Our question generation processes now rely on significant levels of automation rather than AI. This has helped us gain most of the benefits AI can offer without the risk of using incorrect or even plagiarised content.
We collaborate with thousands of authors, reviewers and approvers, but have significantly reduced the admin and complications this would usually entail by centralising and modernising our workflow and paper versioning in GradeMaker Pro. Streamlining and digitising the authoring and review process has significantly boosted the efficiency of our paper generation, without compromising the validity or reliability of our assessments.
Find out more
If you want to see how we make it possible to take advantage of automation that keeps humans in the loop at every stage, book a demo to see the system in action.