This week I met Glenys Stacey, the CEO of OfQual. She’d invited me to share views about aspects of the examination system. I had previously met Amanda Spielman, the Chair of OfQual, when she offered advice on our Baccalaureate model. OfQual leaders are keen to engage in dialogue with people in the profession, not least so that they have opportunities to explain their work. (Some of my ideas about exams are compiled in this post: Exam Reform: A blog manifesto.)
Some of the things I’ve had a chance to discuss include:
- The issues surrounding marking and grading: error margins, variation across subjects, cliff-edges, the merits of points or grades systems, and the variation in grade widths within and across subjects.
- The statistical processes used to ensure comparability between different Boards, across subjects and between years.
- The mechanics of grade inflation and possible solutions to tackle it whilst still allowing for system improvement.
- The nature of the appeals system, blind re-marking, research on the quality of marking, the inherent variation in judgement-based marking; tech-innovations in marking systems.
- Recent specific issues including successful changes to Science GCSEs, the growth and diverse nature of IGCSEs, the problems with GCSE English in 2012, recent difficulties around Speaking and Listening, multiple entry, reform to GCSEs, tiered papers.
- The evidence for discrepancies between schools in the way they approach teacher-assessed components; the impact this has on grade awards and the extent to which this can be controlled.
- The role of OfQual in working with Exam Boards and the DFE, maintaining independence, having a technical role but also a public profile, issues around communication and public understanding of examination systems.
- The tension between envisaging an ideal system and establishing one, given the need to maintain public confidence during the transition; the political realities surrounding reform and the limits this places on making radical changes.
It’s all quite geeky but hugely fascinating. There is so much more to know about how the system works and how it could improve. Given its critical importance to schools and students, it strikes me that not nearly enough people truly understand it. We’re often shouting at the injustice issued by the Black Box when, if we knew what was inside, we’d take a different view.
Thoughts about OfQual:
- Regulating the exam system is a massive job: trying to ensure that A*s, Cs and Es have parity across every subject and exam board, referencing different specifications, modes of assessment and subject matter is an almighty task.
- OfQual people are motivated to ensure fairness and rigour in the system and deal with this in a highly technical way. The organisation works closely with the four main exam boards and has links to the DFE, but I believe that OfQual is more independent than people give it credit for. No-one tells OfQual to depress grades or halt grade inflation; its staff arrive at these conclusions based on technical advice from assessment experts. There’s no-one on the grassy knoll, so to speak.
- OfQual has access to a mountain of data. They have information about the characteristics of every exam in every subject set by every Board. They act on this information if their analysis suggests that standards are not being maintained or if significant anomalies emerge – which does happen.
- When problems are detected at a national level, they seem to prefer to act quickly so that technical discrepancies are resolved (in OfQual’s view, so that unfairness does not persist) rather than set longer time-frames for changes to come into force, allowing existing qualifications to run their course. Political implications are not their concern and communications issues are not their highest priority or strength. I’d suggest it would be better for all concerned if major mid-course changes were communicated in a way that let Heads and teachers engage with the rationale, but perceiving the changes as an injustice is wide of the mark.
- I’m pretty sure that if we had access to the same information about, say, English GCSE grades overall or the scores for Speaking and Listening, we’d all be able to reach the same technical conclusions as OfQual. I believe Glenys when she says that there are major inconsistencies in the way different centres award marks for Speaking and Listening. If she says it is so serious that S&L marks can’t be allowed to form part of the overall grade awards, we have no basis to argue: she’s saying that because of the analysis, whether we want to hear it or not. Her rationale: it is simply unfair for some students to gain more credit than others to the extent that is shown in the data, for performance that does not warrant it. We can argue about process, the timing, the value of speaking and listening in the English curriculum but I don’t think we would argue with the evaluation of the data.
- Ofqual has too many issues to contend with at a national level to focus very much on resolving local issues with centres and exam boards. Their main interest is in issues that have system-wide consequences; it is unrealistic to expect them to get involved where there are consequences for individuals, case by case. They can’t do this with the resources they have and, to a large degree, rely on Boards to do their jobs well at this level. The appeals system is on their agenda as is the quality of marking.
Further thoughts about exams and exam reform.
We tend to assume, to far too high a degree, that exam results are absolutes. Here is a bombshell: in papers with extended writing, there is no one correct mark. Marking is judgement-based; it is widely accepted that two or three markers could give the same script different marks, and none of them is necessarily wrong. This has conceptual implications for many of us: we are looking for absolute standards that are not present.
Grade boundaries should shift for any given exam from year to year; that is how we get stability in the system and some sense of consistent standards. When people say ‘well, the grade boundaries shifted’ as if to indicate some form of injustice or interference, they’ve misunderstood the process. Year to year, grade boundaries have always shifted as a means of linking standards between papers. It is a result of applying routine standardisation processes based on a statistical distribution. We must not tell students that mark X will get you grade Y – that’s not how it works until after all the marks are in.
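To make the point about shifting boundaries concrete, here is a minimal sketch in Python. It is not how any exam board or OfQual actually sets boundaries; real awarding combines statistical evidence with examiner judgement. It simply assumes that a fixed share of the cohort should reach each grade or better, and reads the boundary off the ranked marks. The function name, the marks and the target percentages are all invented for illustration.

```python
def boundaries_from_shares(marks, cumulative_pct):
    """Return the lowest mark that earns each grade.

    marks: raw marks for the whole cohort (hypothetical data).
    cumulative_pct: grade -> whole-number percentage of the cohort that
        should achieve that grade or better (an assumed target, not an
        official figure).
    """
    ranked = sorted(marks, reverse=True)
    boundaries = {}
    for grade, pct in cumulative_pct.items():
        # Number of candidates who should sit at or above this boundary.
        k = max(1, (pct * len(ranked)) // 100)
        boundaries[grade] = ranked[k - 1]
    return boundaries


# Invented cohort: year two sits a harder paper, so every mark is 5 lower.
year_one = [82, 75, 71, 66, 60, 58, 52, 47, 40, 33]
year_two = [m - 5 for m in year_one]
targets = {"A": 20, "C": 60}

year_one_boundaries = boundaries_from_shares(year_one, targets)
year_two_boundaries = boundaries_from_shares(year_two, targets)
print(year_one_boundaries)  # {'A': 75, 'C': 58}
print(year_two_boundaries)  # {'A': 70, 'C': 53}
```

The same students get the same grades in both years, but only because the boundaries moved down by five marks with the harder paper. A fixed "mark X gets grade Y" rule would have penalised the second cohort for the difficulty of the paper rather than for their performance. With tied marks, slightly more than the target share can end up on or above a boundary.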
Although there is a checking mechanism by which subject experts seek to ensure that grades broadly agree with some absolute standards, grading is primarily a process of dividing the cohort into attainment bands. Grades say more about relative performance in a cohort than performance against fixed standards, even though the subject specialists have input into decisions around where exactly a grade boundary should be set. Conceptually, we need to be more explicit about that. Exhortations to narrow gaps or secure continuous improvement or meet floor targets indicate a fundamental lack of understanding about exams and grading. Norm referencing isn’t an evil thing; it is how our exam grading works at a core level… we may have lost sight of that.
Genuine, deep-level improvement in learning as a result of improved teaching and changes to a school’s ethos, leadership and management is slow and steady. At a national level, this is likely to be very gradual if it is happening – ie if English children are getting better educated over time. Rapid change at school level or national level should be treated with scepticism and subjected to scrutiny. It suggests one of three things: that things were extremely bad in the past, that the profile of the learners has significantly changed on intake, or that the scale of improvement is an illusion and we’re simply seeing schools do better at training students to pass exams. However, grade inflation is a structural consequence of aggregating ‘benefit of the doubt’ on margins of error, year after year; it’s not because of ‘cheating’ or ‘dumbing down’.
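The ‘benefit of the doubt’ effect is easy to underestimate, so here is a rough sketch of the arithmetic in Python. The numbers are invented purely for illustration and are not real pass rates: it assumes a starting pass rate of 50.0% and a nudge of 0.5 percentage points each year, small enough to sit inside any plausible margin of error. Rates are held in tenths of a percentage point so the sums stay exact.

```python
def drift(start_tenths, nudge_tenths, years):
    """Pass rate after each year, in tenths of a percentage point.

    start_tenths: starting pass rate (500 means 50.0%), an assumed figure.
    nudge_tenths: yearly upward 'benefit of the doubt' (5 means 0.5 points),
        also an assumed figure.
    """
    rates = [start_tenths]
    for _ in range(years):
        # Each year's award builds on last year's outcome, capped at 100%.
        rates.append(min(1000, rates[-1] + nudge_tenths))
    return rates


rates = drift(start_tenths=500, nudge_tenths=5, years=20)
print(rates[0] / 10, rates[-1] / 10)  # 50.0 60.0
```

No single year looks suspicious, yet after twenty years the pass rate in this toy example has risen by ten percentage points without anyone cheating or any paper getting easier.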
In order to establish a system where improvement in real standards could be shown at national or school level, we need a much more stable baseline of measures – ie free from perverse incentives, gaming and multiple entries. We also need a radical change in the culture that labels every low grade as a failure. Given that not all learners can gain the highest grades, we must develop a system that gives much greater weight and value to non-examined elements of a young person’s education – cue our English Bacc model or something like it.
I recognise that in the current political climate many of my exam reform suggestions are pie in the sky. They suggest too much change at a time when we’ve had enough already. However, we should be able to set out a 10-year plan for reform that cuts across the election cycle… so we aim in the right direction.
I’m certain that accountability reform needs to precede fundamental exam reform. If ministers continue to insist on using blunt data instruments to hold us to account based on exams that are not designed for that purpose, we’ll never get the level of intelligent behaviour and integrity in the system that we need.