Exams have finished; it’s all over bar the shouting. Students have hung up their pencil cases and we’re all waiting for the verdict: judgement day. It is extraordinary how much rides on this process given how creaky and precarious the whole enterprise feels at times. For schools there is the ritual farce of single-measure rankings in the newspapers (it’s optional so we’ve stopped playing that game) and then, several months later, the opaque algorithms that constitute RAISEonline are unleashed, threatening either OfSTED death or OfSTED glory, aka “Sig –” or “Sig +”. More importantly, for students it is the Door Open / Door Closed decider; a grade or two here or there can be life-changing. It matters to teachers too; I know a lot of teachers whose self-esteem and professional pride depend on their class’s results to a ludicrous extent, each dropped grade a knife through the heart (I am one of them).
Exam results change lives; they close schools and flatter schools; they create and destroy reputations and sieve us all out into career and life paths: the sheep and goats, chiefs and Indians, wheat and chaff, town and gown… (too much? OK. But you get the point). So, the system had better work then, hadn’t it? But does it?
Even though KEGS is without doubt a ‘winner’ in the examination machine (i.e. yes, it is alright for us and it largely works in our favour), there are a number of under-exposed features of the system that I have some difficulty with, based on various real cases. The implication is that, if we’re not content here, it must be worse for virtually everyone else… which is wrong!
Fundamentally, the big illusion, the national con, is that exams are about standards. They aren’t; they are all about statistics. Norm referencing, levelling and grading give the impression that objective standards are being met but actually they are just a crude large-scale ranking system. For example, last year on one of our PE GCSE papers, 6 marks out of 80 spanned A-D. The idea that an A student was really much better at PE than a B student on this paper is untrue. The margins are tiny. And don’t tell me your internal assessment of 6c vs 5a is reproducibly reliable… how can you tell? The business of morphing raw marks into grades and levels creates hard boundaries that distort the meaning of real learning. Missing a grade by one mark puts you on a par with someone who only just scraped into the grade below. Instead of simply getting marks, we get a scrambled substitute. Is this because marking is so inconsistent or subjective that exam boards only have confidence that ranges of marks are broadly the same? Or is it that we are obsessed with categorising people – and a large range of numbers along a wide scale just doesn’t satisfy our craving for simple labels? A, B, C vs 168, 127 and 111 out of 180. Scores at least have some meaning; grades are always arbitrary but still we dish out those letters. From time to time, due to some statistical tinkering, a whole cohort in a subject goes belly-up with all the A*s downgraded to As – because the cut-off mark gets changed; all sense of real standards gets lost. (Imagine the Olympic committee deciding that 2.35 metres in the high jump will now be called 2.25 metres.)
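To make the hard-boundary point concrete, here is a rough sketch of how raw marks get chopped into grades. The cut-off marks below are invented purely for illustration (they are not the real boundaries from our PE paper); they are simply chosen so that A to D spans 6 marks out of 80, as in the example above.

```python
from bisect import bisect_right

# Hypothetical cut-offs (marks out of 80), invented for illustration only.
# They are chosen so that the A boundary sits 6 marks above the D boundary.
BOUNDARIES = [(54, "D"), (56, "C"), (58, "B"), (60, "A")]
CUTOFFS = [cutoff for cutoff, _ in BOUNDARIES]


def grade(mark: int) -> str:
    """Return the grade whose cut-off is the highest one not exceeding the mark."""
    idx = bisect_right(CUTOFFS, mark)
    return BOUNDARIES[idx - 1][1] if idx else "below D"


if __name__ == "__main__":
    for mark in (54, 58, 59, 60):
        print(mark, grade(mark))
```

With these made-up numbers, a student on 59 misses an A by a single mark and is labelled exactly the same as the student on 58 who only just scraped a B, while 54 and 60 are three grades apart.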
This process applies to exam grades but its validity is stretched beyond all credibility in the convoluted construct of ‘3 levels progress’. A progress measure may be desirable but measuring the gap between two highly manipulated metrics (e.g. KS2 level to GCSE English grade) has a massive margin of error; yet these errors are wrapped up into a single data point that can make or break a school. The idea that X levels of progress comes close to a reliable, meaningful measure of actual progress in the command of English is highly debatable. (My daughter’s KS3 levels go up and down like a yo-yo depending on who teaches her; it has little to do with her performance.) Of course, across the nation, the national data will fit the bell-curve perfectly… it is built so that it does. But for a school sample, a class or an individual there is a cacophony of statistical noise… those error bars on the RAISE graphs are worth looking at. Within the margin of error, an apparent positive difference could well be a negative one. (I’d actually argue that an NC level is almost totally meaningless in terms of defining a child’s ability because learning is too complicated – don’t get me started on sub-levels… so to talk of 3 levels progress pushes me over the edge somewhat… which is why, at KEGS, we have made up our own system.)
At KEGS we have some subjects where the greatest unknown in the whole process is the quality of examiner marking. English at GCSE and A level is particularly afflicted in this regard and every other year we have some scandal or other to contend with. Marking is subjective and error-prone, and there is no process within ‘the Code’ (the tome that governs the whole thing) that requires Exam Boards to examine trends in the results from one centre from year to year. So, when (as we had recently) a cohort of students drops from 75% A/A* at AS to 23% at A2, the board will fight to the death to say the students all simultaneously had a disaster rather than admit that, just possibly, someone made an error in the marking. (We have the scripts – solid physical evidence – and still they won’t budge.) The appeals process is quite literally Kafkaesque (geddit) and, ultimately, after two face-to-face hearings and many unanswered questions after writing to Ofqual, we have given up on AQA English. We’ll take our chances for one more year but from September we’ve decided to adopt the Cambridge Pre-U; at least we feel we can trust them with the marking. Fundamentally, the exam system cannot cope with the number of exams it is running; there aren’t enough markers of the right quality to cope with it all – so mistakes happen and overly rigid mark schemes have to be written so they are idiot-proof; the trouble is that lots of candidates in lots of schools are too clever and they write answers that don’t appear on the markers’ list… car-crash.
Breadth vs Depth
There is no proper measure of breadth in the system – at least not in the headlines. We don’t deal well with this. By narrowing down to 5 subjects as a core measure at GCSE, solely for the purpose of benchmarking and league tables, and 3 subjects at A level – to fit the demands of the UCAS system – there is little value given to a broader curriculum. Is it better, for example, to achieve AABB in four A levels, than AAA in three? Who is to say? Similarly, with separate sciences, is BBB or ABB in a broader, more challenging course not better than AA in Double Science? Of course the comparisons are meaningless – just like all equivalences. It doesn’t even make sense to equate a standard in English to one in Maths – except on the basis of a national ranking. Why then, do we have to add these things up and make so many foolish league tables ranking schools by %A*-B at A Level or A*/A at GCSE? On a %A*-B measure at A2, AABC scores 75% where AAB scores 100% – does that mean that the extra C in an enriching, demanding course is worthless? At KEGS, average total points is our favoured measure (because, of course, this is where we do best!). Here, every single grade counts and that seems reasonable. In truth (oh so obviously), one measure alone does not give you a full picture. Hello Editor of the Daily Telegraph/Guardian/Independent/Times… stop the nonsense of publishing one column of figures. It is an insult to us all that all the endeavours of the students in a school can be summed up in one piece of data. Pure nonsense… but still the game is played. Let us at least have four or five: %5A*-C, %5A*-A, average total points, best 8 average points, a value added measure and so on… give us the full story, every time. All or nothing.
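The AABC vs AAB comparison is easy to check with a few lines of arithmetic. The sketch below is only illustrative: the points-per-grade scale is an assumption made for the example (not an official tariff), and the function names are mine.

```python
# Assumed points per A level grade, for illustration only (not an official tariff).
POINTS = {"A*": 140, "A": 120, "B": 100, "C": 80, "D": 60, "E": 40}
TOP_GRADES = {"A*", "A", "B"}


def percent_a_star_to_b(grades):
    """Percentage of a student's entries graded A*, A or B."""
    return 100 * sum(g in TOP_GRADES for g in grades) / len(grades)


def total_points(grades):
    """Sum of points across all entries, so every grade counts."""
    return sum(POINTS[g] for g in grades)


if __name__ == "__main__":
    for grades in (["A", "A", "B", "C"], ["A", "A", "B"]):
        print(grades, percent_a_star_to_b(grades), total_points(grades))
```

On the percentage measure the fourth subject drags the student down from 100% to 75%; on total points, under the assumed scale, the same extra C adds to the score. Neither number is ‘right’, which is rather the point: the ranking flips depending on which single column you choose to publish.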
What do exams measure anyway?
Of course, all this is built around a basic assumption that exams are valid in the first place. Nationally, we don’t debate this enough. At my school, where admissions depend on boys’ rankings on the 11+, we are required to discuss testing at length; doing well on a test is not necessarily the same as being good at a subject and we regularly examine the nature of our tests. It is interesting to note that the rank orders in English, Maths and Verbal Reasoning are very closely related. This means, in part, that doing well at exams in general is what you are testing. True ability (if that exists) and understanding are related to test performance but they are not the same. Another example is that we often have students joining our Sixth Form with an A* in German at GCSE who cannot use German as well as others from within our school with Bs. Why? Because we seem to do less drilling for precision and more learning for its own sake. We don’t teach to the test – at least not as much as they do elsewhere.
So, if we would all accept that exams only measure ability in some narrow sense… why do we treat exam outcomes with such reverence?
We can’t cope with detail
Every year, Heads will be at pains to point out all the individual success stories that underpin the aggregate results. Sometimes, it is an individual tragedy. Last year two of our students suffered serious mental health issues – one following bereavement; the other due to his Asperger’s condition. They didn’t take any exams. So, with less than 100% 5A*-C for the first time in years, the headline was ‘KEGS drops out of top 200’. They did report the reasons but still, on a continuing basis, I have to reassure people that we are not on a giant downward spiral. The headline planted the seed of decline; for some that is now a fact. Of course, we are in a privileged position and we’ll cope; but the same system, the same narrow view of exam results, sends other schools down the pan… We all know that exam success, defining learning, measuring performance, designing exams and deciding on grade boundaries are all massively complicated. By wrapping it all up into a few simple grades we create the illusion of simplicity – and this leads to significant injustices year on year. Really it is all a house of cards… and we should be honest enough to admit that so that one day education can escape the shackles of the league tables and the curse of the Sig minus.
OK, rant over. If we do well, I’ll read this and smile. If judgement day yields something unexpected or disappointing, at least I’m ready with my excuses!! Good luck everyone. Unfortunately, we can’t all do well – the system won’t allow it.