Assessment for gamers

Dr Simon Katan, Director of Undergraduate Studies, recently published this Medium post on the gamification of learning. We reprint it here.

When I was 14, I took my French GCSE. My French teacher, Madame Percival, had studied the exam and mark scheme in intricate detail. She taught us just the right grammatical constructions to achieve the highest marks. ‘Au bord de la mer’ was worth a particularly high number of marks, so we were all to use this phrase as much as possible.

In my exam I stitched together these little memorised passages to score maximum points — ‘Hier soir, je suis allé à la crêperie et j’ai choisi une crêpe au jambon mais c’était trop salé, j’ai décidé d’acheter un verre d’eau … oh yeah … au bord de la mer.’

I got an A. I can’t speak French. I’m not proud.

Such behaviour is called ‘gaming the system’ and it is the scourge of university lecturers everywhere. In Introductory Programming it takes numerous forms ranging from the pragmatic to the malevolent. Some examples are selectively attempting assignments to achieve a minimum pass, reverse engineering projects around grading criteria, abstaining from programming roles in group work, exam cramming, manipulating teaching assistants into writing their code, superficially adapting copied code, and sharing exercise solutions in WhatsApp groups.

The problem, of course, with these behaviours is that they result in poorer learning. Whilst we can adapt our assignments to close loopholes, and can reprimand and inform those we catch in the act of gaming, a significant portion of students persist. Having achieved a minimum pass at introductory programming with scant knowledge of how to program a computer, such students find themselves facing increasingly insurmountable challenges as their course attempts to scaffold on faulty foundational knowledge.

I’m in no doubt that this scenario is a major contributor to Computer Science’s status as the worst-performing subject in the UK for undergraduate non-continuation.

This isn’t the fault of our students. It’s not surprising that they game the system in this way. Our students are gamers. Outside of education, they have been brought up on a rich diet of commercial video games informed by forty years of industry experience in optimising for maximal engagement.

This cultural fact is immutable, but we needn’t view it negatively. Jane McGonigal says that “… when we’re in game worlds, I believe that many of us become the best version of ourselves — the most likely to help at a moment’s notice, the most likely to stick with a problem as long as it takes, to get up after failure and try again.”

Viewed through the eyes of our students, university degrees look just like games; they have rules, challenges and goals, points, levels and competition. The problem is that they’re bad games. Just imagine if computer games functioned like exams. Super Mario Bros with no feedback, no stats, no lives, and no replays probably wouldn’t be so much fun.

It is with all this in mind that in 2016 I first began work on developing gamified assessment for teaching programming rudiments. Over three years, in collaboration with my colleague Edward Anstead, we have developed Sleuth, a series of film-noir-themed gamified code puzzles. Students access Sleuth via a web app themed as a detective agency, ‘Sleuth & Co’, in which they play the character of a fledgling detective. They are guided by ‘the Chief’, who gives them feedback on individual puzzle attempts as well as their general progress in the game. You can give it a go here.

Our design uses procedural content generation to provide students with as many opportunities as possible to practise their coding rudiments, and uses simple game dynamics to encourage them to do so. A key feature is instantaneous feedback. Students can upload a puzzle attempt at any time for immediate grading and feedback from the Chief. Students get five goes at solving a particular puzzle before the Chief suspends them from the case and prescribes some cool-down time.

However, such suspensions have no penalties attached — we want our students to try and fail as many times as they need to achieve mastery. On their return from the cool-down period, students are presented with a procedurally generated variation of the puzzle which they can attempt afresh. As students progressively solve puzzles, they see their score increase in real time. In negating any opportunity for over-estimation of their performance, such mechanics put students firmly in control of their final grade.
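To make these mechanics concrete, here is a minimal sketch in Python of how such an attempt-limit and cool-down loop might work. The five-attempt limit, penalty-free suspension, and regenerated puzzle variation come from the description above; everything else, including the generate_variant stand-in for procedural content generation and the cool-down duration, is an illustrative assumption rather than Sleuth’s actual implementation.

```python
import random
import time

ATTEMPT_LIMIT = 5        # five goes per puzzle, as described above
COOL_DOWN_SECONDS = 600  # illustrative value; the post gives no duration


def generate_variant(case_id, seed):
    # Hypothetical stand-in for procedural content generation:
    # derive a fresh parameterisation of the same puzzle from a seed.
    rng = random.Random(hash((case_id, seed)))
    return {"case_id": case_id, "target": rng.randint(100, 999)}


class Case:
    def __init__(self, case_id):
        self.case_id = case_id
        self.attempts = 0
        self.suspended_until = 0.0
        self.solved = False
        self.variant = generate_variant(case_id, seed=0)

    def submit(self, answer):
        """Grade one attempt immediately and return the Chief's feedback."""
        now = time.time()
        if now < self.suspended_until:
            return "The Chief: you're still off the case. Take a break."
        self.attempts += 1
        if answer == self.variant["target"]:
            self.solved = True
            return "The Chief: case closed. Your score just went up."
        if self.attempts >= ATTEMPT_LIMIT:
            # A suspension carries no penalty: the attempt counter resets
            # and a procedurally generated variation replaces the puzzle.
            self.suspended_until = now + COOL_DOWN_SECONDS
            self.attempts = 0
            self.variant = generate_variant(self.case_id, seed=int(now))
            return "The Chief: that's five strikes. Cool off, then chase the new lead."
        return "The Chief: not quite. Try again."
```

The point of the sketch is the loop itself: failure is cheap, feedback is immediate, and only solved puzzles, not wasted attempts, move the score.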

Sleuth case 302: Bank heist

The environment we have created has undoubtedly fulfilled its primary aims of motivating and facilitating practice. In the initial on-campus run students made a total of 42,534 code submissions — an average of 138 per student over a ten-week period. Despite perceiving the task as somewhere between fair and difficult, the class’s achievement was high, with an average grade of 90.67%. We’ve now had over 2,500 students play Sleuth both on campus and online with Coursera, and we’ve had similar responses across the different scenarios.

However, Sleuth has also engendered some unexpected and somewhat dysfunctional behaviours. Sleuth has a partially open level design which allows students to progress in a non-linear fashion. In designing this we imagined students setting aside levels which they found difficult, and returning once they had built confidence on other levels.

Contrary to our expectations, students make little strategic use of this design. The majority progress doggedly in sequence, often at the cost of many failed attempts at the harder levels, as they fixate on individual problems. The resultant frustration finds its expression in increasingly angry VLE forum posts as deadlines loom.

This obsessive behaviour also carries over into attitudes about grades. Much to our surprise, despite a pass threshold of 40% and a first-class threshold of 70%, around a third of students expect, indeed demand, a grade of 100%. I have found myself dealing with student demands for deadline extensions to increase their grade from 85% to 100%, and with late-night angry emails from students on a grade of 97% who can’t solve the final puzzle.

This is borne out in the final grade distribution which, as opposed to being normally distributed or bimodal, peaks sharply at 40%, 70%, and 100%. We could characterise the students at these peaks as being, respectively, pass-oriented, grade-oriented, and game-oriented.

All of this raises quite a few dilemmas for Edward and me about where to go next with Sleuth. Our encounters with obsessive behavioural patterns might tempt us to iterate on our game design to engineer them away, but in doing so do we risk robbing students of an opportunity to develop autonomy? How, as pedagogues, can we be sure that the behaviours we are engineering are more desirable than others?

Similarly, the strangeness of our grade distributions could lead us to raise the difficulty of some puzzles, or to use other available metrics to improve grade differentiation, but what pedagogical or behavioural improvement would such a change serve?

For me, such questions expose the contradictions at the heart of current approaches to assessment in higher education. Game mechanics are a powerful tool for motivating shifts in student behaviour, but be warned, such power has disruptive potential.