Intro. [Recording date: January 6, 2025.]
Russ Roberts: Today is January 6th, 2025, and I want to remind readers that voting for your favorite episodes of 2024 closes on February 9th, so please go to econtalk.org to vote.
My guest today is Daisy Christodoulou. She is the director of education at No More Marking. She has a Substack with that same name, No More Marking, and she has written three books on education. Her latest book, which is our topic for today, is I Can’t Stop Thinking about VAR. VAR, in this case, is the Video Assistant Referee, a technology used in football–what Americans call soccer–but we’ll call football throughout this conversation. And, VAR was added to the game to improve decisions made by referees about fouls, offsides, other matters.
And this book that you’ve written, Daisy, is on the surface about football, but it’s really about much more than that. It’s about something much deeper. It’s about our endless human desire for perfection, the challenge of designing policy and rules that actually achieve what we want. I love this book. It actually captures what I think of as the essence of the so-called Coase Theorem, which we may get to later. Daisy, welcome to EconTalk.
Daisy Christodoulou: Fantastic. Thanks for having me, Russ.
Russ Roberts: When was VAR [Video Assistant Referee] introduced, and why has it been controversial? Isn’t it a wonderful thing to have more accurate decisions? Isn’t that more fair, more just, and leads to more accurate outcomes?
Daisy Christodoulou: Well, definitely, no one would argue with any of those things. We all want more accurate decisions in sport and in life. And, in fact, before technology was introduced to football’s decision-making process, the thing you heard people say all the time, is that we just want more right decisions. We just want more right decisions.
And, they would say the livelihoods of players and managers depend on more right decisions. And, if you get a really bad decision at a crucial moment–football is a billion-dollar industry, there’s so much money riding on these decisions.
So, VAR, which was the system to get more right decisions, had a bit of a staged introduction. It was first used in the World Cup in 2018, and then it was introduced in the English Premier League, which is the league I watch. I follow a team. We’re currently in the Premier League, which is the world’s most-watched league. It was introduced there in the 2019-2020 season, and obviously the second half of that coincided with COVID, with all the stadiums emptying, everyone watching on TV at home.
And so, in a way, it was sort of appropriate that it was introduced in that season, because it felt like we were all watching everything through a screen. There were no crowds in the ground anymore.
And so, it’s been around for a while now–been around for quite a few seasons. And it’s been enormously controversial, so much more so than I think, certainly, anybody anticipated, and much more so than the similar systems that you have in other sports.
So, American football was one of the first to introduce a system like this. I don’t know lots about American football–you’ll know a lot more than me–but my understanding is it’s been fairly well accepted as part of the game.
And, you’ve got equivalent systems in cricket and rugby, which work very well. Tennis, you have the Hawk-Eye System, which has worked so well they’re actually getting rid of line judges in a lot of tournaments.
But, football just seems to be this outlier. It doesn’t seem to be working. And they keep making tweaks to it and it still doesn’t work. The tweaks kind of make it worse.
So, it’s been this really interesting case study to explore a lot more issues around technology, progress, authority, transparency. It’s thrown up all of these issues.
Russ Roberts: We had a recent episode with Emily Oster on healthcare. We talked about the importance of nuance. And, nuance is about giving people information and recognizing that some things are complicated. And, what I thought after reading your book, is VAR overemphasizes nuance. Using it is to say that ‘Well, close enough isn’t enough. We’ve got to do better than that, so let’s get it right.’ And, why is that so problematic, the desire to get something right?
And, I would just say, by the way, in American football, the introduction of electronic replay–the big issue there is whether something is actually a catch, which is very similar to the issues that come up in European and world football, because we all know what a catch is, and we all know what a goal is. We all know what a handball is. When we see it, we know it. And, yet, once we get down to these details of making sure, absolutely sure, somehow it gets harder, not easier.
Daisy Christodoulou: Absolutely. So, I listened to that episode with Emily Oster. I’m a big Emily Oster fan. And, I think actually, some of the things you said about the challenges of being nuanced in discussing complex topics like public health, I talk about them in a chapter called “Transparency”–maybe we could come back to that.
But, just on that issue, the question you’ve asked there, about why is it so problematic to want more right decisions? I think first of all, another thing you talk about a lot in this podcast–you know, I’m a fan–is trade-offs. And trade-offs are inherent to thinking as an economist. And, I think for all of that talk about more right decisions–‘We want more right decisions. We want more right decisions’–what we realized when VAR came in is actually that is not the sole thing we want to optimize for. That is actually not the only thing we care about. We care about so many other things, some of which we hadn’t realized we care about.
So, one of the issues is: some of the checks, some of the time it takes to make these checks, it’s a really long time. So, you get these decisions where they’re scrutinizing replays for five or six minutes. And, that’s a big difference with American football–and cricket–because those are sports that have more natural breaks in play. Football [soccer] is not like that. Football is very fluid, and I don’t think any of us pre-VAR appreciated how much the fluidity and the spontaneity of the game really mattered. And, that is actually potentially something we should be thinking about, and we are trading off against that.
And the biggest issue people have is–football is a very passionate game. Its origins are as a working-class sport in England. You get these big, big crowds. And, one of the joys of watching football–and I have a season ticket; I know what this is like–is people leap up in joy when a goal is scored. And, you now have the situation where they do that, and suddenly they go, ‘Oh, no, no, no, no. We’ve got to check it.’ And, you have this three-, four-, five-minute pause sometimes, and the players are standing around getting cold.
So, that’s one thing. There’s a trade-off in there.
And then, the other thing is: It’s not just the trade-off, but even when you’ve done these five-, six-minute checks, people are not convinced that the result is the right one. And that’s something we could explore a bit more as well.
Russ Roberts: It’s kind of shocking. As I said, you’d think–well, you look at it, and of course, with modern technology, you have all these different angles, often, to evaluate and assess. And, sometimes, when you look from all those different angles, and you take five or six minutes, you’re still not sure.
In American football, what happens is that while that five or six minutes is going on–they’re talking to an expert–in New York usually–who is a former referee. And he explains what the right decision is. And, a lot of times, it’s not what they decide. And it’s infuriating to the team, obviously. They’ve got their good fortune, or good play reversed.
But, as you point out–and I think this is a great insight–there aren’t that many goals scored in football [soccer]. So, when one of those is reversed, the dopamine surge that you enjoyed is now shot.
Daisy Christodoulou: Absolutely. So, that is one of the things about football, that it’s very hard to score a goal. So, compared to other sports, the goal really matters. And, you get one reversed, it’s a really, really big deal. And, yes, this point about scrutiny, and this point about when you apply more scrutiny, you don’t always get more clarity: you actually potentially get more confusion.
And, I think you kind of touched on this a little bit in that Emily Oster episode, which I thought was a fascinating discussion, and I actually wrote down one of the things that I think you or Emily said. She said that public health officials, they can erode trust by being unwilling to accept uncertainty. And, I think that’s true. But I also think the reverse is true: that you can erode trust by engaging too much with uncertainty.
And, as you say, if you are watching a match, and you see an expert on the TV pronouncing various, ‘This is absolutely the right answer, and this is what this decision should be,’ and then the expert on the pitch disagrees with them: That kind of uncertainty is really corrosive. That you’ve got two people in positions of authority who should be agreeing, disagreeing with each other. And, we’ve seen exactly the same thing with football.
So, one of the TV channels for the tournament in the summer, employed–interestingly it was an American referee–they employed her to discuss the big decisions in the breaks. And, there was one very high profile decision where she completely disagreed with what the on-field referee and the VAR ended up doing. And, this was a very high-profile match that everybody was watching. And it was just incredibly corrosive for the authority of the officials.
So, if you have these situations where–and I had a long chapter where I discussed transparency–if you have situations where the transparency is exposing some genuine flaws in how the system works, this is quite problematic.
And of course, what you hope is that the transparency is shining a light that will lead to improvement. That’s the justification for transparency: that we can see the problems and we can solve them.
But, one of the things I discuss in that chapter–and I am, on balance, in favor of transparency–is that I think we have to be honest: unless you can use the transparency to get improvement, potentially your first step is you’ve made things worse.
Russ Roberts: Yeah, you’re right. I don’t want to miss this quote–it’s one of the great EconTalk quotes of all time that isn’t about EconTalk. You quote Randolph Nesse on the topic of tradeoffs. You say–this is Nesse:
The body is a bundle of tradeoffs. Everything could be better but only at a cost. Your immune system could react more strongly, but at the cost of increased tissue damage. The bones in your wrist could be thick enough that you could safely skateboard without wrist guards, but then your wrist would not rotate, and you could throw a rock only half as far. You could have an eagle’s ability to spot a mouse from a mile away, but only at the cost of eliminating color vision and peripheral vision. Your brain could have been bigger, but at the risk of death during birth. Your blood pressure could be lower at the cost of weaker, slower movement. You could be less sensitive to pain at the cost of being injured more often. Your stress system could be less responsive at the cost of coping less well with danger.
That’s the end of the quote of Randolph Nesse.
You go on to write:
There are a lot of tradeoffs in other walks of life. Indeed, you can make the argument that all solutions to any problem, technological or otherwise, are not really solutions, but tradeoffs.
And, of course, that’s one of our mottoes of this show, ‘No solutions, only trade-offs.’ Along with ‘It’s complicated.’ And they both apply incredibly well to these issues. You want to add anything about trade-offs?
Daisy Christodoulou: Yeah, absolutely. So, trade-offs are absolutely central to this story. And, as I say, I think what we didn’t realize is that there were all these things that we did like about football, and we didn’t realize that introducing a technological review system would be in tension with these other things, like simplicity, spontaneity, the flow of the game. I think that’s really important.
The other thing I would say about trade-offs–and I go on to say this in the chapter that you quoted from–is that what you want to do, if you’re mathematically modeling this, is put your parameters into a spreadsheet, draw the curve, and optimize. It’s the space under the curve.
Obviously, there are some parameters we don’t really have metrics for, like simplicity.
So, what do you do when you don’t have a number for it?
And then, the other thing is, is that you can’t keep trading off limitlessly. And often, what you’re trying to do is get the best of both worlds. So, I talk about essentially, the conflict, the trade-off at the heart of–a lot of decision-making is consistency versus common sense.
Russ Roberts: Talk about that.
Daisy Christodoulou: That’s a fundamental trade-off.
Russ Roberts: Yeah.
Daisy Christodoulou: Yeah. So, we want a kind of common-sensical approach to rules in all walks of life. We don’t want absurd outcomes. But, if you do that, you then have to really allow individual decision-makers quite a bit of discretion. And, when you allow discretion, what you get is a lack of consistency. So, because you have individuals operating in different ways, they will be inconsistent. And you have that in football.
And, before VAR, the thing everyone would do, is you could line up lots of videos of handballs, and you could say, ‘Well, why was that given as a handball and that wasn’t? That was inconsistent.’
And then, once you get inconsistency, you get people making accusations of bias. And, the same happens in the justice system.
And, the reason why most countries have introduced some kind of sentencing guidelines for judges, is, if you allow judges to completely have discretion over the sentences that they give someone, then often they will respond to the particularities of the case, but you’ll often get very, very inconsistent decision-making, and you will see one person getting 10 years in prison for something that someone else gets a slap on the wrist for. People will say, ‘How can this be possible?’
So, you have this tension between consistency and common sense. And, as I say, I think you’re trading off.
And, because they’re not things you can really put a number on, and you can’t plug it into a spreadsheet, as I say, and actually literally optimize for the space under the curve–what you are doing instead, is you’re kind of trying to do your best.
And, the problem you can have is that you are aiming for the best of both worlds, but the worst outcome is when you get the worst of both worlds. And, I talk about that in the evolutionary biology analogy as well.
I talk about optimizing racehorses for speed. So, this is something where racehorses have been bred and bred and bred over time to get faster and faster and faster. And, the trade-off–as Randolph Nesse is talking about–the trade-off is their bones get lighter. And then you get to the point [?], as I say, where their bones break and they have zero speed. So, you’ve ended up with nothing.
And, the thing I say is: ‘You’ve given up everything for nothing.’ And, I think that’s what a lot of people feel about VAR now. They feel like, ‘Well, we were promised better consistency and we haven’t ended up with better consistency. We’ve still got the issue where people can collect videos of different handballs, and say, “Some of these are still being given as handballs, and some aren’t.” And, we’ve lost the common sense and we’ve lost the speed and the flow of the game.’ So, it’s like the racehorse with broken bones: we’ve tried really hard to get the best of both of them. We’ve ended up with nothing. And, that’s the situation I think we’re in with VAR.
Russ Roberts: So I think–for me, this was the most profound part of the book. I like lots of different parts of it, and it resonated with me in different ways. But, there was something in here that I’d never thought of, and it comes to this point of certain things you can’t put a number on.
There are also certain things you can’t observe, and one of those is human intention. So, you obviously cannot intentionally hit a ball with your hand into the goal, or advance the ball well before the goal is scored, which is a whole other can of worms that VAR has opened.
But, you’re allowed to have, quote, “incidental”–I don’t know what the wording is; it actually doesn’t even matter. But, if in passing, in unintentional contact, a kick is made that brushes against your arm with zero intent, you don’t want to stop play for that. But, of course, we can’t observe intent. Not only can we not measure it, we could say it’s a zero-one in this case–most of the times in life it’s something actually not zero-one; we think of it that way. But, I think the most extraordinary insight that you have, is that in many of these situations we’re trying to impose a categorical system–on/off, onside/offside, goal/not goal, touched by your hand/not touched by your hand–using something that is in fact continuous. So, this comes from an essay by Richard Dawkins called “The Tyranny of the Discontinuous Mind.” So, give us Dawkins’ argument and how you’ve applied it to football and this issue.
Daisy Christodoulou: Absolutely. So, I think Dawkins makes a really good point, that a lot of the things we think of as being categorical are not categorical: they’re continuous. So, he gives a lot of examples, again from evolutionary biology. He talks about quite a weird one, where he talks about if we were to go back to your 200,000,000th ancestor, great-great-grandfather, whatever, it would be a fish. But, there’s a smooth unbroken continuity between that fish and you. And that’s quite difficult to get your head around. He says there’s no sort of discrete break, really, where you can say the fish turns to a human.
The other example he gives, which is a bit more everyday, has practical applications for everyday life, and is very relevant to the discussion you were having with Emily Oster last week, is safety and risk. So, people will ask scientists, ‘Is this safe, yes or no?’ And, safety and risk are not discrete categories, they’re continuous. And, you saw that so much with, obviously, the COVID vaccines and with COVID in general.
So, what you’re doing with a lot of these really difficult situations, is you’ve got a continuum, and you, in many cases in life, have to draw a line on the continuum, and you have to say, ‘Well, we’ve got to draw the line. The stuff on this side is something, and the stuff on this side is not that something.’ And, that line is very often quite arbitrary.
And, the word ‘arbitrary’–so, the reason I’m really interested in this is that I work in assessment; and these things are hugely important in assessment, and people spill lots of ink about assessment, and they spill lots of ink about the meaning of the word ‘arbitrary.’ What does it mean for something to be arbitrary, where you draw that line? And, in an assessment you have a similar thing: student attainment is on a continuum, and we often have to draw lines. We draw lines for there to be grades, and we also draw lines to get into certain programs or not.
So, you will say, ‘We will draw a line, and if you are this side of the line, you can be eligible for this very elite university or college, or what have you. And, if you are this side of the line, you can’t.’ And, that line is often quite arbitrary. And, often, the measurement error of the underlying distribution will mean that, for two things either side of that line, we cannot say with any certainty that they really are different. As far as we know, they are actually probably very similar.
Russ Roberts: Or reversed–
Daisy Christodoulou: So, this is a huge issue.
Russ Roberts: Or reversed.
Daisy Christodoulou: Or reversed. Reversed. Yeah, yeah.
Russ Roberts: When you rank a little bit higher above the line, and this went a little below, it’s actually–there’s error, and so–
Daisy Christodoulou: Yeah. So, Dawkins is completely right to talk about the tyranny of the discontinuous mind: that we do view a lot of things that are continuous as categorical.
But, I also say that whilst Dawkins is completely right about that, the opposite–a mirror image–cognitive distortion, is the anarchy of the continuous mind. Which is where you think, ‘Well, because everything is continuous, and because everything is on a continuous scale, well, then there is no difference between one end and the other end.’ And, it’s like, ‘Well, no: there is.’ There is still a difference between a fish and a human. There are some things which really are quite safe, and some things which are not.
So, the real burden of authority is that sometimes you have to draw a line, and you have to hold the line. And, that is hard, because there will be moments where you have to hold that line and you will probably know deep down, if you’re being honest with yourself, that that line has an awful lot of uncertainty. And, that’s true in so many walks of life.
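The earlier point about measurement error on either side of a cutoff line can be made concrete with a little arithmetic. The sketch below is only an illustration, not anything from the book or the conversation: it assumes that an observed score is the true score plus independent, normally distributed error, and the cutoff, error size, and student scores are all hypothetical numbers. Under those assumptions it computes how likely it is that a student who is truly just above the line is observed below it, and how likely it is that two students on either side of the line come out in reversed order.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_below_cutoff(true_score: float, cutoff: float, error_sd: float) -> float:
    """Chance that an observed score (true score plus normal error) lands below the cutoff."""
    return normal_cdf((cutoff - true_score) / error_sd)


def prob_rank_reversed(higher_true: float, lower_true: float, error_sd: float) -> float:
    """Chance that the truly lower scorer is observed above the truly higher scorer.

    Assumes both observed scores carry independent normal errors
    with the same standard deviation.
    """
    gap = higher_true - lower_true
    # The difference of two independent errors has standard deviation error_sd * sqrt(2).
    return normal_cdf(-gap / (error_sd * math.sqrt(2.0)))


# Hypothetical numbers, chosen only for illustration.
CUTOFF = 70.0
ERROR_SD = 5.0
STUDENT_A = 71.0  # true attainment just above the line
STUDENT_B = 69.0  # true attainment just below the line

print("P(A is observed below the line):", round(prob_below_cutoff(STUDENT_A, CUTOFF, ERROR_SD), 3))
print("P(B is observed above A):", round(prob_rank_reversed(STUDENT_A, STUDENT_B, ERROR_SD), 3))
```

With these made-up numbers, both probabilities come out well above a third, which is the sense in which two results either side of the line may be "very similar" or even "reversed." A wider gap between the true scores, or a smaller error, pushes both probabilities toward zero.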
Russ Roberts: Yeah. And, economists are very sensitive to this, in a way that I think non-economists aren’t. The safe thing: ‘What do you mean it’s continuous? It’s either safe or it’s not.’ And, that’s a desire. We wish that was true, but it isn’t true. Everything that is, quote, “safe” has some downsides, and everything that is–it’s a longing, not a fact.
But, I think the other deep point here–I think there’s a couple of things I wanted to add. Your point is that the line is arbitrary. Coase’s insight, and I’m not going to go into why this comes from Coase [Ron Coase]. People–we’ll link to some other episodes we’ve done maybe that talk about this. But, Coase’s insight is once you draw that arbitrary line, you’re going to change incentives. So, if you draw it way over here or way over here, you’re going to get different behaviors by the different participants.
And, that will have implications for the quality of the football that you watch. It may be more just, but it means that the game might be less entertaining.
And so, that’s the fundamental tradeoff I’ve already talked about. But the other insight–and this, I thought, was also incredibly deep–that you bring, is that: where you draw the line fundamentally has to be a question of language, not measurement. And so, what you’re talking about–say, a handball–it might involve intention, or the phrase ‘Clear and obvious,’ or something about ‘Made a difference in the play.’ And, once you do that, you’re un-moored. You don’t want to be; in fact, that’s why you put the language in there, is to exactly get the right thing. But, you’re fundamentally taking a continuous variable and using a non-measurable metric to decide where the line is drawn. And, by definition, it’s going to lead to discontent.
Daisy Christodoulou: Absolutely. So many really important things there. I think the point from Coase, about where you draw the line will start to then affect the underlying reality–in extremely competitive environments, completely. So, where people are targeting the line, absolutely, it will. And obviously, that happens in football, because that’s literally two teams competing.
But again, where I saw that happening in my world was in assessment, because it’s very important for students to get certain grades, and it’s very important for schools who are being judged for accountability that their students get certain grades.
So, the reason I got interested in this line on a distribution point, is because there’s a number of metrics in the English system, accountability metrics, where schools are judged by the number of students who get a certain grade. And so, the way this transmitted itself, is you would just have lots of schools running really intensive revision and preparation sessions for students just the other side of that line.
Russ Roberts: Yeah.
Daisy Christodoulou: And forgetting about everybody else. Right? And, if you did that effectively, you could really boost your position in the league tables.
So absolutely: where you draw those lines, it may be arbitrary, but it’ll have really, really big real world consequences. Yeah. So, you see that as well; so I think, yes, I would agree with that.
I think the point you’re saying about heights, that’s something measurable. I think the thing I say in the book is: lots of things are measurable. And, one of the things about sort of measurement theory, and I think Lord Kelvin says this in some way, that: even if you can just improve your measurement a little bit, that’s still an improvement. That’s still giving you a better grasp of reality. So, trying to put a number on something can be quite valuable.
But, with the things like height and weight, mass, with these measurements of some of the physical world–temperature is a really good one–we have centuries of really incredible science behind them that have led us to–there is still measurement error, and there is still uncertainty of these phenomena, but we have really reduced that measurement uncertainty down to something that in everyday life really doesn’t matter. And there are all these other metrics we have–new metrics–where I think they’re better than nothing, but they have nothing like the precision of the physical phenomena that we’ve become accustomed to.
And, a lot of these are the inventions of economists. And this is why sometimes people love and hate economics.
So, I give a couple of examples in the book. There’s one that’s used to often decide whether new drugs are worthwhile–so, it’s quality-adjusted life years.
There’s another one–we talked about risks and safety–the micro-mort, which is a one-in-a-million chance of death. So, you can measure riding a motorbike versus riding in a car versus riding in an airplane: what’s the micro-mort of each? And, I think these are really useful, and I quite like them, but they’re not as precise as a lot of other measurements of the physical world that we’re more used to.
Russ Roberts: I’m sure I’ve quoted this on the program before, and I think it’s in my book, Wild Problems, but it’s carved into stone, or it was when I was there at the University of Chicago, this quote from Lord Kelvin. It’s not quite the way he actually said it, but this is the way it gets repeated: ‘If you cannot measure it, your knowledge is of a meager and unsatisfactory kind.’
According–I forget where this story comes from–but supposedly, George Stigler was giving a tour of campus to Vernon Smith. Both men would later win Nobel Prizes in economics. And, Stigler took Smith to this quote, and he read it, and he said–he read the quote–which is again, ‘When you cannot measure, your knowledge is of a meager and unsatisfactory kind.’ And, Stigler allegedly said, ‘And, when you can measure, it’s of a meager and unsatisfactory kind.’
Daisy Christodoulou: Brilliant.
Russ Roberts: That’s very Stiglerian, always a very funny man. But, it’s of course, very deep as well.
Russ Roberts: So, this whole question of precision–this human desire for precision–is very normal, but I think a lot of times, it gets applied to areas where it does not fit.
Daisy Christodoulou: Yeah. And, I think–that’s a great quote. That was the quote, that was the Kelvin quotation I was thinking of, but I hadn’t heard that follow-up, which is good as well.
Something which isn’t in the book, but which I’ve been sort of speculating on writing about since then, is: Yes, this desire for precision and this desire for accuracy–I think there’s a couple of things going on. So, I think partly it’s the incredible success of precision measurement in the physical world.
Russ Roberts: Yeah.
Daisy Christodoulou: And, I think we should just dwell on that for a second, because it is crazy, the levels of precision. And, it’s got to the point where we take it for granted. The technology that we use in everyday life is dependent on a level of precision that is insane. And, the methods that are used in the really high-tech fabrication plants that they have for microchips–the level of precision they have to achieve and the extent to which they have to go to achieve that, are, in some ways, just beyond belief. And we take that for granted.
And, I think there is an element here, of we take it for granted and we kind of assume then that you can get that in every walk of life, or that it should be simple to get that. And, I think what we forget is firstly, how unbelievably brilliant that is. And, secondly, just all of the work that went into that.
And so, I quote a little bit in the book some things about the development of temperature–early thermometers and the enormous hard work that went into it. And, a lot of the issues that we see with VAR–the ones you were talking about with language–come up there, too: one of the first Frenchmen who was investigating temperature, investigating the boiling point of water, had all these different words to define the different stages of water boiling. And, I think one of the things we see in the development of measurement, as I say, in the physical world, is the replacement of words with numbers.
Russ Roberts: Yeah.
Daisy Christodoulou: And, when you talk about words–and I am an English literature graduate–and my day job is: the tool we use, comparative judgment, we use mostly to assess writing. And, the terrible thing is, I’ve become increasingly dubious about language and the ability of language to give us the kind of precision we crave, for all of the reasons that you’ve been talking about.
And, I sometimes think–language is not designed to give us this precision. And, in a sense, the history of the invention of number, and the way that number is used, is almost a human invention to give us something that gives us the precision that language does not give us.
And, the person who writes best about the problems of language and the inability of it to give us what we really crave, is Michael Polanyi. And, I quote him in the book; and I know you’ve talked about him on the podcast quite a bit. And he has this concept of tacit knowledge, which is: we know more than we can tell. There are things that we can do and we have as a skill, but we cannot really explain them in words.
And, he gives examples like learning to ride a bike. You could read a book about riding a bike, you could hear someone explain it, you could explain it to someone else; and that doesn’t mean you can ride the bike.
And, that is a very good description of the issues with handball: that it is something that people who watch a lot of football, they all know what a handball is when they see it. My contention is there would actually be a lot of agreement from fans about what a handball is. But attempting to define it in words is incredibly difficult. And so much more difficult than anybody thought.
And, you’ve got to the situation now, where–before VAR, when the rules just existed as something that a referee would interpret with a bit of common sense–the handball rule was 11 words long. Since VAR, and since we’ve applied all the scrutiny to it, the handball law is now 11 times as long. Has that 11-times increase led to any more clarity about what a handball is? No, it has not. There are, if anything, just more arguments about it.
And, this is probably a good example as well of another thing you talk about a lot: Hayek’s “Use of Knowledge in Society”–the difference between legislation and law.
Russ Roberts: Yeah.
Daisy Christodoulou: And, what has happened in football, is you’re now having this very top-down imposition of very wordy rules onto something that I think before, was more of an emergent bottom-up process.
Russ Roberts: Yeah. You talk about the opportunity of players to challenge. One way to solve this problem is to not look at every decision, but to give players, or managers, a chance to challenge a call a limited number of times. That would reduce the number of interruptions.
But, you also see this in playground basketball. Playground basketball doesn’t have a referee. But everybody who plays–and it’s different where you play–but on a particular court where people play frequently, there’s an understanding of what’s a foul and what isn’t a foul. Even though, of course, you can’t write it down. And, it might be much more violent in certain games, in certain courts, than in other locations and other places, but the players enforce it themselves. And, a player who repeatedly invokes a foul for his own advantage is shunned. And, that decision of what is a foul, emerges from the bottom-up of all these countless interactions and the play that they have together.
I want to just say one thing, I don’t–
Daisy Christodoulou: Yeah.
Russ Roberts: No, go ahead.
Daisy Christodoulou: You did an episode about 10 years ago, with Michael Munger, on just this, on different sports and how they police themselves, and often, how a lot of that policing is done from the bottom-up. And, I think not just how they self-police, but I even think a lot of the way that referees apply laws, and officials apply laws, there is an element of a bottom-up tacit knowledge about how that was applied–at least that was the case in the pre-VAR era.
And, one of the things I say, is: what you’ve got with all of the technology in sport, is you have rules that were drafted in–in a lot of the English sports–the 19th century. These rules were drafted in the 19th century, in a completely pre-technological era, really. And, what we are now attempting to do is to graft kind-of 21st century technology onto this system of laws that was not designed for that level of scrutiny. And that’s often where you’re getting the tension and getting the clash.
Russ Roberts: I’ll just say one more thing about tacit knowledge, then I want to segue to something else.
Poetry is an attempt to explain things that we can’t say in normal words–in prose. And, in general, you’d think that would be inferior: an obscure set of words that has multiple interpretations, that some people can’t even access–that that could more capture these ineffable human experiences like love, regret, sadness, poignance, bittersweet. These are all things that prose struggles with. And, we have really good languages, but they’re not enough. And, poetry often gets closer. And sometimes music gets closer without any words, which is maybe ironic, maybe not.
Daisy Christodoulou: No, I think that’s very true. And, just in the way, as I said, sort of non-linguistic ways of communicating meanings, music, mathematics. We’re a wordy culture; and we are, in lots of ways, a legal culture, and the law is based on words. And, as I say, I’m a literature graduate, and my day job is assessing writing, so I don’t want to be too down on this. But, I guess because I run up against it every day, I see the limitations of it as well. I see the strengths; I see the limitations.
I mean, if you want to take, again, an evolutionary psychology perspective, what is the point of language? Is language a truth-seeking missile? Is that its aim? And, there’s a lot of people in evolutionary biology who say, ‘No, that’s not its aim. Its aim is essentially to help you to tell nice stories about yourself,’ or ‘Its aim is to help you to lie.’
So, language–the vagueness and the imprecision of language–is not a bug, it’s a feature. That would be what a lot of people would say. And, maybe I don’t want to go that far, but we can’t just communicate with a language. It does have its strengths, but it can’t do everything. And, I think we’re trying to press it into service in places where it’s just not equipped to do the job.
Russ Roberts: I want to try for you to give the flavor–you have a lot of different potential improvements for this current world we’re in with VAR and the Premier League, and serious football fans can dig into those and make their own assessment.
But, there’s a piece of your suggestion–I would say a piece of your set of suggestions–that’s pretty impractical but extremely interesting. And so, I want to digress on it, and I want to apply it alongside your day job. Why don’t you talk about comparative judgment and how you use it in assessing writing? And then I want to talk about how one might–even though I think it’s a bit far-fetched, but extremely interesting–how you might apply it to, say, enforcing football regulations.
Daisy Christodoulou: Absolutely. So, yeah, comparative judgment is what I do as my day job. So, the organization I work for, No More Marking, we use it to assess writing. So, I’ll just say a bit about what it is. So, comparative judgment, it rests on the psychological principle that as human beings, we are not very good at making absolute judgments. We are much better at comparative judgments.
And, I’ll give you a simple example. If someone walks into the room you’re in at the moment, and I say, ‘How tall is that person?’ That’s an absolute judgment. If two people walk into the room you’re in, and I say, ‘Who is taller, the person on the left or the person on the right?’ that’s a comparative judgment. And, I hope you can tell from that very simple example, that the comparative judgment is just much easier. You are always going to get that right. The absolute judgment is much harder.
If you ask 100 people that, they will come up with different numbers. Whereas you ask them, ‘Left or right, who is taller?’ they’ll get it right. They’ll agree.
And, what you can do, is you can have lots of people make lots of comparative judgments, and then you can use an algorithm to combine all of those judgments to create a measurement scale. And, the person who first developed this algorithm–the law of comparative judgment and the theory behind it–was an American, Louis Thurstone. And, he developed this back in the 1920s. So this is not a new idea. What we have done–and other people have done, what you can do now–is you can plug that algorithm into a piece of software, and you can instantly crunch all of the decisions. And you can start to do very interesting things.
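The idea of combining many paired judgments into a single measurement scale can be sketched in a few lines. This is only a minimal illustration under stated assumptions: it fits a Bradley-Terry model with a simple iterative update, which is a close relative of Thurstone's law of comparative judgment rather than Thurstone's own formulation, and it is not the software No More Marking runs. The function name `fit_scale`, the `prior` parameter, and the essay labels are all hypothetical.

```python
import math
from collections import defaultdict


def fit_scale(judgments, n_iter=500, prior=0.5):
    """Turn (winner, loser) pairs into a scale of scores, one per item.

    Assumption: a Bradley-Terry model, fitted by repeated re-estimation.
    `prior` adds a few virtual wins and losses against a fixed reference
    item so that an item with no wins (or no losses) stays finite.
    """
    items = sorted({item for pair in judgments for item in pair})
    wins = defaultdict(float)
    n_compared = defaultdict(int)  # how often each ordered pair met
    for winner, loser in judgments:
        wins[winner] += 1.0
        n_compared[(winner, loser)] += 1
        n_compared[(loser, winner)] += 1

    strength = {item: 1.0 for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            # Virtual comparisons against a reference item of strength 1.
            denom = 2 * prior / (strength[i] + 1.0)
            for j in items:
                n = n_compared.get((i, j), 0)
                if n:
                    denom += n / (strength[i] + strength[j])
            updated[i] = (wins[i] + prior) / denom
        strength = updated

    # Report on a log scale, centred so the average score is zero.
    scores = {i: math.log(s) for i, s in strength.items()}
    mean = sum(scores.values()) / len(scores)
    return {i: s - mean for i, s in scores.items()}


# Hypothetical judgments: each tuple is (piece judged better, piece judged worse).
judgments = (
    [("essay_a", "essay_b")] * 3 + [("essay_b", "essay_a")]
    + [("essay_b", "essay_c")] * 3 + [("essay_c", "essay_b")]
    + [("essay_a", "essay_c")] * 2
)
scale = fit_scale(judgments)
for essay, score in sorted(scale.items(), key=lambda kv: -kv[1]):
    print(essay, round(score, 2))
```

Each judge only ever answers "left or right"; the scores come out of the pattern of wins and losses, so no rubric enters the calculation at any point.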
So, what we do with students’ writing is we will put all of the students’ writing into our system. So, let’s say we will run assessments with maybe 100,000 pieces of student writing. And then what we’ll do, is we’ll get the teachers–all the teachers [?actually?]–and we’ll get them to make lots of decisions. So, not just one or two, but lots and lots of decisions: lots of these paired decisions, these comparative judgments, where they’ll be looking at two pieces of student writing, and they will say, ‘Which is better, the piece on the left or the piece on the right?’
And, this is where we go back to the issue about language and about what is quality, is that: before comparative judgment came along, what people would be doing, is they would be looking at those essays one by one, using absolute judgment, and they would be using a rubric, a mark scheme. They would be using a prose description of the quality that that piece of writing should embody.
And the problem with that, is all the problems we’ve talked about, is that the prose rubric–the mark scheme–does not do a good job of capturing what a good piece of writing is. Now, when you do comparative judgment, the teachers can look at the two pieces. There is no rubric. We have one criterion, which is: the better piece of writing. And the teachers can make a professional judgment about what they think the better piece of writing is.
And then–this is the crazy thing, which I never get tired of, even though I’ve been doing this for nearly 10 years now–is that when you get people to make these judgments, in what seems this incredibly subjective way, with no tick list, with no mark scheme, with no rubric, with just one very open-ended criterion, they have very, very high levels of agreement. And, when you get them to mark those essays the traditional way, with the mark scheme and the rubric, they have much lower levels of agreement.
So, you have this weird paradox, in that what feels like an incredibly subjective method of assessment, the data shows it is actually really quite objective. And, the flip side is true: That when you have this very objective measure–seemingly very objective measure of assessment–which has all these tick lists, and you can say, ‘Does it feature this? Does this piece of writing feature that, does it feature this?’ But, when you crunch the numbers, people do not agree at all. The level of agreement is very low, so it’s actually very subjective.
So, we use this at scale, in a number of countries. We use it in a lot of U.S. schools: we work with a number there as well. And it works really well; and teachers like it. And, the way I like to talk about it is it’s almost like a machine or a method for capturing tacit knowledge.
Because people always say–they get worried before they’ve done it–and they go, ‘What are people judging on?’ You know, ‘What are they making their decisions on?’
And, they’re making their decisions using tacit knowledge. Just that tacit knowledge of ‘I see that piece of a student’s writing, and there’s just something that’s maybe ineffably good about it. And, there’s this piece here, which just isn’t as good.’
And, they make those decisions and they agree.
And, the other thing it does–and going back to the point about Coase, about incentives, and where you draw the line causing incentives–is the problem, again, with words, is that when you try and define things in words, you will often get these distortions. And, we have a little collection of all the distortions that the traditional rubric, or mark scheme, causes.
And a classic example–let me give you a classic example. In England, we have a mark scheme, which, it’s a part of the curriculum, which talks about fronted adverbials as being a kind of marker of sophisticated writing. And, a fronted adverbial is really just–it’s when you say ‘Suddenly, I woke up.’ ‘Suddenly’ is the fronted adverbial. It’s an adverbial at the front of the sentence.
And, this has become something that people have–it’s rewarded in the system. And so, you’d think, ‘Well, isn’t that a good thing?’ because it does make their sentences more original, and it does make them sound nicer.
And no, because what happens is people teach to that. And then you have children using fronted adverbials that just don’t make sense.
So, my favorite example is: ‘Forgettably, he crept through the darkness.’ And so, that’s the point where you have this incentive. And that student will then get a better mark than a student who has not used a fronted adverbial but has written something really very good, because they’ve got the tick on the list.
And, comparative judgment eliminates that. And, I’ve given you that example, but I could multiply examples. There’s examples about shifts in informality of register, which lead to students shoehorning in slang wherever they can. There’s all these kinds of things.