On speedruns, golf, and interestingness

I find myself fascinated by video game speedruns lately. They’re not quite like anything else I’ve ever seen.

I mean, first of all they showcase a particular kind of technical skill that I find both enormously impressive and aesthetically pleasing. Watching a speedrun I often find myself struck by two simultaneous thoughts:

1. How the heck did they do that?

and

2. That was beautiful

It’s the combination of technicality and visual appeal that gets to the heart of it, I think. Speedruns probably have much the same appeal as, say, juggling, or other feats of dexterity. It’s about pushing the limits of what the human body can do, but in a unique way that emphasizes complexity and intricacy rather than maximization of a single trait, like speed or strength. And there’s beauty to it – not in the sense that speedrunners are optimizing for beauty, really, but in the sense that I don’t think people would watch speedruns if they were about players using the same button combinations to manipulate spreadsheets or something.

But there’s more to it than that. It strikes me that most kinds of races that people compete in are kind of…well, boring, for lack of a better term. There’s just not much to them. For example, I think marathon runners are extremely impressive athletes, and if I heard that someone had broken the marathon world record I would find that really cool and interesting. But would I have any desire to actually watch the race? No, of course not. Why on earth would I? For the most part I can just simulate it in my head: first they ran really fast…then they continued running really fast. I mean, yes, there might be some details you would want to know – the splits could be interesting, for example. Maybe the runner had a very weak start and then a strong finish. Or if it was an actual race and not a time trial, it could be interesting if two runners fed off of one another. But I see those as relatively minor things – they wouldn’t be enough to make me actually want to watch the race. At least for me, 99% of the information I care about when it comes to a race like that is captured by the actual finishing times. Beyond that, the act of running itself simply doesn’t have enough going on to really interest me.

Speedruns aren’t really like that, though. They have structure. The different sections of a speedrun can be wildly different from one another – not just in terms of difficulty, but in terms of the actual skills required to do them well. Maybe one section requires extreme skill with the game’s jumping mechanics. Maybe for another you need to master the combat system. Some parts might demand extreme precision, while others might require fast reflexes.

But even that doesn’t really capture it. After all, if all you want is for different parts of your race to involve different skills, you can trivially achieve that by slapping together different skills into a frankensport – this is essentially what triathlons do. And indeed, I would probably be more inclined to watch a triathlon than I would any of the individual sports that make it up – but I still wouldn’t be that inclined. So it’s not just that multiple skills are involved – it’s that each of the skills is, on its own, an interesting, multi-dimensional skill. Running at a world class level, as impressive as it is, does not involve that many dimensions of skill – you have to be fast, and you have to have endurance, and I guess you have to be good at managing the race and, like, being aware of your opponents’ psychology and knowing when to attack and things like that. But that’s still not that much going on – in terms of the interestingness of the skill, I would say it pales in comparison to something like the ability to dribble and maneuver a soccer ball. There’s just a lot more nuance, a lot more ways to be good or bad in the latter case. And I would say that most video game skills are much more like dribbling a soccer ball than they are like running.

Honestly, the closest comparison I can think of to make – and I know this will sound silly, but hear me out – is to golf. Golf is sort of like a race in that it’s a one-player game – no teammates, and you compete with opponents only indirectly – and you’re trying to minimize a particular quantity. In a race it’s time, and in golf it’s number of shots. Just like speedruns, though, golf has structure to it. In a running race, what differentiates one part of the race from another? Well, hills I guess, but in general not a whole lot – any one kilometer is pretty much the same as any other. Golf isn’t like that – as you go through a round you play different holes that each have their own unique challenges and opportunities (analogous to the different levels or sections of a video game). Moreover, just like a speedrun, golf involves many different skills – driving is very different from iron play is very different from chipping is very different from putting. And I would say that certain of those skills are fairly multi-dimensional – chipping in particular involves a lot of nuance. So ignoring the fact that you might personally find golf to be very boring, if you can see what I mean by saying that golf has structure, you can maybe see what I’m getting at when it comes to speedruns.

Of course, golf isn’t unique in that regard – there are plenty of other sports that have this same kind of structure/intricate nature. Hockey, soccer, basketball – these are sports you might still want to watch, even after being told what the score of a game was. Maybe someone made an amazing play. Maybe something really cool or unlikely happened. Maybe one team had a last-minute comeback or collapse. Whatever. The point is, it’s not just a one-dimensional “they ran fast/they didn’t run fast” dynamic. You care about the process, how the game happened, rather than just the result.

Okay, so if these sports exist, why go on and on about speedruns being so unique? Well, basically – because those sports aren’t races. They’re team games, and more importantly they involve players competing directly against one another. There’s nothing wrong with that, of course – not everything has to be a race, and I enjoy watching those sports. But I do like races. They’re a really cool class of competition – and given their wide appeal, I think it’s safe to say that they tickle some part of human psychology. But most races that exist right now just don’t have the same complexity that other sports have – they’re one-dimensional. So it’s neat to see an activity like speedrunning come along that takes the intricate, nuanced aspects that typify team sports and fuses them with the general awesomeness that is a race. It fills a niche that I think was missing from the world of competition and play.

Like, I can see why the traditional races came about in the first place, of course. Swimming, running, biking – these sports are about pushing the human body as far as it can go in certain specific directions. I can see the appeal in that, and I’m glad those races exist. God knows they’re impressive as all hell.

I just can’t really find them interesting.

Results from the Great Scotch Tasting of 2016

Two guys. Twelve kinds of Scotch. One poorly controlled experiment.

That’s right – my friend and I just did a blind Scotch tasting, and I’m here to give you all the smoky, peaty details.

Background first. I was introduced to Scotch about five years ago, as a young and maybe-not-entirely-innocent grad student. At the time I wasn’t one to enjoy the taste of alcohol much – to this day I can’t stomach a drop of beer – but Scotch I immediately took a liking to. It tasted great, and made you look classy as all hell while drinking it. What wasn’t to love? It became my drink of choice, and naturally I started to branch out, trying as many different labels as I could get my hands on. My taste developed and favourites quickly emerged – Bowmore became my go-to brand; Lagavulin and Talisker my indulgences, reserved for special occasions.

As I tried more and more brands, though, a few things became clear:

First, expensive Scotch could be really, really good. One day I was lucky enough to try some 18-year-old Laphroaig (for the uninitiated, older Scotches tend to be both more expensive and more highly regarded). I only got a small taste, but…wow. I don’t want to oversell it, but let’s just say that it’s a good thing for my bank account that it’s not available in Ontario.

Second, more expensive didn’t always mean better tasting. Plenty of the pricey brands I tried were quite good, but they still didn’t measure up to the (relatively) affordable Bowmore 12, in my estimation.

From this I concluded that there was probably something real that the Scotch market was capturing – that there was at least a trend towards more expensive Scotch tasting better, even if it could be overwhelmed by the idiosyncrasies of one’s own personal preferences. But when it came to truly mind-blowing, explode-in-your-mouth taste, it did seem as though a high price was a necessary condition, if not a sufficient one. And certainly most of the Scotches I would list as my favourites were towards the higher end of the price spectrum.

Still, though, some doubt lingered in my mind. I had heard all the standard warnings: how unblinded ratings are essentially meaningless, how people are extremely vulnerable to the power of suggestion, and how we tend to massively overrate things if we’re told that they’re expensive and/or highly regarded. There’s certainly no shortage of cautionary tales involving self-styled connoisseurs getting egg on their face when submitting to a blind taste test. A nagging part of me wondered if I wasn’t just rating the expensive Scotches highly because I subconsciously wanted to like them better. I mean, it certainly didn’t feel like that’s what I was doing – to me the more expensive Scotches just seemed to actually taste better. But I guess that’s what everyone says.

So, naturally, I did what any curious person would do: I tried an experiment. I enlisted a friend who also appreciates Scotch, and one night we set out to do a blind taste test: we would try a range of Scotches without knowing which was which, and rate them. We both saw it as a win-win situation: if it did turn out that we preferred the more expensive brands, then that would be vindication of a sort – what we had said all along about our preferences would be borne out by the evidence. On the other hand, if it turned out we actually liked the cheaper labels just as much as (or more than) the expensive ones, then that would be fascinating – we would get to experience a cognitive bias at work first hand, like when you see an optical illusion revealed for what it is.

(plus, from then on we would get to switch to buying cheaper Scotch, which would be a nice bonus)

Yeah, yeah, okay, I can hear you saying, just get to the results! Did you embarrass yourself or not?

Alright, fair enough. As it happens we didn’t totally embarrass ourselves, but we definitely had some surprises. So without further ado, I’ll get to the experiment.

(Oh, and before you ask: yes, I’m drinking Scotch as I write this. Of course I’m drinking Scotch as I write this)

The setup we decided on was pretty simple. We chose 12 different Scotches from our combined collection, varying widely both in price and in terms of our personal preferences. They were, in order of cost per bottle:

  • McClelland Islay – $45
  • Glenfiddich 12 – $55
  • Glenlivet 12 – $57
  • Bowmore 12 – $60
  • Aberlour 10 – $60
  • Glenfiddich 15 – $77
  • Bowmore 15 – $93
  • Talisker 10 – $100
  • Glenfiddich 18 – $112
  • Glenlivet 18 – $120
  • Lagavulin 16 – $122
  • Bowmore 18 – $127

Blinding was straightforward – we each left the room in turn as the other poured out the twelve Scotches in a random order. Then we went through them one by one, rating each out of ten and taking notes (we also tried to identify which Scotch was which, although that was less of a priority).

I won’t keep you in suspense any longer – here are the results:

[Table: scotch_ratings]

Of course, it’s hard to get much from looking at a table, but a few things jump out. In my case, probably the most noticeable trend is…the lack of a trend. My ratings were fairly uniform across the board, with an 8 being the most common rating (which I used to indicate a “very good but not amazing” Scotch). Apparently my reaction to most Scotch is basically “yeah, tastes great”, and variations on top of that are relatively minor – it seems I just like the stuff. Still, there were some interesting datapoints. In my favour, I’ve always said that Bowmore was my favourite Scotch, and indeed two of my top three rated whiskeys ended up being different ages of Bowmore (the third Bowmore rated a solid 8/10, although embarrassingly this was Bowmore 18, the most expensive of the three). Also somewhat in my favour, my other top-rated whiskey was Glenfiddich 18, which is a fairly high-end Scotch. I previously would have pegged Glenfiddich 18 as excellent, although I don’t know if I would have said it was my favourite. Less to my credit, two expensive Scotches that I would have said were among my absolute favourites – Lagavulin and Talisker – rated a mere 7 and 8, respectively. Lagavulin in particular surprised me – I would have been willing to bet it would have ended up with an above average rating, so having it come in near the bottom was a big shock.

Overall, I didn’t come off too terribly, but given that I rated most Scotches as essentially the same, I didn’t exactly knock it out of the park, either. Apparently my “preference” for more expensive Scotch had a lot more social conditioning to it than I would have guessed.

My friend’s ratings were, I think, much more interesting than mine. The most obvious difference is that he had a much wider range of ratings than I did (3 to 9, compared to 6 to 9 for me). And I don’t think he was just using a different scale than I was – I think he genuinely disliked certain Scotches to a degree that I didn’t and rated them accordingly. He also took much more detailed notes than I did, which leads me to believe that he has a more attuned sense of taste than I do. Intriguingly, he too picked out Bowmore 15 as his top-rated whiskey, which I don’t think either of us would have predicted at the outset. His other top picks – Glenlivet 18, Lagavulin 16, Bowmore 18 – were all quite high-end, which reflects pretty well on him. He did have some surprises, though, even bigger than mine. Two of his favourite Scotches – Glenfiddich 18 and Talisker 10 – fared quite poorly, garnering ratings of 6.5 and 4 out of 10, respectively. And previously he would have said that he really liked those two, so it was a fairly unexpected outcome. Those surprises notwithstanding, though, I think my friend came out of the experiment looking pretty good – his ratings tended to align more with his pre-experiment rankings than mine did.

That’s all just verbal analysis, though, and I know you all came here for the math (admit it, none of you can resist the allure of sweet, sweet, data analysis). So let’s dig into the data a bit further.

The first thing we’ll want to do is compare my set of ratings to my friend’s, to see how well the two agree. Of course, even two of the world’s most experienced whiskey connoisseurs could simply have different taste, so you wouldn’t necessarily expect the ratings of two different people to match up perfectly with one another. But still, going in my friend and I would have said we had pretty similar preferences, so it’d be strange if the ratings were completely uncorrelated. With that in mind, we can pull up the scatterplot:

[Figure: scotch_ratings_comparison]

And…yeah. Looks pretty random. Whoops. Granted, the best fit line does at least have a positive slope, so our ratings were positively correlated. But when I run the correlation I get a coefficient of 0.19, which is…pretty low. I mean, it’s not zero (or negative!), but it’s pretty low. So charitably you could say that we just have very different taste (and didn’t realize it before), or you could say that our ratings have a lot of randomness to them. Take your pick.

From there we move on to the more interesting question – how well did our ratings correlate with price? As I alluded to above, going into the experiment I wouldn’t have expected a perfect correlation. I would have said that more expensive whiskeys tend to taste better, but that still leaves plenty of room for personal preference. After all, some Scotches are just going to have a flavour that you really like, for whatever reason. And you can age another Scotch as much as you like, but it still won’t necessarily be able to compete with that cheaper Scotch you simply have a taste for.

Of course, with that being said, my friend and I both went in claiming to like more expensive Scotch, so some kind of correlation with price was expected. If it didn’t show up, or was much smaller than expected, there would certainly be some egg on our faces.

So how did we do? Well, here are the scatterplots:

[Figure: my_rating_price]

[Figure: friend_rating_price]

The first thing you might notice about my scatterplot is that it’s very flat – the best fit line has a positive slope, indicating a positive correlation, but the slope is very small. However – statistics to the rescue! – it turns out this doesn’t actually mean all that much. When your data has low variance along one axis, as mine does, the slope of the best fit isn’t a very informative quantity. Instead you want to look at the correlation coefficient. And when I run a correlation between my ratings and the cost per bottle, I get a coefficient of…

0.32.

Now, how you interpret that number will depend on your field, but a correlation of 0.32 is…probably decent at best. It’s certainly not nothing, but it’s not huge either. If you want an intuitive sense of what a correlation coefficient means, you can square it, and that will give you the fraction of the variance in the data that’s explained by the independent variable (here, price). What does that mean? Well, in my case, 0.32 squared is 0.102. This means that price accounts for roughly 10% of the variance in my ratings – in other words, 90% of what makes me like a given Scotch is coming from something other than price. If you like, you can picture ten different equally important factors that contribute to my Scotch preferences – maybe the smokiness of the Scotch would be one, or the peatiness – and price would be just one of those ten. So the bottom line is, at least for the 12 Scotches that we happened to pick out, price isn’t extremely important when it comes to determining my preferences. It’s definitely a factor, but I’ll admit that it’s a smaller factor than I expected.
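The earlier point about slope versus correlation can be sketched in a few lines of Python. The numbers below are made up purely for illustration (they are not the ratings from the tasting, which only appear in the table image), and the helper names `pearson_r` and `ols_slope` are my own. The idea is just that squeezing the same pattern of ratings into a narrow band flattens the best fit line without changing the correlation at all.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ols_slope(xs, ys):
    """Slope of the least-squares best fit line of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# HYPOTHETICAL data, invented for this illustration only.
prices = [45, 60, 77, 100, 127]
wide_ratings = [3.0, 6.0, 5.0, 9.0, 8.0]

# Same ups and downs, but compressed to a tenth of the spread
# (a rater who likes nearly everything).
narrow_ratings = [7.5 + r / 10 for r in wide_ratings]

print("wide:   slope", ols_slope(prices, wide_ratings),
      "r", pearson_r(prices, wide_ratings))
print("narrow: slope", ols_slope(prices, narrow_ratings),
      "r", pearson_r(prices, narrow_ratings))
```

The narrow rater’s slope comes out ten times smaller, while the correlation coefficient is identical, which is why the coefficient is the more informative number for a flat-looking plot.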

Okay, so what about my friend? Well, right off the bat his scatterplot looks more promising than mine. You can see a clear trend towards more expensive Scotches having higher ratings. And indeed, when you run the correlation you get a coefficient of…(drumroll please)…0.53. Not bad! When you square that you get 0.281, so over 28% of the variance in his ratings is accounted for by price. This means that, for my friend at least, if you want to answer the question “what makes a good Scotch?”, a decent fraction (more than a quarter) of the answer is going to be price. Whereas I would need nine other equally important factors (aside from price) to fully account for my preferences, my friend would only need two or three. Which is pretty cool! This is more the level of correlation I expected going into the experiment.
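For anyone who wants to check the squaring arithmetic, here is a minimal sketch. It uses only the two coefficients quoted above; the function name `variance_explained` and the “equal factors” calculation (one over r squared) are just my way of spelling out the picture of several equally important factors.

```python
# Correlation coefficients between rating and price, as quoted in the post.
correlations = {"me": 0.32, "friend": 0.53}

def variance_explained(r):
    """Fraction of variance explained by a single predictor: r squared."""
    return r * r

for who, r in correlations.items():
    r2 = variance_explained(r)
    # If every factor mattered as much as price, this is roughly how many
    # such factors (price included) it would take to cover all the variance.
    equal_factors = 1 / r2
    print(f"{who}: r = {r:.2f}, r^2 = {r2:.4f}, "
          f"about {equal_factors:.1f} equally important factors")
```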

But here’s where it gets interesting. So far I’ve been using price as a proxy for the “overall quality” of the Scotch, and judging our ratings against this standard. But there’s another metric that’s commonly used to judge Scotch, and that’s age – common wisdom holds that the older a Scotch is, the better it is. Of course, age tends to correlate with price, but there are exceptions – the Talisker we sampled, for example, is only 10 years old but costs $100. Overall the correlation between age and price for our 12 whiskeys was 0.76. Given that, I initially figured that there wasn’t much point in running a correlation between age and rating – I guess I thought that price would reflect the quality of the Scotch pretty well, and I doubted our ability to pick out any effect age might have beyond that.
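The age and price relationship is the one correlation here that can be roughly re-derived from what is written in the post, since the bottle list gives both. The sketch below makes two assumptions of mine: the number in each label is the age in years, and McClelland Islay is dropped because the list gives no age for it. That leaves 11 bottles instead of 12, so the result should not be expected to match the 0.76 figure exactly.

```python
from math import sqrt

# (label, age in years, price per bottle in dollars), copied from the list
# of bottles. ASSUMPTION: McClelland Islay is excluded, since no age is
# given for it in the list.
bottles = [
    ("Glenfiddich 12", 12, 55),
    ("Glenlivet 12", 12, 57),
    ("Bowmore 12", 12, 60),
    ("Aberlour 10", 10, 60),
    ("Glenfiddich 15", 15, 77),
    ("Bowmore 15", 15, 93),
    ("Talisker 10", 10, 100),
    ("Glenfiddich 18", 18, 112),
    ("Glenlivet 18", 18, 120),
    ("Lagavulin 16", 16, 122),
    ("Bowmore 18", 18, 127),
]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

ages = [age for _, age, _ in bottles]
prices = [price for _, _, price in bottles]

age_price_r = pearson_r(ages, prices)
print(f"age vs price correlation over {len(bottles)} bottles: {age_price_r:.2f}")
```

The same `pearson_r` helper would work for rating versus price or rating versus age, given the ratings from the table.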

But boy, was I wrong. Recall that the correlations between rating and price were 0.32 for me, and 0.53 for my friend. However, if you instead look at the correlations between rating and age, they jump up to 0.47 for me and 0.70 (!) for my friend. All of a sudden we’re starting to get respectable correlations, ones even a physicist might not scoff at. In my friend’s case that’s very nearly half the variance in rating (0.70 squared is 0.49) being explained by one quantity, age.

I did not expect this at all, and I find it fascinating. Going in I would have been willing to bet that whatever correlations we saw, they would have been stronger for price than they were for age. I had in my head the idea that age was a very noisy measure of Scotch quality: sure, older Scotches were usually better, but there were plenty of exceptions (Talisker 10, for instance, I would have rated above plenty of 18-year-old whiskeys). Price seemed like a better indicator of quality because it could correct for those exceptions – if it turned out that some young Scotch was particularly good, well, then a lot of people would say “hey, this is really good,” and they would want to buy it, and it would end up with a high price. In other words, price would reflect quality even when the age heuristic broke down. So if we were going to pick out anything with our ratings, I thought, it would be price.

But the opposite happened, and that’s interesting. It suggests that age really is a good indicator of Scotch quality, and that once you take age into account, price is possibly only adding noise. So if you’re ever in the liquor store shopping for some Scotch, you might do better ignoring the price tag and just looking at how old it is.

(also, not to put too fine a point on it, but – how cool is that result? We did a blind taste test and our ratings ended up correlating well with age. I’m gonna go ahead and call that a little bit of vindication for the “Scotch quality is a real thing” camp)

Finally, just as we were curious as to how our unblinded ratings would hold up under blinding, we were also curious as to how our blinded ratings would hold up once we were unblinded again. So one of the first things we did after the tasting was over was to re-taste some of the “surprise” Scotches – the ones that rated much higher or lower than we expected. We were worried that with the blinding removed, we would revert to our earlier, preconception-laden opinions. But surprisingly this didn’t end up happening. When my friend tried Talisker, and I tried Lagavulin, our two big underperformers, we both said “hey yeah, I guess this really isn’t as good as I always thought it was”. And when we both tried Bowmore 15, our surprise winner, it turned out to be every bit as good as our ratings suggested. In other words, the blinded assessments really did carry over – they made us notice things that we hadn’t noticed before, but once we had noticed them, the blinding was no longer necessary.

Which means we won’t be treating this as merely an academic exercise. Both my friend and I intend to change our buying habits based on these results. In my case my highest rated “affordable” Scotch was Bowmore 12, which was already kind of my go-to, so I won’t be changing anything there. But from now on I will absolutely be getting Bowmore 15 as my “special occasion” Scotch, rather than any of the more expensive bottles I used to buy sometimes. For my friend’s part, he also plans to start buying Bowmore 15 for special occasions, and is considering switching to Glenlivet 12 as an “everyday” Scotch.


Okay, so what’s the takeaway from all this? Well, I would say that we got sort of a middle-of-the-road result from the experiment. On the one hand our ratings both ended up correlating with age and price, and many of the ratings we gave agreed roughly with our previous, non-blinded assessments. So we didn’t totally embarrass ourselves – no Bottle Shock for us. On the other hand, some of those correlations were fairly small, and we were both humbled when some of our purported favourites ended up getting quite low ratings. So neither complete vindication, nor abject humiliation – just some fascinating results, and a whole lot of Scotch.

I’d call that a successful experiment.