Results from the Great Scotch Tasting of 2016

Two guys. Twelve kinds of Scotch. One poorly controlled experiment.

That’s right – my friend and I just did a blind Scotch tasting, and I’m here to give you all the smokey, peaty details.

Background first. I was introduced to Scotch about five years ago, as a young and maybe-not-entirely-innocent grad student. At the time I wasn’t one to enjoy the taste of alcohol much – to this day I can’t stomach a drop of beer – but Scotch I immediately took a liking to. It tasted great, and made you look classy as all hell while drinking it. What wasn’t to love? It became my drink of choice, and naturally I started to branch out, trying as many different labels as I could get my hands on. My taste developed and favourites quickly emerged – Bowmore became my go-to brand; Lagavulin and Talisker my indulgences, reserved for special occasions.

As I tried more and more brands, though, a few things became clear:

First, expensive Scotch could be really, really good. One day I was lucky enough to try some 18-year-old Laphroaig (for the uninitiated, older Scotches tend to be both more expensive and more highly regarded). I only got a small taste, but…wow. I don’t want to oversell it, but let’s just say that it’s a good thing for my bank account that it’s not available in Ontario.

Second, more expensive didn’t always mean better tasting. Plenty of the pricey brands I tried were quite good, but they still didn’t measure up to the (relatively) affordable Bowmore 12, in my estimation.

From this I concluded that there was probably something real that the Scotch market was capturing – that there was at least a trend towards more expensive Scotch tasting better, even if it could be overwhelmed by the idiosyncrasies of one’s own personal preferences. But when it came to truly mind-blowing, explode-in-your-mouth taste, it did seem as though a high price was a necessary condition, if not a sufficient one. And certainly most of the Scotches I would list as my favourites were towards the higher end of the price spectrum.

Still, though, some doubt lingered in my mind. I had heard all the standard warnings: how unblinded ratings are essentially meaningless, how people are extremely vulnerable to the power of suggestion, and how we tend to massively overrate things if we’re told that they’re expensive and/or highly regarded. There’s certainly no shortage of cautionary tales involving self-styled connoisseurs getting egg on their face when submitting to a blind taste test. A nagging part of me wondered if I wasn’t just rating the expensive Scotches highly because I subconsciously wanted to like them better. I mean, it certainly didn’t feel like that’s what I was doing – to me the more expensive Scotches just seemed to actually taste better. But I guess that’s what everyone says.

So, naturally, I did what any curious person would do: I tried an experiment. I enlisted a friend who also appreciates Scotch, and one night we set out to do a blind taste test: we would try a range of Scotches without knowing which was which, and rate them. We both saw it as a win-win situation: if it did turn out that we preferred the more expensive brands, then that would be vindication of a sort – what we had said all along about our preferences would be borne out by the evidence. On the other hand, if it turned out we actually liked the cheaper labels just as much as (or more than) the expensive ones, then that would be fascinating – we would get to experience a cognitive bias at work first hand, like when you see an optical illusion revealed for what it is.

(plus, from then on we would get to switch to buying cheaper Scotch, which would be a nice bonus)

Yeah, yeah, okay, I can hear you saying, just get to the results! Did you embarrass yourself or not?

Alright, fair enough. As it happens we didn’t totally embarrass ourselves, but we definitely had some surprises. So without further ado, I’ll get to the experiment.

(Oh, and before you ask: yes, I’m drinking Scotch as I write this. Of course I’m drinking Scotch as I write this)

The setup we decided on was pretty simple. We chose 12 different Scotches from our combined collection, varying widely both in price and in terms of our personal preferences. They were, in order of cost per bottle:

  • McClelland Islay – $45
  • Glenfiddich 12 – $55
  • Glenlivet 12 – $57
  • Bowmore 12 – $60
  • Aberlour 10 – $60
  • Glenfiddich 15 – $77
  • Bowmore 15 – $93
  • Talisker 10 – $100
  • Glenfiddich 18 – $112
  • Glenlivet 18 – $120
  • Lagavulin 16 – $122
  • Bowmore 18 – $127

Blinding was straightforward – we each left the room in turn as the other poured out the twelve Scotches in a random order. Then we went through them one by one, rating each out of ten and taking notes (we also tried to identify which Scotch was which, although that was less of a priority).
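For anyone who wants to repeat the blinding step, here is a minimal sketch of one way to randomize the pour. It is not what we literally did on the night (we poured by hand); the function name `blind_pour_order` and the numbered-glass scheme are assumptions made for illustration.

```python
import random

def blind_pour_order(labels, seed=None):
    """Shuffle the bottles and return (glasses, key).

    glasses: the glass numbers the taster sees (1..N).
    key: the pourer's hidden mapping from glass number to bottle.
    Hypothetical helper, not part of the original tasting.
    """
    rng = random.Random(seed)
    shuffled = rng.sample(labels, k=len(labels))  # random permutation; input list is untouched
    key = {number: label for number, label in enumerate(shuffled, start=1)}
    glasses = list(key)  # the taster only ever sees these numbers
    return glasses, key

scotches = [
    "McClelland Islay", "Glenfiddich 12", "Glenlivet 12", "Bowmore 12",
    "Aberlour 10", "Glenfiddich 15", "Bowmore 15", "Talisker 10",
    "Glenfiddich 18", "Glenlivet 18", "Lagavulin 16", "Bowmore 18",
]

glasses, key = blind_pour_order(scotches)
print("Taster sees glasses:", glasses)
# The pourer keeps `key` out of sight until all the ratings are written down.
```

The pourer runs this out of sight of the taster and only reveals `key` once every glass has a rating, so the taster never learns the order early.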

I won’t keep you in suspense any longer – here are the results:

[Image: scotch_ratings – table of our blind ratings]

Of course, it’s hard to get much from looking at a table, but a few things jump out. In my case, probably the most noticeable trend is…the lack of a trend. My ratings were fairly uniform across the board, with an 8 being the most common rating (which I used to indicate a “very good but not amazing” Scotch). Apparently my reaction to most Scotch is basically “yeah, tastes great”, and variations on top of that are relatively minor – it seems I just like the stuff. Still, there were some interesting datapoints. In my favour, I’ve always said that Bowmore was my favourite Scotch, and indeed two of my top three rated whiskeys ended up being different ages of Bowmore (the third Bowmore rated a solid 8/10, although embarrassingly this was Bowmore 18, the most expensive of the three). Also somewhat in my favour, my other top-rated whiskey was Glenfiddich 18, which is a fairly high-end Scotch. I previously would have pegged Glenfiddich 18 as excellent, although I don’t know if I would have said it was my favourite. Less to my credit, two expensive Scotches that I would have said were among my absolute favourites – Lagavulin and Talisker – rated a mere 7 and 8, respectively. Lagavulin in particular surprised me – I would have been willing to bet it would have ended up with an above average rating, so having it come in near the bottom was a big shock.

Overall, I didn’t come off too terribly, but given that I rated most Scotches as essentially the same, I didn’t exactly knock it out of the park, either. Apparently my “preference” for more expensive Scotch had a lot more social conditioning to it than I would have guessed.

My friend’s ratings were, I think, much more interesting than mine. The most obvious difference is that he had a much wider range of ratings than I did (3 to 9, compared to 6 to 9 for me). And I don’t think he was just using a different scale than I was – I think he genuinely disliked certain Scotches to a degree that I didn’t and rated them accordingly. He also took much more detailed notes than I did, which leads me to believe that he has a more attuned sense of taste than I do. Intriguingly, he too picked out Bowmore 15 as his top-rated whiskey, which I don’t think either of us would have predicted at the outset. His other top picks – Glenlivet 18, Lagavulin 16, Bowmore 18 – were all quite high-end, which reflects pretty well on him. He did have some surprises, though, even bigger than mine. Two of his favourite Scotches – Glenfiddich 18 and Talisker 10 – fared quite poorly, garnering ratings of 6.5 and 4 out of 10, respectively. And previously he would have said that he really liked those two, so it was a fairly unexpected outcome. Those surprises notwithstanding, though, I think my friend came out of the experiment looking pretty good – his ratings tended to align more with his pre-experiment rankings than mine did.

That’s all just verbal analysis, though, and I know you all came here for the math (admit it, none of you can resist the allure of sweet, sweet, data analysis). So let’s dig into the data a bit further.

The first thing we’ll want to do is compare my set of ratings to my friend’s, to see how well the two agree. Of course, even two of the world’s most experienced whiskey connoisseurs could simply have different taste, so you wouldn’t necessarily expect the ratings of two different people to match up perfectly with one another. But still, going in my friend and I would have said we had pretty similar preferences, so it’d be strange if the ratings were completely uncorrelated. With that in mind, we can pull up the scatterplot:

[Image: scotch_ratings_comparison – scatterplot of my ratings against my friend’s]

And…yeah. Looks pretty random. Whoops. Granted, the best fit line does at least have a positive slope, so our ratings were positively correlated. But when I run the correlation I get a coefficient of 0.19, which is…pretty low. I mean, it’s not zero (or negative!), but it’s pretty low. So charitably you could say that we just have very different taste (and didn’t realize it before), or you could say that our ratings have a lot of randomness to them. Take your pick.

From there we move on to the more interesting question – how well did our ratings correlate with price? As I alluded to above, going into the experiment I wouldn’t have expected a perfect correlation. I would have said that more expensive whiskeys tend to taste better, but that still leaves plenty of room for personal preference. After all, some Scotches are just going to have a flavour that you really like, for whatever reason. And you can age another Scotch as much as you like, but it still won’t necessarily be able to compete with that cheaper Scotch you simply have a taste for.

Of course, with that being said, my friend and I both went in claiming to like more expensive Scotch, so some kind of correlation with price was expected. If it didn’t show up, or was much smaller than expected, there would certainly be some egg on our faces.

So how did we do? Well, here are the scatterplots:

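[Image: my_rating_price – scatterplot of my ratings against price]

[Image: friend_rating_price – scatterplot of my friend’s ratings against price]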

The first thing you might notice about my scatterplot is that it’s very flat – the best fit line has a positive slope, indicating a positive correlation, but the slope is very small. However – statistics to the rescue! – it turns out this doesn’t actually mean all that much. When your data has low variance along one axis, as mine does, the slope of the best fit isn’t a very informative quantity. Instead you want to look at the correlation coefficient. And when I run a correlation between my ratings and the cost per bottle, I get a coefficient of…
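To see why a flat line doesn't settle the matter, here is a small sketch with made-up numbers (these are not our ratings). It fits the same toy prices against a widely spread set of ratings and against the same ratings squeezed into a narrow band. The helper name `slope_and_r` and the toy data are assumptions for illustration only.

```python
import math

def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson correlation) for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Toy data only: invented prices and ratings, not the tasting results.
toy_prices = [40, 60, 80, 100, 120]
wide_ratings = [2, 5, 4, 8, 9]                       # a rater who uses the whole scale
narrow_ratings = [6 + y / 4 for y in wide_ratings]   # same pattern, compressed range

slope_wide, r_wide = slope_and_r(toy_prices, wide_ratings)
slope_narrow, r_narrow = slope_and_r(toy_prices, narrow_ratings)

print(f"wide:   slope = {slope_wide:.4f}, r = {r_wide:.3f}")
print(f"narrow: slope = {slope_narrow:.4f}, r = {r_narrow:.3f}")
```

Compressing the ratings shrinks the slope by the same factor of four but leaves the correlation unchanged, which is why the coefficient is the fairer number for comparing a rater who hugs the 8s with one who uses the whole scale.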

0.32.

Now, how you interpret that number will depend on your field, but a correlation of 0.32 is…probably decent at best. It’s certainly not nothing, but it’s not huge either. If you want an intuitive sense of what a correlation coefficient means, you can square it, and that will give you the fraction of the variance in the data that’s explained by the independent variable (here, price). What does that mean? Well, in my case, 0.32 squared is roughly 0.10. This means that price accounts for roughly 10% of the variance in my ratings – in other words, 90% of what makes me like a given Scotch is coming from something other than price. If you like, you can picture ten different equally important factors that contribute to my Scotch preferences – maybe the smokiness of the Scotch would be one, or the peatiness – and price would be just one of those ten. So the bottom line is, at least for the 12 Scotches that we happened to pick out, price isn’t extremely important when it comes to determining my preferences. It’s definitely a factor, but I’ll admit that it’s a smaller factor than I expected.

Okay, so what about my friend? Well, right off the bat his scatterplot looks more promising than mine. You can see a clear trend towards more expensive Scotches having higher ratings. And indeed, when you run the correlation you get a coefficient of…(drumroll please)…0.53. Not bad! When you square that you get roughly 0.28, so over 28% of the variance in his ratings is accounted for by price. This means that, for my friend at least, if you want to answer the question “what makes a good Scotch?”, a decent fraction (more than a quarter) of the answer is going to be price. Whereas I would need nine other equally important factors (aside from price) to fully account for my preferences, my friend would only need two or three. Which is pretty cool! This is more the level of correlation I expected going into the experiment.
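The squaring step and the "equally important factors" picture can be sketched in a few lines. This uses the two rounded coefficients reported above, so the squares may differ in the third decimal from values computed on the unrounded correlations. The function name `variance_explained` is just an illustrative label.

```python
def variance_explained(r):
    """Fraction of variance accounted for by a single predictor with correlation r."""
    return r * r

# Rating-vs-price correlations as reported in this post (rounded to two decimals).
reported = {"me": 0.32, "friend": 0.53}

for who, r in reported.items():
    r2 = variance_explained(r)
    # 1 / r^2 is the number of equal-sized factors it would take to reach 100%.
    print(f"{who}: r = {r:.2f}, r^2 = {r2:.3f}, about {1 / r2:.1f} equal-sized factors in total")
```

The "factors in total" figure counts price itself, so subtracting one gives the number of other factors in the picture above.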

But here’s where it gets interesting. So far I’ve been using price as a proxy for the “overall quality” of the Scotch, and judging our ratings against this standard. But there’s another metric that’s commonly used to judge Scotch, and that’s age – common wisdom holds that the older a Scotch is, the better it is. Of course, age tends to correlate with price, but there are exceptions – the Talisker we sampled, for example, is only 10 years old but costs $100. Overall the correlation between age and price for our 12 whiskeys was 0.76. Given that, I initially figured that there wasn’t much point in running a correlation between age and rating – I guess I thought that price would reflect the quality of the Scotch pretty well, and I doubted our ability to pick out any effect age might have beyond that.
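Here is a minimal sketch of how an age-versus-price correlation can be computed from the price list earlier in the post. Two assumptions to flag: the ages are read straight off the bottle names, and McClelland Islay is left out because its name carries no age. The figure quoted above was for all 12 whiskeys, so this 11-bottle version should not be expected to match it exactly. The helper name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# (name, price in dollars, age in years); ages taken from the bottle names.
# McClelland Islay is omitted: no age appears in its name (assumption noted above).
bottles = [
    ("Glenfiddich 12", 55, 12), ("Glenlivet 12", 57, 12), ("Bowmore 12", 60, 12),
    ("Aberlour 10", 60, 10), ("Glenfiddich 15", 77, 15), ("Bowmore 15", 93, 15),
    ("Talisker 10", 100, 10), ("Glenfiddich 18", 112, 18), ("Glenlivet 18", 120, 18),
    ("Lagavulin 16", 122, 16), ("Bowmore 18", 127, 18),
]

prices = [price for _, price, _ in bottles]
ages = [age for _, _, age in bottles]

r_age_price = pearson_r(ages, prices)
print(f"age vs price over {len(bottles)} bottles: r = {r_age_price:.2f}")
```

The same `pearson_r` call works for rating versus price or rating versus age once the ratings are entered as a list in the same bottle order.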

But boy, was I wrong. Recall that the correlations between rating and price were 0.32 for me, and 0.53 for my friend. However, if you instead look at the correlations between rating and age, they jump up to 0.47 for me and 0.70 (!) for my friend. All of a sudden we’re starting to get respectable correlations, ones even a physicist might not scoff at. In my friend’s case that’s roughly half the variance in rating (0.70 squared is 0.49) being explained by one quantity, age.

I did not expect this at all, and I find it fascinating. Going in I would have been willing to bet that whatever correlations we saw, they would have been stronger for price than they were for age. I had in my head the idea that age was a very noisy measure of Scotch quality: sure, older Scotches were usually better, but there were plenty of exceptions (Talisker 10, for instance, I would have rated above plenty of 18-year-old whiskeys). Price seemed like a better indicator of quality because it could correct for those exceptions – if it turned out that some young Scotch was particularly good, well, then a lot of people would say “hey, this is really good,” and they would want to buy it, and it would end up with a high price. In other words, price would reflect quality even when the age heuristic broke down. So if we were going to pick out anything with our ratings, I thought, it would be price.

But the opposite happened, and that’s interesting. It suggests that age really is a good indicator of Scotch quality, and that once you take age into account, price is possibly only adding noise. So if you’re ever in the liquor store shopping for some Scotch, you might do better ignoring the price tag and just looking at how old it is.

(also, not to put too fine a point on it, but – how cool is that result? We did a blind taste test and our ratings ended up correlating well with age. I’m gonna go ahead and call that a little bit of vindication for the “Scotch quality is a real thing” camp)

Finally, just as we were curious as to how our unblinded ratings would hold up under blinding, we were also curious as to how our blinded ratings would hold up once we were unblinded again. So one of the first things we did after the tasting was over was to re-taste some of the “surprise” Scotches – the ones that rated much higher or lower than we expected. We were worried that with the blinding removed, we would revert to our earlier, preconception-laden opinions. But surprisingly this didn’t end up happening. When my friend tried Talisker, and I tried Lagavulin, our two big underperformers, we both said “hey yeah, I guess this really isn’t as good as I always thought it was”. And when we both tried Bowmore 15, our surprise winner, it turned out to be every bit as good as our ratings suggested. In other words, the blinded assessments really did carry over – they made us notice things that we hadn’t noticed before, and once we had noticed them, the blinding was no longer necessary.

Which means we won’t be treating this as merely an academic exercise. Both my friend and I intend to change our buying habits based on these results. In my case my highest rated “affordable” Scotch was Bowmore 12, which was already kind of my go-to, so I won’t be changing anything there. But from now on I will absolutely be getting Bowmore 15 as my “special occasion” Scotch, rather than any of the more expensive bottles I used to buy sometimes. For my friend’s part, he also plans to start buying Bowmore 15 for special occasions, and is considering switching to Glenlivet 12 as an “everyday” Scotch.

 

Okay, so what’s the takeaway from all this? Well, I would say that we got sort of a middle-of-the-road result from the experiment. On the one hand our ratings both ended up correlating with age and price, and many of the ratings we gave agreed roughly with our previous, non-blinded assessments. So we didn’t totally embarrass ourselves – no Bottle Shock for us. On the other hand, some of those correlations were fairly small, and we were both humbled when some of our purported favourites ended up getting quite low ratings. So neither complete vindication, nor abject humiliation – just some fascinating results, and a whole lot of Scotch.

I’d call that a successful experiment.

 
