Fake Graph: The Actual “Dunning-Kruger Effect” Is NOTHING Like I Thought It Was

For years, I’ve been teaching a fake graph.

In pretty much every course I teach, on some day when students seem discouraged or distracted, I’ll draw an X axis labeled “Experience” and a Y axis labeled “Confidence,” and sketch out the “Dunning-Kruger Effect” curve, as preparation for an informal pep talk.

(Update, 27 Nov 2021: My subject area is English, so my syllabus doesn’t cover how psychologists conduct research or how to create or interpret graphs. When I wrote above that I would “sketch out” this curve, I literally mean I would draw this curve freehand on the whiteboard, in order to illustrate an informal, extemporaneous pep talk. This story is less about the real findings of the Dunning-Kruger study, and more about my own personal discovery that I had been spreading a very popular but inaccurate meme.)

That pep talk would typically go like this:

Figure 1: A curve labeled as “Dunning-Kruger Effect,” but with a typo on the X axis, which is labeled “No nothing” instead of “Know nothing.”

When learning anything new, beginners tend to learn so much when they are first exposed to new ways of looking at the world and learning vocabulary terms to talk about them, that they get a huge initial confidence boost.

But as soon as you gain just a bit of knowledge, your confidence level starts to drop. Even though your professor can see you are making steady progress along the “experience” axis, the more you know, the more you realize how much you don’t know.

So regardless of the subject, student confidence takes a sharp nose-dive, just when the course really gets rolling.

Lots of beginners give up on the downslope, but eventually the curve bottoms out, and as you look back on your accomplishments, your confidence will rise again.

Dunning and Kruger won the Ig Nobel Prize in Psychology for this work. Isn’t that encouraging to know?

Like many people who share a meme on social media because it helps them make a point they really want to make, I have sadly perpetuated a falsehood.

(Record scratch; freeze-frame.)

But let’s go back to when this all started. (Fast-rewind video effect.)

Monday morning, I figured that it was about time for me to give my Dunning-Kruger Effect pep talk. For an in-person class, I would just sketch the curve freehand on the whiteboard, but since I’m teaching all my classes online (thanks, COVID-19), I thought instead I might work this concept into a short video or maybe a new handout.

As I was still brainstorming, I was looking for an image that looked clean, and as I inspected one to re-familiarize myself with what it actually says, I noticed a typo — instead of “Know nothing,” the label at X = 0 is “No nothing.” When I expanded my Google search to find a better image, I noticed just how many iterations of this curve are out there.

Most of the results looked like the curve I was familiar with (a needle-sharp spike and a sharp upward curve), but a significant number showed a variation (with a rounded peak and gradual rise).

Figure 2: A very different curve, also labeled “Dunning-Kruger Effect,” but with the Y axis labeled “Low” to “High” confidence, and terms like “Mount Stupid” and “Valley of Despair” added.

All these images are labeled “Dunning-Kruger Effect,” but the time I’ve recently spent studying coronavirus pandemic curves has made me appreciate just how significantly different some of these graphs are.

For instance, this one I’ve called “Figure 2” adds labels such as “Mount Stupid” and “Valley of Despair,” which struck me not only as a poor fit for the kind of pep talk I wanted to give, but also too informal for legitimate scholarship.

So I wondered, for the first time in my teaching career, what exactly were Dunning and Kruger measuring when they plotted “Confidence” on the Y axis? Did they come up with the terms “Mt. Stupid” and “Valley of Despair”? How did they measure the acquisition of knowledge?

I then wondered about the scale on the Y axis. What exactly were Dunning and Kruger measuring when they labeled a point on the Y axis as measuring “100% Confidence”? The graph plots the curve through the origin (0,0). But who begins any course having exactly *zero* confidence and knowing exactly “nothing”? Certainly SOME students will rate above zero in both categories, which should push the average at least slightly above zero.

How did they know to stop their study as soon as their test subjects became what the first figure calls an “expert” but the second calls a “guru”? How do they define expertise or guru status? Did this study actually follow test subjects from when they were babies (when they knew exactly “nothing”) to when they became experts in their fields? How many of the babies in their study went on to become experts with 100% knowledge in a field?

Since the label for Version 1 cites the title of the 1999 article by Dunning and Kruger, it wasn’t hard for me to find the full text in Academic Search Elite.

Kruger, Justin, and David Dunning. “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments.” Journal of Personality and Social Psychology, vol. 77, no. 6, Dec. 1999, pp. 1121–1134. Academic Search Elite, doi:10.1037/0022-3514.77.6.1121.

I scanned the article looking for the familiar curve, but what I found instead were these graphs:

What the actual what?

Their “Figure 2” does show a slight valley, but nowhere do I see anything that comes close to the beautiful curve I have freehanded from memory many times on whiteboards.

The D-K graphs floating around on the Internet label the X axis “Experience” or “knowledge in field.” But in the actual article, each chart plots exactly four points, one for each quartile. And the Y axis, instead of having some kind of externally verified scale of “Confidence,” is instead labeled “percentile.”

So the narrative I’ve been giving in the name of Dunning and Kruger is totally wrong.

Dunning and Kruger did not measure the confidence of students at the start of a class (at X = 0), and then track them through the course by measuring their confidence after the first, second, third, and final quarters.

No evidence from their study supports the narrative that the confidence of learners starts out at zero, spikes, nose-dives, and then climbs again — even though the Internet is full of graphics purported to illustrate that very narrative.

What data are Dunning and Kruger actually plotting?

Each chart breaks down the responses of groups of students who were asked to predict their score on a single test. Their responses were then sorted into four groups according to how they scored on that one test. (That’s why there’s a perfect 45-degree angle for “test scores”; the students were deliberately sorted that way.)
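
To make that sorting step concrete, here is a minimal sketch in Python using invented, simulated scores (nothing here comes from Dunning and Kruger’s actual data). It shows why sorting subjects into quartiles by their actual score guarantees a clean diagonal for “actual test score,” while self-estimates that cluster around “a bit above average” produce a nearly flat “perceived” line:

```python
# Hypothetical simulation: sort subjects into quartiles by actual score,
# then compare each quartile's mean actual percentile with its mean
# perceived (self-estimated) percentile.
import random
import statistics

random.seed(42)
N = 200

# Simulated raw test scores, and self-estimates in which everyone
# guesses they are a bit above average, with only a weak link to
# actual performance.
actual = [random.gauss(70, 15) for _ in range(N)]
perceived = [random.gauss(60, 10) + 0.2 * (a - 70) for a in actual]

def percentile_ranks(xs):
    """Rank each value as a percentile (0-100) within the group."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for pos, i in enumerate(order):
        ranks[i] = 100.0 * pos / (len(xs) - 1)
    return ranks

actual_pct = percentile_ranks(actual)
perceived_pct = percentile_ranks(perceived)

# Sort subjects into quartiles BY ACTUAL SCORE -- this is the step
# that forces the "actual test score" line onto the diagonal.
order = sorted(range(N), key=lambda i: actual[i])
quartiles = [order[k * N // 4:(k + 1) * N // 4] for k in range(4)]

for q, members in enumerate(quartiles, start=1):
    a = statistics.mean(actual_pct[i] for i in members)
    p = statistics.mean(perceived_pct[i] for i in members)
    print(f"Quartile {q}: actual percentile ~{a:.0f}, perceived ~{p:.0f}")
```

By construction, the mean actual percentiles land near 12.5, 37.5, 62.5, and 87.5 (the diagonal), while the perceived percentiles compress toward the middle: the bottom quartile’s self-estimate sits far above its actual rank, and the top quartile’s sits below.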

Here is the take-away message that Dunning and Kruger leave their readers with:

In sum, we present this article as an exploration into why people tend to hold overly optimistic and miscalibrated views about themselves. We propose that those with limited knowledge in a domain suffer a dual burden: Not only do they reach mistaken conclusions and make regrettable errors, but their incompetence robs them of the ability to realize it. Although we feel we have done a competent job in making a strong case for this analysis, studying it empirically, and drawing out relevant implications, our thesis leaves us with one haunting worry that we cannot vanquish. That worry is that this article may contain faulty logic, methodological errors, or poor communication. Let us assure our readers that to the extent this article is imperfect, it is not a sin we have committed knowingly.

Just as I wouldn’t want to use “Mt. Stupid” or “The Valley of Despair” in an informal pep talk to students who are frustrated in the middle of a course, I wouldn’t want to use terms like “unskilled” and “incompetent”  — which have specific meanings in the professional world where Dunning and Kruger live, but carry unpleasant emotional connotations that might make my students feel I am belittling their efforts to learn.

I can understand why an educator might want to take Dunning and Kruger’s negatively phrased finding — that students who lack knowledge of a domain also lack the ability to recognize the errors they make in that domain — and rephrase it more positively: “As students learn more, they are better able to recognize their errors.”

That positive version nicely supports the observation that students were better able to predict their test scores as they learned more.

However, Dunning and Kruger’s study did not actually measure student “confidence” on the Y axis, and the X axis does not measure how much experience students gain over time.

We are not looking at what happens to students over time as they learn; instead, we are looking at how accurately students are able to predict their scores on a single test, and those students are sorted into four groups (graphed at X=1 through X=4) according to their test scores.

Students at all levels predicted they would get roughly the same scores, slightly above average. The students in the bottom quartile vastly over-estimated their scores, while the students in the top quartile under-estimated.

The Y axis plots the students’ “perceived test score” at four different points, but Dunning and Kruger don’t provide us with any information to flesh out the left side, where X = 0. We only have data points for X at 1, 2, 3, and 4.

I just don’t see any support for the distinctive peak and valley curve that so many online sources associate with the Dunning-Kruger effect.

07 Apr 2020 — first posted
27 Nov 2021 — “update” added, emphasizing my purpose for writing this page; minor tweaks throughout.

26 thoughts on “Fake Graph: The Actual “Dunning-Kruger Effect” Is NOTHING Like I Thought It Was”

    • Thanks, “No Nothing.” Here’s a relevant quote from that article:

      “Knowing I had a preference for one view, I began to worry I was making a mistake and that the mistake would be memorialized in my book for all time. So, I read as much as I could from the critics, and I contacted David Dunning directly to ask him if he still believed the DK effect is real. He was quite firm in his belief that DK is real and not merely a statistical fluke. He supported his view by pointing to the many contexts in which it had been replicated—which, while impressive, I did not accept as proof—and to some studies testing alternative explanations for the DK effect. The most convincing of these was a recent large-scale study published in Nature Human Behaviour (Jansen et al. 2021) that included many improvements over previous studies. For example, Jansen and colleagues conducted two studies, one on logical reasoning and one on grammar, each with over three thousand online participants. This large sample guaranteed that there would be sufficient people at the extremes of high and low performance to accurately trace the relationship between actual and expected test scores.”

      • Hi. Thanks for your reply. I’m happy you found that article interesting.

        You may also find it interesting that I have been emailing with Dr. Alexander Danvers, author of the 2020 article in Psychology Today, “Dunning-Kruger Isn’t Real.” [https://www.psychologytoday.com/us/blog/how-do-you-know/202012/dunning-kruger-isnt-real]

        I also pointed Dr. Danvers to that 2022 article by Stuart Vyse, and he replied enthusiastically, stating that it’s a good article and that he hadn’t yet read the new papers from 2021 in Nature Human Behaviour. He also wrote that he “definitely updated [his] thinking” on DK and that he would like to write a follow-up article.

        Best, No Nothing

        P.S. I wonder if “know nothing” was internally misspelled as “no nothing” as a tongue-in-cheek joke regarding those who know nothing – meaning they don’t even know how to spell “know.” :-)

  1. I just posted a YouTube video of a long talk I gave on Dunning-Kruger, which addresses much of what’s in this blog post, including that “No nothing” graph and where it probably came from. I also included a link back to here in my first comment on the video. I hope more people find this; it’s a breath of fresh air.

  2. I’m feeling super extra cranky today, so my apologies in advance for the probably-unwarranted snark. But: Had you *seriously* been considering that image to be a literal scientific graph and not as the simplified visual aid it’s so clearly meant to be?

    All the clues are right there: the beginning point of “0”, the abrupt rise to precisely “100”, the smooth, consistent swoop of the curve as experience increases. I mean, idk, but when I first saw the image years ago, I thought “No way, that’s too pretty/perfect to be real,” looked at the axes, and the very first thing I saw was the “No nothing,” which was the nail in the coffin.

    The limitations of and flaws in the Dunning-Kruger study are worthy of discussion, but this image is the least of D-K’s problems. It’s a stylized image meant to provide a visual explanation or mnemonic to augment the (difficult for some) concept of “As your expertise grows, so does your awareness of the limitations of your knowledge.” It hardly makes sense to condemn the image if/because you misinterpreted its objective and thought it was more than simply a pretty picture. (Might as well reply to “So a rabbi, a priest, and a rhinoceros walked into a bar” by interrupting to ask where this bar is and do they not have health codes there.)

    • Snark is valuable and I sometimes employ it myself; but

      * No, I never seriously considered this stylized graph to be an actual scientific document.
      * No, I did not use this specific image with the “No nothing” typo in my lectures.
      * Yes, I do sometimes sketch a curve to represent the “rising action” or “climax” of a play, or an “inverted pyramid” in journalism, without stressing over exactly what the lines are measuring.

      Of course I’m not blaming D & K for the fact that other people are using these images to spread the “after you put in the effort, frustration will turn to confidence” message in their name. It’s a benevolent message, and I still do give it to my students in a different form, without invoking the “Dunning-Kruger Effect” label. The extra time it would take to teach the actual D-K study would not help my students learn literature, or journalism, or composition (I am not trained to teach psychology).

      On the Monday when I sat down to commit to more formal writing something I had been teaching informally from memory, I researched the source of the curve and wrote this post about what I learned in the process. That’s really the lesson I want my students to learn: that growth comes from consulting experts, examining the evidence, and adjusting your values and beliefs to match the available evidence, rather than choosing the evidence that supports your pre-existing beliefs.

      Perhaps D & K published a different study in which they did observe the progress of specific individuals over time, but the study I cited seems to refer to a one-shot test, which divides the students into 4 groups based on how well they performed, and explores how their predictions differed from their objective scores.

      True, the simple curve is much easier to grasp than the scholarly study, but the fact that the curve nicely supports the popular narrative (about how persisting through struggles leads to rewards) doesn’t make it an accurate representation of the data Dunning and Kruger found.

      • The graph stems from research that measured the response time of people answering questions on an online exam. The confidence factor was measured as literal response time. Since then, the Dunning/Kruger Effect has been reverified so many times that I am not sure this version was ever published academically, as this type of trial happened at dozens of universities in the early 2000s as just part of common coursework. Someone took that result and stylized the graph.

    • I don’t agree that the popular misrepresentation of D-K is benign. It matters that the actual research shows what’s referred to on the bad graph as “confidence” increasing as “knowledge” increases, whereas the fake graphs all show it *decreasing* up to some middle point, after which it changes direction. I don’t see how getting directionality wrong can be considered a minor error.

      There is a world of difference between “people with lower ability overestimate their ability by more than those with moderate to high ability” and “people with lower ability rate their ability more highly than people with moderate ability rate theirs”.

      This distinction is also important if one is to understand the actual scientific debate surrounding D-K, which is the extent to which it describes a causal phenomenon vs the extent to which it describes a statistical artifact (regression to the mean). The true graphs are consistent with at least some amount of regression to the mean. The fake graphs are not; if they were accurate, they’d suggest vastly stronger evidence in support of D-K than really is there.

      • Thank you for your thoughtful response, Ben. I don’t pretend to have the expertise to add to anything you said. I was glad to read your clear explanation. I wonder how many beliefs firmly lodged in my head got there because large numbers of people preferred an over-simplification they think they understand over a complex truth that requires expertise to understand.

  3. Pingback: Web Literacy for Student Fact-Checkers: Four Moves | Jerz's Literacy Weblog (est. 1999)

  4. Actually it does, Dennis. If you read the “Later studies” and “Mathematical critique” sections, you’ll find that more in-depth analysis has refuted the original study:
    “The authors discovered that the different graphics refuted the assertions made for the effect. Instead, they showed that most people are reasonably accurate in their self-assessments.”

    • Thanks, Noel, for your informative response. Yes, I do see that the Wikipedia article mentions critiques of D&K’s work, but mentioning scholars who disagree is not the same thing as having their work “debunked years ago,” as Maximilian Wicen phrased it so trollingly.

      I hope nobody is looking to my blog post for an authoritative assessment of the current status of Dunning and Kruger’s work, but I do hope my blog post has served its purpose of documenting how prevalent the faulty meme is — and in part atoning for my own role in unwittingly helping to spread that false meme.

      • “I do hope my blog post has served its purpose of documenting how prevalent the faulty meme is”

        It has — at least for this one, single data-point here.
        (And the plural of “anecdote” IS actually “data,” is it not — or have I been misinformed on that one, too?)

    • In the same year Dunning and Kruger won their Ig Nobel Prize, the winner of the Ig Nobel Physics Prize was Geim, who then went on to win the actual Nobel Prize in 2010.

  5. The article labels the graphs accurately, and it’s not really fair to call them “fake” just because so many people (myself included) have treated the Y axis as if it is doing anything other than marking the bottom quarter, the next higher quarter, the next higher, and the top quarter.

  6. You’ve still missed the fact that the ORIGINAL graphs are fake. The Y axes aren’t valid indicators of competence. They’re percentile rankings, which is why they yield the diagonal lines labeled “actual test score” – which don’t reflect actual test score at all. That is, no matter how well the subjects do on the test in absolute terms, the percentiles only reference positional differences within the domain. Someone sits at the bottom, whether they had a raw score of 20/100 or 90/100. Given that the subjects for the tests were drawn from undergrad classes at an Ivy League university, we can guess that there actually was very little difference between the high and low raw scores, at least in comparison to the general population. There’s no reason to believe any of the test subjects were truly “unskilled” at all. Nor is it at all surprising that they would be unaware of where they rank vis-à-vis other students who took the tests. They probably always did well in high school, bested the vast majority of their peers there, and were regularly praised for academic excellence. So, is it really their assessments of their performance they’re reporting, or the assessments of the adult experts from wherever they hail?

  7. Why even bother? The Dunning Kruger effect was debunked years ago.
    Do you not even read the Wikipedia page before you post? Talk about doing ZERO research. ZERO ZILCH NADA ZIP.

    • Um… this blog entry is all about how the popular conception of the DK effect is wildly inaccurate. That’s why I bothered to write this post — to acknowledge that I, as a non-expert, had been influenced by and even helped to spread the false narrative.

      P.S. The Wikipedia page says nothing about the D-K Effect being debunked years ago.

      • It wasn’t “debunked”; there were many studies done and they all showed similar results. The tests were also done in an Asian country (let’s call it Japan. lol), and there the results were the opposite of the Dunning-Kruger effect. Many people in various fields have written about it. It isn’t that it is debunked, it just isn’t all that useful. However, pop culture seems to have found a use for it, in that there are people who brag excessively about their intelligence and abilities yet are clearly not as intelligent as they think, and bragging that you know more about nearly every topic than the experts… that wins a Dunning-Kruger effect award. Only arrogant, insecure people such as malignant narcissists and the like would qualify for the pop culture remodel of the DKE.
