A Modern Guide to Thinking, Fast and Slow

Part II - Heuristics and Biases

  10. The Law of Small Numbers
  11. Anchors
  12. The Science of Availability
  13. Availability, Emotion, and Risk
  14. Tom W’s Specialty
  15. Linda: Less Is More
  16. Causes Trump Statistics
  17. Regression to the Mean
  18. Taming Intuitive Predictions

Chapter 10: The Law of Small Numbers

Overview
Small samples are less reliable than large ones and often produce more extreme results purely by chance. For instance, both the lowest and highest cancer rates are found in small rural counties. A common mistake is to assume that small samples are just as trustworthy as large ones. Kahneman calls this the belief in the law of small numbers. Closely related is the clustering illusion: our tendency to see streaks or clusters in random data as meaningful rather than inevitable outcomes of small samples.
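The small-county pattern is easy to see in simulation. Here is a minimal sketch (the 1% "cancer rate" and the county sizes are illustrative assumptions, not figures from the book): every county shares the same true rate, yet the small counties produce both the highest and lowest observed rates, purely by chance.

```python
import random

random.seed(42)

TRUE_RATE = 0.01  # the same underlying rate everywhere, so any differences are pure chance

def observed_rate(population):
    """Simulate one county: count 'cases' drawn at the shared true rate."""
    cases = sum(1 for _ in range(population) if random.random() < TRUE_RATE)
    return cases / population

small = [observed_rate(100) for _ in range(1000)]     # 1,000 small counties
large = [observed_rate(10_000) for _ in range(1000)]  # 1,000 large counties

print(f"small counties: min={min(small):.4f}, max={max(small):.4f}")
print(f"large counties: min={min(large):.4f}, max={max(large):.4f}")
```

The small counties span a far wider range of observed rates than the large ones, even though nothing real distinguishes them: the extremes are an artifact of sample size, exactly the point of the chapter.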

Replication & Reliability

  • Belief in the law of small numbers (Tversky & Kahneman, 1971): A 2023 replication of the 1971 study (still in preprint) replicated most findings except the claim that people tend to underestimate the sample size required for a replication.
  • The “hot hand” illusion (Gilovich, Vallone, & Tversky, 1985): The conclusion that “hot hands” are an illusion has been challenged numerous times. Many recent analyses, such as a 2021 study based on 34 years of NBA data, show evidence of real streaks, though the effects tend to be smaller than players and fans believe. A fascinating 2016 paper demonstrates that the original study used a biased estimation procedure.

Recommendation

This chapter is important. The belief in the law of small numbers explains why flashy findings in small studies are often misleading. While the “hot hand” effect is controversial, the broad point that we underestimate how much variability naturally arises when working with small samples holds up.

Chapter 11: Anchors

Overview
The anchoring effect is a cognitive bias in which people place too much weight on the first piece of information they encounter (the “anchor”) when making judgments or decisions. This can happen even when the anchor is random and obviously irrelevant to the judgment at hand.

Replication & Reliability

  • Classic anchoring: Numerous studies since the 1970s confirm robust anchoring effects across contexts. A 2023 literature review of anchoring studies found that “the anchoring effect is a pervasive judgment bias when people are in an uncertain decision-making environment.”
  • Adjustment mechanism (Epley & Gilovich): A 2022 preregistered replication of three Epley & Gilovich studies found that anchoring effects were robust, but found no evidence that need for cognition, cognitive load, and forewarning moderate anchoring: “In line with recent replication efforts, we found that anchoring effects were robust, but the findings on moderators of anchoring effects should be treated with caution.” Note: It is still in preprint.

I haven’t found replications of the real-estate agents' valuation study or the German judges & loaded dice sentencing study, nor direct replications of the Strack & Mussweiler experiments. If you know of any, please email me.

Recommendation

The anchoring effect itself is supported by robust evidence, but some of the striking real-world illustrations from this chapter (real-estate agents, judges) have not been extensively studied and should be treated as provisional.

Chapter 12: The Science of Availability

Overview
When judging how common or likely something is, people often answer a substitute question: “How easily do examples come to mind?” This is called the availability heuristic. Vivid news, personal anecdotes, and memorable events impact our judgment more than base rates. How easily examples come to mind is more influential than how many examples one can think of.

Replication & Reliability

  • Marriage contributions: This example comes from a 1979 study by Ross & Sicoly. They found that individuals remember their own contributions to a shared product more easily (not only in marriages, but also on basketball teams and in laboratory groups). While I have not found an exact replication of this study, numerous later studies support the general principle that people tend to accept more responsibility for a group's outcome than other members attribute to them. Interestingly, a 2005 study found that people’s tendency to do this can be attenuated “when they ‘unpack’ their collaborators, conceptualizing them as separate individuals, rather than as ‘the rest of the group.’” A 2016 study found that the effect increases with group size.
  • Schwarz and colleagues’ 1991 study about recalling assertiveness: This study found that the ease of recalling examples mattered more than the number recalled, but it had a small sample size and statistically weak results. The problems with this study are explained in more depth here. However, there have been many studies that support this basic effect. A 2017 meta-analysis confirmed the effect is real, but the average size is smaller and more variable than early studies suggested. It also found evidence of publication bias.

Recommendation
The studies Kahneman cites point to real psychological effects: people do overestimate their own role in groups, and the ease of recalling examples shapes judgments. But these effects are smaller than the book suggests.

Chapter 13: Availability, Emotion, and Risk

Overview
Kahneman extends availability to risk assessment: vivid and emotionally triggering disasters feel more likely. Public fear and media attention can feed each other into availability cascades, pushing policy toward the most memorable risks rather than the biggest ones.

Replication & Reliability

  • Post-disaster insurance behavior: Robust. Kunreuther’s early research showing that people’s willingness to buy disaster insurance spikes right after a disaster has been replicated. For example, Browne & Hoyt, 2000 and Gallagher, 2014 confirmed the “disaster memory” pattern in U.S. insurance data. Economists call it the recency effect in risk perception.

  • Causes-of-death estimates: The original 1978 study by Lichtenstein and colleagues showed that people vastly overestimated the frequency of vivid, dramatic, or media-covered causes of death (accidents, tornadoes) and underestimated mundane ones (strokes, asthma). A 2024 analysis of replications submitted findings to Bayesian statistical analysis and found that “All datasets indicated very strong evidence for an overrepresentation of dramatic risks and an underrepresentation of nondramatic risks in media coverage. However, a reliable overestimation (underestimation) of dramatic (nondramatic) risks in people's frequency judgments emerged only in Lichtenstein et al.'s dataset; it did not replicate in the other datasets. In fact, aggregated across all datasets, there was evidence for the absence of a differential distortion of dramatic and nondramatic causes of death in people's risk frequency judgments. Additional analyses suggest that when judging risk frequency, people rely on samples from their personal social networks rather than from the media. The results reveal a limited empirical basis for the common notion that distortions in people's risk judgments echo distortions in media coverage.”

  • Affect heuristic: The 2000 study by Slovic and colleagues that found that when people are made to read about a technology's benefits, they judge it as less risky (and vice versa) has received multiple follow-ups. For example, Keller et al. found similar results in 2006, and Siegrist et al. found in 2007 that affective evaluations predict perceived risk across domains from chemicals to nuclear energy to climate change.

Recommendation
Post-disaster insurance spikes are well replicated, and the affect heuristic reliably shifts risk judgments. The causes-of-death pattern is less clear: media coverage spotlights dramatic risks, but newer research does not consistently find the same bias in people’s estimates and suggests that judgments may reflect personal experience (social networks) more than media headlines.

Chapter 14: Tom W’s Specialty

Overview
Representativeness (stereotype fit) leads people to neglect base rates (underlying statistical information).
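The normative benchmark here is Bayes' rule: how well a description fits a stereotype (the likelihood) must be weighed against how common the category is (the base rate, or prior). A minimal sketch with made-up illustrative numbers (the 3% base rate and the hit rates below are assumptions, not figures from the book):

```python
def posterior(prior, p_desc_given_h, p_desc_given_not_h):
    """Bayes' rule: P(hypothesis | description)."""
    joint_h = prior * p_desc_given_h
    joint_not_h = (1 - prior) * p_desc_given_not_h
    return joint_h / (joint_h + joint_not_h)

# Hypothetical numbers: 3% of students are in this field (base rate);
# the description fits 80% of them, but also fits 10% of everyone else.
p = posterior(prior=0.03, p_desc_given_h=0.80, p_desc_given_not_h=0.10)
print(f"P(field | description) = {p:.2f}")  # → 0.20
```

Despite the strong stereotype fit, the low base rate keeps the posterior around 20%. Judging the probability from fit alone, as representativeness invites, ignores the prior entirely.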

Replication & Reliability

  • Tom W. / Representativeness vs. base rates: Stengard et al. note in their 2022 study, “While often replicated, almost all evidence for the phenomenon [of base rate neglect] comes from studies that used problems with extremely low base rates, high hit rates, and low false alarm rates.” They found that “the effect known as ‘base-rate neglect’ generalizes to a large set of reasoning problems, but varies largely across participants and may need a reinterpretation in terms of the underlying cognitive mechanisms.” In 2013, Pennycook et al. found that “base-rates, while typically underweighted or neglected, do not require Type 2 processing and may, in fact, be accessible to Type 1 processing.”
  • Schwarz’s “think like a statistician” vs. “think like a clinician” study: While I haven’t found a direct replication, a 2016 study found that prompting deliberation similarly boosts base-rate use, which supports the original framing effect.
  • “Frowning” increases vigilance on Tom W. (Harvard undergraduates): I could not find the original study, much less any replications. If you know the source, please email me.

Recommendation
Base rate neglect is generally supported, but newer work shows that it varies with the problem setup and the person, and that base rates can sometimes be processed intuitively once they are made salient. The “Tom W.” activity is an interesting illustration of representativeness, though the exercise is dated: computer science is a far more popular field now than when it was written. The Harvard “frowning” result should be treated as unverified.

Chapter 15: Linda: Less Is More

Overview
This chapter introduces the conjunction fallacy. A detailed story (“Linda is a feminist bank teller”) may feel more likely than the simpler one (“Linda is a bank teller”), even though adding detail can only lower probability.
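The underlying rule can be checked mechanically: a conjunction can never be more probable than either of its conjuncts. A small simulation over hypothetical people (the 5% and 30% probabilities, and their independence, are illustrative assumptions):

```python
import random

random.seed(0)

# Simulate 100,000 hypothetical people with independently assigned,
# made-up attribute probabilities.
n = 100_000
count_teller = 0
count_feminist_teller = 0
for _ in range(n):
    is_teller = random.random() < 0.05
    is_feminist = random.random() < 0.30
    count_teller += is_teller
    count_feminist_teller += is_teller and is_feminist

print(f"P(bank teller)          ≈ {count_teller / n:.3f}")
print(f"P(feminist bank teller) ≈ {count_feminist_teller / n:.3f}")
```

Every feminist bank teller is also a bank teller, so the second count can never exceed the first, no matter what probabilities are plugged in. The added detail makes the story more representative but strictly less probable.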

Replication & Reliability

  • The “Linda” example: As Kahneman mentions in the text, this example has been heavily criticized. Hertwig argues that many participants read “probable” as “plausible” or “supported by the evidence,” and that conversational cues (even the word “and”) matter. Peter Lewinski wrote a commentary offering more nuance.
  • “Less-is-better effect”: In a 2023 study, Vonasch and colleagues performed a “mostly successful close replication” of Christopher Hsee’s 1998 dinnerware study.

Recommendation
Read about the Linda problem with the criticisms in mind. The dinnerware “less-is-better” example held up well to a recent replication attempt.

Chapter 16: Causes Trump Statistics

Overview
People often underweight statistical base rates (broad facts about a group) but not causal base rates. Students of psychology struggle to apply surprising statistical information when drawing conclusions about specific people, but are more easily able to use information about specific people to draw generalizations about human behavior.

Replication & Reliability

  • Statistical vs. causal base rates: I have not found any direct replications of Ajzen's Yale exam study, but other experiments have given conceptual support to the idea that when people are given causal base rates they are able to use statistics more appropriately, such as Krynski & Tenenbaum, 2007 and Hayes & Barber, 2018.
  • The "helping study" conducted in 1968 showed what's commonly called the bystander effect. The bystander effect has been studied for decades, and it's a highly replicated psychological effect. As Hortensius and de Gelder note in their 2018 paper, it has been observed in various contexts: "This pattern is observed during serious accidents (Harris & Robinson, 1973), noncritical situations (Latané & Dabbs, 1975), on the Internet (Markey, 2000), and even in children (Plötner, Over, Carpenter, & Tomasello, 2015)." However, in 2020, a large-scale observational study using CCTV footage from various countries found that in 9 of 10 public conflicts, at least 1 bystander, but typically several, will do something to help.
  • Nisbett & Borgida's 1975 study showing that students underused a known low helping base rate when judging individual participants: This was replicated by Wells and Harvey in a 1977 study. If anyone is aware of a more modern replication or follow-up, let me know.

Recommendation
This chapter is important. The core claim that people use causal base rates more readily than statistical ones has been supported by further research.

Chapter 17: Regression to the Mean

Overview
Extreme results are usually followed by more ordinary ones. This is called regression to the mean. This drift toward average is not evidence of a causal relationship, but we may be inclined to invent one.
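The phenomenon falls out of any model where outcomes mix a stable component with luck. A minimal sketch (the skill and luck distributions below are arbitrary assumptions): select the top performers on one test and watch their average fall on a second test, with no causal story at all.

```python
import random

random.seed(1)

# Each score = stable skill + independent luck. Nothing changes between tests.
skills = [random.gauss(100, 10) for _ in range(10_000)]
test1 = [s + random.gauss(0, 10) for s in skills]
test2 = [s + random.gauss(0, 10) for s in skills]

# Take the top 1% on test 1 and see how the same people do on test 2.
top = sorted(range(10_000), key=lambda i: test1[i], reverse=True)[:100]
avg1 = sum(test1[i] for i in top) / 100
avg2 = sum(test2[i] for i in top) / 100
print(f"top performers, test 1: {avg1:.1f}")
print(f"same people, test 2:    {avg2:.1f}")
```

The top group's second-test average lands well below its first-test average but still above the overall mean of 100: the selection captured real skill plus good luck, and only the luck fails to repeat. No intervention, slump, or "jinx" is needed to explain the drop.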

Replication & Reliability
There are no studies to examine here. Regression to the mean is not an empirical claim that needs to "hold up"; it is a statistical phenomenon. The anecdotes in this chapter are simply examples of how people misread regression and attach causal stories to a predictable return toward average.

Recommendation
Treat this chapter as a statistics-first correction to causal storytelling.

Chapter 18: Taming Intuitive Predictions

Overview
Our intuitive predictions often rely on substitution and intensity matching, making us feel confident even when the predictions are based on weak evidence. Moreover, intuitive predictions tend to neglect regression to the mean.
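Kahneman's corrective procedure amounts to shrinking the intuitive, intensity-matched prediction back toward the baseline in proportion to how predictive the evidence actually is. A rough sketch (the GPA numbers and the 0.3 correlation are hypothetical illustrations, not figures from the book):

```python
def regress_prediction(baseline, intuitive, correlation):
    """Move from the intuitive (intensity-matched) prediction back toward
    the baseline, in proportion to the evidence's predictive validity
    (the correlation between the evidence and the outcome)."""
    return baseline + correlation * (intuitive - baseline)

# Hypothetical: class-average GPA is 3.0, a glowing description "matches"
# a 3.8 student, but such descriptions correlate only ~0.3 with actual GPA.
print(f"{regress_prediction(baseline=3.0, intuitive=3.8, correlation=0.3):.2f}")  # → 3.24
```

At correlation 1.0 the evidence fully determines the outcome and the intuitive prediction stands; at correlation 0.0 the evidence is worthless and the best forecast is the baseline itself. Intuition behaves as if the correlation were always 1.0.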

Replication & Reliability
The only study mentioned in this chapter was the one in which Kahneman and Tversky asked participants to evaluate descriptions of college freshmen. This was from their 1973 paper, "On the Psychology of Prediction". A stage 1 preregistered replication was launched in 2024. No data has been collected as of the publication of this post.

Recommendation
The key lesson, that evaluation should be separated from prediction and forecasts regressed toward the mean, holds up and is broadly supported by modern statistical thinking. Treat the freshmen-description study as a clear illustration but weigh its evidential strength cautiously; once the preregistered replication reports its results, we will have a better basis for citing that example.
