Reflections on judgmental ability
Robin M. Hogarth
Universitat Pompeu Fabra, Barcelona
Speech prepared for EADM’s lifetime achievement award (August 20, 2023).
First, let me say how honored I am to receive this award for lifetime contributions. However, although I am the person who receives the award, I would not be in this position if I hadn't, over the years, had such a remarkable group of colleagues and students with whom to collaborate. So, the award is due to the many colleagues and students who illuminated my professional career, and I thank them for sharing this journey with me.
Over the years, the SPUDM conferences have provided a means to meet colleagues and to share findings and ideas, and SPUDM has always been my favorite conference, especially as it takes place in an international context. The first SPUDM conference that I attended was 50 years ago, in Rome in 1973, and I think it is appropriate to reflect on that event today.
Rome 1973 was hosted by the legendary Bruno de Finetti, who had set a goal for SPUDM in Hamburg in 1969 by stating that
“The true….subjective probability….problem consists in the investigations concerning the ways in which probabilities are assessed by more or less educated people and the way in which such abilities may be improved. This seems to me the field in which the cooperation between all specialists concerned is most wanted, and that is particularly true for the expected contributions from psychologists.”
I will come back to this quote at the end of this talk.
Accommodation in Rome was arranged in student dormitories, which were modest, to say the least, and the weather was hot and damp. There were about 25 participants and the sessions all took place in the same lecture hall of the University of Rome. There was just one stream of talks.
For me, it was an exciting experience: meeting people whose work I had read and admired, getting to know other researchers, and starting friendships with several people with whom I have continued to interact over the years. I came away from the conference buzzing, with an excited feeling that decision research was the future…
And so it was!
For the first paper presented at the conference was the famous 1974 Science paper by Amos Tversky and Daniel Kahneman on “heuristics and biases.” After the papers, there were questions and discussion, and I remember well the gist of the first question posed to Tversky and Kahneman. It was: “If you think that people make mistakes like your experimental subjects do, how do you exclude the possibility that you yourselves are making mistakes in asserting this?” Tversky briefly acknowledged the pertinence of the question, and Kahneman replied that they had followed all the scientific recommendations for testing hypotheses for different results.
Now, you may ask why I recount this incident. Well, first it was I who asked the question, and second it raises the issue that I want to address in this brief talk, namely “How good is human judgment?” What can we say 50 years after 1973?
Let me start by asking what we know.
• Humans have limited memory and information processing capacity both of which emphasize the importance of attention (Simon).
• Importance of task characteristics and especially feedback – differences between kind and wicked learning environments.
• Two modes of thought: tacit (System 1) and deliberate (System 2).
• On the other hand, humans have remarkable abilities for recognition (Goldstein & Gigerenzer) and a capacity to recall the frequency of past experiences (Hasher & Zacks).
As an example of the latter, imagine that somebody asks you how often you have been to the cinema in the last six months. The important result is that most people have little difficulty answering the question, even though they made no prior effort to keep track of how often they went to the cinema.
• Much of human learning is tacit (Reber).
Major concern: “How good is judgment overall?” may not be the right question. Perhaps it is better to ask how good judgmental ability needs to be, and therefore when we can expect to see effective or ineffective judgments.
In terms of task characteristics, one of the limitations of most research is that it is conducted within what I call a discrete model of judgments/decisions. That is, participants make a judgment, and this is subsequently seen to be correct or not. It is like firing at a target and then seeing how accurate the result is. (The target analogy is, incidentally, also used by Daniel Kahneman and his co-workers in the much-publicized book Noise.)
A critical feature of this model is that at an individual level participants receive no feedback. They make their judgments and experimenters then determine how good they are. And yet this makes a difference. As Lejarraga and Hertwig (2022) document in a recent comprehensive review, there are important differences between inferences made in situations where people do or do not receive feedback.
In Hogarth and Soyer (2011), for example, we demonstrated how estimates were dependent on whether people answered probabilistic questions with or without feedback and showed that experience was superior to description.
Percentages of Correct Answers to Inferential Problems by Experimental Conditions
In this paper, we investigated how participants’ responses would vary depending on whether they answered on the basis of a description or of an experienced sample of observations. There were 2 groups of participants, “sophisticated” and “naïve”, the former being paid students and the latter volunteers recruited from among acquaintances. Each respondent saw 2 versions of the same 7 problems, which allowed us to see how responses varied when they relied on analysis or on experience. (I skip the explanation of the experimental design.)
The table shows the responses to 2 of the 7 problems. As can be seen, responses based on experience are almost all correct for the Bayesian updating problem (98%) and mostly correct for the birthday problem (60%).
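The contrast between description and experience can be made concrete with a small simulation. This is an illustration only, not the procedure used by Hogarth and Soyer (2011): it assumes the classic form of the birthday problem (the chance that at least two of 23 people share a birthday, with 365 equally likely birthdays), and the function names `birthday_analytic` and `birthday_experienced` are hypothetical. The first function computes the answer from the description; the second lets the respondent “experience” a sample of simulated groups and simply count how often a shared birthday occurs.

```python
import random


def birthday_analytic(n_people: int, days: int = 365) -> float:
    """'Description' answer: exact probability that at least two of
    n_people share a birthday (uniform birthdays, no leap years)."""
    p_all_distinct = 1.0
    for i in range(n_people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def birthday_experienced(n_people: int, n_samples: int, seed: int = 0,
                         days: int = 365) -> float:
    """'Experience' answer: proportion of simulated groups in which at
    least two people share a birthday."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        birthdays = [rng.randrange(days) for _ in range(n_people)]
        if len(set(birthdays)) < n_people:  # a repeat means a shared birthday
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    print(f"described:   {birthday_analytic(23):.3f}")
    print(f"experienced: {birthday_experienced(23, 1000):.3f}")
```

The analytic answer for 23 people is about 0.507; the experienced proportion fluctuates around it, more tightly as the number of sampled groups grows.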
Moving on from these results on differences between description and experience, I would like to make the claim that many judgments can be better described as being of a continuous as opposed to a discrete nature. Let me provide some examples. Imagine that you want to leave the lecture hall and need to select a route from your seat to the door. You could treat this as a discrete task that would involve the selection and use of a particular path that you would follow. However, it is more likely that on leaving your seat you would make a rough estimate of the direction of the door and update this judgment as you advance to the exit.
Indeed, the judgmental processes involved in walking, driving, and even conversation follow a model of a series of rough judgments made across time as a function of the feedback received on actions. It reminds one of the old joke about a Japanese airplane pilot who ditched his plane into the sea a few hundred meters short of the runway at San Francisco airport. When asked why he had missed the runway, his reply was clear: “Considering that I flew from Tokyo, missing the runway at SFO by a few hundred meters was not bad.”
In short, in continuous judgmental tasks, people make a series of judgmental adjustments that allow them to reach their goal, the key point being that they are only really committed by the last adjustment that they make.
We can clarify these ideas by using the target analogy where judgment can be likened to aiming at a target, and choice represents the selection of a particular trajectory (i.e., shooting). For example, consider Figure 1 and imagine that you are at point A. The precise target is the point D, and the line BC shows the amount of allowable error.
As concrete examples, imagine predicting future values of an economic variable (e.g., sales, gross national product) or level of success in a job or graduate school. In both cases, choice is represented by the selection of a particular level of the variable.
Now consider the probability of hitting the target in Figure 1 at random (i.e., in the absence of predictive ability). Provided that you are “pointed” in the appropriate direction (indicated in Figure 1 by drawing EF parallel to BC), the probability of hitting the target BC at random in a discrete incident is equal to the ratio of the angle a (i.e., BAC) to 180°. Moreover, a is a function of the ratio of BC (size of target) to AD (distance from target). Therefore, the probability of hitting the target without predictive ability is an increasing function of BC/AD (i.e., size of target/distance from target).
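The discrete-case calculation can be written out directly. This is a minimal sketch assuming that D is the midpoint of BC and that AD is perpendicular to BC, so that the angle a = BAC equals 2·arctan(BC / (2·AD)); the function name `random_hit_probability` is hypothetical.

```python
import math


def random_hit_probability(target_size: float, distance: float) -> float:
    """Chance of hitting a target of width BC = target_size, centred at
    distance AD = distance, when the shot is random within the 180-degree
    fan facing the target (the 'pointed in the appropriate direction' case).
    Assumes D is the midpoint of BC and AD is perpendicular to BC."""
    # Angle a = BAC subtended by the target, in radians.
    angle_a = 2 * math.atan((target_size / 2) / distance)
    # Ratio of angle a to 180 degrees (pi radians).
    return angle_a / math.pi


if __name__ == "__main__":
    print(round(random_hit_probability(0.10, 1.0), 4))
    print(round(random_hit_probability(1.0, 10.0), 4))  # same ratio, same answer
```

With BC/AD = 0.10 the function returns about 0.0318, matching the .032 figure given for this ratio, and scaling BC and AD together leaves the result unchanged, since only their ratio matters.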
For example, when the ratio of target to distance is .10, the probability of hitting the target at random in a discrete incident is .032. However, starting from the same point, a continuous process represents a much easier task. By simply moving toward the target and checking periodically on direction, the judge can transform the initial probability of .032 to almost a certainty, without exercising much predictive ability.
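The continuous process can be sketched as a Monte Carlo simulation. The model is an assumption made for illustration, not one given in the talk: at each of `n_checks` checkpoints the judge re-estimates the direction to D only roughly (a uniform heading error of up to `heading_noise_deg` degrees) and covers a fixed fraction of the remaining distance; after the last checkpoint the judge commits with a shot that is random within the 180° fan facing the target, exactly as in the discrete case. All names and parameter values are hypothetical.

```python
import math
import random


def subtended_angle(x: float, y: float, half_width: float) -> float:
    """Angle (radians) subtended at point (x, y) by the target segment BC,
    which runs from (-half_width, 0) to (half_width, 0)."""
    y = abs(y)  # the geometry is symmetric about the target line
    return math.atan2(2 * half_width * y, x * x + y * y - half_width * half_width)


def continuous_hit_probability(target_size: float, distance: float,
                               n_checks: int, heading_noise_deg: float,
                               step_fraction: float = 0.5,
                               n_trials: int = 2000, seed: int = 0) -> float:
    """Average chance of hitting BC after n_checks rough course corrections,
    followed by one random shot (hypothetical model of the continuous case)."""
    rng = random.Random(seed)
    half_width = target_size / 2
    noise = math.radians(heading_noise_deg)
    total = 0.0
    for _ in range(n_trials):
        x, y = 0.0, distance  # start at A; the target centre D is the origin
        for _ in range(n_checks):
            bearing = math.atan2(-y, -x)  # true direction from here to D
            heading = bearing + rng.uniform(-noise, noise)  # rough estimate only
            step = step_fraction * math.hypot(x, y)
            x += step * math.cos(heading)
            y += step * math.sin(heading)
        # Final commitment: a random shot within the 180-degree fan facing BC.
        total += subtended_angle(x, y, half_width) / math.pi
    return total / n_trials


if __name__ == "__main__":
    for k in (0, 5, 10, 20):
        p = continuous_hit_probability(0.10, 1.0, k, heading_noise_deg=60)
        print(k, round(p, 3))
```

With `n_checks=0` the function reproduces the discrete probability; as checkpoints are added, the final hit probability climbs toward 1 even though each direction estimate is poor. Keeping the final shot purely random isolates the contribution of feedback: none of the gain comes from predictive skill at the moment of commitment.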
As a concrete example, imagine a sensitive interview where one would not want to pose a direct question before acquiring a more complete appreciation of the situation. This can be illustrated by comparing Figures 1 and 2.
Consider the effects of recognizing that the positions of the cues L and M provide information; in particular, that an action to the right of L, and to the left of M, is appropriate. In this case, the probability of hitting the target at random conditional on this knowledge is a divided by the angle LAM. Thus, the joint validity of the cues L and M can be measured by the differential predictive information they provide over the base rate.
Now consider Figure 2, in which the person has moved from position A to position X (i.e., after receiving feedback from initial directing questions). This has two effects: first, the base rate of hitting the target has increased, since angle BXC is greater than angle BAC; second, the predictive validities of L and M have decreased relative to Figure 1 (angle LXM is greater than angle LAM). Of course, it is not clear that, as a person moves toward a target, he or she would continue to rely on the same cues. In an interview, for instance, one formulates new questions in light of prior feedback.
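The effect of moving from A to X on the base rate and on cue validity can also be computed. This sketch assumes, hypothetically, that the cues L and M lie on the same line as B and C, symmetrically around D. The talk does not say exactly how “differential predictive information” is measured, so the code reports two candidate measures: the difference and the ratio between the conditional probability a/LAM and the base rate a/180°. The ratio reduces to 180°/LAM, so it falls whenever the angle subtended by L and M widens.

```python
import math


def subtended(half_width: float, distance: float) -> float:
    """Angle (radians) subtended by a segment of the given half-width,
    seen from a point on its perpendicular bisector at the given distance."""
    return 2 * math.atan(half_width / distance)


def cue_analysis(target_half_width: float, cue_half_spacing: float,
                 distance: float) -> dict:
    """Base rate, cue-conditional hit probability, and two candidate
    validity measures (hypothetical geometry: L, M on the line BC)."""
    a = subtended(target_half_width, distance)   # angle BAC (or BXC)
    lam = subtended(cue_half_spacing, distance)  # angle LAM (or LXM)
    base_rate = a / math.pi   # a / 180 degrees: no knowledge of the cues
    conditional = a / lam     # a / LAM: action known to lie between L and M
    return {
        "base_rate": base_rate,
        "conditional": conditional,
        "difference": conditional - base_rate,  # candidate measure 1
        "ratio": conditional / base_rate,       # candidate measure 2 (= pi / LAM)
    }


if __name__ == "__main__":
    at_a = cue_analysis(0.05, 0.25, 1.0)  # position A
    at_x = cue_analysis(0.05, 0.25, 0.2)  # position X, closer to the target
    for name in ("base_rate", "conditional", "difference", "ratio"):
        print(f"{name:12s} A: {at_a[name]:.3f}  X: {at_x[name]:.3f}")
```

Under these assumptions, moving from A to X raises the base rate while both validity measures fall, in line with the two effects described for Figure 2.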
How appropriate is the continuous model? It certainly applies to many simple and even complex motor tasks (e.g., walking through a door or skiing), as well as to more cognitive tasks that induce feedback through actions. For example, monitoring the performance of a trainee in a new job provides intermediate feedback that can be used to modify predictions and/or actions. More importantly, the continuous model highlights two critical dimensions of judgmental achievement: (a) the degree of commitment implied by actions and (b) the availability and interpretation of feedback, which are often more important than predictive ability per se.
This all leads me to two questions:
Question 1: What proportion of our decisions is made in continuous mode?
Question 2: How aware are we of feedback?
Unfortunately, I cannot answer the first question: it is hard to estimate the proportion of decisions that are taken in continuous mode. On reflection, however, it must be quite large.
To illuminate the second question, several years ago I conducted a study of decisions taken by business managers and undergraduate students using the experience sampling method (ESM). Participants responded to SMS messages by describing the latest decision they had made, together with information about feedback. Results showed that participants received or expected to receive feedback for only 60% of their decisions. In other words, much of their decision making involved no feedback of which they were consciously aware. This suggests, inter alia, that tacit processes play a large role in the interpretation of feedback.
From a research perspective, what are the effects of adopting a continuous as opposed to a discrete perspective?
The continuous perspective is not a maximizing paradigm, and it raises the question of what rationales people invoke when making decisions in this way.
One view is that people can be thought of as managing a portfolio of their “assets” and the rationale behind their decisions is “to stay in the game.”
Evidence supporting this hypothesis was provided by Ronen (1973) in a simple gambling task. Ronen presented participants with the choice between two mutually exclusive actions characterized by identical (positive) payoffs and identical overall probabilities of success. The payoffs, however, depended on the outcomes of two sequential events, and the “paths” to success differed in that one had a higher first-stage probability of success but a lower second-stage probability. These probabilities were known to the subjects.
Since the expected utilities of the actions were equal, people should have been indifferent between them. However, they were not: subjects largely favored the action with the greater first-stage probability.
Why? A plausible hypothesis is that people act with the immediate goal of reaching a better position from which to attain the ultimate outcome. Moreover, if people are accustomed to environments that change over time, they should place less confidence in the second-stage probability. Thus, staying in the game as long as possible favors the action with the larger first-stage probability of success.
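The “stay in the game” explanation can be illustrated numerically. The probabilities below are hypothetical, not Ronen’s (1973) values, and the way reduced confidence is modelled (shrinking the second-stage probability toward an uninformed 0.5 by a `trust` weight) is an assumption made for this sketch.

```python
def overall_success(p1: float, p2: float) -> float:
    """Objective probability that both sequential stages succeed."""
    return p1 * p2


def subjective_success(p1: float, p2: float, trust: float,
                       prior: float = 0.5) -> float:
    """Success probability when the second-stage probability is only partly
    trusted: it is shrunk toward an uninformed prior (assumed model)."""
    p2_believed = trust * p2 + (1 - trust) * prior
    return p1 * p2_believed


# Hypothetical paths with the same overall chance of success (0.45).
FRONT_LOADED = (0.9, 0.5)  # higher first-stage probability
BACK_LOADED = (0.5, 0.9)   # higher second-stage probability

if __name__ == "__main__":
    for trust in (1.0, 0.75, 0.5):
        front = subjective_success(*FRONT_LOADED, trust)
        back = subjective_success(*BACK_LOADED, trust)
        print(f"trust={trust:.2f}  front-loaded={front:.3f}  back-loaded={back:.3f}")
```

With `trust=1` both paths are worth 0.45 and the chooser should be indifferent; any `trust` below 1 lowers the back-loaded path, whose advantage lies entirely in the less trusted second stage, while the front-loaded path, whose second stage is already 0.5, is unaffected.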
It also follows, of course, that if losses are at stake, people will seek to withdraw from the situation as soon as possible. And this was the case in the experiments.
Reflecting on discrete-continuous issues, I was led to imagine that the essence of decision making was how people manage portfolios of assets across time. This task can be captured by imagining attempts to cross a “foggy mine field.” There are multiple paths that can be taken, mines that can and cannot be detected, and rewards that can be achieved with different chances of success.
Indeed, this is like life itself. We all manage portfolios that include financial assets, professional performance, health, and happiness, and must deal with outcomes in all areas continuously. With some colleagues, we went as far as developing an experimental version of this paradigm, which we named “the life game.” We had hoped to investigate how people would react to different situations in computer simulations of the life game. However, our experimental participants rated the task as “too boring,” and this, combined with some other issues, led us to abandon the project.
And so where do we now stand with respect to the assessment of human decision-making ability?
First, there are clearly situations where people are biased according to the discrete paradigm. But there is some variation between tasks.
Second, the importance of task characteristics is reinforced by considering the effects of the continuous paradigm. This shows that people’s judgments can be trusted in many situations. How far they can be trusted, however, is not clear and needs more research.
All the above issues suggest some implications for future research.
First, are there ways to improve forecasts?
• Transform discrete tasks into a continuous format.
• Evidence from “superforecasters” (delaying responses and updating frequently).
• City planning.
• Managing interviews.
• Other tasks.
Second, can we use the continuous model to improve understanding of diagnostic tasks?
Third, the work probably relates to a recently proposed framework for decisions from experience (Olschewski et al., 2023).
Finally, what can we say about the validity of human judgment, and how can we respond to de Finetti’s challenge that I quoted at the beginning of this talk?
• Cognitive biases in probabilistic tasks are real but can be overcome by specific training, e.g., systematically using base rates, or averaging judgments to reduce noise.
• We cannot make an overall assessment because it is not clear what the population of judgments is. There are many judgments in which people exploit task characteristics to make judgment feasible, and we don’t know enough about the universe of these judgments.
• The human judgmental system is extraordinarily adaptable.
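The point that averaging judgments reduces noise can be checked with a short simulation. It assumes, for illustration, that each judge’s error is independent and normally distributed around the true value; the function name and parameter values are hypothetical. Under that assumption, the spread of the averaged judgment shrinks roughly as σ/√n for n judges.

```python
import random
import statistics


def averaged_error_sd(n_judges: int, noise_sd: float = 10.0,
                      true_value: float = 100.0, n_rounds: int = 5000,
                      seed: int = 0) -> float:
    """Simulated standard deviation of the error of an averaged judgment.
    Each round, n_judges independent judgments are drawn from
    Normal(true_value, noise_sd) and averaged (assumed noise model)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_rounds):
        judgments = [rng.gauss(true_value, noise_sd) for _ in range(n_judges)]
        errors.append(statistics.fmean(judgments) - true_value)
    return statistics.pstdev(errors)


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, round(averaged_error_sd(n), 2))
```

The independence assumption matters: averaging cancels errors that differ across judges, but an error shared by all of them (a common bias) passes straight through to the average.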