Monday, August 24, 2020

Rational life plans and the stopping problem

Image: a poor solution to the stopping problem

In earlier posts I discussed the question of "rational plans of life" (link, link, link, link) and argued that standard theories of rational decision making under uncertainty don't do well in this context. I argued instead that rationality in navigating and building a life is not analogous to remodeling your kitchen; instead, it involves provisional clarification of the goals and values that one embraces, and then a kind of step-by-step, self-critical direction-setting in the choices that one makes over time in ways that honor these values.

Brian Christian and Tom Griffiths' Algorithms to Live By: The Computer Science of Human Decisions provides a very interesting additional perspective on this problem of living a life. The authors describe the algorithms that computer science has discovered to handle difficult choice problems, and they make an effort to explain, in general terms, both how each problem is solved formally and how the solution applies to ordinary situations of human decision-making over an extended time -- such as the challenging question of where to stop for a meal on a long road trip, or which candidate to hire as an executive assistant.

The key features of decision-making that drive much of their discussion are time and uncertainty. We often have to make decisions and choices among options where we do not know the qualities of the items on offer (restaurants to consider for a special meal, individuals who are prospective friends, who to hire for an important position), the likelihood of success of a given item, and where we often cannot return to a choice we've already rejected. (If we are driving between Youngstown and Buffalo there are only finitely many restaurants where we might stop for a meal; but once we've passed New Bangkok Restaurant at exit 50 on the interstate, we are unlikely to return when we haven't found a better choice by exit 55.)

The stopping problem seems relevant to the problem of formulating a rational plan of life, since the stream of life events and choices in a person's life is one-directional, and it is rare to be able to return to an option that was rejected at a prior moment. In hindsight -- should I have gone to Harvard for graduate school, or would Cornell or Princeton have been a better choice? The question is literally pointless; it cannot be undone. Life, like history, proceeds in only one direction. Many life choices must be made before a full comparison of the quality of the options and the consequences of one choice or another can be fully known. And waiting until all options have been reviewed often means that the earlier options are no longer available -- just like that Thai restaurant on the Ohio Turnpike at exit 50.

The algorithms that surround the stopping problem have a specific role in decision-making in ordinary life circumstances: we will make better decisions under conditions of uncertainty and irreversibility if we understand something about the probability that "a better option is still coming up". We need to have some intuitive grasp of the dialectic of "exploration / exploitation" that the stopping problem embodies. As Christian and Griffiths put it, "exploration is gathering information, and exploitation is using the information you have to get a known good result" (32). How long should we continue to gather information (exploration) and at what point should we turn to active choice ("choose the next superior candidate that comes along")? If a person navigates life by exploring 90% of options before choosing, he or she is likely to do worse than less conservative decision-makers; but the same is true of the person who chooses after seeing only 5% of the options.
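The contrast between exploring too little and too much can be illustrated with a small simulation. This is only a sketch under stated assumptions: candidate quality is drawn as independent uniform random numbers, "success" means ending up with the single best candidate, and the names `look_then_leap` and `success_rate` are hypothetical, not taken from the book.

```python
import random

def look_then_leap(scores, explore_fraction):
    """Observe the first part of the series without choosing, then take the
    first later option that beats everything seen so far.
    Returns the index of the chosen option."""
    n = len(scores)
    cutoff = round(n * explore_fraction)
    best_seen = max(scores[:cutoff], default=float("-inf"))
    for i in range(cutoff, n):
        if scores[i] > best_seen:
            return i
    return n - 1  # nothing better turned up, so we are stuck with the last option

def success_rate(explore_fraction, n=100, trials=20000, seed=0):
    """Fraction of simulated series in which the rule picks the best option.
    Assumption: qualities are independent uniform draws in random order."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(n)]
        chosen = look_then_leap(scores, explore_fraction)
        if scores[chosen] == max(scores):
            wins += 1
    return wins / trials

if __name__ == "__main__":
    for fraction in (0.05, 0.37, 0.90):
        print(f"explore {fraction:.0%}: best option chosen in "
              f"{success_rate(fraction):.1%} of runs")
```

Under these assumptions the 5% and 90% explorers should both find the best option noticeably less often than someone who explores roughly a third of the series, which is the pattern the paragraph above describes.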

There is a very noticeable convergence between the algorithms of stopping and Herbert Simon's theory of satisficing (link). (The authors note this parallel in a footnote.) Simon noted that the heroic assumptions of economic rationality are rarely satisfied in actual human decision-making: full information about the probabilities and utilities associated with a finite range of outcomes, and choice of the option with the greatest expected utility. He notes that this view of rationality requires an unlimited budget for information gathering, and that -- at some point -- the cost of further search outweighs the probable gains of finding the optimal solution. Simon too argues that rational decision-makers "stop" in their choices: they set a threshold value for quality and value, initiate a search, and select the first option that meets the threshold. "Good enough" beats "best possible". If I decide I need a pair of walking shoes, I decide on price and quality -- less than $100, all leather, good tread, comfortable fit -- and I visit a sequence of shoe stores, with the plan of buying the first pair of shoes that meets the threshold. But the advantage of the search algorithm described here is that it does not require a fixed threshold in advance, and it would appear to give a higher probability of making the best possible choice among all available options. As a speculative guess, it seems as though searches guided by a fixed threshold would score lower over time than searches guided by a balanced "explore, then exploit" strategy, without the latter being overwhelmed by information costs.
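One way to probe the speculative comparison above is to run both rules on the same simulated series. The sketch below makes strong assumptions that are not in Simon or in Christian and Griffiths: option quality is uniform on [0, 1], search is costless, and the function names (`satisfice`, `explore_then_exploit`, `compare`) are hypothetical. How the comparison comes out depends heavily on how well the threshold happens to be calibrated, which is exactly what a searcher who does not know the distribution cannot guarantee.

```python
import random

def satisfice(scores, threshold):
    """Simon-style rule: take the first option that meets a fixed threshold."""
    for s in scores:
        if s >= threshold:
            return s
    return scores[-1]  # nothing met the threshold; settle for the last option

def explore_then_exploit(scores, explore_fraction=0.37):
    """Observe an initial share of options, then take the first one that
    beats everything seen during that observation phase."""
    cutoff = round(len(scores) * explore_fraction)
    best_seen = max(scores[:cutoff], default=float("-inf"))
    for s in scores[cutoff:]:
        if s > best_seen:
            return s
    return scores[-1]

def compare(threshold, n=50, trials=10000, seed=1):
    """Average quality obtained and how often the best option was chosen,
    for both rules on identical series (assumed uniform qualities)."""
    rng = random.Random(seed)
    stats = {"satisfice": [0.0, 0], "explore": [0.0, 0]}
    for _ in range(trials):
        scores = [rng.random() for _ in range(n)]
        best = max(scores)
        picks = {"satisfice": satisfice(scores, threshold),
                 "explore": explore_then_exploit(scores)}
        for name, pick in picks.items():
            stats[name][0] += pick
            stats[name][1] += pick == best
    return {name: {"mean_quality": total / trials, "best_rate": hits / trials}
            for name, (total, hits) in stats.items()}

if __name__ == "__main__":
    for threshold in (0.5, 0.9, 0.99):
        print(threshold, compare(threshold))
```

Trying several thresholds is the point of the exercise: a threshold set too low settles early for mediocre options, one set too high often ends with whatever comes last, and the explore-then-exploit rule needs no such setting at all.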

In one of the earlier posts on "rational life plans" I suggested that rationality comes into life-planning in several different ways:
We might describe this process as one that involves local action-rationality guided by medium term strategies and oriented towards long term objectives. Rationality comes into the story at several points: assessing cause and effect, weighing the importance of various long term goals, deliberating across conflicting goals and values, working out the consequences of one scenario or another, etc. (link)
The algorithms of stopping are clearly relevant to the first part of the story -- local action-rationality. It is not so clear that the stopping problem arises in the same way in the other two levels of life-planning rationality. Deliberation about long-term objectives is not sequential in the way that deciding about which highway exit to choose for supper is; rather, the deliberating individual can canvass a number of objectives simultaneously and make deliberative choices among them. And choosing medium-term strategies seems to have a similar temporal logic: identify a reasonable range of possible strategies, compare their strengths and weaknesses, and choose the best. So the stopping problem seems to be relevant to the implementation phase of living, not the planning and projecting parts. We don't need the stopping algorithm to decide to visit the grandchildren in Scranton, or in deciding which route across the country to choose for the long drive; but we do need it for deciding the moment-to-moment options that arise -- which hotel, which restaurant, which stretch of beach, which tourist attraction to visit along the way. This seems to amount to a conclusion: the stopping problem is relevant to a certain class of choices that come as an irreversible series, but not relevant to deliberation among principles, values, or guiding goals.

(Christian and Griffiths describe the results of research on the stopping problem, but the book does not give a clear description of how the math works. Here is a somewhat more detailed explanation of the solution to the stopping problem in American Scientist (link). Essentially the solution -- wait and observe for the first 37% of options, then take the next option better than any of those seen to date -- follows from calculating, for each possible cutoff, the probability that the rule ends up selecting the best candidate in a randomly ordered series. And it can be proven that both lower and higher cutoffs -- less exploration or more exploration -- lead to a lower probability of selecting the best option.)
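For readers who want the calculation itself, here is a compact sketch of the standard argument. The notation is my own rather than the book's: n is the number of options, r is the number observed before choosing becomes possible, and x = r/n is the fraction explored. It assumes the options arrive in random order and that success means selecting the single best one.

```latex
% The best option sits at position i with probability 1/n. For i > r it is
% selected exactly when the best of the first i-1 options fell among the
% first r, which happens with probability r/(i-1).
\[
  P_n(r) \;=\; \sum_{i=r+1}^{n} \frac{1}{n}\cdot\frac{r}{i-1}
         \;=\; \frac{r}{n}\sum_{i=r+1}^{n}\frac{1}{i-1},
  \qquad 1 \le r < n .
\]
% For large n, writing x = r/n, the sum approaches an integral:
\[
  P(x) \;=\; x\int_{x}^{1}\frac{dt}{t} \;=\; -x\ln x .
\]
% Setting the derivative to zero gives the optimal cutoff:
\[
  P'(x) \;=\; -\ln x - 1 \;=\; 0
  \quad\Longrightarrow\quad
  x \;=\; \frac{1}{e} \approx 0.368,
  \qquad
  P\!\left(\tfrac{1}{e}\right) \;=\; \frac{1}{e} \approx 0.368 .
\]
```

So the 37% figure appears twice: it is both the share of options to observe and, under these assumptions, the approximate chance that the rule lands on the very best one.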
