Sunday, September 30, 2018

Philosophy and the study of technology failure

image: Adolf von Menzel, The Iron Rolling Mill (Modern Cyclopes)

Readers may have noticed that my current research interests have to do with organizational dysfunction and large-scale technology failures. I am interested in probing the ways in which organizational failures and dysfunctions have contributed to large accidents like Bhopal, Fukushima, and the Deepwater Horizon disaster. I've had to confront an important question in taking on this research interest: what can philosophy bring to the topic that would not be better handled by engineers, organizational specialists, or public policy experts?

One answer is the diversity of viewpoint that a philosopher can bring to the discussion. It is evident that technology failures invite analysis from all of these specialized experts, and more. But there is room for productive contribution from reflective observers who are not committed to any of these disciplines. Philosophers have a long history of taking on big topics outside the defined canon of "philosophical problems", and often those engagements have proven fruitful. In this particular instance, philosophy can look at organizations and technology in a way that is more likely to be interdisciplinary, and perhaps can help to see dimensions of the problem that are less apparent from a purely disciplinary perspective.

There is also a rationale based on the terrain of the philosophy of science. Philosophers of biology have usually attempted to learn as much about the science of biology as they can manage, but they lack the level of expertise of a research biologist, and it is rare for a philosopher to make an original contribution to the scientific biological literature. Nonetheless it is clear that philosophers have a great deal to add to scientific research in biology. They can contribute to better reasoning about the implications of various theories, they can probe the assumptions about confirmation and explanation that are in use, and they can help to clarify important conceptual disagreements. Biology is in a better state because of the work of philosophers like David Hull and Elliott Sober.

Philosophers have also made valuable contributions to science and technology studies, bringing a viewpoint that incorporates insights from the philosophy of science and a sensitivity to the social groundedness of technology. STS studies have proven to be a fruitful place for interaction between historians, sociologists, and philosophers. Here again, the concrete study of the causes and context of large technology failure may be assisted by a philosophical perspective.

There is also a normative dimension to these questions about technology failure for which philosophy is well prepared. Accidents hurt people, and sometimes the causes of accidents involve culpable behavior by individuals and corporations. Philosophers have a long history of contribution to these kinds of problems of fault, law, and just management of risks and harms.

Finally, it is realistic to say that philosophy has an ability to contribute to social theory. Philosophers can offer imagination and critical attention to the problem of creating new conceptual schemes for understanding the social world. This capacity seems relevant to the problem of describing, analyzing, and explaining large-scale failures and disasters.

The situation of organizational studies and accidents is in some ways more hospitable for contributions by a philosopher than other "wicked problems" in the world around us. An accident is complicated and complex but not particularly obscure. The field is unlike quantum mechanics or climate dynamics, which are inherently difficult for non-specialists to understand. The challenge with accidents is to identify a multi-layered analysis of the causes of the accident that permits observers to have a balanced and operative understanding of the event. And this is a situation where the philosopher's perspective is most useful. We can offer higher-level descriptions of the relative importance of different kinds of causal factors. Perhaps the role here is analogous to messenger RNA, providing a cross-disciplinary kind of communications flow. Or it is analogous to the role of philosophers of history who have offered gentle critique of the cliometrics school for its over-dependence on a purely statistical approach to economic history.

So it seems reasonable enough for a philosopher to attempt to contribute to this set of topics, even if the disciplinary expertise a philosopher brings is more weighted towards conceptual and theoretical discussions than undertaking original empirical research in the domain.

What I expect to be the central finding of this research is the idea that a pervasive and often unrecognized cause of accidents is a systemic organizational defect of some sort, and that it is enormously important to have a better understanding of common forms of these deficiencies. This is a bit analogous to a paradigm shift in the study of accidents. And this view has important policy implications. We can make disasters less frequent by improving the organizations through which technology processes are designed and managed.

Thursday, September 27, 2018

James Scott on the earliest states


In 2011 James Scott gave a pair of Tanner Lectures at Harvard. He had chosen a topic for which he felt he had a fairly good understanding, having taught on early agrarian societies throughout much of his career. The topic was the origins of the earliest states in human history. But as he explains in the preface to the 2017 book Against the Grain: A Deep History of the Earliest States, preparation for the lectures led him into brand new debates, bodies of evidence, and theories which were pretty much off his personal map. The resulting book is his effort to bring his own understanding up to date, and it is a terrific and engaging book.

Scott gives a quick summary of the view of early states, nutrition, agriculture, and towns that he shared with most historians of early civilizations up through a few decades ago. Hunting and gathering was the primary mode of living for human groups for tens of thousands of years before the dawn of civilization. Humanity learned to domesticate plants and animals, creating a basis for sedentary agriculture in hamlets and villages. With the increase in productivity associated with settled agriculture, it was possible for nascent political authorities to collect taxes and create political institutions. Agriculture and politics created the conditions that conduced to the establishment of larger towns, and eventually cities. And humanity surged forward in terms of population size and quality of life.

But, as Scott summarizes, none of these sequences has held up to current scholarship.
We thought ... that the domestication of plants and animals led directly to sedentism and fixed-field agriculture. It turns out that sedentism long preceded evidence of plant and animal domestication and that both sedentism and domestication were in place at least four millennia before anything like agricultural villages appeared. (xi)
...
The early states were fragile and liable to collapse, but the ensuing "dark ages" may often have marked an actual improvement in human welfare. Finally, there is a strong case to be made that life outside the state -- life as a "barbarian" -- may often have been materially easier, freer, and healthier than life at least for nonelites inside civilization. (xii)
There is an element of "who are we?" in the topic -- that is, what features define modern humanity? Here is Scott's most general answer:
A sense, then, for how we came to be sedentary, cereal-growing, livestock-rearing subjects governed by the novel institution we now call the state requires an excursion into deep history. (3)
Who we are, in this telling of the story, is a species of hominids who are sedentary, town-living, agriculture-dependent subjects of the state. But this characterization is partial (as of course Scott knows); we are also meaning-makers, power-wielders, war-fighters, family-cultivators, and sometimes rebels. And each of these other qualities of humanity leads us in the direction of a different kind of history, requiring a Clifford Geertz, a Michael Mann, a Tolstoy or a Marx to tell the story.

A particularly interesting part of the novel story about these early origins of human civilization that Scott provides has to do with the use of fire in the material lives of pre-technology humans -- hunters, foragers, and gatherers -- in a deliberate effort to sculpt the natural environment around them to concentrate food resources. According to Scott's readings of recent archeology and pre-agriculture history, human communities used fire to create the specific habitats that would entice their prey to make themselves readily available for the season's meals. He uses a striking phrase to capture the goal here -- reducing the radius of a meal. Early foragers literally reshaped the natural environments in which they lived.
What we have here is a deliberate disturbance ecology in which hominids create, over time, a mosaic of biodiversity and a distribution of desirable resources more to their liking. (40)
Most strikingly, Scott suggests a link between massive Native American use of fire to reduce forests, the sudden decline in their population from disease following contact with Europeans and consequent decline in burning, and the onset of the Little Ice Age (1500-1850) as a result of reduced CO2 production (39). Wow!

Using fire for cooking further reduced this "radius of the meal" by permitting early humans to consume a wider range of potential foods. And Scott argues that this innovation had evolutionary consequences for our hominid ancestors: human populations developed a digestive gut only one-third the length of that of other non-fire-using hominids. "We are a fire-adapted species" (42).

Scott makes an intriguing connection between grain-based agriculture and early states. The traditional narrative has it that pre-farming society was too low in food productivity to allow for sedentary life and dense populations. According to Scott this assumption is no longer supported by the evidence. Sedentary life based on foraging, gathering, and hunting was established several thousand years earlier than the development of agriculture. Gathering, farming, settled residence, and state power are all somewhat independent. In fact, Scott argues that these foraging communities were too well situated in their material environment to be vulnerable to a predatory state. "There was no single dominant resource that could be monopolized or controlled from the center, let alone taxed" (57). These communities generally were supported by three or four "food webs" that gave them substantial independence from both climate fluctuation and domination by powerful outsiders (49). Cereal-based civilizations, by contrast, were vulnerable to both threats, and powerful authorities had the ability to confiscate grain at the point of harvest or in storage. Grain made taxation possible.

We often think of hunter-gatherers in terms of game hunters and the feast-or-famine material life described by Marshall Sahlins in Stone Age Economics. But Scott makes the point that there are substantial ecological niches in wetlands where nutrition comes to the gatherers rather than having to be chased down by hunters. And in the early millennia of the lower Nile -- what Scott refers to as the southern alluvium -- the wetland ecological zone was ample for a very satisfactory and regular level of wellbeing. And, of special interest to Scott, "the wetlands are ungovernable" (56). (Notice the parallel with Scott's treatment of Zomia in The Art of Not Being Governed: An Anarchist History of Upland Southeast Asia.)

So who are these early humans who navigated their material worlds so exquisitely well and yet left so little archeological record because they built their homes with sticks, mud, and papyrus?
It makes most sense to see them as agile and astute navigators of a diverse but also changeable and potentially dangerous environment.... We can see this long period as one of continuous experimentation and management of this environment. Rather than relying on only a small bandwidth of food resources, they seem to have been opportunistic generalists with a large portfolio of subsistence options spread across several food webs. (59)
Later chapters offer similarly iconoclastic accounts of the inherent instability of the early states (like a pyramid of tumblers on the stage), the advantages of barbarian civilization, the epidemiology of sedentary life, and other intriguing topics in the early history of humanity. And pervasively, there is the undercurrent of themes that recur often in Scott's work -- the validity and dignity of the hidden players in history, the resourcefulness of ordinary hominids, and the importance of questioning the received wisdom about humanity's history.

Scott is telling a new story here about where we came from, and it is a fascinating one.

Tuesday, September 25, 2018

System safety


An ongoing thread of posts here is concerned with organizational causes of large technology failures. The driving idea is that failures, accidents, and disasters usually have a dimension of organizational causation behind them. The corporation, research office, shop floor, supervisory system, intra-organizational information flow, and other social elements often play a key role in the occurrence of a gas plant fire, a nuclear power plant malfunction, or a military disaster. There is a tendency to look first and foremost for one or more individuals who made a mistake in order to explain the occurrence of an accident or technology failure; but researchers such as Perrow, Vaughan, Tierney, and Hopkins have demonstrated in detail the importance of broadening the lens to seek out the social and organizational background of an accident.

It seems important to distinguish between system flaws and organizational dysfunction in considering all of the kinds of accidents mentioned here. We might specify system safety along these lines. Any complex process has the potential for malfunction. Good system design means creating a flow of events and processes that make accidents inherently less likely. Part of the task of the designer and engineer is to identify chief sources of harm inherent in the process -- release of energy, contamination of food or drugs, unplanned fission in a nuclear plant -- and design fail-safe processes so that these events are as unlikely as possible. Further, given the complexity of contemporary technology systems, it is critical to attempt to anticipate unintended interactions among subsystems -- each of which may be functioning correctly on its own, but which can combine to produce disaster in unusual but possible interaction scenarios.

In a nuclear processing plant, for example, there is the hazard of radioactive materials being brought into proximity with each other in a way that creates unintended critical mass. Jim Mahaffey's Atomic Accidents: A History of Nuclear Meltdowns and Disasters: From the Ozark Mountains to Fukushima offers numerous examples of such unintended events, from the careless handling of plutonium scrap in a machining process to the transfer of a fissionable liquid from a vessel of one shape to another. We might try to handle these risks as an organizational problem: more and better training for operatives about the importance of handling nuclear materials according to established protocols, and effective supervision and oversight to ensure that the protocols are observed on a regular basis. But it is also possible to design the material processes within a nuclear plant in a way that makes unintended criticality virtually impossible -- for example, by storing radioactive solutions in containers that simply cannot be brought into close proximity with each other.
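To make the contrast between procedural control and design-level control a bit more concrete, here is a minimal sketch in Python. Everything in it is invented for the purpose of illustration -- the container types, masses, and limits have nothing to do with real criticality-safety engineering -- but it shows the difference between a rule that operators must remember to follow and a design in which the unsafe configuration simply cannot be assembled.

```python
# A toy illustration of "safety by design" versus "safety by protocol".
# All classes, masses, and limits here are invented for the sake of the
# example; this is not a real criticality-safety calculation.

from dataclasses import dataclass

SAFE_TOTAL_KG = 0.35   # hypothetical limit on co-located fissile material


@dataclass
class Container:
    label: str
    fissile_kg: float


class ProtocolControlledBay:
    """Relies on operators to follow the rule; the unsafe state is reachable."""

    def __init__(self):
        self.contents = []

    def store(self, container: Container):
        # The rule exists only as a written procedure; a rushed or distracted
        # operator can simply skip the check.
        self.contents.append(container)

    def total_mass(self) -> float:
        return sum(c.fissile_kg for c in self.contents)


class GeometryLimitedRack:
    """Design-level constraint: the rack physically cannot hold an unsafe load."""

    def __init__(self, slots: int, max_kg_per_slot: float):
        self.slots = slots
        self.max_kg_per_slot = max_kg_per_slot
        self.contents = []

    def store(self, container: Container):
        if len(self.contents) >= self.slots:
            raise ValueError("no free slot: rack geometry limits co-location")
        if container.fissile_kg > self.max_kg_per_slot:
            raise ValueError("container exceeds per-slot limit by design")
        self.contents.append(container)


if __name__ == "__main__":
    bay = ProtocolControlledBay()
    for i in range(5):
        bay.store(Container(f"scrap-{i}", 0.1))          # nothing stops this
    print("protocol-controlled bay holds", bay.total_mass(), "kg")  # exceeds SAFE_TOTAL_KG

    rack = GeometryLimitedRack(slots=3, max_kg_per_slot=0.1)
    for i in range(3):
        rack.store(Container(f"scrap-{i}", 0.1))
    # A fourth container would be rejected by the design itself, not by vigilance.
```

The point of the sketch is only this: in the first arrangement safety depends on training and supervision, while in the second the hazardous configuration is excluded by the physical design.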

Nancy Leveson is a national expert on defining and applying principles of system safety. Her book Engineering a Safer World: Systems Thinking Applied to Safety is a thorough treatment of her thinking about this subject. She offers a handful of compelling reasons for believing that safety is a system-level characteristic that requires a systems approach: the fast pace of technological change, reduced ability to learn from experience, the changing nature of accidents, new types of hazards, increasing complexity and coupling, decreasing tolerance for single accidents, difficulty in selecting priorities and making tradeoffs, more complex relationships between humans and automation, and changing regulatory and public view of safety (kl 130 ff.). Particularly important in this list is the comment about complexity and coupling: "The operation of some systems is so complex that it defies the understanding of all but a few experts, and sometimes even they have incomplete information about the system's potential behavior" (kl 137).

Given the fact that safety and accidents are products of whole systems, she is critical of the accident methodology generally applied to serious industrial, aerospace, and chemical accidents. This methodology involves tracing the series of events that led to the outcome, and identifying one or more events as the critical cause of the accident. However, she writes:
In general, event-based models are poor at representing systemic accident factors such as structural deficiencies in the organization, management decision making, and flaws in the safety culture of the company or industry. An accident model should encourage a broad view of accident mechanisms that expands the investigation beyond the proximate events. A narrow focus on technological components and pure engineering activities or a similar narrow focus on operator errors may lead to ignoring some of the most important factors in terms of preventing future accidents. (kl 452)
Here is a definition of system safety offered later in ESW in her discussion of the emergence of the concept within the defense and aerospace fields in the 1960s:
System Safety ... is a subdiscipline of system engineering. It was created at the same time and for the same reasons. The defense community tried using the standard safety engineering techniques on their complex new systems, but the limitations became clear when interface and component interaction problems went unnoticed until it was too late, resulting in many losses and near misses. When these early aerospace accidents were investigated, the causes of a large percentage of them were traced to deficiencies in design, operations, and management. Clearly, big changes were needed. System engineering along with its subdiscipline, System Safety, were developed to tackle these problems. (kl 1007)
Here Leveson mixes system design and organizational dysfunctions as system-level causes of accidents. But much of her work in this book and her earlier Safeware: System Safety and Computers gives extensive attention to the design faults and component interactions that lead to accidents -- what we might call system safety in the narrow or technical sense.
A systems engineering approach to safety starts with the basic assumption that some properties of systems, in this case safety, can only be treated adequately in the context of the social and technical system as a whole. A basic assumption of systems engineering is that optimization of individual components or subsystems will not in general lead to a system optimum; in fact, improvement of a particular subsystem may actually worsen the overall system performance because of complex, nonlinear interactions among the components. (kl 1007)
Overall, then, it seems clear that Leveson believes that both organizational features and technical system characteristics are part of the systems that created the possibility for accidents like Bhopal, Fukushima, and Three Mile Island. Her own accident model, STAMP (Systems-Theoretic Accident Model and Processes), which is designed to help identify the causes of accidents, emphasizes both kinds of system properties.
Using this new causality model ... changes the emphasis in system safety from preventing failures to enforcing behavioral safety constraints. Component failure accidents are still included, but our conception of causality is extended to include component interaction accidents. Safety is reformulated as a control problem rather than a reliability problem. (kl 1062)
In this framework, understanding why an accident occurred requires determining why the control was ineffective. Preventing future accidents requires shifting from a focus on preventing failures to the broader goal of designing and implementing controls that will enforce the necessary constraints. (kl 1084)
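The shift Leveson describes -- from the reliability of components to the enforcement of safety constraints -- can be illustrated with a toy control loop. The following Python sketch is not STAMP itself, and the tank, rates, and limits are invented; it simply shows what it means for a controller to enforce a safety constraint on a process regardless of what a faulty upstream component requests.

```python
# A toy sketch of the "safety as a control problem" framing: a controller
# enforces a safety constraint on a simple simulated process, rather than
# relying on every component behaving correctly. The process variables and
# limits are invented for illustration.

MAX_LEVEL = 100.0          # safety constraint: tank level must never exceed this
SETPOINT = 80.0            # operational goal


def process_model(level: float, inflow: float, outflow: float) -> float:
    """One time-step of the (simulated) physical process."""
    return level + inflow - outflow


def controller(level: float, requested_inflow: float) -> float:
    """Enforce the safety constraint: never command an inflow that could push
    the level past MAX_LEVEL, whatever an operator or upstream component asks for."""
    headroom = MAX_LEVEL - level
    return max(0.0, min(requested_inflow, headroom))


def run(requests):
    level = 50.0
    for t, requested in enumerate(requests):
        inflow = controller(level, requested)
        level = process_model(level, inflow, outflow=2.0)
        assert level <= MAX_LEVEL, "safety constraint violated -- control was ineffective"
        print(f"t={t}: requested={requested:5.1f} allowed={inflow:5.1f} level={level:6.1f}")


if __name__ == "__main__":
    # A faulty upstream component keeps requesting far too much inflow;
    # the constraint-enforcing controller keeps the process inside the safe envelope.
    run([10.0, 60.0, 60.0, 60.0, 5.0])
```

On this framing, the question an accident investigation asks is not "which component failed?" but "why was the control over the hazardous process ineffective?"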
Leveson's brief analysis of the Bhopal disaster in 1984 (kl 384 ff.) emphasizes the organizational dysfunctions that led to the accident -- and that were completely ignored by the Indian state's official investigation of the accident: out-of-service gauges, alarm deficiencies, inadequate response to prior safety audits, shortage of oxygen masks, failure to inform the police or surrounding community of the accident, and an environment of cost cutting that impaired maintenance and staffing. "When all the factors, including indirect and systemic ones, are considered, it becomes clear that the maintenance worker was, in fact, only a minor and somewhat irrelevant player in the loss. Instead, degradation in the safety margin occurred over time and without any particular single decision to do so but simply as a series of decisions that moved the plant slowly toward a situation where any slight error would lead to a major accident" (kl 447).

Saturday, September 15, 2018

Patient safety


An issue which is of concern to anyone who receives treatment in a hospital is the topic of patient safety. How likely is it that there will be a serious mistake in treatment -- wrong-site surgery, incorrect medication or radiation dose, exposure to a hospital-acquired infection? The current evidence is alarming. (Martin Makary et al estimate that over 250,000 deaths per year result from medical mistakes -- making medical error now the third leading cause of mortality in the United States (link).) And when these events occur, where should we look for assigning responsibility -- at the individual providers, at the systems that have been implemented for patient care, at the regulatory agencies responsible for overseeing patient safety?

Medical accidents commonly demonstrate a complex interaction of factors, from the individual provider to the technologies in use to failures of regulation and oversight. We can look at a hospital as a place where caring professionals do their best to improve the health of their patients while scrupulously avoiding errors. Or we can look at it as an intricate system involving the recording and dissemination of information about patients and the administration of procedures to them (surgery, medication, radiation therapy). In this sense a hospital is similar to a factory with multiple intersecting locations of activity. Finally, we can look at it as an organization -- a system of division of labor, cooperation, and supervision by large numbers of staff whose joint efforts lead to health and accidents alike. Obviously each of these perspectives is partially correct. Doctors, nurses, and technicians are carefully and extensively trained to diagnose and treat their patients. The technology of the hospital -- the digital patient record system, the devices that administer drugs, the surgical robots -- can be designed better or worse from a safety point of view. And the social organization of the hospital can be effective and safe, or it can be dysfunctional and unsafe. So all three aspects are relevant both to safe operations and the possibility of chronic lack of safety.

So how should we analyze the phenomenon of patient safety? What factors distinguish high-safety hospitals from low-safety ones? What lessons can be learned from the study of the accidents and mistakes that cumulatively make up a hospital's patient safety record?

The view that primarily emphasizes expertise and training of individual practitioners is very common in the healthcare industry, and yet this approach is not particularly useful as a basis for improving the safety of healthcare systems. Skill and expertise are necessary conditions for effective medical treatment; but the other two zones of accident space are probably more important for reducing accidents -- the design of treatment systems and the organizational features that coordinate the activities of the various individuals within the system.

Dr. James Bagian is a strong advocate for the perspective of treating healthcare institutions as systems. Bagian considers both technical systems characteristics of processes and the organizational forms through which these processes are carried out and monitored. And he is very skilled at teasing out some of the ways in which features of both system and organization lead to avoidable accidents and failures. I recall his description of a safety walkthrough he had done in a major hospital. He said that during the tour he noticed a number of nurses' stations which were covered with yellow sticky notes. He observed that this is both a symptom and a cause of an accident-prone organization. It means that individual caregivers were obligated to remind themselves of tasks and exceptions that needed to be observed. Far better was to have a set of systems and protocols that made sticky notes unnecessary. Here is the abstract from a short summary article by Bagian on the current state of patient safety:
Abstract The traditional approach to patient safety in health care has ranged from reticence to outward denial of serious flaws. This undermines the otherwise remarkable advances in technology and information that have characterized the specialty of medical practice. In addition, lessons learned in industries outside health care, such as in aviation, provide opportunities for improvements that successfully reduce mishaps and errors while maintaining a standard of excellence. This is precisely the call in medicine prompted by the 1999 Institute of Medicine report “To Err Is Human: Building a Safer Health System.” However, to effect these changes, key components of a successful safety system must include: (1) communication, (2) a shift from a posture of reliance on human infallibility (hence “shame and blame”) to checklists that recognize the contribution of the system and account for human limitations, and (3) a cultivation of non-punitive open and/or de-identified/anonymous reporting of safety concerns, including close calls, in addition to adverse events.
(Here is the Institute of Medicine study to which Bagian refers; link.)

Nancy Leveson is an aeronautical and software engineer who has spent most of her career devoted to designing safe systems. Her book Engineering a Safer World: Systems Thinking Applied to Safety is a recent presentation of her theories of systems safety. She applies these approaches to problems of patient safety with several co-authors in "A Systems Approach to Analyzing and Preventing Hospital Adverse Events" (link). Here is the abstract and summary of findings for that article:
Objective: This study aimed to demonstrate the use of a systems theory-based accident analysis technique in health care applications as a more powerful alternative to the chain-of-event accident models currently underpinning root cause analysis methods.
Method: A new accident analysis technique, CAST [Causal Analysis based on Systems Theory], is described and illustrated on a set of adverse cardiovascular surgery events at a large medical center. The lessons that can be learned from the analysis are compared with those that can be derived from the typical root cause analysis techniques used today.
Results: The analysis of the 30 cardiovascular surgery adverse events using CAST revealed the reasons behind unsafe individual behavior, which were related to the design of the system involved and not negligence or incompetence on the part of individuals. With the use of the system-theoretic analysis results, recommendations can be generated to change the context in which decisions are made and thus improve decision making and reduce the risk of an accident.
Conclusions: The use of a systems-theoretic accident analysis technique can assist in identifying causal factors at all levels of the system without simply assigning blame to either the frontline clinicians or technicians involved. Identification of these causal factors in accidents will help health care systems learn from mistakes and design system-level changes to prevent them in the future.
Key Words: patient safety, systems theory, cardiac surgical procedures, adverse event causal analysis (J Patient Saf 2016;00: 00–00)
Crucial in this article is this research group's effort to identify causes "at all levels of the system without simply assigning blame to either the frontline clinicians or technicians involved". The key result is this: "The analysis of the 30 cardiovascular surgery adverse events using CAST revealed the reasons behind unsafe individual behavior, which were related to the design of the system involved and not negligence or incompetence on the part of individuals."
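It may help to see, in schematic form, how different the two kinds of analysis look. The following Python fragment is only a sketch -- the adverse event and the contributing factors are invented placeholders, and it does not reproduce the CAST procedure itself -- but it contrasts a single-chain "root cause" record with a record that attaches contributing factors to every level of the system.

```python
# A schematic contrast (not the actual CAST method) between a single-chain
# "root cause" record and a multi-level, systems-style record of contributing
# factors. The event and factors below are invented placeholders.

root_cause_style = {
    "event": "wrong medication dose administered",
    "root_cause": "nurse misread the order",        # analysis stops at the frontline
}

systems_style = {
    "event": "wrong medication dose administered",
    "factors_by_level": {
        "frontline": ["order misread under time pressure"],
        "technical system": ["order-entry screen truncates dose units"],
        "unit management": ["chronic understaffing on night shift"],
        "hospital policy": ["no double-check protocol for high-risk drugs"],
        "regulation/oversight": ["reporting system discourages disclosure of close calls"],
    },
}

# Recommendations in the second style attach to every level, not just the
# individual: redesign the order-entry display, adjust staffing, add a
# verification protocol, encourage non-punitive reporting.
for level, factors in systems_style["factors_by_level"].items():
    print(f"{level}: {'; '.join(factors)}")
```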

Bagian, Leveson, and others make a crucial point: in order to substantially increase the performance of hospitals and the healthcare system more generally when it comes to patient safety, it will be necessary to extend the focus of safety analysis from individual incidents and agents to the systems and organizations through which these accidents were possible. In other words, attention to systems and organizations is crucial if we are to significantly reduce the frequency of medical and hospital mistakes.

(The Makary et al estimate of 250,000 deaths caused by medical error has been questioned on methodological grounds. See Aaron Carroll's thoughtful rebuttal (NYT 8/15/16; link).)