Thursday, July 18, 2019

Safety and accident analysis: Longford


Andrew Hopkins has written a number of fascinating case studies of industrial accidents, usually in the field of petrochemicals. These books are crucial reading for anyone who wants to understand technological safety in complex systems involving high-energy, tightly coupled processes. Especially interesting is his Lessons from Longford: The Esso Gas Plant Explosion. The Longford gas plant suffered an explosion and fire in 1998 that killed two workers, badly injured others, and interrupted the supply of natural gas to the state of Victoria for two weeks. Hopkins is a sociologist, but he has developed substantial expertise in the technical details of petrochemical processing plants. He served as an expert witness in the Royal Commission hearings that investigated the accident. The accounts he offers of these disasters are genuinely fascinating to read.

Hopkins makes the now-familiar point that companies often seek to lay responsibility for a major industrial accident on operator error or malfeasance. This was Esso's defense concerning its corporate liability in the Longford disaster. But, as Hopkins points out, the larger causes of failure go far beyond the individual operators whose decisions and actions were proximate to the event. Training, operating plans, hazard analysis, availability of appropriate onsite technical expertise -- these are all the responsibility of the owners and managers of the enterprise. And regulation and oversight of safety practices are the responsibility of state agencies. So it is critical to examine the operations of a complex and dangerous technological system at all these levels.

A crucial part of management's responsibility is to engage in formal "hazard and operability" (HAZOP) analysis. "A HAZOP involves systematically imagining everything that might go wrong in a processing plant and developing procedures or engineering solutions to avoid these potential problems" (26). This kind of analysis is especially critical in high-risk industries including chemical plants, petrochemical refineries, and nuclear reactors. It emerged during the Longford accident investigation that HAZOP analyses had been conducted for some aspects of risk but not for all -- even in areas where the parent company Exxon was itself already fully engaged in analysis of those risky scenarios. The risk of embrittlement of processing equipment when exposed to super-chilled conditions was one that Exxon had already drawn attention to at the corporate level because of prior incidents.
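To make the method concrete, here is a minimal sketch of the guideword technique that underlies a HAZOP study. The guidewords and parameters are the standard ones, but the worked deviation is my own illustration, not a reproduction of Esso's actual worksheets.

```python
from dataclasses import dataclass, field

# A HAZOP study walks through each node of the process, applying
# standard guidewords to each process parameter to generate candidate
# deviations, then records causes, consequences, and safeguards for
# each deviation the team judges credible.
GUIDEWORDS = ["no", "more", "less", "reverse", "other than"]
PARAMETERS = ["flow", "temperature", "pressure", "level"]

@dataclass
class Deviation:
    node: str
    parameter: str
    guideword: str
    causes: list = field(default_factory=list)
    consequences: list = field(default_factory=list)
    safeguards: list = field(default_factory=list)

def enumerate_deviations(node: str) -> list:
    """The guideword x parameter grid guarantees systematic coverage;
    the engineering judgment lies in filling in each row."""
    return [Deviation(node, p, g) for p in PARAMETERS for g in GUIDEWORDS]

# The deviation that mattered at Longford, stated in HAZOP terms
# (entries paraphrase Hopkins's account, not an actual study):
d = Deviation(
    node="lean-oil heat exchanger",
    parameter="temperature",
    guideword="less",
    causes=["loss of lean-oil circulation"],
    consequences=["metal chilled below its brittle-transition point"],
    safeguards=["procedure: do not reintroduce warm oil; call engineering"],
)
print(len(enumerate_deviations(d.node)), "deviations to review for this node")
```

The value of the grid is precisely that it does not depend on anyone imagining the accident in advance; the deviation "less temperature" comes up mechanically, whether or not anyone has yet thought of embrittlement.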

A factor that Hopkins judges crucial to the occurrence of the Esso Longford disaster is management's decision to relocate engineering staff from the plant to a central office where they could serve a larger number of facilities "more efficiently".
A second relevant change was the relocation to Melbourne in 1992 of all the engineering staff who had previously worked at Longford, leaving the Longford operators without the engineering backup to which they were accustomed. Following their removal from Longford, engineers were expected to monitor the plant from a distance and operators were expected to telephone the engineers when they felt a need to. Perhaps predictably, these arrangements did not work effectively, and I shall argue in the next chapter that the absence of engineering expertise had certain long-term consequences which contributed to the accident. (34)
One result of this decision was that when the Longford incident began, there were no engineering experts on site who could correctly identify the risks it created. Technicians therefore restarted the process by reintroducing warm oil into the super-chilled heat exchanger. The metal, embrittled by the extremely low temperatures, cracked, leading to the release of fuel and the subsequent explosion and fire. As Hopkins points out, Exxon experts had long been aware of the hazards of embrittlement. But the operating procedures developed by Esso at Longford ignored this risk, and operators and supervisors lacked the technical and scientific knowledge to recognize the hazard when it arose.
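The missed hazard can be stated as a single precondition on restart. Here is a toy sketch of such a check, with an invented threshold figure; Hopkins describes a failure of procedure and knowledge, not a missing line of code.

```python
# Carbon steel loses ductility below its ductile-brittle transition
# temperature; pumping a warm stream into a vessel chilled below that
# point risks brittle fracture and loss of containment. The threshold
# here is an assumed illustrative figure, not the exchanger's rating.
BRITTLE_TRANSITION_C = -29.0

def safe_to_reintroduce_warm_oil(metal_temp_c: float) -> bool:
    """Restart must wait until the chilled metal has been warmed
    gradually back above the embrittlement threshold."""
    return metal_temp_c > BRITTLE_TRANSITION_C

# The exchanger had been chilled far below zero (figures around -48 C
# appear in accounts of the accident) before warm oil was pumped in:
print(safe_to_reintroduce_warm_oil(-48.0))  # False: do not restart
```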

The topic of "tight coupling" (the tight interconnection across different parts of a complex technological system) comes up frequently in discussions of technology accidents. Hopkins shows that the Longford case gives a new spin to this idea. In the explosion and fire at Longford it turned out to be very important that plant 1 was interconnected by numerous plumbing connections to plants 2 and 3. Fuel from plants 2 and 3 therefore continued to flow into plant 1 and greatly extended the time it took to extinguish the fire. Plant 1 had to be fully isolated from plants 2 and 3 before the fire could be extinguished (or plants 2 and 3 restarted), and there were so many plumbing connections among them, poorly understood at the time of the fire, that the isolation took a great deal of time to accomplish (32).
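The isolation problem Hopkins describes is, in effect, a graph problem: every line between plants is an edge, and a unit is isolated only when every edge crossing its boundary is closed. A minimal sketch, with a connection list invented for illustration (not Longford's actual piping):

```python
# Inter-plant plumbing as an edge list; isolating a burning unit means
# finding and closing every line that crosses its boundary.
connections = [
    ("plant1", "plant2", "rich oil line"),
    ("plant1", "plant2", "flare header"),
    ("plant1", "plant3", "condensate line"),
    ("plant2", "plant3", "fuel gas line"),
]

def isolation_set(unit, edges):
    """Every line that must be closed before `unit` is fully isolated."""
    return [(a, b, name) for (a, b, name) in edges if unit in (a, b)]

for a, b, line in isolation_set("plant1", connections):
    print(f"close: {line} ({a} <-> {b})")
# If this list is incomplete or poorly understood -- as it was at
# Longford -- fuel keeps feeding the fire while crews hunt for valves.
```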

Hopkins addresses the issue of government regulation of high-risk industries in connection with the Longford disaster. Writing in 1999 or so, he recognizes the trend towards "self-regulation" in place of government rules stipulating how various industries must operate. He contrasts this approach with deregulation -- the effort to allow the issue of safe operation to be governed by the market rather than by law.
Whereas the old-style legislation required employers to comply with precise, often quite technical rules, the new style imposes an overarching requirement on employers that they provide a safe and healthy workplace for their employees, as far as practicable. (92)
He notes that this approach does not necessarily reduce the need for government inspections; but the goal of regulatory inspection will be different. Inspectors will seek to satisfy themselves that the industry has done a responsible job of identifying hazards and planning accordingly, rather than looking for violations of specific rules. (This parallels to some extent his discussion of two different philosophies of audit, one of which is much more conducive to increasing the systems-safety of high-risk industries; chapter 7.) But his preferred regulatory approach is what he describes as "safety case regulation". (Hopkins provides more detail about the workings of a safety case regime in Disastrous Decisions: The Human and Organisational Causes of the Gulf of Mexico Blowout, chapter 10.)
The essence of the new approach is that the operator of a major hazard installation is required to make a case or demonstrate to the relevant authority that safety is being or will be effectively managed at the installation. Whereas under the self-regulatory approach, the facility operator is normally left to its own devices in deciding how to manage safety, under the safety case approach it must lay out its procedures for examination by the regulatory authority. (96)
The preparation of a safety case would presumably include a comprehensive HAZOP analysis, along with procedures for preventing or responding to the occurrence of possible hazards. Hopkins reports that the safety case approach to regulation is being adopted by the EU, Australia, and the UK with respect to a number of high-risk industries. This discussion is highly relevant to the current debate over aircraft manufacturing safety and the role of the FAA in overseeing manufacturers.
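As a rough schematic of what distinguishes the regimes: under a safety case, the operator's hazard analysis and controls become a reviewable artifact rather than a private matter. The fields and acceptance test below are my own plausible sketch, not any jurisdiction's statutory requirements.

```python
from dataclasses import dataclass

@dataclass
class SafetyCase:
    installation: str
    major_hazards: list    # output of HAZOP-style identification
    controls: dict         # hazard -> list of controls addressing it
    verification: dict     # control -> how it is audited over time

def regulator_accepts(case: SafetyCase) -> bool:
    # The regulator reviews the case itself rather than inspecting for
    # violations of prescriptive rules: every identified hazard needs
    # controls, and every control needs an ongoing verification regime.
    hazards_controlled = all(case.controls.get(h) for h in case.major_hazards)
    controls_verified = all(
        c in case.verification
        for ctrl_list in case.controls.values()
        for c in ctrl_list)
    return hazards_controlled and controls_verified
```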

It is interesting to realize that Hopkins is implicitly critical of another of my favorite authors on the topic of accidents and technology safety, Charles Perrow. Perrow's central idea of "normal accidents" brings with it a certain pessimism about the possibility of increasing safety in complex industrial and technological systems; accidents are inevitable and normal (Normal Accidents: Living with High-Risk Technologies). Hopkins takes a more pragmatic approach and argues that there are engineering and management methodologies that can significantly reduce the likelihood and harm of accidents like the Esso gas plant explosion. His central point is that we don't need to be able to anticipate a long chain of unlikely events in order to identify the hazard in which such chains may eventuate -- for example, loss of coolant in a nuclear reactor or loss of warm oil in a refinery process. These terminal events, in which numerous different accident scenarios can culminate, are what require procedures in place to guide the responses of engineers and technicians when "normal accidents" occur (33).
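Hopkins's answer to Perrow's pessimism can be made concrete: the space of causal chains is enormous, but the terminal hazard states they feed into are few, and procedures can be keyed to the latter. A toy illustration, with invented scenarios:

```python
# Many distinct, individually hard-to-foresee event chains converge on
# a small set of terminal hazards; planning targets the terminal
# states. The chains below are invented for illustration.
chains = [
    ("pump trip", "absorber flooding", "loss of lean oil flow"),
    ("instrument fault", "valve closure", "loss of lean oil flow"),
    ("power dip", "compressor trip", "loss of lean oil flow"),
    ("corrosion", "pipe failure", "loss of containment"),
]
procedures = {
    "loss of lean oil flow": "shut down; do not reintroduce warm oil",
    "loss of containment": "isolate the unit; begin emergency response",
}

terminal_hazards = {chain[-1] for chain in chains}
for hazard in sorted(terminal_hazards):
    print(f"{hazard}: {procedures[hazard]}")
# Four chains, but only two terminal hazards needing a planned response.
```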

Hopkins highlights the challenge to safety created by the ongoing modification of a power plant or chemical plant; later modifications may create hazards not anticipated by the rigorous accident analysis performed on the original design.
Processing plants evolve and grow over time. A study of petroleum refineries in the US has shown that "the largest and most complex refineries in the sample are also the oldest ... Their complexity emerged as a result of historical accretion. Processes were modified, added, linked, enhanced and replaced over a history that greatly exceeded the memories of those who worked in the refinery." (33)
This is one of the chief reasons why Perrow believes technological accidents are inevitable. However, Hopkins draws a different conclusion:
However, those who are committed to accident prevention draw a different conclusion, namely, that it is important that every time physical changes are made to plant these changes be subjected to a systematic hazard identification process. ...  Esso's own management of change philosophy recognises this. It notes that "changes potentially invalidate prior risk assessments and can create new risks, if not managed diligently." (33)
(I believe this recommendation conforms to Nancy Leveson's theories of system safety engineering as well; link.)
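A management-of-change discipline of the kind Esso's own philosophy describes can be pictured as a gate on every modification. This sketch is a schematic of the idea, not Esso's actual workflow:

```python
from dataclasses import dataclass

@dataclass
class PlantChange:
    description: str
    hazard_review_done: bool = False  # systematic hazard identification?

def approve(change: PlantChange) -> None:
    """Management-of-change gate: no physical modification proceeds
    until it has been through hazard identification, because changes
    can invalidate prior risk assessments and create new risks."""
    if not change.hazard_review_done:
        raise RuntimeError(f"blocked: '{change.description}' "
                           "needs a hazard review before approval")
    print(f"approved: {change.description}")

approve(PlantChange("re-route condensate line", hazard_review_done=True))
```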

Hopkins offers a causal diagram for the occurrence of the explosion at Longford (122); its structure is summarized below.
The lowest level of the diagram represents the sequence of physical events and operator actions leading to the explosion, fatalities, and loss of gas supply. The next level represents the organizational factors identified in Hopkins's analysis of the event and its background. Central among these factors are the decision to withdraw engineers from the plant; a safety philosophy focused on lost-time injuries rather than on system hazards and processes; failures in the incident reporting system; failure to perform a HAZOP for plant 1; poor maintenance practices; inadequate audit practices; inadequate training for operators and supervisors; and a failure to identify the hazard created by the interconnections with plants 2 and 3. The next level identifies the causes of these management failures -- Esso's overriding focus on cost-cutting and a failure by Exxon, as the parent company, to adequately oversee safety planning and to share information from accidents at other plants. The final two levels of causation concern the governmental and societal factors that contributed to the corporate behavior leading to the accident.
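For readers without the figure, the diagram's layered structure can be rendered as data. The entries below paraphrase the factors just listed; the representation itself is only an illustration of the diagram's logic, not a reproduction of it.

```python
# Hopkins's causal diagram as layered causes: each level's entries are
# enabling conditions for the level below (paraphrased from p. 122).
causal_levels = {
    "governmental/societal": [
        "reliance on self-regulation",
        "weak external oversight of major-hazard facilities"],
    "corporate": [
        "Esso's overriding focus on cost-cutting",
        "Exxon's failure to oversee safety planning and share lessons"],
    "organizational": [
        "withdrawal of engineers from the plant",
        "safety philosophy keyed to lost-time injuries",
        "no HAZOP for plant 1", "failed incident reporting",
        "poor maintenance, audit, and training practices"],
    "physical events and operator actions": [
        "loss of lean oil flow", "embrittlement of the heat exchanger",
        "reintroduction of warm oil", "fracture, explosion, and fire"],
}
for level, factors in causal_levels.items():
    print(level.upper())
    for factor in factors:
        print("  -", factor)
```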

(Here is a list of major industrial disasters; link.)

