A Guide for Conducting a Food Safety Root Cause Analysis

Approaches for investigating contamination incidents and preventing recurrence

Report March 24, 2020

I. Introduction

A. The purpose of this guide

Learning from food contamination events and foodborne illness outbreaks helps uncover weaknesses in food safety systems and is a foundational property of a truly prevention-based system. Foodborne illness investigation methods continue to evolve to keep pace with changing hazards, technologies, and food production, processing, and distribution systems in an increasingly globalized food supply. However, investigation methods for identifying the root causes of food contamination have not yet been harmonized across food industries, regulatory agencies, academic institutions, and other key stakeholder groups.

Elements of root cause analysis (RCA), which is commonly used to investigate air traffic accidents, patient safety issues, and other problems in various industries, have been included in many investigations of foodborne illness, where the technique can identify opportunities for improvement in the food safety system and strategies to address them. Effective execution and communication of RCA can foster collaboration, trust, and knowledge-sharing among food producers, regulatory and nonregulatory agencies, and consumers. RCA also makes good business sense, because smoothly running operations have less downtime, fewer quality and safety holds, and improved employee morale.

The purpose of this guide is to improve food safety by encouraging the use of RCA in food operations and by safety regulators, and the sharing of information and lessons learned from these investigations. Food operations may include manufacturing and production facilities, farms, restaurants, caterers, or any other business that grows, handles, processes, distributes, prepares, or serves food. Food safety regulators include national, state, and local health agencies engaged in public health protection, inspections, and investigation of foodborne illness.

This guide is based on research on RCA in other industries as well as in-depth discussions among key stakeholders involved in food safety, including those that produce and sell food, those that have regulatory oversight over food, and those involved in the identification and resolution of foodborne illness outbreaks. The guide describes practices for effective RCA that, if used routinely, would help identify lessons learned from food safety failures and ultimately prevent foodborne illnesses. The guide provides approaches and rationales for how stakeholders can prepare for and conduct an RCA, report findings and conclusions, and apply lessons learned to prevent recurrence.

Investigators and management are encouraged to use this guide to plan RCAs, ensure the process includes steps essential to finding root causes, and design corrective actions that will prevent recurrence. To help decisionmakers determine whether dedicating resources to RCA is worthwhile, the impacts of RCA in different industries are discussed. Users may also consider the following overarching questions to gauge progress and assess the strengths and weaknesses of the investigation:

What Is Root Cause Analysis?

This systematic method of problem-solving can be used to determine the underlying reasons for how and why an event (such as product contamination or foodborne illness outbreak) occurred. It also helps clarify what steps are needed to correct the cause of the problem so that it will not recur. 1

Specifically, had the root causes not occurred, the event would have been prevented or had less of an impact. A series of contributing factors may have played a part in the event, but root causes are its most fundamental underlying reasons. 2

B. How to use this guide

This guide is intended for anyone who may conduct or manage an RCA or may be responsible for the allocation of resources to support such analysis at any point along the food production chain. This could include individuals employed by the food industry; federal, state, or local agencies with regulatory oversight over food; public health or agricultural agencies; trade or professional associations; academia; private consulting companies; or other entities with a vested interest in food safety. This group necessarily includes people with varying backgrounds and degrees of experience in an array of settings. Likewise, the factors included in this guide are for consideration in a variety of investigation settings, from assessments in individual farms or food production facilities to analyses of nationwide outbreaks.

RCA is an inherently scalable technique. The type and size of the organization, the significance of the event being investigated, the level of difficulty of the investigation, and resources available usually determine its scope. This document is intended to serve as a general guide for investigators and may also serve as a template for organizations seeking to develop internal standard procedures for conducting an RCA. In cases where not every step and recommendation will be applicable or appropriate, the document can still serve as a reference as stakeholders develop tailor-made practices and procedures. More detailed resources with further information on conducting RCAs are provided in the endnotes and Appendix A.

C. Root cause analysis has a long history in other industries

RCA was initially developed to improve manufacturing productivity in the car industry and is now used in various settings.

The technique was initially developed in the 1950s by Taiichi Ohno, former executive vice president of the Toyota Motor Corp. and developer of the Toyota Production System, as part of an operations management concept that would allow Toyota to catch up with American productivity after World War II. Ohno adapted the observational technique of Sakichi Toyoda, the founder of Toyota Industries, called the technique “five whys,” and integrated it into the Toyota Production System. 3 The idea of asking “why?” five times is to explore problems until the root causes are found, so that the quality of products and manufacturing processes can be improved. Due in part to its reliance on RCA, Toyota Motor Corp. is now the largest auto company in the world. 4

Inspired by this success, various industries around the world have adopted RCA principles to improve operations by conducting investigations of major or catastrophic incidents to prevent their recurrence. In the United States, this includes governmental and quasi-governmental agencies such as the National Aeronautics and Space Administration (NASA), National Transportation Safety Board (NTSB), the Chemical Safety Board, and the Nuclear Regulatory Commission. This approach has led to significantly better safety records in these industries. The case studies from other industries described below provide tangible implementation examples and valuable lessons learned that can inform improvements in the food industry as well.

Routine use of RCA helped rectify deficiencies in space flight’s safety culture.

NASA uses RCA as a critical component of operations. This has included, for instance, analyzing the fatal accidents of space shuttles Columbia and Challenger. In these cases, RCA uncovered persistent organizational and cultural problems that were discovered in the Challenger accident but remained unaddressed and contributed to the Columbia disaster. Similar challenges to organizational change can be found in any food producing or regulatory organization, and studying how they were ultimately solved provides valuable lessons for the food industry. The Columbia Accident Investigation Board concluded that physical mechanisms and organizational causes contributed to the loss of the Columbia and its crew in 2003. 5 The same root causes had contributed to the Challenger accident 17 years earlier, demonstrating that complex organizational cultures are difficult to change even if root causes have been identified.

The physical cause of the Columbia accident was a breach in the thermal protection system initiated by a damaged piece of insulating foam. But the Columbia Accident Investigation Board found deeper organizational root causes stemming from the space shuttle program’s history and culture, cuts in NASA’s budget starting in the early 1970s, and schedule pressures. Under these conditions, organizational practices and cultural traits that were detrimental to safety developed and created barriers to communication of critical safety information. 6 Now, in an effort to maintain an organizational culture of safety and prevent future accidents, the NASA Johnson Space Center’s Flight Safety Office maintains a Significant Incidents and Close Calls in Human Spaceflight chart to disseminate lessons learned from the incidents and encourage continued vigilance in the space flight community. 7 NASA’s Office of Safety and Mission Assurance also oversees Mishap Investigation, a program that allows NASA to understand the root causes of accidents and to prevent recurrences. 8 This example demonstrates the challenges in changing organizational safety culture, but future operations can become safer with the continued implementation of the lessons learned from root cause investigations.

Transparency through RCA has helped reduce fatal accidents in civil transportation.

Since 1975, the civil aviation industry has used RCA to help prevent avoidable accidents through a voluntary and confidential incident reporting system that allows the aviation community to report unsafe occurrences and hazardous situations. Created by NASA and the Federal Aviation Administration (FAA), the Aviation Safety Reporting System (ASRS) is mandated and funded by the FAA and is administered by NASA—a division of labor and protection against conflicts of interest. 9 The ASRS identifies potential safety hazards through analysis of voluntary aviation safety incident reports and issues safety alerts back to the aviation stakeholders so that potentially hazardous conditions can be corrected. This reporting model has been copied by aviation safety systems worldwide, and has been replicated in other industries including patient safety. (See text box “The Impact of Root Cause Analysis on Patient Safety.”) 10

The NTSB also conducts RCAs. Established in 1967 as an independent agency (part of the executive branch but independent of presidential control), the NTSB has the authority to investigate the causes of accidents in aviation, highway, marine, pipeline, railroad, and hazardous waste transportation 11 and make safety recommendations so that similar accidents do not recur. As of 2017, the NTSB had made more than 14,500 safety recommendations; more than 80 percent of these recommendations had been implemented, which has undoubtedly contributed to the prevention of fatal accidents. 12 As of 2019, there had been no fatal airline crashes in the United States in 10 years. 13 These successes have led to RCA being mandated by aviation legislation around the world. For instance, in 2013, the European Commission published regulations requiring the European Aviation Safety Agency to perform RCA. 14

RCA has led the engineering industry to develop improved safety systems.

Following the Exxon Valdez oil spill in 1989, the NTSB found faults in Exxon’s personnel policies and management’s decision-making in the root cause investigation of the accident. 15 Exxon responded by building a rigorous safety system that today is used throughout the company, and is credited with preventing another potentially disastrous accident in a deep-water exploration well called Blackbeard, located in the Gulf of Mexico. In 2007, the Blackbeard drilling team voiced major concerns about a potential blowout due to the extreme temperature and pressures of drilling at a depth of more than 30,000 feet. Exxon’s then-chairman, Rex Tillerson, sided with the drillers and abandoned the well. 16

The considerable value of this decision came into focus in 2010 when a competitor, BP, suffered an explosion on the Deepwater Horizon drilling rig, killing 11 workers and dumping roughly 4.9 million barrels of oil into the Gulf of Mexico. Root causes of this accident identified by the National Commission on the BP Deepwater Horizon Oil Spill and Offshore Drilling were management decisions that increased risk, poor communication between BP and its key contractor, Halliburton, and failure to communicate lessons learned about an earlier near-miss incident involving another contractor called Transocean. 17 As these incidents demonstrate, the implementation of effective safety systems that are built from RCA findings can prevent disasters.

Sharing RCA findings led to evidence-based recommendations to prevent adverse events in patient safety.

Routine use of RCA, and regular reporting and sharing of findings, has improved recommendations to protect patient safety. Since 1996, the Joint Commission (a nonprofit organization that accredits hospitals and other medical services) has required accredited health care organizations to conduct an RCA for all sentinel events: patient safety events, unrelated to the natural course of the patient’s illness, that lead to death, permanent harm, or severe temporary harm. Accredited health care organizations are also encouraged to report sentinel events so that they may be added to the commission’s Sentinel Event Database. Sentinel event statistics are reported yearly, but because fewer than 2 percent of all sentinel events are reported, no conclusions can be drawn about frequency or trends. 18 However, reported information is analyzed by the commission, and recommendations are disseminated to reduce the risk of future sentinel events. For example, a study on objects unintentionally left in patients’ bodies after surgery (the most frequently reported sentinel event) resulted in recommendations to improve surgical safety that address the most commonly identified contributing factors. 19

The Impact of Root Cause Analysis on Patient Safety

A 2013 study on the relationship between use of RCA and patient safety at 139 U.S. Department of Veterans Affairs (VA) medical centers found that rates of postoperative complications were higher at centers that performed fewer RCAs. 20

This study was made possible through standardization of RCA for patient safety and creation of a framework to share results and corrective actions. The VA National Center for Patient Safety (NCPS) developed and mandated an RCA process for all VA hospitals, and maintains a confidential reporting system, the Patient Safety Information System, for patient safety incidents and RCAs. 21 The VA NCPS maintains the information system to drive continuous improvement in this sector. This work demonstrates the potential value in standardization of food safety RCA procedures and in creating a platform to share findings so that industrywide analyses can be performed that provide similar assessments.

Routine RCA use allowed the recreational diving community to identify and mitigate common risk factors.

The Divers Alert Network (DAN) was created to provide real-time emergency assistance as well as access to relevant information and education to the recreational diving community. 22 Dive incidents (such as equipment failures, human errors, or entrapment underwater) are routinely reported to DAN and analyzed, and the root causes are shared in incident reports posted on the network’s website. This provides valuable and actionable information to the recreational diver community. 23

For instance, in 2012, DAN researchers conducted an RCA of approximately 1,000 diving accidents to understand what events led to diver fatalities. 24 They found cardiac incidents caused at least one-quarter of all dive fatalities. Risk factors for cardiac fatalities were age (greater than 40 years old) and existing cardiovascular disease. 25 No other triggers (such as equipment problems) were associated with these fatalities. In response, DAN focused considerable outreach and education efforts on the risks of experiencing cardiac incidents during diving as well as available mitigation options.

D. History of modern food safety and opportunities for further integration of RCA

A new era of food safety oversight began with the development of hazard analysis and critical control points (known as HACCP), a systems analysis approach to food safety developed in the 1960s by the Pillsbury Co. when it began providing food for space expeditions. A HACCP system analyzes a food product’s processing steps, identifies points where hazards (such as biological, physical, or chemical contamination) are reasonably likely to occur, and designs procedures to prevent or control these risks. The U.S. Department of Agriculture (USDA) Food Safety and Inspection Service (FSIS) put in place HACCP requirements for meat and poultry operations in the mid-1990s, and the FDA Food Safety Modernization Act requires a similar approach for FDA-regulated food processing plants.

Prevention is an integral concept in HACCP, and existing federal agency directives and regulations instruct food establishments and government investigators to identify the causes of incidents and to establish actions to prevent recurrence. 26 In these cases, RCA can be used to support the development of preventive actions, to strengthen HACCP plans, and to fulfill agency directives. Although there are differences between HACCP and RCA, both are systematic analyses that originated in engineering disciplines, both try to detect and eliminate risks, and both are fundamental to food safety. HACCP is prospective in predicting risks and developing control strategies for those risks, whereas RCA is retrospectively initiated in response to an incident and attempts to prevent a recurrence by investigating why a control strategy failed or why a risk was not identified. 27 For example, a hypothetical RCA conducted after a contamination incident found that an equipment breakdown caused a control step (e.g., heat treatment) to be bypassed. An employee failed to monitor the equipment per the HACCP plan and identify that the control step was not met. Root causes of the incident were inadequate training due to language barriers and company policies that failed to address language differences in employee trainings.
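The hypothetical example above can be written down as a small chain of "why?" answers. The Python sketch below is illustrative only: the `Cause` class, the `root_causes` and `contributing_factors` helpers, and the way the example is nested are assumptions made for this illustration, not part of any HACCP or RCA standard. It treats the deepest recorded answers as root causes and the intermediate answers as contributing factors.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cause:
    """One answer to a "why?" question (hypothetical structure for illustration)."""
    description: str
    # Deeper answers to "why did this happen?"; empty when the chain stops here.
    why: list[Cause] = field(default_factory=list)


def root_causes(cause: Cause) -> list[str]:
    """Return the deepest recorded causes (the leaves of the tree)."""
    if not cause.why:
        return [cause.description]
    found = []
    for deeper in cause.why:
        found.extend(root_causes(deeper))
    return found


def contributing_factors(event: Cause) -> list[str]:
    """Return intermediate causes: below the event itself, above the leaves."""
    found = []
    queue = list(event.why)
    while queue:
        node = queue.pop(0)
        if node.why:  # has a deeper explanation, so it is not a root cause
            found.append(node.description)
            queue.extend(node.why)
    return found


# One plausible nesting of the hypothetical incident described in the text.
incident = Cause(
    "Contamination incident",
    why=[
        Cause(
            "Equipment breakdown caused the heat-treatment control step to be bypassed",
            why=[
                Cause(
                    "Employee did not monitor the equipment per the HACCP plan",
                    why=[
                        Cause("Inadequate training due to language barriers"),
                        Cause("Company policies did not address language "
                              "differences in employee trainings"),
                    ],
                )
            ],
        )
    ],
)

print("Contributing factors:", contributing_factors(incident))
print("Root causes:", root_causes(incident))
```

In this sketch a cause is reported as a root cause only because no deeper "why?" has been recorded for it. In a real investigation, deciding when to stop asking "why?" is a judgment made by the investigation team.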

Since 1961, the Centers for Disease Control and Prevention (CDC) has been reporting foodborne disease outbreaks and routinely analyzing data to identify outbreak sources. Early outbreak investigation reports were based on inspector observations at retail establishments, and focused more heavily on detecting formal code violations than on factors that contributed to the outbreak. Because not all contributing factors or root causes of an outbreak were formal violations, and not all violations that caused unsanitary conditions led to the outbreak, true root causes were rarely recorded. Recognizing limitations in outbreak reporting, CDC began to focus efforts on identifying causative factors rather than violations and to integrate systems theory into investigations by way of HACCP principles. 28 Outbreak-related information was initially grouped into five broad categories of contributing factors: 29

  1. Improper holding temperatures of food.
  2. Improper cooking temperatures of food.
  3. Using contaminated utensils and equipment.
  4. Poor health and hygiene of food handlers.
  5. Obtaining food from unsafe sources.

CDC incorporated this list into its outbreak reporting forms in the 1990s and continues to use an updated version that groups contributing factors into three broad categories: contamination (the pathogen or hazard gets into food), proliferation (the pathogen grows), and survival (the pathogen survives a processing step designed to reduce it). 30 Contributing factors are not root causes; root causes are the reasons why the contributing factors occur. 31 Nevertheless, this move by CDC was an important step forward in identifying potential causes of outbreaks and strategies to prevent recurrence.

To further improve outbreak investigations, CDC’s National Center for Environmental Health (NCEH) began incorporating root cause concepts (also referred to as environmental antecedents) into its environmental assessment process in the early 2000s. 32 Environmental assessments determine how and why foodborne illness outbreaks occurred and are used here synonymously with RCA. FDA and FSIS incorporated principles of CDC’s environmental assessment guidance into their foodborne disease outbreak investigations in the late 2000s. Results from these investigations show that they are subject to four important challenges, which are to some extent unique to food: complexity and interconnectivity of food supply chains, requiring high levels of critical thinking skills; time lag between food preparation and recognition of a foodborne illness outbreak, resulting in difficulties reconstructing events and determining food consumption histories; distinctions between regulatory infractions and contributing factors or root causes; and seasonality in the growing of food, where production operations may have ceased by the time an investigation is launched. 33

Product, pathogen, and consumer-specific factors including unexpected consumer behaviors can further complicate the analysis of foodborne illness outbreaks. Cross-contamination is another potential challenge for RCAs that is unique to food. For example, in an outbreak of Shiga toxin-producing E. coli, wheat likely became contaminated in the field, then was shipped to flour processing facilities and mixed in large silos with flour from other suppliers. 34 Investigators traced the outbreak back to one flour company, but assessing the root causes at the farm level was not feasible given the mixing of multiple sources of flour and lack of traceability back to the field of origin. As another example, in several Listeria outbreaks, cross-contamination in retail delis has posed similar challenges in identifying the root causes. 35 Deli slicers and knives that have not been properly cleaned and sanitized may spread Listeria from a single contaminated meat or cheese product to many other items sold in the deli. Because Listeria primarily sickens high-risk population groups—such as pregnant women, the elderly, and immunocompromised individuals—contaminated food will likely only cause an outbreak if consumed by these people. By chance, the contaminated food items that initially brought the bacteria into the deli may have only been consumed by healthy individuals and not caused any illness, while the food items consumed by susceptible individuals and associated with the Listeria outbreak may have been cross-contaminated at the deli, thus limiting the ability to identify root causes for the outbreak.

Fresh food products with short shelf lives pose another particular challenge for conducting effective RCAs. 36 Fresh produce is sold and consumed in a brief time frame and is usually gone from store shelves and homes by the time an outbreak is identified. Investigations of contamination in an open farm setting may involve many uncontrolled factors (such as weather or wild animal invasion) that are difficult to reconstruct, and the cause-and-effect relationships may be too complex to fully understand even through a systematic RCA approach. Corrective actions for produce may involve a wide spectrum of interventions from the farm to the consumer; correcting only one potential cause may not be sufficient to mitigate the contamination risk. 37

All of these challenges require creative solutions, which are discussed further in Section IV, subsection F, “What factors should be included in the design of a robust corrective or preventive action plan?”

Root Cause Investigations Spur Policy Changes in Food Industry

Findings from foodborne illness outbreak investigations have led to policy changes to improve food safety, as these four examples show. Investigation of a 2006 outbreak of E. coli O157:H7 illnesses associated with fresh spinach identified wildlife intrusion into fresh produce fields as a likely source of contamination and established a basis for improved good agricultural practices that emphasize wildlife barriers to control animal populations in the field. 39 A 1994 outbreak of Salmonella illnesses associated with commercially produced ice cream revealed cross-contamination during transportation of ice cream pre-mix and contributed to the passage of the 2005 Sanitary Food Transportation Act, which requires FDA to regulate food transportation practices. 40 Investigation of a 1989 multistate outbreak of Salmonella illnesses associated with mozzarella cheese identified the root cause as a breakdown in sanitary procedures that occurred as the manufacturer approached bankruptcy. 41 In restaurant settings, investigations of numerous norovirus outbreaks associated with ill food workers have identified multiple reasons why food workers work while ill, leading to broader policy changes that range from maintaining employee illness logs to mandating provision of employee sick leave. 42

E. Methodology

The recommendations included in this guide were informed by the opinions and suggestions of food safety experts, review of the available literature, working groups, and discussions with RCA experts from other industries. In three separate day-long, in-person meetings, The Pew Charitable Trusts convened key food safety experts and stakeholders from federal, state, and local health agencies, the food industry and relevant trade associations, academia, and consumer protection and public health organizations to discuss the advantages of—and barriers to—conducting food safety RCAs; steps for conducting a successful root cause investigation; and strategies to communicate findings. In addition to these meetings, Pew convened smaller working groups remotely to address key issues identified during the in-person meetings.

This guide is organized by four major questions based on themes that arose from the discussion:

  1. What is an RCA?
  2. What should be considered before conducting an RCA?
  3. How should an RCA be conducted?
  4. How should findings from an RCA be communicated?

II. What is an RCA?

For this document, an RCA, or environmental assessment, is defined as a retrospective investigation method used to identify why an incident occurred. 38 An incident can be an outbreak, an event that could have caused microbial, chemical, or physical contamination, a processing failure, or a food safety system failure. The goal of this type of investigation is to determine the factor(s) underlying the problem and identify actions that can be taken to eliminate the problem, prevent its recurrence, and ultimately reduce the risk of foodborne illness. To accomplish these goals, the investigation team should take the necessary steps to identify the actual root causes, or environmental antecedents, of the problem and not just the contributing factors.

Key Definitions

Root cause analysis (also called environmental assessment): a retrospective investigative tool used to determine the underlying reason(s) that caused an incident and what actions can be taken to eliminate the problem, prevent recurrence, and reduce risk. Investigation is used synonymously with root cause analysis in this document.

Root cause (also called environmental antecedent): the underlying reason(s) that resulted in a system breakdown. If the root cause had not occurred, the event would not have occurred or would have been of significantly lower impact.

Contributing factor: the physical, biological, behavioral, or attitudinal factors that directly or indirectly resulted in an outbreak or other incident.

A. What is the difference between a contributing factor and a root cause?

Distinguishing between root causes and contributing factors is crucial to ensuring that the investigation has been sufficiently thorough to arrive at the root causes. 43 A contributing factor is what went wrong, whereas a root cause is why it went wrong. Oftentimes, contributing factors are also violations of food safety regulations (such as improper holding temperature). Inspections or incident investigations should continue after food safety violations are identified to determine why the violation occurred. 44 Root cause findings may or may not be clear regulatory violations; however, ending investigations at violations or contributing factors will diminish the prevention power of the RCA approach.

Control of contributing factors without addressing the underlying reason why they were present can result in a repetitive cycle of short-term correction followed by gradual loss of food safety controls and recurring problems. Making this distinction encourages prevention of the problem rather than just mitigation of the effects of an incident.

To illustrate how to distinguish contributing factors from root causes, here are two hypothetical examples of contamination in different food settings:

Example 1: A processed food is recontaminated after pasteurization and enters the market.

Example 2: Salmonella-contaminated tree nuts cause an outbreak.

Why Accidents Happen Despite Safety Barriers

James Reason’s Swiss cheese model is a frequently used illustration of the difficulties in understanding incidents and fixing root causes in complex systems. 45 Many layers of defense, or slices of Swiss cheese, may be erected as barriers to prevent problems, but each of these slices has holes due to unintended weaknesses. Sometimes these holes align across each barrier to allow a hazard to slip through and cause an accident. Corrective action plans can be adapted to address root causes; however, cementing these changes in complex food systems is not an easy task. An organizational culture that supports constant vigilance and provides tools to help staff validate corrective actions is essential for strengthening food safety practices.
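A rough way to see why layered barriers help, and why one weakened layer matters, is to multiply the chance that each barrier fails. The Python sketch below is a simplification for illustration only: the barrier names and failure probabilities are hypothetical, and the calculation assumes barriers fail independently of one another, which the Swiss cheese model itself does not guarantee.

```python
import math


def breach_probability(hole_probabilities):
    """Chance that a hazard passes every barrier.

    Assumes each barrier fails independently, so the individual
    failure probabilities can simply be multiplied together.
    """
    return math.prod(hole_probabilities)


# Hypothetical per-barrier failure probabilities (not taken from any data set).
barriers = {
    "supplier verification": 0.05,
    "heat treatment": 0.01,
    "finished-product testing": 0.20,
}
baseline = breach_probability(barriers.values())

# Same system, but the heat-treatment step now fails half the time,
# for example because monitoring of the equipment has lapsed.
weakened = dict(barriers, **{"heat treatment": 0.50})
degraded = breach_probability(weakened.values())

print(f"baseline: {baseline:.4%}")
print(f"with weakened heat treatment: {degraded:.4%}")
```

With these made-up numbers the chance of a breach rises from about 1 in 10,000 to about 1 in 200 when a single barrier degrades. A shared root cause, such as inadequate training that affects several steps at once, would make barrier failures correlated, so the independence assumption tends to understate the real risk.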

B. In what situations can an RCA be performed?

Because of their broad origins in a variety of industries, RCAs can be used to investigate a wide range of adverse events affecting safety or quality, from singular, unusual events to patterns of recurrences, in a variety of food industry settings. An RCA can be helpful any time a food company or regulatory agency needs to know why a foodborne illness outbreak or other incident occurred and how to prevent it from recurring. No event is too big or too small for an RCA; however, it is important that the investigation techniques and approaches be appropriately tailored to match the scope and significance of the event. To be most effective in preventing food safety failures, RCAs should be integrated within a self-evaluating organizational culture and linked with existing continuous process improvement initiatives.

The following examples illustrate who may conduct an RCA in response to an incident in a few common situations:

However, the investigation team is often made up of state and local regulatory and public health agencies from the affected states. If the company involved is a federally regulated facility, state and local regulatory and public health agencies would play a large role in the outbreak investigation but a limited role in the RCA. Federal agencies would play a major role in RCAs at federally regulated facilities. Federal and state public health officials may be integrated into an Incident Command System (ICS), allowing personnel from multiple agencies to collaborate rapidly under a common management structure. 46

C. What are the advantages of performing an RCA? What are the challenges?

Advantages

The findings from RCAs are critically important for understanding what went wrong in a food safety operation so that corrective actions can be implemented. Findings from RCAs should also indicate what went right, such as where safety measures and other aspects of the production system were operating as intended and prevented or mitigated the impact of an event.

By adopting a systems-based approach, companies can equip their employees to better evaluate interdependent factors that can affect food safety (such as environment, facility, supply chain, equipment, or employee behaviors), actively consider and anticipate risk, and foster a food safety culture that prevents contamination rather than responds after it occurs. 47 In addition to adopting a standardized RCA practice, companies may also develop a consistent framework for sharing investigation results and lessons learned, both within their organization and more broadly with other industry partners and regulatory agencies. This practice encourages positive working relationships among food companies, regulators, and public health investigators, in which industry engages in self-reporting and the agencies work with the industry to examine the data and provide guidance on analysis.

There are economic advantages to conducting an RCA. Strict liability laws mean that responsibility for foodborne outbreaks falls most heavily on the food industry. Single cases of severe illnesses like hemolytic uremic syndrome associated with E. coli O157:H7 can easily result in liability costs exceeding several million dollars per case. 48 Experience in the energy sector suggests that RCAs can help prevent financial losses of this kind. An operator of more than 17,000 miles of crude oil and natural gas pipeline in North America analyzed its internal RCAs and found them to be a cost-effective process for the company. One RCA generated an estimated $16 million in savings. 49 RCA can also help identify quality issues and inefficiencies that affect production costs. Optimizing a company’s food safety system is beneficial in a highly competitive economic environment.

RCA also has value if it is not immediately clear why an incident occurred or how recurrences can be prevented. In certain cases, an RCA may reveal additional vulnerabilities in systems used to implement corrective actions even if the factors that led to the event are known.

The Economic Impact of Foodborne Illness Outbreaks and Recalls

Outbreaks and product recalls can cost companies millions of dollars and damage brand reputations. 50

For example, the 2010 outbreak associated with shell eggs cost the industry more than $100 million that September alone due to price drops from negative media attention. 51 In addition, the recall costs of a multistate outbreak in 2007 linked to Salmonella-contaminated peanut butter were $78 million, with an estimated cost to the entire peanut industry of $1 billion. 52 In retail, with lost sales, legal fees, lawsuits, and fines, the costs of a large foodborne illness outbreak may exceed a restaurant’s annual revenue. 53

Challenges and potential solutions

RCAs tend to be time- and resource-intensive. Because not all RCAs provide an equal amount of insight and value, organizations need to develop a risk-based framework to systematically and consistently prioritize incidents for RCA. Factors used to prioritize RCAs in other industries include the actual or potential impact of the event and the likelihood of recurrence. 54 Events due to known contributing factors, in particular those that are outside of the organization’s control, may receive a lower priority compared to those due to novel contributing factors. This framework should also emphasize the opportunity to target key causes in circumstances in which previous RCAs had multiple potential root causes, particularly where similar events occurred repeatedly.
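The prioritization logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1-to-5 rating scales, the discount weights, and the names `Incident`, `priority_score`, and `rank_incidents` are hypothetical and do not come from this guide or its sources. An organization would substitute its own criteria and calibrate them against past incidents.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    impact: int           # actual or potential impact, 1 (minor) to 5 (severe); assumed scale
    recurrence: int       # likelihood of recurrence, 1 (unlikely) to 5 (very likely); assumed scale
    novel_factors: bool   # True if contributing factors have not been seen before
    within_control: bool  # True if the organization can act on the contributing factors


def priority_score(incident: Incident) -> float:
    """Score an incident for RCA prioritization (higher means investigate sooner)."""
    # Base score combines the two factors named in the text: impact and recurrence.
    score = float(incident.impact * incident.recurrence)
    if not incident.novel_factors:
        # Known contributing factors receive a lower priority; those outside the
        # organization's control are discounted further. Both weights are
        # illustrative assumptions, not recommended values.
        score *= 0.75 if incident.within_control else 0.5
    return score


def rank_incidents(incidents: list[Incident]) -> list[Incident]:
    """Return incidents ordered from highest to lowest priority."""
    return sorted(incidents, key=priority_score, reverse=True)


if __name__ == "__main__":
    backlog = [
        Incident("Known supplier issue", 5, 4, novel_factors=False, within_control=False),
        Incident("Novel pathogen finding", 5, 4, novel_factors=True, within_control=True),
        Incident("Minor labeling near miss", 2, 2, novel_factors=True, within_control=True),
    ]
    for item in rank_incidents(backlog):
        print(item.name, priority_score(item))
```

A multiplicative score is used here only because it keeps high-impact, high-recurrence events at the top of the list; a risk matrix or weighted checklist would serve the same purpose.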

There are numerous barriers to successfully performing RCAs, including staff’s unwillingness to participate, interprofessional differences, and a lack of time. 55 Additional factors that may hinder investigations include lack of: 56

Small and midsize operations often lack the resources large operations can summon to conduct investigations, and they may not be able to conduct full RCAs, particularly for smaller incidents or near misses. These companies could benefit from the sharing of findings from RCAs conducted by large companies in response to outbreaks and near misses. 57 In addition, access to specialized external expertise could be valuable. The University of Minnesota’s “Team Diarrhea” could be one interesting model for extending resources while simultaneously providing valuable learning opportunities for student investigators. 58 Team Diarrhea is composed of students from the University of Minnesota’s School of Public Health who are extensively trained by the Minnesota Department of Health to interview people whose foodborne illnesses have been identified through surveillance. A similar student team approach, either with responsibilities expanded from interviewing to RCA investigations or with separate RCA teams created to collaborate with food producers, could enhance the efficiency of state health departments and provide much-needed resources to smaller operations. 59

Communicating the results of RCAs may present a challenge for companies concerned about enforcement actions, liability, confidentiality, and brand protection. Companies may withhold or delay the release of RCA results over concerns about confidential or private data or uncertainty about the backlash that disclosure could bring. 60 Regarding liability, there are concerns that public disclosure of RCA results could expose companies to consumer litigation. But companies may be able to explore ways to share findings without disclosing confidential commercial information, such as by adapting existing expert forums and roundtables or by drawing on tools from sources such as the Global Food Safety Initiative, trade associations, the USDA Agricultural Research Service, or university extension programs. Platforms for sharing anonymized results may help communicate findings between regulators and industry, between large and small operations, and across industry segments. The potential unintended legal consequences of sharing RCA findings, and strategies for mitigating these risks, are beyond the scope of this guide but are clearly important considerations that deserve further investigation.

Analysis of Repeat Patterns Across Multiple Incidents Helps Identify Root Causes

The NTSB has a long history of assessing similarities across accidents to determine failure patterns and identify root causes. For instance, NTSB pieced together critical information from the investigations into the crashes of United flight 585 in 1991 and USAir flight 427 in 1994, and a nonfatal incident in 1996 involving Eastwind Airlines flight 517 to uncover key evidence that ultimately allowed investigators to identify a mechanical malfunction critical to all three aircraft. 61

Two recent accidents involving new Boeing 737 Max airplanes occurred within five months of each other in Indonesia and Ethiopia, and both planes exhibited similar flight patterns. The similarities led authorities around the world to take immediate action, grounding the aircraft type until the likely root causes can be resolved. In the U.S., the accidents have spurred congressional inquiries into the FAA’s certification program for the new Boeing 737 Max airplane and other potential weaknesses in the aviation safety system. 62

III. What should be considered before conducting an RCA?

A. How should the scale of an RCA be determined?

Stakeholders should develop a shared understanding of when to conduct an in-depth, resource-intensive RCA. Developing robust and evidence-based criteria for scoping an RCA before an incident occurs is important for transparent decision-making regarding whether a food safety system failure requires a large-scale RCA or a more abbreviated analysis. Criteria can include triggers to help staff recognize more serious or complex incidents with high public health significance that require additional resources and expertise to investigate and control.
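One way to make pre-agreed scoping criteria concrete is a simple trigger checklist, sketched below in Python. Everything specific in it is an assumption made for illustration: the trigger names, their descriptions, the threshold of two triggers, and the function name `recommend_scale` are hypothetical placeholders rather than criteria taken from this guide. Stakeholders would define and validate their own triggers before any incident occurs.

```python
# Hypothetical trigger checklist; names, descriptions, and the threshold are
# placeholders that an organization would replace with its own criteria.
TRIGGERS = {
    "severe_illness": "Hospitalizations or deaths are linked to the incident",
    "multiple_jurisdictions": "Cases or product distribution span several jurisdictions",
    "recurring_event": "A similar incident has occurred before",
    "cause_unknown": "Contributing factors are not yet understood",
}


def recommend_scale(observed: set[str], full_scale_threshold: int = 2) -> str:
    """Suggest an investigation scale from the set of triggers observed."""
    unknown = observed - set(TRIGGERS)
    if unknown:
        # Reject anything not on the agreed checklist so that decisions stay
        # consistent and transparent across incidents.
        raise ValueError(f"Unrecognized triggers: {sorted(unknown)}")
    if len(observed) >= full_scale_threshold:
        return "full-scale RCA"
    if len(observed) == 1:
        return "abbreviated RCA"
    return "routine review"


if __name__ == "__main__":
    print(recommend_scale({"recurring_event"}))
    print(recommend_scale({"severe_illness", "cause_unknown"}))
```

The output is a recommendation only; the final decision on scale would still rest with the people responsible for the investigation.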

While there is broad consensus among food safety stakeholders regarding the importance of prevention, several factors can limit the feasibility and cost-effectiveness of a full-scale RCA. Examples include:

When an RCA is deemed necessary, investigators will need to adjust the intensity and scale of the analysis to meet the needs of the incident to conserve and appropriately deploy resources. Factors that should be considered when deciding how to scale the investigation include: 63

Tools such as checklists can be useful for determining the appropriate scale of the investigation and can help avoid haphazard decision-making (consult resources from patient safety and occupational health 64). For example, the International Association for Food Protection published key factors to consider in the investigation of outbreaks of foodborne diseases. These are organized in tables and can be used by investigation teams to identify and prioritize elements that likely led to contamination so that resources can be used wisely. 65 For instance, if a salmonellosis outbreak is initially linked to a cooked meat product, data from previous outbreaks suggest that a principal cause to investigate would be post-processing contamination by a worker in the facility. 66

Factors Used by NTSB to Scale an Investigation

Following notification of a major aviation accident, the Office of Aviation Safety, in consultation with the safety board, makes the decision to dispatch a core investigation team, called the “go team.” The composition of the go team is then determined, based on the potential scope of the investigation, the magnitude of the tasks, and other factors, including:

Evaluation of initial evidence from the accident is also used to determine whether certain specialists are needed. A full investigation team may be composed of numerous specialists in different areas; for instance, air traffic control, operations, meteorology, human performance, power plants, or metallurgy. Other groups may be formed to interview witnesses or examine the response of aircraft rescue and firefighting personnel. 67

B. Does your organization have sufficient capacity to perform an RCA or have plans and procedures in place for capacity development?

Ideally, the individuals who make up a root cause investigation team will have the opportunity to participate in appropriate training before the need to conduct an RCA arises. (See Appendix A for additional training resources.) Before initiating an investigation, assess the current capacity to conduct RCA for different types of food safety incidents and determine: 68

Capacity to perform RCA can be built internally before any incidents by reviewing available staff and resources, identifying and training a multidisciplinary team that can be mobilized when needed, and identifying a team leader who has specialized training and experience (see Appendix A for training resources). Stakeholders can train front-line investigators in foundational principles, how to recognize indicators of when investigation scaleup is warranted, and how to make appropriate notifications to initiate the expansion when needed. Main steps in capacity development are to: 69

Because multiple disciplines and entities may play roles in an RCA, clearly identifying the relative priority of various objectives and establishing targets for completion are important for creating a shared understanding of what is in and out of the scope of the investigation, and what expertise may be required. Core team members for a major investigation may include:

For a smaller-scale RCA, a facilitator along with a trained investigator or professionals in environmental health and epidemiology may suffice. Companies that do not have ready access to necessary technical experts should consider developing, as part of their scaling criteria (see Section III, subsection A, “How should the scale of an RCA be determined?”), a set of procedures for obtaining access to relevant expertise.

Basic Capacities for an RCA Investigator

The composition of team members along with their roles and responsibilities will differ across RCA teams in different contexts. However, investigators should have the following basic competencies: 70

Evaluate and analyze data: Ability to focus on multiple streams of evidence.

Think critically and solve problems: Ability to identify human factors, equipment issues, or problems in underlying systems that contributed to an incident.

Collaborate and build relationships: Ability to gather evidence from individuals involved in the incident without placing blame.

Communicate effectively: Ability to write and speak clearly to communicate complex causal relationships.

Be flexible: Ability to adapt investigations for near misses or serious accidents to derive benefits gained from investigating a wider range of incidents.

RCA is data-driven, and core team members should be data-oriented with strong problem-solving skills. Some organizations may find new training programs necessary to develop relationship-building and interpersonal communication skills for productive RCAs. In the regulatory context, shifting focus from issuing citations for food safety incidents to building cooperative relationships between companies and public health officials was a successful strategy. 71

Other multidisciplinary team members may be selected when appropriate. The NTSB (see “Factors Used by NTSB to Scale an Investigation,” above) decides on the composition of the investigation team after notification of an accident. Additional experts may be selected based on the investigation scope and the magnitude of the tasks.

C. How long should the RCA take?

The expected length of an RCA depends on many factors, including its priority and the level of resources that will be dedicated to it. An initial assessment of the project timeline, with periodic re-evaluation as the investigation progresses, may be necessary to determine whether projections are realistic. By setting expectations and objectives early, the investigation team can plan according to the amount of time allotted to conduct the analysis, based on available resources and schedules. However, enough time should be allowed to complete the investigation even if unexpected delays occur. Unavoidable delays should not stop the investigation team from releasing findings that may prevent incidents, even before the root causes are identified.

RCAs should begin with the end in mind; identifying the root causes of the incident should always be an explicit investigation objective, even if an analysis cannot be completed because of lack of evidence, support, time, or competing priorities. Ideally, though, the investigation should be allowed to continue for as long as necessary to identify root causes and to share those findings.

NTSB Shares Findings During Investigations

In 1996, four years before the final investigation report on TWA Flight 800 was released, the NTSB distributed urgent safety recommendations once it determined that an explosion in the center wing fuel tank was the probable cause of the accident (a root cause identified later was a certification process that failed to identify the faults in a design concept that placed heat sources below the fuel tank). The early release of urgent safety information likely prevented accidents in other aircraft. 72

IV. How is an RCA conducted?

A. What steps should be taken before performing an RCA?

Data gathering for RCA should begin as early as possible following the incident, while critical evidence can still be secured. The following steps are not part of the RCA itself but are important data-gathering steps that affect the quality of the analysis and are included here for awareness. 73

  1. Remove contaminated product from the marketplace. 74

    When a foodborne outbreak or incident is initially detected, key information is often lacking regarding the food vehicle, source of contamination, contributing factors, and root causes. General control measures are typically used during the early phases of investigations to address the most apparent contributing factors that caused an incident. Once the pathogen or other contaminant is identified, actions may include recall notices issued by industry or government announcements advising the public to destroy any contaminated product in the home (note that identifying the agent is different from understanding how and why it contaminated the food). Specific control measures can be implemented as investigators learn more about the sources, contributing factors, and root causes of the outbreak.
  2. Form the investigation team. 75

Characteristics of Effective Root Cause Analyses

B. What are the steps for conducting an RCA?

Specific steps and procedures for conducting an RCA may vary depending on the organization; different methods may be chosen to identify root causes. General steps are provided below as an overview for those new to the issue, to help with planning and resource allocation, and to ensure a consistent approach. 77

    Collect data and define the problem.

C. What tools are commonly used for conducting RCA?

D. How do you know when you have identified the root cause or causes?

When conducting an RCA, it is important to acknowledge that there typically is no single cause. Often, a combination of several root causes results in the event. In these cases, it may be difficult to identify all root causes and to know when all have been identified. However, a systematic, data-driven investigative approach may increase the likelihood that all root causes are recognized.

To determine whether the root causes have been successfully identified, take the following steps: