Difference between revisions of "System Resilience"

From SEBoK

Revision as of 15:45, 1 September 2011

According to the Oxford English Dictionary (OED 1973, p. 1807), resilience is “the act of rebounding or springing back.” This definition most directly fits materials that return to their original shape after deformation. For human-made systems, the definition can be extended to “the ability of a system to recover from a disruption.” The US government definition for infrastructure systems (DHS 2010) is “ability of systems, infrastructures, government, business, communities, and individuals to resist, tolerate, absorb, recover from, prepare for, or adapt to an adverse occurrence that causes harm, destruction, or loss of national significance.” The concept of creating a resilient human-made system, or resilience engineering, is discussed by Hollnagel, Woods, and Leveson (2006). The principles are elaborated by Jackson (2010). Further literature on resilience includes Jackson (2007) and Madni and Jackson (2009).

Topic Overview

The purpose of resilience engineering and architecting is to achieve the full or partial recovery of a system after an encounter with a threat that disrupts the functionality of that system. Threats can be natural, such as earthquakes, hurricanes, tornadoes, or tsunamis. Threats can be internal and human-made, such as reliability flaws and human error. Threats can also be external and human-made, such as terrorist attacks. A single incident can often be the result of multiple threats, such as a human error committed in the attempt to recover from another threat. The attached diagram depicts the loss and recovery of the functionality of a system.

System types include product systems of a technological nature and enterprise systems, such as civil infrastructures. They can be either individual systems or systems of systems. A resilient system possesses four attributes: capacity, flexibility, tolerance, and cohesion. These attributes are adapted from (Hollnagel, Woods, and Leveson 2006). Thirteen top-level design principles have been identified to achieve these attributes. They are extracted from Hollnagel et al. and are elaborated on in (Jackson 2010).

[Figure: Disruption Diagram]
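The loss and recovery of functionality described above can be made concrete with a small sketch. It assumes functionality is sampled at discrete time steps and normalised so that 1.0 is the nominal level; the function name `summarize_disruption` and the sample traces are hypothetical and do not come from the cited sources.

```python
from typing import Optional, Sequence, Tuple


def summarize_disruption(
    levels: Sequence[float], nominal: float = 1.0
) -> Tuple[Optional[int], float, Optional[int]]:
    """Return (onset step, lowest level, recovery step) for a functionality trace.

    onset: first step where functionality falls below nominal (None if never).
    recovery: first step after onset where functionality is back at nominal
    (None if the trace ends first, which corresponds to partial recovery).
    """
    onset = next((i for i, v in enumerate(levels) if v < nominal), None)
    if onset is None:
        return None, min(levels), None
    recovery = next(
        (i for i in range(onset + 1, len(levels)) if levels[i] >= nominal), None
    )
    return onset, min(levels), recovery


# Hypothetical traces: one full recovery and one partial recovery.
full = summarize_disruption([1.0, 1.0, 0.4, 0.5, 0.8, 1.0, 1.0])
partial = summarize_disruption([1.0, 0.3, 0.6, 0.7])
```

In the first trace the system returns to its nominal level; in the second it regains only part of its functionality, which the article also counts as a form of recovery.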

The Capacity Attribute

Capacity is the attribute of a system that allows it to withstand a threat. Resilience allows for the possibility that the capacity of a system will be exceeded, forcing the system to rely on the remaining attributes to achieve recovery. The following design principles apply to the capacity attribute:

  • The absorption design principle calls for the system to be designed to withstand a design-level threat, including adequate margin.
  • The physical redundancy design principle states that the resilience of a system will be enhanced when critical components are physically redundant.
  • The functional redundancy design principle calls for critical functions to be duplicated using different means.
  • The layered defense design principle states that single-point failures should be avoided.

The absorption design principle requires the implementation of traditional specialties, such as Reliability and Safety. Most resilience design principles affect the system design processes, such as architecting. The reparability design principle affects the design of the sustainment system.
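As an illustration of the two redundancy principles in this section, the following minimal sketch tries several providers of the same critical function in order. The altitude example and all names (`first_success`, `gps_altitude`, `barometric_altitude`) are assumptions made for illustration only: two identical GPS providers stand in for physical redundancy, and a barometric provider stands in for functional redundancy, since it obtains the same quantity by a different means.

```python
from typing import Callable, Sequence


def first_success(providers: Sequence[Callable[[], float]]) -> float:
    """Return the result of the first provider that does not raise."""
    errors = []
    for provider in providers:
        try:
            return provider()
        except Exception as exc:  # a failed provider is skipped, not fatal
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


def gps_altitude() -> float:
    raise ConnectionError("GPS signal lost")  # simulated disruption


def barometric_altitude() -> float:
    return 1200.0  # a different means of obtaining the same quantity


# Two identical GPS units (physical redundancy) share the same threat,
# so the barometric source (functional redundancy) supplies the value.
altitude = first_success([gps_altitude, gps_altitude, barometric_altitude])
```

The sketch shows why duplicating by different means matters: a threat that disables one GPS unit is likely to disable its identical twin as well.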

The Flexibility Attribute

Flexibility is the attribute of a system that allows it to restructure itself in the face of a threat. The following design principles apply to the flexibility attribute:

  • The reorganization design principle says that the system should be able to change its own architecture before, during, or after the encounter with a threat. This design principle is applicable particularly to human systems.
  • The human backup design principle requires that humans be involved to back up automated systems, especially when unprecedented threats are involved.
  • The complexity avoidance design principle calls for the minimization of complex elements, such as software and humans, except where they are essential (see the human backup design principle).
  • The drift correction design principle states that detected threats or conditions should be corrected before the encounter with the threat. The conditions can be either immediate, for example the approach of a threat, or latent within the design or the organization.
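The human backup principle above can be sketched as a simple hand-over rule. This is only an illustration under stated assumptions: the idea of a numeric "design envelope" and the names `respond`, `autopilot`, and `operator` are hypothetical, and real systems decide when to involve a human in far richer ways.

```python
from typing import Callable, Tuple


def respond(
    reading: float,
    design_envelope: Tuple[float, float],
    automated: Callable[[float], str],
    human: Callable[[float], str],
) -> str:
    """Use the automated response inside the design envelope, else defer to a human."""
    low, high = design_envelope
    if low <= reading <= high:
        return automated(reading)
    return human(reading)  # unprecedented condition: hand over to the operator


def autopilot(reading: float) -> str:
    return f"auto-adjust for {reading}"


def operator(reading: float) -> str:
    return f"operator review of {reading}"


routine = respond(5.0, (0.0, 10.0), autopilot, operator)
unprecedented = respond(42.0, (0.0, 10.0), autopilot, operator)
```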

The Tolerance Attribute

Tolerance is the attribute of a system that allows it to degrade gracefully following an encounter with a threat. The following design principles apply to the tolerance attribute.

  • The localized capacity design principle states that, when possible, the functionality of a system should be concentrated in individual nodes of the system and stay independent of the other nodes.
  • The loose coupling design principle states that cascading failures in systems should be checked by inserting pauses between the nodes. According to Perrow (1999), humans at these nodes have been found to be the most effective.
  • The neutral state design principle states that systems should be brought into a neutral state before actions are taken.
  • The reparability design principle states that systems should be reparable to bring the system back to full or partial functionality.
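The loose coupling principle above can be illustrated with a deliberately idealised model. It assumes a linear chain of nodes in which a failure spreads to each downstream neighbour in turn, and in which a pause before a node always stops the cascade; the function `cascade` and the node counts are hypothetical.

```python
from typing import List, Set


def cascade(num_nodes: int, start: int, checkpoints: Set[int]) -> List[int]:
    """Return indices of failed nodes in a linear chain.

    A failure at `start` spreads to each downstream neighbour in turn.
    A checkpoint at index i is a pause before node i where the failure
    is assumed to be detected and stopped (an idealisation).
    """
    failed = [start]
    for node in range(start + 1, num_nodes):
        if node in checkpoints:
            break  # the pause gives an operator time to halt the cascade
        failed.append(node)
    return failed


tightly_coupled = cascade(num_nodes=6, start=1, checkpoints=set())
loosely_coupled = cascade(num_nodes=6, start=1, checkpoints={3})
```

Without a pause the failure reaches every downstream node; with a single pause before node 3 it is confined to nodes 1 and 2.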

The Cohesion Attribute

Cohesion is the attribute of a system that allows it to operate as a system before, during, and after an encounter with a threat. According to Hitchins (2009), cohesion is a basic characteristic of a system. The following global design principle applies to the cohesion attribute.

  • The inter-node interaction design principle requires that nodes (elements) of a system be capable of communicating, cooperating, and collaborating with each other. This design principle also calls for all nodes to understand the intent of all the other nodes, as described by Billings (1991).

Practical Considerations

Resilience is difficult to achieve for infrastructure systems because the nodes (cities, counties, states, and private entities) are reluctant to cooperate with each other. Another barrier to resilience is cost. Achieving redundancy in, for example, dams and levees can be prohibitively expensive. Other aspects can be low or moderate cost, such as communicating on common frequencies, but even there cultural barriers have to be overcome for implementation.

Glossary

resilience

capacity

flexibility

tolerance

cohesion

disruption

threat

recovery

References

Billings, Charles. 1991. Aviation Automation: A Concept and Guidelines. Moffett Field, California: National Aeronautics and Space Administration (NASA).

DHS. 2010. DHS [Department of Homeland Security] Risk Lexicon.

Hitchins, Derek. 2009. What are the General Principles Applicable to Systems? Insight, 59-63.

Hollnagel, Erik, David D. Woods, and Nancy Leveson, eds. 2006. Resilience Engineering: Concepts and Precepts. Aldershot, UK: Ashgate Publishing Limited.

Jackson, Scott. 2010. Architecting Resilient Systems: Accident Avoidance and Survival and Recovery from Disruptions. Edited by A. P. Sage, Wiley Series in Systems Engineering and Management. Hoboken, NJ, USA: John Wiley & Sons.

Jackson, Scott. 2007. A multidisciplinary framework for resilience to disasters and disruptions. Journal of Design and Process Science 11:91-108.

Madni, Azad, and Scott Jackson. 2009. Towards a conceptual framework for resilience engineering. Institute of Electrical and Electronics Engineers (IEEE) Systems Journal 3 (2):181-191.

OED. 1973. In The Shorter Oxford English Dictionary on Historical Principles, edited by C. T. Onions. Oxford: Oxford University Press. Original edition, 1933.

Perrow, Charles. 1999. Normal Accidents. Princeton, NJ: Princeton University Press.

Primary References

OED. 1973. In The Shorter Oxford English Dictionary on Historical Principles, edited by C. T. Onions. Oxford: Oxford University Press. Original edition, 1933.

DHS. 2010. DHS [Department of Homeland Security] Risk Lexicon.

Jackson, Scott. 2010. Architecting Resilient Systems: Accident Avoidance and Survival and Recovery from Disruptions. Edited by A. P. Sage, Wiley Series in Systems Engineering and Management. Hoboken, NJ, USA: John Wiley & Sons.

Additional References

Billings, Charles. 1991. Aviation Automation: A Concept and Guidelines. Moffett Field, California: National Aeronautics and Space Administration (NASA).

Hitchins, Derek. 2009. What are the General Principles Applicable to Systems? Insight, 59-63.

Hollnagel, Erik, David D. Woods, and Nancy Leveson, eds. 2006. Resilience Engineering: Concepts and Precepts. Aldershot, UK: Ashgate Publishing Limited.

Jackson, Scott. 2007. A multidisciplinary framework for resilience to disasters and disruptions. Journal of Design and Process Science 11:91-108.

Madni, Azad, and Scott Jackson. 2009. Towards a conceptual framework for resilience engineering. Institute of Electrical and Electronics Engineers (IEEE) Systems Journal 3 (2):181-191.

Perrow, Charles. 1999. Normal Accidents. Princeton, NJ: Princeton University Press.



Signatures

--Bkcase 19:10, 22 August 2011 (UTC) (on behalf of Dick Fairley)