System Hardware Assurance


Authors: Michael Bear, Donald Davidson, Shawn Fetteroff, Darin Leonhardt, Daniel Radack, Karen Johnson, Elizabeth A. McDaniel

Contributors: Michael Berry, Brian Cohen, Diganta Das, Houman Homayoun, Thomas McDermott


This article describes the discipline of hardware assurance, especially as it relates to systems engineering. It is part of the SE and Quality Attributes Knowledge Area.

Overview

System hardware assurance is a set of system security engineering activities undertaken to quantify and increase the confidence that electronics function as intended and only as intended throughout their life cycle, as well as to manage identified risks. The term hardware refers to electronic components, sometimes called integrated circuits or chips. As products of multi-stage processes involving design, manufacturing and post-manufacturing, packaging, and test, they must function properly under a wide range of circumstances. Hardware components – alone and integrated into subcomponents, subsystems, and systems – have weaknesses and vulnerabilities that enable exploitation. Weaknesses are flaws, bugs, or errors in design, architecture, code, or implementation. Vulnerabilities are weaknesses that are exploitable in the context of use (Martin 2014).

Hardware assurance is conducted to minimize risks related to hardware that can enable adversarial exploitation and subversion of functionality, counterfeit production, and loss of technological advantage.  Challenges include increasing levels of sophistication and complexity of hardware architectures, integrated circuits, operating systems, and application software, combined with supply chain risks, emergence of new attack surfaces, and reliance on global sources for some components and technologies.

After identifying concerns and applicable mitigations, hardware assurance offers a range of possible activities and processes. At the component level, hardware assurance focuses on the hardware itself and the supply chain used to design and manufacture it; at the subcomponent, subsystem, and system levels, hardware assurance incorporates the software and firmware integrated with the component.

Engineering efforts to enhance trust in hardware have increased in response to complex hardware architectures, the increasing sophistication of adversarial attacks on hardware, and the globalization of supply chains. These factors raise serious concerns about the security, confidentiality, integrity, provenance, authenticity, and availability of hardware. The “root of trust” (NIST 2020) of a system is typically contained in the processes, steps, and layers of hardware components and across the systems engineering development cycle. System hardware assurance focuses on hardware components and their interconnections with software and firmware to reduce the risk of improper function or other compromises of the hardware throughout the complete lifecycle of components and systems. Advances in hardware assurance tools and techniques will strengthen designs and enhance assurance during manufacturing, packaging, test, and deployment.

Lifecycle Concerns of Hardware Components

Hardware assurance should be applied at various stages of a component’s lifecycle, from hardware architecture and design, through manufacturing and testing, to its inclusion in a larger system. The need for hardware assurance then continues throughout its operational life, including sustainment and disposal.

As semiconductor technology advances, the complexity of electronic components increases, and with it the need to “bake in” assurance. Risks created during architecture, design, and manufacturing are challenging to address during the operational phase. Risks associated with interconnections between and among chips are also a concern. Therefore, improving a hardware assurance posture must occur as early as possible in the lifecycle, thereby reducing the cost and schedule impacts associated with “fixing” components later in the lifecycle of the system.

A conceptual overview of the typical hardware lifecycle (Figure 1) illustrates the phases of the lifecycle of components, as well as of the subsystems and systems in which they operate. In each phase, multiple parties and processes contribute a large set of variables and corresponding attack surfaces. As a result, the potential exists for compromise of the hardware as well as of the subcomponents and systems in which it operates; therefore, matching mitigations should be applied at the time the risks are identified.

Figure 1. Conceptual Overview of the Typical Hardware Lifecycle.

Both the value of the hardware component and the associated cost of mitigating risks increase at each stage of the lifecycle. Therefore, it is important to identify and mitigate vulnerabilities as early as possible. Defects found later take longer to find and fix, increasing the complexity of replacing hardware with “corrected” designs. In addition to cost savings, early correction and mitigation avoid delays in fielding an operational system. It is essential to periodically re-assess risks associated with hardware components throughout the lifecycle, especially as operational conditions change.

Hardware assurance during system sustainment is a novel challenge given legacy hardware and designs with their associated supply chains. In long-lived high-reliability systems, hardware assurance issues are compounded by obsolescence and diminished sourcing of components, thereby increasing concerns related to counterfeits and acquisitions from the gray market.  

Function as Intended and Only as Intended

Exhaustive testing can check system functions against specifications and expectations; however, checking for unintended functions is problematic. Consumers have a reasonable expectation that a purchased product will perform as advertised and function properly (safely and securely, under specified conditions) – but consumers rarely consider whether additional functions are built into the product. For example, a laptop with a web-conferencing capability comes with a webcam that will function properly when enabled. What if the webcam also functions when turned off, thereby violating expectations of privacy? Given that a state-of-the-art semiconductor component might have billions of transistors, “hidden” functions might be exploitable by adversaries. The statement “function as intended and only as intended” communicates the need to check for unintended functions.

Hardware specifications and information from the design phase are needed to validate that components function properly for systems or missions. If an engineer creates specifications that support assurance and that flow down through the system development process, “function as intended” can be validated for the system and mission through accepted verification and validation processes. “Function only as intended” likewise follows from capturing the requirements and specifications needed to assure the product is designed and developed without extra functionality. For example, a Field Programmable Gate Array (FPGA) contains programmable logic that is highly configurable; however, the programmable circuitry might be susceptible to exploitation.

Given the specifications of a hardware component, specialized tools and processes can be used to determine with a high degree of confidence whether the component’s performance meets specifications. Research efforts are underway to develop robust methods to validate that a component does not have capabilities that threaten assurance or that are not specified in the original design. Although tools and processes can test for known weaknesses, operational vulnerabilities, and deviations from expected performance, not all possible states of anomalous behavior can currently be determined or predicted.

Designers, developers, and members of the provider community test the component and provide assurance data. Designers and developers can provide suitably complete descriptions of a design, its fabrication data, and related verification-and-validation data. Providers collect the data to create an assurance assessment for acquirers. Then members of the acquirer, consumer, or user community can conduct static and dynamic acceptance testing in real or simulated operational environments. The providers and/or users can solicit third-party evaluation to independently collect data, from more than one source, confirming that there is no unintended functionality in the component.

Risks to Hardware

Modern systems depend on complex microelectronics, but advances in hardware without attention to associated risks can expose critical systems, their information, and the people who rely on them. “Hardware is evolving rapidly, thus creating fundamentally new attack surfaces, many of which will never be entirely secured” (Oberg 2020). Therefore, it is imperative that risk be modeled through a dynamic risk profile and be mitigated in depth across the entire profile. Hardware assurance requires extensible mitigations and strategies that evolve as threats do. Hardware assurance methods seek to quantify and improve confidence that weaknesses – which can become vulnerabilities and create risks – are mitigated.

Most hardware components are commercially designed, manufactured, and inserted into larger assemblies by multi-national companies with global supply chains. Understanding the provenance and participants in complex global supply chains is fundamental to assessing risks associated with the components.

Operational risks that derive from unintentional or intentional features are differentiated based on the source of the feature. Three basic operational risk areas related to goods, products, or items are: failure to meet quality standards, maliciously tainted goods, and counterfeit hardware. Counterfeits are usually offered as legitimate products, but they are not. They may be refurbished or mock items made to appear as originals, re-marked products, the result of overproduction, or substandard production parts rejected by the legitimate producer. Counterfeit risks include avenues for malware insertion and impacts to system performance and availability.

Failure to follow quality standards, including safety and security standards, especially in design, can result in unintentional features or flaws being inadvertently introduced. These can occur through mistakes, omissions, or a lack of understanding of how features might be manipulated by future users for nefarious purposes. Features introduced intentionally for specific purposes can sometimes also make the hardware susceptible to espionage or control of the hardware at some point in its lifecycle.

Quantify and Improve Confidence  

The quantification of hardware assurance is a key technical challenge because of the complex interplay among designers, manufacturers, supply chains, and adversarial intent, as well as the challenge of defining “security” with respect to hardware function. Quantification is necessary to identify and manage hardware risks within program budgets and timeframes. It enables a determination of the required level of hardware assurance and of whether quantification is achievable throughout the hardware’s lifecycle.

Current methods for quantifying hardware assurance are adapted from the fields of quality and reliability engineering. For example, Failure Mode and Effects Analysis (FMEA) (SAE 2021), which is semi-quantitative, combines probabilistic hardware failure data with input from experts. Adapting FMEA to quantify hardware assurance is challenging, as it relies on assigning probabilities to human behavior motivated by money, malicious intent, etc. Expert opinion often varies when quantifying and weighting the factors used to generate risk matrices and scores. In response, recent efforts are attempting to develop quantitative methods that reduce subjectivity.
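
As a minimal sketch of how FMEA-style semi-quantitative scoring might be adapted to hardware assurance, the following Python fragment computes the classic Risk Priority Number (RPN = severity × occurrence × detection) for a few hardware weaknesses. The failure modes and 1–10 ratings are illustrative assumptions, not values drawn from SAE J1739.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    """One FMEA row: a hardware weakness and its 1-10 expert ratings."""
    name: str
    severity: int    # mission impact if the weakness is exploited (10 = catastrophic)
    occurrence: int  # estimated likelihood the weakness is present and triggered
    detection: int   # difficulty of detecting it before deployment (10 = hardest)

    @property
    def rpn(self) -> int:
        # Classic FMEA Risk Priority Number: severity x occurrence x detection.
        return self.severity * self.occurrence * self.detection

# Illustrative entries -- the modes and ratings are assumptions for this sketch.
modes = [
    FailureMode("counterfeit part in supply chain", severity=8, occurrence=4, detection=7),
    FailureMode("undocumented debug interface", severity=9, occurrence=3, detection=8),
    FailureMode("out-of-spec timing under load", severity=5, occurrence=6, detection=4),
]

# Rank failure modes so mitigation effort goes to the highest RPN first.
for m in sorted(modes, key=lambda m: m.rpn, reverse=True):
    print(f"{m.name}: RPN = {m.rpn}")
```

The subjectivity noted above enters through the expert-assigned ratings; two analysts can produce different rankings from the same evidence.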

Game theoretic analysis (game theory) is the creation of mathematical models of conflict and cooperation between intelligent and rational decision-makers (Myerson 1991). Models include dynamic, as opposed to static, interactions between attackers and defenders that can quantify the risks associated with potential interactions among adversaries, hardware developers, and manufacturing processes (Eames and Johnson 2017). Creation of the models forces one to define attack scenarios explicitly and to input detailed knowledge of the hardware development and manufacturing processes. Outputs of the model may include a ranking of the most likely attacks based on cost-benefit constraints on the attackers and defenders (Graf 2017). The results can empower decision-makers to make quantitative trade-off decisions about hardware assurance.
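
A deliberately simplified sketch of the flavor of such an analysis follows: a static attacker-defender payoff comparison in Python. The attack scenarios, mitigation names, costs, success probabilities, and asset value are invented for illustration; the dynamic models cited above are far richer.

```python
import itertools

# Hypothetical attacker moves and defender mitigations with illustrative costs.
attacks  = {"insert_trojan_at_fab": 50.0, "counterfeit_substitution": 10.0}
defenses = {"split_manufacturing": 30.0, "incoming_part_authentication": 5.0}
asset_value = 100.0  # defender's loss (attacker's gain) if an attack succeeds

# p_success[(attack, defense)] = probability the attack succeeds despite the defense.
p_success = {
    ("insert_trojan_at_fab", "split_manufacturing"): 0.10,
    ("insert_trojan_at_fab", "incoming_part_authentication"): 0.60,
    ("counterfeit_substitution", "split_manufacturing"): 0.50,
    ("counterfeit_substitution", "incoming_part_authentication"): 0.15,
}

# Expected net payoff of every attack/defense pairing.
for a, d in itertools.product(attacks, defenses):
    p = p_success[(a, d)]
    attacker_payoff = p * asset_value - attacks[a]
    defender_loss = p * asset_value + defenses[d]
    print(f"{a} vs {d}: attacker {attacker_payoff:+.1f}, defender loss {defender_loss:.1f}")

# Defender chooses the mitigation minimizing worst-case expected loss (minimax),
# assuming the attacker plays the best available attack against each defense.
best = min(
    defenses,
    key=lambda d: max(p_success[(a, d)] * asset_value for a in attacks) + defenses[d],
)
print("minimax defense:", best)
```

Even this toy version exhibits the key property noted above: the analyst is forced to state attack scenarios and their economics explicitly.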

Another quantification method, which results in a confidence interval for detecting counterfeit/suspect microelectronics, is presented in the SAE AS6171 standard. Confidence is based on knowing the types of defects associated with counterfeits and the effectiveness of different tests at detecting those defects. Along the same lines, a standard for hardware assurance might be developed to quantify a confidence interval by testing against a catalogue of known vulnerabilities, such as those documented in the MITRE Common Vulnerabilities and Exposures (CVE) list (MITRE).
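
The style of calculation can be sketched as follows: given each test’s effectiveness against each defect type, the probability that a defect escapes an entire test battery shrinks multiplicatively, yielding a detection-confidence figure. The defect types, effectiveness values, and the independence assumption below are illustrative, not taken from AS6171.

```python
# Illustrative effectiveness table: probability that each test detects a given
# defect type if it is present. Values are assumptions, not AS6171 data.
effectiveness = {
    "external_visual_inspection": {"remarked_part": 0.70, "refurbished_part": 0.40},
    "x_ray_inspection":           {"remarked_part": 0.20, "refurbished_part": 0.60},
    "electrical_test":            {"remarked_part": 0.50, "refurbished_part": 0.80},
}

def escape_probability(defect: str, tests: list[str]) -> float:
    """Probability the defect escapes every applied test, assuming independent tests."""
    escape = 1.0
    for t in tests:
        escape *= 1.0 - effectiveness[t].get(defect, 0.0)
    return escape

plan = ["external_visual_inspection", "electrical_test"]
for defect in ("remarked_part", "refurbished_part"):
    p_escape = escape_probability(defect, plan)
    print(f"{defect}: detection confidence {1.0 - p_escape:.0%} (escape {p_escape:.0%})")
```

A catalogue of known hardware vulnerabilities could, in principle, play the same role as the defect list in such a table.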

Distributed ledger technology (DLT) is an example of an emerging technology that could enable a standardized approach for quantifying hardware assurance attributes such as data integrity, immutability, and traceability. DLT can be used in conjunction with manufacturing data (such as dimensional measurement, parametric testing, process monitoring, and defect mapping) to improve tamper resistance using component provenance and traceability data. DLT also enables new scenarios of cross-organizational data fusion, opening the door to new classes of hardware integrity checks.  
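
A minimal sketch of the tamper-evidence idea behind such a ledger: each manufacturing record carries the hash of its predecessor, so any later alteration of provenance data breaks the chain. A real DLT adds replication and consensus across organizations, which this single-process example omits; the event names and fields are hypothetical.

```python
import hashlib
import json

def record_hash(record: dict) -> str:
    # Canonical JSON so the same record always hashes identically.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def append_record(ledger: list, payload: dict) -> None:
    """Append a manufacturing event, chaining it to the previous record's hash."""
    prev = ledger[-1]["hash"] if ledger else "0" * 64
    record = {"payload": payload, "prev_hash": prev}
    record["hash"] = record_hash({"payload": payload, "prev_hash": prev})
    ledger.append(record)

def verify(ledger: list) -> bool:
    """Re-derive every hash; any edit to earlier provenance data breaks the chain."""
    prev = "0" * 64
    for rec in ledger:
        expected = record_hash({"payload": rec["payload"], "prev_hash": rec["prev_hash"]})
        if rec["prev_hash"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True

# Hypothetical provenance events for one component lot.
ledger: list = []
append_record(ledger, {"step": "wafer_fab", "lot": "A123", "defect_map": "..."})
append_record(ledger, {"step": "parametric_test", "lot": "A123", "yield": 0.97})
print(verify(ledger))                 # True
ledger[0]["payload"]["lot"] = "B999"  # tamper with earlier provenance data
print(verify(ledger))                 # False
```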

Manage Risks

The selection of specific components for use in subsystems and systems should be the outcome of performance-risk and cost-benefit trade-off assessments in their intended context of use. The goal of risk management and mitigation planning is to select mitigations with the best overall operational risk reduction and the lowest cost impact. The required level of hardware assurance varies with the criticality of a component's use and the system in which it is used.  
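
One way to make such a trade-off concrete is sketched below: rank candidate mitigations by expected risk reduction per unit cost and select greedily within a budget. The mitigations, risk-reduction values, costs, and budget are illustrative assumptions, and greedy selection is a heuristic rather than an optimal solution of the underlying knapsack-style problem.

```python
# Hypothetical mitigations: (name, expected operational risk reduction, cost).
mitigations = [
    ("incoming part authentication", 30.0, 5.0),
    ("trusted foundry sourcing",     50.0, 40.0),
    ("redundant lot testing",        15.0, 8.0),
    ("design-time logic locking",    25.0, 12.0),
]

def select(mitigations, budget: float):
    """Greedy selection by risk reduction per unit cost until the budget is spent."""
    chosen, spent, reduced = [], 0.0, 0.0
    for name, reduction, cost in sorted(mitigations, key=lambda m: m[1] / m[2], reverse=True):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
            reduced += reduction
    return chosen, spent, reduced

chosen, spent, reduced = select(mitigations, budget=30.0)
print(chosen, f"cost={spent}", f"risk reduction={reduced}")
```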

During a typical development lifecycle of a system – architecture, design, code, and implementation – various types of problems can pose risks to the operational functionality of the hardware components provided. These risks include weaknesses or defects that are inadvertent (unintentional); counterfeits, which may enter the supply chain inadvertently or be injected intentionally for financial motives; and malicious components designed to change functionality.

Managing risk in the context of hardware assurance seeks to decrease the risk of weaknesses that create exploitable attack surfaces, while improving confidence that an implementation resists exploitation. Ideally, risk management reduces risk to an acceptable level and maximizes assurance. Often, risks are considered in the context of the likelihood of consequences and the costs and effectiveness of mitigations.

However, new operationally impactful risks are recognized continuously over the hardware lifecycle and across the supply chains of components. At the same time, hardware weaknesses are often exploited through software or firmware. Therefore, to maximize assurance and minimize operationally impactful risks, mitigation-in-depth across all constituent components must be considered. This highlights the need for a dynamic risk profile.

In an example case, a new hardware risk is identified in the operating environment. A dynamic risk profile of a system that is network-accessible can be created to manage the susceptibility of hardware to design and manufacturing defects, as well as obsolescence-related risks. By using programmable logic to re-configure the hardware, mitigation becomes extensible and able to address future, yet unidentified, risks.

Just as with software patches and updates, new attack surfaces on hardware may be exposed by the very mitigation being applied, and they may take a long time to discover. In the example above, the programmable logic is updated to provide a new configuration that protects the hardware. In this context, access to hardware reconfiguration must be limited to authorized parties to prevent an unauthorized update that deliberately introduces weaknesses. While programmable logic may have mitigated a specific attack surface or type of weakness, additional mitigations are needed to minimize risk more completely. This is mitigation-in-depth – multiple mitigations building upon one another.
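
The example can be sketched as a living risk registry in which newly recognized risks are recorded over the lifecycle and mitigations (such as a programmable-logic reconfiguration) may only be applied by authorized parties. All names, fields, and the 1–10 severity scale below are hypothetical illustrations of the concept.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Risk:
    name: str
    severity: int                      # 1-10, illustrative scale
    mitigations: list = field(default_factory=list)

@dataclass
class DynamicRiskProfile:
    """A living registry: risks and mitigations evolve over the system lifecycle."""
    risks: dict = field(default_factory=dict)
    authorized_updaters: set = field(default_factory=set)

    def register_risk(self, risk: Risk) -> None:
        self.risks[risk.name] = risk

    def apply_mitigation(self, risk_name: str, mitigation: str, updater: str) -> None:
        # Reconfiguration (e.g., a programmable-logic update) is itself an
        # attack surface, so only authorized parties may apply it.
        if updater not in self.authorized_updaters:
            raise PermissionError(f"{updater} is not authorized to update mitigations")
        self.risks[risk_name].mitigations.append((mitigation, date.today()))

    def unmitigated(self) -> list:
        return [r.name for r in self.risks.values() if not r.mitigations]

profile = DynamicRiskProfile(authorized_updaters={"sustainment_engineering"})
profile.register_risk(Risk("fault injection on power rail", severity=7))
profile.apply_mitigation("fault injection on power rail",
                         "FPGA bitstream update adding voltage-glitch detector",
                         updater="sustainment_engineering")
print(profile.unmitigated())  # []
```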

Throughout the entire supply chain, critical pieces of information can be inadvertently exposed. The exposure of such information directly enables the creation and exploitation of new attack surfaces. Therefore, the supply chain infrastructure must also be assessed for weaknesses, and the development, use, and maintenance of hardware components assured.  The dynamic risk profile offers a framework to balance mitigations in the context of risk and cost throughout the complete hardware and system lifecycles.

References

Works Cited

Martin, R.A. 2014. “Non-Malicious Taint: Bad Hygiene is as Dangerous to the Mission as Malicious Intent.” CrossTalk Magazine, issue on Mitigating Risks of Counterfeit and Tainted Components, March 2014.

NIST. 2020. “Hardware Roots of Trust.” NIST Information Technology Laboratory, Computer Security Resource Center. Available: https://csrc.nist.gov/projects/hardware-roots-of-trust.

Oberg, J. 2020. “Reducing Hardware Security Risk.” Semiconductor Engineering, July 1, 2020.

SAE International. 2021. SAE J1739: Potential Failure Mode and Effects Analysis (FMEA) Including Design FMEA, Supplemental FMEA-MSR, and Process FMEA.

Myerson, R.B. 1991. Game Theory: Analysis of Conflict. Cambridge, MA: Harvard University Press.

Eames, B.K. and M.H. Johnson. 2017. “Trust Analysis in FPGA-based Systems.” GOMACTech 2017, Reno, NV.

Graf, J. 2017. “OpTrust: Software for Determining Optimal Test Coverage and Strategies for Trust.” GOMACTech 2017, Reno, NV.

MITRE Corporation. “Common Vulnerabilities and Exposures (CVE).” Available: https://cve.mitre.org/cve/.


