System Hardware Assurance

Authors: Michael Bear, Donald Davidson, Shawn Fetteroff, Darin Leonhardt, Daniel Radack, Karen Johnson, Elizabeth A. McDaniel Contributors: Michael Berry, Brian Cohen, Diganta Das, Houman Homayoun, Thomas McDermott

This article describes the discipline of hardware assurance, especially as it relates to systems engineering. It is part of the SE and Quality Attributes Knowledge Area.

Overview

System hardware assurance is a set of system security engineering activities undertaken to quantify and increase the confidence that electronics function as intended and only as intended throughout their life cycle, and well as to manage identified risks.(***1) The term hardware refers to electronic components, sometimes called integrated circuits or chips. As products of multi-stage processes involving design, manufacturing and post-manufacturing, packaging, and test, they must function properly under a wide range of circumstances. Hardware components – alone and integrated into subcomponents, subsystems, and systems – have weaknesses and vulnerabilities enabling exploitation. Weaknesses are flaws, bugs, or errors in design, architecture, code, or implementation. Vulnerabilities are weaknesses that are exploitable in the context of use [***2].

Hardware assurance is conducted to minimize risks related to hardware that can enable adversarial exploitation and subversion of functionality, counterfeit production, and loss of technological advantage. Challenges include increasing levels of sophistication and complexity of hardware architectures, integrated circuits, operating systems, and application software, combined with supply chain risks, emergence of new attack surfaces, and reliance on global sources for some components and technologies.

After identifying concerns and applicable mitigations, hardware assurance offers a range of possible activities and processes. At the component level, hardware assurance focuses on the hardware itself and the supply chain used to design and manufacture it; at the subcomponent, subsystems, and system levels, hardware assurance incorporates the software and firmware integrated with the component.

Engineering efforts to enhance trust in hardware have increased in response to complex hardware architectures, the increasing sophistication of adversarial attacks on hardware, and globalization of supply chains. These factors raise serious concerns about the security, confidentiality, integrity, provenance, authenticity, and availability of hardware. The “root of trust” [***3] of a system is typically contained in the processes, steps, and layers of hardware components and across the systems engineering development cycle. System hardware assurance focuses on hardware components and their interconnections with software and firmware to reduce risks to proper function or other compromises of the hardware throughout the complete lifecycle of components and systems. Advances in hardware assurance tools and techniques will strengthen designs, and enhance assurance during manufacturing, packaging, test, and deployment.

Lifecycle Concerns of Hardware Components

Hardware assurance should be applied at various stages of a component’s lifecycle from hardware architecture and design, through manufacturing and testing, and finally throughout its inclusion in a larger system. The need for hardware assurance then continues throughout its operational life including sustainment and disposal.

As semiconductor technology advances the complexity of electronic components, it increases the need to “bake-in” assurance. Risks created during architecture, design, and manufacturing are challenging to address during the operational phase. Risks associated with interconnections between and among chips are also a concern. Therefore, improving a hardware assurance posture must occur as early as possible in the lifecycle, thereby reducing the cost and schedule impacts associated with “fixing” components later in the lifecycle of the system.

A conceptual overview of the typical hardware lifecycle (Figure 1) illustrates the phases of the lifecycle of components, as well as the subsystems and systems in which they operate. In each phase multiple parties and processes contribute a large set of variables and corresponding attack surfaces. As a result, the potential exists for compromise of the hardware as well as the subcomponents and systems in which they operate; therefore, matching mitigations should be applied at the time the risks are identified.

INSERT DIAGRAM HERE

Both the value of the hardware component and the associated cost of mitigating risks increase at each stage of the lifecycle. Therefore, it is important to identify and mitigate vulnerabilities as early as possible. It takes longer to find and fix defects later, thereby increasing the complexity of replacing hardware with “corrected” designs. In addition to cost savings, early correction and mitigation avoid delays in creating an operational system. It is essential to re-assess risks associated with hardware components throughout the lifecycle periodically, especially as operational conditions change.

Hardware assurance during system sustainment is a novel challenge given legacy hardware and designs with their associated supply chains. In long-lived high-reliability systems, hardware assurance issues are compounded by obsolescence and diminished sourcing of components, thereby increasing concerns related to counterfeits and acquisitions from the gray market.

Function as Intended and Only as Intended

Exhaustive testing can check system functions against specifications and expectations; however, checking for unintended functions is problematic. Consumers have a reasonable expectation that a purchased product will perform as advertised and function properly (safely and securely, under specified conditions) – but consumers rarely consider if additional functions are built into the product. For example, a laptop with a web-conferencing capability comes with a webcam that will function properly when enabled. What[EAM1] if the webcam also functions when turned off, thereby violating expectations of privacy? Given that a state-of-the-art semiconductor component might have billions of transistors, “hidden” functions might be exploitable by adversaries. The statement “function as intended and only intended” communicates the need to check for unintended functions.

Hardware specifications and information in the design phase are needed to validate that components function properly for systems or missions. If an engineer creates specifications that support assurance that flow down the system development process, the concept of “function as intended” can be validated for the system and mission through accepted verification and validation processes. “Function only as intended” is also a consequence of capturing the requirements and specifications to assure the product is designed and developed without extra functionality. For example, a Field Programmable Gate Array (FPGA) contains programmable logic that is highly configurable; however, the programmable circuitry might be susceptible to exploitation.

Given the specifications of a hardware component, specialized tools and processes can be used to determine with a high degree of confidence whether the component’s performance meets specifications. Research efforts are underway to develop robust methods to validate that a component does not have capabilities that threaten assurance or that are not specified in the original design. Although tools and processes can test for known weaknesses, operational vulnerabilities, and deviations from expected performance, all states of possible anomalous behavior cannot currently be determined or predicted.

Designers, developers, and members of the provider community test the component and provide assurance data. Designers and developers can provide suitably complete descriptions of a design, its fabrication data, and related verification-and-validation data. Provides collect the data to create an assurance assessment for acquirers. Then members of the acquirer, consumer, or user community can conduct static and dynamic acceptance testing, in real or simulated operational environments. The providers and/or users can solicit third-party evaluation to collect data independently from more than one source that there is no unintended functionality in the component.

[EAM1]

Risks to Hardware

Modern systems depend on complex microelectronics, but advances in hardware without attention to associated risks can expose critical systems, their information, and the people who rely on them. “Hardware is evolving rapidly, thus creating fundamentally new attack surfaces, many of which will never be entirely secured.” In this way hardware assurance mirrors cybersecurity because both require mitigations and strategies that evolve as threats do so. Hardware assurance methods seek to raise confidence in the hardware to mitigate known or expected weaknesses or vulnerabilities.

Most hardware components are commercially designed, manufactured, and then inserted into larger assemblies by multi-national companies with global supply chains. Understanding the provenance and participants in complex global supply chains of components is fundamental to assessing risks associated with the components.

Operational risks that derive from unintentional or intentional features are differentiated based on the source of the feature. Three basic operational risk areas relate to goods, products, or items: failure to follow meet quality standards, maliciously tainted goods, and counterfeit hardware. Counterfeits are usually offered as legitimate products, but they are not. They may be refurbished items, mock items made to appear as the originals, re-marked products, or the product of overproduction/substandard production items that the legitimate producer did not intend to go on the market. The impact of counterfeits include ….

Failure to follow quality standards, that include safety and security standards, especially in design, can result in unintentional features or flaws being inadvertently introduced through mistakes, omissions, or lack of understanding about features that might be manipulated by future users for their nefarious purposes. Features introduced intentionally into hardware for specific purposes make them susceptible to espionage or control of the hardware at some point in its life cycle.

Improve the Confidence

One of the key technical challenges associated with hardware assurance is the development of quantifiable metrics and measurements for concepts such as trust and assurance. While quantification is challenging because of the complex interplay between human designers, manufacturing and supply chains, and adversarial intent, it is important so that hardware risks can be identified and managed within program budgets and timeframes. Quantification enables a determination of the required level of hardware assurance, and whether it is successfully achieved throughout the hardware’s lifecycle.

Quantification of hardware assurance begins with a system-level assessment and ranking of hardware by risks and consequences. Criteria for conducting the hardware risk assessment can be based on factors such as criticality of the hardware to system operation or consequence of technology loss by reverse engineering and intellectual property theft.

Current methods for quantifying hardware risk, trust, and assurance emerged from quality and reliability engineering, which rely on methods like Failure Mode and Effects Analysis (FMEA). FMEA, semi-quantitative in nature, relies on a combination of probabilistic data for hardware failure and input from subject matter experts. Adapting FMEA to quantify hardware assurance is hampered when assigning probabilities to human behaviors motivated by economic incentives, malicious intent, etc. Opinions of experts vary when assigning numeric values and weighting factors used in generating risk matrices and scores; consensus processes can help but are not always perfect.

Manage Risks

The selection of specific components for use in subsystems and systems should be the outcome of performance-risk-cost-benefit trade-off assessments in their intended context of use. The goal of risk management and mitigation planning is to select mitigations with the best overall operational risk reduction and the lowest cost impact. During the life cycle of a system – architecture, design, code, or implementation – various types of problems can pose risks to the operational functionality of the hardware components provided. These include weaknesses or defects that are inadvertent (unintentional), counterfeits that are accidental (unintentional) or intentional, e.g. for financial motivations or malicious components designed to change functionality (intentional).

The purpose of managing risk in the context of hardware assurance is to decrease the risk of weaknesses that can be exploited and increased in the attack surface, while increasing confidence that an implementation resists exploitation. Ideally, risk management eliminates risk and maximizes assurance to an acceptable level. Often, risks are considered in the context of likelihood of consequences and the costs and effectiveness of mitigations.

However, new operationally impactful risks are recognized continuously over the hardware life cycle and supply chains of components. Further, hardware weaknesses are often exploited through software or firmware. As such, to maximize assurance and minimize operationally impactful risks, mitigation in depth across all constituent components must be considered.

An example of a mitigation to a hardware weakness is the use of programmable logic. Through programmable logic, when a new attack surface is identified, a new configuration for the programmed logic can be loaded to protect the hardware through configurability and adaptability. In this case, the programming functions must be assured such that they cannot be exploited for unintended purposes. In this case, a dynamic risk profile highlights the need for flexibility in hardware configuration to provide extensible mitigation. Specifically, a dynamic risk profile highlights the need to reduce the susceptibility of hardware to obsolescence-related risks and weaknesses over its life cycle. Similarly, such an extensible mitigation provides the means to mitigate defects discovered post-fabrication.

Just as with software patches and updates, new attack surfaces on hardware may become exposed through the mitigation being applied, but they will likely take a long time to discover. In the example above, the programmable logic is updated to provide a new configuration to protect the hardware. In this context, access to hardware reconfiguration must be limited to authorized parties to prevent an unauthorized update that introduces weaknesses on purpose. While programmable logic may have mitigated a specific attack surface or type of weakness, additional mitigations are needed to minimize risk more completely. This is mitigation-in-depth – multiple mitigations building upon one another.

Throughout the entire supply chain, critical pieces of information can be inadvertently exposed. The exposure of such information directly enables the creation and exploitation of new attack surfaces. Therefore, the supply chain infrastructure must also be aware of weaknesses and work to protect the creation, use, and maintenance of hardware components the dynamic risk profile offers a framework to balance mitigations in the context of risk and cost throughout the complete hardware and system life cycles.

Current Research

Current efforts seek to move from compliance-based systems to risk-based systems to support mitigation-in-depth in situations when compromises are needed to address the increasing complexity of hardware components, intellectual property of hardware interconnected with software and firmware, and approaches. Promising approaches include game theory analysis, use of confidence intervals for detecting counterfeit defects, and distributed ledger technology to hardware manufacturing data to create an immutable record for component provenance and traceability. Efforts are underway to articulate new standards for hardware assurance and methods to leverage quantifiable data to make inform critical system engineering trades.