Confidential Computing for AI Workloads
An explainer on GPU-CC and its relevance to secure AI exports
Introduction
Two things surprised me at an October 2025 workshop hosted by the Confidential Computing Consortium. First, NVIDIA began researching GPU-level confidential computing six years ago because one customer asked for it. Second, Google’s TPUs do not yet support it.1 The technology exists, but the accelerator landscape is fragmented.
Two trends make accelerator-level confidential computing newly urgent.
First, frontier labs are diversifying their hardware and cloud providers. In November 2025, Anthropic became the first frontier AI company with partnerships across all three major cloud providers: Amazon Web Services, Google Cloud, and Microsoft Azure. It trains and deploys models on custom AWS Trainium and Inferentia chips, Google TPUs, and NVIDIA GPUs. In October 2025, OpenAI announced deals with Broadcom and AMD to supplement its NVIDIA fleet with custom silicon starting in 2026. Each chip family has a different security architecture. Model weights are only as secure as the least secure chip or data center that hosts them.
Second, U.S. export policy is moving toward exporting a full AI stack. The July 2025 AI Action Plan promotes exporting America’s “full AI technology stack—hardware, models, software applications and standards—to all countries willing to join America’s AI alliance.” Executive Order 14320 operationalizes this by requiring full-stack proposals for approved deployments abroad. The vision taking shape is chips and data centers in partner countries, operated by U.S. cloud firms. But this means placing infrastructure in environments where local employees, data center operators, or state actors pose additional security threats.
For frontier AI model weights (the learned parameters that define what an AI model can do), this combination is concerning. These weights represent the core intellectual property of frontier AI labs and, increasingly, national security assets. Security reduces to the weakest link: the least secure chip and data center in the deployment footprint.
In this blog post, I begin with how confidential computing works, what problems it solves, and why businesses adopted it. Then I explain why AI workloads require extending confidential computing from CPUs to GPUs, a development that only became possible in 2023. I turn to the security frameworks that frontier labs use to think about model weight protection, and survey which accelerators currently support the technology. I close with an opportunity for standards innovation and its relevance to policymakers evaluating AI infrastructure security.
Confidential Computing
Traditional security protects data stored on disk (“at rest”) and data moving across a network (“in transit”). But when data is being processed (“in use”), it must be decrypted so a computer can work with it. During processing, the unencrypted data is exposed in the computer’s working memory. Model weights face exactly this vulnerability: during training, they are computed and stored in memory as the model learns, and during inference, they must be loaded into memory.
Several layers of software can access this memory: the operating system, which manages all programs on the machine; the hypervisor, which lets multiple virtual machines share the same physical server; and the cloud provider’s administrative tools. System administrators, cloud provider employees, or attackers who compromise any of these layers can read or modify the data.
This gap has become a major attack surface. As encryption became standard for data at rest and in transit, attackers moved to where encryption stops.
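The gap can be made concrete with a toy sketch. The snippet below uses a throwaway hash-based XOR keystream as a stand-in for real disk encryption (a production system would use AES); every name in it is illustrative. The point is only structural: encrypted bytes are opaque, so computing on the data requires decrypting it into working memory, where the protections of at-rest encryption no longer apply.

```python
import hashlib

# Toy keystream cipher -- NOT a real cipher, just an illustration that
# encrypted data is opaque until decrypted into memory.
def _keystream(key: bytes, n: int) -> bytes:
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

WEIGHTS = b"\x01\x02\x03\x04"                       # stand-in model weights
weights_at_rest = xor_cipher(WEIGHTS, b"disk-key")  # "encrypted on disk"

# To compute with the weights, they must be decrypted into working memory:
weights_in_use = xor_cipher(weights_at_rest, b"disk-key")
assert weights_in_use == WEIGHTS
# At this point, anything that can read this process's memory -- the OS,
# the hypervisor, or administrative tooling -- can read the plaintext.
# This is the exposure that confidential computing is designed to close.
```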
Confidential computing (CC) addresses this vulnerability by protecting data in use. Instead of trusting the cloud provider, system administrators, the hypervisor, and the operating system, confidential computing runs workloads inside hardware-based Trusted Execution Environments (TEEs).
A Trusted Execution Environment is an environment implemented at the processor level that provides security guarantees for the code and data running inside it. Within a TEE, secure enclaves are isolated portions of memory where sensitive data and code can be processed. TEEs provide the comprehensive secure environment at the hardware level, while enclaves are specific protected regions created within that environment.
This shrinks the trusted computing base dramatically. The trusted computing base is the set of components that must be trusted not to leak or tamper with data. For a model running in a cloud region outside an organization’s direct control, this reduced trust requirement becomes powerful.
A TEE must provide three core guarantees:
Data confidentiality: Unauthorized parties cannot view data while it is in use.
Data integrity: Unauthorized parties cannot alter data during processing.
Code integrity: Unauthorized parties cannot modify the running code.
These guarantees would not mean much if one had to trust the cloud provider to confirm that the TEE was genuine. Hardware-based attestation in confidential computing solves this problem. Attestation is a cryptographic protocol that proves what hardware is running, what code is loaded, and whether the TEE is in a trustworthy state. The evidence is signed by keys burned into the chip during manufacturing, making forgery very difficult. Before sending sensitive data to a remote system, one can verify that the enclave is running on genuine hardware and that no one has tampered with it.
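The attestation flow can be sketched in a few lines. This is a deliberately simplified model: real attestation uses an asymmetric key fused into the chip at manufacturing and a certificate chain back to the vendor, whereas the sketch below substitutes a shared HMAC key so it runs with the standard library alone. All names (`device_attest`, `EXPECTED_CODE`, and so on) are illustrative, not any vendor's API.

```python
import hashlib
import hmac
import secrets

DEVICE_KEY = secrets.token_bytes(32)          # stand-in for the fused chip key
EXPECTED_CODE = b"audited-inference-binary"   # code the verifier expects to run

def device_attest(loaded_code: bytes, nonce: bytes) -> dict:
    """TEE side: measure (hash) the loaded code and sign it with the device key."""
    measurement = hashlib.sha256(loaded_code).digest()
    tag = hmac.new(DEVICE_KEY, measurement + nonce, hashlib.sha256).digest()
    return {"measurement": measurement, "nonce": nonce, "tag": tag}

def verify(report: dict, nonce: bytes) -> bool:
    """Verifier side: check the signature, freshness, and measurement."""
    expected_tag = hmac.new(DEVICE_KEY, report["measurement"] + nonce,
                            hashlib.sha256).digest()
    return (hmac.compare_digest(report["tag"], expected_tag)
            and report["nonce"] == nonce
            and report["measurement"] == hashlib.sha256(EXPECTED_CODE).digest())

nonce = secrets.token_bytes(16)               # fresh nonce prevents replay
report = device_attest(b"audited-inference-binary", nonce)
assert verify(report, nonce)                  # expected code, valid signature
tampered = device_attest(b"backdoored-binary", nonce)
assert not verify(tampered, nonce)            # wrong measurement -> rejected
```

Only after `verify` succeeds would a party send model weights or sensitive data to the remote enclave.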
Use Cases
Confidential computing has three main applications, demonstrating its usefulness for AI model providers, users, regulated industries, and software developers:
Generative AI. Every large language model interaction involves parties who do not fully trust each other. Model owners want to protect proprietary weights. Clients want confidentiality for prompts, documents, and outputs. Cloud providers want to host workloads without requiring customers to trust them with raw data. Confidential computing resolves this by running inference inside a hardware-secured TEE. The hardware generates attestation proving that the expected code is running in a protected environment. Model owners and clients can verify this proof remotely before sharing weights or data. The cloud provider runs the workload but cannot see inside the enclave. This is where confidential computing intersects most directly with frontier AI governance, particularly the challenge of keeping model weights secure during inference and in potential export regimes.
Multiparty data collaboration. Confidential computing lets multiple organizations run joint analysis on sensitive data without exposing raw records to each other. In financial services, banks cannot share customer data directly due to privacy rules, but criminals exploit this lack of cross-bank visibility. Confidential computing enables secure data clean rooms for fraud detection and anti-money-laundering, supporting faster pattern detection and stronger compliance. Healthcare and pharmaceutical companies use similar approaches: secure enclaves allow hospitals, research labs, and drug developers to pool patient data that could not otherwise be shared, enabling larger datasets for drug discovery and clinical research while preserving regulatory compliance.
Software supply chain security. Modern software is assembled, not written from scratch. Developers pull in open-source libraries, compile them into a runnable application (a process called a build), and release updates through automated pipelines. Each stage is an opportunity for compromise: an attacker can plant malicious code in a library, alter code during the build, or push fraudulent updates through the release pipeline. Confidential computing can secure this chain. Attested enclaves verify that only approved code and libraries are used, and compilation happens inside a protected environment where attackers cannot modify code mid-process. The enclave also protects the signing key, the credential used to certify that an update is legitimate. The key stays locked inside hardware-protected memory, so only verified applications can be published as official releases.
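The two checks described above can be sketched as follows. Real pipelines use asymmetric release signatures (e.g., Sigstore or ed25519 keys); an HMAC key stands in here for the enclave-held signing key so the example is self-contained, and the dependency names and hashes are invented for illustration.

```python
import hashlib
import hmac
import secrets

# Allowlist of hashes of audited dependency sources (illustrative entry).
APPROVED_LIBS = {hashlib.sha256(b"somelib-1.0 audited source").hexdigest()}

# In a real system this key never leaves hardware-protected enclave memory.
SIGNING_KEY = secrets.token_bytes(32)

def check_dependency(lib_source: bytes) -> bool:
    """Attested build step: only libraries on the approved-hash list are used."""
    return hashlib.sha256(lib_source).hexdigest() in APPROVED_LIBS

def sign_release(artifact: bytes) -> bytes:
    """Inside the enclave: certify a built artifact as an official release."""
    return hmac.new(SIGNING_KEY, artifact, hashlib.sha256).digest()

def verify_release(artifact: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign_release(artifact), sig)

assert check_dependency(b"somelib-1.0 audited source")
assert not check_dependency(b"somelib-1.0 audited source + malicious patch")
sig = sign_release(b"app-v1.0.0")
assert verify_release(b"app-v1.0.0", sig)          # legitimate update
assert not verify_release(b"app-v1.0.1-forged", sig)  # fraudulent update rejected
```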
From CPU to GPU
Confidential computing has been available on CPUs for years: ARM TrustZone arrived in 2004, Intel SGX in 2015, AMD SEV in 2017. CPU confidential computing continues to evolve, with newer technologies like Intel TDX and AMD SEV-SNP arriving in the early 2020s and now widely deployed in major cloud platforms.
The problem for AI workloads is that the computation does not happen on the CPU. When a large language model runs inference, the CPU orchestrates the process, but the heavy computation happens on the GPU. If the TEE protects only the CPU, an attacker who can read GPU memory can extract model weights.
GPU-level confidential computing (GPU-CC) extends the TEE boundary to include the accelerator itself, encrypting weights and intermediate computations and providing hardware attestation that the environment is in a trustworthy state.
Accelerator-level confidential computing is relatively new. NVIDIA’s H100, based on the Hopper architecture, was the first GPU to support confidential computing, with the feature shipping in July 2023 and becoming generally available in 2024.
The successor Blackwell architecture, announced in March 2024, extends GPU-CC protections further, including encrypting traffic between GPUs working together on large models.
Now, what threats do frontier model weights actually face, and where does confidential computing fit into the response?
Security Levels for Model Weights
RAND Corporation’s May 2024 report “Securing AI Model Weights” identifies 38 distinct attack vectors and defines five security levels from SL1 to SL5, each calibrated to defend against increasingly sophisticated adversaries, from amateur hackers to top-priority nation-state operations. For frontier models, it recommends SL4 or SL5 protection.
Model weights represent the culmination of everything that goes into training a frontier AI system: the compute, the data, the algorithmic research. Once an attacker has the weights, they have complete control. They can run the model without restrictions, strip its safety measures, and fine-tune it for harmful purposes. The cost of inference on stolen weights is trivial compared to the cost of training. Weight security cannot rely on a few measures, for the attack surface is too broad.
Confidential computing is one of two points of “overwhelming agreement” among the experts RAND consulted, described as “a strategic next step toward improving model weight security.” Because AI inference runs on GPUs, this means accelerator-level confidential computing. RAND specifies three requirements at SL4:
The TEE must include protections against physical attacks. (RAND notes that current GPU implementations do not, citing NVIDIA’s H100 documentation.)
Model weights must be encrypted by a key generated within the TEE and stored only within the TEE.
The TEE must run only prespecified, audited, signed code that decrypts weights, runs inference, and outputs only the model response. The code cannot output weights, the encryption key, or intermediate values.
RAND acknowledges that “the use of confidential computing in GPUs is still nascent and may not be ‘production-ready’” as of mid-2024 but lists it as an urgent priority “because of an overwhelming consensus by experts regarding its importance.”
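RAND's second and third requirements describe a specific code shape, which the sketch below makes concrete. It is a toy under loud assumptions: a hash-based XOR keystream stands in for hardware memory encryption, the `Enclave` class and its methods are invented names, and inference is faked. What it preserves is the discipline RAND specifies: the key is generated inside the TEE and never exported, and the only output path is the model response.

```python
import hashlib
import secrets

def _stream(key: bytes, n: int) -> bytes:
    # Toy keystream standing in for real hardware memory encryption.
    out, i = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        i += 1
    return out[:n]

class Enclave:
    def __init__(self):
        # Requirement: key generated within the TEE and stored only there.
        # No method ever returns self._key.
        self._key = secrets.token_bytes(32)

    def seal_weights(self, weights: bytes) -> bytes:
        """Encrypt weights under the enclave key; only ciphertext leaves."""
        return bytes(a ^ b for a, b in zip(weights, _stream(self._key, len(weights))))

    def infer(self, sealed_weights: bytes, prompt: str) -> str:
        # Requirement: prespecified, audited code decrypts the weights, runs
        # inference, and outputs ONLY the model response -- never the weights,
        # the key, or intermediate values.
        weights = bytes(a ^ b for a, b in
                        zip(sealed_weights, _stream(self._key, len(sealed_weights))))
        return f"response(prompt={prompt!r}, weights_len={len(weights)})"

enclave = Enclave()
sealed = enclave.seal_weights(b"frontier-model-weights")
assert sealed != b"frontier-model-weights"   # opaque outside the enclave
assert enclave.infer(sealed, "hi").startswith("response(")
```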
The presence of accelerator-level confidential computing is not the same as robust security for high-stakes deployments. GPU-CC on H100s has known limitations: GPU memory is protected by access control rather than runtime encryption, RPC metadata and synchronization structures remain exposed in plaintext, and timing patterns in memory transfers could leak information, among other issues. Hopper-generation GPUs do not support confidential computing for large-scale training workloads, a limitation that Blackwell is beginning to address.
Policymakers should ask how much vendors have invested in security validation in general, including bug bounty programs and external security audits. They should also consider what scale of workloads the hardware can support while maintaining both confidentiality guarantees and acceptable performance. These questions apply to CPU-level confidential computing as well.
Confidential Inference Systems
In June 2025, Anthropic and Pattern Labs (now Irregular) published a whitepaper on confidential inference systems. The paper builds on RAND’s SL4 and SL5 security levels, designed to withstand weight theft attempts by leading cyber-capable institutions and the most capable nation-states, and details design principles and security risks for using confidential computing to protect model weights during inference.
Two aspects of the whitepaper are relevant here.
First, the threat model assumes the service provider is adversarial. The design presumes that attackers have complete control over any machine associated with the inference system, including the hosts where secure enclaves run. The paper frames this as defense against cyber attacks or insider threats, but the same architecture is what would be needed to deploy models to data centers in jurisdictions where local employees or operators might be compelled to cooperate with state intelligence services.
Second, CPU-only TEEs are insufficient. AI inference runs on accelerators, not CPUs. If the confidential boundary protects only the CPU, an attacker who can read GPU memory can extract model weights. The boundary must extend to the accelerator.
The preferred solution is native TEE support within the accelerator: hardware-level isolation, encryption, and attestation built into the GPU. The accelerator receives encrypted model weights, decrypts them internally, processes them in protected memory, and encrypts outputs before sending them back. NVIDIA’s H100 and Blackwell GPUs now ship with this capability. For accelerators that lack native support, the paper describes a fallback architecture that bridges the CPU enclave to the accelerator. The authors are candid about the limitations: this approach is not airtight and may be susceptible to side-channel attacks.
Anthropic’s whitepaper is not an isolated effort. OpenAI’s May 2024 blog post “Reimagining Secure Infrastructure for Advanced AI” listed accelerator-level confidential computing as its first category of secure infrastructure, calling for trusted computing primitives to extend “beyond the CPU host and into AI accelerators themselves.”2 Google DeepMind’s Frontier Safety Framework (Version 1.0, also May 2024) listed “TPUs with confidential compute capabilities” as part of its highest security level, which it maps to approximately RAND SL5, where model weights should be “generally not accessible to humans, even non-unilaterally.”
The confidential computing architecture protects not only model weights but user conversations too. The whitepaper describes “an end-to-end-encrypted communication channel with the AI model, where nobody except the data owner and the AI model itself can access the data.” Weight security and user privacy are two sides of the same design.
The Accelerator Landscape
To ground this discussion, here is a rough mapping of the accelerators powering American frontier labs. Based on publicly available documentation, NVIDIA GPUs appear to be the only accelerators currently offering on-chip confidential computing support. This creates a potential security asymmetry for labs using multiple chip types.
* AMD Instinct GPUs can integrate with AMD SEV-TIO, an extension that brings PCIe devices into the trust boundary of an SEV-SNP confidential virtual machine. PCIe (Peripheral Component Interconnect Express) is the standard interface connecting GPUs, network cards, and other accelerators to the CPU. SEV-TIO uses the TDISP (TEE Device Interface Security Protocol) to establish encrypted and authenticated channels between the CPU’s TEE and the GPU. This CPU-bridged approach is not equivalent to native on-chip TEE support, as the GPU itself has no secure enclave, and security depends on the integrity of the CPU TEE and the PCIe link. Anthropic and Pattern Labs’ whitepaper describes this as a fallback for accelerators lacking native TEE and notes that it is susceptible to side-channel attacks. The dependency on the CPU is a real concern, as recent research has demonstrated attacks against SEV-SNP itself.
* Google’s TPUv7 includes on-chip security features such as an integrated root of trust, secure boot, and secure test/debug capabilities. Google Cloud also offers platform-level protections through confidential VMs (using AMD SEV or Intel TDX on the CPU). However, based on publicly available documentation, TPUs do not currently have on-chip hardware memory encryption.
* AWS’s Nitro System provides strong platform isolation, and AWS also offers attestation that cryptographically verifies trusted software is running before model weights are decrypted. However, Nitro Enclaves have historically operated only in the CPU, and based on publicly available documentation, Trainium and Inferentia do not currently have on-chip hardware memory encryption equivalent to NVIDIA’s GPU-CC. When AWS offers NVIDIA GPUs, customers can use both Nitro’s platform isolation and NVIDIA’s native GPU-CC capabilities.
A lab can restrict its most sensitive weights to hardware with confidential computing support, but maintaining that discipline gets harder as deployments grow more heterogeneous.
The Standards Opportunity
The National Institute of Standards and Technology’s (NIST) IR 8320 series has been developing hardware-enabled security guidance since 2021. The foundational document, IR 8320, established a layered approach to platform security for cloud and edge computing. Subsequent reports covered container platform security prototypes (IR 8320A), policy-based governance for trusted container platforms (IR 8320B), and machine identity management (IR 8320C). The most recent, IR 8320D, focuses on CPU and VM-based TEEs. None of these documents cover accelerator-level confidential computing.
The timing explains the gap. NIST last updated its confidential computing guidance in February 2023. NVIDIA shipped the first confidential GPUs in July 2023. The standards were written months before the technology was available, so they could not account for GPU-CC.
Nearly three years have passed, and the landscape has changed. Accelerator-level confidential computing is maturing. Frontier labs cite it in safety frameworks and publish architectures that depend on it. U.S. export policy asks about security measures for AI infrastructure abroad. Yet the standards have not caught up.
This is an area where the United States could lead. If NIST extended its guidance to cover AI workloads, including model weights, inference infrastructure, and key management at the accelerator level, it would give the ecosystem a shared technical vocabulary for what accelerator-level security means. Updated standards could also inform the AI Exports Program, which already requires security measures but lacks specific guidance on what that security should look like at the hardware level.
The AI Exports Program
Executive Order 14320 describes the full-stack AI technology package as encompassing “(1) AI-optimized computer hardware (e.g., chips, servers, and accelerators), data center storage, cloud services, and networking, as well as a description of whether and to what extent such items are manufactured in the United States; (2) data pipelines and labeling systems; (3) AI models and systems; (4) measures to ensure the security and cybersecurity of AI models and systems; and (5) AI applications for specific use cases (e.g., software engineering, education, healthcare, agriculture, or transportation).”
Accelerator-level confidential computing pertains to components (1) and (4): it is both a hardware capability and a security measure. For export packages involving frontier model weights, hardware-attested encryption is an obvious consideration.
The Department of Commerce and International Trade Administration’s October 2025 RFI on the American AI Exports Program poses nearly thirty questions about how the program should operate. In a comment submitted in December 2025, I demonstrated how accelerator-level confidential computing is relevant to three: how to evaluate technology stack components (Section B, Question 5), what factors bear on national security (Section G, Question 21), and how to promote technical standards abroad (Section I, Question 27).
Conclusion
In 2023, the NVIDIA team that built the first confidential GPUs wrote: “Today, confidential computing is a great innovation. In a few years’ time, we expect all computing will be confidential, and we will all wonder why it was ever any other way.”
For frontier AI model weights, the question is whether that future arrives in time.
Acknowledgements
Thank you to Mauricio Baker, Liya Palagashvili, Sabrina Shih, and Anjay Friedman for comments.
1. Earlier last year, I read two blog posts on TPUv7/Ironwood that suggested otherwise. The Google representative at the workshop confirmed that TPUs do not yet support accelerator-level confidential computing. She noted that Google Cloud Platform (GCP) does offer NVIDIA GPUs with GPU-CC support, and that more demand might get TPU-CC on the roadmap.