Dr. Aritra Dhar
From 11:00 until 12:30
At CAB H 52 (Seminar) + CNB/F/110 (Lunch), ETH Zurich
Abstract:
Driven by recent advances in large language models (LLMs), generative AI applications have become the dominant workload for the modern cloud. Specialized hardware accelerators, such as GPUs, NPUs, and TPUs, play a key role in AI adoption due to their superior performance over general-purpose CPUs. AI models and their data are often highly sensitive and come from mutually distrusting parties.
Existing industry-standard CPU-based TEEs, such as Intel SGX or AMD SEV, do not adequately protect these accelerators. Device TEEs like Nvidia-CC only address tightly coupled CPU-GPU systems, using a proprietary solution that requires a TEE on the host CPU. Existing academic proposals, on the other hand, target specific CPU-TEE platforms.
To address this gap, we propose Ascend-CC, a confidential computing architecture for discrete NPU devices that requires no trust in the host system. Ascend-CC secures data, model parameters, and operator binaries through authenticated encryption. It uses delegation-based memory semantics to ensure isolation from the host software stack, and task attestation to guarantee strong model integrity. Our Ascend-CC implementation and evaluation with state-of-the-art LLMs such as Llama2 and Llama3 show that Ascend-CC introduces minimal overhead and requires no changes to the AI software stack.