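In September 2024, Associate Professor Fangyu Zheng of the School of Cryptology, University of Chinese Academy of Sciences (UCAS), as corresponding author supervising his students, published a research paper titled “ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches” at the International Conference on Cryptographic Hardware and Embedded Systems (CHES 2024).
CHES is a top academic conference on cryptographic engineering and hardware security, organized by the International Association for Cryptologic Research (IACR); its papers represent the highest level of work from international academia and industry on high-performance cryptographic implementation. The paper proposes a method for accelerating post-quantum cryptographic algorithms with AI accelerators, which has strong application prospects given the current trend of migration to post-quantum cryptography. The work was supported by the CCF-Ant Research Fund (No. CCF-AFSG RF20230206).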
Abstract: The remarkable performance capabilities of AI accelerators offer promising opportunities for accelerating cryptographic algorithms, particularly in the context of lattice-based cryptography. However, current approaches to leveraging AI accelerators often remain at a rudimentary level of implementation, overlooking the intricate internal mechanisms of these devices. Consequently, a significant number of computational resources is underutilized.
In this paper, we present a comprehensive exploration of NVIDIA Tensor Cores and introduce a novel framework tailored specifically for Kyber. Firstly, we propose two innovative approaches that efficiently break down Kyber’s NTT into iterative matrix multiplications, resulting in approximately a 75% reduction in costs compared to the state-of-the-art scanning-based methods. Secondly, by reversing the internal mechanisms, we precisely manipulate the internal resources of Tensor Cores using assembly-level code instead of inefficient standard interfaces, eliminating memory accesses and redundant function calls. Finally, building upon our highly optimized NTT, we provide a complete implementation for all parameter sets of Kyber. Our implementation surpasses the state-of-the-art Tensor Core based work, achieving remarkable speed-ups of 1.93x, 1.65x, 1.22x and 3.55x for polyvec_ntt, KeyGen, Enc and Dec in Kyber-1024, respectively. Even when considering execution latency, our throughput-oriented full Kyber implementation maintains an acceptable execution latency. For instance, the execution latency ranges from 1.02 to 5.68 milliseconds for Kyber-1024 on R3080 when achieving the peak throughput.
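To illustrate the general idea of expressing an NTT as a matrix multiplication, here is a minimal Python sketch. It is not the paper's method: it uses a toy 8-point cyclic NTT computed as one dense matrix product, whereas Kyber's actual NTT is a 256-coefficient incomplete negacyclic transform, and the paper decomposes it iteratively and runs it on Tensor Cores. The modulus 3329 and the root 17 are Kyber's; the size `N = 8`, the helper names, and the sample inputs are assumptions chosen for illustration.

```python
Q = 3329   # Kyber modulus
ZETA = 17  # primitive 256th root of unity mod Q, as used by Kyber


def ntt_matrix(n, root):
    """Return the n x n matrix W with W[i][j] = root^(i*j) mod Q."""
    return [[pow(root, i * j, Q) for j in range(n)] for i in range(n)]


def matmul_mod(a, b):
    """Multiply matrices a (r x k) and b (k x c) modulo Q."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) % Q for j in range(cols)]
        for i in range(rows)
    ]


def cyclic_convolution(f, g):
    """Schoolbook product of f and g modulo (x^n - 1, Q), used as a reference."""
    n = len(f)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + f[i] * g[j]) % Q
    return out


N = 8                                     # toy size (assumption, not Kyber's 256)
omega = pow(ZETA, 256 // N, Q)            # primitive N-th root of unity mod Q
W = ntt_matrix(N, omega)                  # forward transform matrix
W_inv = ntt_matrix(N, pow(omega, -1, Q))  # inverse transform matrix (unscaled)
n_inv = pow(N, -1, Q)                     # scaling factor 1/N mod Q

f = [1, 2, 3, 4, 5, 6, 7, 8]
g = [8, 7, 6, 5, 4, 3, 2, 1]

# Batch both polynomials as the columns of one matrix, so that a single
# matrix-matrix product transforms them together.
batch = [[f[i], g[i]] for i in range(N)]
transformed = matmul_mod(W, batch)
f_hat = [row[0] for row in transformed]
g_hat = [row[1] for row in transformed]

# Pointwise multiplication in the NTT domain, then the inverse transform.
prod_hat = [[(a * b) % Q] for a, b in zip(f_hat, g_hat)]
result = [(n_inv * row[0]) % Q for row in matmul_mod(W_inv, prod_hat)]

assert result == cyclic_convolution(f, g)
```

Batching several polynomials as columns turns the transform into a matrix-matrix product, which is the operation shape that matrix-multiply hardware is built to accelerate. A dense product like this costs on the order of N² multiplications per polynomial, so it only shows the algebraic equivalence, not an efficient decomposition.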
Paper information: Tian Zhou, Fangyu Zheng, Guang Fan, Lipeng Wan, Wenxu Tang, Yixuan Song, Yi Bian, Jingqiang Lin, “ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches”, 26th International Conference on Cryptographic Hardware and Embedded Systems (CHES). (CCF-B; a top conference in cryptographic engineering)
(Original paper: https://tches.iacr.org/index.php/TCHES/article/view/11420)