WOLF Advanced Technology outlines the evolution of NVIDIA® GPU architectures from Turing to Blackwell, focusing on high-end embedded GPUs such as the TU104, GA104, AD103 and GB203.
The whitepaper details how NVIDIA consistently expanded GPU functionality across general-purpose compute, graphics processing and AI workloads. Each generation introduced updates to CUDA cores, Tensor cores and Ray Tracing cores, while also adopting new manufacturing processes to increase transistor density and clock capability.
Key architectural components of NVIDIA GPUs remain consistent across generations. GPUs communicate with host systems through PCIe, interface with external memory, maintain internal cache structures and use dedicated scheduling hardware.
Video encoding and decoding blocks (NVENC and NVDEC) evolve with each release, adding support for more formats and higher resolutions and frame rates. The paper highlights differences in memory technologies, cache sizes, PCIe generations and display capabilities, noting major shifts such as the move from GDDR6 to GDDR7 memory and the increased L2 cache size introduced with Ada.
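To make the memory comparison concrete, peak theoretical bandwidth can be estimated as the bus width in bytes multiplied by the per-pin data rate. The sketch below is illustrative only: the function name, the 256-bit bus and the two data rates are assumptions chosen for the arithmetic, not figures taken from the whitepaper.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak theoretical bandwidth in GB/s.

    bus_width_bits: width of the memory interface in bits.
    data_rate_gbps: per-pin data rate in Gbit/s.
    """
    return bus_width_bits / 8 * data_rate_gbps


# Hypothetical configurations for illustration, not whitepaper figures.
gddr6_style = memory_bandwidth_gb_s(bus_width_bits=256, data_rate_gbps=14.0)
gddr7_style = memory_bandwidth_gb_s(bus_width_bits=256, data_rate_gbps=28.0)

print(f"GDDR6-style: {gddr6_style:.0f} GB/s")
print(f"GDDR7-style: {gddr7_style:.0f} GB/s")
```

With the bus width held constant, bandwidth scales linearly with the per-pin data rate, which is why a change in memory technology alone can raise throughput without widening the interface.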
Significant architectural changes occur within the Graphics Processing Clusters (GPCs) and Streaming Multiprocessors (SMs). Later architectures integrate ROP partitions directly into GPCs and increase the number of Texture Processing Clusters (TPCs).
The SMs themselves undergo substantial updates, including expanded shared memory, new task-scheduling behavior and enhanced CUDA datapath flexibility. Blackwell allows all 128 CUDA cores per SM to perform FP32 or INT32 operations, removing the earlier datapath separation.
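As a rough illustration of what the per-SM core count means for throughput, peak FP32 performance is commonly estimated as SM count × CUDA cores per SM × 2 FLOPs (one fused multiply-add per clock) × clock frequency. In the sketch below, only the 128 cores per SM comes from the text above; the SM count and clock are hypothetical placeholders, and the function name is an assumption.

```python
def peak_fp32_tflops(sm_count: int, cores_per_sm: int, clock_ghz: float) -> float:
    """Estimate peak FP32 throughput in TFLOPS.

    Assumes every CUDA core retires one fused multiply-add (2 FLOPs)
    per clock, which is an upper bound rather than a measured figure.
    """
    flops_per_clock = sm_count * cores_per_sm * 2
    return flops_per_clock * clock_ghz / 1000.0  # GFLOPS -> TFLOPS


# Hypothetical part: 60 SMs at 2.0 GHz (placeholders, not whitepaper data).
estimate = peak_fp32_tflops(sm_count=60, cores_per_sm=128, clock_ghz=2.0)
print(f"Estimated peak FP32: {estimate:.2f} TFLOPS")
```

The estimate is a ceiling: real workloads that mix integer and floating-point instructions share the same cores, so sustained FP32 throughput depends on the instruction mix.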
Tensor cores progress from limited FP16 support in early generations to broader precision options such as BF16, FP8, FP6 and FP4, together with sparsity acceleration, while Ray Tracing cores gain higher throughput with each release.
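One practical effect of the narrower formats is a smaller memory footprint for model weights. The sketch below computes raw storage per format from its bit width alone; the helper name and the parameter count are hypothetical, and the calculation deliberately ignores scaling factors and other metadata that real quantization schemes add.

```python
# Bit widths of the formats named above.
BITS_PER_VALUE = {"FP16": 16, "BF16": 16, "FP8": 8, "FP6": 6, "FP4": 4}


def weight_footprint_gib(num_params: int, fmt: str) -> float:
    """Raw storage for num_params values in the given format, in GiB."""
    return num_params * BITS_PER_VALUE[fmt] / 8 / 2**30


# Hypothetical model size, chosen only to make the arithmetic easy to read.
params = 2**30
for fmt in BITS_PER_VALUE:
    print(f"{fmt}: {weight_footprint_gib(params, fmt):.2f} GiB")
```

Halving the bit width halves the storage and the memory traffic per value, which is part of why lower-precision Tensor core modes matter for AI inference.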
The whitepaper concludes by noting optimizations in warp-level scheduling inherited from Volta, as well as continued evolution of software support through the CUDA Toolkit, AI and HPC SDKs and associated libraries. NVIDIA’s architecture updates consistently target higher computational density, improved performance and expanded capability across graphics, compute and AI applications.






