AD102 is also the top-tier core of the Ada Lovelace family and the configuration behind the flagship RTX 40 series gaming cards. It will most likely be used in the RTX 4090 Ti and RTX 4090 graphics cards.
According to the analysis, AD102 has 12 GPCs (Graphics Processing Clusters) built in, about 70% more than the previous-generation GA102. Each GPC consists of 6 TPCs (2 SMs each), and each SM consists of 4 sub-cores, all the same as Ampere. The difference is that each SM now has 128 FP32 units, and adding the INT32 integer units brings the total to 192.
The complete AD102 includes 144 SMs (12 GPCs × 6 TPCs × 2 SMs), for a total of 18432 FP32 units plus 9216 INT32 units, or, put simply, 18432 CUDA cores.
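The unit counts follow from multiplying out the rumored hierarchy (12 GPCs, 6 TPCs per GPC, 2 SMs per TPC, 128 FP32 units and 192 total units per SM). The sketch below is only a back-of-the-envelope check. The names are illustrative, the INT32 count per SM is assumed to be the remainder 192 − 128, and the GA102 figure of 7 GPCs is not stated in the text but is the value the "70% more" comparison implies.

```python
# Back-of-the-envelope check of the rumored AD102 unit counts.
GPCS = 12                # GPCs in the full AD102 (from the text)
TPCS_PER_GPC = 6         # TPCs per GPC (from the text)
SMS_PER_TPC = 2          # SMs per TPC (from the text)
FP32_PER_SM = 128        # FP32 units per SM (from the text)
UNITS_PER_SM = 192       # FP32 + INT32 units per SM (from the text)
GA102_GPCS = 7           # assumption: not in the text, implied by "70% more"

# Assumption: everything that is not FP32 in the 192 total is INT32.
INT32_PER_SM = UNITS_PER_SM - FP32_PER_SM


def ad102_counts():
    """Multiply out the GPC -> TPC -> SM hierarchy."""
    sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC
    return {
        "sms": sms,
        "fp32": sms * FP32_PER_SM,
        "int32": sms * INT32_PER_SM,
    }


counts = ad102_counts()
gpc_increase_pct = round((GPCS / GA102_GPCS - 1) * 100)

print(counts)
print(f"GPC increase over GA102: about {gpc_increase_pct}%")
```

The FP32 total is the number usually quoted as the CUDA core count.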
In terms of cache, each SM in the AD102 core gets 192KB of L1, a 50% increase over Ampere, for a total of 27MB across the 144 SMs. L2 grows to 96MB, 16 times that of Ampere.
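The cache figures can be checked the same way. The total L1 depends on how many SMs are counted, so this sketch takes the SM count as a parameter and uses 144 (12 × 6 × 2 from the hierarchy described earlier) as the example. The Ampere baselines of 128KB L1 per SM and 6MB L2 are assumptions: they are the values the "50% increase" and "16 times" comparisons imply, not figures given in the text.

```python
# Rough check of the rumored AD102 cache figures.
L1_PER_SM_KB = 192          # from the text
L2_MB = 96                  # from the text
AMPERE_L1_PER_SM_KB = 128   # assumption: implied by "50% increase"
AMPERE_L2_MB = 6            # assumption: implied by "16 times"


def total_l1_mb(sm_count, l1_per_sm_kb=L1_PER_SM_KB):
    """Total L1 in MB for a given number of SMs."""
    return sm_count * l1_per_sm_kb / 1024


l1_total = total_l1_mb(144)                              # 144 SMs = 12 * 6 * 2
l1_increase = L1_PER_SM_KB / AMPERE_L1_PER_SM_KB - 1     # per-SM growth
l2_ratio = L2_MB / AMPERE_L2_MB                          # L2 growth factor

print(f"Total L1 for 144 SMs: {l1_total} MB")
print(f"L1 per SM increase: {l1_increase:.0%}")
print(f"L2 ratio vs Ampere: {l2_ratio:.0f}x")
```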
Correspondingly, the number of ROPs and RT (ray tracing) units naturally goes up as well. AD102 has up to 384 ROPs, while the RTX 3090 Ti has only 112. In addition, the ray tracing units move to the third generation and the Tensor units to the fourth generation.
Based on this, a doubling of performance for the RTX 4090 does not seem out of reach. In terms of FP32 single-precision floating point, outside estimates put it at 90 TFLOPS, against only 40 TFLOPS for the RTX 3090 Ti, at the cost of more than 600W of power consumption…
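To put the FP32 estimate in context, peak throughput is conventionally computed as 2 × FP32 units × clock, counting a fused multiply-add as two operations per unit per cycle. The sketch below uses that convention (an assumption, not something the text states) to work out the clock that 90 TFLOPS would imply for 18432 FP32 units, and the ratio to the 40 TFLOPS quoted for the RTX 3090 Ti. The function names are illustrative.

```python
# Peak FP32 throughput under the usual "2 FLOPs per unit per clock" convention.
FLOPS_PER_UNIT_PER_CLOCK = 2  # assumption: one fused multiply-add per cycle


def fp32_tflops(fp32_units, clock_ghz):
    """Peak FP32 TFLOPS for a given unit count and clock in GHz."""
    return FLOPS_PER_UNIT_PER_CLOCK * fp32_units * clock_ghz / 1000.0


def implied_clock_ghz(tflops, fp32_units):
    """Clock in GHz needed to reach a given peak FP32 TFLOPS."""
    return tflops * 1000.0 / (FLOPS_PER_UNIT_PER_CLOCK * fp32_units)


clock = implied_clock_ghz(90, 18432)   # rumored 90 TFLOPS, 18432 FP32 units
speedup = 90 / 40                      # versus the 40 TFLOPS of the RTX 3090 Ti

print(f"Implied clock: {clock:.2f} GHz")
print(f"FP32 ratio vs RTX 3090 Ti: {speedup:.2f}x")
```

Under these assumptions the 90 TFLOPS figure requires a clock of roughly 2.44 GHz, and the FP32 ratio of 2.25 is slightly more than a doubling.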