It was only a matter of time before Nvidia took the pristine TU116 graphics processor in its GeForce GTX 1660 Ti and carved it up a little to create a lower-cost derivative. The new GeForce GTX 1660 is, unsurprisingly, very similar to the higher-end model: it also lacks the RT and Tensor cores of the Turing architecture, and instead focuses its on-die resources on accelerating today's rasterized games.
Nvidia doesn't even cut much from TU116's resource pool in creating GeForce GTX 1660: a pair of Streaming Multiprocessors is excised, taking 128 CUDA cores and eight texture units with them. The GPU is otherwise complete. The card's biggest loss is its GDDR6 memory. By swapping in 8 Gb/s GDDR5 instead, bandwidth drops from the 1660 Ti's 288 GB/s to just 192 GB/s.
Of course, the GeForce GTX 1660 is aimed primarily at FHD gaming, where 6 GB of slower memory won't hurt performance as much as it would at higher resolutions. Can the $220/£200 card keep frame rates high enough to fend off AMD's Radeon RX 590, which has more GDDR5 on a wider bus?
TU116 Summary: Turing without RT and Tensor cores
The GPU at the heart of GeForce GTX 1660 is specifically named TU116-300-A1. It is a close relative of the GeForce GTX 1660 Ti's TU116-400-A1, trimmed from 24 Streaming Multiprocessors to 22. Naturally, we are still dealing with a processor that lacks Nvidia's RT and Tensor cores. It measures 284 mm² and is composed of 6.6 billion transistors manufactured on TSMC's 12 nm FinFET process.
Despite its smaller transistors, TU116 is 42 percent larger than the GP106 processor that preceded it. Part of this growth is attributable to the Turing architecture's more sophisticated shaders. Like the higher-end GeForce RTX 20-series cards, GeForce GTX 1660 supports the simultaneous execution of FP32 arithmetic instructions, which make up most shader workloads, and INT32 operations (for addressing/fetching data, floating-point min/max, compare, etc.). When you hear that Turing cores achieve better performance than Pascal at a given clock rate, this capability largely explains why.
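As a quick sanity check on the 42 percent figure, the sketch below compares die area and transistor count using the TU116 and GP106 numbers quoted in this review. The helper name `percent_larger` is purely illustrative.

```python
# Die size and transistor counts as quoted in this review.
TU116 = {"die_mm2": 284, "transistors_bn": 6.6}
GP106 = {"die_mm2": 200, "transistors_bn": 4.4}


def percent_larger(new, old):
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


area_growth = percent_larger(TU116["die_mm2"], GP106["die_mm2"])
transistor_growth = percent_larger(TU116["transistors_bn"], GP106["transistors_bn"])

print(f"Die area:    +{area_growth:.0f}%")        # Die area:    +42%
print(f"Transistors: +{transistor_growth:.0f}%")  # Transistors: +50%
```

The transistor budget grows faster than the die area, which is consistent with the move to a denser manufacturing process.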
Turing's Streaming Multiprocessors are composed of fewer CUDA cores than Pascal's, but the design compensates in part by spreading more SMs across each GPU. The newer architecture assigns one scheduler to each set of 16 CUDA cores (2x Pascal), along with one dispatch unit per 16 CUDA cores (same as Pascal). Four of those 16-core groupings make up an SM, along with four texture units and 96KB of cache that can be configured as 64KB L1/32KB shared memory or vice versa. Because Turing doubles up on schedulers, it only needs to issue an instruction to the CUDA cores every other clock cycle to keep them full. In between, it is free to issue a different instruction to any other unit, including the INT32 cores.
In TU116, Nvidia replaces Turing's Tensor cores with 128 dedicated FP16 cores per SM, which allow GeForce GTX 1660 to process half-precision operations at 2x the FP32 rate. The other Turing-based GPUs also boast double-rate FP16 through their Tensor cores, so TU116's configuration serves to maintain that standard through hardware added specifically for this GPU. The following chart is an updated version of the one published in our GeForce GTX 1660 Ti review, and illustrates TU116's massive improvement in half-precision throughput compared to GeForce GTX 1060 and its Pascal-based GP106 chip.
Running Sandra's Scientific Analysis module, which tests general matrix multiplies, shows us how much more FP16 throughput TU106's Tensor cores achieve compared to TU116. GeForce GTX 1060, which supports FP16 only symbolically, barely registers on the chart.
In addition to the Turing architecture's shaders and unified cache, TU116 also supports a pair of algorithms called Content Adaptive Shading and Motion Adaptive Shading, together referred to as Variable Rate Shading. We covered this technology in Nvidia's Turing Architecture Explored: Inside the GeForce RTX 2080. That story also introduced Turing's accelerated video encode and decode capabilities, which carry over to GeForce GTX 1660 as well.
Putting it all together …
Nvidia packs 24 SMs into TU116, dividing them between three Graphics Processing Clusters. With 64 FP32 cores per SM, that is 1,536 CUDA cores and 96 texture units across the entire GPU. With the loss of two SMs, GeForce GTX 1660 ends up with 1,408 active CUDA cores and 88 usable texture units.
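The core and texture-unit totals follow directly from the per-SM figures given in this review (64 FP32 cores and four texture units per SM). A minimal sketch of that arithmetic, with an illustrative helper name:

```python
# Per-SM resources as described in the text for TU116.
FP32_CORES_PER_SM = 64
TEXTURE_UNITS_PER_SM = 4


def shader_resources(active_sms):
    """Return (CUDA cores, texture units) for a given number of active SMs."""
    return (active_sms * FP32_CORES_PER_SM, active_sms * TEXTURE_UNITS_PER_SM)


full_tu116 = shader_resources(24)  # GeForce GTX 1660 Ti, all SMs enabled
gtx_1660 = shader_resources(22)    # GeForce GTX 1660, two SMs disabled

print(full_tu116)  # (1536, 96)
print(gtx_1660)    # (1408, 88)
```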
Board partners will undoubtedly target a range of frequencies to differentiate their cards. However, the official base clock rate is 1,530 MHz, with a GPU Boost specification of 1,785 MHz. Both of these numbers are slightly higher than GeForce GTX 1660 Ti's clocks, although they cannot entirely make up for the missing SMs.
Our Gigabyte GeForce GTX 1660 OC 6G sample maintained a steady 1,935 MHz through three runs of Metro: Last Light, roughly 90 MHz faster than the 1660 Ti we reviewed a few weeks ago. On paper, then, GeForce GTX 1660 offers up to 5 TFLOPS of FP32 performance and 10 TFLOPS of FP16 throughput.
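The on-paper figures can be reproduced from the core count and the rated GPU Boost clock. The sketch below assumes the usual convention that each CUDA core retires one fused multiply-add (counted as two floating-point operations) per clock; that convention and the function name are assumptions rather than something stated in this review. The FP16 figure simply applies the 2x rate described earlier.

```python
def peak_tflops(cuda_cores, clock_mhz, ops_per_clock=2):
    """Peak throughput in TFLOPS: cores x ops per clock x clock rate.

    ops_per_clock=2 assumes one fused multiply-add per core per clock.
    """
    return cuda_cores * ops_per_clock * clock_mhz * 1e6 / 1e12


fp32_rated = peak_tflops(1408, 1785)   # rated GPU Boost clock
fp16_rated = 2 * fp32_rated            # FP16 runs at 2x the FP32 rate
fp32_observed = peak_tflops(1408, 1935)  # clock our sample sustained

print(f"FP32 at rated boost: {fp32_rated:.2f} TFLOPS")  # 5.03 TFLOPS
print(f"FP16 at rated boost: {fp16_rated:.2f} TFLOPS")  # 10.05 TFLOPS
print(f"FP32 at 1,935 MHz:   {fp32_observed:.2f} TFLOPS")
```

Because the sample card sustains a clock above the rated boost, its theoretical ceiling is somewhat higher than the rounded 5 TFLOPS specification.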
Six 32-bit memory controllers give TU116 an aggregate 192-bit bus, which is populated by 8 Gb/s GDDR5 modules that push up to 192 GB/s. That matches GeForce GTX 1060 6GB and is a 33% reduction compared to GeForce GTX 1660 Ti. Combined with the loss of two SMs, the move from GDDR6 to GDDR5 memory accounts for GeForce GTX 1660's lower performance compared to the 1660 Ti.
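Peak memory bandwidth is the bus width multiplied by the per-pin data rate, divided by eight to convert bits to bytes. The sketch below works through both cards. The 12 Gb/s GDDR6 data rate for GeForce GTX 1660 Ti is not stated in this article; it is assumed here because it is the rate that yields the quoted 288 GB/s on a 192-bit bus.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gb/s) / 8."""
    return bus_width_bits * data_rate_gbps / 8


BUS_WIDTH_BITS = 6 * 32  # six 32-bit memory controllers

gtx_1660_bw = memory_bandwidth_gbs(BUS_WIDTH_BITS, 8)      # 8 Gb/s GDDR5
gtx_1660_ti_bw = memory_bandwidth_gbs(BUS_WIDTH_BITS, 12)  # assumed 12 Gb/s GDDR6
reduction_pct = (1 - gtx_1660_bw / gtx_1660_ti_bw) * 100

print(f"GTX 1660:    {gtx_1660_bw:.0f} GB/s")     # 192 GB/s
print(f"GTX 1660 Ti: {gtx_1660_ti_bw:.0f} GB/s")  # 288 GB/s
print(f"Reduction:   {reduction_pct:.0f}%")       # 33%
```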
Each memory controller is associated with eight ROPs and a 256KB slice of L2 cache. In total, TU116 exposes 48 ROPs and 1.5MB of L2. GeForce GTX 1660's ROP count matches that of RTX 2060, which also uses 48 render outputs. But TU116's L2 cache slices are half the size of TU106's.
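The back-end totals scale with the number of memory controllers in the same way. In the sketch below, the 512KB TU106 slice follows from the statement that TU116's slices are half as large. The count of six active controllers on RTX 2060 is an assumption, inferred from its 48 ROPs at eight ROPs per controller.

```python
MEMORY_CONTROLLERS = 6      # six 32-bit controllers on TU116
ROPS_PER_CONTROLLER = 8
TU116_L2_SLICE_KB = 256
TU106_L2_SLICE_KB = 2 * TU116_L2_SLICE_KB  # TU116 slices are half the size


def backend_totals(controllers, l2_slice_kb):
    """Return (ROP count, L2 cache in MB) for a given controller count."""
    rops = controllers * ROPS_PER_CONTROLLER
    l2_mb = controllers * l2_slice_kb / 1024
    return rops, l2_mb


tu116_rops, tu116_l2_mb = backend_totals(MEMORY_CONTROLLERS, TU116_L2_SLICE_KB)
# Assumption: RTX 2060 also runs six controllers (48 ROPs / 8 per controller).
rtx2060_rops, rtx2060_l2_mb = backend_totals(6, TU106_L2_SLICE_KB)

print(tu116_rops, tu116_l2_mb)      # 48 1.5
print(rtx2060_rops, rtx2060_l2_mb)  # 48 3.0
```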
Given the similarities to GeForce GTX 1660 Ti, it is no surprise that GeForce GTX 1660 is rated for the same 120W. Unfortunately, neither graphics card includes multi-GPU support. Nvidia continues to push the narrative that SLI is meant to drive higher absolute performance, rather than to give gamers a way to match faster single-GPU configurations with multiple cards.
| ||Gigabyte GeForce GTX 1660 OC 6G||GeForce GTX 1660 Ti||GeForce RTX 2060 FE||GeForce GTX 1060 FE||GeForce GTX 1070 FE|
|Architecture (GPU)||Turing (TU116)||Turing (TU116)||Turing (TU106)||Pascal (GP106)||Pascal (GP104)|
|CUDA Cores||1408||1536||1920||1280||1920|
|Peak FP32 Compute||5 TFLOPS||5.4 TFLOPS||6.45 TFLOPS||4.4 TFLOPS||6.5 TFLOPS|
|Tensor Cores||N/A||N/A||240||N/A||N/A|
|RT Cores||N/A||N/A||30||N/A||N/A|
|Texture Units||88||96||120||80||120|
|Base Clock Rate||1530 MHz||1500 MHz||1365 MHz||1506 MHz||1506 MHz|
|GPU Boost Rate||1785 MHz||1770 MHz||1680 MHz||1708 MHz||1683 MHz|
|Memory Capacity||6 GB GDDR5||6 GB GDDR6||6 GB GDDR6||6 GB GDDR5||8 GB GDDR5|
|Memory Bus||192-bit||192-bit||192-bit||192-bit||256-bit|
|Memory Bandwidth||192 GB/s||288 GB/s||336 GB/s||192 GB/s||256 GB/s|
|L2 Cache||1.5 MB||1.5 MB||3 MB||1.5 MB||2 MB|
|TDP||120 W||120 W||160 W||120 W||150 W|
|Transistor Count||6.6 billion||6.6 billion||10.8 billion||4.4 billion||7.2 billion|
|Die Size||284 mm²||284 mm²||445 mm²||200 mm²||314 mm²|
|SLI Support||No||No||No||No||Yes (MIO)|