Slashdot linked an article that states the following:
Based on a number of slides from an independent researcher, the Nvidia Pascal GPU100 features Stacked DRAM (1 TB/s) giving it as much as 12 TFLOPs of Single-Precision (FP32) compute performance. The flagship GPU is purportedly able to provide four TFLOPs of Double-Precision (FP64) compute performance as well.
If single-precision throughput really is three times that of double-precision, would it make sense to stick with the double-precision emulation you already use for single-precision cards?
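For context on what that emulation involves: the usual trick on cards with weak FP64 is "double-single" arithmetic, where a high-precision value is carried as an unevaluated sum of two single-precision floats. A minimal sketch of the core building block, Knuth's error-free two-sum, is below; NumPy's float32 stands in for the GPU's single-precision here, and the function name is just for illustration.

```python
import numpy as np

def two_sum(a, b):
    # Knuth's error-free transformation: s + e equals a + b exactly,
    # where s is the rounded float32 sum and e is the rounding error.
    a, b = np.float32(a), np.float32(b)
    s = np.float32(a + b)
    v = np.float32(s - a)
    e = np.float32(np.float32(a - np.float32(s - v)) + np.float32(b - v))
    return s, e

# 1e-8 is smaller than half an ulp of 1.0 in float32, so it vanishes
# from the rounded sum s, but the two-sum recovers it in e.
s, e = two_sum(1.0, 1e-8)
```

Chaining operations like this roughly doubles the effective mantissa (about 48 bits from two 24-bit singles), at the cost of several FP32 operations per emulated FP64 operation, which is why a native 3:1 FP32:FP64 ratio may still beat emulation.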
Note that 12 TFLOPS is significantly faster than the world’s fastest supercomputer from 2000. By 2005 that number would have claimed ‘only’ #15 on the list of fastest supercomputers. The cost to make #14 that year? $5.2 million.
In contrast, the GTX 980 has a claimed performance of about 5 TFLOPS. I believe that covers the flagship of each recent GPU generation.
Also, for those enthused over the breakthrough claims of the Pascal, here’s a bit more on what NVIDIA means by ten times the performance:
The idea is that if we look at all the improvements coming up with Pascal compared to Maxwell, they will collectively add up to make it “roughly” 10 times more efficient at deep learning compute tasks. Pascal will feature 3x the memory bandwidth of Maxwell, 2x peak single precision compute performance and 2x the performance per watt.
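One plausible reading of where the "roughly 10x" comes from is simply multiplying the headline factors together. Note this is my guess at NVIDIA's arithmetic, and the factors are not really independent (performance per watt and peak compute overlap), so treat it as marketing math rather than a benchmark:

```python
# Headline Pascal-vs-Maxwell factors quoted above
bandwidth_gain = 3.0      # 3x memory bandwidth
fp32_gain = 2.0           # 2x peak single-precision compute
perf_per_watt_gain = 2.0  # 2x performance per watt

# Multiplying them gives 12, which "roughly" rounds to the claimed 10x.
combined = bandwidth_gain * fp32_gain * perf_per_watt_gain
print(combined)  # 12.0
```

Real deep-learning workloads will land somewhere below that product, since no single task is simultaneously limited by all three factors.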
So if you want to run Infinity:Battlescape on your laptop, Pascal might make it possible without draining the battery in under an hour. Then again, it may drain your bank account far faster.