Why Ashes of the Singularity?

Case in point:

"But it's important to note that a lot of these techniques are for GCN only. For example, Frostbite recently had a paper about doing triangle culling with compute (http://www.wihlidal.ca/Presentations/GDC_2016_Compute.pdf). They note they got a pretty good performance boost using this technique. However from my understanding when that same technique was applied to Intel's and Nvidia's hardware, it wasn't any faster. And the reason had nothing to do with inferior async/concurrent compute "support". The original "performance problem" (idle units) didn't exist on those architectures because they weren't nearly as bottlenecked by triangle throughput as GCN. Even if Intel and Nvidia had the best async compute support (whatever that means), they would not have seen (meaningful) performance gains with this technique on those architectures.

So what does this mean? You could argue that GCN shouldn't have had those bottlenecks in the first place. You could also argue that AMD designed GCN specifically to be flexible enough to work around those bottlenecks. Both arguments are correct. :-D The point is that all async compute techniques are unique and highly sensitive to the underlying architecture. Perhaps async compute technique X gives better performance gains on architecture A, but technique Y might be better for architecture B. It all depends on the bottlenecks of the architecture.

Ultimately (imo) the performance gain of a given async technique means nothing by itself. Perhaps the architecture already wasn't idle. Perhaps the architecture was way more idle than it should have been. :razz: Regardless, hopefully people will at least understand why having a "does support async compute" flag is nonsensical, and why "supports async/concurrent compute" is a loaded term."

Source: /r/pcmasterrace
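
The compute-based triangle culling the comment refers to is, at its core, a filter pass that runs on the GPU's compute units and discards triangles (back-facing, degenerate, off-screen) before the rasterizer ever sees them, so a geometry-bound architecture has less fixed-function work to do. Below is a minimal, hypothetical sketch of that idea written in CUDA rather than a graphics-API compute shader; the names (Vec3, isVisible, cullTriangles) are invented for illustration, and the visibility test is reduced to a single screen-space winding/area check rather than the full set of tests the Frostbite presentation covers.

```cuda
// Hypothetical sketch of compute-based triangle culling (not Frostbite's code):
// one thread per triangle tests visibility and compacts survivors into a new
// index buffer that would then be fed to the rasterizer.

#include <cstdio>
#include <cuda_runtime.h>

struct Vec3 { float x, y, z; };

// Simplified visibility test: twice the signed screen-space area of the
// triangle. Non-positive area means back-facing or degenerate, so cull it.
__device__ bool isVisible(Vec3 a, Vec3 b, Vec3 c)
{
    float area2 = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return area2 > 1e-6f;
}

// Visible triangles append their three indices to a compacted output buffer
// through an atomic counter.
__global__ void cullTriangles(const Vec3* positions,
                              const unsigned* indices,
                              unsigned triangleCount,
                              unsigned* outIndices,
                              unsigned* outTriangleCount)
{
    unsigned tri = blockIdx.x * blockDim.x + threadIdx.x;
    if (tri >= triangleCount) return;

    unsigned i0 = indices[3 * tri + 0];
    unsigned i1 = indices[3 * tri + 1];
    unsigned i2 = indices[3 * tri + 2];

    if (isVisible(positions[i0], positions[i1], positions[i2])) {
        unsigned slot = atomicAdd(outTriangleCount, 1u);
        outIndices[3 * slot + 0] = i0;
        outIndices[3 * slot + 1] = i1;
        outIndices[3 * slot + 2] = i2;
    }
}

int main()
{
    // Tiny example: two triangles, one wound counter-clockwise (kept) and
    // one wound clockwise (culled).
    Vec3 hPos[4] = { {0, 0, 0}, {1, 0, 0}, {0, 1, 0}, {1, 1, 0} };
    unsigned hIdx[6] = { 0, 1, 2,   1, 0, 3 };

    Vec3* dPos;        unsigned* dIdx;
    unsigned* dOutIdx; unsigned* dOutCount;
    cudaMalloc((void**)&dPos, sizeof(hPos));
    cudaMalloc((void**)&dIdx, sizeof(hIdx));
    cudaMalloc((void**)&dOutIdx, sizeof(hIdx));
    cudaMalloc((void**)&dOutCount, sizeof(unsigned));
    cudaMemcpy(dPos, hPos, sizeof(hPos), cudaMemcpyHostToDevice);
    cudaMemcpy(dIdx, hIdx, sizeof(hIdx), cudaMemcpyHostToDevice);
    cudaMemset(dOutCount, 0, sizeof(unsigned));

    cullTriangles<<<1, 64>>>(dPos, dIdx, 2u, dOutIdx, dOutCount);

    unsigned survivors = 0;
    cudaMemcpy(&survivors, dOutCount, sizeof(unsigned), cudaMemcpyDeviceToHost);
    printf("%u of 2 triangles survived culling\n", survivors);

    cudaFree(dPos); cudaFree(dIdx); cudaFree(dOutIdx); cudaFree(dOutCount);
    return 0;
}
```

In a real renderer this pass would be recorded on a separate compute queue so it can overlap other graphics work, which is exactly the architecture-specific trade-off the comment describes: the pass only pays off if the target GPU actually had idle compute units and a triangle-throughput bottleneck to begin with.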