Fiji & HBM dies x-rayed. Additional interesting benefits of HBM revealed.

Very humorous! The limitation on signal speed is nothing like the speed of light; it's the ability of the drivers to charge the capacitance of the lines. The longer and larger those conductors are, the more capacitance they have, so either the signals take longer to ramp, or the driver layout must be larger, leaving less chip area available for other things.

It doesn't really matter how you get to the speed. What matters is that the propagation speed on a high-quality PCB is around 20 cm per nanosecond, which is consistent with this article: https://en.wikipedia.org/wiki/Signal_velocity

GPUs are latency tolerant as long as they are working on video signals, which are repetitive and predictable, so memory can be requested long before it is needed. However, as soon as GPUs start to compute like a CPU, they need fast random access to data, just like a CPU.

And here, my friend, you are 100% wrong. Video signals are such a minor part of the bandwidth of a contemporary GPU (~1%) that they aren't even worth talking about. I am talking entirely about random accesses.

Let's use an equivalent scenario. You need to know what kind of Xmas present your family members want. You can do this in two ways:

  1. You write an email to uncle Bob. You wait until you get an answer, then you send the present. Then you write an email to aunt Sallie. And so on, until you have handled all family members.

  2. You write an email to uncle Bob. You don't wait for an answer, but immediately write an email to aunt Sallie. You keep writing emails until you get the first reply (from uncle Bob or aunt Sallie; it can be out of order). When you get a reply, you immediately send out the present, then go back to writing emails, until you get the next answer. And so forth.

The first method is that of a single-threaded CPU. The second method is that of a GPU.

As long as there are enough family members to write emails to, you never have to sit idle. In the beginning, you'll be mostly sending emails, while at the end, you'll be sending presents. But you're always busy.

That's how latency hiding works. In a modern GPU, a DRAM read can easily take 500 cycles to return data. But there are so many pixels to be processed that it doesn't matter, and you rarely have a stall. (See this discussion for some info on latency hiding. It even mentions that a G80 had 200-300 cycles of access latency. That number can only have gone up dramatically in the last 8 years.)

Note that external GPUs cannot usefully share data structures with the CPU because they do not have direct access to main memory.

You're 100% wrong again. Check out CUDA pinned memory: http://cedric-augonnet.com/accessing-pinned-host-memory-directly-from-the-device/ I don't know exactly when this feature was introduced, but it was at least 5 years ago.
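For the curious, here is a minimal sketch of that feature: page-locked ("pinned") host memory mapped into the device address space, so a kernel dereferences host RAM directly over PCIe with no explicit copy. This uses the standard CUDA runtime calls (`cudaHostAlloc` with `cudaHostAllocMapped`, `cudaHostGetDevicePointer`); error checking is omitted for brevity, and the array size is an arbitrary example:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The kernel reads and writes host memory directly through the
// mapped device pointer -- no cudaMemcpy anywhere.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1024;

    // Enable mapping of pinned host allocations into device space
    // (required on older CUDA versions before any context exists).
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Allocate pinned host memory that the GPU can address directly.
    int *host_ptr = nullptr;
    cudaHostAlloc(&host_ptr, n * sizeof(int), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) host_ptr[i] = i;

    // Fetch the device-side alias of the same allocation.
    int *dev_ptr = nullptr;
    cudaHostGetDevicePointer(&dev_ptr, host_ptr, 0);

    increment<<<(n + 255) / 256, 256>>>(dev_ptr, n);
    cudaDeviceSynchronize();

    // The host sees the kernel's writes in its own memory.
    printf("host_ptr[0] = %d\n", host_ptr[0]);

    cudaFreeHost(host_ptr);
    return 0;
}
```

Whether this is *fast* is another matter: every access crosses the PCIe bus, so it's best for small, irregular, or write-once data, but it does let the GPU touch CPU-side data structures in place.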

/r/Amd Thread Parent Link - ccftech.com