Channel: Community : All Content - OpenCL

Instruction throughput clarification


Hi,

I am reading AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide-rev-2.7.pdf, and on page 6-24 (page 126 of the PDF) there is Table 6.3, "Instruction Throughput (Operations/Cycle for Each Stream Processor)". For example, for One Quarter-Double-Precision-Speed Devices it lists the SPFP MAD rate (operations/cycle) for each stream processor as 4. That confuses me. Does it mean that each ALU can effectively complete 8 FLOPs per cycle? A single MAD instruction counts as 2 FLOPs, so if the table means 4 MADs per cycle per ALU, then e.g. a whole Radeon HD 7970 GHz Edition should have an estimated peak computational throughput of 2048 (ALUs) * 1 (GHz) * 8 (FLOP / cycle / ALU) ≈ 16.4 TFLOPS?
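To make the arithmetic concrete, here is a minimal sketch of the estimate under two readings of the table. The function name `peak_tflops` and the second reading (1 MAD per cycle per ALU) are my own assumptions for comparison; nothing here is taken from the guide beyond the figures quoted above.

```python
def peak_tflops(alus, clock_ghz, mads_per_cycle_per_alu, flops_per_mad=2):
    """Estimate peak single-precision throughput in TFLOPS.

    alus * clock_ghz gives giga-cycles of ALU work per second; each cycle
    retires `mads_per_cycle_per_alu` MADs, and each MAD counts as 2 FLOPs.
    """
    gflops = alus * clock_ghz * mads_per_cycle_per_alu * flops_per_mad
    return gflops / 1000.0


ALUS = 2048        # ALU count used in the question for the HD 7970 GHz Edition
CLOCK_GHZ = 1.0    # clock used in the question

# Reading A: the "4" in the table is MADs per cycle for every single ALU.
reading_a = peak_tflops(ALUS, CLOCK_GHZ, mads_per_cycle_per_alu=4)

# Reading B (assumed alternative): each ALU retires 1 MAD per cycle.
reading_b = peak_tflops(ALUS, CLOCK_GHZ, mads_per_cycle_per_alu=1)

print(f"Reading A: {reading_a:.3f} TFLOPS")
print(f"Reading B: {reading_b:.3f} TFLOPS")
```

The two readings differ by exactly a factor of 4, which is why the meaning of "Stream Processor" in the table heading matters for the estimate.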

The table with Instruction Throughput for Evergreen and Northern Islands Devices makes much more sense.

So, to recap: how should the numbers in the Instruction Throughput tables be interpreted?

