Hello All,
Disclaimer: I am new to GPU programming (started on Tuesday), but not to parallel or SIMD programming (Connection Machine, 16K 1-bit processors running in SIMD fashion... I am dating myself here ;-) ).
Also, I did read the 2620_final white paper and the Southern Islands ISA document, plus any other papers/presentations I could get my hands on...
I have some questions regarding the way instructions are scheduled on a GCN CU.
What I have understood so far:
A CU has 4 vector units, 1 scalar unit and 1 LDS unit, and issues one instruction per cycle to each, but:
vector unit i consumes the same instruction from wavefront wf-ai 4 times (4 cycles), for i in {0,1,2,3};
the scalar unit gets one instruction per cycle from 4 different wavefronts wf-b0, wf-b1, wf-b2 and wf-b3. So, seen from one vector unit, one scalar instruction can be executed per 4-cycle group if it belongs to a wavefront not currently issuing on that vector unit.
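To make sure I am describing that mental model clearly, here is a toy sketch (in Python, just for discussion) of how I currently picture one 4-cycle issue group. This is purely my own assumption, not something taken from the docs, and the wavefront names wf-a*/wf-b* are just labels:

# Toy model of how I picture instruction issue on one CU over a 4-cycle group.
# Pure assumption on my part -- please correct me if this is wrong.

SIMD_COUNT = 4   # vector units per CU
LANES = 16       # lanes per vector unit, so a 64-wide wavefront needs 4 cycles

def four_cycle_group():
    events = []
    for cycle in range(4):
        for simd in range(SIMD_COUNT):
            # Each vector unit repeats the same instruction for 4 cycles,
            # walking through wavefront wf-a<simd> 16 work-items at a time.
            first = cycle * LANES
            events.append(f"cycle {cycle}: SIMD{simd} v_instr of wf-a{simd} "
                          f"(work-items {first}-{first + LANES - 1})")
        # The scalar unit issues one instruction per cycle, each one taken
        # from a different wavefront (wf-b0..wf-b3) that is not issuing a
        # vector instruction in this group.
        events.append(f"cycle {cycle}: SALU s_instr of wf-b{cycle}")
    return events

for e in four_cycle_group():
    print(e)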
Question 1: Could a wavefront with many successive scalar instructions (and no other wavefront in the CU in a position to execute a scalar instruction)
run more than one scalar instruction per 4-cycle group? (I would guess not, if the scalar unit has the same need to hide pipeline latency as the vector units.)
Question 2: Could an LDS instruction and a scalar instruction belonging to the same wavefront run in the same 4-cycle group?
Same question for an LDS instruction and a vector instruction of the same wavefront.
Question 3: Wavefront priority: does the priority affect LDS and scalar instruction dispatch over the whole CU, or just on a per-vector-unit basis?
Question 4: On a vector unit with two wavefronts executing on it (same priority), can it be assumed that they will alternate every 4 cycles (no GDS or LDS execution pending, and no conflict)?
Question 5: 64-bit instructions on the vector unit take 4 cycles (16 cycles for a wavefront). I understand this as a pure stall for that vector unit, but could 4 or more scalar and LDS instructions belonging to other wavefronts running on the same vector unit be executed during those cycles?
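To be explicit about how I am counting those 16 cycles (again, just my assumption):

# My cycle counting for a 64-bit vector op (assumption on my part):
lanes_per_simd = 16
wavefront_size = 64
cycles_per_quarter_wave = 4                        # for a 64-bit op, vs. 1 for 32-bit
quarter_waves = wavefront_size // lanes_per_simd   # = 4
print(quarter_waves * cycles_per_quarter_wave)     # = 16 cycles that vector unit is busy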
Voila, I guess that will be it for now...
Thanks,
Eric L.
PS: The targeted app is of a symbolic nature (marginal use of floating point), and it is latency sensitive.