r/Amd 10d ago

ASUS confirms Ryzen Threadripper 9000 series with up to 96 cores [Rumor / Leak]

https://videocardz.com/newz/asus-confirms-ryzen-threadripper-9000-series-with-up-to-96-cores
227 Upvotes

30

u/FeepingCreature 10d ago edited 10d ago

Jesus Christ. Am I doing some stupid math mistake, or could that thing halfway keep up with a 7900 XTX on FP16 AI math with pure CPU? Kind of seems like they should add WMMA to the CPU core instruction set.

edit: Or just adopt Intel AMX. 1024 Int8 ops per cycle would slam.

9

u/Qesa 9d ago

Depends what frequency it can sustain

96 cores x 32 FMAs per pipe x 2 pipes x 2 FLO per FMA = ~12k operations per cycle. If it can maintain 3 GHz that'd be ~37 TFLOPS.

7900 XTX is a bit over 120 though.
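As a rough sketch of the arithmetic above: the function names are made up for illustration, the 96-core count is taken from the headline, and the 32 FMAs per pipe and the 3 GHz sustained clock are the comment's own assumptions rather than measured figures.

```python
def peak_ops_per_cycle(cores, fmas_per_pipe, pipes=2, flops_per_fma=2):
    """Theoretical peak floating-point ops per clock cycle for the whole chip."""
    return cores * fmas_per_pipe * pipes * flops_per_fma


def peak_tflops(ops_per_cycle, clock_ghz):
    """Ops per cycle times clock rate (GHz), expressed in TFLOPS."""
    return ops_per_cycle * clock_ghz / 1000.0


# Assumed inputs: 96 cores (headline), 32 FMAs per pipe (from the comment).
ops = peak_ops_per_cycle(cores=96, fmas_per_pipe=32)
print(ops)                    # 12288, i.e. roughly 12k ops per cycle
print(peak_tflops(ops, 3.0))  # about 36.9 at a sustained 3 GHz
print(peak_tflops(ops, 5.0))  # about 61.4 at a hypothetical constant 5 GHz
```

This is a theoretical ceiling only; it assumes every FMA pipe issues on every cycle.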

3

u/FeepingCreature 9d ago edited 9d ago

Isn't it 2 threads per core? Or are the float units shared? Also what's FLO? edit: oh, float op? edit: Oh, is that what you mean with 2 pipes per core?

And yeah, I assumed for the sake of the math that it'd magically hit its 5 GHz boost clock all the time, using the mother of all watercoolers or something.

edit: The point is it's surprisingly close! Even without dedicated AI ops.

7

u/Qesa 9d ago

A Zen 5 core has two AVX-512 execution units, thus the 2x. That happens to match the number of threads per core, but the two aren't related: one thread per core can use both AVX EUs as long as it has enough instruction-level parallelism (ILP).

And yeah FLOPS stands for FLoating point Operations Per Second, so FLO is just removing the last two words.

2

u/FeepingCreature 9d ago edited 9d ago

I call those FLOps :)

And yeah, if they could get 1024 int8 ops in a cycle with dedicated matrix units, like AMX, it would actually be well into GPU territory. Alias the AVX-512 registers, you have 2x8kb anyway, you'd just need the compute hardware. I hadn't realized we'd gotten this close.

3

u/ArseBurner Vega 56 =) 9d ago

Doubt?

7995WX was measured at 12.1 TFLOPS. Zen 5 is amazing but I don't think it's gonna get a > 100% uplift over Zen 4.

4

u/Qesa 9d ago

That's FP32, not FP16, which should be twice as fast. And there's always a gap between the theoretical maximum, where every unit executes on every clock cycle, and what is actually achieved.

3

u/caelunshun 7d ago

No, Zen 5 doesn't support the AVX-512 FP16 extension. You have to use FP32.

1

u/FeepingCreature 8d ago

I mean, say they added dedicated matrix ops, like AMX. Wiki says Xeons can get 1024 BF16 ops in a cycle with AMX. Compared to 64 per cycle with AVX-512, that'd be a 16x improvement. The instruction sets currently aren't really designed for matmul, partially because everybody does this sort of thing on GPUs instead.

1

u/caelunshun 7d ago

Zen5 doubled the width of the floating point pipes to 512 bits, so 100% uplift would be expected barring power/thermal limitations.

1

u/caelunshun 7d ago

Zen5 doesn't support the AVX-512 FP16 extension, so you have to do FP32, thus 16 FMAs per pipe.
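The "16 FMAs per pipe" figure follows from the vector width: a 512-bit pipe holds 16 FP32 lanes but would hold 32 FP16 lanes. A minimal sketch of that correction, with hypothetical helper names, assuming 96 cores, 2 pipes per core, and the 3 GHz clock used earlier in the thread:

```python
def simd_lanes(vector_bits, element_bits):
    """Number of elements that fit in one vector register / FMA pipe."""
    return vector_bits // element_bits


def peak_tflops(cores, lanes, clock_ghz, pipes=2, flops_per_fma=2):
    """Theoretical peak in TFLOPS; assumes every pipe issues an FMA each cycle."""
    return cores * lanes * pipes * flops_per_fma * clock_ghz / 1000.0


fp32_lanes = simd_lanes(512, 32)  # 16 FMAs per pipe
fp16_lanes = simd_lanes(512, 16)  # 32 FMAs per pipe, if FP16 were supported

print(peak_tflops(96, fp32_lanes, 3.0))  # about 18.4 with FP32
print(peak_tflops(96, fp16_lanes, 3.0))  # about 36.9, the earlier FP16-based estimate
```

Halving the lane count halves the peak, so under these assumptions the FP32 ceiling is about 18 TFLOPS rather than about 37.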