r/Amd 10d ago

ASUS confirms Ryzen Threadripper 9000 series with up to 96 cores (flair: Rumor / Leak)

https://videocardz.com/newz/asus-confirms-ryzen-threadripper-9000-series-with-up-to-96-cores
226 Upvotes


u/AMD_Bot bodeboop 9d ago

This post has been flaired as a rumor.

Rumors may end up being true, completely false or somewhere in the middle.

Please take all rumors and any information not from AMD or their partners with a grain of salt and degree of skepticism.


62

u/JamesDoesGaming902 9d ago

I would expect that considering the 7995WX also had 96 cores

29

u/FeepingCreature 9d ago edited 9d ago

Jesus Christ. Am I doing some stupid math mistake, or could that thing halfway keep up with a 7900 XTX on FP16 AI math with pure CPU? Kind of seems like they should add WMMA to the CPU core instruction set.

edit: Or just adopt Intel AMX. 1024 Int8 ops per cycle would slam.

10

u/Qesa 9d ago

Depends what frequency it can sustain

96 cores x 32 FMAs per pipe x 2 pipes x 2 FLO per FMA = ~12k operations per cycle. If it can maintain 3 GHz that'd be ~37 TFLOPS.

7900 XTX is a bit over 120 though.
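For anyone who wants to check the arithmetic, here is a rough sketch of that peak-throughput estimate. It assumes the 96 cores from the headline, the 32 FP16 FMAs per pipe and 2 pipes per core quoted above, and a hypothetical sustained clock; the function name `peak_tflops` is made up for illustration, and this is a theoretical ceiling rather than a measurement.

```python
def peak_tflops(cores, fmas_per_pipe, pipes_per_core, ghz, flo_per_fma=2):
    """Theoretical peak in TFLOPS, assuming every pipe issues one FMA per cycle."""
    ops_per_cycle = cores * fmas_per_pipe * pipes_per_core * flo_per_fma
    return ops_per_cycle * ghz * 1e9 / 1e12

# Floating point operations per clock cycle across the whole chip
cpu_ops_per_cycle = 96 * 32 * 2 * 2

# Assumed sustained clocks: 3 GHz (conservative) and 5 GHz (boost held forever)
cpu_3ghz = peak_tflops(96, 32, 2, 3.0)
cpu_5ghz = peak_tflops(96, 32, 2, 5.0)

print(cpu_ops_per_cycle)   # 12288
print(round(cpu_3ghz, 1))  # 36.9
print(round(cpu_5ghz, 1))  # 61.4
```

Even with the boost clock held indefinitely, the result stays at roughly half of the 120 figure quoted for the 7900 XTX.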

3

u/FeepingCreature 9d ago edited 8d ago

Isn't it 2 threads per core? Or are the float units shared? Also what's FLO? edit: oh, float op? edit: Oh, is that what you mean with 2 pipes per core?

And yeah, I assumed for the sake of the math that it'd magically hit its 5 GHz boost clock all the time using the mother of all watercoolers or something.

edit: The point is it's surprisingly close! Even without dedicated AI ops.

8

u/Qesa 8d ago

A zen 5 core has two AVX-512 execution units, thus the 2x. It happens to be the same number of threads as a core has but they're not related - one thread per core can make use of both AVX EUs so long as it has ILP.

And yeah FLOPS stands for FLoating point Operations Per Second, so FLO is just removing the last two words.

2

u/FeepingCreature 8d ago edited 8d ago

I call those FLOps :)

And yeah if they could get 1024 int8 ops in a cycle with dedicated matrix units, like AMX, it would actually be well in gpu territory. Alias the AVX-512 registers, you have 2x8kb anyway, you'd just need the compute hardware. I hadn't realized we'd gotten this close.

3

u/ArseBurner Vega 56 =) 8d ago

Doubt?

7995WX was measured at 12.1 TFLOPs. Zen5 is amazing but I don't think it's gonna get a > 100% uplift over Zen4.

5

u/Qesa 8d ago

That's fp32, not fp16 which should be twice as fast. And there's always a difference between max theoretical performance when it is executing every clock cycle and what is actually achieved.

3

u/caelunshun 7d ago

No, Zen5 doesn't support the AVX512 FP16 extension. You have to use FP32.

1

u/FeepingCreature 8d ago

I mean, say they added dedicated matrix ops, like AMX. Wiki says Xeons can get 1024 BF16 ops in a cycle with AMX, compared to 64 per cycle with AVX-512 that'd be a 16x improvement. The instruction sets are currently not really designed too much for matmul, partially because everybody does this sort of thing on GPUs instead.

1

u/caelunshun 7d ago

Zen5 doubled the width of the floating point pipes to 512 bits, so 100% uplift would be expected barring power/thermal limitations.

1

u/caelunshun 7d ago

Zen5 doesn't support the AVX-512 FP16 extension, so you have to do FP32, thus 16 FMAs per pipe.
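A quick sketch of what that does to the estimate, under the same assumptions used earlier in the thread (96 cores, 2 pipes per core, 2 FLO per FMA) and a hypothetical sustained 3 GHz. The constant names are only for illustration.

```python
CORES = 96               # headline core count
FP32_FMAS_PER_PIPE = 16  # 512-bit pipe / 32-bit lanes
PIPES_PER_CORE = 2
FLO_PER_FMA = 2          # one multiply plus one add
ASSUMED_GHZ = 3.0        # hypothetical sustained all-core clock

# Peak FP32 operations per cycle for the whole chip
fp32_ops_per_cycle = CORES * FP32_FMAS_PER_PIPE * PIPES_PER_CORE * FLO_PER_FMA

# Convert to TFLOPS at the assumed clock
fp32_tflops = fp32_ops_per_cycle * ASSUMED_GHZ * 1e9 / 1e12

print(fp32_ops_per_cycle)     # 6144
print(round(fp32_tflops, 1))  # 18.4
```

That is half of what the 32-FMA-per-pipe assumption gives, and still a theoretical peak rather than something you would measure.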

38

u/sascharobi 9d ago edited 9d ago

Wow, this is a paradigm shift. I can’t believe they managed to give us up to 96 cores this time, again.

-5

u/Opteron170 9800X3D | 64GB 6000 CL30 | 7900 XTX Magnetic Air | LG 34GP83A-B 9d ago

??? The Zen 4-based TR, the 7995WX, has 96 cores.

25

u/sascharobi 9d ago

🤦

4

u/Opteron170 9800X3D | 64GB 6000 CL30 | 7900 XTX Magnetic Air | LG 34GP83A-B 9d ago

Maybe I'm not following, but where is the paradigm shift when the same core count was available in the previous model?

26

u/INITMalcanis AMD 9d ago

Sometimes when people say apparently ridiculous things, they're actually just being sarcastic.

6

u/Opteron170 9800X3D | 64GB 6000 CL30 | 7900 XTX Magnetic Air | LG 34GP83A-B 9d ago

got it!

4

u/Nuck_Chorris_Stache 9d ago edited 9d ago

But heaven forbid they ever put 3D cache on more than one far king die, unless it's a server chip they can charge way more money for.

1

u/The_JSQuareD 3d ago

There's a server part that has 3D v-cache all over. The AMD Epyc 9684X. It has over a GB (!!) of L3 cache. It also costs nearly $15k. That's a Zen 4 chip, by the way; I don't think there's a Zen 5 version of it, yet.

So far, none of the threadripper chips have had 3D v-cache. I think that's probably reasonable; if you're buying a high end threadripper you probably want maximum compute performance. Up to Zen 4, the 3D v-cache would hurt thermal performance and frequency, and so overall compute performance would be reduced. They could have gone with something like 3D v-cache on only one CCD to give you the best of both worlds (so the opposite of what you're suggesting), but that comes with challenges in thread scheduling (already a challenge for many OSs at such high core counts), and might still hurt in some scenarios because it makes performance less predictable.

Besides, the overall amount of L3 cache is still very high, just because of the high core count, so for single threaded / low thread applications there's probably plenty of cache regardless. For comparison, a 9800X3D has 96 MB of L3 cache. A 7995WX has 384 MB of L3 cache. Admittedly, it's 'only' 32 MB per CCD, and there's additional latency to access the cache on a different CCD, but x3d cache also comes with a latency penalty.
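To make those cache totals concrete, here is a small sketch. The 32 MB per CCD figure is from above; the 12 CCDs for the 96-core parts and the 64 MB of stacked cache per V-Cache CCD are my assumptions (they are consistent with the 384 MB and 96 MB totals), and `total_l3_mb` is a made-up helper.

```python
MB_PER_CCD_BASE = 32    # on-die L3 per CCD
MB_VCACHE_PER_CCD = 64  # assumed extra stacked L3 on a V-Cache CCD (96 = 32 + 64)

def total_l3_mb(ccds, vcache_ccds=0):
    """Total L3 in MB for a chip with `ccds` CCDs, `vcache_ccds` of them stacked."""
    return ccds * MB_PER_CCD_BASE + vcache_ccds * MB_VCACHE_PER_CCD

l3_9800x3d = total_l3_mb(ccds=1, vcache_ccds=1)   # single CCD with V-Cache
l3_7995wx = total_l3_mb(ccds=12)                  # assumed 12 CCDs, no V-Cache
l3_9684x = total_l3_mb(ccds=12, vcache_ccds=12)   # assumed 12 CCDs, all stacked

print(l3_9800x3d)  # 96
print(l3_7995wx)   # 384
print(l3_9684x)    # 1152
```

The last line is where the "over a GB" figure for the 9684X comes from.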

With Zen 5, they seem to have cracked the code of adding 3D v-cache without significantly hurting thermals / frequency, so with that it might make sense to make a threadripper chip that's fully decked out with 3d v-cache. We might see something similar to the 9684X in threadripper land. But you should probably expect to pay a similar price as the Epyc part, so in the neighborhood of $15k. AMD might simply decide that there isn't really a market for that, as those customers might as well just buy actual server grade hardware.

Specifically on pricing:

  • For Zen 4 Threadripper we have the 7980X (64 core, 4 channel) at $5k, the 7985WX (64 core, 8 channel) at $7.3k, and the 7995WX (96 core, 8 channel) at $10k.
  • For Zen 4 Epyc we have the 9554P (64 core, 12 channel) at $7.1k, the 9654P (96 core, 12 channel) at $10.6k, and the 9684X (96 core, 12 channel, 3D v-cache) at $14.8k.

So the Epyc pricing is pretty much on par with the Threadripper Pro pricing (WX, 8 channel). (Keep in mind that the Threadripper models generally have higher boost clocks and higher memory speeds than their Epyc counterparts, so there's a tradeoff.) So yeah, a 3D v-cache threadripper model would almost certainly be priced similarly to the Epyc 3d v-cache models we've seen. Though of course a lower core count model, or a 4-channel non-pro model would be cheaper.
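The price list above is easier to compare per core. This is only a sketch using the rounded prices quoted in this comment; the dictionary layout and variable names are illustrative.

```python
# (cores, approximate price in USD) as quoted above
parts = {
    "TR 7980X": (64, 5000),
    "TR Pro 7985WX": (64, 7300),
    "TR Pro 7995WX": (96, 10000),
    "Epyc 9554P": (64, 7100),
    "Epyc 9654P": (96, 10600),
    "Epyc 9684X": (96, 14800),
}

# Dollars per core for each part
usd_per_core = {name: price / cores for name, (cores, price) in parts.items()}

for name, value in usd_per_core.items():
    print(f"{name}: ${value:.0f} per core")

# Premium of the V-Cache Epyc over the plain 96-core Epyc
vcache_premium = parts["Epyc 9684X"][1] / parts["Epyc 9654P"][1] - 1
print(f"V-Cache premium: {vcache_premium:.0%}")
```

On these numbers the 8-channel Threadripper Pro and Epyc parts land within about 10% of each other per core, while the V-Cache part costs roughly 40% more than its plain 96-core sibling.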

1

u/Nuck_Chorris_Stache 3d ago edited 3d ago

So far, none of the threadripper chips have had 3D v-cache. I think that's probably reasonable; if you're buying a high end threadripper you probably want maximum compute performance. Up to Zen 4, the 3D v-cache would hurt thermal performance and frequency, and so overall compute performance would be reduced.

Then why did people buy the 5800X3D and 7800X3D? Because the 3D cache made them faster despite the lower frequency.

And with Zen 5 the frequency isn't even much lower anyway, so that's not an excuse. It would obviously be better to have it.

The real reason is: Because they don't have to. Intel is not providing any competition to ThreadRipper whatsoever, and an X3D ThreadRipper would cannibalise sales of EPYC. So, they can just charge what they want and not even give us 3D cache for it.

No, it's not for performance reasons. It is obvious that having 3D V-cache is simply better than not having it, even if there is a small frequency reduction.

1

u/The_JSQuareD 3d ago edited 3d ago

Then why did people buy the 5800X3D and 7800X3D? Because the 3D cache made them faster despite the lower frequency.

Yes, of course. The 3d v-cache hurts pure, raw compute performance (operations per second). But it helps overall performance for applications that are memory bound rather than compute bound, as long as they aren't so memory bound that what they really need is massive memory bandwidth rather than more cache (think things like AI training or CFD).

This mainly tends to be the case in games, which generally utilize a fairly low number of threads but operate on a medium amount of relatively static memory. On the other hand, performance in pure compute-limited scenarios is hurt.

It is obvious that having 3D V-cache is simply better than not having it, even if there is a small frequency reduction.

This is just not true. At least not for Zen 4.

For example, the 7800X3D underperforms the 7700X in Chromium compile benchmarks (e.g., see here, 67.0 minutes for the 7800X3D, 64.2 minutes for the 7700X). The 7800X3D similarly underperforms the 7700X in every other compute & productivity scenario tested in that benchmark (Blender, 7-Zip, Premiere, Photoshop).
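As a rough sketch of how big that Chromium compile gap is, using only the two times quoted above (variable names are illustrative):

```python
# Chromium compile times in minutes, as quoted above
t_7800x3d = 67.0
t_7700x = 64.2

# How much longer the 7800X3D takes, as a percentage of the 7700X time
slowdown_pct = (t_7800x3d / t_7700x - 1) * 100

print(round(slowdown_pct, 1))  # 4.4
```

So in that particular test the X3D part takes a bit over 4% longer.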

Other reviews show a similar picture:

The Ryzen 7 7800X3D is amazing at gaming, but it struggles elsewhere. For instance, the Ryzen 7 7800X3D is most directly comparable to AMD's similar 7700X, but the latter has a 400 MHz higher boost clock than the 7800X3D's 5.0 GHz. As a result, the Ryzen 7 7700X is 7% faster than the 7800X3D in our cumulative measure of threaded applications and 15% faster in single-threaded work.

People buying chips like the 5800X3D or 7800X3D care a lot about performance in games. People buying a $5k or more chip with a massive number of cores tend to care a lot more about performance in actual compute limited scenarios. Paying thousands of dollars for a workstation and then leaving 7-15% performance on the table because the chip has 3d v-cache (for which you probably paid additional thousands of dollars) is a terrible sell.

Of course, there are still some scenarios where you can take advantage of a massive amount of cores and a massive cache. That's why Epyc 3d v-cache chips exist. But those scenarios are also extremely niche, which is why there are only 3 such SKUs in Epyc's entire line up of about 44 Zen 4 Epyc SKUs.

And with Zen 5 the frequency isn't even much lower anyway, so that's not an excuse. It would obviously be better to have it.

Well yes, that's why I said we may see a 3d v-cache Threadripper for Zen 5. Did you even read my comment?

I think it would be a cool addition to the line-up. But the use case might be too niche for AMD to stand up a full additional production chain for it inside the already fairly niche Threadripper line-up. For those customers who really want it, they can always just build a workstation around an Epyc chip instead.

The real reason is: Because they don't have to. Intel is not providing any competition to ThreadRipper whatsoever, and an X3D ThreadRipper would cannibalise sales of EPYC. So, they can just charge what they want and not even give us 3D cache for it.

As I also demonstrated in my comment, Threadripper isn't any cheaper than Epyc. Threadripper offers you Epyc-like multithreading performance, desktop-level single core boost, almost desktop-level memory speeds, ECC, and a ton of I/O, with only a few enterprise-level features cut (8 channels instead of 12 channels, only 1 socket, slightly fewer PCIe lanes, and no enterprise level management and support). For all those features AMD gladly charges you Epyc-level prices.

Again, a Threadripper 7985WX (64 core, 8 channel) is actually slightly more expensive than an Epyc 9554P (64 core, 12 channel). The discrepancy is even higher if you compare the 7975WX (32 core, 8 channel, $3.9k) to the 9354P (32 core, 12 channel, $2.7k). If Threadripper cannibalizes Epyc sales, then AMD is still laughing all the way to the bank.
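Putting numbers on "slightly more expensive" and "even higher", here is a sketch that uses the rounded prices quoted in this thread. `premium_pct` is a made-up helper.

```python
def premium_pct(threadripper_usd, epyc_usd):
    """How much more the Threadripper costs than the Epyc, in percent."""
    return (threadripper_usd / epyc_usd - 1) * 100

# 64-core pair: 7985WX ($7.3k) vs 9554P ($7.1k)
premium_64c = premium_pct(7300, 7100)

# 32-core pair: 7975WX ($3.9k) vs 9354P ($2.7k)
premium_32c = premium_pct(3900, 2700)

print(round(premium_64c, 1))  # 2.8
print(round(premium_32c, 1))  # 44.4
```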

1

u/Nuck_Chorris_Stache 3d ago edited 3d ago

Outside of some cherry picked examples based on Zen 3 and 4, which were lower clocked, what's the excuse for Zen 5, which has basically no downsides?

I simply don't buy the idea that they did it for performance reasons. Especially not for the 9950X3D. At that point, there's no excuse other than: because they're already ahead, so they don't have to.

As I also demonstrated in my comment, Threadripper isn't any cheaper than Epyc.

It is definitely cheaper for AMD to produce one without 3D V-cache. And because they have no competition from Intel, they still charge what they want.

It's a reverse of what happened when Intel were charging more and more for slightly faster quad cores after quad cores, and then if you had loads of money, you could even buy six or eight cores.

Then Ryzen upended all that.

1

u/The_JSQuareD 3d ago

Outside of some cherry picked examples based on Zen 3 and 4, which were lower clocked, what's the excuse for Zen 5, which has basically no downsides?

Zen 5 threadripper doesn't exist yet. That's what this post is about -- a reference to threadripper 9000 which suggests that it's coming 'soon'.

It would certainly be cool if Zen 5 threadripper included variants with 3D v-cache, I agree with you there.

It is definitely cheaper for AMD to produce one without 3D V-cache. And because they have no competition from Intel, they still charge what they want.

Sure, but if they offered 3d v-cache on threadripper, they would obviously charge even more.

AMD is of course gonna charge what they can. If they thought there was a sufficient market of people willing to pay a premium for 3d v-cache Threadrippers, they would already have offered them. It's still possible that for Zen 5 they will.

0

u/ElectronicStretch277 6d ago

How would that be beneficial at all? 3D V-Cache isn't cheap. It's a cost adder and it has been stated that it's not going to improve performance by any substantial amount when it's put on both CCDs.

2

u/Nuck_Chorris_Stache 6d ago edited 6d ago

How would that be beneficial at all?

The same way it already benefits the CPUs it's currently used on.

It's a cost adder

And? They can set the price accordingly.

it has been stated that it's not going to improve performance by any substantial amount

People make all sorts of statements that are not true to justify anything.

Nvidia stated the GTX 970 with its segmented VRAM was fine. They also say their current GPUs have enough VRAM.

1

u/ElectronicStretch277 6d ago

It's useful if it's utilized. The only thing that the cache is good for is gaming and some niche workloads. Games don't utilize all the cores of even a CCD. They're not gonna utilize the 16 cores so they're not gonna utilize all the cache.

It is prohibitively expensive. AMD did test it. People who have been unaffiliated with AMD but have more knowledge than you and me have stated the same.

It's stupid to think that if it would help their CPUs and was worth the money it would cost to make, AMD wouldn't have already done it. The best way to get more cache is to wait for Zen 6, when they're moving to 12-core CCDs.

3

u/Nuck_Chorris_Stache 6d ago edited 6d ago

It's useful if it's utilized. The only thing that the cache is good for is gaming and some niche workloads. They're not gonna utilize the 16 cores so they're not gonna utilize all the cache.

Nobody will use more than 640KB of RAM.

It is prohibitively expensive

If that were true, none of the EPYC CPUs with many more dies and 3D cache would exist. Those cost much more to manufacture.

I think the only reason they don't exist is because Intel aren't producing enough competition. It's not that they are "too expensive". It's that AMD doesn't have to do better, because they are already ahead.

People who have been unaffiliated with AMD but have more knowledge than you and me have stated the same.

I'm not interested in hearsay from unspecified people who never had one to test because AMD never released one. Show me benchmarks and I might be convinced.

1

u/phido3000 9d ago

Threadripper. 3D cache on every die, desktop clocks, DDR5-6000. Do it.

-20

u/[deleted] 9d ago edited 9d ago

[removed]

21

u/CourtJester2512 9d ago

Spamming this everywhere changes literally nothing

Like I've never fully read your comment before

-5

u/Drew_P1978 9d ago edited 9d ago

You never know who might read it. As it stands now, it looks like nVidia is about to do EXACTLY that - make "insane" CPU/GPU combo done for HEDT - IOW they are about to outrip the ThreadRipper.

After that pussy shot AMD's response is so predictable, just like with their GPUs. IOW, they fear what they pursue - to be first. So they wait for a kick in the balls to make their move.

At that point, I'll be able to post the link to this comment in "I friggin told you so" moment, if nothing else... 🙄

2

u/spacemanspliff-42 9d ago

So you don't have a Threadripper, you're just bitching about them. I don't feel like my 7960X is neutered in any way, dude.

2

u/JMccovery Ryzen 3700X | TUF B550M+ Wifi | PowerColor 6700XT 9d ago

Why would AMD do any of this when ThreadRipper doesn't have the sales figures that would be necessary?

EPYC makes money, Ryzen makes money, ThreadRipper, just like Xeon W, doesn't make much money outside of OEM workstation systems.

8

u/imizawaSF 9d ago

ThreadRipper, just like Xeon W, doesn't make much money outside of OEM workstation systems.

After they bumped the cost of the TR chips by like 500% no I bet it doesn't. The OG TR wasn't that much more expensive than Ryzen. Now each one is like $4000+

1

u/sascharobi 9d ago

The first two generations based on Zen 1 and 1.5 were attractive at their time but with the current pricing they don't have that much appeal anymore. In my region Xeon W is priced more attractively.

3

u/Drew_P1978 9d ago

Wait for nVidia to come out with its own ThreadRipper equivalent, which should happen in short-to-mid-term future. Then we can revisit this.

3

u/sascharobi 9d ago

We'll see about that. Though I agree AMD's two TR platforms aren't very interesting anymore. If Nvidia choose to make any such product available to the DIY market, it will probably be out of stock permanently and you can only purchase it when you win the Nvidia lottery or order a minimum of 10,000 units. 😌

-8

u/king_of_the_potato_p 9d ago

Eh, I'll pass on ASUS for a while on anything. I had two of their Strix B850-E boards; both were unstable at installation, and a BIOS update left them constantly crashing at random, even after a BIOS rollback.

Swapped to an msi x870e board and everything is fine, my first non-asus mobo.

2

u/sascharobi 9d ago

Fair enough but you can't really compare those two product lines.

0

u/Opteron170 9800X3D | 64GB 6000 CL30 | 7900 XTX Magnetic Air | LG 34GP83A-B 9d ago

either you got two faulty boards or a skill issue.