@enumerator4829

enumerator4829@sh.itjust.works · 7 hours ago

The H200 has a very impressive bandwidth of 4.89 TB/s, but for the same price you can get 37 TB/s spread across 58 RX 9070s. Whether this actually works in practice, I don’t know.

Your math checks out, but only for some workloads. Other workloads scale out like shit, and then you want all your bandwidth concentrated. At some point you’ll also want to consider power draw:

One H200 is like 1500W when including support infrastructure like networking, motherboard, CPUs, storage, etc.
58 consumer cards will be like 8 servers loaded with GPUs, at like 5kW each, so say 40kW in total.

Now include power and cooling over a few years and do the same calculations.
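
A rough sketch of that calculation, for anyone who wants to plug in their own numbers. The two power figures are the ones above; the electricity price, the PUE (cooling overhead) and the three-year horizon are placeholder assumptions, not measurements.

```python
# Rough energy-cost comparison over a multi-year deployment.
# Power figures come from the comment above; everything else is assumed.

HOURS_PER_YEAR = 24 * 365


def energy_cost(power_kw: float, years: float, price_per_kwh: float, pue: float) -> float:
    """Electricity cost for a constant load, with cooling folded in via PUE."""
    return power_kw * pue * HOURS_PER_YEAR * years * price_per_kwh


# From the comment: ~1.5 kW for one H200 including support infrastructure,
# ~40 kW for eight servers loaded with consumer cards.
H200_KW = 1.5
CONSUMER_KW = 40.0

# Assumptions, not from the comment: adjust for your own site.
YEARS = 3
PRICE_PER_KWH = 0.15  # USD per kWh
PUE = 1.5             # facility and cooling overhead multiplier

h200 = energy_cost(H200_KW, YEARS, PRICE_PER_KWH, PUE)
consumer = energy_cost(CONSUMER_KW, YEARS, PRICE_PER_KWH, PUE)

print(f"H200 setup:     ${h200:,.0f}")
print(f"Consumer setup: ${consumer:,.0f}")
print(f"Difference:     ${consumer - h200:,.0f}")
```

This only covers electricity. Rack space, extra servers, networking and admin time would widen or narrow the gap depending on your site.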

As for apples and oranges, this is why you can’t look at the marketing numbers, you need to benchmark your workload yourself.

enumerator4829@sh.itjust.works · 16 hours ago

Well, a few issues:

For hosting or training large models you want high bandwidth between GPUs. PCIe is too slow, NVLink has literally an order of magnitude more bandwidth. See what Nvidia is doing with NVLink and AMD is doing with InfinityFabric. Only available if you pay the premium, and if you need the bandwidth, you are most likely happy to pay.
Same thing as above, but with memory bandwidth. The HBM-chips in an H200 will run in circles around the GDDR-garbage they hand out to the poor people with filthy consumer cards. By the way, your inference and training are most likely bottlenecked by memory bandwidth, not available compute.
Commercially supported cooling of gaming GPUs in rack servers? Lol. Good luck getting any reputable hardware vendor to sell you that, and definitely not at the power densities you want in a data center.
FP16 TFLOPS alone isn’t enough. Look at 4 and 8 bit tensor numbers, that’s where the expensive silicon is used.
Nvidia’s licensing agreements basically prohibit gaming cards in servers. No one will sell it to you at any scale.
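
To make the memory bandwidth point concrete, here is a back-of-the-envelope sketch. It assumes single-stream decoding of a dense model, where every generated token has to read all weights from memory once, so bandwidth divided by model size gives an upper bound on tokens per second. The 8B-parameter, 8-bit model is a hypothetical example, and the per-card consumer figure is just 37 TB/s divided by 58 cards from the numbers quoted in the other thread.

```python
def max_decode_tokens_per_s(bandwidth_tb_s: float, params_billion: float, bytes_per_param: float) -> float:
    """Upper bound on tokens/s when each token requires one full pass over the weights."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / weight_bytes


# Hypothetical model: 8 billion parameters stored at 1 byte each (8-bit).
PARAMS_BILLION = 8
BYTES_PER_PARAM = 1

h200_bound = max_decode_tokens_per_s(4.89, PARAMS_BILLION, BYTES_PER_PARAM)
consumer_bound = max_decode_tokens_per_s(37 / 58, PARAMS_BILLION, BYTES_PER_PARAM)

print(f"H200 bound:          {h200_bound:.0f} tokens/s")
print(f"Consumer card bound: {consumer_bound:.0f} tokens/s")
```

Real numbers will be lower (KV cache reads, kernel overhead) and batching changes the picture, which is one more reason to benchmark your own workload.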

For fun, home use, research or small time hacking? Sure, buy all the gaming cards you can. If you actually need support and have a commercial use case? Pony up. Either way, benchmark your workload, don’t look at marketing numbers.

Is it a scam? Of course, but you can’t avoid it.

enumerator4829@sh.itjust.works · 3 months ago

Tony Stark was able to build his CA in a cave! With a bunch of dice!

enumerator4829@sh.itjust.works · 3 months ago

This will be so much fun for people with legacy systems

enumerator4829@sh.itjust.works · 3 months ago

How about ”don’t”?

enumerator4829@sh.itjust.works · edited 4 months ago

You assume a uniform distribution. I’m guessing that it’s not. The question isn’t ”Does the model contain compressed representations of all works it was trained on”. Enough information on any single image is enough to be a copyright issue.

Besides, the situation isn’t as obviously flawed with image models, when compared to LLMs. LLMs are just broken in this regard, because it only takes a handful of bytes being retained in order to violate copyright.

I think there will be a ”find out” stage fairly soon. Currently, the US projects lots and lots of soft power on the rest of the world to enforce copyright terms favourable to Disney and friends. Accepting copyright violations for AI will erode that power internationally over time.

Personally, I do think we need to rework copyright anyway, so I’m not complaining that much. Change the law, go ahead and make the high seas legal. But set against current copyright laws, most large datasets and most models constitute copyright violations. Just imagine the shitshow if OpenAI was a European company training on material from Disney.

enumerator4829@sh.itjust.works · 4 months ago

Document databases are the future /s

enumerator4829@sh.itjust.works · 4 months ago

There is an argument that training actually is a type of (lossy) compression. You can actually build (bad) language models by using standard compression algorithms to ”train”.
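
A minimal sketch of the idea, assuming zlib as the compressor: the training corpus primes the compressor, and a candidate continuation is scored by how many extra compressed bytes it costs. Text that resembles the corpus is cheap, unfamiliar text is expensive. The function names and the toy corpus are made up for illustration.

```python
import zlib


def extra_bytes(corpus: bytes, continuation: bytes) -> int:
    """Extra compressed bytes needed to encode `continuation` after `corpus`."""
    base = len(zlib.compress(corpus, 9))
    return len(zlib.compress(corpus + continuation, 9)) - base


def rank_continuations(corpus: bytes, candidates: list) -> list:
    """Order candidates from most to least 'likely' (cheapest to compress first)."""
    return sorted(candidates, key=lambda c: extra_bytes(corpus, c))


# Toy "training data": the compressor has seen this sentence many times.
corpus = b"the cat sat on the mat. " * 20

seen = b"the cat sat on the mat. "
unseen = b"qzxv jkwp blorf gnyd uh!"

print(extra_bytes(corpus, seen), extra_bytes(corpus, unseen))
print(rank_continuations(corpus, [unseen, seen])[0])
```

It is a terrible language model, but the "knowledge" it has is literally a compressed copy of the training text.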

By that argument, any model contains lossy and unstructured copies of all data it was trained on. If you download a 480p low quality h264-encoded Bluray rip of a Ghibli movie, it’s not legal, despite the fact that you aren’t downloading the same bits that were on the Bluray.

Besides, even if we consider the model itself to be fine, they did not buy all the media they trained the model on. The action of downloading media, regardless of purpose, is piracy. At least, that has been the interpretation for normal people sailing the seas; large companies are of course exempt from filthy things like laws.

enumerator4829@sh.itjust.works · 4 months ago

What? Just base64 encrypt it before you store it in the git hub

enumerator4829@sh.itjust.works · 4 months ago

I’m using ”Commercially deployed” in the context of ”company you interacted with had an AI represent them in that communication”. You don’t use AI for that to increase customer satisfaction. (I wonder why I haven’t seen any AI products targeted at automated B2B sales?)

I won’t argue that GenAI isn’t useful for end consumers using it properly. It is.

(As an aside, I hope you and your grandfather get better!)

enumerator4829@sh.itjust.works · 4 months ago

But why use money to innovate when there is profit to be made and laws are just made up?

AI is the new kid on the block, trying to make a dent in our society. So far, we don’t really have that many useful or productive deployments. It’s on AI to prove its worth, and it’s kinda worthless until proven otherwise. (Name one interaction with a commercially deployed AI model you didn’t hate?)

So far, Apple is failing with consumer products, Microsoft is backing off on GPU orders, research shows commercial GenAI isn’t increasing productivity, and NVDA seems to be cooling off. And you expect the benevolent commercial health care industry to come to the rescue?

Yeah, I’ll keep my knee jerk reaction and keep living with my current socialised health care.

enumerator4829@sh.itjust.works · 4 months ago

LLM training is expensive, so are prompt ”engineers”. This will be the cheapest off-the-shelf LLM they can find, prompted by someone’s nephew. People will be eating glue.

enumerator4829@sh.itjust.works · 4 months ago

Exactly, if you need shelf life, you use tape. Shelf life isn’t really a consideration for hard drives or SSDs in real life scenarios.

enumerator4829@sh.itjust.works · 4 months ago

See for example the storage systems from Vast or Pure. You can increase window size for compression and dedup far smaller blocks. Fast random IO also allows you to do that ”online” in the background. In the case of Vast, you also have multiple readers on the same SSD doing that compression and dedup.

So the feature isn’t that special. What you can do with it in practice changes drastically.
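
A toy sketch of why block size matters for dedup, assuming simple fixed-size blocks hashed with SHA-256. This is not how Vast or Pure implement it; the synthetic data (eight random 4 KiB chunks repeated in random order) is built to make the effect obvious.

```python
import hashlib
import random


def dedup_ratio(data: bytes, block_size: int) -> float:
    """Logical blocks divided by unique blocks for a given fixed block size."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


# Synthetic data: 256 slots, each filled with one of 8 random 4 KiB chunks.
rng = random.Random(0)
chunks = [bytes(rng.getrandbits(8) for _ in range(4096)) for _ in range(8)]
data = b"".join(rng.choice(chunks) for _ in range(256))

for size in (4096, 32768):
    print(f"{size:>6} byte blocks: dedup ratio {dedup_ratio(data, size):.1f}")
```

With 4 KiB blocks nearly everything is a duplicate. With 32 KiB blocks the same data barely dedups at all, because each large block is a different combination of the small ones. Small blocks mean many more hash lookups and random reads, which is where fast random IO comes in.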

enumerator4829@sh.itjust.works · 4 months ago

The flaw with hard drives comes with large pools. The recovery speed is simply too slow when a drive fails, unless you build huge pools. So you need additional drives for more parity.
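
A quick sketch of the rebuild arithmetic. The 20 TB capacity and 250 MB/s sustained write speed are assumed example values, and the `writers` parameter is an idealised stand-in for a large pool that spreads the rebuild across many drives.

```python
def rebuild_hours(capacity_tb: float, write_mb_s: float, writers: int = 1) -> float:
    """Best-case time to rewrite one failed drive's worth of data.

    Assumes the rebuild is limited only by sequential write speed and that
    the work splits evenly across `writers` drives (idealised).
    """
    seconds = capacity_tb * 1e12 / (write_mb_s * 1e6 * writers)
    return seconds / 3600


# Assumed example drive: 20 TB, 250 MB/s sustained.
print(f"Single replacement drive: {rebuild_hours(20, 250):.1f} h")
print(f"Spread over 50 drives:    {rebuild_hours(20, 250, writers=50):.1f} h")
```

That is the best case with no production load on the pool. While the rebuild runs you are exposed to further failures, hence the extra parity.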

I don’t know who cares about shelf life. Drives spin all their lives, which is 5-10 years. Use M-Disk or something if you want shelf life.

enumerator4829@sh.itjust.works · 4 months ago

I agree with you, mostly. Margins in the datacenter are thin for some players. Not Nvidia, they are at like 60% pure profit per chip, including software and R&D. That will have an effect on how we design stuff in the next few years.

I think we’ll need both ”GPU” and traditional CPUs for the foreseeable future. GPU-style for bandwidth or compute constrained workloads and CPU-style for latency sensitive workloads or pointer chasing. Now, I do think we’ll slap them both on top of the same memory, APU-style à la MI300A.

That is, as long as x86 has the single-threaded advantage, RISC-V won’t take over that market, and as long as GPUs have higher bandwidth, RISC-V won’t take over that market.

Finally, I doubt we’ll see a performant RISC-V chip from China in the next decade - they simply lack the EUV fabs. From outside of China, maybe, but the demand isn’t nearly as large.

enumerator4829@sh.itjust.works · 4 months ago

Not economical. Storage is already done on far larger fab nodes than CPUs and other components. This is a case where higher density actually can be cheaper. ”Mature” nodes are most likely cheaper than the ”ancient” process nodes simply due to age and efficiency. (See also the disaster in the auto industry during covid. Car makers stopped ordering parts made on ancient process nodes, so the nodes were shut down permanently due to cost. After covid, fun times for automakers that had to modernise.)

Go compare prices, new NVMe M.2 will most likely be cheaper than SATA 2.5” per TB. The extra plastic shell, extra shipping volume and SATA-controller account for that difference. 3.5” would make it even worse. In the datacenter, we are moving towards ”rulers” with 61TB available now, probably 120TB soon. Now, these are expensive, but the cost per TB is actually not that horrible when compared to consumer drives.

enumerator4829@sh.itjust.works · 4 months ago

Tape will survive, SSDs will survive. Spinning rust will die

enumerator4829@sh.itjust.works · 4 months ago

Nope. Larger chips, lower yields in the fab, more expensive. This is why we have chiplets in our CPUs nowadays. Production cost of chips is superlinear to size.
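
A small sketch of the superlinear cost, using the textbook Poisson yield model (yield = exp(-area × defect density)). The defect density and die sizes are assumed example values, and packaging cost for the chiplets is ignored.

```python
import math


def die_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: fraction of dies that end up with zero defects."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)


def cost_per_good_die(area_mm2: float, defects_per_cm2: float) -> float:
    """Relative cost: silicon area you pay for, divided by the fraction that works."""
    return area_mm2 / die_yield(area_mm2, defects_per_cm2)


D0 = 0.1  # assumed defect density, defects per cm^2

monolithic = cost_per_good_die(800, D0)    # one big 800 mm^2 die
chiplets = 4 * cost_per_good_die(200, D0)  # same total area as four 200 mm^2 dies

print(f"Monolithic:  {monolithic:.0f} (relative units)")
print(f"4 chiplets:  {chiplets:.0f} (relative units)")
```

Same total silicon, but the big die throws away far more of it to defects. Doubling the area more than doubles the cost per working chip.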

enumerator4829@sh.itjust.works · 4 months ago

It’s not the packaging that costs money or limits us, it’s the chips themselves. If we crammed a 3.5” form factor full of flash storage, it would be far outside the budgets of mortals.