this post was submitted on 01 Feb 2025
105 points (100.0% liked)

TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

Sam "wrong side of FOSS history" Altman must be pissing himself.

Direct Nitter Link:

https://nitter.lucabased.xyz/jiayi_pirate/status/1882839370505621655

[–] reallykindasorta@slrpnk.net 13 points 2 weeks ago* (last edited 2 weeks ago) (3 children)

Non-techie requesting a layman's explanation if anyone has time!

After reading a couple of "what makes Nvidia's H100 chips so special" articles, I'm gathering that they were supposed to have significantly more computational capability than their competitors (which I'm taking to mean more computations per second). So the question with DeepSeek and similar is something like "how are they able to get the same results with fewer computations?", and the answer is speculated to be more efficient code/instructions for the AI model, so it can reach the same conclusions with fewer computations overall, potentially reducing the need for special jacked-up chips to run it?

[–] mountainriver@awful.systems 14 points 2 weeks ago (1 children)

Good question!

The guesses and rumours that you have got as replies make me lean towards "apparently no one knows".

And because it's slop machines (also referred to as "AI"), there is always a high probability of some sort of scam.

[–] froztbyte@awful.systems 9 points 2 weeks ago* (last edited 2 weeks ago)

pretty much my take as well. I haven’t seen any actual information from a primary source, just lots of hearsay and “what we think happened” analyst shit (e.g. that analyst group in the twitter screenshot has names but no citation/links)

and doubly yep on the “everyone could just be lying” bit

[–] justOnePersistentKbinPlease@fedia.io 12 points 2 weeks ago (3 children)

From a technical POV, from having read into it a little:

DeepSeek devs worked in a very low-level language called assembly. This language is unlike relatively newer languages like C in that it provides no guardrails at all and is basically CPU instructions in extreme shorthand. An "if" statement would be something like BEQ 1000, where it jumps to a specific memory location (in this case address 1000) if two CPU registers are equal.

The advantage of using it is that it is considerably faster than C. However, it also means that the code is mostly locked to that specific hardware. If you add more memory or change CPUs you have to refactor. This is one of the reasons the language was largely replaced with C and other languages.

Edit: to expound on this: "modern" languages are even slower, but more flexible in terms of hardware. This would be languages like Python, Java and C#

[–] V0ldek@awful.systems 20 points 2 weeks ago* (last edited 2 weeks ago) (2 children)

This is a really weird comment. Assembly is not faster than C, that's a nonsensical statement, C compiles down to assembly. LLVM's optimizations will most likely outperform or directly match whatever hand-crafted assembly you write. Why would BEQ 1000 be "considerably faster" than if (x == y) goto L_1000;? This collapses even further if you consider any application larger than a few hundred lines of code, any sensible compiler is going to beat you on optimizations if you try to write hand-crafted assembly. Try loading up assembly code and manually performing intraprocedural optimizations, lol, there's a reason every compiled language goes through an intermediate representation.

Saying that C# is slower than C is also nonsensical, especially now that C# has built-in PGO it's very likely it could outperform an application written in C. C#'s JIT compiler is not somehow slower because it's flexible in terms of hardware, if anything that's what makes it fast. For example you can write a vectorized loop that will be JIT-compiled to the ideal fastest instruction set available on the CPU running the program, whereas in C or assembly you'd have to manually write a version for each. There's no reason to think that manual implementation would be faster than what the JIT comes up with at runtime, though, especially with PGO.

It's kinda like you're saying that a V12 engine is faster than a Ferrari and that they are both faster than a spaceship because the spaceship doesn't have wheels.

I know you're trying to explain this to a non-technical person but what you said is so terribly misleading I cannot see educational value in it.
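
To make the comparison concrete: a C "if" already compiles down to a compare-and-branch, so a hand-written BEQ buys nothing by itself. A minimal sketch (the function is made up for illustration, and the instructions in the comments vary by CPU, compiler, and optimization flags):

    #include <stdio.h>

    /* A plain C comparison. A compiler turns this directly into a
     * compare-and-branch; the instructions in the comments are only
     * illustrative and depend on the target CPU, compiler, and flags. */
    static int same(int x, int y)
    {
        if (x == y)       /* roughly:  cmp  r0, r1      */
            return 1;     /*           beq  .L_equal    */
        return 0;
    }

    int main(void)
    {
        printf("%d\n", same(3, 3));   /* prints 1 */
        return 0;
    }

Checking the compiler's actual output (for example with gcc -O2 -S) is the quickest way to see this for a given target.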

[–] froztbyte@awful.systems 12 points 2 weeks ago

and one doesn't program GPUs with assembly (in the sense as it's used with CPUs)

[–] justOnePersistentKbinPlease@fedia.io 1 points 2 weeks ago (4 children)

I have hand-crafted assembly instructions and have made them faster than the same C code.

Particular to if statements, C will do things like push and pull values from the stack, which takes a small but occasionally noticeable number of cycles.

[–] self@awful.systems 8 points 2 weeks ago (1 children)

Particular to if statements, C will do things like push and pull values from the stack, which takes a small but occasionally noticeable number of cycles.

holy fuck. llvm in shambles

[–] bitofhope@awful.systems 6 points 2 weeks ago* (last edited 2 weeks ago)

Meanwhile I'm reverse engineering some very much not performance sensitive video game binary patcher program some guy made a decade ago and Ghidra interprets a string splitting function as a no-op because MSVC decided calling conventions are a spook and made up a new one at link time. And it was right to do that.

EDIT: Also me looking for audio data from another old video game, patiently waiting for my program to take about half an hour on my laptop every time I run it. Then I remember to add --release to cargo run and while the compilation takes three seconds longer, the runtime shrinks to about ten seconds. I wonder if the above guy ever tried adding -O2 to his CFLAGS?

[–] froztbyte@awful.systems 17 points 2 weeks ago (2 children)

for anyone reading this comment hoping for an actual eli5, the "technical POV" here is nonsense bullshit. you don't program GPUs with assembly.

the rest of the comment is the poster filling in bad comparisons with worse details

[–] pupbiru@aussie.zone 7 points 2 weeks ago

literally looks like LLM-generated generic slop: confidently incorrect without even a shred of thought

[–] justOnePersistentKbinPlease@fedia.io 2 points 2 weeks ago (3 children)

For anyone reading this comment, that person doesn't know anything about assembly or C.

[–] froztbyte@awful.systems 13 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

yep, clueless. can't tell a register apart from a soprano. and allocs? the memory's right there in the machine, it has it already! why does it need an alloc!

fuckin' dipshit

next time you want to do a stupid driveby, pick somewhere else

[–] o7___o7@awful.systems 8 points 2 weeks ago

Sufficiently advanced skiddies are indistinguishable from malloc

[–] dgerard@awful.systems 12 points 2 weeks ago

this user is just too smart for the average awful systems poster to deal with, and has been sent on his way to a more intellectual lemmy

[–] self@awful.systems 11 points 2 weeks ago (1 children)

you know I was having a slow day yesterday cause I only just caught on: you think we program GPUs in plain fucking C? absolute dipshit no notes

[–] froztbyte@awful.systems 10 points 2 weeks ago

the wildest bit is that one could literally just … go do the thing. like you could grab the sdk and run through the tutorial and actually have babby’s first gpu program in not too long at all[0], with all the lovely little bits of knowledge that entails

but nah, easier to just make some nonsense up out of thirdhand conversations misheard out of a gamer discord talking about a news post of a journalist misunderstanding a PR statement, and then confidently spout that synthesis

[0] - I’m eliding “make the cuda toolchain run” for argument of simplicity. could just rent a box that has it, for instance
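
And "babby's first gpu program" really is small. A minimal CUDA vector-add sketch, assuming a working CUDA toolkit; names are illustrative and error checking is omitted for brevity:

    #include <stdio.h>
    #include <cuda_runtime.h>

    /* Each GPU thread adds one pair of elements. */
    __global__ void vec_add(const float *a, const float *b, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = a[i] + b[i];
    }

    int main(void)
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);

        /* Host buffers. */
        float *a = (float *)malloc(bytes);
        float *b = (float *)malloc(bytes);
        float *c = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        /* Device buffers. */
        float *da, *db, *dc;
        cudaMalloc((void **)&da, bytes);
        cudaMalloc((void **)&db, bytes);
        cudaMalloc((void **)&dc, bytes);
        cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

        /* Launch enough 256-thread blocks to cover n elements. */
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vec_add<<<blocks, threads>>>(da, db, dc, n);
        cudaDeviceSynchronize();

        cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", c[0]);   /* expect 3.0 */

        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(a); free(b); free(c);
        return 0;
    }

Build and run with something like nvcc first.cu -o first && ./first (assuming the file is saved as first.cu).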

[–] msage@programming.dev 4 points 2 weeks ago (2 children)

Putting Python, the slowest popular language, alongside Java and C# really irks me bad.

The real benefit of R1 is Mixture of Experts - the model is separated into smaller sections that are trained and used independently, meaning you don't need the entire model to be active all the time, just parts of it.

Meaning it uses fewer resources during training and general usage. For example, instead of 670 billion parameters all the time, it can use 30 billion for a specific question, and you can get away with using 2% of the hardware used by the competition.
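
A rough sketch of the mixture-of-experts idea described above: a router scores every expert for a given input, but only the top-k actually run, so most of the model's parameters sit idle for any single token. The router and expert functions here are toy placeholders, not DeepSeek's actual architecture:

    #include <stdio.h>

    #define N_EXPERTS 8   /* toy value; real MoE models use far more experts */
    #define TOP_K     2   /* only this many experts run for a given input    */

    /* Stand-in for an expert sub-network: here just a scaled copy of x. */
    static float run_expert(int id, float x)
    {
        return (id + 1) * 0.1f * x;
    }

    /* Stand-in for the learned router: one score per expert. */
    static void route(float x, float score[N_EXPERTS])
    {
        for (int e = 0; e < N_EXPERTS; e++)
            score[e] = (float)((e * 37 + (int)(x * 100.0f)) % 100) / 100.0f;
    }

    /* Run only the TOP_K best-scoring experts; the rest stay idle. */
    static float moe_forward(float x)
    {
        float score[N_EXPERTS];
        route(x, score);

        float out = 0.0f;
        for (int k = 0; k < TOP_K; k++) {
            int best = 0;
            for (int e = 1; e < N_EXPERTS; e++)
                if (score[e] > score[best])
                    best = e;
            out += score[best] * run_expert(best, x);
            score[best] = -1.0f;   /* exclude this expert from the next pick */
        }
        return out;
    }

    int main(void)
    {
        printf("moe_forward(0.5) = %f\n", moe_forward(0.5f));
        return 0;
    }

The efficiency claim in the comment above comes from exactly this: with hundreds of billions of total parameters but only a small fraction active per token, most of the model is untouched for any given input.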

[–] justOnePersistentKbinPlease@fedia.io

I used them as they are well-known modern languages that the average person might have heard about.

[–] UndercoverUlrikHD@programming.dev 1 points 2 weeks ago (1 children)

Putting Python, the slowest popular language, alongside Java and C# really irks me bad.

I wouldn't call python the slowest language when the context is machine learning. It's essentially C.

[–] msage@programming.dev 1 points 2 weeks ago (1 children)

Python is still the slowest, it just utilizes libraries written in C for this specific math.

And that maths happens to be 99% of the workload
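
The split looks roughly like this: Python sets up the arrays and makes one call, and a compiled kernel does the heavy looping. Below is a toy C-style kernel of the sort such libraries hand off to; real libraries use heavily optimized, vectorized BLAS routines rather than a naive triple loop:

    #include <stdio.h>

    #define N 64   /* toy size */

    /* Naive matrix multiply: C = A * B. This is the kind of work a Python
     * call like numpy.dot hands off to compiled code, so the interpreter
     * pays for one function call rather than for the N*N*N inner loop.   */
    static void matmul(float A[N][N], float B[N][N], float C[N][N])
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                float acc = 0.0f;
                for (int k = 0; k < N; k++)
                    acc += A[i][k] * B[k][j];
                C[i][j] = acc;
            }
    }

    static float A[N][N], B[N][N], C[N][N];

    int main(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = 1.0f;
                B[i][j] = 2.0f;
            }
        matmul(A, B, C);
        printf("C[0][0] = %f\n", C[0][0]);   /* expect N * 1 * 2 = 128 */
        return 0;
    }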

[–] manicdave@feddit.uk 4 points 2 weeks ago* (last edited 2 weeks ago)

The article sort of demonstrates it. Instead of needing inordinate amounts of data and memory to increase its chance of one-shotting the countdown game, it only needs to know enough to prove itself wrong and roll the dice again.