Chinese AI lab DeepSeek massively undercuts OpenAI on pricing — and that's spooking tech stocks
(www.businessinsider.com)
it’s actually pretty easy to run locally as well. obviously not as easy as just downloading an app, but it’s gotten relatively straightforward, and the peace of mind is nice
check out ollama, and find an ollama UI
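if you’d rather skip a UI entirely, here’s a minimal sketch of talking to a locally running ollama server over its HTTP API. it assumes the default port (11434) and that you’ve already pulled a model; the model name below is just an example, swap in whatever you pulled:

```python
# minimal sketch: query a locally running ollama server over its HTTP API.
# assumes ollama is serving on the default port and the model has been pulled;
# the model name here is only an example.
import json
import urllib.request

def ask(prompt, model="llama3.2"):
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask("In one sentence, what is a quantized model?"))
```

by default the endpoint streams tokens instead, which is nicer for long answers; `stream: false` just keeps the example short.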
That's not the monster model, though. But yes, I run AI locally (barely on my 1660). What I can run locally is pretty decent in limited ways, but I want to see the o1 competitor.
figured i’d do this in a new comment since it’s been a bit since my last, but i just downloaded and ran the 70b model on my mac and it’s slower but running fine: about 15s to the first word, and word generation at roughly half the usual speed after that, but it’s running
this matches what i’ve experienced with other models too: very large models still run, just much, much slower
i’m not sure how things go once you get up to the 168b models etc, because i haven’t tried, but it seems like it just can’t load the whole model at once, so there’s a lot more loading and unloading, which makes it much slower
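for a rough sense of why that happens, here’s a back-of-envelope sketch (the 4-bit quantization and the 1.2x overhead factor are assumptions, not measured numbers): the weights need roughly parameter count × bits per weight ÷ 8 bytes, and once that stops fitting in memory the runtime has to keep swapping chunks in and out:

```python
# back-of-envelope estimate of whether a model's weights fit in memory.
# the 4-bit quantization width and the 1.2x overhead factor (KV cache,
# activations, runtime buffers) are rough assumptions, not measured values.
def weights_gb(n_params_billions: float, bits_per_weight: int = 4,
               overhead: float = 1.2) -> float:
    bytes_total = n_params_billions * 1e9 * bits_per_weight / 8 * overhead
    return bytes_total / 1e9

for size in (7, 14, 70, 168):
    need = weights_gb(size)
    print(f"{size:>3}b @ 4-bit: ~{need:.0f} GB"
          f" ({'fits' if need <= 64 else 'spills'} in 64 GB unified memory)")
```

very rough, obviously (different quants and context lengths move the numbers a lot), but it lines up with the pattern above: things run fine until the weights stop fitting, then everything crawls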
You can look at the stats to see how much of the model fits in VRAM. The lower the percentage, the slower it goes, although I imagine that's not the only constraint. Some models are probably faster than others regardless, but I really haven't done a lot of experimenting. It's too slow on my card to even compare output quality across models. Once I have 2k tokens in context, even a 7B model takes a second or more per token. I have about the slowest card that ollama even says you can use. I think there is one worse card.
ETA: I'm pulling the 14B Abliterated model now for testing. I haven't had good luck running a model this big before, but I'll let you know how it goes.
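If you want actual numbers instead of eyeballing it, a sketch like this can measure rough tokens per second at different context sizes. It assumes the non-streaming /api/generate response includes eval_count (tokens generated) and eval_duration (nanoseconds), which is what ollama's API docs describe; the model name and filler text are just placeholders:

```python
# sketch: rough tokens/sec from ollama's /api/generate stats at two context
# sizes, to see how speed drops as context grows. assumes the non-streaming
# response includes eval_count and eval_duration; model name is a placeholder.
import json
import urllib.request

def tok_per_sec(model: str, prompt: str) -> float:
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode("utf-8")
    req = urllib.request.Request("http://localhost:11434/api/generate",
                                 data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        stats = json.loads(resp.read())
    # eval_count = tokens generated, eval_duration = generation time in ns
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)

short = "Explain VRAM in one sentence."
padded = ("filler sentence to pad out the context window. " * 250) + short
for label, prompt in (("short context", short), ("~2k-token context", padded)):
    print(f"{label}: ~{tok_per_sec('llama3.2', prompt):.1f} tok/s")
```

Comparing the short prompt against the padded one should show the context-length slowdown directly.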
that’s true - i was running 7b and it seemed pretty much instant, so i was assuming i could do much larger - turns out it’s only 14b on a 64gb mac