Chinese AI lab DeepSeek massively undercuts OpenAI on pricing — and that's spooking tech stocks
(www.businessinsider.com)
You can look at the stats on how much of the model fits in VRAM. The lower the percentage, the slower it goes, although I imagine that's not the only constraint. Some models are probably faster than others regardless, but I really haven't done a lot of experimenting. It's too slow on my card to really even compare output quality across models. Once I have 2k tokens in context, even a 7B model takes a second or more per token. I have about the slowest card that ollama even says you can use. I think there's one worse card.
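If anyone wants to check this on their own setup, here's a minimal sketch that reads those stats from ollama's `/api/ps` endpoint (the same split `ollama ps` prints). It assumes a stock ollama install listening on the default port 11434 with at least one model currently loaded:

```python
# Sketch: ask a local ollama server how much of each loaded model sits in VRAM.
# Assumes ollama is running on the default port and a model is already loaded.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
    data = json.load(resp)

for model in data.get("models", []):
    total = model.get("size", 0)         # total bytes the loaded model occupies
    in_vram = model.get("size_vram", 0)  # bytes resident on the GPU
    pct = (in_vram / total * 100) if total else 0.0
    print(f"{model['name']}: {pct:.0f}% of {total / 1e9:.1f} GB in VRAM")
```

Anything well under 100% means layers are spilling into system RAM, which is usually where the tokens-per-second falls off a cliff.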
ETA: I'm pulling the 14B Abliterated model now for testing. I haven't had good luck running a model this big before, but I'll let you know how it goes.