GPTQ-for-LLaMA with oobabooga works pretty well. I'm not sure to what extent it uses the CPU, but my GPU sits at 100% during inference, so it seems to be doing most of the work.
I've looked at that before. Do you use it with any UI?
Yea, it's called Text Generation Web UI. If you check out the oobabooga git repo, it goes into good detail. From what I can tell it's modeled on the automatic1111 UI for Stable Diffusion.
It's using Gradio, which is what auto1111 also uses. Both are pretty heavy modifications/extensions that push Gradio to its limits, but that's the package underneath both. Note that it also has an API (check out the --api flag, I believe), and depending on what you want to do, there are various UIs that can hook into the Text Gen Web UI (oobabooga) API in different ways.
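For anyone curious what "hooking into the API" looks like in practice, here's a rough sketch of calling it from Python. The port, endpoint path, and payload fields are assumptions (recent builds expose an OpenAI-compatible endpoint on local port 5000 when started with --api), so adjust them to whatever your install actually serves:

```python
# Minimal sketch of hitting the Text Generation Web UI API over HTTP.
# Assumes the server was launched with --api and is serving the
# OpenAI-compatible completions endpoint on localhost:5000 (this varies
# by version/config, so check your own setup and the ooba docs).
import requests

URL = "http://127.0.0.1:5000/v1/completions"  # assumed endpoint path

payload = {
    "prompt": "Explain what GPTQ quantization does in one sentence.",
    "max_tokens": 64,       # cap on generated tokens
    "temperature": 0.7,     # sampling temperature
}

response = requests.post(URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```

Frontends like SillyTavern are basically doing the same thing under the hood, just with their own chat formatting and settings layered on top.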
Personally, I have nothing but issues with ooba's UI, so I connect SillyTavern to it (or to KoboldCpp) instead. Works great.