this post was submitted on 14 Dec 2023
Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ
⚓ Dedicated to the discussion of digital piracy, including ethical problems and legal advancements.
I got annoyed at not finding closed captions for the dubbed media I have. If a show/movie is originally in English and I have it in Spanish, the Spanish subtitles are usually translations of the English audio rather than transcriptions of the Spanish dub, so they don't match what is actually said.
(Tom Scott recently made a video about this issue: https://youtu.be/pU9sHwNKc2c)
I found and have been using this project: https://github.com/jhj0517/Whisper-WebUI
It's been pretty good; for YouTube videos (10-30 minutes) it has been perfect.
But there are some issues when I tried it with movies: the timings are not great, and sometimes it hallucinates words in parts where there isn't any speech. Only a few words are actually wrong or missing. (I tried it with faster-whisper since I don't have that much RAM.)
Running it through Subtitle Edit with WhisperX can help a lot for longer movies. It breaks the file into much smaller pieces and runs Whisper on them one by one before stitching the result back together.
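For anyone curious what "break it up and stitch it back together" looks like, here's a rough Python sketch of the idea. It is not how Subtitle Edit or WhisperX actually do it (I'm only illustrating the stitching step with fixed-length chunks). `transcribe_chunk` is a made-up placeholder for whatever Whisper backend you call, and the 30-second chunk size is just an assumption.

```python
from typing import Callable, Dict, List, Tuple

# A segment is assumed to look like {"start": float, "end": float, "text": str},
# with times in seconds relative to the start of its own chunk.
Segment = Dict[str, object]


def chunk_spans(total_seconds: float, chunk_seconds: float = 30.0) -> List[Tuple[float, float]]:
    """Split [0, total_seconds) into consecutive (start, end) spans."""
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        start = end
    return spans


def transcribe_in_chunks(
    total_seconds: float,
    transcribe_chunk: Callable[[float, float], List[Segment]],  # hypothetical backend hook
    chunk_seconds: float = 30.0,
) -> List[Segment]:
    """Transcribe chunk by chunk, then shift each segment onto the full timeline."""
    merged: List[Segment] = []
    for start, end in chunk_spans(total_seconds, chunk_seconds):
        for seg in transcribe_chunk(start, end):
            merged.append({
                "start": seg["start"] + start,  # chunk-relative -> absolute time
                "end": seg["end"] + start,
                "text": str(seg["text"]).strip(),
            })
    return merged


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render merged segments as SRT text, numbering cues from 1."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n{seg['text']}\n"
        )
    return "\n".join(blocks)
```

The important part is the offset: each chunk's timestamps start at zero, so they have to be shifted by the chunk's start time before merging. Fixed-length cuts can also split a word in half, which is one reason to let a real tool pick the cut points for you.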
Interesting, that actually sounds like an awesome idea for the OTA TV rips, because I doubt I would even be able to find anything that matches by duration on normal sub sites.
I hadn't heard of Whisper GUI / WhisperX before, but I see it has a GitHub. ~~Do you know if that is cloud-based or something you can run entirely locally? (Wondering if it is cloud-based in case I need to allow it net access, and also curious if it would eat a lot of bandwidth for roughly 2 seasons of broadcast TV shows, i.e. somewhere around 30-35 hrs worth of audio.)~~
Edit: apparently Whisper can be run entirely offline according to this, so if WhisperX is a fork, then I assume it would allow this too.
Does it work well with movies in other languages? I assume background music (BGM) might cause errors?
My limited experience has been positive w/ non-English languages.
Thanks for the suggestion. I was completely unaware of the Whisper project, and even if it doesn't help much for movies, it might come in real handy for some of the OTA rips I have from my friends (I was pretty sure I was SOL for those, but this seems like a decent option).
Sounds like it can even be run entirely offline, so even better.