[-] McCloud@alien.top 1 points 1 year ago

Unfortunately, stable-ts and Whisper don't clearly report which file they are working on, so you're left trying to decipher it from the logs. I tried adding prints to show which files are queued and started, but with threading, the stdout output sometimes gets lost or buffered in strange ways.
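For anyone hitting the same thing, here is a minimal sketch of one way around it: use the standard `logging` module instead of bare prints, since its handlers lock around each write and flush after each record. The `transcribe` function and the file names are placeholders I made up for illustration, not subgen's actual code.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

# Each record is written under a lock, so lines from different threads
# do not interleave mid-line. Output goes to stderr by default.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(threadName)s] %(message)s",
)
log = logging.getLogger("subgen-sketch")


def transcribe(path: str) -> str:
    """Hypothetical stand-in for the real transcription call."""
    log.info("started %s", path)
    # ... the actual stable-ts / faster-whisper work would go here ...
    log.info("finished %s", path)
    return path


def run_queue(paths: list[str], workers: int = 2) -> list[str]:
    """Log every queued file, then process the files on a thread pool."""
    for p in paths:
        log.info("queued %s", p)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input paths.
        return list(pool.map(transcribe, paths))
```

If you'd rather keep plain prints, `print(..., flush=True)` or setting `PYTHONUNBUFFERED=1` in the container should also help with the buffering part.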

[-] McCloud@alien.top 1 points 1 year ago

I'm not sure yet. The faster-whisper benchmarks show the large-v2 model taking about 1 minute for 13 minutes of audio. Smaller models ought to be quicker. I'm unsure whether the specs of the GPU will make much difference.
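As a rough back-of-the-envelope sketch, that benchmark works out to about 13x real time. The helper names below are just mine, and the figure only applies to the benchmark's hardware and model, so treat any estimate as a ballpark.

```python
def realtime_factor(audio_minutes: float, processing_minutes: float) -> float:
    """How many minutes of audio are processed per minute of compute."""
    return audio_minutes / processing_minutes


def estimate_minutes(audio_minutes: float, factor: float) -> float:
    """Estimated processing time for a file at a given real-time factor."""
    return audio_minutes / factor


# Figures from the benchmark mentioned above: 13 minutes of audio in ~1 minute.
factor = realtime_factor(13, 1)

# A 2-hour movie at that rate would take a bit over 9 minutes.
movie_estimate = estimate_minutes(120, factor)
```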

[-] McCloud@alien.top 1 points 1 year ago

It can only translate into English, but the source audio can be a foreign language.
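A minimal sketch of what that looks like when calling faster-whisper directly, assuming the `faster-whisper` package is installed. Subgen goes through stable-ts, so its own code may differ; the function name, model size, and CPU settings here are only illustrative choices.

```python
def translate_to_english(audio_path: str, model_size: str = "small"):
    """Transcribe audio in any supported language into English text.

    Returns a list of (start_seconds, end_seconds, text) tuples.
    """
    # Imported lazily so this file loads even without the package installed.
    from faster_whisper import WhisperModel

    # CPU with int8 is an assumption for the sketch; use device="cuda" for a GPU.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    # task="translate" produces English output; the source language is detected.
    segments, info = model.transcribe(audio_path, task="translate")
    print(f"detected language: {info.language}")

    # segments is a generator, so the work happens while iterating over it.
    return [(s.start, s.end, s.text.strip()) for s in segments]
```

The other value for `task` is `"transcribe"`, which keeps the subtitles in the source language.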

[-] McCloud@alien.top 1 points 1 year ago

I just tried it, and Emby won't actually send out the webhook on an action. I can use the test webhook, but it won't trigger off media actions. The documentation half-implies that it's an Emby Premiere option?

[-] McCloud@alien.top 1 points 1 year ago

If I knew what the endpoints were, nothing would prohibit it. I can add it to my short list.

[-] McCloud@alien.top 1 points 1 year ago

It should detect the foreign language and make English subtitles, but I haven't personally tried it.

I'm not using whisper.cpp anymore. I did some short comparisons between WhisperX and stable-ts and ultimately decided to go with stable-ts. Functionally, I'm sure they're very similar.

submitted 1 year ago by McCloud@alien.top to c/main@selfhosted.forum

Hey all,

Some might remember this from about 9 months ago. I've been running it with zero maintenance since then, but saw there were some new updates that could be leveraged.

What has changed?

  • Jellyfin is supported (in addition to Plex and Tautulli)
  • Moved away from whisper.cpp to stable-ts and faster-whisper (faster-whisper can support Nvidia GPUs)
  • Significant refactoring of the code to make it easier to read and for others to add 'integrations' or webhooks
  • Renamed the webhook endpoint from webhook to plex, tautulli, or jellyfin (one per integration)
  • New environment variables for additional control

What is this?

This will transcribe your personal media on a Plex or Jellyfin server to create subtitles (.srt). It currently relies on webhooks from Jellyfin, Plex, or Tautulli. It uses stable-ts and faster-whisper, which can run on both Nvidia GPUs and CPUs.
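To give an idea of what ends up in the .srt file, here is a small sketch that turns timed segments into SRT text. It is my own illustration of the format, not the code subgen uses (stable-ts has its own SRT writer), and the `(start, end, text)` tuple layout is an assumption.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from an iterable of (start, end, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Each cue is a counter, a time range, and the text, which is why the files stay small and are easy to fix by hand if a line is off.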

How do I run it?

I recommend reading through the documentation at McCloudS/subgen: Autogenerate subtitles using OpenAI Whisper Model via Jellyfin, Plex, and Tautulli (github.com). The quick and dirty version: pull mccloud/subgen from Docker Hub, configure the Tautulli/Plex/Jellyfin webhooks, and map your media volumes so the paths match Plex/Jellyfin identically.

What can I do?

I'd love any feedback or PRs to update any of the code or the instructions. I'm also interested to hear if anyone can get GPU transcription to work. I have a Tesla T4 in the mail to try it out soon.
