So I have the API working, in the sense that I can send audio files and get text back, but what I'm looking for is a robust way to get streaming functionality. For example, when there is a short stretch of silence, it should stop recording, send the audio to the API, and so on.

Is there any such library in Python?
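
The "stop on silence" behaviour can be sketched with the standard library alone. This is a minimal illustration, not any particular library's method. It assumes audio arrives as 30 ms frames of 16 kHz, 16-bit little-endian mono PCM (for example, read from a microphone library such as PyAudio or sounddevice). It uses a simple RMS energy threshold as the voice-activity test. The constants and the helper names (`frame_rms`, `segment_on_silence`, `make_frame`) are hypothetical, and a dedicated VAD model is usually more robust against background noise than a fixed energy threshold.

```python
import math
import struct
from typing import Iterable, Iterator, List

# Hypothetical tuning values; adjust them for your microphone and room.
SAMPLE_RATE = 16000         # samples per second, 16-bit mono PCM assumed
FRAME_MS = 30               # duration of each incoming frame
SILENCE_RMS = 500.0         # frames quieter than this (int16 scale) count as silence
SILENCE_MS_TO_STOP = 600    # this much continuous silence ends an utterance


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def segment_on_silence(
    frames: Iterable[bytes],
    frame_ms: int = FRAME_MS,
    silence_rms: float = SILENCE_RMS,
    silence_ms_to_stop: int = SILENCE_MS_TO_STOP,
) -> Iterator[bytes]:
    """Yield one PCM chunk per utterance, closing it after enough trailing silence."""
    frames_to_stop = max(1, silence_ms_to_stop // frame_ms)
    current: List[bytes] = []
    silent_run = 0  # number of consecutive silent frames at the end of `current`
    for frame in frames:
        is_speech = frame_rms(frame) >= silence_rms
        if not current and not is_speech:
            continue  # no utterance in progress: skip leading silence
        current.append(frame)
        silent_run = 0 if is_speech else silent_run + 1
        if silent_run >= frames_to_stop:
            # Utterance finished: emit it without the trailing silent frames.
            yield b"".join(current[:-silent_run])
            current, silent_run = [], 0
    if current:
        # Input ended mid-utterance: flush what was collected.
        yield b"".join(current[: len(current) - silent_run])


def make_frame(amplitude: int, frame_ms: int = FRAME_MS, rate: int = SAMPLE_RATE) -> bytes:
    """Synthetic constant-amplitude frame, only for trying the segmenter out."""
    n = rate * frame_ms // 1000
    return struct.pack(f"<{n}h", *([amplitude] * n))


if __name__ == "__main__":
    quiet, loud = make_frame(0), make_frame(3000)
    stream = [quiet] * 5 + [loud] * 10 + [quiet] * 25 + [loud] * 4 + [quiet] * 3
    for chunk in segment_on_silence(stream):
        # In a real app, this is where the chunk would be sent to the API.
        print(f"utterance: {len(chunk)} bytes")
```

With the synthetic stream above, it reports two utterances of 9600 and 3840 bytes: the first closes after 600 ms of silence, and the second is flushed when the input ends. Trailing silence is trimmed before sending. You may prefer to keep a little padding so word endings are not clipped.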

k110111@feddit.de 4 points 1 year ago

I found this so far: https://github.com/KoljaB/RealtimeSTT

Maybe I can modify it to use the Whisper API.
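
Recorders typically produce raw PCM, while a file-based transcription API expects an audio file. Each chunk therefore has to be wrapped in a container before it is uploaded. The following is a minimal sketch that uses only the standard library `wave` module and assumes 16 kHz, 16-bit mono PCM. `pcm_to_wav_bytes` is a hypothetical helper name, not part of RealtimeSTT.

```python
import io
import wave


def pcm_to_wav_bytes(
    pcm: bytes, sample_rate: int = 16000, channels: int = 1, sample_width: int = 2
) -> bytes:
    """Wrap raw PCM samples in a WAV container, entirely in memory (no temp file)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()


if __name__ == "__main__":
    one_second_of_silence = bytes(16000 * 2)
    wav_bytes = pcm_to_wav_bytes(one_second_of_silence)
    print(len(wav_bytes))  # 44-byte header + 32000 bytes of samples = 32044
```

Suppose "Whisper API" here means OpenAI's hosted endpoint and you use its v1+ Python client. In that case, the bytes can probably be passed as a (filename, content) tuple, for example `client.audio.transcriptions.create(model="whisper-1", file=("chunk.wav", wav_bytes))`. A self-hosted server will have its own upload format.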

Sims@lemmy.ml 4 points 1 year ago

Dunno, but this guy (All About AI) builds one with faster-whisper, so perhaps you can get a few pointers there. I believe he chunks the audio on silence. He has a few other speech-to-X videos, too. Have fun. https://youtu.be/k6nIxWGdrS4

Also: https://github.com/SYSTRAN/faster-whisper

Sims@lemmy.ml 2 points 1 year ago

Just stumbled upon this speedy one: https://github.com/sanchit-gandhi/whisper-jax

And this one for word-level timestamps: https://github.com/m-bain/whisperX

paf@jlai.lu 3 points 1 year ago

I don't know enough to answer your question, but you could check how Home Assistant does it. That should point you in the right direction.