Wowza Community

Is it possible to accomplish speech recognition in live streaming using speech-to-text services from Google or Azure?

The speech-to-text services from Google and Azure appear to support only a microphone or a file as the input stream.

So I’m curious whether there is a way to achieve that.

Did you find any solution for connecting the Wowza SDK with the Azure cognitive-services-speech SDK?
The first step would be to extract the audio stream from the live stream. What features does Wowza provide to handle or redirect the audio feed in parallel with the default transcoding process?
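As a minimal sketch of that first step, done outside WSE itself: the snippet below assumes ffmpeg is installed and can pull the live stream from a URL, then decodes only the audio to raw PCM and cuts it into small chunks that a streaming speech-to-text client could consume. The stream URL, the 16 kHz mono 16-bit format, and the function names are assumptions for illustration, not Wowza or Azure APIs.

```python
import io
import subprocess
from typing import BinaryIO, Iterator, List

SAMPLE_RATE = 16000    # Hz; assumption, check what your STT service expects
BYTES_PER_SAMPLE = 2   # 16-bit signed PCM, mono


def ffmpeg_audio_command(stream_url: str) -> List[str]:
    """Build an ffmpeg command that writes audio-only raw PCM to stdout."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", stream_url,          # hypothetical playback URL of the live stream
        "-vn",                     # drop the video track
        "-ac", "1",                # downmix to mono
        "-ar", str(SAMPLE_RATE),   # resample
        "-f", "s16le",             # raw signed 16-bit little-endian samples
        "-",                       # write to stdout
    ]


def pcm_chunks(source: BinaryIO, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration PCM chunks; the last chunk may be shorter."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    while True:
        chunk = source.read(chunk_bytes)
        if not chunk:
            break
        yield chunk


def stream_audio(stream_url: str) -> Iterator[bytes]:
    """Run ffmpeg and yield PCM chunks until the stream ends."""
    proc = subprocess.Popen(ffmpeg_audio_command(stream_url), stdout=subprocess.PIPE)
    try:
        yield from pcm_chunks(proc.stdout)
    finally:
        proc.kill()
        proc.wait()
```

Each chunk from `stream_audio(...)` would then be handed to whatever streaming input the chosen speech-to-text SDK offers; that part is left out here because it depends on the service.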

I can see that the stage of home-grown solutions is beginning…
I know it is possible to create a Wowza module that automates the conversion of audio to text (closed captions).
I don’t understand why the Wowza team has concentrated its efforts on Wowza Video.
I believe such an audio-to-text conversion module would attract new WSE users and retain current ones.
Speech-to-text services from Google work great with open captioning.
Third-party programmers can handle English, but they do much worse with other languages.

Hello. I’m a former Wowza employee, now working independently.
I’ve been working on a couple of speech-to-text implementations for WSE, although not yet with Google. If you are interested in such a module, I am available to build it on a contract basis.

Reach out to

… and yet interest is emerging. Well, you know the point.
Scott, you’ll get it done faster than Wowza will be interested in such a solution.
Greetings to you

Hi @Scott_Kellicker2

I came across your post about developing speech-to-text modules for WSE. I’m interested in a module that also integrates with JW Player. Could you provide some insights on feasibility, development time, and cost?


Yes, I could develop such a module.

Let’s connect via email. I’m at

(I’m traveling this week but will reach out to your email Monday)

Scott Kellicker

Hi Axel. Do you still have interest in such a module? I’ve been working on something very close to this.

Let’s chat at .


At Raskenlund we’ve worked with a variety of STT services, incl. IBM, Azure, Google, AWS and a few more.

We have a working module for integration with AWS Transcribe: audio is extracted and sent to AWS Transcribe, then the text is added to the stream as subtitles, or you can export it as VTT.
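To illustrate just the VTT export step, here is a minimal sketch that turns timed transcript segments into a WebVTT document. It is not the Raskenlund module; the `(start, end, text)` segment shape and the function names are assumptions, and a real transcription service returns its own result format that would need mapping first.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start seconds, end seconds, caption text)
Segment = Tuple[float, float, str]


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments: Iterable[Segment]) -> str:
    """Render segments as WebVTT: header, blank line, then one cue per segment."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)
```

For example, `to_webvtt([(0.0, 2.5, "Hello")])` produces a file with the `WEBVTT` header followed by a single cue running from `00:00:00.000` to `00:00:02.500`.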