Is it possible to access the live audio stream in a Twilio call?

Two years later, Twilio has released a feature that covers the use case I was trying to build on my own. Programmable Voice now has a real-time speech recognition service built in. It's in public beta: https://www.twilio.com/blog/2017/05/introducing-speech-recognition.html


For people still looking, Twilio now has Voice Streams, which covers this use case! It's a TwiML instruction that sends the call audio over a WebSocket to your server.
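A minimal sketch of what that could look like, using only the Python standard library. It assumes the streaming instruction is written as a `<Stream>` element nested inside `<Start>`, and that each audio frame arrives as a JSON WebSocket message with `"event": "media"` and a base64-encoded `media.payload`. Check both against the current Twilio docs. The function names and the `wss://example.com/audio` URL are made up for illustration, and the WebSocket server itself is left out.

```python
import base64
import json
import xml.etree.ElementTree as ET
from typing import Optional


def build_stream_twiml(ws_url: str) -> str:
    """Return TwiML asking Twilio to stream the call audio to ws_url.

    Assumes the <Start><Stream url="..."/></Start> form; ws_url is a
    hypothetical WebSocket endpoint that you host yourself.
    """
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    ET.SubElement(start, "Stream", {"url": ws_url})
    return ET.tostring(response, encoding="unicode")


def extract_audio(raw_message: str) -> Optional[bytes]:
    """Decode one WebSocket text message into raw audio bytes.

    Returns None for non-audio events (for example "start" or "stop").
    Assumes the payload is base64-encoded audio under media.payload.
    """
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None
    return base64.b64decode(message["media"]["payload"])
```

You would return the output of `build_stream_twiml(...)` from the webhook that answers the call, and call `extract_audio(...)` on every text frame your WebSocket server receives before handing the bytes to a speech recognizer.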


As far as I know, Twilio doesn't offer a way to process audio as IVR input. It does support keypad (DTMF) digit input, but that isn't as intelligent as what you are after: https://www.twilio.com/docs/api/twiml/gather.
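For reference, a digit-collecting `<Gather>` response could be generated roughly like this. This is a sketch with the standard library only; the `/handle-key` action URL, the prompt text, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def gather_digits_twiml(action_url: str, prompt: str, num_digits: int = 1) -> str:
    """Return TwiML that reads a prompt and collects keypad digits.

    Twilio posts the pressed digits to action_url (a hypothetical
    endpoint on your own server) once num_digits have been entered.
    """
    response = ET.Element("Response")
    gather = ET.SubElement(
        response,
        "Gather",
        {"input": "dtmf", "numDigits": str(num_digits), "action": action_url, "method": "POST"},
    )
    say = ET.SubElement(gather, "Say")
    say.text = prompt
    return ET.tostring(response, encoding="unicode")
```

For example, `gather_digits_twiml("/handle-key", "Press 1 for sales.")` yields a response that waits for a single key press.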

You can, however, listen to a call that is currently in progress, with a catch: it has to be set up as a conference. A conference can do anything a normal dial can do. You can turn off some of the additional features (such as the join beep), and then use the Twilio JS library to discreetly join the conference and listen in on the call. I suppose if you were very ambitious you could run speech-to-text software on the audio coming through the Twilio client and do all kinds of things with it.
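A rough sketch of the TwiML for that setup, built with the Python standard library. It assumes the `<Conference>` attributes `beep`, `muted`, `startConferenceOnEnter`, and `endConferenceOnExit` behave as their names suggest. The room name and function name are hypothetical, and wiring the listener leg to the JS client is not shown.

```python
import xml.etree.ElementTree as ET


def conference_twiml(room: str, listener: bool = False) -> str:
    """Return TwiML that dials the caller into a named conference.

    With listener=True the participant joins muted and neither starts
    nor ends the conference, so the other parties are not disturbed.
    """
    # Turn off the join/leave beep for everyone.
    attrs = {"beep": "false"}
    if listener:
        attrs["muted"] = "true"
        attrs["startConferenceOnEnter"] = "false"
        attrs["endConferenceOnExit"] = "false"

    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    conference = ET.SubElement(dial, "Conference", attrs)
    conference.text = room
    return ET.tostring(response, encoding="unicode")
```

The two real call legs would get `conference_twiml("support-room")`, while the silent observer's leg would get `conference_twiml("support-room", listener=True)`.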

See annyang! for some speech-to-text interactivity in the browser: https://www.talater.com/annyang/