Real-time Speech-to-Text and Translation with Cognitive Services, Azure Functions, and SignalR Service
Tuesday, March 26, 2019
When we do a live presentation, whether online or in person, there are often folks in the audience who are not comfortable with the language we're speaking or who have difficulty hearing us. Microsoft created Presentation Translator to solve this problem in PowerPoint by sending real-time translated captions to audience members' devices.
In this article, we'll look at how, with relatively little code, we can build a similar app that runs in the browser. It will transcribe and translate speech from the browser's microphone and broadcast the results to other browsers in real time. Because we are using serverless and fully managed services, it can scale to support thousands of audience members. Best of all, these services all have generous free tiers, so we can get started without paying anything!
Overview
The app consists of two projects:
- A Vue.js app that is our main interface. It uses the Microsoft Azure Cognitive Services Speech SDK to listen to the device's microphone and perform real-time speech-to-text transcription and translation.
- An Azure Functions app providing serverless HTTP APIs that the user interface calls to broadcast translated captions to connected devices through Azure SignalR Service.
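To make the split between the two projects concrete, here is a minimal, dependency-free TypeScript sketch of the payload the Function app might receive from the Vue.js app and the message it could hand to the SignalR Service output binding. The `CaptionUpdate` shape, its field names, and the `newCaption` target are hypothetical assumptions for illustration; only the `target`/`arguments` structure follows the message format used by the Azure SignalR Service output binding.

```typescript
// Hypothetical caption update that the Vue.js app might POST to the
// Function app's HTTP API. Field names are illustrative assumptions.
interface CaptionUpdate {
  offset: number; // position of the utterance in the audio stream
  languages: Record<string, string>; // language code -> recognized or translated text
}

// Message shape for the Azure SignalR Service output binding:
// `target` names the client-side method to invoke, and `arguments`
// is the list of values passed to that method.
interface SignalRMessage {
  target: string;
  arguments: unknown[];
}

// Hypothetical method name that audience browsers would listen for.
const CAPTION_TARGET = "newCaption";

// Validate an untrusted request body and wrap it as a broadcast message.
// Throws if the body does not look like a CaptionUpdate.
function toSignalRMessage(body: unknown): SignalRMessage {
  if (typeof body !== "object" || body === null) {
    throw new Error("request body must be a JSON object");
  }
  const { offset, languages } = body as { offset?: unknown; languages?: unknown };
  if (typeof offset !== "number" || !Number.isFinite(offset)) {
    throw new Error("offset must be a finite number");
  }
  if (typeof languages !== "object" || languages === null) {
    throw new Error("languages must be an object");
  }
  // Copy only string-valued translations into a fresh record.
  const translations: Record<string, string> = {};
  for (const [lang, text] of Object.entries(languages)) {
    if (typeof text !== "string") {
      throw new Error(`translation for "${lang}" must be a string`);
    }
    translations[lang] = text;
  }
  const caption: CaptionUpdate = { offset, languages: translations };
  return { target: CAPTION_TARGET, arguments: [caption] };
}
```

In a deployed function, the returned object would be assigned to the SignalR output binding, and each audience browser would register a handler for the target method and display the text for the language its user picked. Validating the body before broadcasting keeps one malformed request from pushing bad data to every connected device.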
