What is the Web Speech API?

The Web Speech API enables modern browsers to recognize and synthesize speech (i.e., to bring voice data into web apps). It was introduced by the W3C community in 2012 and has two main parts:

1. SpeechRecognition (Asynchronous Speech Recognition or Speech-to-Text): It provides the ability to recognize voice context from an audio input and respond accordingly. This is accessed by the SpeechRecognition interface.

The example below shows how to use this API to get text from speech:

```javascript
// Chrome exposes the constructor with a webkit prefix, so fall back to it
// when the unprefixed SpeechRecognition is not defined.
window.SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;

const recognition = new window.SpeechRecognition();

recognition.onresult = (event) => {
  // event is a SpeechRecognitionEvent; take the first alternative of the first result
  const speechToText = event.results[0][0].transcript;
  console.log(speechToText);
};

// Start listening on the microphone
recognition.start();
```
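The `event.results[0][0]` lookup reads the first alternative of the first result. As a minimal sketch of how the full result list could be walked when interim results are enabled, the hypothetical helper `collectTranscripts` below (not part of the API) separates final text from interim text. It assumes only the documented `resultIndex`, `results`, `isFinal`, and `transcript` properties.

```javascript
// Hypothetical helper (not part of the Web Speech API): splits the results
// carried by a SpeechRecognitionEvent into final and interim text.
function collectTranscripts(event) {
  let finalText = "";
  let interimText = "";

  // resultIndex is the index of the first result that changed in this event.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    // Each result is a list of alternatives; this sketch uses the first one.
    const alternative = result[0];
    if (result.isFinal) {
      finalText += alternative.transcript;
    } else {
      interimText += alternative.transcript;
    }
  }

  return { finalText, interimText };
}

// Possible use in a browser (assumes `recognition` was created as shown earlier):
//   recognition.continuous = true;
//   recognition.interimResults = true;
//   recognition.onresult = (event) => console.log(collectTranscripts(event));
```

Keeping the helper free of any reference to `window` means it can be exercised with plain objects, without a microphone.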

When you use this API, the browser asks for permission to use your microphone.
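Because the user can decline the microphone prompt, it may help to detect support and handle errors explicitly. The sketch below is one possible approach, not a definitive implementation: `getSpeechRecognition`, `describeRecognitionError`, and `startListening` are hypothetical helper names, and the message wording is an assumption. The error codes themselves (`not-allowed`, `service-not-allowed`, `no-speech`, `audio-capture`) come from the specification.

```javascript
// Hypothetical helper: resolves the constructor from a global object
// (window in a browser), or returns null when recognition is unavailable.
function getSpeechRecognition(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

// Hypothetical helper: maps specification error codes to readable messages.
// The message wording is an assumption made for this example.
function describeRecognitionError(errorCode) {
  switch (errorCode) {
    case "not-allowed":
    case "service-not-allowed":
      return "Microphone access was denied.";
    case "no-speech":
      return "No speech was detected.";
    case "audio-capture":
      return "No microphone was found.";
    default:
      return `Recognition failed: ${errorCode}`;
  }
}

// Starts recognition and reports text or errors through callbacks.
function startListening(globalObject, onText, onError) {
  const Recognition = getSpeechRecognition(globalObject);
  if (!Recognition) {
    onError("Speech recognition is not supported in this environment.");
    return null;
  }

  const recognition = new Recognition();
  recognition.lang = "en-US";
  recognition.onresult = (event) => onText(event.results[0][0].transcript);
  // The error event carries the error code in its `error` property.
  recognition.onerror = (event) => onError(describeRecognitionError(event.error));
  recognition.start();
  return recognition;
}

// Possible use in a browser:
//   startListening(window, console.log, console.error);
```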

2. SpeechSynthesis (Text-to-Speech): It provides the ability to convert text into spoken audio output. This is accessed by the SpeechSynthesis interface, and the text to be spoken is wrapped in a SpeechSynthesisUtterance object.

For example, the code below is used to get voice/speech from text:

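```javascript
// Only run when the browser supports speech synthesis
if ("speechSynthesis" in window) {
  // An utterance holds the text to speak and its settings
  const speech = new SpeechSynthesisUtterance("Hello World!");
  speech.lang = "en-US";

  // Queue the utterance for playback
  window.speechSynthesis.speak(speech);
}
```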
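An utterance also exposes `rate`, `pitch`, `volume`, and `voice` properties. The sketch below shows one way they might be set. `speakText` and its `options` parameter are hypothetical names introduced for illustration, and matching a voice by exact `lang` string is a simplifying assumption.

```javascript
// Hypothetical helper: speaks `text` through the given global object
// (window in a browser) and returns the utterance, or null if unsupported.
function speakText(globalObject, text, options = {}) {
  if (!("speechSynthesis" in globalObject)) {
    return null; // Text-to-speech is unavailable in this environment.
  }

  const utterance = new globalObject.SpeechSynthesisUtterance(text);
  utterance.lang = options.lang ?? "en-US";
  utterance.rate = options.rate ?? 1; // 1 is the default speed
  utterance.pitch = options.pitch ?? 1; // 1 is the default pitch
  utterance.volume = options.volume ?? 1; // 1 is full volume

  // getVoices() may return an empty list until the browser has loaded its
  // voices; in that case the browser's default voice is used.
  const voices = globalObject.speechSynthesis.getVoices();
  const match = voices.find((voice) => voice.lang === utterance.lang);
  if (match) {
    utterance.voice = match;
  }

  globalObject.speechSynthesis.speak(utterance);
  return utterance;
}

// Possible use in a browser:
//   speakText(window, "Hello World!", { rate: 1.2 });
```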

The above examples can be tested in the developer console of the Chrome browser (33+).

Note: This API is still a working draft and is only available in the Chrome and Firefox browsers (of the two, only Chrome has implemented the specification).
