A Web Speech API Specification has recently been published together with a call for Final Specification Commitments from members of the W3C Speech API Community Group.
The HTML Speech Incubator Group was originally formed in August 2010 with initiating members from Microsoft, Google, Voxeo, AT&T, Mozilla and OpenStream. Proposals for API specifications were made by Google and by Microsoft.
This diagram from the report outlines which items the group considered in scope and out of scope for its final solution:
- Voice Web Search
- Speech Command Interface
- Domain Specific Grammars Contingent on Earlier Inputs
- Continuous Recognition of Open Dialog
- Domain Specific Grammars Filling Multiple Input Fields
- Speech UI present when no visible UI need be present
- Voice Activity Detection
- Hello World
- Speech Translation
- Speech Enabled Email Client
- Dialog Systems
- Multimodal Interaction
- Speech Driving Directions
- Multimodal Video Game
- Multimodal Search
The remaining two were omitted to keep the API to a minimum:
- Temporal Structure of Synthesis to Provide Visual Feedback
A Speech API Community Group was formed in April 2012 to continue work on this specification. It is chaired by Glen Shires from Google, who is one of the editors of the draft Speech API, and has five other Google members plus representatives of W3C, the World Wide Web Foundation, Mozilla, OpenReach and some others. Its Web Speech API Specification has been edited by Glen Shires and Hans Wennborg, also from Google.
At the moment the API specification does not have the status of a W3C Standard, nor is it on the W3C Standards Track. So far only the Google members of the Speech API Community Group have committed to the Web Speech Specification. Chrome is the only browser to implement the speech API; let's hope the others follow and we end up with a standard rather than a mess.
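With only one browser implementing the API, a page that wants speech input has to check for it before use. The following is a minimal sketch of such a check, not a definitive recipe. It assumes the recognition interface is exposed either under the draft's name `SpeechRecognition` or under Chrome's vendor-prefixed name `webkitSpeechRecognition`. The helper `findSpeechRecognitionName` and the type `SpeechGlobal` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical minimal shape of the global object we inspect.
// In a browser this would be `window`.
type SpeechGlobal = {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
};

// Returns the name of the recognition constructor found on `scope`,
// preferring the unprefixed name, or null when neither is present.
function findSpeechRecognitionName(scope: SpeechGlobal): string | null {
  if (typeof scope.SpeechRecognition === "function") {
    return "SpeechRecognition";
  }
  if (typeof scope.webkitSpeechRecognition === "function") {
    // Assumption: vendor-prefixed name used by Chrome.
    return "webkitSpeechRecognition";
  }
  return null;
}

// Check the current environment and report the result.
const found = findSpeechRecognitionName(globalThis as unknown as SpeechGlobal);
console.log(found ?? "speech recognition not available");
```

Checking the unprefixed name first means the same code keeps working if other browsers ship the interface under the name used in the specification.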