Whisper - Open Source Speech Recognition You Can Use

Written by Mike James

Wednesday, 28 September 2022

OpenAI has released a very usable speech recognition and translation program that you can install and use on any machine that runs Python. It could well be useful for more than just research.

OpenAI has received some criticism in the past for not being quite as open as its name suggests. However, with the release of Whisper under an MIT licence, it has done us all a huge favour.

"Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English."

Put simply, you put a voice recording in and out comes a text transcription or, if you ask for it, a translation into English. Unlike many research groups, OpenAI has made the code available in a form that makes it very usable. All you need is Python with PyTorch installed, plus a few additional packages and a copy of ffmpeg and its Python bindings, ffmpeg-python. The ffmpeg library handles the audio file input, so Whisper will work with files in any format that ffmpeg can handle. Even if you aren't already using Python for AI, installation should be relatively easy.
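Before trying anything, it can be worth checking that the pieces listed above are actually in place. The following is only a minimal sketch: the helper name missing_prerequisites is made up for illustration, and it assumes that PyTorch and Whisper are importable as the modules torch and whisper and that the ffmpeg executable should be on the PATH.

```python
import importlib.util
import shutil


def missing_prerequisites():
    """Return the names of prerequisites that cannot be found on this machine."""
    missing = []
    # The ffmpeg executable is assumed to be needed on PATH to decode audio files.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    # Assumed module names: "torch" for PyTorch and "whisper" for Whisper itself.
    for module in ("torch", "whisper"):
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    return missing


if __name__ == "__main__":
    print(missing_prerequisites() or "All prerequisites found")
```

An empty list means everything the sketch looks for was found. It does not prove that the installed versions work together.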

The model comes in five sizes:

Size	Parameters	English-only	Multilingual	VRAM	Relative speed
tiny	39 M	tiny.en	tiny	~1 GB	~32x
base	74 M	base.en	base	~1 GB	~16x
small	244 M	small.en	small	~2 GB	~6x
medium	769 M	medium.en	medium	~5 GB	~2x
large	1550 M	N/A	large	~10 GB	1x

The speeds are relative to the large model and, of course, the smaller models don't perform as well as the larger ones.
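The table lends itself to a simple rule of thumb: take the largest model that fits in the memory you have. Here is a rough sketch of that idea. The function name largest_model_that_fits is hypothetical, and the figures are simply the approximate values copied from the table, so treat the result as a starting point rather than a guarantee.

```python
# (name, approximate VRAM in GB, speed relative to "large"), copied from the table
MODELS = [
    ("tiny", 1, 32),
    ("base", 1, 16),
    ("small", 2, 6),
    ("medium", 5, 2),
    ("large", 10, 1),
]


def largest_model_that_fits(vram_gb, english_only=False):
    """Pick the biggest model whose approximate VRAM need is within vram_gb."""
    candidates = [name for name, vram, _ in MODELS if vram <= vram_gb]
    if not candidates:
        raise ValueError(f"No model fits in {vram_gb} GB")
    name = candidates[-1]  # MODELS is ordered from smallest to largest
    # The table lists no English-only variant of the large model,
    # so "large" is returned unchanged even when english_only is set.
    if english_only and name != "large":
        name += ".en"
    return name


print(largest_model_that_fits(6))                     # medium
print(largest_model_that_fits(4, english_only=True))  # small.en
```

The returned name is in the form used by the --model option and by load_model in the examples that follow.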

Using Whisper from the command line is also easy:

whisper audio.mp3 --model medium

and if you want a translation:

whisper japanese.wav --language Japanese --task translate

Using it from Python is just as easy:

import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
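As well as the full text, the dictionary returned by transcribe should contain a "segments" list, each entry having "start" and "end" times in seconds and the "text" spoken in that interval. Assuming that layout, a sketch of how you might print a timestamped transcript looks like this. The helper names format_timestamp, format_segments and transcribe_with_timestamps are invented for the example.

```python
def format_timestamp(seconds):
    """Render a time in seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result):
    """Turn a transcribe() result into one '[start -> end] text' line per segment."""
    lines = []
    for seg in result["segments"]:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return lines


def transcribe_with_timestamps(path, model_name="base"):
    """Transcribe an audio file and return its timestamped lines."""
    import whisper  # imported here so the helpers above work without Whisper

    model = whisper.load_model(model_name)
    return format_segments(model.transcribe(path))
```

Calling transcribe_with_timestamps("audio.mp3") and printing each returned line gives a transcript that is easy to line up against the recording.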

What can one say - amazing!

High-quality speech recognition and translation on a desktop machine was unthinkable just a short time ago, and now it's open source. On an average desktop machine, it takes about 2 minutes to transcribe 1 minute of speech. At the moment it is better at English than at other languages, which is hardly surprising given that only a third of the training dataset was non-English.
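Translation can be requested from Python in much the same way as from the command line. The sketch below assumes that transcribe accepts task and language keyword arguments that mirror the --task and --language options, and that leaving the language out lets Whisper detect it from the audio. The function names translation_options and translate_to_english are hypothetical. A multilingual model is needed, as the English-only ones cannot translate.

```python
def translation_options(language=None):
    """Build the keyword arguments passed to transcribe() for translation."""
    options = {"task": "translate"}
    # If no language is given, Whisper is left to detect it from the audio.
    if language is not None:
        options["language"] = language
    return options


def translate_to_english(path, language=None, model_name="medium"):
    """Return an English translation of the speech in an audio file."""
    import whisper  # imported here so translation_options works without Whisper

    # model_name must be a multilingual model, not one of the ".en" variants.
    model = whisper.load_model(model_name)
    result = model.transcribe(path, **translation_options(language))
    return result["text"]
```

For example, translate_to_english("japanese.wav", language="ja") corresponds to the command-line translation shown earlier, with "ja" standing in for Japanese.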

You can find out more about the model from the published paper. It is a transformer model, again hardly surprising given how much this approach has revolutionised language processing.

Training used 680,000 hours of multilingual voice data.

Both Apple and Google have similar systems, which they haven't made generally available or easy to use. Whisper might well force the pace in making offline speech recognition available.

OpenAI's final comment on the release is:

"We hope Whisper’s high accuracy and ease of use will allow developers to add voice interfaces to a much wider set of applications."

What are you waiting for? This is truly open AI.

More Information

Introducing Whisper

Related Articles

Mozilla Updates Voice Recognition Project

Microsoft researchers achieve speech recognition milestone

Introducing DeepSpeech

Mozilla Wants Your Voice

Mozilla DeepSpeech Gets Smaller

Speech Recognition Breakthrough

Google's Deep Learning - Speech Recognition

Last Updated ( Wednesday, 28 September 2022 )