Search giant Google has introduced a major update to its Cloud Speech API, which was launched in 2016 to let developers transcribe speech to text. The update adds 30 new languages and locales to the existing voice typing feature, which currently supports 89 languages across Gboard on Android, Voice Search, Google Translate and other Google apps.
Google says the speech recognition will also support old languages such as Georgian (first spoken around 430 AD), and is adding Swahili and Amharic, two of Africa's most widely spoken languages. It is also adding several Indian languages, including Gujarati, Tamil and Bengali, in a bid to make the internet more inclusive.
Interestingly, Google states that it went a step further for these language integrations, working with native speakers to collect speech samples by asking them to read common phrases.
“This process trained our machine learning models to understand the sounds and words of the new languages and to improve their accuracy when exposed to more examples over time,” said Daan van Esch, Technical Program Manager, Speech, Google.
Apart from this update, the company has also introduced voice dictation of emoji in US English. English speakers in the US can now say "smile faced emoji" instead of typing the symbol or selecting the emoji from a picker. The company states that it will bring this to more languages and locales soon.
Looking at the enterprise side of the update, the Google Cloud Speech API was launched in beta last year to improve speech recognition for everything from voice-activated commands to call-center routing to data analytics. After feedback that customers wanted more functionality and control, the company has announced features that expand support for long-form audio and extend language coverage to help customers inject AI into their businesses.
Another feature Google has provided is word-level timestamps, which let the user jump to the moment in the audio where the text was spoken, or display the relevant text while the audio is playing. These time offsets are particularly useful for analysing longer audio files, where the user may need to search for a particular word in the recognised text and locate it in the original audio.
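For developers, requesting word-level timestamps amounts to setting a flag in the recognition config. The sketch below uses the Python client library for the Cloud Speech API; the Cloud Storage URI, encoding and sample rate are placeholder assumptions about the source audio, and exact class names can vary between library versions.

```python
# Sketch: requesting word-level timestamps from the Cloud Speech API.
# gs://my-bucket/interview.wav, the encoding and the sample rate are placeholders.
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_time_offsets=True,  # ask for per-word start/end offsets
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/interview.wav")

response = client.recognize(config=config, audio=audio)

# Print each recognised word with the offsets at which it was spoken,
# so a player could seek straight to that moment in the audio.
for result in response.results:
    for word_info in result.alternatives[0].words:
        print(word_info.word, word_info.start_time, word_info.end_time)
```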
On the subject of longer audio, Google has also increased the maximum length of supported audio files from 80 minutes to up to three hours. Files longer than three hours can be supported on a case-by-case basis by applying for a quota extension through Cloud Support, the company states.
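For recordings of that length, the API's asynchronous endpoint is the natural fit. A minimal sketch, again assuming the Python client library and a placeholder Cloud Storage URI:

```python
# Sketch: transcribing a long recording asynchronously with the Cloud Speech API.
# gs://my-bucket/long-recording.flac is a placeholder; long files are read from
# Cloud Storage rather than sent inline with the request.
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    language_code="en-US",
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/long-recording.flac")

# long_running_recognize returns an operation that can be polled while the
# (potentially hours-long) file is processed server-side.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=3600)  # wait up to an hour for the result

for result in response.results:
    print(result.alternatives[0].transcript)
```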