Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services that runs on the same infrastructure Google uses internally for its end-user products, such as Google Search, Gmail and YouTube. Alongside a set of management tools, it provides a series of modular cloud services including computing, data storage, data analytics and machine learning. Registration requires a credit card or bank account details. Google Cloud Platform provides infrastructure as a service, platform as a service, and serverless computing environments. The GCP service we’ll discuss today is Google Speech-to-Text.
We find it more interesting to talk about the Speech-to-Text service than the widely known Text-to-Speech services, since there aren’t many well-known applications for Speech-to-Text. One of its prime functions is dictation: converting speech into text, usually in a text editor. Basically, if your hands are tied, you can just speak the words that you would usually type.
Another function of Speech-to-Text is more widely seen in smart assistants (famous examples are Apple’s Siri, Amazon’s Alexa, and Google’s Google Assistant). It allows users to communicate with their device via speech instead of the usual touch input.
One of the glaring advantages of Google’s Speech-to-Text is that it is cheaper than many of the other speech-to-text services available, apart from being one of the most famous. It’s WAY cheaper than public sites, which charge $3-5 per minute, whereas according to the Speech API web page Google charges $0.024 per minute.
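To put that price gap in numbers, here is a small sketch that works out the cost of one hour of audio using only the per-minute prices quoted above. The function name and the 60-minute example are our own choices, and prices change, so treat the figures as an illustration rather than a quote.

```python
# Per-minute prices in USD, taken from the figures quoted in the text above.
# Check the current pricing pages before relying on them.
GOOGLE_PER_MINUTE = 0.024
PUBLIC_SITE_PER_MINUTE = (3.00, 5.00)  # low and high end of the quoted range


def transcription_cost(minutes: float, price_per_minute: float) -> float:
    """Return the cost in USD of transcribing `minutes` of audio, rounded to cents."""
    return round(minutes * price_per_minute, 2)


if __name__ == "__main__":
    minutes = 60  # one hour of audio, chosen only as an example
    google = transcription_cost(minutes, GOOGLE_PER_MINUTE)
    low, high = (transcription_cost(minutes, p) for p in PUBLIC_SITE_PER_MINUTE)
    print(f"Google: ${google:.2f}")
    print(f"Public sites: ${low:.2f} to ${high:.2f}")
```

At those rates, an hour of audio comes to $1.44 with Google against $180 to $300 on the public sites.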
Unfortunately, that advantage only pays off if you’re an experienced programmer, since you need to wire the API into your website’s code on your own. The other services charge more than Google because they have pretty built-in user interfaces, they handle uploading and storing files for you, they retry the transcription in the event of an error, they deal with Google’s rate limiting and quotas for you, and they take away the problem of managing the security of your files.
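As a taste of what those paid services do for you, here is a minimal sketch of retrying a failed call with exponential backoff. It is not tied to any Google library: `TransientError` and `with_retries` are hypothetical names of ours, and the call being retried is whatever function you use to send your audio.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error standing in for a timeout or a rate-limit response."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on TransientError with delays of 1s, 2s, 4s, ..."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts, let the caller see the error
            sleep(base_delay * 2 ** attempt)  # wait longer after each failure
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is only there so the waiting can be swapped out when you test the function; in normal use you would leave it alone.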
Basically speaking, if you’re either not an experienced programmer or running your website on a CMS, then maybe something like Dragon NaturallySpeaking (Dragon Speech Recognition) would be better suited for you.
It’s probably self-explanatory why you need expert programming skills to install the Google API, but what about the CMS? Most of the famous CMSs don’t really allow external APIs to be added to their base code. This is because they mostly expect their users to rely on the available plugins for everything (this mostly applies to WordPress, which has a wide variety of plugins that solve a lot of commonly recurring problems or serve aesthetic purposes). For that reason alone, I would discourage you from adding external APIs to your CMS website, unless you have something like 10 years of experience or a few people on your team who can work together to piece the API integration together. (Seriously though, it’s becoming increasingly common for people to throw the word API around and talk about how easy it is to integrate one with a website, but how many people actually know how to do it? It’s long and tedious work, and sometimes the results aren’t what you would expect.) Basically, a foreign API doesn’t sit too well with a CMS’s own API.
Now, let’s focus on the two main things Speech-to-Text is mostly used for:
The first one is fairly simple:
The second one is a bit more complex:
The concept you need to understand is that Speech-to-Text has three required components: an audio source, a database, and a place to output the text.
The main audio source is usually the device’s microphone: the one on your laptop or smartphone, or on the smart assistant device. Google’s Speech-to-Text isn’t programmed to detect audio from other video sources on the same device (although that would be neat). As far as we’re aware, there’s currently no hardware that lets it pick up other video sources playing inside the same device.
The database part is self-explanatory. Google’s Speech-to-Text is somewhat related to Google Translate. Have you ever wondered how Google Translate knows all the words that you type into it? Of course, it has a massive database that collects and records both the text people enter and the corrections they make. For Speech-to-Text, take Google Translate and multiply it by 2, because for every available language there are 2 types of data stored: text and audio. So when the algorithm receives either kind of input, it can tell which language is being used. The difference is that Speech-to-Text converts the audio to text.
The output space seems to be one of the most forgotten aspects of Speech-to-Text. In order to output the text, you need to prepare a place for it to show up. If you don’t, the text stays in the database and has nowhere to go. This is also one of the problems that comes with a pre-formatted CMS. Putting Speech-to-Text into the page / post editor is probably your only option; it’s next to impossible to create a separate text editor that outputs onto a page / post (and, as we’ve covered before, typing into an editor is the usual function for speech-to-text).
Once you’ve covered all 3 of these basic requirements, you can get ready to start coding the algorithm into your website. You just need to set where the audio is going to come from, set up a connection to the API database, and set where the text output for the speech should go.
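To make those three steps a little more concrete, here is a minimal sketch built around the JSON body used by the Speech-to-Text v1 REST endpoint (`speech:recognize`). It assumes 16 kHz LINEAR16 audio and US English, which you would change to match your own recording, and the function names are ours. Actually sending the request is left out on purpose, because it needs credentials from your own Google Cloud project.

```python
import base64

# REST endpoint for short, synchronous recognition requests.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_request(
    audio_bytes: bytes,
    language_code: str = "en-US",   # assumption: US English speech
    sample_rate_hertz: int = 16000,  # assumption: 16 kHz recording
) -> dict:
    """Steps 1 and 2: wrap the raw audio in the JSON body the endpoint expects."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        # Audio is sent inline as base64 text.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def extract_transcript(response: dict) -> str:
    """Step 3: pull the top transcript out of each result and join them."""
    pieces = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            pieces.append(alternatives[0].get("transcript", ""))
    return " ".join(p.strip() for p in pieces if p.strip())
```

You would POST the dictionary from `build_request` as JSON to `RECOGNIZE_URL`, then pass the decoded response to `extract_transcript` and write the returned string into whatever text box or editor you prepared as the output space.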
If you’ve been coding for a while, you can probably expect that the first try will almost always return an error. There are several common errors in integrating the API; the ones you can usually expect are:
Nothing Comes Out
Some Misinformed Text Is Outputted
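One way to tell these two situations apart is to look at the response itself. The sketch below assumes the response has already been decoded into a dictionary with the usual `results` / `alternatives` / `confidence` fields; the `diagnose` name and the 0.8 threshold are our own choices, not anything Google prescribes.

```python
def diagnose(response: dict, min_confidence: float = 0.8) -> str:
    """Classify a decoded response as 'empty', 'low_confidence' or 'ok'."""
    results = response.get("results", [])
    # Keep only the top alternative of each result that has one.
    top = [r["alternatives"][0] for r in results if r.get("alternatives")]
    if not top:
        # Nothing came out: often silence, or audio settings that
        # don't match the actual recording.
        return "empty"
    if any(a.get("confidence", 0.0) < min_confidence for a in top):
        # Text came out, but the service itself wasn't sure about it.
        return "low_confidence"
    return "ok"
```

An `empty` result is usually worth checking against your audio settings first, while `low_confidence` is a hint to show the text to the user for correction instead of trusting it blindly.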
We’ll add more to this post whenever we can, but for now, that’s about it. Have fun trying to integrate the Google API, and if you have any questions, you can hit us up at our e-mail.