Co-founder @ RMOTR

Google Speech to Text Tutorial

Last updated: August 1st, 2019

The idea for this tutorial came from a few students who were working on transcribing speech to text. With today's advances in Machine Learning, these services are more of a commodity than a cutting-edge technology. Every major provider (AWS, IBM Watson, Azure, etc.) offers speech to text services.

In our experience, Google's is probably the one that works best. It also includes a free trial, which makes it ideal for quick tests.

The objective 👊

The objective is to transcribe pieces of audio using Google Speech to Text. To do that, I'll show you how to set up the Google Cloud account, create a project, enable the Speech to Text service, and create the buckets on Google Cloud Storage where the audio files are uploaded.
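As a rough preview of where these steps lead, here is a minimal sketch of the JSON body that the Speech to Text REST method `speech:recognize` accepts for a file stored in Cloud Storage. The helper `build_recognize_request` and the bucket name `my-audio-bucket` are hypothetical names made up for illustration, and the `en-US` language code is an assumption. The sketch only builds the payload; it doesn't send anything and isn't necessarily the approach the rest of the tutorial follows.

```python
import json


def build_recognize_request(bucket, blob_name, language_code="en-US"):
    """Build the JSON body for a synchronous speech:recognize call.

    `bucket` and `blob_name` identify an audio file that was already
    uploaded to Google Cloud Storage. Nothing is sent over the network.
    """
    return {
        "config": {
            "encoding": "FLAC",  # matches the .flac demo file
            "languageCode": language_code,  # assumed language of the audio
        },
        "audio": {"uri": f"gs://{bucket}/{blob_name}"},
    }


# Hypothetical bucket name; replace it with your own.
request_body = build_recognize_request("my-audio-bucket", "jacob-keynote.flac")
print(json.dumps(request_body, indent=2))
```

The `gs://` URI is how the service is pointed at a file in a bucket, which is why creating the bucket and uploading the audio come before any transcription call.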

The problem 🤦

Setting up the Google account and configuring the project is tedious. The way services are enabled and credentials are generated isn't really intuitive.

This tutorial is built so that it can be run from any platform (even your own laptop). Google's credentials (its authentication service) are available automatically if you're working from within Google's compute environments, but to work locally you must take a couple of extra steps.
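One of those extra steps usually involves the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which Google's client libraries read to locate a service account key file when running outside Google's environments. The following is a minimal sketch of a sanity check you could run before going further. The helper name `credentials_file` is hypothetical, and the check only confirms that the variable points to an existing file, not that the key itself is valid.

```python
import os
from pathlib import Path


def credentials_file(env=os.environ):
    """Return the key file path from the environment, or None.

    Looks up GOOGLE_APPLICATION_CREDENTIALS and checks that it points
    to a file that exists. It does not validate the key's contents.
    """
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        return None
    path = Path(value).expanduser()
    return path if path.is_file() else None


found = credentials_file()
if found is None:
    print("Set GOOGLE_APPLICATION_CREDENTIALS to the path of your key file.")
else:
    print("Credentials file found:", found)
```

On a local machine this typically means exporting the variable in your shell (or setting it in `os.environ`) before creating any Google Cloud client.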

The end result ✨

Here's the audio that we'll use for the demo. It's the first 30 seconds of Jacob Kaplan-Moss's keynote from PyCon 2015.

In [2]:
import IPython
In [3]:
IPython.display.Audio("jacob-keynote.flac")
Out[3]: