How to use the Speech recognition BLOCK

Setup

We’ve prepared an audio file and a file containing two Flow Designer Flows for you to download and use with this guide. Follow the instructions listed below to load them into your own BLOCKS environment.

Data	Explanation
Sample audio file	This audio file is a reading of the following passage from the BLOCKS front page (https://www.magellanic-clouds.com/blocks/en/): "With an intuitive design and affordable pricing plans, BLOCKS makes getting started with Machine Learning simple. Now businesses can make data-driven decisions themselves, without the need for expensive services and expert help." (1) Download the sample audio file: click the link to the left to download the sample audio file. (2) Upload to Google Cloud Storage (GCS): upload the sample audio file to GCS. For help with uploading the sample files to GCS, refer to Uploading files to GCS.
Sample Flow	This file contains two sample Flows. The first simply uses the Speech recognition BLOCK, while the second also sends the results to BigQuery. (1) Download the sample Flows: click the link to the left to download the file containing the sample Flows. (2) Import the Flows into a Flow Designer: import the sample Flows file into your Flow Designer. Refer to Importing and exporting Flows for instructions on how to import Flows into a Flow Designer.

Explanation

The first Flow is a very simple example of using the Cloud Speech-to-Text API to convert an audio file into text.

Speech recognition BLOCK sample Flow (1)

By simply placing a Speech recognition BLOCK from the Machine Learning category, we can create a text transcription of an audio file that has been uploaded to GCS.

The chart below lists the property settings for each BLOCK (only those that differ from the default settings). Each BLOCK also contains an explanation for what it does in its “BLOCK memos” property, but these are not included in this chart.

BLOCK (Category)	Property	Value	Notes
Start of Flow (Basic)	BLOCK name	Cloud Speech API sample (1)	We changed the name displayed on this BLOCK to distinguish this Flow from the other sample.
Speech recognition (Machine Learning)	GCP service account		If you have more than one GCP service account, select the account you want to use with this BLOCK.
	Audio file GCS URL	gs://my-bucket/speech_api_sample_voice_en.flac	Replace the my-bucket portion with the name of your GCS bucket that contains the audio file.
	Results variable	_
	Encoding	FLAC
	Sampling rate	16000
	Language code	English (United States)
Output to log (Basic)	Variable to output	_
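
For readers curious how these properties might relate to the underlying API, here is a rough sketch. It is not a description of how the BLOCK is implemented. It assumes the property values correspond to the `config` and `audio` fields of a Cloud Speech-to-Text v1 REST `speech:recognize` request body; the function name `build_recognize_request` and the `en-US` language code are illustrative assumptions.

```python
import json


def build_recognize_request(bucket, object_name, hints=None, max_alternatives=1):
    """Build a dict shaped like a Cloud Speech-to-Text v1 recognize request body.

    Assumption: the BLOCK's Encoding, Sampling rate, and Language code
    properties map onto the fields below. The BLOCK's internals are not
    documented in this guide.
    """
    config = {
        "encoding": "FLAC",            # Encoding property
        "sampleRateHertz": 16000,      # Sampling rate property
        "languageCode": "en-US",       # assumed code for English (United States)
        "maxAlternatives": max_alternatives,
    }
    if hints:
        # Phrase hints are passed to the API as speech contexts.
        config["speechContexts"] = [{"phrases": list(hints)}]
    return {
        "config": config,
        "audio": {"uri": f"gs://{bucket}/{object_name}"},
    }


# Replace "my-bucket" with the name of your own GCS bucket.
request = build_recognize_request("my-bucket", "speech_api_sample_voice_en.flac")
print(json.dumps(request, indent=2))
```

Passing `hints=["and expert"]` and `max_alternatives=3` would correspond to the settings used in the second sample Flow.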

Click the button within the Start of Flow BLOCK's (BLOCK name: Cloud Speech API sample (1)) properties to execute the Flow.

If successful, a log similar to the one shown below will be output to the Logs section.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "with an intuitive design and affordable pricing plans blocks makes getting started with machine learning simple now businesses can make data-driven decisions themselves without the need for expensive services in expert help",
          "confidence": 0.9624174
        }
      ]
    }
  ],
  "gcs_url": "gs://my-bucket/speech_api_sample_voice_en.flac",
  "timestamp": 1497943130.0
}

"transcript": "with an intuitive design...in expert help" This portion contains the results of transcribing the audio file.

"confidence": 0.9624174 This portion contains a value showing the level of confidence that the results are correct. The confidence level is on a scale of 0.0 to 1.0 with a higher number showing a higher confidence. Generally, this is only output for the text with the highest confidence level.

These results have about a 96% confidence level, and one part wasn't transcribed correctly: the end of the transcript reads "in expert help", while the passage that was read says "and expert help".
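
If you copy the log output elsewhere for processing, the fields described above can be read with a few lines of code. The following is a minimal sketch using the log shown in this guide. The helper name `best_alternative` is hypothetical, and treating `timestamp` as seconds since the Unix epoch is an assumption.

```python
import json
from datetime import datetime, timezone

# The log output from the first sample Flow, copied from this guide.
LOG_OUTPUT = """
{
  "results": [
    {"alternatives": [
      {"transcript": "with an intuitive design and affordable pricing plans blocks makes getting started with machine learning simple now businesses can make data-driven decisions themselves without the need for expensive services in expert help",
       "confidence": 0.9624174}
    ]}
  ],
  "gcs_url": "gs://my-bucket/speech_api_sample_voice_en.flac",
  "timestamp": 1497943130.0
}
"""


def best_alternative(log):
    """Return (transcript, confidence) for the highest-confidence alternative."""
    alternatives = [
        alt for result in log["results"] for alt in result["alternatives"]
    ]
    # Alternatives without a confidence value are ranked last.
    best = max(alternatives, key=lambda alt: alt.get("confidence") or 0.0)
    return best["transcript"], best.get("confidence")


log = json.loads(LOG_OUTPUT)
transcript, confidence = best_alternative(log)
# Assumption: the timestamp is expressed in seconds since the Unix epoch.
recorded_at = datetime.fromtimestamp(log["timestamp"], tz=timezone.utc)

print(f"confidence: {confidence:.0%}")
print(f"transcript: {transcript}")
print(f"timestamp:  {recorded_at.isoformat()}")
```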

By entering important vocabulary or phrases into the Contextual word/phrase hints property of the Speech recognition BLOCK, we can raise the accuracy level of the transcription.

We'll be doing this in the second example Flow, as well as storing the transcription results into BigQuery.
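
When you know the text that was read, a word-level comparison against the transcript is one way to find candidates for the hints property. The sketch below uses Python's standard `difflib` module for the comparison. The helper names `words` and `word_differences` are hypothetical, and the normalization (lowercasing and dropping punctuation) is an assumption chosen to match the style of the transcript above.

```python
import difflib
import re

PASSAGE = (
    "With an intuitive design and affordable pricing plans, BLOCKS makes "
    "getting started with Machine Learning simple. Now businesses can make "
    "data-driven decisions themselves, without the need for expensive "
    "services and expert help."
)

TRANSCRIPT = (
    "with an intuitive design and affordable pricing plans blocks makes "
    "getting started with machine learning simple now businesses can make "
    "data-driven decisions themselves without the need for expensive "
    "services in expert help"
)


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9'-]+", text.lower())


def word_differences(reference, transcript):
    """Return (expected, transcribed) pairs for every word-level mismatch."""
    ref, hyp = words(reference), words(transcript)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        (" ".join(ref[i1:i2]), " ".join(hyp[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(word_differences(PASSAGE, TRANSCRIPT))  # prints [('and', 'in')]
```

Here the only mismatch is "and" being transcribed as "in", which is why the second sample Flow uses a hint containing the word "and" together with the word that follows it.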

Speech recognition BLOCK sample Flow (2)

The chart below lists the property settings for each BLOCK (only those that differ from the default settings). Each BLOCK also contains an explanation for what it does in its “BLOCK memos” property, but these are not included in this chart.

BLOCK (Category)	Property	Value	Notes
Start of Flow (Basic)	BLOCK name	Cloud Speech API sample (2)	We changed the name displayed on this BLOCK to distinguish this Flow from the other sample.
Speech recognition (Machine Learning)	GCP service account		If you have more than one GCP service account, select the account you want to use with this BLOCK in this property.
	Audio file GCS URL	gs://my-bucket/speech_api_sample_voice_en.flac	Replace the my-bucket portion with the name of your GCS bucket that contains the audio file.
	Results variable	_
	Encoding	FLAC
	Sampling rate	16000
	Language code	English (US)
	Max alternatives	3	Since we'll be storing the results into BigQuery this time, we'll get 3 alternative transcriptions that we can compare later.
	Contextual word/phrase hints	and expert	By entering this, we’re hoping to achieve the following result: “in expert help” → “and expert help”

Load to table from variable (BigQuery)	GCP service account		If you have more than one GCP service account, select the account you want to use with this BLOCK in this property.
	Source variable	_
	Destination dataset	blocks_samples	You can freely rename this to the dataset of your choosing.
	Destination table	speech_api	You can freely rename this to the table of your choosing.
	Schema settings	(listed below)

results	RECORD	REPEATED
results.alternatives	RECORD	REPEATED
results.alternatives.transcript	STRING	NULLABLE
results.alternatives.confidence	FLOAT	NULLABLE
gcs_url	STRING	NULLABLE
timestamp	TIMESTAMP	NULLABLE

You can quickly enter these schema settings by clicking the Edit as JSON link and copy-pasting the following code.

[
 {
  "name": "results",
  "type": "RECORD",
  "mode": "REPEATED",
  "fields": [
   {
    "name": "alternatives",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
     {
      "name": "transcript",
      "type": "STRING",
      "mode": "NULLABLE"
     },
     {
      "name": "confidence",
      "type": "FLOAT",
      "mode": "NULLABLE"
     }
    ]
   }
  ]
 },
 {
  "name": "gcs_url",
  "type": "STRING",
  "mode": "NULLABLE"
 },
 {
  "name": "timestamp",
  "type": "TIMESTAMP",
  "mode": "NULLABLE"
 }
]

	In cases of non-empty tables	Overwrite	We'll overwrite the transcription results each time we execute the Flow. If you'd prefer to add new results to the table without overwriting previous results, select Append instead.
Execute query (BigQuery)	GCP service account		If you have more than one GCP service account, select the account you want to use with this BLOCK in this property.
	SQL syntax	Legacy SQL
	Query	(shown below)

SELECT
  results.alternatives.transcript as transcript
FROM
  [blocks_samples.speech_api]
WHERE
  results.alternatives.confidence > 0
ORDER BY
  timestamp desc
LIMIT
  1

In the fourth line of the query, be sure to replace blocks_samples and speech_api with the values you used in the Load to table from variable BLOCK's Destination dataset and Destination table properties.

	Dataset for storing results	(Blank)
	Table for storing results	(Blank)
	Variable for storing results	(Blank)	If you leave the Dataset for storing results, Table for storing results, and Variable for storing results properties blank, the query's results will be stored into the default BLOCKS variable named _.
Output to log (Basic)	Variable to output	_
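
The Schema settings above appear in two forms: a flat listing with dotted field names and a nested JSON definition. As a quick consistency check, the sketch below flattens the JSON form into the dotted form. The helper name `flatten_schema` is hypothetical, and the code only inspects the schema text; it does not talk to BigQuery.

```python
import json

# The schema from the "Edit as JSON" step in this guide, written compactly.
SCHEMA_JSON = """
[
  {"name": "results", "type": "RECORD", "mode": "REPEATED", "fields": [
    {"name": "alternatives", "type": "RECORD", "mode": "REPEATED", "fields": [
      {"name": "transcript", "type": "STRING", "mode": "NULLABLE"},
      {"name": "confidence", "type": "FLOAT", "mode": "NULLABLE"}
    ]}
  ]},
  {"name": "gcs_url", "type": "STRING", "mode": "NULLABLE"},
  {"name": "timestamp", "type": "TIMESTAMP", "mode": "NULLABLE"}
]
"""


def flatten_schema(fields, prefix=""):
    """Return (dotted name, type, mode) tuples for every field, parents first."""
    rows = []
    for field in fields:
        name = prefix + field["name"]
        rows.append((name, field["type"], field["mode"]))
        # RECORD fields carry their children in a nested "fields" list.
        rows.extend(flatten_schema(field.get("fields", []), name + "."))
    return rows


for row in flatten_schema(json.loads(SCHEMA_JSON)):
    print("\t".join(row))
```

The printed rows should match the six lines of the flat schema listing, from results through timestamp.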

Click the button within the Start of Flow BLOCK's (BLOCK name: Cloud Speech API sample (2)) properties to execute the Flow.

If successful, a log similar to the one shown below will be output to the Logs section.

[
  {
    "transcript": "with an intuitive design and affordable pricing plans blocks makes getting started with machine learning simple now businesses can make data-driven decisions themselves without the need for expensive services and expert help"
  }
]

As with the previous example, the "transcript" portion contains the results of transcribing the audio file. Because we entered a value into the Speech recognition BLOCK's Contextual word/phrase hints property, the result differs from the first sample: the ending now reads "and expert help" instead of "in expert help".
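
The log contains a single transcript even though Max alternatives is set to 3. The query keeps only alternatives whose confidence is greater than 0, and, as noted for the first sample, a confidence value is generally output only for the top alternative. The sketch below mimics that selection in plain Python to make the logic explicit. It does not run against BigQuery, the function name `latest_confident_transcript` is hypothetical, and the rows and transcripts are placeholder data rather than real results.

```python
# Placeholder rows shaped like the table schema used in this guide.
# All transcripts and confidence values here are hypothetical.
ROWS = [
    {
        "timestamp": 1497943130.0,
        "results": [{"alternatives": [
            {"transcript": "older run, top alternative", "confidence": 0.96},
            {"transcript": "older run, second alternative", "confidence": None},
        ]}],
    },
    {
        "timestamp": 1497946730.0,
        "results": [{"alternatives": [
            {"transcript": "newer run, top alternative", "confidence": 0.97},
            {"transcript": "newer run, second alternative", "confidence": None},
            {"transcript": "newer run, third alternative", "confidence": None},
        ]}],
    },
]


def latest_confident_transcript(rows):
    """Mimic the sample query: filter on confidence, newest first, limit 1."""
    candidates = [
        (row["timestamp"], alt["transcript"])
        for row in rows
        for result in row["results"]
        for alt in result["alternatives"]
        # WHERE results.alternatives.confidence > 0 (missing values are excluded)
        if alt.get("confidence") is not None and alt["confidence"] > 0
    ]
    # ORDER BY timestamp desc, then LIMIT 1
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [{"transcript": transcript} for _, transcript in candidates[:1]]


print(latest_confident_transcript(ROWS))
```

With the Overwrite setting the table holds only the most recent run, so the ordering matters mainly if you switch to Append.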

Appendix

We'll briefly explain in this section how we recorded the sample audio file. You can use this for reference to test things with your own audio data.

There are built-in applications in Windows and macOS (OS X) that you can use to record audio. However, since these do not record in a format supported by the Speech recognition BLOCK, you cannot use these files right away. You will need to install a separate application to convert audio files into a format supported by the Speech recognition BLOCK.

We will introduce how to install the program Audacity and use it to record audio. Audacity is an open-source program for recording and editing audio that can output files in the FLAC format supported by the Speech recognition BLOCK.

Instructions for installing and using Audacity to record data will be explained in the following order:

Installing Audacity
- For Windows 10 users
- For macOS/OS X users
Recording audio with Audacity

Installing Audacity

For Windows 10 users

Open the Audacity website.
Click the Download Audacity 2.1.3 link.
- 2.1.3 refers to the Audacity version number. This guide was written on 2017/6/1, so the current version number may differ. If it does, download and use the latest version of Audacity.
This will open the download page. Click the Audacity for Windows link.
Click the Audacity 2.1.3 installer link within the Audacity Windows version download page.
Open the audacity-win-2.1.3.exe file once it has downloaded.
You will be asked for permission to install Audacity. Click the Yes button.
During the installation, select your preferred language and click the OK button.
The Audacity setup wizard welcome screen will be shown. Click the Next > button.
A screen with important information about Audacity will be shown. Click the Next > button.
Select the folder Audacity will be installed to and click the Next > button.
The Select Additional Tasks screen will be shown. Choose whether or not to create a desktop shortcut and click the Next > button.
The Ready to Install screen will be shown. Click the Install button to start installing Audacity.
A page with important information will be displayed once Audacity finishes installing. Click the Next > button.
The Completing the Audacity Setup Wizard screen will be displayed. Click the Finish button.
By default, the Launch Audacity option is selected, so clicking Finish will also launch Audacity.

For macOS/OS X users

Open the Audacity webpage.
Click the Download Audacity 2.1.3 link.
- 2.1.3 refers to the Audacity version number. This guide was written on 2017/6/1, so the current version number may differ. If it does, download and use the latest version of Audacity.
This will open the download page. Click the Audacity for Mac OS X/macOS link.
Click the Audacity 2.1.3 .dmg file link within the Mac version download page.
Once downloaded, double click the audacity-macos-2.1.3.dmg file from the Finder to open it.
Drag and drop the Audacity icon into the Applications folder.
Close the above window.
Right click the Audacity 2.1.3 icon on the desktop and select Eject Audacity 2.1.3. This completes the installation process.

Recording audio with Audacity

The Audacity screen is divided into various sections. The sections required for recording audio and their names are shown in the image below.

Audacity Project Window screen section names

For macOS/OS X users, the menu bar will be at the top of the screen, rather than within the Audacity window.

The process for recording audio is as follows:

Set the “Recording Device” and “Recording Channel” from the Device Toolbar.
- Under “Recording Device”, choose the device you will use to make the recording.
- Under “Recording Channel”, select 1 (Mono) Recording Channel.
From the Selection Toolbar in the Lower Toolbar dock area, set the Project Rate (Hz) to 16000 as recommended for the Cloud Speech API.
With that, you are ready to record the audio. However, we will first check the audio levels (without recording) by clicking the Recording Meter Toolbar.
Adjust the recording level using the Recording Slider portion of the Mixer Toolbar.
After adjusting the recording level, click the Recording Meter Toolbar again to check the recording level. Repeat these steps as needed.
Click the Record button from the Transport Toolbar to begin recording. Press the Stop button to stop the recording.
To export the audio as a FLAC format file, open the File menu and select Export Audio.
From the Export Audio window, set each required item and click the Save button.

Refer to the following list for each item (macOS/OS X terms shown in parentheses):
- File name (Save as): Set the file name.
- Save in (Where): Set the location where the file will be saved.
- Save as type (File type): Select FLAC Files.
- Level: Set the compression level between 0 and 8. The default level is 5. Level 0 applies the least compression and encodes fastest, and the compression rate increases with each higher number. FLAC is lossless, so the level does not affect audio quality.
- Bit depth: Select 16 bit.
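
Before uploading your own recording, you may want to confirm that the exported file really uses the settings chosen above (16000 Hz, mono, 16 bit). The sketch below reads these values from the STREAMINFO metadata block at the start of a FLAC file, as laid out in the FLAC format specification. The function names are hypothetical, and the code assumes STREAMINFO is the first metadata block, which the FLAC format requires.

```python
def read_flac_streaminfo(header: bytes) -> dict:
    """Parse sample rate, channels, and bit depth from the first 42 bytes of a FLAC file."""
    if header[:4] != b"fLaC":
        raise ValueError("missing fLaC marker; this is not a FLAC file")
    block_type = header[4] & 0x7F                      # low 7 bits: block type
    block_length = int.from_bytes(header[5:8], "big")  # 24-bit block length
    if block_type != 0 or block_length < 34 or len(header) < 42:
        raise ValueError("STREAMINFO block not found where expected")
    # STREAMINFO starts at byte 8. Its bytes 10-17 pack four values:
    # 20 bits sample rate, 3 bits (channels - 1),
    # 5 bits (bits per sample - 1), 36 bits total samples.
    packed = int.from_bytes(header[18:26], "big")
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x07) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
    }


def matches_guide_settings(info: dict) -> bool:
    """True if the file uses the settings from this guide: 16000 Hz, mono, 16 bit."""
    return info == {"sample_rate": 16000, "channels": 1, "bits_per_sample": 16}


# Usage (the file name is a placeholder for your own recording):
# with open("my_recording.flac", "rb") as f:
#     info = read_flac_streaminfo(f.read(42))
# print(info, matches_guide_settings(info))
```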