How to use the Speech recognition BLOCK

This page explains how to use the Speech recognition BLOCK to convert speech to text data.

Overview of using the Speech recognition BLOCK for speech recognition

We've prepared sample audio data for you to use. If you'd like to prepare your own audio file, please refer to the appendix for instructions on how to record and convert files.

For more information on the basics of how to use BLOCKS, refer to the Basic Guide. We recommend viewing the Basic Guide before following the tutorial on this page.

In particular, we recommend the following pages regarding the Flow Designer:

Table of Contents

  1. Setup
    Explains how to prepare the audio data and the Flow Designer's Flow.
  2. Explanation
    Explains details about the Flow prepared in the Setup section.
  3. Appendix
    Explains how to create the audio data used in this guide.

Setup

We’ve prepared an audio file and a file containing two Flow Designer Flows for you to download and use with this guide. Follow the instructions listed below to load them into your own BLOCKS environment.

Data Explanation
Sample audio file

This audio file is a reading of the following passage from the BLOCKS front page:

With an intuitive design and affordable pricing plans, BLOCKS makes getting started with Machine Learning simple. Now businesses can make data-driven decisions themselves, without the need for expensive services and expert help.

https://www.magellanic-clouds.com/blocks-demo/en/
  1. Download the sample audio file

    Click the link to the left to download the sample audio file.

  2. Upload to Google Cloud Storage (GCS)

    Upload the sample audio file to GCS.

    For help with uploading the sample files to GCS, refer to Uploading files to GCS.
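    Once the file is uploaded, the later steps refer to it by its GCS URL, which combines the bucket name and the file (object) name. The Python sketch below shows how such a URL is put together and split apart again. The helper names build_gcs_url and split_gcs_url are invented for this illustration and are not part of BLOCKS, and my-bucket is a placeholder for your own bucket name.

```python
from typing import Tuple


def build_gcs_url(bucket: str, object_name: str) -> str:
    """Compose a gs:// URL from a bucket name and an object name."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a non-empty name without slashes")
    return f"gs://{bucket}/{object_name.lstrip('/')}"


def split_gcs_url(url: str) -> Tuple[str, str]:
    """Split a gs:// URL back into (bucket, object_name)."""
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError("not a GCS URL: " + url)
    bucket, _, object_name = url[len(prefix):].partition("/")
    return bucket, object_name


# "my-bucket" is a placeholder; use the bucket you uploaded the sample file to.
url = build_gcs_url("my-bucket", "speech_api_sample_voice_en.flac")
print(url)
print(split_gcs_url(url))
```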

Sample Flow

This file contains two sample Flows. The first simply uses the Speech recognition BLOCK, while the second also sends the results to BigQuery.

  1. Download the sample Flows

    Click the link to the left to download the file containing the sample Flows.

  2. Import the Flows into a Flow Designer

    Import the sample Flows file into your Flow Designer.

    Refer to Importing and exporting Flows for instructions on how to import Flows into a Flow Designer.

Explanation

The first Flow is a very simple example of using the Cloud Speech-to-Text API to convert an audio file into text.

Speech recognition BLOCK sample Flow (1)

By simply placing a Speech recognition BLOCK from the Machine Learning category, we can create a text transcription of an audio file that has been uploaded to GCS.

The chart below lists the property settings for each BLOCK (only those that differ from the default settings). Each BLOCK also contains an explanation for what it does in its “BLOCK memos” property, but these are not included in this chart.

BLOCK (Category)
  • Property: Value

Start of Flow (Basic)
  • BLOCK name: Cloud Speech API sample (1)
    We changed the name displayed on this BLOCK to distinguish this Flow from the other sample.

Speech recognition (Machine Learning)
  • GCP service account: If you have more than one GCP service account, select the account you want to use with this BLOCK.
  • Audio file GCS URL: gs://my-bucket/speech_api_sample_voice_en.flac
    Replace the my-bucket portion with the name of your GCS bucket that contains the audio file.
  • Results variable: _
  • Encoding: FLAC
  • Sampling rate: 16000
  • Language code: English (United States)

Output to log (Basic)
  • Variable to output: _

Click the button in the properties of the Start of Flow BLOCK (BLOCK name: Cloud Speech API sample (1)) to execute the Flow.

If successful, a log similar to the one shown below will be output to the Logs section.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "with an intuitive design and affordable pricing plans blocks makes getting started with machine learning simple now businesses can make data-driven decisions themselves without the need for expensive services in expert help",
          "confidence": 0.9624174
        }
      ]
    }
  ],
  "gcs_url": "gs://my-bucket/speech_api_sample_voice_en.flac",
  "timestamp": 1497943130.0
}

"transcript": "with an intuitive design...in expert help"
    This portion contains the results of transcribing the audio file.

"confidence": 0.9624174
    This portion contains a value showing the level of confidence that the results are correct. The confidence level is on a scale of 0.0 to 1.0, with a higher number showing a higher confidence. Generally, this is only output for the text with the highest confidence level.

These results have about a 96% confidence level, but one phrase was transcribed incorrectly: the passage's "and expert help" came out as "in expert help".
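If you copy the log output elsewhere for further processing, the two fields can be read with a few lines of code. The Python sketch below parses a result like the one shown in the log and picks the alternative with the highest confidence. The function name best_alternative is invented for this example, and the transcript is shortened in the same way as in the explanation of the log.

```python
import json

# Log output copied from the Logs section (transcript shortened for readability).
log_text = """
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "with an intuitive design...in expert help",
          "confidence": 0.9624174
        }
      ]
    }
  ],
  "gcs_url": "gs://my-bucket/speech_api_sample_voice_en.flac",
  "timestamp": 1497943130.0
}
"""


def best_alternative(record):
    """Return the alternative with the highest confidence.

    Alternatives without a confidence value are treated as 0.0.
    """
    alternatives = [
        alt
        for result in record["results"]
        for alt in result["alternatives"]
    ]
    return max(alternatives, key=lambda alt: alt.get("confidence", 0.0))


record = json.loads(log_text)
best = best_alternative(record)
print(best["transcript"])
print(f"confidence: {best['confidence']:.0%}")  # prints "confidence: 96%"
```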

By entering important vocabulary or phrases into the Contextual word/phrase hints property of the Speech recognition BLOCK, we can raise the accuracy level of the transcription.

We'll be doing this in the second example Flow, as well as storing the transcription results into BigQuery.
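For readers curious how these settings might look outside of BLOCKS, the Python sketch below builds a request body for the Cloud Speech-to-Text v1 speech:recognize REST method using the same encoding, sampling rate, and language, plus the phrase hint and number of alternatives that the second sample Flow uses. This guide does not describe how the BLOCK calls the API internally, so the mapping from BLOCK properties to these request fields is an assumption, and build_recognize_request is a helper name invented for this example. The code only builds and prints JSON; it does not call the API.

```python
import json


def build_recognize_request(gcs_url, phrase_hints=(), max_alternatives=1):
    """Build a request body for the Speech-to-Text v1 recognize method.

    Assumption: the BLOCK's properties correspond to these request fields.
    """
    config = {
        "encoding": "FLAC",            # Encoding property
        "sampleRateHertz": 16000,      # Sampling rate property
        "languageCode": "en-US",       # Language code: English (United States)
        "maxAlternatives": max_alternatives,
    }
    if phrase_hints:
        # Phrase hints are passed as a list of speech contexts.
        config["speechContexts"] = [{"phrases": list(phrase_hints)}]
    return {"config": config, "audio": {"uri": gcs_url}}


# "my-bucket" is a placeholder for your own GCS bucket name.
request = build_recognize_request(
    "gs://my-bucket/speech_api_sample_voice_en.flac",
    phrase_hints=["and expert"],
    max_alternatives=3,
)
print(json.dumps(request, indent=2))
```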

Speech recognition BLOCK sample Flow (2)

The chart below lists the property settings for each BLOCK (only those that differ from the default settings). Each BLOCK also contains an explanation for what it does in its “BLOCK memos” property, but these are not included in this chart.

BLOCK (Category)
  • Property: Value

Start of Flow (Basic)
  • BLOCK name: Cloud Speech API sample (2)
    We changed the name displayed on this BLOCK to distinguish this Flow from the other sample.

Speech recognition (Machine Learning)
  • GCP service account: If you have more than one GCP service account, select the account you want to use with this BLOCK in this property.
  • Audio file GCS URL: gs://my-bucket/speech_api_sample_voice_en.flac
    Replace the my-bucket portion with the name of your GCS bucket that contains the audio file.
  • Results variable: _
  • Encoding: FLAC
  • Sampling rate: 16000
  • Language code: English (US)
  • Max alternatives: 3
    Since we'll be storing the results in BigQuery this time, we'll get 3 alternative transcriptions that we can compare later.
  • Contextual word/phrase hints: and expert
    By entering this, we’re hoping to achieve the following result: “in expert help” → “and expert help”

Load to table from variable (BigQuery)
  • GCP service account: If you have more than one GCP service account, select the account you want to use with this BLOCK in this property.
  • Source variable: _
  • Destination dataset: blocks_samples
    You can freely rename this to the dataset of your choosing.
  • Destination table: speech_api
    You can freely rename this to the table of your choosing.
  • Schema settings:
      results RECORD REPEATED
      results.alternatives RECORD REPEATED
      results.alternatives.transcript STRING NULLABLE
      results.alternatives.confidence FLOAT NULLABLE
      gcs_url STRING NULLABLE
      timestamp TIMESTAMP NULLABLE

You can quickly enter these schema settings by clicking the Edit as JSON link and copy-pasting the following code.

[
 {
  "name": "results",
  "type": "RECORD",
  "mode": "REPEATED",
  "fields": [
   {
    "name": "alternatives",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
     {
      "name": "transcript",
      "type": "STRING",
      "mode": "NULLABLE"
     },
     {
      "name": "confidence",
      "type": "FLOAT",
      "mode": "NULLABLE"
     }
    ]
   }
  ]
 },
 {
  "name": "gcs_url",
  "type": "STRING",
  "mode": "NULLABLE"
 },
 {
  "name": "timestamp",
  "type": "TIMESTAMP",
  "mode": "NULLABLE"
 }
]

  • In cases of non-empty tables: Overwrite
    We'll overwrite the transcription results each time we execute the Flow. If you'd prefer to add new results to the table without overwriting previous results, select Append instead.

Execute query (BigQuery)
  • GCP service account: If you have more than one GCP service account, select the account you want to use with this BLOCK in this property.
  • SQL syntax: Legacy SQL
  • Query:
SELECT
  results.alternatives.transcript as transcript
FROM
  [blocks_samples.speech_api]
WHERE
  results.alternatives.confidence > 0
ORDER BY
  timestamp desc
LIMIT
  1

Be sure to replace blocks_samples and speech_api in the 4th line of the query with the values you used for the Destination dataset and Destination table properties of the Load to table from variable BLOCK.

  • Dataset for storing results: (Blank)
  • Table for storing results: (Blank)
  • Variable for storing results: (Blank)
    If you leave the Dataset for storing results, Table for storing results, and Variable for storing results properties blank, the query's results will be stored into the default BLOCKS variable named _.

Output to log (Basic)
  • Variable to output: _
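
Before loading data, it can help to confirm that a result record only uses fields that the schema settings declare. The Python sketch below walks the schema from the Load to table from variable BLOCK and reports any field in a record that the schema does not cover. It is a simplified check written for this guide (the function name undeclared_fields is invented here) and is not how BLOCKS or BigQuery validates data.

```python
# Schema settings from the Load to table from variable BLOCK.
schema = [
    {"name": "results", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "alternatives", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "transcript", "type": "STRING", "mode": "NULLABLE"},
            {"name": "confidence", "type": "FLOAT", "mode": "NULLABLE"},
        ]},
    ]},
    {"name": "gcs_url", "type": "STRING", "mode": "NULLABLE"},
    {"name": "timestamp", "type": "TIMESTAMP", "mode": "NULLABLE"},
]


def undeclared_fields(record, fields, prefix=""):
    """Return dotted paths of record keys that the schema does not declare."""
    declared = {field["name"]: field for field in fields}
    missing = []
    for key, value in record.items():
        path = prefix + key
        field = declared.get(key)
        if field is None:
            missing.append(path)
            continue
        if field["type"] == "RECORD":
            # REPEATED records hold a list of nested records.
            items = value if field["mode"] == "REPEATED" else [value]
            for item in items:
                missing.extend(undeclared_fields(item, field["fields"], path + "."))
    return missing


# A record shaped like the Speech recognition BLOCK's output (transcript shortened).
record = {
    "results": [{"alternatives": [{"transcript": "...", "confidence": 0.9624174}]}],
    "gcs_url": "gs://my-bucket/speech_api_sample_voice_en.flac",
    "timestamp": 1497943130.0,
}
print(undeclared_fields(record, schema))  # prints []
```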

Click the button in the properties of the Start of Flow BLOCK (BLOCK name: Cloud Speech API sample (2)) to execute the Flow.

If successful, a log similar to the one shown below will be output to the Logs section.

[
  {
    "transcript": "with an intuitive design and affordable pricing plans blocks makes getting started with machine learning simple now businesses can make data-driven decisions themselves without the need for expensive services and expert help"
  }
]

As with the previous example, the "transcript": "with an intuitive...and expert help" portion contains the results of transcribing the audio file. Because we entered a value into the Speech recognition BLOCK's Contextual word/phrase hints property, the ending that was transcribed as "in expert help" in the first sample now reads "and expert help".
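
The query's WHERE, ORDER BY, and LIMIT clauses can be read as a short sequence of steps: flatten every alternative into a row, keep rows whose confidence is above 0, sort by timestamp with the newest first, and return one row. The Python sketch below imitates those steps on made-up data. The three alternatives, their confidence values, and the assumption that lower-ranked alternatives carry no confidence are all invented for illustration; this mimics the query's logic and is not how BigQuery executes Legacy SQL.

```python
# Made-up table contents: one stored result with three alternatives.
# Only the top alternative has a confidence value (an assumption for this sketch).
rows_in_table = [
    {
        "timestamp": 1497943130.0,
        "results": [{"alternatives": [
            {"transcript": "...and expert help", "confidence": 0.96},
            {"transcript": "...in expert help", "confidence": None},
            {"transcript": "...an expert help", "confidence": None},
        ]}],
    },
]


def latest_transcript(table_rows):
    """Imitate the sample query: filter, sort newest first, keep one row."""
    flat = [
        {"transcript": alt["transcript"], "timestamp": row["timestamp"]}
        for row in table_rows
        for result in row["results"]
        for alt in result["alternatives"]
        # WHERE results.alternatives.confidence > 0 (missing values are dropped)
        if (alt.get("confidence") or 0) > 0
    ]
    # ORDER BY timestamp desc
    flat.sort(key=lambda r: r["timestamp"], reverse=True)
    # LIMIT 1, selecting only the transcript column
    return [{"transcript": r["transcript"]} for r in flat[:1]]


print(latest_transcript(rows_in_table))
```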

Appendix

We'll briefly explain in this section how we recorded the sample audio file. You can use this for reference to test things with your own audio data.

There are built-in applications in Windows and macOS (OS X) that you can use to record audio. However, since these do not record in a format supported by the Speech recognition BLOCK, you cannot use these files right away. You will need to install a separate application to convert audio files into a format supported by the Speech recognition BLOCK.

We will show how to install Audacity and use it to record audio. Audacity is an open-source program for recording and editing audio that can output files in the FLAC format supported by the Speech recognition BLOCK.

Instructions for installing and using Audacity to record data will be explained in the following order:

Installing Audacity
For Windows 10 users
  1. Open the Audacity website.
    Audacity website
  2. Click the Download Audacity 2.1.3 link.

    2.1.3 is the Audacity version number at the time this guide was written (2017/6/1), so the version you see may differ. If it does, please download and use the latest version of Audacity.

    Audacity download page
  3. This will open the download page. Click the Audacity for Windows link.
    Audacity for Windows link
  4. Click the Audacity 2.1.3 installer link within the Audacity Windows version download page.
    Audacity Windows version download page
  5. Open the audacity-win-2.1.3.exe file once it has downloaded.
    Audacity for Windows installer
  6. You will be asked for permission to install Audacity. Click the Yes button.
    Audacity for Win installer permission
  7. During the installation, select your preferred language and click the OK button.
    Setup language selection screen
  8. The Audacity setup wizard welcome screen will be shown. Click the Next > button.
    Audacity setup wizard welcome screen
  9. A screen with important information about Audacity will be shown. Click the Next > button.
    Audacity setup important information screen
  10. Select the folder Audacity will be installed to and click the Next > button.
    Audacity setup destination folder screen
  11. The Select Additional Tasks screen will be shown. Choose whether or not to create a desktop shortcut and click the Next > button.
    Audacity setup additional tasks screen
  12. The Ready to Install screen will be shown. Click the Install button to start installing Audacity.
    Audacity installation screen
  13. A page with important information will be displayed once Audacity finishes installing. Click the Next > button.
    Audacity setup important information screen (post-installation)
  14. The Completing the Audacity Setup Wizard screen will be displayed. Click the Finish button.
    By default, the Launch Audacity option is selected, so clicking Finish will also launch Audacity.
    Audacity setup complete screen
For macOS/OS X users
  1. Open the Audacity webpage.
    Audacity webpage
  2. Click the Download Audacity 2.1.3 link.

    2.1.3 is the Audacity version number at the time this guide was written (2017/6/1), so the version you see may differ. If it does, please download and use the latest version of Audacity.

    Audacity download page
  3. This will open the download page. Click the Audacity for Mac OS X/macOS link.
    Audacity for Mac download page
  4. Click the Audacity 2.1.3 .dmg file link within the Mac version download page.
    Audacity macOS/OS X version download page
  5. Once downloaded, double click the audacity-macos-2.1.3.dmg file from the Finder to open it.
    Audacity downloaded file
  6. Drag and drop the Audacity icon into the Applications folder.
    Install Audacity to the applications folder
  7. Close the above window.
  8. Right click the Audacity 2.1.3 icon on the desktop and select Eject Audacity 2.1.3. This completes the installation process.
    Audacity eject from desktop
Recording audio with Audacity

The Audacity screen is divided into various sections. The sections required for recording audio and their names are shown in the image below.

Audacity Project Window screen section names

For macOS/OS X users, the menu bar will be at the top of the screen, rather than within the Audacity window.

The process for recording audio is as follows:

  1. Set the “Recording Device” and “Recording Channel” from the Device Toolbar.

    Audacity Device Toolbar
    • Under “Recording Device”, choose the device you will use to make the recording.
    • Under “Recording Channel”, select 1 (Mono) Recording Channel.
  2. From the Selection Toolbar in the Lower Toolbar dock area, set the Project Rate (Hz) to 16000 as recommended for the Cloud Speech API.

    Audacity Selection Toolbar
  3. With that, you are ready to record the audio. However, we will first check the audio levels (without recording) by clicking the Recording Meter Toolbar.

    Audacity Recording Meter
  4. Adjust the recording level using the Recording Slider portion of the Mixer Toolbar.

    Audacity Mixer Toolbar
  5. After adjusting the recording level, click the Recording Meter Toolbar again to check the recording level. Repeat these steps as needed.

  6. Click the Record button from the Transport Toolbar to begin recording. Press the Stop button to stop the recording.
    Audacity Transport Toolbar
  7. To export the audio as a FLAC format file, open the File menu and select Export Audio.

    Audacity File menu
  8. From the Export Audio window, set each required item and click the Save button.

    Audacity Export Audio window

    Refer to the following list for each item (macOS/OS X terms shown in parentheses):

    • File name (Save as): Set the file name.
    • Save in (Where): Set the location where the file will be saved.
    • Save as type (File type): Select FLAC Files.
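    • Level: Set the compression level between 0 and 8. The default level is 5. Level 0 is the fastest setting with the least compression, and the compression rate increases with each higher number. FLAC is lossless at every level, so this setting affects file size and not audio quality.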
    • Bit depth: Select 16 bit.
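
After exporting, you may want to confirm that the file really is mono, 16 bit, and 16000 Hz before uploading it. The Python sketch below reads those values from the STREAMINFO block at the start of a FLAC file using only the standard library. The function names and the file name my_recording.flac are placeholders invented for this example. It assumes a plain FLAC file that starts with the fLaC marker followed by STREAMINFO as its first metadata block, so files with extra data in front of the marker are rejected.

```python
def parse_flac_streaminfo(header: bytes) -> dict:
    """Read sample rate, channels, and bit depth from the first 42 bytes of a FLAC file."""
    if len(header) < 42 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    block_type = header[4] & 0x7F  # low 7 bits: metadata block type (0 = STREAMINFO)
    if block_type != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # Bytes 8-17 hold block and frame sizes. The next 8 bytes pack:
    # 20 bits sample rate, 3 bits (channels - 1), 5 bits (bits per sample - 1),
    # and 36 bits total sample count.
    packed = int.from_bytes(header[18:26], "big")
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
    }


def matches_guide_settings(info: dict) -> bool:
    """True if the file uses the settings from this guide: 16000 Hz, mono, 16 bit."""
    return info == {"sample_rate": 16000, "channels": 1, "bits_per_sample": 16}


def check_file(path: str) -> bool:
    """Check a FLAC file on disk, e.g. check_file("my_recording.flac")."""
    with open(path, "rb") as f:
        return matches_guide_settings(parse_flac_streaminfo(f.read(42)))


# Synthetic 42-byte header for demonstration only (16000 Hz, mono, 16 bit);
# it is not taken from a real recording.
packed_demo = (16000 << 44) | (0 << 41) | (15 << 36)
demo_header = (
    b"fLaC" + bytes([0x80]) + (34).to_bytes(3, "big")
    + bytes(10) + packed_demo.to_bytes(8, "big") + bytes(16)
)
print(parse_flac_streaminfo(demo_header))
print(matches_guide_settings(parse_flac_streaminfo(demo_header)))
```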