Cloud Speech API

This BLOCK uses the Google Cloud Speech API to convert speech from audio files into text.

This BLOCK is currently in beta. The beta version will become unavailable after the official release; please switch to the official BLOCK once it is released.

Because this is a beta release, some functions may not execute properly. We appreciate feedback from users, through the BLOCKS Forum or direct contact, regarding bugs or ways to improve BLOCKS.

Note for Self-Service Plan users:
The Google Cloud Speech API must be enabled to use this BLOCK. For details, refer to Basic Guide > Hints > Enabling Google APIs.

To use this BLOCK effectively, we suggest reading Google’s Best Practices guide for the Google Cloud Speech API.

Property name Explanation
BLOCK name Designate a name for the BLOCK. The name will be displayed on the BLOCK.
GCP service account Select the GCP service account for use with this BLOCK.
Audio file URL Designate the GCS URL where the audio file is stored.
Encoding Designate the encoding type of the audio file saved to the “Audio file URL”. The following encoding types may be used:
  • LINEAR16
  • FLAC
  • AMR
  • AMR_WB

FLAC and LINEAR16 are the recommended encoding types for voice recognition. For further details, see the Basic Guide entry titled “Audio encoding for Google Cloud Speech API”, which explains each encoding type and demonstrates how to convert audio files to them.

Sampling rate

Designate the sampling rate of the audio file saved to the “Audio file URL”. The sampling rate is specified in hertz (Hz) and may range from 8,000 to 48,000.

For best results, Google suggests using a 16,000 Hz sampling rate.

Language code

Designate the language spoken in the audio file saved to the “Audio file URL”. For example, to recognize American English, choose “en-US”.

See Language Support for a full list of possible language codes.

Variable Designate the variable that will store the resultant text data.
BLOCK memos Enter any comments about the BLOCK.
Max alternatives

When the audio data is converted into text, multiple recognition alternatives can be returned. This property sets the maximum number of these alternative results within a range of 0 to 30.

Setting this to 0 or 1 will return a maximum of 1 alternative result.

Profanity filter Activating this property will turn on the profanity filter, thus removing any swear words from the resultant text data.
Contextual word/phrase hints Provide any words or phrases that might improve the Speech API’s recognition accuracy.
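
The constraints in the table above can be checked before a Flow is run. The following Python sketch is illustrative only and is not part of BLOCKS or the Speech API: the SpeechBlockSettings class, the validate and effective_max_alternatives functions, and the example bucket path are all hypothetical names chosen for this example. It also assumes that the “Audio file URL” takes the usual gs://bucket/object form of a GCS URL.

```python
from dataclasses import dataclass, field
from typing import List

# Values taken from the property table above.
ALLOWED_ENCODINGS = {"LINEAR16", "FLAC", "AMR", "AMR_WB"}
MIN_SAMPLE_RATE_HZ = 8_000
MAX_SAMPLE_RATE_HZ = 48_000
MAX_ALTERNATIVES_LIMIT = 30


@dataclass
class SpeechBlockSettings:
    """Hypothetical container mirroring the BLOCK's properties."""
    audio_file_url: str
    encoding: str
    sampling_rate_hz: int
    language_code: str
    max_alternatives: int = 1
    profanity_filter: bool = False
    phrase_hints: List[str] = field(default_factory=list)


def validate(settings: SpeechBlockSettings) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    # Assumption: GCS URLs are written as gs://bucket/object.
    if not settings.audio_file_url.startswith("gs://"):
        problems.append("Audio file URL should be a GCS URL (gs://...).")
    if settings.encoding not in ALLOWED_ENCODINGS:
        problems.append(f"Unsupported encoding: {settings.encoding}")
    if not MIN_SAMPLE_RATE_HZ <= settings.sampling_rate_hz <= MAX_SAMPLE_RATE_HZ:
        problems.append("Sampling rate must be between 8,000 and 48,000 Hz.")
    if not settings.language_code:
        problems.append("Language code must not be empty.")
    if not 0 <= settings.max_alternatives <= MAX_ALTERNATIVES_LIMIT:
        problems.append("Max alternatives must be between 0 and 30.")
    return problems


def effective_max_alternatives(value: int) -> int:
    """Setting 0 or 1 both yield at most one alternative result."""
    return max(1, value)


if __name__ == "__main__":
    settings = SpeechBlockSettings(
        audio_file_url="gs://my-bucket/audio/sample.flac",  # hypothetical path
        encoding="FLAC",
        sampling_rate_hz=16_000,  # the rate Google suggests for best results
        language_code="en-US",
    )
    print(validate(settings))
```

A check like this only catches values outside the documented ranges. It cannot confirm that the encoding and sampling rate actually match the audio file itself.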