Speech recognition

Overview

This BLOCK uses the Google Cloud Speech-to-Text API to convert speech from audio files into text.

For details on using this BLOCK, see Basic Guide > Hints > How to use the Speech recognition BLOCK.

Note for Self-Service Plan users:
The Google Cloud Speech-to-Text API must be enabled to use this BLOCK. For details, refer to Basic Guide > Hints > Enabling Google APIs.

To use this BLOCK effectively, we suggest reading Google’s Best Practices guide for the Google Cloud Speech-to-Text API.

Properties

Property Explanation

BLOCK name

Configure the name displayed on this BLOCK.

GCP service account

Select the GCP service account to use with this BLOCK.

Audio file URL

Designate the GCS URL where the audio file is stored.

Variable

Designate the variable that will store the resultant text data.

Refer to Output specifications > Speech recognition for details.

Encoding

Designate the encoding type of the audio file saved to the Audio file URL. The following encoding types may be used:
  • LINEAR16
  • FLAC
  • MULAW
  • AMR
  • AMR_WB

FLAC and LINEAR16 are the recommended encoding types for speech recognition. For further details, see the Basic Guide entry Audio encoding for Google Cloud Speech-to-Text API, which explains each encoding type and demonstrates how to convert audio between them.

Sampling rate

Designate the sampling rate of the audio file saved to the Audio file URL. The sampling rate is measured in hertz (Hz) and may range from 8,000 Hz to 48,000 Hz.

For best results, Google suggests using a 16,000 Hz sampling rate.
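
Before setting the Encoding and Sampling rate properties, it can help to confirm what an audio file actually contains. The following is a minimal sketch, not part of this BLOCK, that reads a WAV header with Python's standard wave module. The helper names inspect_wav and make_silent_wav are hypothetical, and the sketch assumes the audio is an uncompressed PCM WAV file.

```python
import io
import wave

# Range stated for the Sampling rate property.
MIN_RATE_HZ = 8_000
MAX_RATE_HZ = 48_000


def inspect_wav(data: bytes) -> dict:
    """Read the WAV header fields relevant to Encoding and Sampling rate."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return {
            "sampling_rate": rate,
            "channels": wav.getnchannels(),
            # Uncompressed PCM with 2 bytes per sample is 16-bit audio,
            # which corresponds to LINEAR16.
            "is_linear16": wav.getsampwidth() == 2,
            "rate_in_range": MIN_RATE_HZ <= rate <= MAX_RATE_HZ,
        }


def make_silent_wav(rate: int = 16_000, seconds: float = 0.1) -> bytes:
    """Build a short mono 16-bit WAV in memory so the example needs no files."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    print(inspect_wav(make_silent_wav()))
```

In practice, the bytes passed to inspect_wav would come from the audio file itself before it is uploaded to GCS.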

Language code

Designate the language spoken in the audio file saved to the Audio file URL. For example, to recognize American English, choose en-US.

See Language Support for a full list of possible language codes.

BLOCK memos

Make notes about this BLOCK.

Max alternatives

When the audio data is converted into text, multiple recognition alternatives can be returned. This property sets the maximum number of alternatives to return, from 0 to 30.

Setting this to 0 or 1 will return a maximum of 1 alternative result.
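
The rule above can be sketched as a small function. This is an illustration of the described behavior only. The name effective_max_alternatives is hypothetical, and the BLOCK's internal handling is not shown in this reference.

```python
def effective_max_alternatives(value: int) -> int:
    """Return the upper bound on alternatives implied by the property value."""
    if not 0 <= value <= 30:
        raise ValueError("Max alternatives must be between 0 and 30")
    # Both 0 and 1 yield at most one alternative.
    return max(1, value)


if __name__ == "__main__":
    for setting in (0, 1, 5, 30):
        print(setting, "->", effective_max_alternatives(setting))
```

The value is an upper bound, so fewer alternatives than the configured maximum may be returned.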

Profanity filter

Activating this property turns on the profanity filter, which filters swear words out of the resulting text data. In the Speech-to-Text API, a filtered word is masked with asterisks after its first character (for example, "f***").

Contextual word/phrase hints

Provide any words or phrases likely to appear in the audio to improve the Speech-to-Text API’s recognition accuracy.
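
To show how these properties relate to the underlying API, the sketch below assembles a request body in the shape of the Speech-to-Text v1 REST recognize request (config and audio objects). How the BLOCK builds its request internally is not documented here, so the mapping is an assumption. The function name build_recognize_request and the bucket path are hypothetical. The code only builds and prints the JSON body and makes no API call.

```python
import json

# Encoding types listed for the Encoding property.
SUPPORTED_ENCODINGS = {"LINEAR16", "FLAC", "MULAW", "AMR", "AMR_WB"}


def build_recognize_request(audio_url, encoding, sampling_rate, language_code,
                            max_alternatives=1, profanity_filter=False,
                            hints=()):
    """Validate the property values and map them to a recognize request body."""
    if not audio_url.startswith("gs://"):
        raise ValueError("Audio file URL must be a GCS URL (gs://...)")
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"Unsupported encoding: {encoding}")
    if not 8_000 <= sampling_rate <= 48_000:
        raise ValueError("Sampling rate must be between 8,000 and 48,000 Hz")
    if not 0 <= max_alternatives <= 30:
        raise ValueError("Max alternatives must be between 0 and 30")

    config = {
        "encoding": encoding,
        "sampleRateHertz": sampling_rate,
        "languageCode": language_code,
        "maxAlternatives": max_alternatives,
        "profanityFilter": profanity_filter,
    }
    if hints:
        # Contextual word/phrase hints are sent as a speech context.
        config["speechContexts"] = [{"phrases": list(hints)}]
    return {"config": config, "audio": {"uri": audio_url}}


if __name__ == "__main__":
    body = build_recognize_request(
        "gs://example-bucket/audio/sample.flac",  # hypothetical path
        encoding="FLAC",
        sampling_rate=16_000,
        language_code="en-US",
        max_alternatives=3,
        hints=["BLOCK", "Speech-to-Text"],
    )
    print(json.dumps(body, indent=2))
```

Validating the values up front mirrors the ranges given in the property table, so a misconfigured value is caught before any request is sent.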
