Speech recognition

Last updated: June 17, 2024

Machine Learning

Overview

This BLOCK uses the Google Cloud Speech-to-Text API to convert speech from audio files into text.

For usage details, see Basic Guide > Hints > How to use the Speech recognition BLOCK.

Note for Self-Service Plan users:
The Google Cloud Speech-to-Text API must be enabled to use this BLOCK. For details, refer to Basic Guide > Hints > Enabling Google APIs.

To use this BLOCK effectively, we suggest reading Google’s Best Practices guide for the Google Cloud Speech-to-Text API.

Properties

Property	Explanation
BLOCK name	Configure the name displayed on this BLOCK.
GCP service account	Select the GCP service account to use with this BLOCK.
Audio file URL	Designate the GCS URL where the audio file is stored.
Variable	Designate the variable that will store the resultant text data. Refer to Output specifications > Speech recognition for details.
Encoding	Designate the encoding type of the audio file stored at the Audio file URL. The following encoding types may be used: LINEAR16, FLAC, MULAW, AMR, and AMR_WB. FLAC and LINEAR16 are recommended as the best encoding types for speech recognition. For further details, see the Basic Guide entry titled Audio encoding for Google Cloud Speech-to-Text API, which explains each encoding type and demonstrates how to convert between them.
Sampling rate	Designate the sampling rate of the audio file stored at the Audio file URL, in hertz (Hz). Sampling rates may range from 8,000 Hz to 48,000 Hz. For best results, Google suggests using a 16,000 Hz sampling rate.
Language code	Designate the language spoken in the audio file stored at the Audio file URL. For example, to recognize American English, choose en-US. See Language Support for a full list of possible language codes.
BLOCK memos	Make notes about this BLOCK.
Max alternatives	When the audio data is converted into text, multiple recognition alternatives can be returned. This property sets the maximum number of these alternative results within a range of 0 to 30. Setting this to 0 or 1 will return a maximum of 1 alternative result.
Profanity filter	Activating this property turns on the profanity filter, which masks swear words in the resultant text data. The Speech-to-Text API replaces all but the first character of each filtered word with asterisks.
Contextual word/phrase hints	Provide any words or phrases that might strengthen the Speech-to-Text API’s recognition accuracy.
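
The properties above correspond closely to fields of the Speech-to-Text REST API's recognition request. The following Python sketch checks property values against the ranges listed in the table and assembles such a request body. It is only an illustration: the helper name build_recognize_request, the example bucket path, and the exact mapping from BLOCK properties to API fields are assumptions, and the BLOCK's internal behavior may differ.

```python
import json

# Encoding names listed for the Encoding property.
SUPPORTED_ENCODINGS = {"LINEAR16", "FLAC", "MULAW", "AMR", "AMR_WB"}


def build_recognize_request(audio_url, encoding, sampling_rate, language_code,
                            max_alternatives=1, profanity_filter=False,
                            hints=()):
    """Validate BLOCK-style property values and return a request body dict.

    Hypothetical helper: the property-to-field mapping is an assumption,
    not the documented behavior of the BLOCK.
    """
    if not audio_url.startswith("gs://"):
        raise ValueError("Audio file URL must be a GCS URL (gs://bucket/object)")
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"Unsupported encoding: {encoding}")
    if not 8000 <= sampling_rate <= 48000:
        raise ValueError("Sampling rate must be between 8,000 and 48,000 Hz")
    if not 0 <= max_alternatives <= 30:
        raise ValueError("Max alternatives must be between 0 and 30")

    config = {
        "encoding": encoding,
        "sampleRateHertz": sampling_rate,
        "languageCode": language_code,
        "maxAlternatives": max_alternatives,
        "profanityFilter": profanity_filter,
    }
    # Contextual word/phrase hints are sent only when provided.
    if hints:
        config["speechContexts"] = [{"phrases": list(hints)}]
    return {"config": config, "audio": {"uri": audio_url}}


# Example values; the bucket and object names are placeholders.
request = build_recognize_request(
    "gs://example-bucket/audio/sample.flac",
    encoding="FLAC",
    sampling_rate=16000,
    language_code="en-US",
    max_alternatives=3,
    profanity_filter=True,
    hints=["service account"],
)
print(json.dumps(request, indent=2))
```

Validating the values before a request is built surfaces out-of-range settings, such as a sampling rate below 8,000 Hz, as a clear error instead of a failed recognition call.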
