Speech recognition using selected model (beta)


This BLOCK uses pre-made speech recognition models from the Google Cloud Speech-to-Text service to transcribe videos, phone conversations, and similar audio.

We recommend reading Google’s Best Practices guide for effectively using the Cloud Speech-to-Text service before using this BLOCK.

Self-Service Plan users:
You must enable the Google Cloud Speech-to-Text API to use this BLOCK. Refer to Basic Guide > Hints > Enabling Google APIs for details.

This BLOCK is currently in beta and may become unusable after the official version is released. Please switch to using the official version at that time.

Because this is a beta version, some features may not work as intended. We appreciate your feedback regarding bugs and ways to improve MAGELLAN BLOCKS.


Property Explanation
BLOCK name

Configure the name displayed on this BLOCK.

GCP service account

Select the GCP service account to use with this BLOCK.

Audio file GCS URL

Designate the GCS URL of the audio file that will be analyzed.
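GCS URLs take the form gs://bucket/object. As a minimal sketch for checking a value before entering it in the Audio file GCS URL property, the helper below splits such a URL into its bucket and object name. The function name parse_gcs_url and the example bucket and object are hypothetical and are not part of MAGELLAN BLOCKS.

```python
from typing import Tuple


def parse_gcs_url(url: str) -> Tuple[str, str]:
    """Split a gs://bucket/object URL into (bucket, object_name)."""
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"expected a gs:// URL, got: {url!r}")
    # Everything up to the first "/" is the bucket; the rest is the object name.
    bucket, _, object_name = url[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URL must name both a bucket and an object: {url!r}")
    return bucket, object_name


# Hypothetical bucket and object names, for illustration only.
print(parse_gcs_url("gs://example-bucket/recordings/call-001.flac"))
```

A URL that fails this check (for example, an https:// link to the same file) is not in the gs:// form this property asks for.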

Select which of the pre-made models will be used to analyze the audio file.

Model Explanation

Best for phone conversations.


Best for videos and audio files with multiple speakers. This model is ideal for audio recorded at a sampling rate of 16,000 Hz or higher.


Best for audio of short voice commands and voice searches.

Results variable

Designate the variable that will store the text transcription of the audio file.

For details about this BLOCK’s output, refer to Output specifications > Speech recognition.


Designate the encoding of the audio file to be transcribed. The following encodings can be selected:

  • LINEAR16
  • FLAC
  • AMR
  • AMR_WB
  • MP3

FLAC and LINEAR16 are recommended as the best encoding types for voice recognition. For further details on each encoding and how to convert types, refer to Audio encoding for Google Cloud Speech-to-Text API.

Sampling rate

Designate the sampling rate of the audio file to be transcribed. Sampling rates can range from 8,000 to 48,000 hertz (Hz).

For best results, Google recommends using audio with a 16,000 Hz sampling rate.
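If the audio is a PCM WAV file, its sampling rate and sample size can be read from the file header before the properties are filled in. The following is a rough sketch using Python's standard wave module. The helper names describe_wav and is_supported_rate are hypothetical, and the in-memory file stands in for a real recording.

```python
import io
import wave

# Range stated for the Sampling rate property.
MIN_RATE_HZ = 8_000
MAX_RATE_HZ = 48_000


def describe_wav(fileobj):
    """Return (sample_rate_hz, bits_per_sample, channels) for a PCM WAV file."""
    with wave.open(fileobj, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def is_supported_rate(sample_rate_hz):
    """True if the rate falls inside the 8,000 to 48,000 Hz range."""
    return MIN_RATE_HZ <= sample_rate_hz <= MAX_RATE_HZ


# Build a tiny in-memory WAV (0.1 s of silence) so the sketch runs without a real file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM, the sample size LINEAR16 uses
    out.setframerate(16_000)
    out.writeframes(b"\x00\x00" * 1_600)
buffer.seek(0)

rate, bits, channels = describe_wav(buffer)
print(rate, bits, channels, is_supported_rate(rate))
```

The rate reported here is the value to enter in the Sampling rate property. A 16-bit result is consistent with choosing LINEAR16 as the encoding.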


Select the language of the audio to be transcribed.

See Language Support for a full list of supported languages.

BLOCK memos

Make notes about this BLOCK.

Max alternatives

When the audio data is converted into text, multiple recognition alternatives can be returned. This property sets the maximum number of these alternative results within a range of 0 to 30.

Setting this to 0 or 1 will return a maximum of 1 result.
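The rule above can be written out as a small function. This is only an illustration of the described behavior, not the BLOCK's implementation, and the name effective_max_alternatives is hypothetical.

```python
def effective_max_alternatives(setting: int) -> int:
    """Return the largest number of results a given Max alternatives value allows."""
    if not 0 <= setting <= 30:
        raise ValueError("Max alternatives must be between 0 and 30")
    # Both 0 and 1 yield at most one result; larger values are used as-is.
    return max(setting, 1)


for value in (0, 1, 5, 30):
    print(value, "->", effective_max_alternatives(value))
```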

Profanity filter

Activating this property turns on the profanity filter, which filters swear words out of the resulting text.

Contextual word/phrase hints

Provide any words or phrases that might improve the Speech-to-Text API’s recognition accuracy.
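For readers familiar with the Cloud Speech-to-Text REST API, the sketch below shows how properties like these could correspond to a recognition request body. The field names (encoding, sampleRateHertz, languageCode, model, maxAlternatives, profanityFilter, speechContexts, and audio.uri) come from that API. Whether this BLOCK builds its request this way is an assumption, and the function name and all example values are hypothetical.

```python
import json


def build_recognize_request(gcs_url, model, encoding, sample_rate_hz,
                            language_code, max_alternatives=1,
                            profanity_filter=False, hints=()):
    """Assemble a Speech-to-Text style request body as a plain dict."""
    config = {
        "encoding": encoding,
        "sampleRateHertz": sample_rate_hz,
        "languageCode": language_code,
        "model": model,
        "maxAlternatives": max_alternatives,
        "profanityFilter": profanity_filter,
    }
    if hints:
        # Word and phrase hints are sent as a speech context.
        config["speechContexts"] = [{"phrases": list(hints)}]
    return {"config": config, "audio": {"uri": gcs_url}}


# Hypothetical values, for illustration only; no request is actually sent.
request = build_recognize_request(
    gcs_url="gs://example-bucket/recordings/call-001.flac",
    model="phone_call",
    encoding="FLAC",
    sample_rate_hz=8_000,
    language_code="en-US",
    max_alternatives=3,
    profanity_filter=True,
    hints=["MAGELLAN BLOCKS"],
)
print(json.dumps(request, indent=2))
```

Keeping the hints out of the request when none are given mirrors the fact that this property is optional.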