BLOCKS Reference

BigQuery

Load to multiple tables from GCS

This BLOCK loads data from a group of files in GCS into multiple BigQuery tables in parallel.

Property Explanation

BLOCK name

Configure the name displayed on this BLOCK.

GCP service account

Select the GCP service account to use with this BLOCK.
Source data file group GCS URL

Designate a GCS URL for the files containing the data that will be sent to the BigQuery tables. Format this URL as gs://bucketname/objectname*.csv. The asterisk stands for any string of one or more characters, so every file whose name matches will be read.

[% format character addressable] [variable expansion addressable]
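The matching rule described above can be sketched as follows. This is an illustration only, assuming the asterisk must stand for at least one character; the object names are hypothetical:

```python
import re

# The pattern gs://bucketname/objectname*.csv reads every object whose
# name matches "objectname*.csv", where * is one or more characters.
pattern = re.compile(r"^objectname.+\.csv$")

objects = ["objectname_2024.csv", "objectname_a.csv", "objectname.csv", "other.csv"]
matched = [o for o in objects if pattern.match(o)]
# "objectname.csv" is excluded because * must match at least one character.
```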

Destination dataset

Designate the destination dataset's ID.

[% format character addressable] [variable expansion addressable]

Destination tables

Designate a prefix for the IDs of the destination tables.

Table IDs are created with this prefix attached to the file names (minus the file extension) designated in the URL of the "Source data file group GCS URL" property.

[% format character addressable] [variable expansion addressable]
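The table-ID rule above can be sketched as a small helper. This is illustrative only, not part of the BLOCKS API, and the prefix and object name are hypothetical:

```python
import os

def table_id(prefix, object_name):
    # Table ID = prefix + source file name with its extension removed.
    stem = os.path.splitext(os.path.basename(object_name))[0]
    return prefix + stem

table_id("sales_", "exports/objectname_20240101.csv")
# -> "sales_objectname_20240101"
```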

Schema settings

Designate the schema for the destination tables.

All tables share the same schema, so the data sent from the source files must also share that schema. An error will occur if the schemas vary.

This property can be skipped when the source data files contain JSON format data.

*You can set the schema directly as JSON by clicking the "Edit as JSON" link.
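A schema entered through the "Edit as JSON" link follows BigQuery's standard JSON schema format. The field names below are illustrative:

```json
[
  {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "name", "type": "STRING", "mode": "NULLABLE"},
  {"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"}
]
```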

In cases of non-empty tables

Select which action to perform when the destination tables already contain data.

  • Append: Appends new data to the table.
  • Overwrite: Overwrites the table with the new data.
  • Error: An error occurs if the table is not empty.
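The three actions above correspond to BigQuery's standard write dispositions. The mapping below is a sketch of that assumption; the constant names are the BigQuery API's, not the BLOCKS UI's:

```python
# Assumed mapping from this property's options to BigQuery write dispositions.
WRITE_DISPOSITIONS = {
    "Append": "WRITE_APPEND",       # add new rows to the existing data
    "Overwrite": "WRITE_TRUNCATE",  # replace the table's contents
    "Error": "WRITE_EMPTY",         # fail unless the table is empty
}
```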
BLOCK memos

Make notes about this BLOCK.

Reattempts in case of errors

Configure the number of attempts to make in case of a request error.

Minimum timeout interval

Set the number of seconds to wait for results. If results are not returned within this interval, the wait time doubles with each reattempt until it reaches the value set in the "Maximum timeout interval" property.

Maximum timeout interval

Indicate the maximum number of seconds to wait for results. The timeout interval starts at the value set in the "Minimum timeout interval" property and doubles with each reattempt until it reaches the value set here.
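The doubling behavior of the two timeout properties can be sketched as follows. The function and parameter names are illustrative, not part of the BLOCKS API:

```python
def timeout_for_attempt(attempt, minimum, maximum):
    # The wait starts at `minimum` seconds and doubles with each
    # reattempt, capped at `maximum` seconds.
    return min(minimum * 2 ** attempt, maximum)

[timeout_for_attempt(n, 30, 300) for n in range(5)]
# -> [30, 60, 120, 240, 300]
```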
File format

Choose the format of the source files in GCS. The permissible formats are as follows.

  • CSV
  • NEWLINE_DELIMITED_JSON
  • DATASTORE_BACKUP
CSV delimiter character

Select the delimiter character used for CSV files.

  • Comma
  • Tab
  • Pipe
  • Other

If you choose "Other", specify a delimiter character in the accompanying field.
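For illustration, here is how pipe-delimited data is parsed when "Pipe" (or "Other" with "|") is selected, sketched with Python's csv module; the sample data is hypothetical:

```python
import csv
import io

# Parse a small pipe-delimited sample with a custom delimiter.
rows = list(csv.reader(io.StringIO("id|name\n1|alice\n"), delimiter="|"))
# rows -> [["id", "name"], ["1", "alice"]]
```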

Number of skipped rows

Configure the number of leading rows to skip in CSV files.

Permit rows with insufficient fields

Select whether or not to permit rows with insufficient fields in CSV files.

Designate quotation marks

Designate the character used for quotation marks in CSV files.

Allow line breaks within quoted fields

Select whether or not to allow quoted fields to contain line breaks.

Max number of bad rows

Configure how many bad rows to allow before an error results.

Ignore extra fields

Select whether or not to ignore excess fields.
Trigger file URL

Designate a URL that BLOCKS will use to check if a file has been saved before starting the data transfer. If left blank, the data transfer will start without checking if a file has been saved.

[% format character addressable] [variable expansion addressable]

File check attempts

Configure the maximum number of times to check whether a file has been saved to the trigger file URL.

Time between checks

Configure how many seconds to wait between checks on whether a file has been saved to the trigger file URL.
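Together, the trigger-file properties describe a polling loop like the one below. This is a sketch only; `exists` stands in for the actual GCS lookup, and the names are not from the BLOCKS API:

```python
import time

def wait_for_trigger(exists, attempts, interval):
    # Check for the trigger file up to `attempts` times, waiting
    # `interval` seconds between checks; start the transfer on success.
    for _ in range(attempts):
        if exists():
            return True
        time.sleep(interval)
    return False
```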