Last time, we introduced some basics regarding BigQuery and how to operate its web UI.
In this post, we'll show you how to start using BigQuery in MAGELLAN BLOCKS.
Executing a single query once is simple enough with the web UI or the bq command-line tool.
However, in real work situations, you'll often need to execute multiple queries or run queries on a regular schedule as batches.
For that, you would usually need to write a program that utilizes the BigQuery API.
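For reference, here's a minimal sketch of what that kind of program involves, building a request for BigQuery's REST API jobs.query endpoint. The project ID and query here are placeholders, and actually sending the request would also require OAuth 2.0 credentials:

```python
import json

# Hypothetical project ID, for illustration only.
PROJECT_ID = "my-gcp-project"

def build_query_request(project_id, sql):
    """Build the URL and JSON body for a BigQuery jobs.query REST call."""
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/queries"
    )
    body = {
        "query": sql,
        "useLegacySql": False,
    }
    return url, json.dumps(body)

url, body = build_query_request(PROJECT_ID, "SELECT sex FROM my_school.scores")
# The request would then be POSTed with an access token attached, and
# scheduling, retries, and error handling would all still be up to you.
```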
Alternatively, by using the BigQuery BLOCKS in MAGELLAN BLOCKS, you can accomplish these things without writing any code.
With that, let's try using MAGELLAN BLOCKS to make a query.
1. Make a Flow that executes regularly timed BigQuery queries
1-1. Connect BLOCKS
Open the Board editor and drag BLOCKS from the left menu to start connecting them.
We'll be using the following three BLOCKS:
1) Start of Flow (Basic category)
2) Execute query (BigQuery category)
3) End of Flow (Basic category)
*All Flows must start with a Start of Flow BLOCK and end with an End of Flow BLOCK.
1-2. Setting up timed Flow execution
Next, we'll set up our Flow to execute on a timed schedule.
Using crontab-format schedule expressions, you can flexibly schedule your Flows to execute by the minute, hour, date, month, and day of the week, according to your work needs.
This time, we'll make our Flow execute every three minutes.
From the Properties menu of the Start of Flow BLOCK, set Start time to */3 * * * * and check the box labeled Activating start times.
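To see how the crontab format works, here's a small Python sketch (illustrative only, not BLOCKS' actual scheduler) that checks whether a given minute matches the minute field of a schedule like */3 * * * *:

```python
def minute_matches(minute_field, minute):
    """Check whether a minute (0-59) matches a crontab minute field.

    Supports the three common forms: "*" (every minute), "*/N" (step
    values), and a plain number. Real crontab syntax also allows lists
    and ranges, which this sketch omits.
    """
    if minute_field == "*":
        return True
    if minute_field.startswith("*/"):
        step = int(minute_field[2:])
        return minute % step == 0
    return int(minute_field) == minute

# "*/3 * * * *" fires whenever the minute is divisible by 3: 0, 3, 6, ...
schedule = "*/3 * * * *"
minute_field = schedule.split()[0]
fires = [m for m in range(10) if minute_matches(minute_field, m)]
# fires == [0, 3, 6, 9]
```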
1-3. Write a query
Next, we'll write the main part of our Flow, the query itself.
We'll use the scores table from the my_school dataset we created last time.
From the data in this table, we want to calculate the average scores in each subject for male and female students separately.
Our query will be like the following:
SELECT
  sex,
  AVG( language_score ) AS avg_language_score,
  AVG( math_score ) AS avg_math_score,
  AVG( social_score ) AS avg_social_score,
  AVG( science_score ) AS avg_science_score
FROM
  my_school.scores
GROUP BY
  sex
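If it helps to see what this query computes, here's the same grouped average done in plain Python over a few made-up sample rows (the scores below are invented for illustration, and only two subjects are shown):

```python
from collections import defaultdict

# Invented sample rows standing in for the my_school.scores table.
rows = [
    {"sex": "M", "language_score": 70, "math_score": 80},
    {"sex": "M", "language_score": 60, "math_score": 90},
    {"sex": "F", "language_score": 80, "math_score": 70},
    {"sex": "F", "language_score": 90, "math_score": 60},
]

# GROUP BY sex: collect the rows for each group.
groups = defaultdict(list)
for row in rows:
    groups[row["sex"]].append(row)

# AVG(...) within each group, one entry per score column.
avg_by_sex = {
    sex: {
        "avg_language_score": sum(r["language_score"] for r in g) / len(g),
        "avg_math_score": sum(r["math_score"] for r in g) / len(g),
    }
    for sex, g in groups.items()
}
# avg_by_sex["M"] == {"avg_language_score": 65.0, "avg_math_score": 85.0}
```

BigQuery does exactly this, just at a scale where the table can hold billions of rows.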
Now, let’s add this into the Properties menu of our Execute query BLOCK.
Set Result storage data set to my_school, and Result storage table to avg_by_sex. For the In cases of non-empty tables setting, choose Overwrite.
The avg_by_sex table will be created automatically by BigQuery based on the results of our query, so there's no need to create it or set its schema beforehand.
*Don't forget to select the GCP service account that we set up in Part 1.
There's one point to keep in mind here:
Within the extra settings found by clicking Advanced settings, you should find an item called Query priority level.
This is set to Batch by default. However, as explained in the Properties menu, this can cause the query to execute slowly since it will wait until BigQuery resources are available. Switching this setting to Interactive will make the executions run faster.
Click the save button in the upper right once you've finished configuring the settings.
That's it! The Flow is done and should be executing!
2. Executing the Flow
Since we set the Flow to execute on a timed schedule, it should now be executing every three minutes. Let's check to make sure.
Open up the Log display at the bottom of the Board editor. Logs of each Flow execution will be output here.
Is your Flow executing every three minutes?
3. Confirm results
Next, let's go to the BigQuery console to make sure the results are being output into the table properly.
Looks like the table we set, avg_by_sex, was created, and the averages in each subject for male and female students are there!
4. Stopping Flow execution
Once you've finished, open the Start of Flow BLOCK's properties and uncheck Activating start times. Then, click Save and the Flow will stop executing.
How was that? This was just a simple example, but we hope you can see how BLOCKS allows for creating and executing batches for BigQuery without the need for complicated programming.
There are still lots of other BLOCKS available for use, so try testing some out yourself by incorporating them into your Flows!