In the last part of this series, we registered our GCP service account information into BLOCKS.
Now we can actually use BigQuery BLOCKS, but there are a few things we should know about BigQuery first.
In today’s post, I’ll introduce some of the main components of BigQuery. Then, we’ll try out using BigQuery within its web-based console. This should give you the basic understanding of BigQuery that you’ll need to start using BigQuery in BLOCKS.
Components of BigQuery
BigQuery is made up of three main parts: tables, datasets, and projects. I’ll explain a bit about each of these below.
- Tables are the base of BigQuery. They contain the data to be analyzed.
- Datasets are groups of tables.
- Projects are where you manage groups of datasets.
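These three levels appear together whenever you refer to a table in a query. As a sketch, in BigQuery’s legacy SQL dialect a fully qualified table reference takes the form [project-id:dataset_name.table_name] — for example, using a public table we’ll look at shortly:

```sql
-- Legacy SQL: project, dataset, and table combined into one reference
SELECT *
FROM [bigquery-public-data:hacker_news.comments]
LIMIT 10
```

Tables in your own project can be referenced without the project prefix, as [dataset_name.table_name].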
Testing out BigQuery in the GCP Console
Let’s take a look at what we can do in BigQuery. We’ll try out the following:
- Looking at datasets and tables
- Issuing queries
- Making a dataset
I’ll do my best to explain each of these things in that order.
Looking at datasets and tables
First, open up the menu from the GCP dashboard and click BigQuery.
This will bring up the BigQuery console screen.
Within BigQuery, datasets can be set to public or private. We’ll start off by looking at a public dataset Google provides.
On the left side, choose bigquery-public-data:hacker_news. This should bring up a screen like the one shown below. Here you can see a description of the dataset, its details, and a list of the tables it contains.
Let’s take a look at the tables now. Choose comments from the list and its table details will appear. The screenshot below shows the table’s Schema. Here you can see the various components of the table.
Starting from the left, we have each field’s name, its type, its mode (whether or not it’s nullable), and its description.
Next, choose Details.
Here you’ll find the table’s ID, file size, and other information.
Finally, let’s take a look at the Preview tab. In this section, we can see a sampling of some of the table’s contents. This is convenient when you don’t know exactly what kind of data is contained in a table.
Issuing a query
Next, let’s try issuing a query. Queries allow us to analyze the data in tables: finding the largest values, totaling sums of data, and much more.
First, click the Compose Query button on the upper-left of the screen.
Doing so will bring up the editor screen as shown below.
We can write our queries here. For now, try writing:
SELECT MAX(LENGTH(text)) AS max_length_comments FROM [bigquery-public-data:hacker_news.comments]
This query measures the length of the character string in each row of the text column and returns the character count of the longest one.
For more details about queries, see Google's reference documentation.
Click Run Query to check whether the query worked. If everything was entered properly, it should return the length of the longest comment.
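If you’d like to experiment a bit more before moving on, here’s a sketch of a similar query against the same public table, again in legacy SQL. It combines two aggregations over the text column we just used (the actual numbers returned will depend on the dataset’s contents):

```sql
-- Count all comments and average their lengths in one query
SELECT COUNT(*) AS comment_count,
       AVG(LENGTH(text)) AS avg_length
FROM [bigquery-public-data:hacker_news.comments]
```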
Making a dataset
Now we’ll make our own dataset and table. For the table’s data, we’ve created a file, sample.csv, containing the following school grades (note that files used must be encoded as UTF-8).
name,sex,language_score,math_score,social_score,science_score
Trevor,M,81,41,59,33
Hannah,F,97,99,47,39
Stan,M,93,98,59,99
Michael,M,83,58,97,37
Jane,F,32,39,67,54
Kingsley,M,78,78,58,87
Katie,F,89,39,44,65
Ernie,M,88,85,59,98
Terry,M,38,100,95,81
Nicolas,M,87,87,48,83
Rolanda,F,70,97,55,30
Bellatrix,F,53,42,96,90
To make a dataset, click the button with the triangle icon and choose Create new dataset.
Doing this will bring up a form to set the dataset’s name. You can enter whatever you like. I’ve gone with “my_school” for this tutorial.
Click OK, then hover your cursor over my_school and click the + button. This should bring up the following screen:
Fill it out as shown below.
Starting from the top:
- Upload the CSV file.
- Fill out the table name.
- Input the schema fields: name, sex, language_score, math_score, social_score, and science_score.
- Set Header rows to skip to 1 (the first row contains the column names, so we want to skip it).
Click Create Table and our new table should appear within our dataset.
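To confirm the data loaded correctly, you can query the new table the same way we queried the public dataset. Here’s a sketch in legacy SQL, assuming you named the table grades (substitute whatever table name you chose). Since the table lives in your own project, no project prefix is needed:

```sql
-- Average math score in our sample data, grouped by the sex column
SELECT sex,
       AVG(math_score) AS avg_math_score
FROM [my_school.grades]
GROUP BY sex
```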
That’s it for today’s tutorial on using BigQuery within GCP.
Using BigQuery from MAGELLAN BLOCKS
Now that we’ve created a dataset and table, we’re ready to use BigQuery from MAGELLAN BLOCKS. We’ll go through that process in the next part of this series. See you then.