Engineer's Blog

Engineer's Blog

Refine search

2016.5.30 - 

Getting started with GCP #2: Using BigQuery

In the last part of this series, we registered our GCP service account information into BLOCKS.

Now we can actually use BigQuery BLOCKs, but there are a few things we should know about BigQuery first.

In today’s post, I’ll introduce some of the main components of BigQuery. Then, we’ll try out using BigQuery within its web-based console. This should give you the basic understanding of BigQuery that you’ll need to start using BigQuery in BLOCKS.

 

Components of BigQuery

BigQuery is made up of 3 main parts: tables, datasets, and projects. I’ll explain a bit about each of these below.

Tables

Tables are the base of BigQuery. They contain the data to be analyzed.

Datasets

Datasets are groups of tables.

Projects

Projects are where you manage groups of datasets.

 

Testing out BigQuery in the GCP Console

Let’s take a look at what we can do in BigQuery. We’ll try out the following:

  1. Looking at datasets and tables
  2. Issuing queries
  3. Making a dataset

I’ll do my best to explain each of these things in that order.

Looking at datasets and tables

First, open up the menu from the GCP dashboard and click BigQuery.

This will bring up the BigQuery console screen.

BigQuery console screen

Within BigQuery, datasets can be set to public or private. We’ll start off by looking at a public dataset Google provides.

On the left side, choose bigquery-public-data:hacker_news. This should bring up a screen like the one shown below. Here you can see a description of the dataset, its details, and a list of the tables it contains.

Dataset summary

Let’s take a look at the tables now. Choose comments from the list its table details will appear. The screenshot below shows the table’s Schema. Here you can see the various components of the table.

Table schema

Starting from the left we have: item names, format, whether the item is nullable or not, and its explanation.

Next, choose Details.

Table details

Here you’ll find the table’s ID, file size, and other information.

Finally, let’s take a look at the Preview tab. In this section, we can see a sampling of some of the table’s contents. This is convenient when you don’t know exactly what kind of data is contained in a table.

Table preview

Issuing a query

Next, let’s try issuing a query. Queries allow us to analyze the data in tables; finding the largest values, totaling sums of data, and much more.

First, click the Compose Query button on the upper-left of the screen.

Compose Query

Doing so will bring up the editor screen as shown below.

Editor screen

We can write our queries here. For now, try writing:

 SELECT MAX(length(text)) as max_length_comments FROM [bigquery-public-data:hacker_news.comments]

This query analyzes the “text” column based on length of character strings and returns the number of characters in the longest string.

For more details about queries, see Google's reference documentation.

Click Run Query and we’ll confirm if our query worked correctly.

Run Query

If everything was entered properly, it should have returned the length of the longest entry.

Making a dataset

Now we’ll make our own dataset and tables. For the data of the table, we’ve created a file, sample.csv, with the following contents regarding school grades (note that files used must be encoded as UTF-8).

 name,sex,language_score,math_score,social_score,science_score
 Trevor,M,81,41,59,33
 Hannah,F,97,99,47,39
 Stan,M,93,98,59,99
 Michael,M,83,58,97,37
 Jane,F,32,39,67,54
 Kingsley,M,78,78,58,87
 Katie,F,89,39,44,65
 Ernie,M,88,85,59,98
 Terry,M,38,100,95,81
 Nicolas,M,87,87,48,83
 Rolanda,F,70,97,55,30
 Bellatrix,F,53,42,96,90

To make a dataset, click the button with the triangle icon and choose Create new dataset.

Create new dataset

Doing this will bring up a form to set the dataset’s name. You can enter whatever you like. I’ve gone with “my_school” for this tutorial.

my_school

Click OK, then hover your cursor over my_school and click the + button. This should bring up the following screen:

Click my_school

Fill it out as shown below.

Filling out my_school

Starting from the top, upload the csv file, fill out the table name, input the schema details (name, sex, language_score, math_score, social_score, and science_score) and set Header rows to skip to 1 (the first row has the item names, so we want to skip that).

Click Create Table and our new table should appear within our dataset.

Create table

That’s it for today’s tutorial on using BigQuery within GCP.

 

Using BigQuery from MAGELLAN BLOCKS

Now that we’ve created a dataset and table, we’re ready to use BigQuery from MAGELLAN BLOCKS. We’ll go through that process in the next part of this series. See you then.