Refine search

2018.7.26 - 

Trying out BigQuery ML

Hello, everyone. I'm Satoshi, a consultant at Groovenauts.

I recently tried out the new BigQuery ML service and wanted to share my experience.

With BigQuery ML, you can perform linear regressions and logistic regressions straight from BigQuery, Google’s data warehouse service. Since it’s BigQuery, we can expect the part that creates the model to process in parallel and with fast speeds.

Up until now, you may have done linear regressions with R or Python, or with an Excel add-on (with a 16 variable limit). But now, as long as you can write a query in BigQuery, you don’t need to program anything. This should be convenient for data scientists.

To test out BigQuery ML, I tried making a linear regression model to predict electricity demand. I formatted my source data as follows:

Input features
・Month (1–12 entered as STRINGS)
・Day of the week (0–6 entered as STRINGS)
・Holiday (0–1 entered as STRINGS)
・Weather (clear, cloudy, rain, snow)
・Low temperature (FLOAT)
・High temperature (FLOAT)
・Daytime hours (INTEGER)
Target value
・Electricity usage (INTEGER)

Training data: 2016/04/01–2018/05/31
Prediction data: 2018/06/01–2018/06/16

Its contents were as follows:

Creating the model

Next, I’ll use the data to make the model. It feels a bit strange to train a model with a query! It seems that you need to extract the target variable with the item name, “label”.

# Create the model
create model `linear_reg.elec_model`
options(model_type='linear_reg') as
 ,actual as label
from `linear_reg.elec_data`
where key between cast('2016-04-01' as date) and cast('2018-03-31' as date)

Once a model finishes, you can click Go to model in the bottom left to bring up the model’s details screen.

Next, I used queries to look at some information about the model:

# Training info
select * from ml.training_info(model `linear_reg.elec_model`);
# Feature info
select * from ml.feature_info(model `linear_reg.elec_model`);
# Weight info
select * from ml.weights(model `linear_reg.elec_model`);

There is more info for the weights at the bottom, too:

This is great. It even calculates the weights for each of the categories.

Making predictions

Naturally, you also use queries for making predictions. When I made my predictions, I also compared them with the actual results at the same.

# Predict and compare results
with elec_predict as (
  select * from `linear_reg.elec_data`
  where key between cast('2018-06-01' as date) and cast('2018-06-16' as date)
  predict.key as date
 ,abs(predicted_power - actual_power) as error_power
from (
  select key,predicted_label as predicted_power
  from ml.predict(model `linear_reg.elec_model`, table elec_predict)
) as predict
join (
  select key,actual as actual_power from elec_predict
) as actual
on predict.key = actual.key
order by 1

BigQuery shows you the results, and you can also export them directly into Data Studio, which is quite convenient.

Being able to create regression models and make predictions with just queries is great, isn’t it? The data I used was only a few hundred rows, but with BigQuery ML, you could easily handle data with tens of thousands of rows.

BigQuery ML is another impressive service from Google. We’ll definitely be looking into ways to connect BigQuery ML to our service, MAGELLAN BLOCKS.