Hello, everyone. I'm Satoshi, a consultant at Groovenauts.
I recently tried out the new BigQuery ML service and wanted to share my experience.
With BigQuery ML, you can perform linear regressions and logistic regressions straight from BigQuery, Google’s data warehouse service. Since it’s BigQuery, we can expect the part that creates the model to process in parallel and with fast speeds.
Up until now, you may have done linear regressions with R or Python, or with an Excel add-on (with a 16 variable limit). But now, as long as you can write a query in BigQuery, you don’t need to program anything. This should be convenient for data scientists.
To test out BigQuery ML, I tried making a linear regression model to predict electricity demand. I formatted my source data as follows:
・Month (1–12 entered as STRINGS)
・Day of the week (0–6 entered as STRINGS)
・Holiday (0–1 entered as STRINGS)
・Weather (clear, cloudy, rain, snow)
・Low temperature (FLOAT)
・High temperature (FLOAT)
・Daytime hours (INTEGER)
・Electricity usage (INTEGER)
Training data: 2016/04/01–2018/05/31
Prediction data: 2018/06/01–2018/06/16
Its contents were as follows:
Creating the model
Next, I’ll use the data to make the model. It feels a bit strange to train a model with a query! It seems that you need to extract the target variable with the item name, “label”.
# Create the model create model `linear_reg.elec_model` options(model_type='linear_reg') as select month ,weekday ,holiday ,weather ,min_temp ,max_temp ,rainfall ,daytime ,actual as label from `linear_reg.elec_data` where key between cast('2016-04-01' as date) and cast('2018-03-31' as date) ;
Once a model finishes, you can click Go to model in the bottom left to bring up the model’s details screen.
Next, I used queries to look at some information about the model:
# Training info select * from ml.training_info(model `linear_reg.elec_model`);
# Feature info select * from ml.feature_info(model `linear_reg.elec_model`);
# Weight info select * from ml.weights(model `linear_reg.elec_model`);
There is more info for the weights at the bottom, too:
This is great. It even calculates the weights for each of the categories.
Naturally, you also use queries for making predictions. When I made my predictions, I also compared them with the actual results at the same.
# Predict and compare results with elec_predict as ( select * from `linear_reg.elec_data` where key between cast('2018-06-01' as date) and cast('2018-06-16' as date) ) select predict.key as date ,predicted_power ,actual_power ,abs(predicted_power - actual_power) as error_power from ( select key,predicted_label as predicted_power from ml.predict(model `linear_reg.elec_model`, table elec_predict) ) as predict join ( select key,actual as actual_power from elec_predict ) as actual on predict.key = actual.key order by 1 ;
BigQuery shows you the results, and you can also export them directly into Data Studio, which is quite convenient.
Being able to create regression models and make predictions with just queries is great, isn’t it? The data I used was only a few hundred rows, but with BigQuery ML, you could easily handle data with tens of thousands of rows.
BigQuery ML is another impressive service from Google. We’ll definitely be looking into ways to connect BigQuery ML to our service, MAGELLAN BLOCKS.