Using the Data Editor to train ML models and make predictions

Overview

This feature is currently in beta and may change without notice in the official release. As a beta release, there is also the possibility that some functions may not work as intended. We appreciate your feedback regarding bugs or ways to improve MAGELLAN BLOCKS.

You can use data in the Data Editor to create machine learning models and make predictions all from within the Data Editor. These machine learning models are different from the ones you can train with a Model Generator.

The Data Editor currently supports the following models:

Model Explanation
Linear regression (regression)

Use this model to predict a numerical value. For example, you could train a model with input features like weather and day of the week to predict sales figures or event attendance.

Logistic regression (classification)

Use this model to predict which category the data belongs to. For example, you could train a model to predict whether or not individuals are likely to join a membership campaign.
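The difference between the two model types can be sketched in a few lines of Python. The weights and bias below are invented purely for illustration; a real model learns them from your training data:

```python
import math

# Hypothetical weights and bias, invented for illustration only.
w = [0.8, -0.3]
b = 0.1

def linear_predict(x):
    # Regression: the weighted sum itself is the predicted number.
    return b + sum(wi * xi for wi, xi in zip(w, x))

def logistic_predict(x):
    # Classification: the same weighted sum is passed through a sigmoid
    # to get a probability, which is then thresholded into a category.
    p = 1.0 / (1.0 + math.exp(-linear_predict(x)))
    return ("yes", p) if p >= 0.5 else ("no", p)
```

In short, both model types compute a weighted sum of the input features; regression reports that number directly, while classification converts it into a probability of belonging to a category.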

The advantages of the Data Editor machine learning features include:

  • All you need to do to use machine learning is create your data in the Data Editor. No machine learning expertise is required.
  • You can prepare your training data, create a model, and evaluate your model all within the Data Editor.

Example linear regression (regression) model:

We’ll use the same data as in the Model Generator How-To: Regression page, which builds a model for predicting electricity demand.

If you want to follow along with this guide with the same data, you can follow the steps in the Model Generator How-To: Regression guide to create the data.

Preparing the data

We’ll start by preparing the data to train our model.

We’re going to use the same electricity demand data as from the Model Generator How-To: Regression page. That page splits the training data into a training set and a validation set, but we’ll just use the data directly from BigQuery before that split.

First, import the data into the Data Editor.

Importing the data into the Data Editor

Configure your import settings as follows:

  • Import source: Select Google BigQuery.
  • GCP service account: Select the GCP service account that can access your table.
  • Dataset ID: Select the dataset that contains the table with your training data. In our case, we will use the ID blocks_ml_juyo.
  • Table ID: Select the BigQuery table that contains your training data. In our case, we will use the ID juyo_all_data_2016_daytime.
  • Name: Enter a name that will help you identify this data in the Data Editor. We will use Electric Demand Training Data.

We’ve also prepared the following CSV data to use for making predictions.

max_temp,min_temp,hours_of_sunshine,humidity_avg,daytime_time
13.8,2,8.9,59,588
13.3,3.8,6.5,65,588
13.7,3.5,8.2,54,589
14,3.6,8.7,48,590

Import this data into the Data Editor, too.
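Before importing, it can help to sanity-check the prediction CSV from a script: the header should match the input features the model will be trained on, and every value should parse as a number. A minimal check using only the Python standard library (the embedded data mirrors the CSV above):

```python
import csv
import io

csv_data = """\
max_temp,min_temp,hours_of_sunshine,humidity_avg,daytime_time
13.8,2,8.9,59,588
13.3,3.8,6.5,65,588
13.7,3.5,8.2,54,589
14,3.6,8.7,48,590
"""

expected_columns = ["max_temp", "min_temp", "hours_of_sunshine",
                    "humidity_avg", "daytime_time"]

rows = list(csv.DictReader(io.StringIO(csv_data)))

# Header must match the expected feature columns, and every value
# should be numeric.
assert list(rows[0]) == expected_columns
assert all(float(v) >= 0 for row in rows for v in row.values())
print(f"{len(rows)} prediction rows OK")
```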

Importing the prediction data into the Data Editor

Configure the import settings as follows:

  • Import source: Select Upload.
  • GCP service account: Select the same GCP service account you used when importing the training data.
  • GCS URL: Select the Google Cloud Storage (GCS) location that will store the prediction data CSV file. (Uploading locally-stored files to the Data Editor works by storing them in GCS.)
  • File: Select the file you want to upload. We’ll use a file called elec_demand_predict.csv in this example.
  • Schema: Select Automatically detect schema.
  • CSV delimiter character: Select comma.
  • Number of skipped rows: Since the CSV data contains one header row, set the number of rows to skip to 1.
  • Dataset ID: Select the BigQuery dataset for the prediction data. We’ll use the ID blocks_ml_juyo (same as the training data).
  • Table ID: Enter an ID for the BigQuery table for the prediction data. We’ll use the ID elec_demand_predict.
  • Name: Enter a name to identify this data in the Data Editor. We’ll use Electric Demand Prediction Data.

Creating the model

Once the data is imported, we’ll create the model.

Creating a model in the MAGELLAN BLOCKS Data Editor: Step 1

Find your training data in the data list (Electric Demand Training Data) and click the menu icon on its right. Click Create Query Model.

Creating a model in the MAGELLAN BLOCKS Data Editor: Step 2

Configure settings for the model and click Create. We’ll configure our example model as follows:

  • Name: Enter a name to identify this model in the Data Editor. We’ll use Electric Demand Model.
  • Model ID: Enter an ID for your model. We’ll use elec_demand_model.
  • Model type: Select the type of model. We’ll leave this as the default setting of Linear Regression (Regression).
  • Training data settings: Select whether or not to use each column in the data as an input feature for training the model. You can also change the data type per column. We’ll change the column date to Unuse and leave the rest at their default settings (used_sum as the Target and the rest as Use).

Click Create. It will take a little bit of time to create the model.

Checking the model

Once the Data Editor finishes creating the model, we can view its contents and evaluate it.

Evaluating a Data Editor model: Step 1

Click Models to open the model list.

Evaluating a Data Editor model: Step 2

Click the model you want to evaluate from the model list. We’re using the model called Electric Demand Model.

Evaluating a Data Editor model: Step 3

On the first page of the model details screen, you can check the schema settings for the model (column names and data types).

Evaluating a Data Editor model: Step 4

Click Trainings to view information on the model’s trainings.

  • Training iteration count: Shows each training iteration.
  • Training data loss: The calculated loss (Root Mean Square Error) for the training.
  • Evaluation data loss: The loss for the evaluation data.
  • Completion time (sec): The elapsed time for the training iteration.
  • Learning rate: The learning rate for the training iteration.
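The Trainings view mirrors a standard iterative optimization loop: each iteration adjusts the model's weights and records the resulting loss. The actual training procedure is internal to the service; as a rough illustration of the columns above, here is plain gradient descent on a toy one-feature linear model:

```python
# Toy data where y = 2x, so the ideal weight is 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w, learning_rate = 0.0, 0.05
losses = []
for iteration in range(1, 6):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad
    # The per-iteration training loss; RMSE would be loss ** 0.5.
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    losses.append(loss)
    print(iteration, round(loss, 4), learning_rate)
```

Each iteration moves the weight closer to the ideal value, so the loss shrinks; a healthy Trainings table shows the same downward trend.
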
Evaluating a Data Editor model: Step 5

Click Training data features to view information for each feature in the training data.

  • Column name: Shows the name of each training data column.
  • Min: Shows the minimum value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Max: Shows the maximum value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Mean: Shows the mean value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • STDDEV: Shows the standard deviation for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Category count: Shows the number of categories. Shows NULL if the column doesn’t contain categorical data.
  • NULL count: Shows the number of NULL entries.
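These statistics are straightforward to reproduce yourself. A sketch for a numeric column using the Python standard library (the column values are invented, and None stands in for a NULL entry):

```python
import statistics

# A hypothetical numeric column; None stands in for a NULL entry.
column = [13.8, 13.3, None, 13.7, 14.0]
numbers = [v for v in column if v is not None]

stats = {
    "Min": min(numbers),
    "Max": max(numbers),
    "Mean": statistics.mean(numbers),
    # Population standard deviation; the Data Editor may use the
    # sample variant instead.
    "STDDEV": statistics.pstdev(numbers),
    "NULL count": column.count(None),
}
```
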
Evaluating a Data Editor model: Step 6

Click Weights to view the weights for each column in the training data.
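The weights are what turn a row of input features into a prediction: for a linear regression model, the prediction is essentially a weighted sum of the features plus a bias term. A sketch with entirely made-up weights (note that real implementations often standardize features before training, so learned weights may not combine with raw values this directly):

```python
# Entirely hypothetical weights and bias; the Weights view shows the
# values your model actually learned.
weights = {"max_temp": 35.0, "min_temp": -12.0, "hours_of_sunshine": 20.0,
           "humidity_avg": -3.0, "daytime_time": 4.5}
bias = 900.0

def predict(row):
    # Linear regression prediction: bias plus the weighted sum of features.
    return bias + sum(weights[name] * value for name, value in row.items())

demand = predict({"max_temp": 13.8, "min_temp": 2.0,
                  "hours_of_sunshine": 8.9, "humidity_avg": 59.0,
                  "daytime_time": 588.0})
```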

If the results for the model aren’t good enough for your needs, review the input features (columns) in your training data and recreate your model. Repeat this process until you get a sufficiently accurate model.

Making predictions

If there are no problems with the model, we can use it to make predictions.

Making predictions with the Data Editor: Step 1

From the table list, click your prediction data (Electric Demand Prediction Data).

Making predictions with the Data Editor: Step 2

Click Predict with Query Model.

Making predictions with the Data Editor: Step 3

Select the model you’ll use for the prediction (Electric Demand Model) then click Predict.

Making predictions with the Data Editor: Step 4

The results will be shown in the Data Editor once the prediction finishes.

Click Download next to Save to download a CSV file of the prediction results.

  • If the prediction results span multiple pages, clicking Download will download data for the pages up to and including the page that is currently displayed.

    For example, if there were three pages of results and you were looking at page two, clicking Download would download the data from pages one and two.
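The cumulative download rule can be expressed as a small sketch (this is an illustration of the behavior described above, not the actual implementation):

```python
def downloadable_rows(pages, current_page):
    """Rows from page 1 up to and including the currently displayed page."""
    rows = []
    for page in pages[:current_page]:
        rows.extend(page)
    return rows

# Three pages of results; viewing page two downloads pages one and two.
pages = [["row1", "row2"], ["row3", "row4"], ["row5"]]
downloaded = downloadable_rows(pages, 2)
```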

That’s all it takes to train models and make predictions with data in the Data Editor — no technical expertise required.

Example logistic regression (classification) model

In this section, we’ll use the same iris data as the Model Generator How-To: Classification to introduce how to create and use a logistic regression (classification) model in the Data Editor.

If you want to follow along with the steps shown below to actually train a model, follow the steps shown in the Model Generator How-To: Classification to create the training data.

Preparing the data

First, we’ll prepare the data we’ll use for training the model.

We’ll follow the steps shown in the Model Generator How-To: Classification page to prepare the iris training data in the Data Editor.

We also need to prepare data to use for predictions. For this, we’ll use the following CSV data:

sepal_length,sepal_width,petal_length,petal_width
5.9,3.0,4.2,1.5
6.9,3.1,5.4,2.1
5.1,3.3,1.7,0.5

We’ll import this CSV file into the Data Editor.

Importing the iris classification prediction data

Configure the import settings as shown below:

  • Import source: Select Upload.
  • GCP service account: Select the GCP service account used to create the iris data.
  • GCS URL: Select the Google Cloud Storage (GCS) location that will store the iris prediction data. (Locally-stored files imported into the Data Editor are stored in GCS.)
  • File: Select the prediction data file. We’re using the file iris_predict.csv, which we created from the CSV data shown earlier.
  • Schema: Select Automatically detect schema.
  • CSV delimiter character: Select comma.
  • Number of skipped rows: Since there is one header row in the CSV data, set the number of rows to skip to 1.
  • Dataset ID: Designate the BigQuery dataset that will store the prediction data. We’ll use doc_samples.
  • Table ID: Designate the BigQuery table that will store the prediction data. We’ll use iris_predict.
  • Name: Enter a name to identify your prediction data in the Data Editor. We’ll use Iris Classification Prediction Data.

Creating the model

Once our data is ready, we’ll create our model.

Creating a classification model: Step 1

Find your training data in the table list (Iris Classification (Cleansed)) and click the menu icon to its right. Click Create Query Model.

Creating a classification model: Step 2

Configure each item and click Create.

We’ll configure our settings as shown below:

  • Name: Enter a name to identify your model in the Data Editor. We’ll use Iris Classification Model.
  • Model ID: Enter an ID for your model. We’ll use iris_model.
  • Model type: Select the type of model. We’ll use Logistic Regression (Classification).
  • Training data settings: Select whether or not to use each column in the data as an input feature for training the model. You can also change the data type per column. We’ll leave each item at its default value.

Click Create. It will take a bit of time for the Data Editor to create the model.
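Conceptually, a classification model of this kind scores each category and picks the most probable one. A toy multiclass sketch with invented per-class weights (a real model learns these from the iris training data, and its internals may differ):

```python
import math

classes = ["setosa", "versicolor", "virginica"]
# Invented per-class weights over (sepal_length, sepal_width,
# petal_length, petal_width); for illustration only.
weights = {
    "setosa":     [0.1, 0.8, -1.2, -1.0],
    "versicolor": [0.1, -0.3, 0.5, -0.4],
    "virginica":  [-0.1, -0.5, 0.9, 1.1],
}

def classify(features):
    # Score each class with a weighted sum, normalize the scores into
    # probabilities with a softmax, and pick the most probable class.
    scores = {c: sum(w * x for w, x in zip(weights[c], features))
              for c in classes}
    total = sum(math.exp(s) for s in scores.values())
    probs = {c: math.exp(s) / total for c, s in scores.items()}
    return max(probs, key=probs.get), probs

label, probs = classify([5.1, 3.3, 1.7, 0.5])
```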

Checking the model

Once the Data Editor finishes creating the model, we can view its contents and evaluate it.

Evaluating the classification model: Step 1

Click Models to open the model list.

Evaluating the classification model: Step 2

Click the model you want to evaluate from the model list. We’re using the model Iris Classification Model.

Evaluating the classification model: Step 3

On the first page of the model details screen, you can check the schema settings for the model (column names and data types).

Evaluating the classification model: Step 4

Click Trainings to view information on the model’s trainings.

  • Training iteration count: Shows each training iteration.
  • Training data loss: The calculated loss (Root Mean Square Error) for the training.
  • Evaluation data loss: The loss for the evaluation data.
  • Completion time (sec): The elapsed time for the training iteration.
  • Learning rate: The learning rate for the training iteration.
Evaluating the classification model: Step 5

Click Training data features to view information for each feature in the training data.

  • Column name: Shows the name of each training data column.
  • Min: Shows the minimum value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Max: Shows the maximum value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Mean: Shows the mean value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • STDDEV: Shows the standard deviation for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Category count: Shows the number of categories. Shows NULL if the column doesn’t contain categorical data.
  • NULL count: Shows the number of NULL entries.
Evaluating the classification model: Step 6

Click Weights to view the weights for each column in the training data.

If the results for the model aren’t good enough for your needs, review the input features (columns) in your training data and recreate your model. Repeat this process until you get a sufficiently accurate model.

Making predictions

If there are no problems with the model, we can use it to make predictions.

Making predictions with the classification model: Step 1

From the table list, click your prediction data (Iris Classification Prediction Data).

Making predictions with the classification model: Step 2

Click Predict with Query Model.

Making predictions with the classification model: Step 3

Select the model you’ll use for the prediction (Iris Classification Model) then click Predict.

Making predictions with the classification model: Step 4

The results will be shown in the Data Editor once the prediction finishes.

Click Download next to Save to download a CSV file of the prediction results.

  • If the prediction results span multiple pages, clicking Download will download data for the pages up to and including the page that is currently displayed.

    For example, if there were three pages of results and you were looking at page two, clicking Download would download the data from pages one and two.

With the Data Editor, you can quickly and easily train and use models without the need for technical expertise.