Creating models and making predictions in the Data Editor

Overview

You can use the Data Editor to create machine learning models and make predictions with data stored in the Data Editor.

The Data Editor currently supports the following types of models:

Model Explanation
Regression

This model predicts for numerical values.

Example use cases include using weather/day of the week data to train a model to predict for customer or sales numbers.

Classification

This model classifies data into categories.

Example use cases include predicting whether credit care usage is normal or fraudulent and predicting which users are likely to register for membership campaigns.

Clustering

This model groups data into a set number of clusters based on similarities.

It can be used for things like market analysis and computer vision.

There are two types of classification and regression models that you can create: ones made with the Model Generator and one made without the Model Generator.

Models created with the Model Generator can be used to solve more complicated problems, but as a result, training the model takes a longer time. Models created without the Model Generator can be trained more quickly, but are more suited for simple problems.

We’ll refer to models created with the Model Generator as “Model Generator versions” to distinguish them.

Some advantages to using the Data Editor machine learning features include:

  • All you need to do is create your data in the Data Editor. No difficult machine learning expertise required.
  • You can prepare your training data, create a model, and evaluate your model all within the Data Editor.

Regression model (Model Generator version) example

This section will use the electricity demand data from Demand forecasting with the regression model to demonstrate how to create a regression model (Model Generator version) from the Data Editor:

Preparing the data

We’ll start by preparing the data we’ll use to train and test the model, which is the same data used in the Demand forecasting with the regression model tutorial.

If you haven’t followed followed that tutorial and need to prepare the data, complete the steps in the Preparing the data as a CSV file and Splitting the data in the Data Editor sections.

Creating the model

Once your training data is ready, create the model by doing the following:

Opening the editor for the training data
  1. Click Electricity Demand_train.
Selecting to create a regression (model generator version) model
  1. Click Create Model.
  2. Click Regression.
  3. Enter a name for the model.
  4. Designate the Google Cloud Storage (GCS) folder that will store your models (only if it’s the first time you are creating a Model Generator).
  5. Click Create Folder (only if it’s the first time you are creating a Model Generator).
  6. Select Automatic Setup or Manual Setup:
    • Automatic Setup: Automatically sets the maximum time until timeout to 30 minutes and the maximum number of trials to 20.
    • Manual Setup: Allows you to configure the the maximum time until timeout, maximum number of trials, and training data settings before creating the model.
    • Manual setup example

      The training may stop itself before the maximum time before timeout if its results worsen drastically. The actual training time may differ from the estimated maximum training time.

  7. Click Create Model.

You can check on the progress of the model’s training by doing the following:

The training started message
  1. Click Close.
Returning to the home screen
  1. Click <.
Checking the model’s details
  1. Click Models.
  2. View the progress bar to see the training’s progress.
  3. Click the refresh icon to refresh the progress bar.
  4. Click the name to view the model’s details page.

The following is an example of a model’s details:

Checking the model’s details (2)
  1. In the Schema tab, you can view the model’s schema information (column names and types).
  2. Click < to return to the model list.
Viewing the menu for a model from the model list

You can click a model’s menu icon () (❶) to open a menu with the following options:

  • Change name
  • Stop training
  • Delete

Making predictions

Once the model is created, you can use it with the testing data to make predictions by doing the following:

You can also make predictions from a Flow Designer, which also allows you to set automated schedules and run batch predictions. For more information about making predictions from a Flow Designer, refer to Using Flow templates or the classification/regression tutorial pages (classification predictions/regression predictions).

Open the testing data by doing the following:

Opening the testing data
  1. Click Data.
  2. Click Electricity Demand_test.

This will open the testing data in the editor.

Selecting the model and making predictions
  1. Click Predict.
  2. Select Predict Electricity Demand Model (Model Generator).
  3. Click Predict.
Viewing the prediction results

The prediction results will appear after a moment.

The following chart explains the meaning of each column in the results:

Column Explanation
output

The predicted value.

key
high_temperature
low_temperature
sunlight_hours
average_humidity
daytime_minutes

The testing data.

Regression example

This section will use the electricity demand data from Demand forecasting with the regression model to demonstrate how to create a regression model from the Data Editor:

Preparing the data

We’ll start by preparing the data we’ll use to train and test the model, which is the same data used in the Demand forecasting with the regression model tutorial.

If you haven’t followed followed that tutorial and need to prepare the data, complete the steps in the Preparing the data as a CSV file and Splitting the data in the Data Editor sections.

Creating the model

Once your training data is ready, create the model by doing the following:

Selecting the training data
  1. Click Electricity Demand_train.
Creating a regression model
  1. Click Create Model.
  2. Click Regression.
  3. Set the Model Generator setting to OFF.
  4. Change the name to Predict Electricity Demand Model.
  5. Click Advanced Settings:
    You can configure which columns to use from your training data.
  6. Click Create Model.

Wait for a bit until BLOCKS finishes creating the model.

Checking the model

Once the model is created, you can check its contents by doing the following:

Returning to the home screen
  1. Click <.
Switching to the model list
  1. Click Models.
Switching to the model details page
  1. Click Predict Electricity Demand Model.
Checking the model’s schema info

You can check the model’s schema information (column names and types) from the model details screen.

Checking the training information

Click Trainings to view the following information about the model’s training:

  • Training iteration count: The iteration number of the training.
  • Training data loss: The calculated loss (Root Mean Square Error) for the training.
  • Evaluation data loss: The loss for the evaluation data.
  • Completion time (sec): The elapsed time for the training iteration.
  • Learning rate: The learning rate for the training iteration.
Checking the training data features information

Click Training data features (❶) to view information for each feature in the training data.

  • Column name: Shows the name of each training data column.
  • Min: Shows the minimum value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Max: Shows the maximum value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Mean: Shows the mean value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • STDDEV: Shows the standard deviation for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Category count: Shows the number of categories. Shows NULL if the column doesn’t contain categorical data.
  • NULL count: Shows the number of NULL entries.
Viewing the feature weights

Click Weights to view the weight of each column.

If the model’s results are not good enough, you can reevaluate your training data features and create a new models until you create one that meets your needs.

Making predictions

If there are no problems with the contents of your model, you can use it to make predictions by doing the following:

Returning to the model list
  1. Click <.
Switching to the data list
  1. Click Data.
Clicking the testing data
  1. Click Electricity Demand_test.
Switching to the predict tab
  1. Click Predict.
Selecting the model and making predictions
  1. Click Predict Electricity Demand Model.
  2. Click Predict.
Viewing the prediction results

The results will appear after a bit of time.

You can click Download to download a CSV file of the results.

  • If there are multiple pages of results, clicking download will only download the results from the pages you have loaded in the Data Editor.

    For example, if there were three pages of results and you clicked to view page two, and then you clicked Download, the download would include the results from pages one and two.

Classification (Model Generator version) example

This section will use the iris data from Categorize data with the classification model to demonstrate how to use a classification (Model Generator version) model.

Preparing the data

We’ll start by preparing the data we’ll use to train and test the model, which is the same data used in the Categorize data with the classification model tutorial.

If you haven’t followed followed that tutorial and need to prepare the data, complete the steps in the Preparing the data as a CSV file and Splitting the data in the Data Editor sections.

Creating the model

Once your training data is ready, create the model by doing the following:

Opening the training data in the editor
  1. Click Iris Classification Data_train.
モデル作成タブから数値回帰のモデルを作成する様子
  1. Click Create Model.
  2. Click Classification.
  3. Enter a name for the model.
  4. Designate the Google Cloud Storage (GCS) folder that will store your models (only if it’s the first time you are creating a Model Generator).
  5. Click Create Folder (only if it’s the first time you are creating a Model Generator).
  6. Select Automatic Setup or Manual Setup:
    • Automatic Setup: Automatically sets the maximum time until timeout to 30 minutes and the maximum number of trials to 20.
    • Manual Setup: Allows you to configure the the maximum time until timeout, maximum number of trials, and training data settings before creating the model.
    • Manual setup example

      The training may stop itself before the maximum time before timeout if its results worsen drastically. The actual training time may differ from the estimated maximum training time.

  7. Click Create Model.

You can check on the progress of the model’s training by doing the following:

The training started message
  1. Click Close.
Returning to the home screen
  1. Click <.
Checking the model’s details
  1. Click Models.
  2. View the progress bar to see the training’s progress.
  3. Click the refresh icon to refresh the progress bar.
  4. Click the name to view the model’s details page.

The following is an example of a model’s details:

Checking the model’s details (2)
  1. You can click Schema to view the model’s schema information.
  2. Click < to return to the model list.
Viewing the menu for a model from the model list

You can click a model’s menu icon () (❶) to open a menu with the following options:

  • Change name
  • Stop training
  • Delete

Making predictions

Once the model is created, you can use it with the testing data to make predictions by doing the following:

You can also make predictions from a Flow Designer, which also allows you to set automated schedules and run batch predictions. For more information about making predictions from a Flow Designer, refer to Using Flow templates or the classification/regression tutorial pages (classification predictions/regression predictions).

Open the testing data by doing the following:

Opening the testing data
  1. Click Data.
  2. Click Iris Classification Data_test.

This will open the testing data in the editor.

Selecting the model and making predictions
  1. Click Predict.
  2. Click Iris Classification Data_train_model.
  3. Click Predict.
Viewing the prediction results

The prediction results will appear after a moment.

The following chart explains the meaning of each column in the results:

Column Explanation
label

The predicted label.

For this example, the possible labels are Iris-virginica, Iris-setosa, or Iris-versicolor.

label_index

The index number for the list of values in the score column that corresponds with the value in the label column. A 0 corresponds with the first value in the score column, a 1 with the second, and so on.

For the example, the values in the score column are ordered as Iris-virginica, Iris-setosa, and Iris-versicolor. As such, 0 refers to Iris-virginica, 1 refers to Iris-setosa, and 2 refers to Iris-versicolor.

score

The confidence level for predicting each label.

For this example, the values are ordered as Iris-virginica, Iris-setosa, and Iris-versicolor.

key
sepal_length
sepal_width
petal_length
petal_width

The testing data.

Classification example

This section will use the iris data from Categorize data with the classification model to demonstrate how to use a classification model.

Preparing the data

We’ll start by preparing the data we’ll use to train and test the model, which is the same data used in the Categorize data with the classification model tutorial.

If you haven’t followed followed that tutorial and need to prepare the data, complete the steps in the Preparing the data as a CSV file and Splitting the data in the Data Editor sections.

Creating the model

Once your training data is ready, create the model by doing the following:

Opening the training data in the editor
  1. Click Iris Classification Data_train.
Creating a classification model
  1. Click Create Model.
  2. Click Classification.
  3. Set the Model Generator setting to OFF.
  4. Change the name to Iris Classification Model.
  5. Click Advanced Settings:
    You can configure which columns to use from your training data.
  6. Click Create Model.

Wait for a bit until BLOCKS finishes creating the model.

Checking the model

Once the model is created, you can check its contents by doing the following:

Returning to the home screen
  1. Click <.
Switching to the model list
  1. Click Models.
Switching to the model details page
  1. Click Iris Classification Model.
Checking the model’s schema info

You can check the model’s schema information (column names and types) from the model details screen.

Checking the training information

Click Trainings to view the following information about the model’s training:

  • Training iteration count: The iteration number of the training.
  • Training data loss: The calculated loss (Root Mean Square Error) for the training.
  • Evaluation data loss: The loss for the evaluation data.
  • Completion time (sec): The elapsed time for the training iteration.
  • Learning rate: The learning rate for the training iteration.
Checking the training data features information

Click Training data features (❶) to view information for each feature in the training data.

  • Column name: Shows the name of each training data column.
  • Min: Shows the minimum value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Max: Shows the maximum value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Mean: Shows the mean value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • STDDEV: Shows the standard deviation for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Category count: Shows the number of categories. Shows NULL if the column doesn’t contain categorical data.
  • NULL count: Shows the number of NULL entries.
Viewing the feature weights

Click Weights to view the weight of each column.

If the model’s results are not good enough, you can reevaluate your training data features and create a new models until you create one that meets your needs.

Making predictions

If there are no problems with the contents of your model, you can use it to make predictions by doing the following:

Returning to the home screen
  1. Click <.
Switching to the data list
  1. Click Data.
Opening the editor for the testing data
  1. Click Iris Classification Data_test.
Switching to the predict tab
  1. Click Predict.
Running a prediction
  1. Click Iris Classification Model.
  2. Click Predict.
Viewing the prediction results

The results will appear after a bit of time.

You can click Download to download a CSV file of the results.

  • If there are multiple pages of results, clicking download will only download the results from the pages you have loaded in the Data Editor.

    For example, if there were three pages of results and you clicked to view page two, and then you clicked Download, the download would include the results from pages one and two.

  • The result column for each label (predicted_label_probs) is not included.

Clustering example

This section will use the iris data from Categorize data with the classification model to demonstrate how to use a clustering model.

Preparing the data

We’ll start by preparing the data we’ll use to train and test the model, which is the same data used in the Categorize data with the classification model tutorial.

If you haven’t followed followed that tutorial and need to prepare the data, complete the steps in the Preparing the data as a CSV file and Splitting the data in the Data Editor sections.

Creating the model

Once your training data is ready, create the model by doing the following:

Opening the training data in the editor
  1. Click Iris Classification Data_train.
Creating a clustering model
  1. Click Create Model.
  2. Click Clustering.
  3. Change the name to Iris Clustering Model.
  4. Set the number of clusters to 3.
  5. Click Advanced Settings.
  6. Change the class column to Don’t Use.
  7. Click Create Model.

Wait for a bit until BLOCKS finishes creating the model.

Clustering is a type of unsupervised machine learning, which is why we don’t include the labels (the class column) when training the model.

Checking the model

Once the model is created, you can check its contents by doing the following:

Returning to the home screen
  1. Click <.
Switching to the model list
  1. Click Models.
Switching to the model details page
  1. Click Iris Clustering Model.
Checking the model’s schema info

In the Schema tab, you can view the model’s schema information (column names and types).

Checking the training information

Click Trainings to view the following information about the model’s training:

  • Training iteration count: The iteration number of the training.
  • Training data loss: The calculated loss (Root Mean Square Error) for the training.
  • Cluster centroid ID: IDs for each cluster centroid.
  • Cluster radius: The cluster’s radius.
  • Cluster size: The size of the cluster.
  • Completion time (sec): The amount of time for each training.
Checking the training data features information

Click Training data features (❶) to view the following information for each feature in the training data:

  • Column name: Shows the name of each training data column.
  • Min: Shows the minimum value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Max: Shows the maximum value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Mean: Shows the mean value for each training data column. Shows NULL if the column doesn’t contain numbers.
  • STDDEV: Shows the standard deviation for each training data column. Shows NULL if the column doesn’t contain numbers.
  • Category count: Shows the number of categories. Shows NULL if the column doesn’t contain categorical data.
  • NULL count: Shows the number of NULL entries.
Checking the cluster centroid information

Click Cluster centroids to view the following information:

  • centroid_id: IDs for the cluster centroids.
  • feature: The column names.
  • numerical_value: If the feature contains numbers, this shows a value for the centroid. This contains NULL for non-numerical types.
  • categorical_value.category: If the feature contains STRING-type data, this shows the string that is the centroid. This will be blank for numerical-type columns.
  • categorical_value.value:The level of confidence for the centroid in categorical_value.category shown as a value between 0–1.

Making predictions

If there are no problems with the contents of your model, you can use it to make predictions by doing the following:

Returning to the home screen
  1. Click <.
Switching to the data list
  1. Click Table.
Opening the editor for the testing data
  1. Click Iris Classification Data_test.
Switching to the predict tab
  1. Click Predict.
Running a prediction
  1. Click Iris Clustering Model.
  2. Click Predict.
Viewing the prediction results

The results will appear after a bit of time.

You can click Download to download a CSV file of the prediction results.

  • If there are multiple pages of results, clicking download will only download the results from the pages you have loaded in the Data Editor.

    For example, if there were three pages of results and you clicked to view page two, and then you clicked Download, the download would include the results from pages one and two.

  • The NEAREST_CENTROIDS_DISTANCE column is not included in the downloaded data.