ML Board Help

Introduction

This page explains each of the ML Board creation and training screens. Clicking any of the help links within those screens opens the corresponding section of this document.

For more detailed information on how to use the ML Board, refer to the following pages:

   ML Boards are currently in alpha. As such, models trained during the alpha may not be usable in the beta and final releases. Please retrain your models in subsequent releases.

What is an ML Board?

ML Boards are a MAGELLAN BLOCKS feature designed to make machine learning simple and accessible for everyone.

In basic terms, machine learning refers to computers using past experience (training) to make inferences like a human brain does.

For example, you may have heard people say things like, “I know it’s going to rain when my joints hurt.” They are likely basing this prediction on past experiences of their joints hurting when it rained.

In machine learning, we mimic this process in two steps: training and prediction.

  1. During the training step, past data is used to train the predictive model.

    Using the previous example, past data could refer to information about dates when joints hurt and whether or not it rained. However, since people judge joint pain very differently, we might be better off using related objective data such as atmospheric pressure and temperature. Choosing the right training data is very important, as it has a large influence on prediction accuracy.

  2. In the prediction step, the training results are used to make predictions for future data.

    Future data refers to data more recent than that used in the training step. In our example, this could be atmospheric pressure and temperature data from today that we would use to predict if it will rain tomorrow.

The ML Board takes care of the training step, while the prediction step is done using the Cloud ML predict BLOCK on a Bigdata Board.

The training step is referred to as simply “training” within the ML Board. The training data used during this step consists of a training set and a validation set. These are used to train and then optimize the predictive model, leading to more accurate results.

The training set and validation set are made by dividing the data appropriately.
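As a rough illustration of dividing data into the two sets, the following Python sketch shuffles a list of rows and holds back a fraction of them for validation. The function name split_rows, the 80/20 ratio, the fixed seed, and the sample rows are all assumptions made for this example. The ML Board does not prescribe a particular ratio here.

```python
import random


def split_rows(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows and split them into (training_set, validation_set).

    The 0.2 fraction and the seed are illustrative assumptions,
    not values required by the ML Board.
    """
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Always keep at least one row for validation.
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical rows: atmospheric pressure, temperature, rained (1) or not (0).
rows = [
    [1013.2, 18.5, 0],
    [1002.1, 16.0, 1],
    [1009.8, 21.3, 0],
    [998.4, 14.2, 1],
    [1015.0, 23.1, 0],
]
training_set, validation_set = split_rows(rows)
print(len(training_set), len(validation_set))
```

Shuffling before splitting helps keep both sets representative when the original data is sorted, for example by date.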

The ML Board uses Google Cloud Machine Learning (Cloud ML) to implement its machine learning functions.

Create ML Board

This screen contains settings for the new ML Board’s name and training type.

It’s best to choose ML Board names that are easy to remember, as this cannot be changed once the Board is created.

Available training types are as follows:

Type Explanation
Classification type Classifies values into multiple categories. For example, using data such as petal length and color to predict the type of flower.
Regression Expresses a continuous relationship between values (a statistical regression). Used to predict values such as the level of demand or the number of attendees.

GCP Service Account Settings

Upload a JSON format GCP service account key on this screen.

Since the ML Board works by creating an environment within the user’s GCP project, it needs permission to use it. This is made possible through a GCP service account key file.

For more information about GCP service accounts, refer to Creating a Google Cloud Platform service account key.

Once prepared, the GCP service account key can be uploaded in either of the following ways:

  • Drag and drop the file into the upload area.
  • Click “Select file” and choose the file to upload.
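Before uploading, it can be useful to confirm that the file really is a service account key in JSON format. The Python sketch below checks for a few fields that such key files contain. The function name check_service_account_key and the sample values are hypothetical, and the check is only a sanity test, not a verification that the key is valid or has the required permissions.

```python
import json

# Fields found in a GCP service account key file in JSON format.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(text):
    """Return a list of problems found in the key file contents (empty if none)."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


# Hypothetical, truncated key contents used only to exercise the check.
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...",
    "client_email": "ml-board@my-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample))
```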

Storage Settings

This screen contains settings for the Google Cloud Storage (GCS) bucket and directory that will contain the training results.

Select GCS bucket

Prepare a bucket for exclusive use by ML Boards, then select this bucket from the menu on this screen. Use the following settings when creating the bucket:

Option Value
Storage class Regional
Location us-central1
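Buckets can be created in the Google Cloud Console. As an alternative, the sketch below assembles a gsutil mb command line with the two settings from the table above. The helper name make_bucket_command and the bucket name are hypothetical, and the sketch only builds the command without running it.

```python
def make_bucket_command(bucket_name, storage_class="regional", location="us-central1"):
    """Build a `gsutil mb` command line with the recommended bucket settings."""
    return ["gsutil", "mb", "-c", storage_class, "-l", location, f"gs://{bucket_name}"]


# "my-ml-board-bucket" is a hypothetical bucket name.
command = make_bucket_command("my-ml-board-bucket")
print(" ".join(command))
```

The printed command can be pasted into a terminal where the Google Cloud SDK is installed and authenticated.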

GCS directory settings

One bucket can be used to support multiple ML Boards by creating a different directory for each Board. Directories do not need to be created in advance, as a directory with the name given in this setting is created automatically if it does not already exist.

Google Cloud Machine Learning Settings

This screen contains settings required for using the Google Cloud Machine Learning service. Each step is performed in the Google Cloud Console. The steps are listed below.

  1. Enable the Google Cloud Machine Learning and Google Cloud Resource Manager APIs
  2. Execute a command to initialize the Google Cloud Machine Learning service

Enable APIs

First, the Google Cloud Machine Learning and Google Cloud Resource Manager APIs must be enabled.

Do the following if there is a “Disabled” message next to the API’s name:

  1. Click on the “Google API Console” link.
  2. Click the “Enable” button at the top of the Google API Console screen.
  3. Once the “Enable” button changes to “Disable”, close the Google API Console and return to the ML Board setup screen.

After completing the above for each API, click the “Refresh API status” button and confirm that the “Disabled” message has changed to “Enabled”.

If the status does not change to “Enabled”, wait a few moments and click the “Refresh API status” button again. Depending on the circumstances, this process can take some time.

Video showing the APIs being enabled

Initialize the Google Cloud Machine Learning service

Next, we need to make the Google Cloud Machine Learning service usable for the ML Board. This only needs to be done once per GCP project.

  1. Click the “Refresh” button within the Initialize Google Cloud Machine Learning section.

  2. If the refresh button does not change, do the following:

    1. Click the Google Cloud Console link.

    2. Click the Activate Google Cloud Shell button at the top of the Google Cloud Console screen.

    3. If you see the following screen, click the launch Cloud Shell button.

    4. Enter gcloud beta ml init-project in the terminal area (the black portion) at the bottom of the Google Cloud Shell screen and press Return.

      Respond Y to the Do you want to continue (Y/N)? message that will appear.

    5. Click the X button in the upper-right of the Google Cloud Shell to close it.

    6. Close the Google Cloud Console and return to BLOCKS.

    7. Click the refresh button in the Initialize Google Cloud Machine Learning section again and confirm that it does not reappear.

      If the refresh button does reappear, try waiting a few moments and pressing it again. Depending on the circumstances, this process can take a bit of time.

Video showing the command being run in Cloud Shell

Training Data Settings

Settings for the training data are configured on this screen.

Training data should be prepared as CSV files with commas as field separators.

  • To be used for training, each row must contain at least one “input variable” and one “results value”. For the classification model, the results value is the “answer value”. For the regression model, it is the “actual result”.
  • Each row represents one set of input data together with its results value.
  • Order each row with the input data first, followed by the results value.
  • The results value must be of the numerical value type.
  • The training set and validation set must be formatted identically.
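The formatting rules above can be checked before uploading a file to GCS. The following Python sketch verifies that every row has the expected number of fields and that the last field (the results value) is numerical. The function name check_training_csv and the sample data are assumptions made for illustration. Note that the field count equals the number of CSV columns, so a numerical item with several dimensions contributes several fields.

```python
import csv
import io


def check_training_csv(text, n_columns):
    """Check rows of comma-separated training data.

    Each row must have n_columns fields, with the results value last
    and parseable as a number. Returns a list of (line_number, message).
    """
    problems = []
    for line_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != n_columns:
            problems.append((line_number, f"expected {n_columns} fields, found {len(row)}"))
            continue
        try:
            float(row[-1])
        except ValueError:
            problems.append((line_number, f"results value {row[-1]!r} is not numerical"))
    return problems


# Hypothetical data: pressure, temperature, then the results value (1 = rain).
sample = "1013.2,18.5,0\n1002.1,16.0,1\n998.4,14.2,rain\n1009.8,21.3\n"
for problem in check_training_csv(sample, n_columns=3):
    print(problem)
```

Running the same check on both the training set and the validation set is a simple way to confirm that the two files are formatted identically.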

Input variable settings

Enter information for each input variable. The data for each input variable is referred to (from left to right) as “Item 1”, “Item 2”, “Item 3”, etc.

    Do not enter information for the results value.

Click the “Add another item” button and enter a name and the type information for each input variable.

Item settings Explanation
Item name Enter a name for each item. Letters, numbers and the underscore symbol (_) may be used.
Type Designate each item’s data type. The four supported types are: numerical value, month, day, and string. Refer to the chart below for details about each type.

Supported types:

Type Explanation
Numerical value Integers or decimal numbers. When the numerical value type is selected, the number of dimensions can also be configured. If a single item contains multiple numerical values in an enumerated list, dimensions refers to the number of these numerical values. For example, if we have the following data: 98,1.3,0,"A" and we want to treat the 98,1.3 portion as one item, we would choose 2 for the dimensions setting.
Month Integers indicating the month. The range can be either 0–11 or 1–12.
Day Integers indicating the day of the week. The range is 0–6.
String String-type data. Choose either the “Keyword list” or the “Approximate number” option.

String type-specific settings Explanation
Keyword list Select this option when the strings have a clear pattern. Enter this pattern as a list separated by commas. For example, if the item contained strings for each blood type, you might enter “A, B, O, AB” for this setting.
Approximate number Select this option when the number of patterns in the strings is not clear. Enter the approximate number of patterns you expect to exist. When there is a clear pattern, you can also enter a number here instead of writing out a keyword list, if preferred.
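To make the item types more concrete, here is a minimal Python sketch that reads one row of fields according to a list of item definitions. It uses the 98,1.3,0,"A" example from the chart above, treating the first two fields as one numerical item with 2 dimensions. The dictionary keys, item names, and helper functions are hypothetical and do not reflect how the ML Board stores these settings internally.

```python
def _int_in_range(raw, low, high):
    """Parse raw as an integer and check that low <= value <= high."""
    value = int(raw)
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the range {low}-{high}")
    return value


def parse_row(fields, items):
    """Convert one row of string fields into typed values.

    `items` is a list of dicts with the hypothetical keys "name", "type",
    and, where relevant, "dimensions" or "keywords".
    """
    values = {}
    position = 0
    for item in items:
        # A numerical item with N dimensions spans N consecutive fields.
        width = item.get("dimensions", 1) if item["type"] == "numerical" else 1
        chunk = fields[position:position + width]
        if len(chunk) < width:
            raise ValueError(f"not enough fields for item {item['name']!r}")
        if item["type"] == "numerical":
            values[item["name"]] = [float(field) for field in chunk]
        elif item["type"] == "month":
            # 0-11 and 1-12 are both allowed, so a single value can only be
            # checked against the combined range 0-12.
            values[item["name"]] = _int_in_range(chunk[0], 0, 12)
        elif item["type"] == "day":
            values[item["name"]] = _int_in_range(chunk[0], 0, 6)
        elif item["type"] == "string":
            keywords = item.get("keywords")
            if keywords is not None and chunk[0] not in keywords:
                raise ValueError(f"{chunk[0]!r} is not in the keyword list")
            values[item["name"]] = chunk[0]
        else:
            raise ValueError(f"unsupported type: {item['type']!r}")
        position += width
    return values


# Item names are hypothetical; the row is the example from the chart above.
items = [
    {"name": "reading", "type": "numerical", "dimensions": 2},
    {"name": "weekday", "type": "day"},
    {"name": "blood_type", "type": "string", "keywords": ["A", "B", "O", "AB"]},
]
parsed = parse_row(["98", "1.3", "0", "A"], items)
print(parsed)
```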

Number of classes

When creating a classification model, enter the number of classes in the results values here.
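If you are unsure how many classes your data contains, one way to find out is to count the distinct results values in the training set, as in the Python sketch below. This assumes that every class appears at least once in the data. The function name count_classes and the sample rows are hypothetical.

```python
import csv
import io


def count_classes(text):
    """Count the distinct results values (last field of each non-empty row)."""
    labels = {float(row[-1]) for row in csv.reader(io.StringIO(text)) if row}
    return len(labels)


# Hypothetical rows: two input values followed by the class (0, 1, or 2).
sample = "5.1,1.4,0\n7.0,4.7,1\n6.3,6.0,2\n5.0,1.5,0\n"
print(count_classes(sample))  # prints 3
```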

Training

The following actions can be performed on this screen:

  • Start a training
  • See a list of trainings
  • Confirm a training’s details
  • Confirm the ML Board’s settings
  • Delete the ML Board

Start a training

Click the “Begin training” button to bring up the “Begin training” screen.

Here you can configure the required information for a new training and start it.

Training list

A list will be displayed when there is more than one training. The following information is contained in the training list:

Information Explanation
Training name Configured in the Begin training settings.
Training start time Shows the date and time when the training began.
Training end time Shows the date and time when the training finished.
Status Shows the training’s status. Possible statuses are: Preparing, Training, Succeeded, and Failed.
Calculation error Shows the results of evaluating the training. For the classification model, this refers to its accuracy in selecting the correct class. For the regression model, it refers to its generalization error in predicting values.
Details Displays a screen where the training’s details can be confirmed.
Apply Shows if a training’s results can be used. If a training has an apply button in this column, its results can be selected to be used for making predictions.

Training statuses are not updated automatically. Click the button in the upper-right to refresh the list.
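To give a feel for the two kinds of figures described in the Calculation error row, the Python sketch below computes an accuracy for class predictions and a root mean squared error for numerical predictions. This document does not state which error measure the ML Board uses for regression, so the root mean squared error is an assumption chosen as one common option, and all names and values are illustrative.

```python
import math


def classification_accuracy(predicted, actual):
    """Fraction of predictions that match the answer value."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def root_mean_squared_error(predicted, actual):
    """Square root of the mean squared difference between prediction and result."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical predictions compared against a validation set.
print(classification_accuracy([0, 1, 1, 2], [0, 1, 2, 2]))  # prints 0.75
print(root_mean_squared_error([13.0, 14.0, 5.0, 7.0], [10.0, 10.0, 5.0, 7.0]))  # prints 2.5
```

For accuracy, higher is better. For an error measure, lower is better.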

Confirm a training’s details

You can confirm a training’s details by clicking its details button.

Item Explanation
Training name The name configured in the Begin training screen’s settings.
Training start time The date and time when the training began.
Training end time The date and time when the training finished.
Status Shows the training’s status. Possible statuses are: Preparing, Training, Succeeded, and Failed.
Calculation error Shows the results of evaluating the training. For the classification model, this refers to its accuracy in selecting the correct class. For the regression model, it refers to its generalization error in predicting values.
Training explanation Shows the explanation input in the Begin training screen’s settings. Displays a hyphen (-) if no explanation was given.
Setting details Shows details of what was configured in the training data settings.

Confirm the ML Board’s settings

Click the “Setting info” button to bring up information about the ML Board’s settings.

Item Explanation
Board name The Board name configured on the Create ML Board screen.
Type The type selected on the Create ML Board screen.
GCP service account The ID of the GCP service account associated with the ML Board.
Google Cloud Machine Learning The model name of the training results. This is used when making predictions.
Training data settings The information configured on the Training Data Settings screen.

Delete Board

Delete the ML Board by clicking the “Delete Board” button at the bottom of the screen.

Begin training

Configure settings necessary to create and start a new training.

Setting Explanation
Training name Assign a name to the new training.
Training set URL Designate the GCS URL for the training set (Example: gs://bucketname/filename.csv). The training set must be stored in GCS.
Validation set URL Designate the GCS URL for the validation set (Example: gs://bucketname/filename.csv). The validation set must be stored in GCS.
Max. time until timeout During a training, the ML Board runs several trials and uses trial and error to arrive at the best training results. This setting configures the maximum amount of time that the ML Board will spend on one trial. Set this to 0 for no limit.
Max. number of trials Set the maximum number of trials that the ML Board will run. The value must be 1 or greater.
Training explanation Write an explanation for this training. (optional)
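The constraints in the table above can be summarized as a small set of checks. The Python sketch below is one way to express them. The dictionary keys and function names are hypothetical, and the unit of the timeout value is not stated here, so the sketch only checks that it is not negative.

```python
def is_gcs_file_url(url):
    """Return True if url looks like gs://bucketname/filename."""
    prefix = "gs://"
    if not url.startswith(prefix):
        return False
    bucket, _, path = url[len(prefix):].partition("/")
    return bool(bucket) and bool(path)


def check_training_settings(settings):
    """Return a list of problems with the Begin training settings (empty if none)."""
    problems = []
    if not settings.get("training_name"):
        problems.append("training name is required")
    for key in ("training_set_url", "validation_set_url"):
        if not is_gcs_file_url(settings.get(key, "")):
            problems.append(f"{key} must look like gs://bucketname/filename.csv")
    if settings.get("max_trial_time", 0) < 0:
        problems.append("max_trial_time must be 0 (no limit) or greater")
    if settings.get("max_trials", 0) < 1:
        problems.append("max_trials must be 1 or greater")
    return problems


# Hypothetical settings for a new training.
settings = {
    "training_name": "rain_forecast_01",
    "training_set_url": "gs://bucketname/training.csv",
    "validation_set_url": "gs://bucketname/validation.csv",
    "max_trial_time": 0,
    "max_trials": 10,
}
print(check_training_settings(settings))
```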

GCP service charges

The ML Board creates an environment in the user’s GCP project that utilizes various GCP services.

As such, GCP service charges will apply separately from MAGELLAN BLOCKS fees. Applicable charges will vary depending on the service. For details, refer to the pricing page for each of the services used by the ML Board.