ML Board Help

Introduction

This page explains the various screens for creating ML Boards and trainings. Each of those screens contains a help link that leads to a section of this page.

For more detailed instructions on how to use an ML Board, refer to the following pages:

What is an ML Board?

ML Boards are a MAGELLAN BLOCKS feature designed to make Machine Learning simple and accessible for everyone.

There are two basic steps to Machine Learning with BLOCKS: training and prediction.

  1. During the training step, past data is used to train the predictive model.

  2. In the prediction step, the trained model is used to make predictions for new data.

ML Boards are where the training step takes place. For predictions, you can use the ML Board prediction (online) BLOCK on a Big Data Board.

The training step is referred to as simply Training within the ML Board. Each training uses a training dataset and a validation dataset to get optimized learning results (a trained model).

The training and validation datasets are created by splitting the past data prepared for the training.
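
As a rough illustration of that split, the sketch below shuffles rows of past data and divides them into a training set and a validation set. The function name, the 20% validation share, and the fixed seed are assumptions made for this example; this page does not prescribe a ratio or a method for CSV data.

```python
import random


def split_rows(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows and split them into (training, validation) lists.

    The 0.2 default is an assumption for illustration only.
    """
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    # Keep at least one row on each side of the split.
    n_validation = min(len(shuffled) - 1, max(1, n_validation))
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical past data: one input variable followed by a results value.
past_data = [[i, i * 2] for i in range(10)]
training_set, validation_set = split_rows(past_data)
print(len(training_set), len(validation_set))  # 8 2
```

Shuffling before splitting avoids putting only the oldest or newest rows into one set, which may or may not be what you want for time-ordered data.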

The ML Board uses the Google Cloud Machine Learning Engine (Cloud ML Engine) to implement its Machine Learning functions.

GCP service account settings help

This section applies to Self-Service Plan users only.

On this screen, you will select a GCP service account and enable all APIs required to use Google Cloud Machine Learning.

GCP service account settings

Select a GCP service account

Since the ML Board creates and uses an environment within your GCP project, it needs to be given access permission. This is done with a GCP service account file.

Enable APIs

Do the following if there are any APIs that do not have a checkmark next to their Check button:

  1. Click the icon for the API that doesn’t have a checkmark.
  2. Click Enable at the top of the Google API Console screen.
  3. Once it changes to Disable, close the Google API Console and return to BLOCKS.

Once finished with the above, click Check and confirm that a checkmark appears for the API.

The checkmark may take a moment to appear. If it does not appear, click Check again and wait a little longer.

If you see an error icon instead of a checkmark, the issue may be one of the following:

  • The API is not enabled:
    Click the icon next to the API and confirm whether the API is enabled. If not, click Enable.
  • Your GCP service account role is not set to Editor:
    Open the menu from the upper-left of the GCP console and select IAM & admin. From the IAM menu, confirm that your role is set to Editor. If not, change it to Editor.
  • Billing is not enabled for your GCP project:
    Open the menu from the upper-left of the GCP console and select Billing. Enable billing for your project, if you have not already done so.

Storage Settings

This section applies to Self-Service Plan users only.

In this section, you will configure settings for the Google Cloud Storage (GCS) bucket and directory that will contain the training results.

Storage settings

Select GCS bucket

Prepare a bucket for the ML Board to use, then select this bucket from the list. For best results, use the following settings when creating a bucket for an ML Board:

Option | Value
Default storage class | Regional
Location | us-central1

GCS directory settings

You can have multiple ML Boards within one bucket by using a different directory for each Board. You don’t need to create directories in advance, as a new one will be made automatically with the name you enter in this setting.

Training data settings

You will configure settings for the training data on this screen.

Training data settings

Prepare your training data as CSV files (UTF-8 with no BOM) with commas as the field delimiter.

  • Training data should consist of sets of at least one input variable and a results value. For the classification model, the results value is the “answer value”. For the regression model, it is the “actual result”.
  • Write each set of input variables and its results value as one row.
  • Order each row with the input variables first, followed by the results value.
  • The results value must be numerical.
  • The training set and validation set must be formatted identically.
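
The rules above can be sketched in code. The example below is a minimal illustration, not part of BLOCKS: the function name and the sample rows are hypothetical. It checks that each row ends with a numerical results value and serializes the rows as comma-delimited UTF-8 without a BOM.

```python
import csv
import io


def rows_to_csv_bytes(rows):
    """Serialize rows (input variables first, results value last) to CSV bytes."""
    for row in rows:
        if len(row) < 2:
            raise ValueError("each row needs at least one input variable and a results value")
        result = row[-1]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(result, bool) or not isinstance(result, (int, float)):
            raise ValueError(f"results value must be numerical: {result!r}")
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    # Plain "utf-8" (unlike "utf-8-sig") does not write a byte order mark.
    return buffer.getvalue().encode("utf-8")


# Hypothetical rows: three input variables followed by the results value.
rows = [
    [98, 1.3, "A", 0],
    [75, 0.4, "B", 1],
]
data = rows_to_csv_bytes(rows)
print(data.decode("utf-8"))
```

Using the same function for both the training set and the validation set is one simple way to keep the two files formatted identically.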

Input variable settings

Enter information for each input variable. The data for each input variable is referred to (from left to right) as “Item 1”, “Item 2”, “Item 3”, and so on.

Do not enter information for the results value here.

Click Add another item and enter a name and the type for each input variable.

Setting | Explanation
Item name | Enter a name for each item using only letters, numbers, or underscores (_).
Type | Designate each item’s type. The five supported types are: numerical values, months, days, strings (enumerated), and sequences. Refer to the chart below for details about each type.

Supported types:

Type | Explanation
Numerical value | Integers or decimal numbers. You can also configure the number of dimensions when this type is selected. Dimensions refer to the number of numerical values within an item. For example, if you had the following data: 98,1.3,0,"A" and wanted to treat the 98,1.3 portion as one item, you would set the dimensions setting to 2.
Month | Integers indicating the month. The range can be either 0–11 or 1–12.
Day | Integers indicating the day of the week. The range is 0–6.
Strings (enumerated)

String data. You can further select between a Keyword list or Approximate number.

Strings (enumerated) type settings | Explanation
Keyword list | Select this option when there is a clear pattern in the strings in the item. Enter this pattern as a list separated by commas.
Approximate number | Select this option when you don’t know the exact pattern of the strings in the item. Enter the approximate number of different strings you expect to exist.
Sequence

Select this type for a list of numerical values whose order has significance.

Only one sequence item can be set per set of training data.

Assume that we want to train a model to make predictions based on the previous week’s climate data (minimum temperature, maximum temperature, precipitation) as shown in the image below.

Sequence type overview

In this case, we use the sequence type because the ordering of the climate data sets (7 days ago, 6 days ago… 1 day ago) has significance.

Next, we set the “Number of elements (length)” and the “Dimensions per element (channels)” for the sequence.

The “Number of elements” is the number of items that have significance in their order. This would be set to 7 for our example, since there are 7 days of climate data in our sequence.

Number of elements example

“Dimensions per element” refers to the number of types of data within each element. In our example, we have 3 types of climate data in each element.

Dimensions per element example

This example uses a chronological sequence, but that is not a requirement. Any sequence of numbers for which the order has significance can be used.

The data used in this example would be written in a CSV file as shown in the image below.

CSV example
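
To make the relationship between the two settings concrete, the sketch below flattens a week of climate data into a single CSV row. The element-by-element ordering (all values for the oldest day, then the next day, and so on, followed by the results value) is an assumption for illustration, as are the function name and the sample values. Check the ordering against the CSV example above before relying on it.

```python
def flatten_sequence(elements, channels):
    """Flatten a sequence of elements into one flat list of values."""
    flat = []
    for element in elements:
        if len(element) != channels:
            raise ValueError("every element must have exactly `channels` values")
        flat.extend(element)
    return flat


# Hypothetical values: (minimum temperature, maximum temperature, precipitation)
# for each of the previous 7 days, oldest day first.
week = [
    (2.0, 9.5, 0.0),
    (3.1, 10.2, 1.5),
    (1.4, 8.8, 0.0),
    (0.9, 7.6, 4.0),
    (2.2, 9.1, 0.0),
    (3.5, 11.0, 0.0),
    (4.0, 12.3, 2.5),
]
results_value = 10.8  # hypothetical value to predict

# 7 elements x 3 dimensions per element = 21 input values, plus the results value.
row = flatten_sequence(week, channels=3) + [results_value]
print(len(row))  # 22
```

Here the “Number of elements (length)” is `len(week)` and the “Dimensions per element (channels)” is the `channels` argument.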

Number of classes

When creating a classification model, enter the number of classes in the results values here.

Output dimensions

When using the regression model, set the "Output dimensions" to the number of dimensions (the number of values) within the "results value".

For example, if the "results value" was related to global position and contained values for latitude, longitude, and elevation, the "Output dimensions" should be set to 3.

Using JSON text to enter training data settings

It is possible to enter the training data settings directly as JSON text.

Under the Item section, click Edit directly by JSON to bring up a JSON text editor window (the red area in the image below) where you can enter JSON text for your training data settings.

Training data JSON text editing

Click Edit by form to switch back to the GUI for entering data items.

Make sure that your JSON text for training data adheres to the following:

  • Set the training data information as an array ([Item 1, Item 2, ..., Item n]). List the items in the same order as the items (columns) in the training data CSV file.
  • Set each item as an object ({...}). Follow the guidelines below for the object’s members:
    • Set the item name as "name": "ITEM NAME".
    • Set the item’s type as "type": "TYPE". The "TYPE" value is one of the following:
      • Numerical value: "number"
      • Month: "month"
      • Day of the week: "weekday"
      • String (enumerated): "enum"
    • For the numerical value type, set the dimensions as "count": DIMENSIONS.
    • When using the string (enumerated) type’s keyword list, set the keyword list as "keys": "KEYWORD LIST".
    • When using the string (enumerated) type’s approx. number, set the approx. number as "size": APPROX. NUMBER.

Below is an example of JSON text for training data settings:

[
  {
    "name": "item1",
    "type": "number",
    "count": 1
  },
  {
    "name": "item2",
    "type": "month"
  },
  {
    "name": "item3",
    "type": "weekday"
  },
  {
    "name": "item4",
    "type": "enum",
    "keys": "A, B, O, AB"
  },
  {
    "name": "item5",
    "type": "enum",
    "size": 4
  }
]
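
Before pasting JSON text into the editor, it can help to check it against the guidelines above. The following is a minimal sketch of such a check; it is not a BLOCKS tool, and the function name and messages are hypothetical. It assumes that item names use ASCII letters, that an enumerated item carries exactly one of "keys" or "size", and that only the four type names listed above are valid (this page does not give a JSON type name for sequences).

```python
import json
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")  # assumes ASCII letters
KNOWN_TYPES = {"number", "month", "weekday", "enum"}


def check_items(json_text):
    """Return a list of problems found in training data settings JSON."""
    items = json.loads(json_text)
    if not isinstance(items, list):
        return ["top level must be an array of items"]
    problems = []
    for index, item in enumerate(items, start=1):
        label = f"Item {index}"
        if not isinstance(item, dict):
            problems.append(f"{label}: must be an object")
            continue
        name = item.get("name")
        if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
            problems.append(f"{label}: name must use only letters, numbers, or underscores")
        item_type = item.get("type")
        if item_type not in KNOWN_TYPES:
            problems.append(f"{label}: unknown type {item_type!r}")
        elif item_type == "number" and "count" in item:
            # Only checked when present; whether "count" is required is not stated.
            count = item["count"]
            if not isinstance(count, int) or count < 1:
                problems.append(f"{label}: count must be an integer of 1 or more")
        elif item_type == "enum" and ("keys" in item) == ("size" in item):
            problems.append(f"{label}: enum needs exactly one of 'keys' or 'size'")
    return problems


settings = '[{"name": "item1", "type": "number", "count": 1}, {"name": "item 2", "type": "enum"}]'
for problem in check_items(settings):
    print(problem)
```

For the sample string, the check reports two problems for Item 2: the space in its name and the missing "keys" or "size" member.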

Label settings

For the Image Object Detection Type ML Board, you must label images before running the training.

For example, the labels dog and cat have been added to the following image:

Labelled image example

In this section, you will register the labels that you will use on your training images. For the example image above, you would register the labels dog and cat.

To register a label, enter its name into the text input field under the label list and click the add icon. Repeat this process for all of the labels you need to register.

You must adhere to the following naming rules for labels:

  • Use only letters, numbers, or underscores (_)
  • Use no more than 64 characters
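
Because labels cannot be changed after the Board has been created, it may be worth checking a list of names against these rules beforehand. The sketch below is one way to do so; the function name is hypothetical, and it assumes “letters” means ASCII letters.

```python
import re

# 1 to 64 characters, each a letter, number, or underscore (ASCII assumed).
LABEL_PATTERN = re.compile(r"[A-Za-z0-9_]{1,64}")


def is_valid_label(name):
    """Return True if the name follows the label naming rules above."""
    return LABEL_PATTERN.fullmatch(name) is not None


for candidate in ["dog", "cat_2", "guinea pig", "x" * 65]:
    print(candidate[:12], is_valid_label(candidate))
```

Here `dog` and `cat_2` pass, while `guinea pig` (contains a space) and the 65-character name fail.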

To delete a label, click the trash icon to its right.

You cannot register or delete labels after the Board has been created.

Training details screen

The following can be performed on this screen:

ML Board details

Managing image labels

This section only applies to the Object Detection Type: Manual Setup (limited alpha release).

For the Image Object Detection Type ML Board, you need to apply labels to the images before the training.

Applying labels to an image refers to creating rectangular boundaries with names (labels) around the objects in the image. You can apply labels for multiple objects within a single image.

Example of applied labels

Place the training images that you will label in a single, flat GCS folder. The folder cannot contain subfolders.

You should use at least 100 labelled training images. Using fewer training images may cause the training to fail.

How to label images
  1. Click Add Label.
  2. Choose the GCS folder that contains your images and click Select.

    Selecting the GCS image folder
  3. Next, the label application screen will appear.

    The label application screen
  4. On the label application screen, select an image and the label to apply, then drag to create a rectangular boundary around the corresponding object in the image.

    How to label an object
    • Click the × in the upper-right of the boundary to delete it.
    • Drag the square marker in the bottom-right of the boundary to resize it.
    • Drag from within the boundary to adjust its position.
    Editing label boundaries
  5. Click Save to save the applied labels.

    Files you have labelled will have a checkmark in the file list.

  6. Repeat this process for all of the training images in the list.

  7. Once finished, click Close.

Start a training

Click Start Training to create a new training.

You can configure the required settings for a new training and start it from this screen.

Training list

A list will be displayed here when you have created at least one training. The following information is contained in the training list:

Information | Explanation
Training name | Configured in the Start Training settings.
Started | Shows the date and time when the training began.
Finished | Shows the date and time when the training finished.
Status | Shows the training’s status. Possible statuses are: Preparing, Running, Succeeded, Failed, and Stopped.
RMSE/Accuracy | Shows the results of evaluating the training. For the classification model, this refers to its accuracy in selecting the correct class. For the regression model, it refers to its RMSE (Root Mean Square Error), which is a measure of the difference between the actual results and the predicted results.
Details | Displays a screen where the training’s details can be confirmed.
Actions | Apply a training (set it to be used for making predictions), or stop a training that is currently running. Only one training can be applied at a time.
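
For reference, RMSE is commonly defined as shown below. The symbols are notation chosen for this example rather than taken from BLOCKS: n is the number of evaluated rows, y_i is the actual result for row i, and ŷ_i is the predicted result. This page does not state exactly how BLOCKS computes the value (for example, for results values with more than one output dimension), so treat this as the standard definition only.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

A lower RMSE means the predicted results are closer to the actual results, whereas a higher accuracy is better for the classification model.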

Show training details

You can check a training’s details by clicking Show Details.

Item Explanation
Training results

View the training start time, end time, status, and accuracy/RMSE.

The image classification type may also show Incorrectly classified images.

When training, the image classification model automatically splits the data in the image folders with an approximate ratio of 8:2 for training and validation data.

It uses the training data to learn, then tests itself using the validation data. If its classifications are incorrect during this validation phase, the mistakes will be shown under Incorrectly classified images.

If Incorrectly classified images is displayed, please check that there are no problems in the images themselves. If there are, make sure the images are in the correct folder or delete the images with errors and try running a new training. If the incorrectly classified images do not contain any problems, you can leave them as they are.

Settings Shows details of what was configured in the training data settings
Error logs Shows the error logs when a training fails. Refer to these if you want to examine why the training failed.

Confirm the ML Board’s settings

You can confirm various information about the ML Board under the Settings section of its details screen.

Item | Explanation
Board name | The Board name configured on the Create ML Board screen.
Type | The type selected on the Create ML Board screen.
GCP service account | The GCP project ID used by the ML Board.
Model name | The model name for the training results.
Training data settings | The information configured on the Training Data Settings screen.

Delete a Board

Delete an ML Board by clicking the “Delete Board” button at the bottom of the screen.

Start training screen help

You can configure settings necessary to create and start a new training on this screen.

Classification/regression models

Start training screen

Each setting is explained below:

Setting Explanation
Training name Assign a name to the new training
Upload training data

The GCS location where the training data will be saved is shown with the following format: “gs://BUCKETNAME”.

Clicking the link will open the Google Cloud Console in another tab where you can access this GCS location. You will need to sign in using a Google account registered in the GCP access section of the Project settings screen.

Training set URL

Designate the GCS URL for the training set (Example: gs://bucketname/filename.csv).

These URLs should only contain ASCII letters, numbers, underscores (_), hyphens (-), or slashes (/).

Clicking the icon opens a GCS file menu. Select your file and its URL will be entered with the format explained above.

Validation set URL

Designate the GCS URL for the validation set (Example: gs://bucketname/filename.csv).

These URLs should only contain ASCII letters, numbers, underscores (_), hyphens (-), or slashes (/).

Clicking the icon opens a GCS file menu. Select your file and its URL will be entered with the format explained above.

Max. time until timeout

In order to get the most accurate possible results, the ML Board can run multiple training trials.

This property configures the maximum amount of time the Board will spend on each trial.

Set this to 0 if you do not want to set a limit.

If the training’s results (accuracy/RMSE) start to deteriorate, trials will be stopped prior to reaching the “Max. time until timeout”.

Max. number of trials

Set the number of trials that the ML Board will run. The value must be 1 or greater.

The approximate training time can be calculated as (Max. time until timeout) × (Max. number of trials). The actual time may be a bit longer due to additional/indirect processing times.

Machine type

Select the type of machine to use for the training.

  • BASIC

    Uses the standard machine to run the training.

  • BASIC GPU

    Runs the training using a GPU (Graphics Processing Unit) for generally faster results than the BASIC type. However, GCP fees will be approximately three times as high.

    Depending on the training data, the speed may not be significantly faster, or may be slower in some cases, compared to the BASIC type.

Explanation Write an explanation for this training. (optional)
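
Two of the settings above lend themselves to a quick check before starting a training: the character rule for the training and validation set URLs, and the approximate training time. The sketch below is an illustration only, with hypothetical function names. It assumes that the gs:// prefix and periods are acceptable in addition to the listed characters, because the example URL on this page (gs://bucketname/filename.csv) contains them. The time estimate uses whatever unit the timeout is entered in and returns None when the timeout is 0 (no limit).

```python
import re

# Assumption: "gs://" and "." are allowed in addition to ASCII letters,
# numbers, underscores, hyphens, and slashes.
GCS_URL_PATTERN = re.compile(r"gs://[A-Za-z0-9_\-./]+")


def is_acceptable_gcs_url(url):
    """Return True if the URL uses only the characters described above."""
    return GCS_URL_PATTERN.fullmatch(url) is not None


def estimate_training_time(max_time_per_trial, max_trials):
    """Approximate upper bound: (Max. time until timeout) x (Max. number of trials)."""
    if max_trials < 1:
        raise ValueError("Max. number of trials must be 1 or greater")
    if max_time_per_trial == 0:
        return None  # 0 means no limit, so no bound can be estimated
    return max_time_per_trial * max_trials


print(is_acceptable_gcs_url("gs://bucketname/filename.csv"))  # True
print(is_acceptable_gcs_url("gs://bucketname/my file.csv"))   # False
print(estimate_training_time(60, 3))                          # 180
```

The estimate is only a rough upper bound: trials may stop early if results deteriorate, and the actual time may run longer because of additional processing.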

Image classification type

Image classification Board: Start training screen

Each setting is explained below:

Setting Explanation
Training name Designate a name for the training.
Upload training data

A link to a Google Cloud Storage (GCS) location to use for uploading training data is displayed here. The URL's format is gs://BUCKETNAME.

Clicking the link will open the Google Cloud Console in another tab. From there you can access this GCS location. You will need to log in using the Google account registered in the GCP access section of the BLOCKS project settings screen.

This is displayed for Full Service Plan users only.

Image folder

Designate the URL of the GCS folder that contains your training images (gs://BUCKETNAME/FOLDER/). The URL must have a / as its last character.

Clicking the icon opens a GCS file menu. Select your folder and its URL will be entered with the format explained above.

The image folder should follow these specifications:

Image folder specifications
  • Prepare a folder in GCS that will contain your training images. We will refer to this as the image folder.
  • Directly in the image folder, prepare folders for each category. These folder names are used as the names for each class.
  • Store your image files into these folders based on their class.
  • Images must be in JPEG format.
  • Images with extreme aspect ratios may not be classified properly.
  • Each class folder should only contain images. Do not use subfolders.

For example, if classifying images of dogs and cats, put pictures of dogs into one folder and pictures of cats into another.

Max. time until timeout (minutes)

Configure the maximum amount of time that the training will take.

Max. number of trials

Set the number of trials that the ML Board will run. The value must be 1 or greater.

The approximate training time can be calculated as (Max. time until timeout) × (Max. number of trials). The actual time may be a bit longer due to additional/indirect processing times.

Machine type

Select the type of machine to use for the training.

  • BASIC

    Uses the standard machine to run the training.

  • BASIC GPU

    Runs the training using a GPU (Graphics Processing Unit) for generally faster results than the BASIC type. However, GCP fees will be approximately three times as high.

    Depending on the training data, the speed may not be significantly faster, or may be slower in some cases, compared to the BASIC type.

Explanation (optional) Enter an explanation for the training.
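
The image folder specifications described earlier in this section can be checked against a list of object URLs before starting a training. The sketch below is a hypothetical helper, not a BLOCKS feature. It takes the URLs as plain strings (obtaining them from GCS is left out), and it treats a .jpg or .jpeg file extension as a stand-in for “JPEG format”, which is an assumption.

```python
from collections import defaultdict


def group_images_by_class(image_folder, object_urls):
    """Group image file names by class folder and collect layout problems."""
    if not image_folder.endswith("/"):
        raise ValueError("the image folder URL must end with '/'")
    classes = defaultdict(list)
    problems = []
    for url in object_urls:
        if not url.startswith(image_folder):
            problems.append(f"outside the image folder: {url}")
            continue
        parts = url[len(image_folder):].split("/")
        # Expect exactly CLASS/IMAGE below the image folder (no subfolders).
        if len(parts) != 2 or not parts[1]:
            problems.append(f"expected CLASS/IMAGE directly under the image folder: {url}")
        elif not parts[1].lower().endswith((".jpg", ".jpeg")):
            problems.append(f"not a JPEG file name: {url}")
        else:
            classes[parts[0]].append(parts[1])
    return dict(classes), problems


# Hypothetical listing using the placeholder names from this page.
urls = [
    "gs://BUCKETNAME/FOLDER/dog/001.jpg",
    "gs://BUCKETNAME/FOLDER/cat/001.jpg",
    "gs://BUCKETNAME/FOLDER/cat/old/002.jpg",
    "gs://BUCKETNAME/FOLDER/dog/notes.txt",
]
classes, problems = group_images_by_class("gs://BUCKETNAME/FOLDER/", urls)
print(sorted(classes))  # ['cat', 'dog']
for problem in problems:
    print(problem)
```

The keys of `classes` correspond to the class names that the folder names would produce, and `problems` lists the subfolder and the non-image file from the sample listing.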

Object detection type: manual setup

This type is currently available as a limited alpha release only.

Image object detection type: start training screen

The settings are as follows:

Setting Explanation
Training name Designate a name for the training.
Image folder

Designate the GCS folder that contains the labelled training images.

You should use at least 100 labelled training images. Using fewer training images may cause the training to fail.

Explanation (optional) Enter an explanation for the training.

GCP service charges

The ML Board creates an environment in the user’s GCP project that utilizes various GCP services.

As such, GCP service charges will apply separately from MAGELLAN BLOCKS fees. Applicable charges will vary depending on the service. For details, refer to the pricing page for each of the services used by the ML Board.