BLOCKS Reference

ML Board prediction (batch)

This BLOCK is currently in beta. The beta version will become unusable after the official version is released.

Some functions may not work properly during the beta. We appreciate your feedback regarding bugs or ways to improve MAGELLAN BLOCKS.

Overview

This BLOCK uses a trained ML Board model to make batch predictions from input variable data. It is designed for predictions over large collections of input data.

It supports the following four types of ML Board:

  • Classification type
  • Regression type
  • Image classification type
  • Image object detection type

Predictions with this BLOCK take longer than with the ML Board prediction (online) BLOCK. However, this BLOCK is more efficient when predicting on large amounts of data.

The BLOCK performs batch predictions for the regression and classification models by reading input variable data from a text file stored in Google Cloud Storage (GCS). It outputs the results as text files to a GCS folder. As a general rule, the results are split into multiple files.

Batch prediction overview

For the image classification model, the BLOCK reads from either image files stored in GCS or a JSON file that contains data for images. Results are output in the same manner as the regression and classification models.

You must complete training on an ML Board before you can make predictions with this BLOCK.

Preparing the input data

Classification and regression types

Prepare input data as a JSON format text file as shown below:

{"key": "1", "sepal_length": 5.9, "sepal_width": 3.0, "petal_length": 4.2, "petal_width": 1.5}
{"key": "2", "sepal_length": 6.9, "sepal_width": 3.1, "petal_length": 5.4, "petal_width": 2.1}
{"key": "3", "sepal_length": 5.1, "sepal_width": 3.3, "petal_length": 1.7, "petal_width": 0.5}
  • Make each line a JSON object ({...}).
  • Separate objects with line breaks.
  • Gather input data for one instance into one JSON object.
  • JSON objects consist of “name” and “value” pairs.
    • In each pair, the name and value are separated by a :.
    • The name is on the left of the : and the value is on the right (name: value).
  • Values can be set as the following three types:
    • Numbers: 1, 23.45, and the like (numerical value, month, and day data).
    • Strings: "abc", "xyz", and the like. Strings are enclosed in " (string (enumerated) data).
    • Arrays: [1, 2, 3], [4, 5.6, 7.0], and the like. Arrays can contain multiple numerical values enclosed between [ and ] (Numerical value with multiple dimensions and sequence data).

      This section explains how to set sequence data.

      Sequence data, such as the weather data for the past 7 days in the example below, is data whose order carries meaning. This section explains how to set an array for this example sequence.

      Past 7 days weather data example

      When using this sequence data as part of an ML Board’s training data, you would simply write it out with commas (,) separating each number, as shown below.

      Example using sequence data for ML Board training

      In the same manner, you list sequence data as an array when using it for a prediction.

      "weather": [21, 35, 0, 20, 34, 0, . . . , 18, 32, 20]
      

      (This assumes the item name for the sequence data is weather)

  • Each JSON object (an instance of input data) should include a pair with the name "key". The value should be a string to identify that instance of input data.
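As a rough illustration of the rules above, the following Python sketch serializes records as one JSON object per line. The function name to_prediction_lines and the fallback of numbering records that lack a "key" are assumptions made for this example; they are not part of MAGELLAN BLOCKS.

```python
import json


def to_prediction_lines(records):
    """Serialize records as one JSON object per line."""
    lines = []
    for index, record in enumerate(records, start=1):
        row = dict(record)
        # Every instance needs a "key" string that identifies it.
        # Assumption: number the records when no key is supplied.
        row.setdefault("key", str(index))
        row["key"] = str(row["key"])
        lines.append(json.dumps(row))
    # Objects are separated by line breaks.
    return "\n".join(lines) + "\n"


records = [
    {"sepal_length": 5.9, "sepal_width": 3.0, "petal_length": 4.2, "petal_width": 1.5},
    {"sepal_length": 6.9, "sepal_width": 3.1, "petal_length": 5.4, "petal_width": 2.1},
]
text = to_prediction_lines(records)
```

The resulting text can be saved to a file and uploaded to GCS for use as the input data.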

The results of the prediction are output as JSON format text files to a GCS folder. The files are named automatically as shown below (the XXXXX and YYYYY change based on the number of files).

prediction.results-XXXXX-of-YYYYY
  • XXXXX: A number starting with 0 that represents the file’s index number (00000, 00001, etc.).
  • YYYYY: The total number of files (00001, 00003, etc.).
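Because results are usually split across several files, it can help to check that every file is present before reading them. The sketch below parses the file names described above. It assumes both numbers are always five digits, as in the examples; the helper names parse_result_name and missing_indexes are hypothetical.

```python
import re

# Assumption: index and total are zero-padded to five digits, as in the examples.
RESULT_NAME = re.compile(r"^prediction\.results-(\d{5})-of-(\d{5})$")


def parse_result_name(name):
    """Return (index, total) for a result file name, or None if it does not match."""
    match = RESULT_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def missing_indexes(names):
    """List the file indexes that are absent from a collection of file names."""
    parsed = [p for p in map(parse_result_name, names) if p is not None]
    if not parsed:
        return []
    total = parsed[0][1]
    present = {index for index, _ in parsed}
    # Indexes start at 0, so a complete set covers 0 .. total - 1.
    return [i for i in range(total) if i not in present]
```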

The following example shows prediction results for the classification model:

{"label_index": 2, "score": [9.230815578575857e-08, 0.007054927293211222, 0.9929450154304504], "key": "2", "label": "Iris-virginica"}
  • Each row contains one JSON object ({...}).
  • Rows are separated by line breaks.
  • Prediction results for each instance of input data are contained within individual JSON objects.
    "label_index" Shows which element of the "score" array the value of "label" represents. The elements in the array are ordered starting from 0.
    "score" The level of certainty for predicting each class. In this example, the certainty for class 0 is 0.000009231%, class 1 is 0.705492729%, and class 2 is 99.294501543%.
    "key" The value for the "key" used in the prediction input data.
    "label" The predicted class.

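The classification result above can be read back with a few lines of Python. This is only a sketch of one way to interpret the fields; the variable names are chosen for illustration.

```python
import json

line = (
    '{"label_index": 2, "score": [9.230815578575857e-08, 0.007054927293211222, '
    '0.9929450154304504], "key": "2", "label": "Iris-virginica"}'
)

result = json.loads(line)
scores = result["score"]

# "label_index" points at the element of "score" that belongs to "label".
label_score = scores[result["label_index"]]

# Express each class score as a percentage.
percentages = [score * 100 for score in scores]
```

In this example, the score at "label_index" is also the largest value in "score".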
The following example shows prediction results for the regression model:

 {"output": 10304.1962890625, "key": "20170103"} 
  • Each row contains one JSON object ({...}).
  • Rows are separated by line breaks.
  • Prediction results for each instance of input data are contained within individual JSON objects.
    "output" The predicted value.
    "key" The value for the "key" used in the prediction input data.
Image classification and image object detection types

There are two ways to make predictions with the image classification or image object detection models.

  • Specify image files stored in GCS
  • Specify a JSON file

When making object detection predictions for multiple images at once, all images must be the same size.

Each method is explained below.

Making predictions with image files stored in GCS

This method is the simplest way to make predictions.

  • Upload your collection of image files into GCS in a single folder.
  • Enter the GCS URL of this folder into the Input GCS URL property.
  • Make sure to include a / at the end of the GCS URL.
  • Images for predictions must be JPEG format.
  • The file extension for the images can be .jpg or .jpeg.
  • The folder containing the uploaded images can also contain folders within it (see the following image).
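The requirements above can be checked before running the BLOCK. The following sketch uses two hypothetical helpers, normalize_folder_url and has_jpeg_extension. It matches the extensions exactly as listed; whether uppercase extensions are accepted is not stated here, so the sketch does not assume it.

```python
def normalize_folder_url(url):
    """Return the folder URL with a trailing slash, as the Input GCS URL requires."""
    if not url.startswith("gs://"):
        raise ValueError("expected a GCS URL starting with gs://")
    return url if url.endswith("/") else url + "/"


def has_jpeg_extension(name):
    """Check that a file name ends with one of the listed extensions."""
    # Assumption: only the lowercase forms named in the list are checked.
    return name.endswith((".jpg", ".jpeg"))
```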
Making predictions with a JSON file

This method makes predictions using a JSON file that contains Base64 encoded image data.

  • The JSON file must contain more than one JSON object with each object separated by line breaks. See the following example:
    {"key": "samp01", "image": {"b64": "/9j/4....../2Q=="}}
    {"key": "samp02", "image": {"b64": "/9j/4....../2Q=="}}
    
  • The JSON objects should be formatted as follows:
    {"key": "key", "image": {"b64": "Base64 encoded image data"}}
    

    *Replace key and Base64 encoded image data with the information for your prediction images.
    Name Value
    "key" Designate a string to act as an identifying key for the prediction image.
    "b64" Designate the Base64 encoded data for the prediction image.
  • Images for predictions must be JPEG format.
  • Upload the JSON file to GCS and enter its GCS URL into the Input GCS URL property.
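One possible way to build such a JSON file is sketched below in Python. The function name image_to_json_line is hypothetical, and sample_bytes is a placeholder rather than a real image; in practice you would read the bytes from your JPEG files.

```python
import base64
import json


def image_to_json_line(key, jpeg_bytes):
    """Build one JSON object holding a key and Base64 encoded image data."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return json.dumps({"key": key, "image": {"b64": encoded}})


# Placeholder bytes that only mimic the start and end markers of a JPEG file.
sample_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 4 + b"\xff\xd9"

# One line per image, separated by line breaks.
lines = [
    image_to_json_line("samp01", sample_bytes),
    image_to_json_line("samp02", sample_bytes),
]
json_text = "\n".join(lines) + "\n"
```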
Prediction results

The results of the prediction are output as JSON format text files to a GCS folder. The files are named automatically as shown below (the XXXXX and YYYYY change based on the number of files).

prediction.results-XXXXX-of-YYYYY
  • XXXXX: A number starting with 0 that represents the file’s index number (00000, 00001, etc.).
  • YYYYY: The total number of files (00001, 00003, etc.).

The following shows example prediction results for the image classification model:

{"labels": ["cat", "dog"], "score": [1.0, 2.0886015139609526e-10], "key": "gs://my-bucket/images/sample_01.jpg", "label": "cat"}
{"labels": ["cat", "dog"], "score": [3.7939051367175125e-07, 0.9999996423721313], "key": "gs://my-bucket/images/sample_02.jpg", "label": "dog"}
  • Each row contains one JSON object ({...}).
  • Rows are separated by line breaks.
  • Prediction results for one image are contained in one JSON object.
    "labels"

    A list of the classes. In this case, there are two classes: "cat" and "dog".

    The order of the "labels" list matches the order in the "score" list that follows.

    "score"

    The certainty for predicting each class.

    In this example, the first value in "score" refers to "cat" and the second value in "score" refers to "dog".

    "key"
    • If you made the prediction using image files uploaded to GCS, this will be the GCS URL for the image file.
    • If you made the prediction using a JSON file, this will be the value for "key" as designated within that file.
    "label"

    The predicted class. This is the class that corresponds with the highest value within "score".
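Since "labels" and "score" share the same order, they can be paired directly. The sketch below shows this for the second example result; the variable names are illustrative only.

```python
import json

line = (
    '{"labels": ["cat", "dog"], "score": [3.7939051367175125e-07, 0.9999996423721313], '
    '"key": "gs://my-bucket/images/sample_02.jpg", "label": "dog"}'
)

result = json.loads(line)

# Pair each class with its score, relying on the matching order of the two lists.
score_by_label = dict(zip(result["labels"], result["score"]))

# The class with the highest score should match "label".
best_label = max(score_by_label, key=score_by_label.get)
```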

Properties

Property Explanation
BLOCK name Configure the name displayed on this BLOCK.
GCP service account Select the GCP service account to use with this BLOCK.
ML Board Select the ML Board to use for this prediction.
Input GCS URL
  • Classification and regression models: Designate the GCS URL for the text file containing the input data.

    For a GCS bucket named blocks-sample and a text file containing prediction input data named sample.json, the Input GCS URL would be gs://blocks-sample/sample.json.

  • Image classification and image object detection models: Designate the GCS URL for either a folder containing the image files for the prediction, or the JSON file containing the collected data for the prediction images.

    Image files within a folder in GCS: For a bucket named blocks-sample that contains a folder named images, the Input GCS URL would be gs://blocks-sample/images/. Make sure the URL ends with a /.

    JSON file containing data for images: For a bucket named blocks-sample that contains a JSON file named sample.json, the Input GCS URL would be gs://blocks-sample/sample.json.

Output GCS URL

Designate a GCS URL for the folder that will contain the results of the prediction.

For example, to store the results in a folder named results within a bucket named blocks-sample, set this property to gs://blocks-sample/results/.

The BLOCK will create a new folder automatically if the designated folder does not already exist. For existing folders, new files will overwrite older files if they have the same name.

Batch size

For batch predictions, the input feature data is saved into memory (buffered) before the prediction is performed.

Use this property to designate how many records of input feature data are buffered at a time.

Increasing the batch size may speed up the prediction. However, memory usage also increases, so the prediction may fail due to insufficient memory.

Set a batch size that suits your input feature data and does not exhaust the available memory.
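To make the idea of buffering by record count concrete, the following Python sketch groups records into batches of a fixed size. It is a generic illustration only; how the BLOCK buffers data internally is not described here, and the function name batches is hypothetical.

```python
from itertools import islice


def batches(records, batch_size):
    """Yield lists of at most batch_size records, in the original order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # Only one batch is held in memory at a time.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


grouped = list(batches(range(5), 2))
```

A larger batch size means fewer batches, but each batch holds more records in memory at once.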