ML Board prediction (batch)

This BLOCK is currently in beta. The beta version will become unusable after the official version is released.

Some functions may not work properly during the beta. We appreciate your feedback regarding bugs or ways to improve MAGELLAN BLOCKS.

Overview

This BLOCK makes batch predictions by applying a model trained on an ML Board to input variable data. It is designed for making predictions from large collections of input data.

This BLOCK takes longer to return predictions than the ML Board prediction (online) BLOCK. However, it is more efficient when making predictions from large amounts of data.

The BLOCK performs batch predictions for the regression and classification models by reading input variable data from a text file stored in Google Cloud Storage (GCS). It outputs the results as text files to a GCS folder. As a general rule, the results are split into multiple files.

Batch prediction overview

For the image classification model, the BLOCK reads from either image files stored in GCS or a JSON file that contains data for images. Results are output in the same manner as the regression and classification models.

Before making predictions with this BLOCK, you must complete training on an ML Board.

Preparing the input data

Classification and regression types

Prepare the input data as a text file of JSON objects, one object per line, as shown below:

{"key": "1", "sepal_length": 5.9, "sepal_width": 3.0, "petal_length": 4.2, "petal_width": 1.5}
{"key": "2", "sepal_length": 6.9, "sepal_width": 3.1, "petal_length": 5.4, "petal_width": 2.1}
{"key": "3", "sepal_length": 5.1, "sepal_width": 3.3, "petal_length": 1.7, "petal_width": 0.5}
  • Make each line a JSON object ({...}).
  • Separate objects with line breaks.
  • Gather input data for one instance into one JSON object.
  • JSON objects consist of “name” and “value” pairs.
    • In each pair, the name and value are separated by a :.
    • The name is on the left of the : and the value is on the right (name: value).
  • Values can be set as the following three types:
    • Numbers: 1, 23.45, and the like (for numerical value, month, and day data).
    • Strings: "abc", "xyz", and the like, enclosed in " (for string (enumerated) data).
    • Arrays: [1, 2, 3], [4, 5.6, 7.0], and the like. Arrays contain multiple numerical values enclosed between [ and ] (for numerical value data with multiple dimensions).
  • Each JSON object (an instance of input data) should include a pair with the name "key". The value should be a string to identify that instance of input data.
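
The format rules above can be sketched in Python as follows. This is a minimal illustration, not part of MAGELLAN BLOCKS: the helper name to_json_lines and the sample records are assumptions, and uploading the resulting file to GCS is left out.

```python
import json


def to_json_lines(records):
    """Serialize records to one JSON object per line (hypothetical helper)."""
    lines = []
    for record in records:
        # Each instance needs a "key" pair whose value is a string.
        if not isinstance(record.get("key"), str):
            raise ValueError('each record needs a string "key"')
        lines.append(json.dumps(record))
    # Objects are separated by line breaks.
    return "\n".join(lines) + "\n"


records = [
    {"key": "1", "sepal_length": 5.9, "sepal_width": 3.0,
     "petal_length": 4.2, "petal_width": 1.5},
    {"key": "2", "sepal_length": 6.9, "sepal_width": 3.1,
     "petal_length": 5.4, "petal_width": 2.1},
]

text = to_json_lines(records)
print(text, end="")
```

Rejecting records without a string "key" up front is a design choice of this sketch; it makes a malformed instance fail before the file is uploaded.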

The results of the prediction are output as JSON format text files to a GCS folder. The files are named automatically as shown below (the XXXXX and YYYYY change based on the number of files).

prediction.results-XXXXX-of-YYYYY
  • XXXXX: A number starting with 0 that represents the file’s index number (00000, 00001, etc.).
  • YYYYY: The total number of files (00001, 00003, etc.).
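
Because the results are split across several files, it can help to confirm that every file has been collected. The sketch below assumes the five-digit pattern shown above and assumes that index numbers run from 0 to the total minus 1; the helper name missing_shards is hypothetical.

```python
import re

# Pattern inferred from the naming scheme described above (an assumption).
RESULT_NAME = re.compile(r"^prediction\.results-(\d{5})-of-(\d{5})$")


def missing_shards(file_names):
    """Return the index numbers absent from a list of result file names."""
    seen = set()
    total = None
    for name in file_names:
        match = RESULT_NAME.match(name)
        if match is None:
            continue  # not a result file
        seen.add(int(match.group(1)))
        total = int(match.group(2))
    if total is None:
        return []  # no result files were found at all
    return [index for index in range(total) if index not in seen]


names = [
    "prediction.results-00000-of-00003",
    "prediction.results-00002-of-00003",
]
print(missing_shards(names))
```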

The following example shows prediction results for the classification model:

 {"score": [9.230815578575857e-08, 0.007054927293211222, 0.9929450154304504], "key": "2", "label": 2} 
  • Each row contains one JSON object ({...}).
  • Rows are separated by line breaks.
  • Prediction results for each instance of input data are contained within individual JSON objects.
    • "score": The level of certainty for predicting each class. In this example, the certainty for class 0 is 0.000009231%, class 1 is 0.705492729%, and class 2 is 99.294501543%.
    • "key": The value for the "key" used in the prediction input data.
    • "label": The predicted class.

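As a rough illustration of reading a classification result, the following Python sketch parses the sample line above and converts each certainty to a percentage. That "label" equals the index of the highest "score" holds for this sample; treating it as a general rule for the classification model is an assumption of this sketch.

```python
import json

line = (
    '{"score": [9.230815578575857e-08, 0.007054927293211222, '
    '0.9929450154304504], "key": "2", "label": 2}'
)

result = json.loads(line)
scores = result["score"]

# Convert each certainty to a percentage.
percentages = [score * 100 for score in scores]

# Index of the highest score (assumed here to correspond to "label").
best_class = max(range(len(scores)), key=lambda index: scores[index])

for index, percentage in enumerate(percentages):
    print(f"class {index}: {percentage:.9f}%")
print("key:", result["key"], "label:", result["label"])
```
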
The following example shows prediction results for the regression model:

 {"output": 10304.1962890625, "key": "20170103"} 
  • Each row contains one JSON object ({...}).
  • Rows are separated by line breaks.
  • Prediction results for each instance of input data are contained within individual JSON objects.
    • "output": The predicted value.
    • "key": The value for the "key" used in the prediction input data.
Image classification type

There are two ways to make predictions with the image classification model.

  • Specify image files stored in GCS
  • Specify a JSON file

Each method is explained below.

Making predictions with image files stored in GCS

This method is the simplest way to make predictions.

  • Upload your collection of image files into GCS in a single folder.
  • Enter the GCS URL of this folder into the Input GCS URL property.
  • Make sure to include a / at the end of the GCS URL.
  • Images for predictions must be JPEG format.
  • The file extension for the images can be .jpg or .jpeg.
  • The folder containing the uploaded images can also contain subfolders.
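
The requirements above can be checked before uploading with a short Python sketch. Both helper names are hypothetical, and the extension check is deliberately strict: whether uppercase extensions such as .JPG are accepted is not stated here, so this sketch does not assume they are.

```python
def normalize_folder_url(url):
    """Return a GCS folder URL that ends with a slash (hypothetical helper)."""
    if not url.startswith("gs://"):
        raise ValueError("a GCS URL starts with gs://")
    return url if url.endswith("/") else url + "/"


def is_supported_image(file_name):
    """Accept only the .jpg and .jpeg extensions listed above."""
    return file_name.endswith((".jpg", ".jpeg"))


print(normalize_folder_url("gs://blocks-sample/images"))
print([name for name in ["a.jpg", "b.jpeg", "c.png"] if is_supported_image(name)])
```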
Making predictions with a JSON file

This method makes predictions using a JSON file that contains Base64 encoded image data.

  • The JSON file must contain one or more JSON objects, with each object on its own line. See the following example:
    {"key": "samp01", "image": {"b64": "/9j/4....../2Q=="}}
    {"key": "samp02", "image": {"b64": "/9j/4....../2Q=="}}
    
  • The JSON objects should be formatted as follows:
    {"key": "key", "image": {"b64": "Base64 encoded image data"}}
    

    Replace the placeholder values (key and Base64 encoded image data) with the information for your prediction images:
    • "key": Designate a string to act as an identifying key for the prediction image.
    • "b64": Designate the Base64 encoded data for the prediction image.
  • Images for predictions must be JPEG format.
  • Upload the JSON file to GCS and enter its GCS URL into the Input GCS URL property.
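
One way to produce such a file is sketched below using Python's standard base64 and json modules. The helper name image_to_json_line is hypothetical, and the byte string is a placeholder standing in for the contents of a real JPEG file (which you would typically read with open(path, "rb")).

```python
import base64
import json


def image_to_json_line(key, jpeg_bytes):
    """Build one input line for a JPEG image (hypothetical helper)."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return json.dumps({"key": key, "image": {"b64": encoded}})


# Placeholder bytes, not a valid image: JPEG start marker, padding, end marker.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 8 + b"\xff\xd9"

line = image_to_json_line("samp01", fake_jpeg)
print(line)
```

Write one such line per image, separated by line breaks, before uploading the file to GCS.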
Prediction results

The results of the prediction are output as JSON format text files to a GCS folder. The files are named automatically as shown below (the XXXXX and YYYYY change based on the number of files).

prediction.results-XXXXX-of-YYYYY
  • XXXXX: A number starting with 0 that represents the file’s index number (00000, 00001, etc.).
  • YYYYY: The total number of files (00001, 00003, etc.).

The following shows example prediction results for the image classification model:

{"labels": ["cat", "dog"], "score": [1.0, 2.0886015139609526e-10], "key": "gs://my-bucket/images/sample_01.jpg", "label": "cat"}
{"labels": ["cat", "dog"], "score": [3.7939051367175125e-07, 0.9999996423721313], "key": "gs://my-bucket/images/sample_02.jpg", "label": "dog"}
  • Each row contains one JSON object ({...}).
  • Rows are separated by line breaks.
  • Prediction results for one image are contained in one JSON object.
    "labels"

    A list of the classes. In this case, there are two classes: "cat" and "dog".

    The order of the "labels" list matches the order in the "score" list that follows.

    "score"

    The certainty for predicting each class.

    In this example, the first value in "score" refers to "cat" and the second value in "score" refers to "dog".

    "key"
    • If you made the prediction using image files uploaded to GCS, this will be the GCS URL for the image file.
    • If you made the prediction using a JSON file, this will be the value for "key" as designated within that file.
    "label"

    The predicted class. This is the class that corresponds with the highest value within "score".
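
Since "labels" and "score" share the same order, the two lists can be paired directly. The following Python sketch does this for the second sample line above; the variable names are illustrative only.

```python
import json

line = (
    '{"labels": ["cat", "dog"], '
    '"score": [3.7939051367175125e-07, 0.9999996423721313], '
    '"key": "gs://my-bucket/images/sample_02.jpg", "label": "dog"}'
)

result = json.loads(line)

# Pair each class name with its certainty, relying on the shared order.
score_by_label = dict(zip(result["labels"], result["score"]))

# "label" is the class with the highest value within "score".
best = max(score_by_label, key=score_by_label.get)

print(score_by_label)
print(best == result["label"])
```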

Properties

Property: Explanation
BLOCK name: Configure the name displayed on this BLOCK.
GCP service account: Select the GCP service account to use with this BLOCK.
ML Board: Select the ML Board to use for this prediction.
Input GCS URL
  • Classification and regression models: Designate the GCS URL for the text file containing the input data.

    For a GCS bucket named blocks-sample and a text file containing prediction input data named sample.json, the Input GCS URL would be gs://blocks-sample/sample.json.

  • Image classification model: Designate the GCS URL for either a folder containing the image files for the prediction, or the JSON file containing the collected data for the prediction images.

    Image files within a folder in GCS: For a bucket named blocks-sample that contains a folder named images, the Input GCS URL would be gs://blocks-sample/images/. Make sure the URL ends with a /.

    JSON file containing data for images: For a bucket named blocks-sample that contains a JSON file named sample.json, the Input GCS URL would be gs://blocks-sample/sample.json.

Output GCS URL

Designate a GCS URL for the folder that will contain the results of the prediction.

For example, to store the results in a folder named results within a bucket named blocks-sample, set this property to gs://blocks-sample/results/.

The BLOCK will create a new folder automatically if the designated folder does not already exist. For existing folders, new files will overwrite older files if they have the same name.
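
Because files with the same name are overwritten, one possible way to keep the results of earlier runs is to give each run its own output folder. The sketch below appends a timestamped subfolder to a base URL; the helper name run_output_url and the timestamp layout are assumptions of this example, not a feature of the BLOCK.

```python
from datetime import datetime, timezone


def run_output_url(base_url, now=None):
    """Append a timestamped subfolder to a GCS folder URL (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    if not base_url.endswith("/"):
        base_url += "/"
    return f"{base_url}{now:%Y%m%d-%H%M%S}/"


fixed_time = datetime(2017, 1, 3, 9, 30, 0, tzinfo=timezone.utc)
print(run_output_url("gs://blocks-sample/results/", fixed_time))
```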